![](https://image.freepik.com/free-vector/opened-magic-book-realistic-image-with-bright-sparkling-light-rays-illuminating-pages-floating-balls-dark_1284-29035.jpg)

**Introduction**

In this notebook, I will analyse Amazon's top 50 bestselling books of each year from 2009 to 2019. 

**FEATURES:**

* Name - Name of the Book
* Author - The author of the Book
* User Rating - Amazon User Rating
* Reviews - Number of written reviews on Amazon
* Price - The price of the book
* Year - The year it ranked on the bestseller list
* Genre - Whether fiction or non-fiction

Load the required libraries.

In [None]:
import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style(style='darkgrid')

In [None]:
book=pd.read_csv('../input/amazon-top-50-bestselling-books-2009-2019/bestsellers with categories.csv')

Check the first 5 and last 5 records of the dataset.

In [None]:
book.head(5)

In [None]:
book.tail(5)

In [None]:
book.info()

In [None]:
book.isnull().sum()

The Amazon bestsellers dataset has 550 rows and 7 columns, and it contains no null values.
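The shape and missing-value checks can be bundled into one small helper. The sketch below uses a hypothetical `summarize_frame` function and a made-up three-row frame with the same column names as the CSV; on the real data you would call it on `book` and expect 550 rows, 7 columns and 0 missing values.

```python
import pandas as pd

# Made-up rows with the same 7 column names as the bestsellers CSV
toy = pd.DataFrame({
    'Name': ['Book A', 'Book B', 'Book C'],
    'Author': ['Author X', 'Author Y', 'Author X'],
    'User Rating': [4.7, 4.6, 4.8],
    'Reviews': [100, 200, 300],
    'Price': [8, 22, 15],
    'Year': [2016, 2011, 2018],
    'Genre': ['Non Fiction', 'Fiction', 'Fiction'],
})

def summarize_frame(df):
    """Hypothetical helper: return (rows, columns, total missing values)."""
    rows, cols = df.shape
    missing = int(df.isnull().sum().sum())  # sum per column, then over columns
    return rows, cols, missing

print(summarize_frame(toy))  # (3, 7, 0)
```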

In [None]:
book.shape

In [None]:
book.describe()

Now, quickly check the data type of each column.

In [None]:
book.dtypes

Grouping the data by the year 

In [None]:
# Average user rating, reviews and price per year; numeric_only=True skips the text columns
mean = book.groupby('Year').mean(numeric_only=True)
mean

Here we compute, for each year, the average user rating, number of reviews and price of that year's bestsellers.
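In pandas 2.x, `GroupBy.mean()` no longer silently drops text columns such as `Name`, `Author` and `Genre`; it raises a `TypeError` unless `numeric_only=True` is passed. A minimal sketch on made-up rows showing what the yearly averages look like:

```python
import pandas as pd

# Made-up data: one text column plus the numeric columns used in the notebook
toy = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D'],
    'User Rating': [4.5, 4.7, 4.8, 4.6],
    'Reviews': [100, 300, 200, 400],
    'Price': [10, 20, 6, 8],
    'Year': [2009, 2009, 2010, 2010],
})

# numeric_only=True keeps only numeric columns; 'Year' becomes the index
yearly = toy.groupby('Year').mean(numeric_only=True)
print(yearly)
```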

In [None]:
book['Genre'].value_counts()

# Analysing By Genre

In [None]:
plt.figure(figsize=(10,5))
plt.title('Non-Fiction vs Fiction', fontsize=14)
sns.countplot(x="Genre", data=book, palette=('#9b111e','#50c878'))
plt.xlabel("Book Type", fontsize=12)
plt.ylabel("Count", fontsize=12)
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.show()

In [None]:
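# Take the counts from the data instead of hard-coding them (Non Fiction: 310, Fiction: 240)
genre_counts = book['Genre'].value_counts()
fig, ax = plt.subplots(figsize=(12,9))
plt.title('Percentages Of Fiction And Non-Fiction In 550 Bestsellers')
plt.pie(genre_counts.values, labels=genre_counts.index, autopct='%1.1f%%', colors=['teal', 'cyan'], explode=[0, 0.02])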
plt.show()

We can see that 56.4 percent of the 550 bestsellers are non-fiction books. While this is useful, it would be even better to group the books by year as well as by genre, so that we can see which genre was more popular in each year.
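The 56.4% figure is 310 / 550. A sketch that recomputes both shares with `value_counts(normalize=True)`; to stay self-contained it rebuilds a `Genre` column from the counts 310 (non-fiction) and 240 (fiction) instead of reading the CSV, where you would use `book['Genre']` directly.

```python
import pandas as pd

# Stand-in for book['Genre']: 310 non-fiction and 240 fiction entries
genre = pd.Series(['Non Fiction'] * 310 + ['Fiction'] * 240)

# normalize=True returns proportions; scale to percent and round to 1 decimal
shares = genre.value_counts(normalize=True).mul(100).round(1)
print(shares)  # Non Fiction 56.4, Fiction 43.6
```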

# Grouping By Genre And Year

In [None]:
grouped = book.groupby('Genre')
years = range(2009,2020)
non_fiction = grouped.get_group(("Non Fiction"))
non_fict = non_fiction.groupby('Year')['Name'].count().values
fiction = grouped.get_group("Fiction")
fict = fiction.groupby('Year')['Name'].count().values
#------------------------------------------------
#Creating a lineplot using values above.

figure, axes = plt.subplots(figsize=(12,9))

plt.plot(years,non_fict,label='Non Fiction Bestsellers',linestyle='--',marker='H')
plt.plot(years,fict,label='Fiction Bestsellers',linestyle=':',marker='p')
plt.title('Number Of Bestsellers By Each Year')
plt.xlabel('Years')
plt.ylabel('Number Of Bestsellers')
plt.legend()
plt.show()

As the line plot shows, non-fiction books outnumbered fiction books among the Amazon bestsellers in almost every year.
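The cell above pairs two separate `.values` arrays with `range(2009, 2020)`, which only lines up if every year has both genres. A sketch of an alternative using `pd.crosstab`, which builds the year-by-genre table in one step (missing combinations become 0) and makes it easy to list the years where fiction came out ahead. The rows below are made up.

```python
import pandas as pd

# Made-up (Year, Genre) rows standing in for the bestsellers data
toy = pd.DataFrame({
    'Year':  [2010, 2010, 2010, 2011, 2011, 2011],
    'Genre': ['Non Fiction', 'Non Fiction', 'Fiction',
              'Fiction', 'Fiction', 'Non Fiction'],
})

# One row per year, one column per genre, cells are counts
counts = pd.crosstab(toy['Year'], toy['Genre'])
print(counts)

# Years in which fiction outnumbered non-fiction
fiction_years = counts[counts['Fiction'] > counts['Non Fiction']].index.tolist()
print(fiction_years)  # [2011]
```

On the real data, `counts.plot()` would draw both genre lines directly from this table.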

# Bestselling Authors 2009 - 2019

In [None]:
fig, ax = plt.subplots(figsize=(12,5))
books_auth = book.copy()
books_auth['Author'] = books_auth['Author'].apply(lambda x: x.replace(" ","\n"))
sns.countplot(data=books_auth, 
              x='Author',
              palette='mako',
              order=books_auth['Author'].value_counts().head(10).index)
ax.set_title('Bestselling Author 2009 - 2019')
ax.set_ylabel('Count')
ax.set_xlabel('Author')




# Top 10 authors with highest rated books

In [None]:
book1 = book.groupby('Author')['User Rating'].max().sort_values(ascending=False).head(10).reset_index()
book1.columns = ['Author','User Rating']
book1


In [None]:
plt.figure(figsize=(10,7))
sns.barplot(y=book1['Author'], x=book1['User Rating'], palette='Wistia_r')
plt.title('Top 10 Authors based on ratings', fontsize=15)
plt.xlabel('User Rating', fontsize=12)
plt.ylabel('Author', fontsize=12)
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)


Here we extract the top 10 authors, ranked by the highest user rating any of their books received.
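Which end of a descending sort you take decides whether you get the top or the bottom of a ranking: `head(n)` returns the largest values, `tail(n)` the smallest. `nlargest(n)` is a shortcut for the descending sort followed by `head(n)`. A sketch on made-up per-author ratings:

```python
import pandas as pd

# Made-up maximum rating per author
best = pd.Series({'A': 4.9, 'B': 4.2, 'C': 4.7, 'D': 4.5})

ranked = best.sort_values(ascending=False)
top2 = ranked.head(2)     # largest values
bottom2 = ranked.tail(2)  # smallest values

# nlargest gives the same result as sort + head (no ties here)
assert top2.equals(best.nlargest(2))
print(top2.index.tolist())     # ['A', 'C']
print(bottom2.index.tolist())  # ['D', 'B']
```

With many authors tied at the same rating, which tied authors make the top 10 depends on row order, so the list is not unique.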

# Top 10 Books with the Most Reviews

In [None]:
book1 = book.groupby('Name')['Reviews'].max().sort_values(ascending=False).head(10).reset_index()
book1.columns = ['Name','Reviews']
book1


In [None]:
plt.figure(figsize=(10,7))
sns.barplot(y='Name', x='Reviews', palette='Wistia_r', data=book1)
plt.title('Top 10 books based on reviews', fontsize=15)
plt.xlabel('Reviews', fontsize=12)
plt.ylabel('Name', fontsize=12)
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)


Here we extract the top 10 books based on the number of reviews.

# Top 10 Authors with the Most Reviews

In [None]:
book1 = book.groupby('Author')['Reviews'].max().sort_values(ascending=False).head(10).reset_index()
book1.columns = ['Author','Reviews']
book1


In [None]:
plt.figure(figsize=(10,7))
sns.barplot(y=book1['Author'], x=book1['Reviews'], palette='Wistia_r')
plt.title('Top 10 Authors based on reviews', fontsize=15)
plt.xlabel('Reviews', fontsize=12)
plt.ylabel('Author', fontsize=12)
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)


Here we extract the top 10 authors based on the number of reviews.
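The author ranking uses `.max()`, i.e. the review count of each author's single most-reviewed entry. Because a title can appear in several years, summing instead would count it once per year. A sketch on made-up rows comparing the options; the de-duplicated sum assumes a repeated title carries the same review count in every year, as it does in this toy data.

```python
import pandas as pd

# Made-up rows: 'Book 1' appears in two years
toy = pd.DataFrame({
    'Author':  ['X', 'X', 'X', 'Y'],
    'Name':    ['Book 1', 'Book 1', 'Book 2', 'Book 3'],
    'Reviews': [500, 500, 200, 800],
})

# max: reviews of each author's most-reviewed entry
per_author_max = toy.groupby('Author')['Reviews'].max()
# sum over all rows: a repeated title is counted once per year
per_author_sum = toy.groupby('Author')['Reviews'].sum()
# sum over distinct titles only
per_author_unique = (toy.drop_duplicates(subset=['Author', 'Name'])
                        .groupby('Author')['Reviews'].sum())

print(per_author_max.tolist())     # [500, 800]
print(per_author_sum.tolist())     # [1200, 800]
print(per_author_unique.tolist())  # [700, 800]
```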

In [None]:
fig, ax = plt.subplots(figsize=(10, 5))
sns.barplot( x = 'Year', y = 'Reviews', ax = ax, palette='Wistia',data=book)

This bar plot shows the average number of reviews per bestseller in each year. 2019 has the highest average number of reviews compared with the other years.
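By default, `sns.barplot` draws the mean of each category (with a confidence interval as the error bar), so the bar heights can also be computed directly with `groupby` and the tallest bar found with `idxmax`. A sketch on made-up reviews:

```python
import pandas as pd

# Made-up reviews for two books per year
toy = pd.DataFrame({
    'Year':    [2017, 2017, 2018, 2018, 2019, 2019],
    'Reviews': [1000, 3000, 2000, 2000, 5000, 1000],
})

# Same numbers as the bar heights: mean reviews per year
yearly_reviews = toy.groupby('Year')['Reviews'].mean()
print(yearly_reviews)
print(yearly_reviews.idxmax())  # 2019
```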

# Most Expensive Bestselling Books 2009 - 2019

In [None]:

book1 = book.groupby('Name')['Price'].max().sort_values(ascending=False).head(10).reset_index()
book1.columns = ['Name','Price']
book1

In [None]:
fig, ax = plt.subplots(figsize=(10, 5))
sns.barplot( x = 'Year', y = 'Price', ax = ax,palette='YlGnBu',data=book)

In [None]:
fig, ax = plt.subplots(figsize=(10, 5))
sns.lineplot(y='Price', x='Year', data=book, ax=ax)

From the plots above, we can see that the highest-priced bestsellers appear in 2013 and 2014.
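Both plots show the average price per year, so a single expensive title shows up mainly as a wider confidence band. A sketch that checks the maximum price per year directly and pulls the full row of the priciest entry with `idxmax`; the rows are made up.

```python
import pandas as pd

# Made-up rows
toy = pd.DataFrame({
    'Name':  ['A', 'B', 'C', 'D'],
    'Price': [12, 90, 9, 15],
    'Year':  [2012, 2013, 2013, 2014],
})

# Highest single price recorded in each year
max_by_year = toy.groupby('Year')['Price'].max()
# Full row of the most expensive entry overall
priciest = toy.loc[toy['Price'].idxmax()]
print(max_by_year.idxmax(), priciest['Name'])  # 2013 B
```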

# Most Expensive Authors

In [None]:
book.sort_values(by = 'Price', ascending = False)[:10]

Here we list the 10 most expensive entries in the dataset, together with their authors.

# User Rating Scores Of Each Genre

In [None]:
fig, axarr = plt.subplots(1, 2, figsize=(12,5))
rating_order = sorted(book['User Rating'].unique())  # show ratings in numeric order
sns.countplot(x='User Rating', data=book, ax=axarr[0], order=rating_order, palette='magma')
axarr[0].set_title('User Rating Scores Of All Bestsellers')
# Same distribution, split by genre
sns.countplot(x='User Rating', data=book, hue='Genre', palette=['#9b111e', '#50c878'], order=rating_order, ax=axarr[1])
axarr[1].set_title('User Rating Scores By Genre')
axarr[1].set_ylabel('Count')

The count plots above show that, in general, fiction bestsellers have higher ratings than non-fiction bestsellers.
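The count plots give a visual impression; a per-genre summary puts numbers on it. A sketch on made-up ratings using `groupby(...).agg` with the mean, median and number of books per genre:

```python
import pandas as pd

# Made-up ratings
toy = pd.DataFrame({
    'Genre': ['Fiction', 'Fiction', 'Non Fiction', 'Non Fiction', 'Non Fiction'],
    'User Rating': [4.8, 4.6, 4.5, 4.7, 4.6],
})

# One row per genre with the average, middle value and count of ratings
summary = toy.groupby('Genre')['User Rating'].agg(['mean', 'median', 'count'])
print(summary)
```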

In [None]:

fig, axes = plt.subplots(1, 2, figsize=(10,4))
axes[0].plot(mean.index,mean['Reviews'])
axes[0].set_title('Average number of reviews per book over the years')

axes[1].plot(mean.index,mean['Price'])
axes[1].set_title('Average price per book over the years');


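As the two plots show, the average number of reviews and the average price tend to move in opposite directions: in years when the average price is higher, the average number of reviews tends to be lower. This is a negative correlation between yearly averages; it does not by itself show that a higher price causes fewer reviews.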
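A correlation computed on yearly averages can be much stronger than the correlation across individual books, which is what `book.corr()` measures. A sketch on made-up rows where prices and reviews rise together within each year but the yearly averages move in opposite directions (both use pandas' default Pearson correlation):

```python
import pandas as pd

# Made-up rows: two books per year
toy = pd.DataFrame({
    'Name':    ['A', 'B', 'C', 'D', 'E', 'F'],
    'Year':    [2009, 2009, 2010, 2010, 2011, 2011],
    'Reviews': [100, 300, 400, 600, 700, 900],
    'Price':   [10, 20, 6, 18, 4, 12],
})

# Correlation between yearly averages (what the two line plots compare)
yearly = toy.groupby('Year')[['Reviews', 'Price']].mean()
r_yearly = yearly['Reviews'].corr(yearly['Price'])

# Correlation across individual rows; numeric_only=True skips 'Name'
r_rows = toy.corr(numeric_only=True).loc['Reviews', 'Price']
print(round(r_yearly, 3), round(r_rows, 3))
```

Here the yearly averages are almost perfectly negatively correlated, while the row-level correlation is weak.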

In [None]:
book.corr(numeric_only=True)

In [None]:
sns.heatmap(book.corr(numeric_only=True), annot=True)

# Conclusion


In this notebook, I analysed Amazon's top 50 bestselling books of each year from 2009 to 2019. The analysis shows that there are more non-fiction bestsellers than fiction bestsellers overall, and although non-fiction books outnumbered fiction books in most years, 2014 was an exception. Finally, despite there being more non-fiction bestsellers, fiction books usually had higher ratings than non-fiction books.