# *Introduction*

In this notebook, I will analyse Amazon's top 50 bestselling books of each year from 2009 to 2019. This is an exploratory data analysis, so no machine learning models are involved. To make this notebook helpful, I will also explain each step as I go.

### *Importing Needed Libraries And Loading The Dataset*

In [None]:
import os

import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt

# List the files available in the Kaggle input directory
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Load the bestsellers dataset
data = pd.read_csv(r'/kaggle/input/amazon-top-50-bestselling-books-2009-2019/bestsellers with categories.csv')

In [None]:
data.info()

### *Inspecting The Dataset*

In [None]:
display(data.head())

The table above shows that the dataset has seven columns. I think the *Year*, *Genre*, *Reviews*, and *User Rating* columns are the most informative ones for this analysis. I will start by grouping the books by genre, which will let us see and visualize which genre appears more often among the bestsellers.
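To illustrate what grouping by genre does, here is a minimal sketch on a small hypothetical DataFrame (the book names, ratings, and years below are made up and are not taken from the dataset). `groupby('Genre')` splits the rows by their *Genre* value, and `size()` counts the rows in each group.

```python
import pandas as pd

# Hypothetical toy data using the same column names as the bestseller dataset
books = pd.DataFrame({
    'Name': ['Book A', 'Book B', 'Book C', 'Book D', 'Book E'],
    'User Rating': [4.7, 4.5, 4.8, 4.6, 4.4],
    'Year': [2009, 2009, 2010, 2010, 2010],
    'Genre': ['Fiction', 'Non Fiction', 'Fiction', 'Non Fiction', 'Non Fiction'],
})

# Split the rows by genre and count how many rows fall into each group
books_per_genre = books.groupby('Genre').size()
print(books_per_genre)
```

By contrast, `groupby(...).count()` returns the number of non-missing values for every column of each group, whereas `size()` returns a single row count per group; on a dataset without missing values the two agree.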

# Analysis

### Analysing By Genre

In [None]:
# Count the number of bestsellers in each genre
genre_counts = data['Genre'].value_counts()
print(genre_counts)

#-----------------------------------------------------------------
# Creating the pie chart from the computed counts (instead of hard-coded values).

plt.style.use('ggplot')
fig, ax = plt.subplots(figsize=(12, 9))
ax.set_title('Percentages Of Fiction And Non-fiction in 550 Bestsellers')
ax.pie(genre_counts.values, labels=genre_counts.index, autopct='%1.1f%%')
plt.show()

As can be seen from the table above, there are more non-fiction books than fiction books among the bestsellers. To present this more clearly, I created the pie chart above, which shows that 56.4 percent of the 550 bestsellers (310 books) are non-fiction. While this is useful, it would be even better to group the books not only by genre but also by year, so that we can see which genre was more popular in which year.
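The 56.4 percent figure is simply 310 / 550. As a sketch, the same shares can be computed directly in pandas with `value_counts(normalize=True)`. To keep the block self-contained, the Genre column here is rebuilt from the 240 fiction and 310 non-fiction counts above rather than read from the CSV; on the real data, `data['Genre'].value_counts(normalize=True)` should give the same result.

```python
import pandas as pd

# Rebuild a Genre column with the counts reported above: 240 fiction, 310 non-fiction
genres = pd.Series(['Fiction'] * 240 + ['Non Fiction'] * 310)

# normalize=True returns each genre's share of the total instead of raw counts
shares = genres.value_counts(normalize=True) * 100
print(shares.round(1))
```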

### Grouping By Genre And Year

In [None]:
# Split the dataset by genre
grouped = data.groupby('Genre')
non_fiction = grouped.get_group('Non Fiction')
fiction = grouped.get_group('Fiction')

# Count the bestsellers of each genre per year (2009-2019)
non_fict = non_fiction.groupby('Year')['Name'].count()
fict = fiction.groupby('Year')['Name'].count()
#------------------------------------------------
# Creating a line plot using the values above.

fig, ax = plt.subplots(figsize=(12, 9))

ax.plot(non_fict.index, non_fict.values, label='Non Fiction Bestsellers', linestyle='--', marker='H')
ax.plot(fict.index, fict.values, label='Fiction Bestsellers', linestyle=':', marker='p')
ax.set_title('Number Of Bestsellers By Each Year')
ax.set_xlabel('Years')
ax.set_ylabel('Number Of Bestsellers')
ax.legend()
plt.show()

As can be seen from the line plot above, non-fiction books outnumbered fiction books among Amazon's bestsellers in almost every year. The exception is 2014: that year, 29 of the 50 bestsellers were fiction books, 8 more than the 21 non-fiction bestsellers. In the next section, I will compare the median and spread of the User Rating scores of the two genres.
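The per-year comparison can also be produced in one step with `pd.crosstab`, which counts the rows for every Year/Genre pair; a boolean filter then picks out the years in which fiction led. This is a minimal sketch on hypothetical toy data (the years and rows are made up for illustration). On the real dataset you would pass `data['Year']` and `data['Genre']` instead, and according to the plot above the filter should return only 2014.

```python
import pandas as pd

# Hypothetical toy data: two years with a handful of books each
books = pd.DataFrame({
    'Year': [2013, 2013, 2013, 2014, 2014, 2014],
    'Genre': ['Non Fiction', 'Non Fiction', 'Fiction', 'Fiction', 'Fiction', 'Non Fiction'],
})

# Rows are years, columns are genres, cells are bestseller counts
per_year = pd.crosstab(books['Year'], books['Genre'])
print(per_year)

# Years in which fiction outnumbered non-fiction
fiction_years = per_year[per_year['Fiction'] > per_year['Non Fiction']].index.tolist()
print(fiction_years)  # → [2014]
```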

### User Rating Scores Of Each Genre

To visualize and compare the user rating scores of each genre, I will use a boxplot. A major benefit of a boxplot is that it shows the median, the interquartile range, and the outliers in a single chart.
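As a minimal sketch of what the boxplot draws: the box spans the first to the third quartile (its height is the interquartile range, IQR), and by default matplotlib and seaborn (`whis=1.5`) draw points lying more than 1.5 × IQR beyond the box as individual outliers. The ratings below are hypothetical values chosen for illustration, not taken from the dataset.

```python
import numpy as np

# Hypothetical ratings for one genre (made up for illustration)
ratings = np.array([4.9, 4.8, 4.8, 4.7, 4.7, 4.6, 4.6, 4.5, 3.9])

# Box edges: first and third quartiles; the box height is the interquartile range
q1, q3 = np.percentile(ratings, [25, 75])
iqr = q3 - q1

# Default whisker rule: points beyond 1.5 * IQR from the box are drawn as outliers
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = ratings[(ratings < lower) | (ratings > upper)]
print(q1, q3, iqr, outliers)
```

Here the box runs from 4.6 to 4.8, so only the 3.9 rating falls outside the whiskers and would appear as an outlier point.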

In [None]:
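# Collect the user ratings of each genre
non_fict_rat = list(non_fiction['User Rating'])
fict_rat = list(fiction['User Rating'])
ratings = [non_fict_rat, fict_rat]
names = ['Non Fiction Bestsellers', 'Fiction Bestsellers']

fig, ax = plt.subplots(figsize=(9, 5))
# Draw one box per genre on the same axes
sns.boxplot(data=ratings, ax=ax)
ax.set_xticks(range(len(names)))
ax.set_xticklabels(names)
ax.set_ylabel('User Rating')
plt.show()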

The boxplot above shows that, in general, fiction bestsellers have higher ratings than non-fiction bestsellers. This was surprising, since there are more non-fiction bestsellers than fiction ones. In addition, the box of the non-fiction bestsellers is wider than that of the fiction bestsellers, which means that the middle half of the non-fiction ratings is more spread out. Finally, there are more outliers among the fiction bestsellers, whereas there are only two among the non-fiction bestsellers.
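To back the visual reading with numbers, the median and interquartile range of each genre can be computed with `groupby`. This is a minimal sketch on hypothetical toy ratings (made up for illustration, not the real data); on the notebook's DataFrame the same idea applies to `data.groupby('Genre')['User Rating']`.

```python
import pandas as pd

# Hypothetical toy data (made up) using the dataset's column names
books = pd.DataFrame({
    'Genre': ['Fiction'] * 5 + ['Non Fiction'] * 5,
    'User Rating': [4.8, 4.7, 4.7, 4.6, 4.8, 4.4, 4.7, 4.5, 4.8, 4.6],
})

# Median and interquartile range of User Rating for each genre
by_genre = books.groupby('Genre')['User Rating']
summary = pd.DataFrame({
    'median': by_genre.median(),
    'iqr': by_genre.quantile(0.75) - by_genre.quantile(0.25),
})
print(summary)
```

In this toy example the fiction median is higher while the non-fiction IQR is wider, which is the same pattern the boxplot above suggests for the real data.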

# Conclusion

In this notebook, I analysed Amazon's top 50 bestselling books of each year from 2009 to 2019. The analysis showed that there are more non-fiction than fiction books among Amazon's bestsellers, and that non-fiction books outnumbered fiction books in almost every year, with 2014 as the exception. Lastly, even though there were more non-fiction bestsellers, fiction books usually had higher ratings than non-fiction books.