# Amazon Top 50 Bestselling Books 2009 - 2019

A dataset of Amazon's Top 50 bestselling books for each year from 2009 to 2019. It contains 550 entries, each categorized as fiction or non-fiction using Goodreads.

In [32]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go

## Reading the Data

In [4]:
dataset = pd.read_csv('Datasets/bestsellers with categories.csv')

In [5]:
dataset.head()

Unnamed: 0,Name,Author,User Rating,Reviews,Price,Year,Genre
0,10-Day Green Smoothie Cleanse,JJ Smith,4.7,17350,8,2016,Non Fiction
1,11/22/63: A Novel,Stephen King,4.6,2052,22,2011,Fiction
2,12 Rules for Life: An Antidote to Chaos,Jordan B. Peterson,4.7,18979,15,2018,Non Fiction
3,1984 (Signet Classics),George Orwell,4.7,21424,6,2017,Fiction
4,"5,000 Awesome Facts (About Everything!) (Natio...",National Geographic Kids,4.8,7665,12,2019,Non Fiction


In [11]:
dataset.isnull().values.any()

False
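
The single boolean above says nothing about where a gap would be if one existed. A minimal sketch of a per-column check follows; it assumes a small `sample` frame built from rows of the `dataset.head()` output rather than the full CSV, and the names `sample`, `missing_per_column`, and `has_missing` are illustrative only.

```python
import pandas as pd

# Sample rows copied from the dataset.head() output above; the full CSV is not loaded here.
sample = pd.DataFrame({
    "Name": ["10-Day Green Smoothie Cleanse", "11/22/63: A Novel", "1984 (Signet Classics)"],
    "Author": ["JJ Smith", "Stephen King", "George Orwell"],
    "User Rating": [4.7, 4.6, 4.7],
    "Reviews": [17350, 2052, 21424],
    "Price": [8, 22, 6],
    "Year": [2016, 2011, 2017],
    "Genre": ["Non Fiction", "Fiction", "Fiction"],
})

# Number of missing values in each column; all zeros means the frame is complete.
missing_per_column = sample.isnull().sum()
print(missing_per_column)

# The same single boolean as in the cell above, converted to a plain Python bool.
has_missing = bool(sample.isnull().values.any())
print(has_missing)
```

On the real data, replacing `sample` with `dataset` should give the same kind of summary.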

## Data Visualization

### Popularity of the books

- The *User Rating* and *Reviews* columns are used here as indicators of a book's popularity.

In [18]:
fig = px.scatter(dataset, x="User Rating", y="Reviews", title="Popularity of Books", color="Genre")
fig.show()

According to the graph above:
- User rating and number of reviews together indicate how popular a book is.
- The majority of the popular books have a rating above *4.6*.
- Seven books have a rating below 4.
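
Counts like these can be read off the data directly instead of the plot. The sketch below uses only the five ratings visible in the `dataset.head()` output, so its numbers describe that sample and not the full dataset; the variable names are illustrative.

```python
import pandas as pd

# Ratings copied from the five rows shown in dataset.head(); not the full dataset.
ratings = pd.Series([4.7, 4.6, 4.7, 4.7, 4.8], name="User Rating")

# A boolean mask sums to the number of True entries.
below_4 = int((ratings < 4).sum())
above_4_6 = int((ratings > 4.6).sum())

# Fraction of rows strictly above 4.6.
share_above_4_6 = above_4_6 / len(ratings)

print(below_4, above_4_6, share_above_4_6)
```

Applied to `dataset['User Rating']`, the same expressions would let the counts in the list above be checked against the full data.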

#### The most popular book on Amazon

In [25]:
dataset[dataset['Reviews'] == dataset['Reviews'].max()]

Unnamed: 0,Name,Author,User Rating,Reviews,Price,Year,Genre
534,Where the Crawdads Sing,Delia Owens,4.8,87841,15,2019,Fiction


#### Highest Rated Books

In [57]:
dataset[dataset['User Rating'] == 4.9][['Name', 'Author', 'Year']].reset_index(drop=True)

Unnamed: 0,Name,Author,Year
0,"Brown Bear, Brown Bear, What Do You See?",Bill Martin Jr.,2017
1,"Brown Bear, Brown Bear, What Do You See?",Bill Martin Jr.,2019
2,Dog Man and Cat Kid: From the Creator of Capta...,Dav Pilkey,2018
3,Dog Man: A Tale of Two Kitties: From the Creat...,Dav Pilkey,2017
4,Dog Man: Brawl of the Wild: From the Creator o...,Dav Pilkey,2018
5,Dog Man: Brawl of the Wild: From the Creator o...,Dav Pilkey,2019
6,Dog Man: Fetch-22: From the Creator of Captain...,Dav Pilkey,2019
7,Dog Man: For Whom the Ball Rolls: From the Cre...,Dav Pilkey,2019
8,Dog Man: Lord of the Fleas: From the Creator o...,Dav Pilkey,2018
9,"Goodnight, Goodnight Construction Site (Hardco...",Sherri Duskey Rinker,2012
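
The filter above hard-codes 4.9 as the top rating. A small sketch of the same selection driven by the column maximum is shown below; it assumes a `sample` frame built from rows that appear in this notebook's outputs, not the full dataset.

```python
import pandas as pd

# Rows copied from outputs shown in this notebook; not the full dataset.
sample = pd.DataFrame({
    "Name": [
        "Brown Bear, Brown Bear, What Do You See?",
        "Brown Bear, Brown Bear, What Do You See?",
        "Where the Crawdads Sing",
        "11/22/63: A Novel",
    ],
    "Author": ["Bill Martin Jr.", "Bill Martin Jr.", "Delia Owens", "Stephen King"],
    "User Rating": [4.9, 4.9, 4.8, 4.6],
    "Year": [2017, 2019, 2019, 2011],
})

# Look up the highest rating instead of typing it in.
top_rating = sample["User Rating"].max()

# Keep every row that reaches that rating.
top_rated = sample[sample["User Rating"] == top_rating][["Name", "Author", "Year"]].reset_index(drop=True)
print(top_rated)
```

This keeps the query correct if the data changes and the maximum is no longer 4.9.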


### Does the title of a book affect its User Rating?
- Exploring the relationship between the length of a book's title and its rating.

In [59]:
length_of_title = [len(name) for name in dataset['Name']]
dataset['Title Length'] = length_of_title

In [60]:
fig = go.Figure(data=[go.Histogram(x=length_of_title)])
fig.update_layout(title="The Length of Book Titles")
fig.update_xaxes(title_text="Length")
fig.update_yaxes(title_text="Count")
fig.show()

- Books with short titles appear more often among Amazon's bestsellers.

### User Rating and Length of Book Title

In [61]:
fig = px.scatter_matrix(dataset, dimensions=['User Rating', 'Title Length'], color="User Rating", title="Relation between Rating and Title Length")
fig.show()

- There is no apparent correlation between title length and User Rating.
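
A scatter plot gives a visual impression only; one way to put a number on the relationship between title length and rating is the Pearson correlation coefficient. The sketch below is an assumption about how that could be done, not part of the original analysis, and it runs on five untruncated rows taken from this notebook's outputs, so the resulting value says nothing about the full dataset.

```python
import pandas as pd

# Rows with untruncated titles copied from outputs shown in this notebook; not the full dataset.
sample = pd.DataFrame({
    "Name": [
        "10-Day Green Smoothie Cleanse",
        "11/22/63: A Novel",
        "12 Rules for Life: An Antidote to Chaos",
        "1984 (Signet Classics)",
        "Where the Crawdads Sing",
    ],
    "User Rating": [4.7, 4.6, 4.7, 4.7, 4.8],
})

# Vectorized title length, equivalent to the list comprehension used earlier.
sample["Title Length"] = sample["Name"].str.len()

# Series.corr uses Pearson correlation by default; values near 0 suggest no linear relationship.
r = sample["Title Length"].corr(sample["User Rating"])
print(round(r, 3))
```

On the full data, `dataset['Title Length'].corr(dataset['User Rating'])` would give the corresponding figure.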

### The author with the most bestselling books

In [78]:
# value_counts() counts rows, so a title listed in several years is counted once per year
authors = dataset['Author'].value_counts()
author_data = pd.DataFrame({'Author': authors.index, 'Books': authors.values})
author_data.head()

Unnamed: 0,Author,Books
0,Jeff Kinney,12
1,Suzanne Collins,11
2,Gary Chapman,11
3,Rick Riordan,11
4,American Psychological Association,10


- The author with the most entries on the bestseller list is *Jeff Kinney*, with 12.
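
Because the same title can appear in more than one year (for example, *Brown Bear, Brown Bear, What Do You See?* in 2017 and 2019), counting rows per author is not the same as counting distinct books. A minimal sketch of both counts follows, assuming a `sample` frame built from rows shown in this notebook's outputs; `appearances` and `unique_titles` are illustrative names.

```python
import pandas as pd

# Rows copied from outputs shown in this notebook; not the full dataset.
sample = pd.DataFrame({
    "Name": [
        "Brown Bear, Brown Bear, What Do You See?",
        "Brown Bear, Brown Bear, What Do You See?",
        "Where the Crawdads Sing",
        "11/22/63: A Novel",
    ],
    "Author": ["Bill Martin Jr.", "Bill Martin Jr.", "Delia Owens", "Stephen King"],
    "Year": [2017, 2019, 2019, 2011],
})

# List appearances: one count per row, so repeat years are included.
appearances = sample["Author"].value_counts()

# Distinct titles: drop repeated (Name, Author) pairs before counting.
unique_titles = sample.drop_duplicates(subset=["Name", "Author"])["Author"].value_counts()

print(appearances)
print(unique_titles)
```

Running the same two lines on `dataset` would show whether the ranking of authors changes once repeat appearances are collapsed.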