# Best-Selling Amazon Books Analysis

## Project Overview

This project analyzes Amazon's top 50 best-selling books for each year from 2009 to 2019 using Python and pandas. The analysis explores trends and patterns in the dataset to understand what makes books successful on the platform.

## Inspiration

This project is inspired by the comprehensive tutorial available at [Codédex](https://www.codedex.io/projects/analyze-spreadsheet-data-with-pandas-chatgpt). The tutorial provides an excellent foundation for data analysis using pandas and serves as a practical guide for exploring spreadsheet data.

## Objectives

- Analyze patterns in ratings, reviews, prices, and genres among Amazon's best-selling books
- Identify key factors that contribute to book success
- Visualize data insights using matplotlib and seaborn
- Practice data cleaning and manipulation techniques with pandas

## Load the dataset
- The dataset contains 550 rows: the top 50 bestsellers for each year from 2009 to 2019. A book that ranked in several years appears once per year. It has the following columns:
    - `Name`: Book title
    - `Author`: Book author
    - `User Rating`: Amazon user rating (0.0 - 5.0)
    - `Reviews`: Number of user reviews
    - `Price`: Book price (as of 2020)
    - `Year`: The year the book ranked in the top 50
    - `Genre`: `Fiction` or `Non Fiction`
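The column descriptions above translate directly into sanity checks. The sketch below is a hypothetical helper, `validate_bestsellers`, not part of the tutorial: it assumes the original column names (so it should run on `df` right after loading, before renaming) and assumes the genre labels are exactly the two values that appear in the `df.head()` output. It returns a list of problems instead of raising, so you can print everything that looks off at once.

```python
import pandas as pd

# Column names as listed above; genre labels as seen in df.head() (assumption)
EXPECTED_COLUMNS = ["Name", "Author", "User Rating", "Reviews", "Price", "Year", "Genre"]
EXPECTED_GENRES = {"Fiction", "Non Fiction"}


def validate_bestsellers(df: pd.DataFrame) -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
        return problems  # the remaining checks need these columns

    # Ratings must fall inside Amazon's 0.0-5.0 scale (bounds inclusive)
    bad_ratings = df[~df["User Rating"].between(0.0, 5.0)]
    if not bad_ratings.empty:
        problems.append(f"{len(bad_ratings)} rating(s) outside 0.0-5.0")

    # Review counts and prices should never be negative
    for col in ["Reviews", "Price"]:
        if (df[col] < 0).any():
            problems.append(f"negative values in {col}")

    # Genre should be one of the two known labels
    unknown = set(df["Genre"].unique()) - EXPECTED_GENRES
    if unknown:
        problems.append(f"unexpected genre labels: {sorted(unknown)}")
    return problems


# Two rows copied from the df.head() output below
sample = pd.DataFrame({
    "Name": ["10-Day Green Smoothie Cleanse", "11/22/63: A Novel"],
    "Author": ["JJ Smith", "Stephen King"],
    "User Rating": [4.7, 4.6],
    "Reviews": [17350, 2052],
    "Price": [8, 22],
    "Year": [2016, 2011],
    "Genre": ["Non Fiction", "Fiction"],
})
print(validate_bestsellers(sample))  # → []
```

In the notebook, `print(validate_bestsellers(df))` after loading would report any row that breaks these assumptions.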

```python
# Import libraries and download the dataset
from pathlib import Path
import kagglehub
import pandas as pd

# dataset_download() returns the local directory the files were saved to
path = kagglehub.dataset_download("sootersaalu/amazon-top-50-bestselling-books-2009-2019")
# Read the CSV from that directory (assumes it holds a single CSV file)
csv_file = next(Path(path).glob("*.csv"))
df = pd.read_csv(csv_file)

print(df.head())
```

```
                                                Name  \
0                      10-Day Green Smoothie Cleanse   
1                                  11/22/63: A Novel   
2            12 Rules for Life: An Antidote to Chaos   
3                             1984 (Signet Classics)   
4  5,000 Awesome Facts (About Everything!) (Natio...   

                     Author  User Rating  Reviews  Price  Year        Genre  
0                  JJ Smith          4.7    17350      8  2016  Non Fiction  
1              Stephen King          4.6     2052     22  2011      Fiction  
2        Jordan B. Peterson          4.7    18979     15  2018  Non Fiction  
3             George Orwell          4.7    21424      6  2017      Fiction  
4  National Geographic Kids          4.8     7665     12  2019  Non Fiction  
```
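The dataset's Kaggle name suggests it holds the top 50 books for each year from 2009 to 2019, so counting rows per year is a quick way to confirm the file loaded completely. The two functions below are a hypothetical sketch, not tutorial code; `expected=50` is an assumption taken from the dataset name.

```python
import pandas as pd


def rows_per_year(df: pd.DataFrame, year_col: str = "Year") -> pd.Series:
    """Count how many entries the dataset holds for each ranking year."""
    return df[year_col].value_counts().sort_index()


def years_with_unexpected_counts(
    df: pd.DataFrame, expected: int = 50, year_col: str = "Year"
) -> list[int]:
    """List the years whose entry count differs from `expected`."""
    counts = rows_per_year(df, year_col)
    return [int(year) for year, n in counts.items() if n != expected]
```

In the notebook, `print(rows_per_year(df))` shows the per-year counts, and if the file matches its name, `years_with_unexpected_counts(df)` should return an empty list.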


## Clean the data

```python
# Remove rows that are identical across every column
df.drop_duplicates(inplace=True)
# Rename columns for readability; note that "Year" holds the year the book
# ranked in the top 50, not its publication year
df.rename(columns={"Name": "Title", "Year": "Publication Year", "User Rating": "Rating"}, inplace=True)
# Store prices as floats rather than integers
df["Price"] = df["Price"].astype(float)

print(df.head())
```

```
                                               Title  \
0                      10-Day Green Smoothie Cleanse   
1                                  11/22/63: A Novel   
2            12 Rules for Life: An Antidote to Chaos   
3                             1984 (Signet Classics)   
4  5,000 Awesome Facts (About Everything!) (Natio...   

                     Author  Rating  Reviews  Price  Publication Year  \
0                  JJ Smith     4.7    17350    8.0              2016   
1              Stephen King     4.6     2052   22.0              2011   
2        Jordan B. Peterson     4.7    18979   15.0              2018   
3             George Orwell     4.7    21424    6.0              2017   
4  National Geographic Kids     4.8     7665   12.0              2019   

         Genre  
0  Non Fiction  
1      Fiction  
2  Non Fiction  
3      Fiction  
4  Non Fiction  
```
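`drop_duplicates()` without a `subset` only removes rows that match in every column. Because each row carries its own ranking year, a book that charted in several years survives as several rows. The hypothetical `summarize_books` helper below sketches one way to collapse the data to one row per book, assuming the renamed `Title`, `Author`, and `Publication Year` columns from the cell above. The sample rows are illustrative placeholders, not entries from the dataset.

```python
import pandas as pd


def summarize_books(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse one-row-per-year entries into one row per (Title, Author).

    "Publication Year" is the renamed ranking-year column.
    """
    return (
        df.groupby(["Title", "Author"])
        .agg(
            years_ranked=("Publication Year", "nunique"),
            first_year=("Publication Year", "min"),
            last_year=("Publication Year", "max"),
        )
        .reset_index()
        # Books that stayed on the list longest come first
        .sort_values("years_ranked", ascending=False, ignore_index=True)
    )


# Hypothetical rows: "Book A" ranked in two different years
sample = pd.DataFrame({
    "Title": ["Book A", "Book A", "Book B"],
    "Author": ["Author X", "Author X", "Author Y"],
    "Publication Year": [2015, 2016, 2015],
})
print(summarize_books(sample))
```

Grouping on both `Title` and `Author` guards against two different books that happen to share a title. If you only need the distinct books themselves, `df.drop_duplicates(subset=["Title", "Author"])` keeps the first row for each one.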
