# **Import Libraries**

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

### **Setup Libraries**

In [None]:
warnings.simplefilter('ignore')

%matplotlib inline
%reload_ext autoreload
%autoreload 2

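plt.style.use('seaborn-v0_8') # Matplotlib's seaborn style (called 'seaborn' before Matplotlib 3.6); applied first so the seaborn settings below are not overridden

sns.set_style('darkgrid')
sns.set_context('paper', font_scale=1.5)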

pd.set_option('display.width', 100)
pd.set_option('display.max_rows', 25)
pd.set_option('display.max_columns', 25)

# **Load Data**

In [None]:
amazon_data = pd.read_csv('../input/amazon-top-50-bestselling-books-2009-2019/bestsellers with categories.csv') # Load dataset

# **Data Preprocessing**

In [None]:
amazon_data # Let's check overall data

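> Wow, do you see that? The same book appears in more than one row (a title that stays on the bestseller list for several years is listed once per year). We'll remove these duplicates later.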

In [None]:
amazon_data.info() # Getting the information about the data

> I see, there are 550 rows and 7 columns in total: 3 of type object, 1 float, and 3 int.
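If you only want the per-dtype summary that `info()` prints on its `dtypes:` line, you can count the dtypes directly. This is a minimal sketch on a hypothetical toy frame that only mimics the column names of the real dataset (on newer pandas versions, text columns may report a string dtype instead of `object`).

```python
import pandas as pd

# Hypothetical toy stand-in for the Amazon bestsellers data (same column names, made-up values)
toy = pd.DataFrame({
    'Name': ['Book A', 'Book B'],
    'Author': ['Author X', 'Author Y'],
    'User Rating': [4.7, 4.6],
    'Reviews': [1500, 2000],
    'Price': [8, 22],
    'Year': [2016, 2011],
    'Genre': ['Non Fiction', 'Fiction'],
})

# Count how many columns have each dtype
dtype_counts = toy.dtypes.astype(str).value_counts()
print(dtype_counts)
```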

In [None]:
amazon_data.isna().mean().to_frame() # Fraction of missing values in each column

> Okay, great! There are no missing values! Now let's drop the duplicates!
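A quick note on the missing-value check: `isna()` returns booleans, and the mean of booleans is the fraction of `True` values, so a column with no missing data shows `0.0`. Using `sum()` instead gives absolute counts. A minimal sketch on a hypothetical toy frame with one missing price:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one missing price
toy = pd.DataFrame({
    'Name': ['Book A', 'Book B', 'Book C', 'Book D'],
    'Price': [8, np.nan, 15, 6],
})

# Mean of the boolean mask = fraction of missing values per column
missing_share = toy.isna().mean()
# Sum of the same mask = number of missing values per column
missing_count = toy.isna().sum()
print(pd.DataFrame({'share': missing_share, 'count': missing_count}))
```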

In [None]:
amazon_data.drop_duplicates(
    subset=['Name'], # Treat rows with the same book title as duplicates
    keep='first', # Keep the first row for each title (the default)
    inplace=True # Modify amazon_data directly instead of returning a copy
)

amazon_data.shape # Check how many rows remain

> Great, now each book title appears only once!
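The duplicates removed here are repeated titles rather than identical rows: a book that charts in several years typically has a different `Year` in each row, so `duplicated()` on whole rows would not flag it. A minimal sketch on hypothetical toy data, also showing what the `keep` parameter changes:

```python
import pandas as pd

# Hypothetical toy data: the same title charting in two different years
toy = pd.DataFrame({
    'Name': ['Book A', 'Book A', 'Book B'],
    'Author': ['Author X', 'Author X', 'Author Y'],
    'Year': [2012, 2013, 2012],
})

# Whole-row duplicates: none, because Year differs between the two 'Book A' rows
full_row_dupes = toy.duplicated().sum()

# Duplicates judged on Name only: the second 'Book A' row is flagged
name_dupes = toy.duplicated(subset=['Name']).sum()

# keep='first' keeps the first row per title, keep='last' keeps the last one
first_kept = toy.drop_duplicates(subset=['Name'], keep='first')
last_kept = toy.drop_duplicates(subset=['Name'], keep='last')
print(full_row_dupes, name_dupes)
print(first_kept, last_kept, sep='\n\n')
```

Which row survives matters for any later per-year count, since each title ends up attached to a single `Year`.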

# **Let's Explore the Data**

In [None]:
amazon_data['Genre'].value_counts().to_frame() # Let's check the Genre

> I see, there are slightly more Non Fiction books than Fiction books. If you ask me what my favorite genre is, I'll choose Fiction. Why? I like imagination, because imagining is fun and reality is painful. How about you? Which genre do you like, Non Fiction or Fiction? You can write your favorite genre in the comments section.
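To put a number on "slightly more", `value_counts(normalize=True)` reports each genre's share of the rows instead of raw counts. A minimal sketch on a hypothetical toy `Genre` column:

```python
import pandas as pd

# Hypothetical toy Genre column
genre = pd.Series(['Non Fiction', 'Fiction', 'Non Fiction', 'Non Fiction', 'Fiction'])

# normalize=True returns proportions that sum to 1 instead of counts
genre_share = genre.value_counts(normalize=True)
print(genre_share)
```

On the real data this would be `amazon_data['Genre'].value_counts(normalize=True)`.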

In [None]:
amazon_data[['Name', 'Author',
             'User Rating', 'Reviews']].value_counts() \
                                       .to_frame() # Let's check the name of books, the authors, and ratings.

> Do you guys know the books and the authors above? If so, you can write your opinion about the books in the comments section.

In [None]:
amazon_data['Price'].value_counts().to_frame() # Let's check the price of the books

> Wow, can you guys see it? I think the Price column contains outliers (a few books priced far above the rest), but that's okay, since we just want to analyze this data XD.
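Rather than eyeballing `value_counts()`, one common rule of thumb (Tukey's fences) flags values that lie more than 1.5 × IQR below the first quartile or above the third quartile. This is a minimal sketch on hypothetical toy prices; on the real data you would use `amazon_data['Price']`, and the 1.5 multiplier is a convention, not a property of this dataset.

```python
import pandas as pd

# Hypothetical toy prices
price = pd.Series([6, 8, 9, 10, 11, 12, 13, 105])

# Quartiles and interquartile range
q1, q3 = price.quantile([0.25, 0.75])
iqr = q3 - q1

# Tukey's fences: anything outside [q1 - 1.5*IQR, q3 + 1.5*IQR] is flagged
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = price[(price < lower) | (price > upper)]
print(f'bounds: [{lower}, {upper}]')
print(outliers)
```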

In [None]:
amazon_data['Year'].value_counts().to_frame() # Let's see the year columns

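> See, most of the remaining books are listed under 2009 and 2010. Note that `Year` is the year a book appeared on the bestseller list, not its publication year, and because we kept only the first row for each title, a book that charted in several years is counted in just one of them.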
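The effect of deduplication on the year counts can be seen on a small example. This sketch uses hypothetical toy data and assumes, as in the toy rows, that each title's rows are listed in year order, so `keep='first'` attaches every title to its earliest year.

```python
import pandas as pd

# Hypothetical toy data before deduplication
raw = pd.DataFrame({
    'Name': ['Book A', 'Book A', 'Book A', 'Book B', 'Book B', 'Book C'],
    'Year': [2009, 2010, 2011, 2009, 2010, 2011],
})

# How many distinct years each title appears in
years_per_title = raw.groupby('Name')['Year'].nunique()

# After keeping the first row per title, each book counts toward one year only
deduped = raw.drop_duplicates(subset=['Name'])
year_counts = deduped['Year'].value_counts().sort_index()
print(years_per_title, year_counts, sep='\n\n')
```

Here 2010 disappears from the counts entirely, even though two toy books charted that year.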

## **Top 5 Book Reviews**

In [None]:
top_5_books_reviewed = amazon_data[['Name', 'Author', 'Reviews', 'User Rating', 'Year']]
top_5_books_reviewed = top_5_books_reviewed.sort_values('Reviews', ascending=False)[:5]
top_5_books_reviewed

> The most reviewed book is `Where the Crawdads Sing` by `Delia Owens`, with `87841` reviews and a rating of `4.8`.
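As a side note, pandas has a shortcut for "sort descending and take the first n rows": `DataFrame.nlargest(n, column)`, which selects the same rows as `sort_values(column, ascending=False).head(n)` apart from how ties are resolved. A minimal sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    'Name': ['Book A', 'Book B', 'Book C', 'Book D'],
    'Reviews': [500, 9000, 2000, 4000],
})

# The two rows with the most reviews, largest first
top_2 = toy.nlargest(2, 'Reviews')
print(top_2)
```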

## **Top 5 Book Rating**

In [None]:
top_5_books_rating = amazon_data[['Name', 'Author', 'Reviews', 'User Rating', 'Year']]
top_5_books_rating = top_5_books_rating.sort_values('User Rating', ascending=False)[:5]
top_5_books_rating

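> The top entry is `Hamilton: The Revolution` by `Lin-Manuel Miranda`, with `5867` reviews and a rating of `4.9`. Keep in mind that several books can share the highest rating, so which of them make the top 5 depends on how ties are broken.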
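When many titles share the top rating, sorting by `User Rating` alone leaves their order to the sort algorithm (the default `quicksort` is not stable). One option, which is an assumption about what "top" should mean, is to break ties by number of reviews. A minimal sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data with a three-way tie on the top rating
toy = pd.DataFrame({
    'Name': ['Book A', 'Book B', 'Book C', 'Book D'],
    'User Rating': [4.9, 4.7, 4.9, 4.9],
    'Reviews': [1200, 9000, 5800, 300],
})

# Sort by rating first, then by reviews to break ties (both descending)
top_3 = toy.sort_values(['User Rating', 'Reviews'], ascending=[False, False]).head(3)
print(top_3)
```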

## **Top 5 Book Price**

In [None]:
top_5_books_price = amazon_data[['Name', 'Author', 'Reviews', 'User Rating', 'Price', 'Year']]
top_5_books_price = top_5_books_price.sort_values('Price', ascending=False)[:5]
top_5_books_price

> The most expensive book is `Diagnostic and Statistical Manual of Mental` by `American Psychiatric Association`, priced at `$105`, with `6679` reviews and a rating of `4.5`.

Okay, that's enough exploring. Now it's time to visualize the data. Are you guys ready? Say YASHHH !!!! XD

# **Data Visualization**

### **1. Countplot Visualization Genre**

In [None]:
sns.countplot( # Let's make countplot
    x='Genre', # X axis
    data=amazon_data, # Data
    palette='summer', # Palette
    saturation=0.75 # Saturation
)

plt.title('Genre') # Set the title
plt.show() # Show the plot

> Okay, the plot confirms what we saw above: there are more Non Fiction books than Fiction books.

### **2. Top 5 Reviews Book Visualization**

In [None]:
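plt.figure(figsize=(20, 6)) # Set the figure size

sns.barplot( # One bar per book
    x='Name', # Book title on the x axis
    y='Reviews', # Number of reviews on the y axis
    hue='Author', # Color the bars by author
    data=top_5_books_reviewed, # The five most-reviewed books
    palette='summer', # Color palette
    dodge=False, # Each title has one author, so draw full-width bars instead of dodging by hue
    errorbar=None # One row per bar, so there is no spread to show (seaborn >= 0.12)
)

plt.xticks(rotation=15) # Tilt the long titles so they don't overlap
plt.title('Top 5 Most-Reviewed Books') # Title
plt.show() # Show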

### **3. Top 5 Rating Book Visualization**

In [None]:
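plt.figure(figsize=(20, 6)) # Set the figure size

sns.barplot( # One bar per book, grouped by author
    x='Author', # Author on the x axis
    y='User Rating', # Rating on the y axis, since this section ranks books by rating
    hue='Name', # One bar per title, so an author with two top books gets two bars
    data=top_5_books_rating, # The five highest-rated books
    palette='summer', # Color palette
    errorbar=None # One row per bar, so there is no spread to show (seaborn >= 0.12)
)

plt.title('Top 5 Highest-Rated Books') # Title
plt.show() # Show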

### **4. Top 5 Price Book Visualization**

In [None]:
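plt.figure(figsize=(20, 6)) # Set the figure size

sns.barplot( # One bar per book, grouped by author
    x='Author', # Author on the x axis
    y='Price', # Price on the y axis
    hue='Name', # One bar per title
    data=top_5_books_price, # The five most expensive books
    palette='summer', # Color palette
    errorbar=None # One row per bar, so there is no spread to show (seaborn >= 0.12)
)

plt.title('Top 5 Most Expensive Books') # Title
plt.show() # Show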

### **5. Heatmap Visualization Rating in Year**

In [None]:
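plt.figure(figsize=(20, 6)) # Set the figure size

amazon = amazon_data.pivot_table( # Mean price for each (rating, year) combination
    index='User Rating', # One row per rating value
    columns='Year', # One column per year
    values='Price', # Values to aggregate
    aggfunc='mean' # Average price per cell (the pivot_table default)
)

sns.heatmap( # Let's make a heatmap
    amazon, # Pivoted data
    cmap='summer', # Color map
    linecolor='white', # Grid line color
    linewidths=1 # Grid line width (seaborn's parameter is `linewidths`)
)

plt.title('Mean Book Price by User Rating and Year') # Set the title
plt.show() # Show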
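Each heatmap cell is an average, and `pivot_table` leaves `NaN` (drawn as a blank cell) where no book has that rating in that year. Counting rows per cell shows how many books each average rests on, which helps when a single expensive book makes one cell stand out. A minimal sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    'User Rating': [4.7, 4.7, 4.8, 4.8],
    'Year': [2010, 2010, 2010, 2011],
    'Price': [10, 20, 8, 12],
})

# Mean price per (rating, year) cell
mean_price = toy.pivot_table(index='User Rating', columns='Year', values='Price', aggfunc='mean')

# Number of books behind each cell
n_books = toy.pivot_table(index='User Rating', columns='Year', values='Price', aggfunc='count')
print(mean_price, n_books, sep='\n\n')
```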

### **6. Scatter Plot Visualization Rating in Year**

In [None]:
years = list(range(2009, 2020)) # List of years 2009-2019 for the x-axis ticks

plt.figure(figsize=(20, 6)) # Set the figure size

sns.scatterplot( # Make a scatterplot
    x=amazon_data['Year'], # X axis
    y=amazon_data['User Rating'], # Y axis
    hue=amazon_data['Genre'], # Hue
    palette='summer' # Palette
)

plt.xticks(ticks=years) # Ticks
plt.title('User Ratings in 2009-2019') # Set the title
plt.show() # Show the plot

### **7. Scatter Plot Visualization Price in Year**

In [None]:
plt.figure(figsize=(20, 6)) # Set the figure size

sns.scatterplot( # Make a scatterplot
    x=amazon_data['Year'], # X axis
    y=amazon_data['Price'], # Y axis
    hue=amazon_data['Genre'], # Hue
    palette='summer' # Palette
)

plt.title('Book Price in 2009-2019') # Set the title
plt.show() # Show the plot

### **8. Scatter Plot Visualization Reviews in Year**

In [None]:
plt.figure(figsize=(20, 6)) # Set the figure size

sns.scatterplot( # Make a scatterplot
    x=amazon_data['Year'], # X axis
    y=amazon_data['Reviews'], # Y axis
    hue=amazon_data['Genre'], # Hue
    palette='summer' # Palette
)

plt.title('Book Reviews in 2009-2019') # Set the title
plt.show() # Show the plot

> Nice, the visualizations look good, right?

# **Time Series Lag Plot**

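A lag plot helps check whether a data series is random. It plots each value against the next one (lag 1 by default): random data scatter evenly with no pattern, whereas a visible shape or trend indicates that the data are not random. Keep in mind that the plots below use the rows in their current order, and `amazon_data` is not sorted by `Year` (the CSV appears to be ordered alphabetically by book name), so neighboring points are neighbors in that order rather than in time.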
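To see what `pd.plotting.lag_plot` draws, the sketch below builds the same point pairs by hand, y(t) on the x axis against y(t + 1) on the y axis, and summarizes them with the lag-1 autocorrelation from `Series.autocorr`. The toy series is hypothetical and deliberately trending, so it is clearly not random.

```python
import pandas as pd

# Hypothetical toy series with an upward trend
trend = pd.Series([1, 2, 3, 4, 5, 6, 7, 8])

# The (y(t), y(t + 1)) pairs that lag_plot(trend, lag=1) would scatter
pairs = pd.DataFrame({
    'y(t)': trend.iloc[:-1].to_numpy(),
    'y(t + 1)': trend.iloc[1:].to_numpy(),
})

# Lag-1 autocorrelation: near 0 for random data, near +1 for a strong trend like this one
lag1_corr = trend.autocorr(lag=1)
print(pairs)
print(lag1_corr)
```

On the real data, sorting first (for example `amazon_data.sort_values('Year')['Price']`) would make the lag follow time instead of row order.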

## **1. Price**

In [None]:
pd.plotting.lag_plot(amazon_data['Price']) # Lag plot
plt.title('Price Lag Plot') # Title
plt.show() # Show

## **2. Reviews**

In [None]:
pd.plotting.lag_plot(amazon_data['Reviews']) # Reviews Lag Plot
plt.title('Reviews Lag Plot') # Title
plt.show() # Show

## **3. User Rating**

In [None]:
pd.plotting.lag_plot(amazon_data['User Rating']) # User Rating Lag Plot 
plt.title('User Rating Lag Plot') # Title
plt.show() # Show 

# **That's it! Thanks for reading! I hope you guys like it, and don't forget to give me feedback and upvote if you do!**

## **Here are some other notebooks I made:**

**Data Analysis and Visualization:**

- [Apple Stock Price Analysis](https://www.kaggle.com/knightbearr/apple-stock-price-analysis-knightbearr/edit/run/74687535)
- [World Covid Vaccination](https://www.kaggle.com/knightbearr/data-visualization-world-vaccination-knightbearr)
- [Netflix Time Series Visualization](https://www.kaggle.com/knightbearr/netflix-visualization-time-series-knightbearr)
- [Taiwan Weight Stock Analysis](https://www.kaggle.com/knightbearr/taiwan-weight-stock-index-analysis-knightbearr)

**Regression and Classification:**

- [Pizza Price Prediction](https://www.kaggle.com/knightbearr/pizza-price-prediction-xgb-knightbearr)
- [S&P 500 Companies](https://www.kaggle.com/knightbearr/pricesales-eda-rfr-knightbearr)
- [Credit Card Fraud Detection](https://www.kaggle.com/knightbearr/credit-card-fraud-detection-knightbearr)
- [Car Price V3](https://www.kaggle.com/knightbearr/car-price-v3-xgbregressor-knightbearr)
- [House Price Iran](https://www.kaggle.com/knightbearr/house-price-iran-knightbearr)
- [Loan Prediction](https://www.kaggle.com/knightbearr/loan-prediction-eda-knightbearr)

**Deep Learning:**

- [Rock Paper Scissors](https://www.kaggle.com/knightbearr/rock-paper-scissors-knightbearr)

**Some Python Code:**

- [Python Cheat Sheet](https://www.kaggle.com/knightbearr/python-cheat-sheet-knightbearr)
- [22 Python Programs](https://www.kaggle.com/knightbearr/22-simple-python-program-knightbearr)