The task at hand:
- Visualize the data and discover patterns.

## Loading the libraries

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px

## Data loading

In [None]:
df = pd.read_csv('/kaggle/input/amazons-top-50-bestselling-novels-20092020/AmazonBooks - Sheet1.csv')

In [None]:
print(df.info())
print(df.shape)
df.head()

- No missing values
- No columns that require deleting or cleaning
- Variable types seem to be what we want them to be
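
The observations above can be checked programmatically instead of by eye. The following is a minimal sketch, assuming a hypothetical helper `summarize_quality` and a handful of made-up rows that reuse the dataset's column names; in the notebook, `df` would be passed instead.

```python
import pandas as pd


def summarize_quality(frame: pd.DataFrame) -> pd.DataFrame:
    """Return dtype, missing-value count and distinct-value count per column."""
    return pd.DataFrame({
        'dtype': frame.dtypes.astype(str),
        'missing': frame.isna().sum(),
        'unique': frame.nunique(),
    })


# Made-up rows for illustration only (not taken from the dataset).
sample = pd.DataFrame({
    'Author': ['A', 'B', 'B'],
    'User Rating': [4.5, 4.7, None],
    'Price': [10, 12, 12],
    'Genre': ['Fiction', 'Non Fiction', 'Non Fiction'],
})

report = summarize_quality(sample)
duplicate_rows = int(sample.duplicated().sum())
print(report)
print('Duplicate rows:', duplicate_rows)
```

A non-zero `missing` count or an unexpected `dtype` would indicate that some cleaning is needed after all.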

## Data Exploration

In [None]:
# Check for correlations
pd.get_dummies(df[['Year','User Rating', 'Price', 'Genre']]).corr()

![](https://miro.medium.com/max/932/1*Qz_gwy4ZaSZuOpl3IyO2HA.png)<br/>
No strong linear correlations are visible between any of the variables.<br/>
This suggests that the User Rating does not depend linearly on the genre, price or year. A correlation matrix alone cannot rule out non-linear effects.
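
A near-zero correlation coefficient only rules out a linear relationship. The sketch below uses made-up numbers (not taken from the dataset) in which the rating follows a U-shape over price: the Pearson coefficient is essentially zero, yet the mean rating per price band clearly differs. The band edges and labels are arbitrary assumptions.

```python
import pandas as pd

# Made-up values: ratings are high for cheap and expensive books, lower in between.
toy = pd.DataFrame({
    'Price': [1, 2, 3, 4, 5],
    'User Rating': [4.9, 4.6, 4.5, 4.6, 4.9],
})

# Pearson correlation (the default method of Series.corr).
pearson = toy['Price'].corr(toy['User Rating'])

# Mean rating per price band; the bin edges are chosen arbitrarily.
bands = pd.cut(toy['Price'], bins=[0, 2, 4, 6], labels=['low', 'mid', 'high'])
band_means = toy.groupby(bands, observed=True)['User Rating'].mean()

print('Pearson correlation:', round(pearson, 3))
print(band_means)
```

Applying the same banding to `df['Price']` is one way to look for effects that the correlation matrix cannot show.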

In [None]:
# Checking the genre distribution
sns.set_theme(style='darkgrid')
plt.figure(figsize=(15, 7))

g = sns.countplot(
    data=df,
    x='Genre'
)
plt.title(
    'Genre distribution',
    fontdict={
        'fontsize': 16
    }
)

In [None]:
plt.figure(figsize=(15, 7))
sns.lineplot(
    data=df,
    x='Year',
    y='User Rating',
    hue='Genre',
    errorbar=('ci', 95)  # 95% confidence band (seaborn >= 0.12; use ci=95 on older versions)
)
plt.title(
    'Genre ratings over the years',
    fontdict={
        'fontsize': 16
    }
)

In [None]:
plt.figure(figsize=(15, 7))
sns.kdeplot(
    data=df,
    x='User Rating',
    hue='Genre',
    fill=True,
    common_norm=False
)

plt.title(
    'Genre rating distribution',
    fontdict={
        'fontsize': 16
    }
)

Overall, fiction books seem to be rated higher than non-fiction. Non-fiction ratings, however, appear to be rising in 2020 and may come to surpass those of fiction.
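
The comparison between genres can also be put into numbers. This is a minimal sketch, assuming a hypothetical helper `mean_rating_by_year` and made-up rows; the genre labels `'Fiction'` and `'Non Fiction'` are assumptions and should be replaced with the values actually present in `df['Genre']`.

```python
import pandas as pd


def mean_rating_by_year(frame: pd.DataFrame) -> pd.DataFrame:
    """Mean user rating per year, with one column per genre."""
    return frame.pivot_table(
        index='Year', columns='Genre', values='User Rating', aggfunc='mean'
    )


# Made-up rows for illustration only.
toy = pd.DataFrame({
    'Year': [2019, 2019, 2020, 2020],
    'Genre': ['Fiction', 'Non Fiction', 'Fiction', 'Non Fiction'],
    'User Rating': [4.8, 4.5, 4.7, 4.6],
})

table = mean_rating_by_year(toy)
# Positive values mean fiction is rated higher in that year.
gap = table['Fiction'] - table['Non Fiction']
print(table)
print(gap)
```

A gap that shrinks towards zero over the years would support the impression that non-fiction is catching up.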

In [None]:
df.sort_values(by=['User Rating'], ascending=False)

Interestingly, J.K. Rowling appears among both the highest-rated and the lowest-rated books.
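
Authors whose books land at both ends of the rating scale can be found without scanning the sorted table. The sketch below assumes a hypothetical helper `rating_range_by_author` and uses made-up authors and ratings.

```python
import pandas as pd


def rating_range_by_author(frame: pd.DataFrame) -> pd.DataFrame:
    """Lowest and highest rating per author, sorted by the gap between them."""
    stats = frame.groupby('Author')['User Rating'].agg(['min', 'max', 'count'])
    stats['spread'] = stats['max'] - stats['min']
    return stats.sort_values('spread', ascending=False)


# Made-up rows for illustration only.
toy = pd.DataFrame({
    'Author': ['X', 'X', 'Y', 'Y'],
    'User Rating': [3.3, 4.9, 4.5, 4.6],
})

ranges = rating_range_by_author(toy)
print(ranges)
```

Calling `rating_range_by_author(df).head()` would list the authors with the widest spread first.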

In [None]:
plt.figure(figsize=(15, 7))
sns.histplot(
    data=df,
    x='Price'
)

plt.title(
    'Price distribution',
    fontdict={
        'fontsize': 16
    }
)

In [None]:
plt.figure(figsize=(15, 7))
sns.kdeplot(
    data=df,
    x='Price',
    hue='Genre'
)

plt.title(
    'Price distribution by genre',
    fontdict={
        'fontsize': 16
    }
)

Since several books have a price well above average, it is worth analysing why.

In [None]:
df[df['Price'] > 75]

The highest-priced books seem to consist of book collections or medical-related books.
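
The cut-off of 75 used above is a hand-picked value. As an alternative, a threshold can be derived from the data with the common interquartile-range rule. This is a sketch under that assumption, with a hypothetical helper `price_outliers` and made-up prices; it is not the method used in this notebook.

```python
import pandas as pd


def price_outliers(frame: pd.DataFrame, k: float = 1.5):
    """Return rows priced above Q3 + k * IQR, together with that threshold."""
    q1, q3 = frame['Price'].quantile([0.25, 0.75])
    upper = q3 + k * (q3 - q1)
    return frame[frame['Price'] > upper], upper


# Made-up prices for illustration only.
toy = pd.DataFrame({'Price': [5, 8, 10, 12, 15, 105]})

outliers, threshold = price_outliers(toy)
print('Threshold:', threshold)
print(outliers)
```

With `df` as input, the returned threshold may well differ from 75, so the set of flagged books can change.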

In [None]:
plt.figure(figsize=(15, 7))
sns.lineplot(
    data=df,
    x='Year',
    y='Price',
)

plt.ylim(0)
plt.title(
    'Price changes over the years',
    fontdict={
        'fontsize': 16
    }
)

In [None]:
plt.figure(figsize=(15, 7))
sns.lineplot(
    data=df,
    x='Year',
    y='Price',
    hue='Genre'
)
plt.ylim(0)
plt.title(
    'Price distribution over the years by genre',
    fontdict={
        'fontsize': 16
    }
)

Overall, the best-selling books have become cheaper over time.
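
The downward trend can be summarised in a single number by fitting a straight line to the mean price per year. This is a minimal sketch, assuming a hypothetical helper `yearly_price_trend` and made-up prices; a negative slope means prices fall on average from one year to the next.

```python
import numpy as np
import pandas as pd


def yearly_price_trend(frame: pd.DataFrame):
    """Mean price per year and the slope of a least-squares line through it."""
    yearly = frame.groupby('Year')['Price'].mean()
    years = yearly.index.to_numpy(dtype=float)
    # Centering the years keeps the fit numerically well conditioned;
    # it does not change the slope.
    slope, _ = np.polyfit(years - years.mean(), yearly.to_numpy(dtype=float), 1)
    return yearly, slope


# Made-up rows for illustration only.
toy = pd.DataFrame({
    'Year': [2009, 2009, 2010, 2010, 2011, 2011],
    'Price': [16, 14, 13, 11, 10, 8],
})

yearly, slope = yearly_price_trend(toy)
print(yearly)
print('Average price change per year:', round(slope, 2))
```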

In [None]:
plt.figure(figsize=(15, 7))
sns.histplot(
    data=df,
    x='Reviews',
    kde=True
)

plt.title(
    'Review distribution',
    fontdict={
        'fontsize': 16
    }
)
plt.xlim(0)

In [None]:
df[df['Reviews'] >= 80000]

Interestingly, 3 of the most-reviewed books were written by the president of the United States of America or by members of the president's family.<br/>
**Noteworthy is that Delia Owens's "*Where the Crawdads Sing*" made it onto the best-selling novels list in two consecutive years.**

In [None]:
plt.figure(figsize=(15, 7))
sns.lineplot(
    data=df,
    x='Year',
    y='Reviews'
)
plt.title('Number of reviews per year')
plt.xlim(df['Year'].min())

In [None]:
plt.figure(figsize=(15, 7))
sns.lineplot(
    data=df,
    x='Year',
    y='Reviews',
    hue='Genre'
)
plt.title('Number of reviews per year by genre')
plt.xlim(df['Year'].min())

The number of reviews has increased significantly since 2010, most likely because of the growth in Amazon users.
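
A few heavily reviewed titles can pull the yearly average up, so it may help to compare the mean with the median before drawing conclusions. The sketch below assumes a hypothetical helper `reviews_per_year` and uses made-up review counts.

```python
import pandas as pd


def reviews_per_year(frame: pd.DataFrame) -> pd.DataFrame:
    """Mean and median number of reviews per year."""
    return frame.groupby('Year')['Reviews'].agg(['mean', 'median'])


# Made-up rows for illustration only; 2010 contains one extreme value.
toy = pd.DataFrame({
    'Year': [2010, 2010, 2010, 2011, 2011, 2011],
    'Reviews': [100, 200, 3000, 500, 600, 700],
})

summary = reviews_per_year(toy)
print(summary)
```

If the median rises over the years as well, the increase is not driven by a handful of outliers alone.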

In [None]:
# Count list appearances per author and genre (top 10)
df.groupby(['Author', 'Genre'])['Author'].count().nlargest(10)

The author with the most appearances on the list is Jeff Kinney, who writes fiction books.
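
The count above tallies rows, so an author with several books in the same year gets several counts for that year. To separate the two effects, the number of appearances can be shown next to the number of distinct years on the list. This is a sketch with a hypothetical helper `author_presence` and made-up rows.

```python
import pandas as pd


def author_presence(frame: pd.DataFrame) -> pd.DataFrame:
    """Rows per author and number of distinct years in which the author appears."""
    presence = frame.groupby('Author')['Year'].agg(
        appearances='count', years_on_list='nunique'
    )
    return presence.sort_values('appearances', ascending=False)


# Made-up rows for illustration only: author A has two entries in 2011.
toy = pd.DataFrame({
    'Author': ['A', 'A', 'A', 'B'],
    'Year': [2010, 2011, 2011, 2010],
})

presence = author_presence(toy)
print(presence)
```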