### Load dataset

In [None]:
# import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
# Limit how many rows pandas displays for a DataFrame.
pd.set_option('display.max_rows', 10)

In [None]:
# load csv file.
file = pd.read_csv('../input/amazon-top-50-bestselling-books-2009-2019/bestsellers with categories.csv')
file

### Basic information

In [None]:
file.shape

In [None]:
file.info()

In [None]:
file.describe(include='all')

The dataset contains 351 unique book names and 248 unique authors. User ratings range from 3.3 to 4.9, with a mean of 4.6. The mean number of reviews is 11,953. Prices range from 0 to 105. There are 2 genres (Fiction and Non Fiction).
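The figures above are read off the `describe()` table. As a minimal sketch, the same numbers can be computed directly, which avoids copying them by hand. The `toy` DataFrame here is made-up data that only mirrors the column names of the CSV; it is not the real dataset.

```python
import pandas as pd

# Made-up toy rows with the same columns as the bestsellers CSV (not real data).
toy = pd.DataFrame({
    'Name': ['Book A', 'Book A', 'Book B', 'Book C'],
    'Author': ['Author X', 'Author X', 'Author Y', 'Author X'],
    'User Rating': [4.5, 4.5, 4.8, 3.9],
    'Reviews': [1000, 1000, 250, 4000],
    'Price': [10, 12, 0, 25],
    'Year': [2009, 2010, 2009, 2010],
    'Genre': ['Fiction', 'Fiction', 'Non Fiction', 'Non Fiction'],
})

# Compute each summary figure explicitly instead of reading it from describe().
summary = {
    'unique_names': toy['Name'].nunique(),
    'unique_authors': toy['Author'].nunique(),
    'rating_min': toy['User Rating'].min(),
    'rating_max': toy['User Rating'].max(),
    'price_min': toy['Price'].min(),
    'price_max': toy['Price'].max(),
    'mean_reviews': toy['Reviews'].mean(),
}
print(summary)
```

Replacing `toy` with `file` should give the values quoted in the text.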

### Visualization - Changes over the years

In [None]:
# Non Fiction vs Fiction over the years
sns.countplot(data=file, x='Year', hue='Genre')
plt.show()

Among the bestselling books from 2009 to 2019, Non Fiction books usually outnumber Fiction books; 2014 is the exception.
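The count plot can be backed by a table of counts. The sketch below uses `pd.crosstab` on made-up toy data (the rows are assumptions for illustration only) to list the years in which Fiction outnumbers Non Fiction.

```python
import pandas as pd

# Made-up toy rows; only the 'Year' and 'Genre' column names match the real CSV.
toy = pd.DataFrame({
    'Year': [2009, 2009, 2009, 2010, 2010, 2010],
    'Genre': ['Fiction', 'Non Fiction', 'Non Fiction',
              'Fiction', 'Fiction', 'Non Fiction'],
})

# One row per year, one column per genre, cell = number of books.
genre_by_year = pd.crosstab(toy['Year'], toy['Genre'])

# Years in which Fiction has strictly more books than Non Fiction.
fiction_leads = genre_by_year['Fiction'] > genre_by_year['Non Fiction']
fiction_lead_years = genre_by_year[fiction_leads].index.tolist()

print(genre_by_year)
print(fiction_lead_years)
```

Running the same two steps on `file` would show whether 2014 is the only such year.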

In [None]:
# Box plot of price for each year
sns.catplot(kind='box', data=file, x='Year', y='Price')
plt.show()

Most bestselling books are priced between 5 and 20, and this range did not change much over the years.
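To check a statement like this numerically, the quartiles that a box plot draws can be computed per year. This is a sketch on made-up toy prices, not the real data.

```python
import pandas as pd

# Made-up toy rows; only the 'Year' and 'Price' column names match the real CSV.
toy = pd.DataFrame({
    'Year': [2009, 2009, 2009, 2010, 2010, 2010],
    'Price': [5, 10, 15, 8, 12, 20],
})

# Lower quartile, median and upper quartile of price for each year.
# unstack() turns the quantile level into columns labelled 0.25, 0.5, 0.75.
price_quartiles = (
    toy.groupby('Year')['Price']
       .quantile([0.25, 0.5, 0.75])
       .unstack()
)
print(price_quartiles)
```

If the 0.25 and 0.75 columns stay roughly constant from year to year on the real data, that supports the reading of the box plot.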

Count how many times each book was an "Amazon bestseller".

In [None]:
winning_time = file['Name'].value_counts()

# plot the result as histogram
winning_time.hist()
plt.grid(False)
plt.show()

Among the 351 books, about 250 made the "Amazon bestseller" list once and about 50 made it twice.
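The "about 250" and "about 50" figures are estimated from the histogram bars. As a sketch, applying `value_counts()` a second time gives exact numbers. The toy names below are made up for illustration.

```python
import pandas as pd

# Made-up toy book names; each row stands for one yearly bestseller entry.
toy = pd.DataFrame({
    'Name': ['Book A', 'Book A', 'Book B', 'Book C', 'Book C', 'Book C', 'Book D'],
})

# First pass: number of appearances per book.
appearances = toy['Name'].value_counts()

# Second pass: how many books share each appearance count.
distribution = appearances.value_counts().sort_index()
print(distribution)
```

Here two books appear once, one appears twice, and one appears three times.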

Show the books that appeared more than once as an "Amazon bestseller".

In [None]:
# Create a DataFrame with each book's name and its number of wins.
winning_times = pd.DataFrame(winning_time)
winning_times.reset_index(inplace=True)
winning_times.columns = ['Name', 'Winning_times']
winning_times

Visualize the result for books that won more than once.

In [None]:
winning_times[winning_times['Winning_times']>1].plot(kind='bar', x='Name', y='Winning_times', figsize=(20,10))

Get each book's details by joining the two DataFrames.

In [None]:
# Drop 'Year' column from file dataframe, then drop duplicated books.
file_ny = file.drop(columns='Year')
file_ny.drop_duplicates(subset='Name', inplace=True)
file_ny

In [None]:
file_ny.shape

In [None]:
# Left-join the winning_times counts with the de-duplicated book details (file_ny).
file_times = winning_times.join(file_ny.set_index('Name'), on='Name', how='left')
file_times

Each "Amazon bestseller" book now has its details alongside the number of times it won.
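The same join can be sketched with `DataFrame.merge`, whose `validate` argument raises an error if a book name is duplicated on either side. The toy rows are made up. Note that `drop_duplicates` keeps the first row for each name, so if a book's price differs between years only the first year's price survives.

```python
import pandas as pd

# Made-up toy rows with a subset of the real CSV's columns.
toy = pd.DataFrame({
    'Name': ['Book A', 'Book A', 'Book B'],
    'Author': ['Author X', 'Author X', 'Author Y'],
    'Price': [10, 12, 8],
    'Year': [2009, 2010, 2009],
})

# Appearance counts as a two-column DataFrame: Name, Winning_times.
counts = (
    toy['Name'].value_counts()
       .rename_axis('Name')
       .reset_index(name='Winning_times')
)

# One row of details per book (the first occurrence is kept).
details = toy.drop(columns='Year').drop_duplicates(subset='Name')

# validate='one_to_one' fails loudly if 'Name' is not unique on both sides.
merged = counts.merge(details, on='Name', how='left', validate='one_to_one')
print(merged)
```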

In [None]:
# Count how many books won this prize more than once.
multiwinner = len(file_times[file_times['Winning_times']>1])
print("{} books won 'Amazon bestseller books' prize more than once.".format(multiwinner) )

In [None]:
# Genre distribution across unique books
file_times['Genre'].hist()
plt.grid(False)

In [None]:
# Price distribution across unique books
file_times['Price'].hist()
plt.grid(False)

In [None]:
# User rating distribution across unique books
file_times['User Rating'].hist()
plt.grid(False)

Find authors who have more than one book that won the "Amazon bestseller book" prize.

In [None]:
author_counts = file_times['Author'].value_counts()
print("{} authors wrote 'Amazon bestseller books'.\n".format(len(author_counts)))
print(author_counts)

In [None]:
author_counts.plot(kind='bar', figsize=(40,10))

In [None]:
print("Among {} authors, {} have more than one book that won 'Amazon bestseller books'.".format(len(author_counts), len(author_counts[author_counts > 1])))

In [None]:
author_list = author_counts[author_counts > 1].index
file_author = file_times[file_times['Author'].isin(author_list)].sort_values(by='Author')
file_author.reset_index(drop=True)
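As a cross-check on the author counts, the number of distinct books per author can also be computed straight from the yearly rows, without the de-duplicated join. This is a sketch on made-up toy data.

```python
import pandas as pd

# Made-up toy rows; 'Book A' appears in two years but is a single book.
toy = pd.DataFrame({
    'Name': ['Book A', 'Book A', 'Book B', 'Book C'],
    'Author': ['Author X', 'Author X', 'Author Y', 'Author X'],
    'Year': [2009, 2010, 2009, 2010],
})

# nunique() counts distinct book names, so repeated years are not double-counted.
books_per_author = toy.groupby('Author')['Name'].nunique()

# Keep only authors with more than one distinct book.
multi_book_authors = books_per_author[books_per_author > 1]
print(multi_book_authors)
```

On the real data, `file.groupby('Author')['Name'].nunique()` should agree with `author_counts`, provided each book name maps to a single author.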