# **I Love Books - A Story with Data**

It's amazing to me how people can express their knowledge and imagination through the written word. I don't know what age I was when I started to read or at what point reading and books became so important to me, but there is rarely a day when I don't pick up a book and enjoy either the story or the information I am learning from it.

This notebook was inspired by Shivam Ralli and his notebook, Goodreads: Analysis and Recommending Books. I found it by searching Kaggle for Goodreads data. It is a way for me to learn more about Kaggle and data science projects.

**Start by loading libraries and data**

In [None]:
#For now, I am using the same libraries as Shivam, as I am heavily inspired by his work with this data
import numpy as np 
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

In [None]:
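# on_bad_lines='skip' replaces error_bad_lines=False, which pandas 2.0 removed
df = pd.read_csv('../input/goodreadsbooks/books.csv', on_bad_lines='skip')
# Keep English-language editions; 'en-US' and 'en-GB' are assumed to be the codes used in this dataset
df = df[df.language_code.isin(['eng', 'en-US', 'en-GB'])]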
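Before relying on a language filter, it may be worth checking which codes actually occur, since a comparison against a code that never appears silently matches nothing. The sketch below uses a small made-up frame; the sample titles and code values are assumptions for illustration, not rows from books.csv.

```python
import pandas as pd

# Toy stand-in for the books data; these values are illustrative only
sample = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E"],
    "language_code": ["eng", "en-US", "spa", "eng", "en-GB"],
})

# Count how often each language code occurs
code_counts = sample["language_code"].value_counts()
print(code_counts)

# Keep only the rows whose code is in the list of English variants
english_codes = ["eng", "en-US", "en-GB"]
english = sample[sample["language_code"].isin(english_codes)]
print(len(english))
```

Running `df["language_code"].value_counts()` on the real data in the same way would show whether the codes in the filter match what the file contains.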

In [None]:
#Finding Number of rows and columns
print("Dataset contains {} rows and {} columns".format(df.shape[0], df.shape[1]))

Let's check out the first 10 rows.

In [None]:
df.head(10)

In [None]:
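# publication_date is text (assumed month/day/year), so parse it before taking min/max
pub_dates = pd.to_datetime(df.publication_date, format='%m/%d/%Y', errors='coerce')
print("Earliest publication: {}".format(pub_dates.min()))
print("Most recent publication: {}".format(pub_dates.max()))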
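One thing to watch for here: if `publication_date` is stored as text, `min()` and `max()` compare the strings character by character rather than chronologically. A minimal sketch of the difference, assuming month/day/year strings (the three dates below are made up):

```python
import pandas as pd

# Illustrative dates in month/day/year text form (an assumed format)
raw = pd.Series(["9/16/2006", "11/1/1999", "1/5/2010"])

# Text comparison goes character by character, so the string starting with "9" sorts last
text_max = raw.max()

# Parsing first gives a true chronological ordering; unparseable values become NaT
parsed = pd.to_datetime(raw, format="%m/%d/%Y", errors="coerce")
date_min, date_max = parsed.min(), parsed.max()

print(text_max, date_min.date(), date_max.date())
```

The text maximum is the 2006 date even though 2010 is later, which is why parsing before comparing matters.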

In [None]:
sns.set_context('poster')
plt.figure(figsize=(20,15))
books = df['title'].value_counts()[:10]
sns.barplot(x = books, y = books.index, palette='rocket')
plt.title("Most Occurring Books")
plt.xlabel("Number of occurrences")
plt.ylabel("Books")
plt.show()

In [None]:
most_rated_book = df.sort_values('ratings_count', ascending=False).head(50).set_index('title')
plt.figure(figsize=(30,23))
# Pass x and y by keyword; recent seaborn releases no longer accept them positionally
sns.barplot(x=most_rated_book['ratings_count'], y=most_rated_book.index, palette='icefire')
plt.show()

In [None]:
sns.set_context('poster')
plt.figure(figsize=(20,20))
authors = df['authors'].value_counts()[:50]
sns.barplot(x = authors, y = authors.index, palette='mako')
plt.title("Most Occurring Authors")
plt.xlabel("Number of occurrences")
plt.ylabel("Authors")
plt.show()

In [None]:
#Isaac Asimov Books
asimov = df[(df.authors == 'Isaac Asimov')]
asimov.head(15)
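The exact match above only finds rows where Asimov is the sole credited author. If the `authors` column lists co-authors in a single string (assumed here to be separated by "/"), a substring match would also catch shared credits. A small sketch on made-up rows:

```python
import pandas as pd

# Toy rows; the "/" separator between co-authors is an assumption
sample = pd.DataFrame({
    "title": ["Foundation", "Nightfall", "Dune"],
    "authors": ["Isaac Asimov", "Isaac Asimov/Robert Silverberg", "Frank Herbert"],
})

# Exact equality: only rows where the whole field is the one name
exact = sample[sample["authors"] == "Isaac Asimov"]

# Substring match: any row that credits the name, alone or with others
any_credit = sample[sample["authors"].str.contains("Isaac Asimov", regex=False)]

print(len(exact), len(any_credit))
```

Which version is right depends on whether co-written and edited volumes should count as "Asimov books" for the analysis.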

**Books with the highest average ratings (with over 2,000,000 ratings)**

In [None]:
# Keep only books with more than 2,000,000 ratings, then rank them by average rating
high_average_rating = df[df['ratings_count'] > 2000000]
high_average_rating = high_average_rating.sort_values('average_rating', ascending=False).head(25).set_index('title')
plt.subplots(figsize=(20,15))
ax = high_average_rating['average_rating'].sort_values().plot.barh(width=0.9, color=sns.color_palette('rocket', 12))
ax.set_xlabel("Average rating", fontsize=15)
ax.set_ylabel("Books", fontsize=15)
ax.set_title("Highest-rated books with over 2,000,000 ratings", fontsize=20, color='black')
# Label each bar with its average rating to two decimals (rounding to an integer hides the differences)
for bar in ax.patches:
    ax.text(bar.get_width() + .05, bar.get_y() + .2, f"{bar.get_width():.2f}", fontsize=15, color='black')
plt.show()