<H1>BOOK RECOMMENDATION SYSTEM</H1>
<p>This is a simple collaborative-filtering machine learning model that recommends book titles to users.</p>

<h2>STAGE 1: IMPORTING THE DEPENDENCIES</h2>

In [1]:
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

In [2]:
book = pd.read_csv('Books.csv')
user = pd.read_csv('Users.csv')
rat = pd.read_csv('Ratings.csv')



<h2>STAGE 2: ANALYSING THE DATA</h2>
<p>In this stage we identify the rows that contain null values so that they can be eliminated, which protects the integrity of the recommendations for the other book titles.</p>


In [3]:
book.isnull().sum()

ISBN                   0
Book-Title             0
Book-Author            1
Year-Of-Publication    0
Publisher              2
Image-URL-S            0
Image-URL-M            0
Image-URL-L            3
dtype: int64

In [4]:
rat.isnull().sum()

User-ID        0
ISBN           0
Book-Rating    0
dtype: int64

<p>The only rows that need to be eliminated come from the books CSV file. The ratings CSV file contains no anomalies, and the users CSV file is missing values only in the age column, which is of no concern in this project.</p>
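
<p>The notebook reports the null counts but does not show the removal step itself. A minimal sketch of one way to do it is given here, using a small hypothetical stand-in for <code>Books.csv</code>; the names <code>toy_book</code> and <code>clean_book</code> and the sample rows are assumptions made for illustration.</p>

```python
import pandas as pd

# Hypothetical stand-in for Books.csv, using the notebook's column names.
toy_book = pd.DataFrame({
    "ISBN": ["111", "222", "333"],
    "Book-Title": ["A", "B", "C"],
    "Book-Author": ["Ann", None, "Cy"],
    "Publisher": ["P1", "P2", None],
    "Image-URL-L": ["u1", "u2", "u3"],
})

# Drop every row with a null in one of the columns flagged by isnull().sum().
clean_book = toy_book.dropna(subset=["Book-Author", "Publisher", "Image-URL-L"])
print(len(toy_book), "rows before,", len(clean_book), "rows after")
```

<p>Restricting <code>dropna</code> to a <code>subset</code> keeps rows whose only gaps are in columns the recommender never reads.</p>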

<p>Now let us check for duplicate rows in the data. Any such rows would distort the model and therefore have to be removed.</p>

In [5]:
book.duplicated().sum()

0

In [6]:
user.duplicated().sum()

0

In [7]:
rat.duplicated().sum()

0
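
<p>All three counts are zero here, so nothing has to be removed. Had a count been non-zero, the duplicates could be dropped roughly as sketched below; <code>toy_rat</code> is a hypothetical frame with the same columns as <code>Ratings.csv</code>.</p>

```python
import pandas as pd

# Hypothetical ratings frame in which the first two rows are exact duplicates.
toy_rat = pd.DataFrame({
    "User-ID": [1, 1, 2],
    "ISBN": ["111", "111", "222"],
    "Book-Rating": [8, 8, 5],
})

# Count fully identical rows, then keep only the first occurrence of each.
n_dupes = int(toy_rat.duplicated().sum())
deduped = toy_rat.drop_duplicates().reset_index(drop=True)
print(n_dupes, "duplicate row(s) removed")
```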

In [8]:
book.describe()

Unnamed: 0,ISBN,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-S,Image-URL-M,Image-URL-L
count,271360,271360,271359,271360,271358,271360,271360,271357
unique,271360,242135,102023,202,16807,271044,271044,271041
top,195153448,Selected Poems,Agatha Christie,2002,Harlequin,http://images.amazon.com/images/P/185326119X.0...,http://images.amazon.com/images/P/185326119X.0...,http://images.amazon.com/images/P/225307649X.0...
freq,1,27,632,13903,7535,2,2,2


## POPULARITY BASED BOOK RECOMMENDATION SYSTEM

<p>For this we return the top 20 books that have at least 500 votes, ranked by their average rating from highest to lowest.</p>

In [9]:
ratings = rat.merge(book, on = 'ISBN')
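
<p>By default <code>merge</code> performs an inner join, so ratings whose ISBN is absent from the books table are silently dropped. The sketch below, on hypothetical toy frames, shows one way to see which ratings would be lost by using the <code>indicator</code> option.</p>

```python
import pandas as pd

# Hypothetical data: ISBN "999" has a rating but no entry in the books table.
toy_rat = pd.DataFrame({
    "User-ID": [1, 2, 3],
    "ISBN": ["111", "222", "999"],
    "Book-Rating": [8, 0, 10],
})
toy_book = pd.DataFrame({"ISBN": ["111", "222"], "Book-Title": ["A", "B"]})

# Inner join (the default): only ratings with a matching book survive.
inner = toy_rat.merge(toy_book, on="ISBN")

# Left join with an indicator column exposes the ratings that found no match.
checked = toy_rat.merge(toy_book, on="ISBN", how="left", indicator=True)
unmatched = checked[checked["_merge"] == "left_only"]
print(len(inner), "matched,", len(unmatched), "unmatched")
```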

In [10]:
num_of_rat = ratings.groupby('Book-Title')['Book-Rating'].count().reset_index()
num_of_rat.rename(columns = {'Book-Rating':'Num_of_Votes'}, inplace = True)
num_of_rat

Unnamed: 0,Book-Title,Num_of_Votes
0,A Light in the Storm: The Civil War Diary of ...,4
1,Always Have Popsicles,1
2,Apple Magic (The Collector's series),1
3,"Ask Lily (Young Women of Faith: Lily Series, ...",1
4,Beyond IBM: Leadership Marketing and Finance ...,1
...,...,...
241066,Ã?Â?lpiraten.,2
241067,Ã?Â?rger mit Produkt X. Roman.,4
241068,Ã?Â?sterlich leben.,1
241069,Ã?Â?stlich der Berge.,3


In [11]:
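# Select the rating column before averaging: pandas 2.x raises a TypeError when mean() meets text columns
avg_of_rat = ratings.groupby('Book-Title')['Book-Rating'].mean().reset_index()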
avg_of_rat.rename(columns = {'Book-Rating':'Average_Rating'}, inplace = True)
avg_of_rat

Unnamed: 0,Book-Title,Average_Rating
0,A Light in the Storm: The Civil War Diary of ...,2.250000
1,Always Have Popsicles,0.000000
2,Apple Magic (The Collector's series),0.000000
3,"Ask Lily (Young Women of Faith: Lily Series, ...",8.000000
4,Beyond IBM: Leadership Marketing and Finance ...,0.000000
...,...,...
241066,Ã?Â?lpiraten.,0.000000
241067,Ã?Â?rger mit Produkt X. Roman.,5.250000
241068,Ã?Â?sterlich leben.,7.000000
241069,Ã?Â?stlich der Berge.,2.666667


In [12]:
pop_df = num_of_rat.merge(avg_of_rat, on = 'Book-Title')
pop_df

Unnamed: 0,Book-Title,Num_of_Votes,Average_Rating
0,A Light in the Storm: The Civil War Diary of ...,4,2.250000
1,Always Have Popsicles,1,0.000000
2,Apple Magic (The Collector's series),1,0.000000
3,"Ask Lily (Young Women of Faith: Lily Series, ...",1,8.000000
4,Beyond IBM: Leadership Marketing and Finance ...,1,0.000000
...,...,...,...
241066,Ã?Â?lpiraten.,2,0.000000
241067,Ã?Â?rger mit Produkt X. Roman.,4,5.250000
241068,Ã?Â?sterlich leben.,1,7.000000
241069,Ã?Â?stlich der Berge.,3,2.666667


In [13]:
popular_df = pop_df[pop_df['Num_of_Votes']>=500].sort_values('Average_Rating', ascending = False).head(20)
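
<p>The counting, averaging, and filtering steps above can also be expressed as a single named aggregation. The following is a sketch on hypothetical toy ratings, not the notebook's own code; the function name <code>top_books</code> and its parameters are assumptions, and the vote threshold is lowered so the toy data passes it.</p>

```python
import pandas as pd

# Hypothetical ratings: A has 3 votes, B has 2, C has 1.
toy = pd.DataFrame({
    "Book-Title": ["A", "A", "A", "B", "B", "C"],
    "Book-Rating": [10, 8, 9, 2, 4, 10],
})

def top_books(ratings, min_votes, n):
    """Return the n best-rated titles among those with at least min_votes ratings."""
    stats = ratings.groupby("Book-Title").agg(
        Num_of_Votes=("Book-Rating", "count"),
        Average_Rating=("Book-Rating", "mean"),
    ).reset_index()
    popular = stats[stats["Num_of_Votes"] >= min_votes]
    return popular.sort_values("Average_Rating", ascending=False).head(n)

result = top_books(toy, min_votes=2, n=20)
print(result)
```

<p>Title C has the highest single rating but is excluded because it falls below the vote threshold, which is the purpose of the filter.</p>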

In [14]:
popular_df = popular_df.merge(book,on = 'Book-Title').drop_duplicates('Book-Title')[['Book-Title', 'Book-Author' ,'Image-URL-L' , 'Num_of_Votes' , 'Average_Rating']]

In [15]:
popular_df['Image-URL-L'][0]

'http://images.amazon.com/images/P/0439064872.01.LZZZZZZZ.jpg'

## COLLABORATIVE FILTERING BASED BOOK RECOMMENDATION SYSTEM

In [16]:
# Keep only users with more than 220 ratings
x = ratings.groupby('User-ID').count()['Book-Rating'] > 220
critics = x[x].index

In [17]:
filter_rating = ratings[ratings['User-ID'].isin(critics)]

In [18]:
# Keep only titles that received at least 50 ratings from those users
a = filter_rating.groupby('Book-Title').count()['Book-Rating']>= 50
famous_books = a[a].index

In [19]:
final_rat = filter_rating[filter_rating['Book-Title'].isin(famous_books)]
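
<p>The two filters above first keep active users and then keep titles that those users rated often enough. A compact sketch of the same idea using <code>groupby(...).transform('count')</code> is shown here on hypothetical toy data; the function <code>filter_active</code> and the small thresholds are assumptions chosen for illustration.</p>

```python
import pandas as pd

# Hypothetical ratings: user 1 rated 3 books, user 2 rated 2, user 3 rated 1.
toy = pd.DataFrame({
    "User-ID": [1, 1, 1, 2, 2, 3],
    "Book-Title": ["A", "B", "C", "A", "B", "A"],
    "Book-Rating": [5, 6, 7, 8, 9, 10],
})

def filter_active(ratings, min_user_ratings, min_book_ratings):
    """Keep users with more than min_user_ratings ratings, then titles
    with at least min_book_ratings ratings among those users."""
    # transform('count') broadcasts each group's size back onto its rows.
    user_counts = ratings.groupby("User-ID")["Book-Rating"].transform("count")
    active = ratings[user_counts > min_user_ratings]
    book_counts = active.groupby("Book-Title")["Book-Rating"].transform("count")
    return active[book_counts >= min_book_ratings]

kept = filter_active(toy, min_user_ratings=1, min_book_ratings=2)
print(kept)
```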

In [20]:
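# Displays the de-duplicated rows only; the result is not assigned, so final_rat itself is unchanged
final_rat.drop_duplicates()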

Unnamed: 0,User-ID,ISBN,Book-Rating,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-S,Image-URL-M,Image-URL-L
63,278418,0446520802,0,The Notebook,Nicholas Sparks,1996,Warner Books,http://images.amazon.com/images/P/0446520802.0...,http://images.amazon.com/images/P/0446520802.0...,http://images.amazon.com/images/P/0446520802.0...
65,3363,0446520802,0,The Notebook,Nicholas Sparks,1996,Warner Books,http://images.amazon.com/images/P/0446520802.0...,http://images.amazon.com/images/P/0446520802.0...,http://images.amazon.com/images/P/0446520802.0...
66,7158,0446520802,10,The Notebook,Nicholas Sparks,1996,Warner Books,http://images.amazon.com/images/P/0446520802.0...,http://images.amazon.com/images/P/0446520802.0...,http://images.amazon.com/images/P/0446520802.0...
69,11676,0446520802,10,The Notebook,Nicholas Sparks,1996,Warner Books,http://images.amazon.com/images/P/0446520802.0...,http://images.amazon.com/images/P/0446520802.0...,http://images.amazon.com/images/P/0446520802.0...
74,23768,0446520802,6,The Notebook,Nicholas Sparks,1996,Warner Books,http://images.amazon.com/images/P/0446520802.0...,http://images.amazon.com/images/P/0446520802.0...,http://images.amazon.com/images/P/0446520802.0...
...,...,...,...,...,...,...,...,...,...,...
1026724,266865,0531001725,10,The Catcher in the Rye,Jerome David Salinger,1973,Scholastic Library Pub,http://images.amazon.com/images/P/0531001725.0...,http://images.amazon.com/images/P/0531001725.0...,http://images.amazon.com/images/P/0531001725.0...
1027923,269566,0670809381,0,Echoes,Maeve Binchy,1986,Penguin USA,http://images.amazon.com/images/P/0670809381.0...,http://images.amazon.com/images/P/0670809381.0...,http://images.amazon.com/images/P/0670809381.0...
1028777,271284,0440910927,0,The Rainmaker,John Grisham,1995,Island,http://images.amazon.com/images/P/0440910927.0...,http://images.amazon.com/images/P/0440910927.0...,http://images.amazon.com/images/P/0440910927.0...
1029070,271705,B0001PIOX4,0,Fahrenheit 451,Ray Bradbury,1993,Simon &amp; Schuster,http://images.amazon.com/images/P/B0001PIOX4.0...,http://images.amazon.com/images/P/B0001PIOX4.0...,http://images.amazon.com/images/P/B0001PIOX4.0...


In [21]:
pt = final_rat.pivot_table(index='Book-Title', columns='User-ID',values='Book-Rating')

In [22]:
pt.fillna(0, inplace=True)
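
<p>The pivot and fill steps above can be combined by passing <code>fill_value=0</code> to <code>pivot_table</code>. The sketch below uses hypothetical toy ratings and also shows that, when a user has rated the same title more than once (for example under different ISBNs), <code>pivot_table</code> averages those ratings by default.</p>

```python
import pandas as pd

# Hypothetical ratings: user 1 rated title B twice, user 2 never rated B.
toy = pd.DataFrame({
    "Book-Title": ["A", "A", "B", "B"],
    "User-ID": [1, 2, 1, 1],
    "Book-Rating": [10, 6, 4, 8],
})

# Rows are titles, columns are users; missing pairs become 0,
# repeated pairs are combined with the default aggregation (mean).
toy_pt = toy.pivot_table(
    index="Book-Title", columns="User-ID", values="Book-Rating", fill_value=0
)
print(toy_pt)
```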

In [23]:
pt

Book-Title \ User-ID,254,2276,2766,2977,3363,4385,6251,6543,6575,7158,...,271705,273979,274004,274061,274301,274308,275970,277427,277639,278418
1984,9.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,10.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1st to Die: A Novel,0.0,0.0,0.0,0.0,0.0,0.0,0.0,9.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2nd Chance,0.0,10.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4 Blondes,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
A Bend in the Road,0.0,0.0,7.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Year of Wonders,0.0,0.0,0.0,7.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,9.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
You Belong To Me,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Zen and the Art of Motorcycle Maintenance: An Inquiry into Values,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Zoya,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [24]:
ss = cosine_similarity(pt)
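
<p>Cosine similarity between two rows is their dot product divided by the product of their lengths. The sketch below reproduces that calculation with plain NumPy on a small hypothetical matrix; <code>cosine_by_hand</code> is an illustrative name, and the sketch assumes no row is entirely zero.</p>

```python
import numpy as np

# Hypothetical 3-book x 3-user rating matrix (rows are books).
toy_matrix = np.array([
    [9.0, 0.0, 10.0],
    [8.0, 0.0, 9.0],
    [0.0, 7.0, 0.0],
])

def cosine_by_hand(m):
    """Pairwise cosine similarity between the rows of m."""
    # Scale each row to unit length (assumes no all-zero rows).
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    unit = m / norms
    # The dot product of unit vectors is the cosine of the angle between them.
    return unit @ unit.T

manual = cosine_by_hand(toy_matrix)
print(np.round(manual, 3))
```

<p>Books 0 and 1 were rated similarly by the same users and score close to 1, while book 2 shares no raters with either of them and scores 0.</p>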

In [25]:
def rec(book_name):
    # Row position of the requested title in the pivot table
    index = np.where(pt.index == book_name)[0][0]
    # Rank all books by similarity; skip the top entry (normally the book itself) and keep the next five
    similar_item = sorted(enumerate(ss[index]), key=lambda x: x[1], reverse=True)[1:6]

    data = []
    for i in similar_item:
        # Look up the metadata of the recommended title once
        temp = book[book['Book-Title'] == pt.index[i[0]]].drop_duplicates('Book-Title')
        item = []
        item.extend(temp['Book-Title'].values)
        item.extend(temp['Book-Author'].values)
        item.extend(temp['Image-URL-L'].values)
        data.append(item)
    return data

In [26]:
rec('1984')

[["The Handmaid's Tale",
  'Margaret Atwood',
  'http://images.amazon.com/images/P/0449212602.01.LZZZZZZZ.jpg'],
 ['Animal Farm',
  'George Orwell',
  'http://images.amazon.com/images/P/0451526341.01.LZZZZZZZ.jpg'],
 ['The Vampire Lestat (Vampire Chronicles, Book II)',
  'ANNE RICE',
  'http://images.amazon.com/images/P/0345313860.01.LZZZZZZZ.jpg'],
 ['Brave New World',
  'Aldous Huxley',
  'http://images.amazon.com/images/P/0060809833.01.LZZZZZZZ.jpg'],
 ['The Hours : A Novel',
  'Michael Cunningham',
  'http://images.amazon.com/images/P/0312243022.01.LZZZZZZZ.jpg']]
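
<p>The lookup in <code>rec</code> raises an <code>IndexError</code> when the requested title is not in the pivot table. A hedged sketch of a more defensive variant is given below on hypothetical toy data; <code>similar_titles</code>, <code>toy_pt</code>, and <code>toy_ss</code> are illustrative names, and the function returns titles with scores rather than the author and image fields.</p>

```python
import numpy as np
import pandas as pd

# Hypothetical pivot table (books x users) and its cosine-similarity matrix.
toy_pt = pd.DataFrame(
    [[9.0, 0.0, 10.0], [8.0, 0.0, 9.0], [0.0, 7.0, 0.0]],
    index=pd.Index(["A", "B", "C"], name="Book-Title"),
)
values = toy_pt.to_numpy()
unit = values / np.linalg.norm(values, axis=1, keepdims=True)
toy_ss = unit @ unit.T

def similar_titles(book_name, pt, ss, k=5):
    """Return up to k (title, score) pairs most similar to book_name."""
    if book_name not in pt.index:
        return []  # unknown title: nothing to recommend
    idx = pt.index.get_loc(book_name)
    # Sort by descending similarity and exclude the queried book by position.
    order = np.argsort(-ss[idx])
    order = order[order != idx][:k]
    return [(pt.index[i], float(ss[idx, i])) for i in order]

found = similar_titles("A", toy_pt, toy_ss, k=2)
missing = similar_titles("Not in the table", toy_pt, toy_ss)
print(found, missing)
```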

In [27]:
import pickle
# Use a context manager so the file is flushed and closed after writing
with open('pop.pkl', 'wb') as f:
    pickle.dump(popular_df, f)

In [28]:
# Persist the pivot table, the book metadata, and the similarity matrix
for obj, path in [(pt, 'pt.pkl'), (book, 'book.pkl'), (ss, 'ss.pkl')]:
    with open(path, 'wb') as f:
        pickle.dump(obj, f)
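
<p>The saved files can later be read back with <code>pickle.load</code>, for example by an application that serves the recommendations. The round trip is sketched below with a hypothetical toy frame written to a temporary directory, so it does not touch the real <code>.pkl</code> files.</p>

```python
import os
import pickle
import tempfile

import pandas as pd

# Hypothetical stand-in for the pivot table saved by the notebook.
toy_pt = pd.DataFrame({"u1": [9.0, 0.0]}, index=["A", "B"])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "pt.pkl")
    # Write the object, then read it back from the same file.
    with open(path, "wb") as f:
        pickle.dump(toy_pt, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored.equals(toy_pt))
```

<p>Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.</p>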