### Book Rental Recommendation
BookRent is the largest online and offline book rental chain in India. The company charges a fixed monthly rental fee per book. Lately, the company has been losing its user base.
The main reason is that users struggle to find books that suit them. The company wants to solve this problem and increase its revenue and profit.
The task is to build a recommendation engine that suggests books to each user based on the behavior of similar users, so that users rent books matching their individual tastes.

In [1]:
#importing libraries
import numpy as np
import pandas as pd

In [2]:
#reading the user dataset
df_user = pd.read_csv('BX-Users.csv',encoding='latin-1')
df_user.head()


Unnamed: 0,user_id,Location,Age
0,1,"nyc, new york, usa",
1,2,"stockton, california, usa",18.0
2,3,"moscow, yukon territory, russia",
3,4,"porto, v.n.gaia, portugal",17.0
4,5,"farnborough, hants, united kingdom",


In [3]:
#reading the books dataset
df_books = pd.read_csv('BX-Books.csv', encoding='latin-1')
df_books.head()


Unnamed: 0,isbn,book_title,book_author,year_of_publication,publisher
0,195153448,Classical Mythology,Mark P. O. Morford,2002,Oxford University Press
1,2005018,Clara Callan,Richard Bruce Wright,2001,HarperFlamingo Canada
2,60973129,Decision in Normandy,Carlo D'Este,1991,HarperPerennial
3,374157065,Flu: The Story of the Great Influenza Pandemic...,Gina Bari Kolata,1999,Farrar Straus Giroux
4,393045218,The Mummies of Urumchi,E. J. W. Barber,1999,W. W. Norton & Company


In [4]:
#reading the book ratings dataset
df_ratings = pd.read_csv('BX-Book-Ratings.csv',encoding='latin-1',nrows=10000)
df_ratings.head()

Unnamed: 0,user_id,isbn,rating
0,276725,034545104X,0
1,276726,155061224,5
2,276727,446520802,0
3,276729,052165615X,3
4,276729,521795028,6


In [5]:
#printing the shape of the loaded datasets
print(df_user.shape,df_books.shape,df_ratings.shape)

(278859, 3) (271379, 5) (10000, 3)


In [6]:
#merging the ratings and book dataset
df_comb=pd.merge(df_ratings,df_books,on='isbn')
df_comb.head()

Unnamed: 0,user_id,isbn,rating,book_title,book_author,year_of_publication,publisher
0,276725,034545104X,0,Flesh Tones: A Novel,M. J. Rose,2002,Ballantine Books
1,276726,155061224,5,Rites of Passage,Judith Rae,2001,Heinle
2,276727,446520802,0,The Notebook,Nicholas Sparks,1996,Warner Books
3,278418,446520802,0,The Notebook,Nicholas Sparks,1996,Warner Books
4,276729,052165615X,3,Help!: Level 1,Philip Prowse,1999,Cambridge University Press


In [7]:
df_comb.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 8701 entries, 0 to 8700
Data columns (total 7 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   user_id              8701 non-null   int64 
 1   isbn                 8701 non-null   object
 2   rating               8701 non-null   int64 
 3   book_title           8701 non-null   object
 4   book_author          8701 non-null   object
 5   year_of_publication  8701 non-null   object
 6   publisher            8701 non-null   object
dtypes: int64(2), object(5)
memory usage: 543.8+ KB
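
The merged frame has 8701 rows although 10000 ratings were loaded. `pd.merge` performs an inner join by default, so ratings whose `isbn` has no match in `df_books` are presumably what was dropped. A minimal sketch on made-up toy frames (`toy_ratings`, `toy_books` and their values are hypothetical, not taken from the BX files) of how `indicator=True` can surface such unmatched rows:

```python
import pandas as pd

# Hypothetical toy data standing in for df_ratings and df_books
toy_ratings = pd.DataFrame(
    {"user_id": [1, 2, 3], "isbn": ["A", "B", "Z"], "rating": [5, 0, 7]}
)
toy_books = pd.DataFrame(
    {"isbn": ["A", "B", "C"], "book_title": ["Alpha", "Beta", "Gamma"]}
)

# Default inner join: the rating for isbn "Z" disappears silently
inner = pd.merge(toy_ratings, toy_books, on="isbn")

# Left join with an indicator column keeps every rating and flags the unmatched ones
checked = pd.merge(toy_ratings, toy_books, on="isbn", how="left", indicator=True)
unmatched = checked[checked["_merge"] == "left_only"]

print(len(inner), len(unmatched))
```

Running the same check on the real frames would show how many of the dropped ratings come from ISBNs missing in the books file.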


In [8]:
#determining the number of unique users and books
no_users=df_comb['user_id'].nunique()
no_books=df_comb['isbn'].nunique()
print('Total Unique Users:%d'%no_users)
print('Total Unique Books:%d'%no_books)

Total Unique Users:828
Total Unique Books:8051


In [9]:
#collecting the unique user ids and isbns used for numbering
users=df_comb['user_id'].unique()
books=df_comb['isbn'].unique()

In [10]:
#Function for numbering user id 
def user_seq(u):
    user_ind=np.where(users==u)
    return user_ind[0][0]

In [11]:
#Function for numbering isbn 
def item_seq(i):
    item_ind=np.where(books==i)
    return item_ind[0][0]

In [12]:
#Applying the function
df_comb['user_n']=df_comb['user_id'].apply(user_seq)
df_comb['item_n']=df_comb['isbn'].apply(item_seq)
df_comb.head()

Unnamed: 0,user_id,isbn,rating,book_title,book_author,year_of_publication,publisher,user_n,item_n
0,276725,034545104X,0,Flesh Tones: A Novel,M. J. Rose,2002,Ballantine Books,0,0
1,276726,155061224,5,Rites of Passage,Judith Rae,2001,Heinle,1,1
2,276727,446520802,0,The Notebook,Nicholas Sparks,1996,Warner Books,2,2
3,278418,446520802,0,The Notebook,Nicholas Sparks,1996,Warner Books,3,2
4,276729,052165615X,3,Help!: Level 1,Philip Prowse,1999,Cambridge University Press,4,3
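
The `np.where` lookups above scan the whole array of unique values once per row. As a possible alternative, `pd.factorize` assigns integer codes in order of first appearance in a single pass. The sketch below uses a hypothetical toy frame and compares both approaches; it is an illustration, not the notebook's own method:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; user 276726 appears twice
toy = pd.DataFrame({"user_id": [276725, 276726, 276727, 278418, 276726]})

# factorize returns (codes, uniques); codes follow order of first appearance
codes, uniques = pd.factorize(toy["user_id"])
toy["user_n"] = codes

# The lookup-based numbering, reproduced for comparison
users = toy["user_id"].unique()
expected = toy["user_id"].apply(lambda u: np.where(users == u)[0][0])

print(toy["user_n"].tolist())
```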


In [13]:
#reordering the columns
new_col_order = ['user_n', 'item_n', 'rating', 'book_title', 'book_author','year_of_publication','publisher','isbn','user_id']
df_comb = df_comb.reindex(columns= new_col_order)
df_comb.head()

Unnamed: 0,user_n,item_n,rating,book_title,book_author,year_of_publication,publisher,isbn,user_id
0,0,0,0,Flesh Tones: A Novel,M. J. Rose,2002,Ballantine Books,034545104X,276725
1,1,1,5,Rites of Passage,Judith Rae,2001,Heinle,155061224,276726
2,2,2,0,The Notebook,Nicholas Sparks,1996,Warner Books,446520802,276727
3,3,2,0,The Notebook,Nicholas Sparks,1996,Warner Books,446520802,278418
4,4,3,3,Help!: Level 1,Philip Prowse,1999,Cambridge University Press,052165615X,276729


In [14]:
#splitting the data into train and test datasets
from sklearn.model_selection import train_test_split
train_data, test_data = train_test_split(df_comb, test_size=0.30)

In [15]:
#creating training and testing user-book matrices
#itertuples yields (index, user_n, item_n, rating, ...); user_n and item_n are already 0-based
train_mat = np.zeros((no_users, no_books))
for x in train_data.itertuples():
    train_mat[x[1], x[2]] = x[3]

test_mat = np.zeros((no_users, no_books))
for y in test_data.itertuples():
    test_mat[y[1], y[2]] = y[3]
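
Because `user_n` and `item_n` are zero-based, they can index the matrix directly. A minimal sketch on a hypothetical toy frame (`toy`, `n_users`, `n_items` are illustrative names) that fills the matrix with NumPy integer-array indexing instead of a Python loop:

```python
import numpy as np
import pandas as pd

# Hypothetical toy ratings with zero-based user and item numbers
toy = pd.DataFrame(
    {"user_n": [0, 1, 2, 0], "item_n": [0, 1, 2, 2], "rating": [5, 3, 8, 6]}
)
n_users = toy["user_n"].nunique()
n_items = toy["item_n"].nunique()

# Rows are users, columns are items; unrated cells stay 0
mat = np.zeros((n_users, n_items))
mat[toy["user_n"].to_numpy(), toy["item_n"].to_numpy()] = toy["rating"].to_numpy()

print(mat)
```

This assumes each (user, item) pair occurs at most once; with duplicates only one of the ratings would be kept.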

In [16]:
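#computing pairwise cosine distances between users and between items
#note: metric='cosine' returns cosine distance (1 - cosine similarity), not similarity itself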
from sklearn.metrics.pairwise import pairwise_distances
user_similarity = pairwise_distances(train_mat, metric='cosine')
item_similarity = pairwise_distances(train_mat.T, metric='cosine')
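
`pairwise_distances(..., metric='cosine')` returns the cosine distance, so identical rows score 0 and orthogonal rows score 1. The small check below, on a hypothetical toy matrix, illustrates how that relates to `cosine_similarity`. Whether to convert distance to similarity before weighting predictions is a modelling choice this notebook leaves open:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity, pairwise_distances

# Hypothetical toy ratings: rows 0 and 1 are alike, row 2 shares no items with them
toy = np.array([
    [5.0, 0.0, 3.0],
    [4.0, 0.0, 3.0],
    [0.0, 2.0, 0.0],
])

dist = pairwise_distances(toy, metric="cosine")
sim = cosine_similarity(toy)

# Cosine distance is one minus cosine similarity
print(np.allclose(dist, 1 - sim))
```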

In [17]:
#defining the prediction function
def predict(ratings, similarity, type='user'):
    if type == 'user':
        mean_user_rating = ratings.mean(axis=1)
        #np.newaxis is used so that mean_user_rating has same format as ratings
        ratings_diff = (ratings - mean_user_rating[:, np.newaxis]) 
        pred = mean_user_rating[:, np.newaxis] + similarity.dot(ratings_diff) / np.array([np.abs(similarity).sum(axis=1)]).T
    elif type == 'item':
        pred = ratings.dot(similarity) / np.array([np.abs(similarity).sum(axis=1)])     
    return pred
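
Written out, the `predict` function computes the following, where the notation is introduced here for illustration: $r_{u,i}$ is the entry of `ratings` for user $u$ and item $i$, $s$ is whatever matrix is passed as `similarity`, and $\bar{r}_u$ is the mean of row $u$ over all items (unrated zeros included, since `ratings.mean(axis=1)` does not mask them).

User-based:

$$\hat{r}_{u,i} = \bar{r}_u + \frac{\sum_{v} s_{u,v}\,(r_{v,i} - \bar{r}_v)}{\sum_{v} \lvert s_{u,v} \rvert}$$

Item-based:

$$\hat{r}_{u,i} = \frac{\sum_{j} r_{u,j}\, s_{j,i}}{\sum_{j} \lvert s_{i,j} \rvert}$$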

In [18]:
#making predictions
item_pred = predict(train_mat, item_similarity, type='item')
user_pred = predict(train_mat, user_similarity, type='user')

In [19]:
#Performance Evaluation
from sklearn.metrics import mean_squared_error

In [20]:
#defining the RMSE function
def rmse(prediction, test):
    prediction = prediction[test.nonzero()].flatten() 
    test = test[test.nonzero()].flatten()
    return np.sqrt(mean_squared_error(prediction, test))
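
The `rmse` function scores only the cells where the test matrix is non-zero, so a rating of 0 is treated the same as a missing rating. A minimal worked example on hypothetical toy arrays (`toy_pred`, `toy_test` are illustrative) showing that predictions for the masked cells do not affect the score:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

def rmse(prediction, test):
    # Keep only the positions that hold a non-zero rating in the test matrix
    mask = test.nonzero()
    return np.sqrt(mean_squared_error(test[mask], prediction[mask]))

# Hypothetical values; the 9.9 predictions sit where the test matrix is 0
toy_pred = np.array([[4.0, 9.9], [9.9, 2.0]])
toy_test = np.array([[5.0, 0.0], [0.0, 4.0]])

# Errors on the scored cells are 1 and 2, so RMSE = sqrt((1 + 4) / 2)
score = rmse(toy_pred, toy_test)
print(score)
```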

In [31]:
print('User Based RMSE:%f'%rmse(user_pred, test_mat))
print('Item Based RMSE:%f'%rmse(item_pred, test_mat))

User Based RMSE:7.644888
Item Based RMSE:7.644208
