# Book Rental Recommendation
### Description

BookRent is the largest online and offline book rental chain in India. The company charges a fixed monthly rental fee per book. Lately, the company has been losing its user base.
The main reason is that users struggle to choose the right books for themselves. The company wants to solve this problem and increase its revenue and profit.

### Objective
You, as an ML expert, have to model a recommendation engine so that users get recommendations for books based on the behavior of similar users. This will ensure that users are renting books based on their individual tastes.

### Actions to Perform

- Read the books dataset and explore it.
- Clean up NaN values.
- Read the data where ratings are given by users.
- Take a quick look at the number of unique users and books.
- Map each ISBN to a numeric index, in order of first appearance.
- Do the same for user_id.
- Both user_id and ISBN should end up as ordered indices running from 0 to n-1.
- Re-index columns to build matrix later on.
- Split your data into two sets (training and testing).
- Calculate the cosine similarity.
- Make predictions and evaluate them with RMSE.

In [1]:
# Import Required Libraries
import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split

In [2]:
# import user dataset
df_user = pd.read_csv('BX-Users.csv', encoding='latin-1')


In [3]:
df_user.head()

| | user_id | Location | Age |
|---|---|---|---|
| 0 | 1 | nyc, new york, usa | NaN |
| 1 | 2 | stockton, california, usa | 18.0 |
| 2 | 3 | moscow, yukon territory, russia | NaN |
| 3 | 4 | porto, v.n.gaia, portugal | 17.0 |
| 4 | 5 | farnborough, hants, united kingdom | NaN |


In [4]:
df_user.isna().sum()

user_id          0
Location         1
Age         110763
dtype: int64

In [5]:
df_user.shape

(278859, 3)
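
The cells above count the missing values, but no cleanup step is shown. The following is a minimal sketch of one possible cleanup, run on a toy frame with the same columns as `df_user`. The `toy_users` data, dropping rows without a `Location`, and filling `Age` with the median are all assumptions made for illustration, not choices taken from the original analysis.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df_user (hypothetical values, same column names).
toy_users = pd.DataFrame({
    "user_id": [1, 2, 3, 4],
    "Location": ["nyc, new york, usa", "stockton, california, usa",
                 None, "porto, v.n.gaia, portugal"],
    "Age": [np.nan, 18.0, np.nan, 17.0],
})

# Drop the rows with no Location, then fill missing ages with the median age.
cleaned = toy_users.dropna(subset=["Location"]).copy()
cleaned["Age"] = cleaned["Age"].fillna(cleaned["Age"].median())

# Every count should now be zero.
print(cleaned.isna().sum())
```

With roughly 40% of `Age` missing in the real data, imputing a single value flattens the age distribution, so leaving `Age` out of the model altogether may be just as reasonable.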

In [6]:
# Read the books dataset and explore it.
df_books = pd.read_csv('BX-Books.csv', encoding='latin-1')


In [7]:
df_books.head()

| | isbn | book_title | book_author | year_of_publication | publisher |
|---|---|---|---|---|---|
| 0 | 195153448 | Classical Mythology | Mark P. O. Morford | 2002 | Oxford University Press |
| 1 | 2005018 | Clara Callan | Richard Bruce Wright | 2001 | HarperFlamingo Canada |
| 2 | 60973129 | Decision in Normandy | Carlo D'Este | 1991 | HarperPerennial |
| 3 | 374157065 | Flu: The Story of the Great Influenza Pandemic... | Gina Bari Kolata | 1999 | Farrar Straus Giroux |
| 4 | 393045218 | The Mummies of Urumchi | E. J. W. Barber | 1999 | W. W. Norton & Company |


In [8]:
# Read the data where ratings are given by users.
df_rating = pd.read_csv('BX-Book-Ratings.csv', encoding='latin-1', nrows=10000)

In [9]:
df_rating

| | user_id | isbn | rating |
|---|---|---|---|
| 0 | 276725 | 034545104X | 0 |
| 1 | 276726 | 155061224 | 5 |
| 2 | 276727 | 446520802 | 0 |
| 3 | 276729 | 052165615X | 3 |
| 4 | 276729 | 521795028 | 6 |
| ... | ... | ... | ... |
| 9995 | 243 | 425164403 | 0 |
| 9996 | 243 | 440224764 | 0 |
| 9997 | 243 | 440225701 | 0 |
| 9998 | 243 | 440226430 | 0 |


In [10]:
df_rating.describe()

| | user_id | rating |
|---|---|---|
| count | 10000.0 | 10000.0 |
| mean | 265844.3796 | 1.9747 |
| std | 56937.189618 | 3.424884 |
| min | 2.0 | 0.0 |
| 25% | 277478.0 | 0.0 |
| 50% | 278418.0 | 0.0 |
| 75% | 278418.0 | 4.0 |
| max | 278854.0 | 10.0 |


In [11]:
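# Merge ratings with book details on ISBN.
# pd.merge defaults to an inner join, so ratings whose ISBN is missing from df_books are dropped.
df = pd.merge(df_rating, df_books, on='isbn')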

In [12]:
df.head()

| | user_id | isbn | rating | book_title | book_author | year_of_publication | publisher |
|---|---|---|---|---|---|---|---|
| 0 | 276725 | 034545104X | 0 | Flesh Tones: A Novel | M. J. Rose | 2002 | Ballantine Books |
| 1 | 276726 | 155061224 | 5 | Rites of Passage | Judith Rae | 2001 | Heinle |
| 2 | 276727 | 446520802 | 0 | The Notebook | Nicholas Sparks | 1996 | Warner Books |
| 3 | 278418 | 446520802 | 0 | The Notebook | Nicholas Sparks | 1996 | Warner Books |
| 4 | 276729 | 052165615X | 3 | Help!: Level 1 | Philip Prowse | 1999 | Cambridge University Press |


In [13]:
# Now, let's take a quick look at the number of unique users and books.
n_users = df.user_id.unique().shape[0]
n_users

828

In [14]:
n_books = df.isbn.unique().shape[0]
n_books

8051

In [15]:
# Map each ISBN to a numeric index, in order of first appearance
isbn_list = df.isbn.unique()

In [16]:
isbn_list

array(['034545104X', '155061224', '446520802', ..., '425098834',
       '425163407', '425164403'], dtype=object)

In [17]:
isbn_list.shape

(8051,)

In [18]:
def get_isbn_numeric_id(isbn):
    itemindex = np.where(isbn_list==isbn)
    return itemindex[0][0]

In [19]:
# Do the same for user_id. Convert it into numeric order
userid_list = df.user_id.unique()
def get_user_id_numeric_id(user_id):
    itemindex = np.where(userid_list==user_id)
    return itemindex[0][0]

In [20]:
# Converting both user_id and isbn to ordered list i.e. from 0...n-1
df['user_id_order'] = df['user_id'].apply(get_user_id_numeric_id)

In [21]:
df['isbn_order'] = df['isbn'].apply(get_isbn_numeric_id)
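
The two lookup helpers above scan the full array of unique values once per row. As a hedged alternative, `pd.factorize` appears to produce the same 0...n-1 codes in a single pass, since it also numbers values in order of first appearance. The `toy` frame below is a hypothetical stand-in that reuses the first five merged rows.

```python
import pandas as pd

# Toy stand-in for the merged frame (the first five user_id / isbn pairs).
toy = pd.DataFrame({
    "user_id": [276725, 276726, 276727, 278418, 276729],
    "isbn": ["034545104X", "155061224", "446520802", "446520802", "052165615X"],
})

# factorize returns (codes, uniques); codes run from 0 to n-1
# in order of first appearance, like the np.where lookups.
toy["user_id_order"], user_uniques = pd.factorize(toy["user_id"])
toy["isbn_order"], isbn_uniques = pd.factorize(toy["isbn"])

print(toy)
```

The `uniques` arrays play the role of `userid_list` and `isbn_list`: indexing them with a code recovers the original identifier.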

In [22]:
df.head()

| | user_id | isbn | rating | book_title | book_author | year_of_publication | publisher | user_id_order | isbn_order |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 276725 | 034545104X | 0 | Flesh Tones: A Novel | M. J. Rose | 2002 | Ballantine Books | 0 | 0 |
| 1 | 276726 | 155061224 | 5 | Rites of Passage | Judith Rae | 2001 | Heinle | 1 | 1 |
| 2 | 276727 | 446520802 | 0 | The Notebook | Nicholas Sparks | 1996 | Warner Books | 2 | 2 |
| 3 | 278418 | 446520802 | 0 | The Notebook | Nicholas Sparks | 1996 | Warner Books | 3 | 2 |
| 4 | 276729 | 052165615X | 3 | Help!: Level 1 | Philip Prowse | 1999 | Cambridge University Press | 4 | 3 |


In [23]:
# Re-index columns to build matrix later on.
new_col_order = ['user_id_order', 'isbn_order', 'rating', 'book_title', 'book_author','year_of_publication','publisher','isbn','user_id']
df = df.reindex(columns= new_col_order)
df.head()

| | user_id_order | isbn_order | rating | book_title | book_author | year_of_publication | publisher | isbn | user_id |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | Flesh Tones: A Novel | M. J. Rose | 2002 | Ballantine Books | 034545104X | 276725 |
| 1 | 1 | 1 | 5 | Rites of Passage | Judith Rae | 2001 | Heinle | 155061224 | 276726 |
| 2 | 2 | 2 | 0 | The Notebook | Nicholas Sparks | 1996 | Warner Books | 446520802 | 276727 |
| 3 | 3 | 2 | 0 | The Notebook | Nicholas Sparks | 1996 | Warner Books | 446520802 | 278418 |
| 4 | 4 | 3 | 3 | Help!: Level 1 | Philip Prowse | 1999 | Cambridge University Press | 052165615X | 276729 |


## Train Test Split

In [24]:
train_data, test_data = train_test_split(df, test_size = 0.3)

### Approach: You Will Use Memory-Based Collaborative Filtering
Memory-based collaborative filtering comes in two main types: user-item filtering and item-item filtering.

User-item filtering takes a particular user, finds users who are similar to that user based on their ratings, and recommends items that those similar users liked.

In contrast, item-item filtering takes an item, finds users who liked that item, and finds other items that those users or similar users also liked. It takes items as input and outputs other items as recommendations.

- Item-Item Collaborative Filtering: “Users who liked this item also liked …”
- User-Item Collaborative Filtering: “Users who are similar to you also liked …”

In both cases, you create a user-book matrix built from the entire dataset.

Since you have split the data into training and testing sets, you need to create two 828 x 8051 matrices (all users by all books). Each matrix has about 6.7 million entries, and almost all of them are zero, because at most 10,000 ratings were loaded.

The training matrix contains 70% of the ratings and the testing matrix contains 30% of the ratings.
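
To make the matrix layout concrete, here is a small sketch that fills a user-book matrix from (user_id_order, isbn_order, rating) triples and measures how sparse it is. The `triples` values and the `toy_` names are hypothetical. Only the row and column convention follows the text above.

```python
import numpy as np

# Hypothetical (user_id_order, isbn_order, rating) triples, all 0-based.
triples = np.array([
    [0, 0, 0],
    [1, 1, 5],
    [2, 2, 0],
    [3, 2, 0],
    [4, 3, 3],
])
toy_n_users, toy_n_books = 5, 4

# Rows are users, columns are books; cells with no rating stay at 0.
toy_matrix = np.zeros((toy_n_users, toy_n_books))
toy_matrix[triples[:, 0], triples[:, 1]] = triples[:, 2]

# Fraction of cells holding a non-zero rating.
density = np.count_nonzero(toy_matrix) / toy_matrix.size
print(toy_matrix)
print(density)
```

In this layout an explicit rating of 0 looks the same as a missing rating. That matters for this dataset, where more than half of the loaded ratings are 0.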

In [25]:
train_data_matrix = np.zeros((n_users, n_books))
for line in train_data.itertuples():
    # line[0] is the DataFrame index; user_id_order and isbn_order are already 0-based,
    # so they index the matrix directly.
    train_data_matrix[line[1], line[2]] = line[3]
train_data_matrix

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 5., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])

In [26]:
test_data_matrix = np.zeros((n_users, n_books))
for line in test_data.itertuples():
    # Same 0-based indexing as the training matrix, so rows and columns line up.
    test_data_matrix[line[1], line[2]] = line[3]
test_data_matrix

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])

In [27]:
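from sklearn.metrics import pairwise_distances
# pairwise_distances with metric='cosine' returns cosine distance, i.e. 1 - cosine similarity:
# identical rows score 0 and rows with no ratings in common score 1.
user_similarity = pairwise_distances(train_data_matrix, metric='cosine')
item_similarity = pairwise_distances(train_data_matrix.T, metric='cosine')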

In [28]:
user_similarity

array([[0., 1., 1., ..., 1., 1., 1.],
       [1., 0., 1., ..., 1., 1., 1.],
       [1., 1., 0., ..., 1., 1., 1.],
       ...,
       [1., 1., 1., ..., 0., 1., 1.],
       [1., 1., 1., ..., 1., 0., 1.],
       [1., 1., 1., ..., 1., 1., 0.]])
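
The zeros on the diagonal show that this array holds cosine distances rather than similarities. The sketch below checks the relationship on a toy matrix, comparing `pairwise_distances` with scikit-learn's `cosine_similarity`. The `toy_ratings` values are hypothetical and contain no all-zero rows, which keeps the identity exact on the diagonal as well.

```python
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical ratings: 3 users x 3 books, no all-zero rows.
toy_ratings = np.array([
    [5.0, 3.0, 0.0],
    [4.0, 0.0, 0.0],
    [0.0, 0.0, 2.0],
])

dist = pairwise_distances(toy_ratings, metric="cosine")
sim = cosine_similarity(toy_ratings)

# Cosine similarity is 1 - cosine distance.
print(np.allclose(sim, 1.0 - dist))
# Users 0 and 2 share no rated books: similarity 0, distance 1.
print(sim[0, 2], dist[0, 2])
```

If true similarities are wanted as weights, `1 - pairwise_distances(...)` or `cosine_similarity(...)` would be the usual substitutes. Either change would alter the predictions and the RMSE values reported for this notebook.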

In [29]:
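# Make predictions
def predict(ratings, similarity, type='user'):
    if type == 'user':
        # Each user's mean rating, taken over all books (unrated books count as 0)
        mean_user_rating = ratings.mean(axis=1)
        # np.newaxis turns the means into a column so they broadcast across each row of ratings
        ratings_diff = ratings - mean_user_rating[:, np.newaxis]
        # Weighted average of the other users' deviations, added back to the user's own mean
        pred = mean_user_rating[:, np.newaxis] + similarity.dot(ratings_diff) / np.array([np.abs(similarity).sum(axis=1)]).T
    elif type == 'item':
        # Weighted average of the user's own ratings, weighted by the item-to-item scores
        pred = ratings.dot(similarity) / np.array([np.abs(similarity).sum(axis=1)])
    return pred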

In [30]:
item_prediction = predict(train_data_matrix, item_similarity, type='item')
user_prediction = predict(train_data_matrix, user_similarity, type='user')
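
Written out, the two branches of `predict` compute the expressions below. The notation is introduced here for illustration only: $r_{u,i}$ is the entry of `ratings` for user $u$ and book $i$, $\bar{r}_u$ is the row mean of user $u$ over all books, and $s$ is whatever matrix is passed as `similarity` (user-by-user in the first formula, book-by-book in the second).

$$
\hat{r}_{u,i} = \bar{r}_u + \frac{\sum_{v} s_{u,v}\,\bigl(r_{v,i} - \bar{r}_v\bigr)}{\sum_{v} \lvert s_{u,v} \rvert}
\qquad \text{(type = 'user')}
$$

$$
\hat{r}_{u,i} = \frac{\sum_{j} r_{u,j}\, s_{j,i}}{\sum_{j} \lvert s_{i,j} \rvert}
\qquad \text{(type = 'item')}
$$

In both cases the denominator normalizes by the total weight, so each prediction is a weighted average rather than a raw sum.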

## Evaluation


In [31]:
from sklearn.metrics import mean_squared_error
from math import sqrt
def rmse(prediction, ground_truth):
    # Score only the cells that hold a non-zero rating in the ground truth
    prediction = prediction[ground_truth.nonzero()].flatten()
    ground_truth = ground_truth[ground_truth.nonzero()].flatten()
    # mean_squared_error expects (y_true, y_pred)
    return sqrt(mean_squared_error(ground_truth, prediction))

In [32]:
print('User-based CF RMSE: ' + str(rmse(user_prediction, test_data_matrix)))
print('Item-based CF RMSE: ' + str(rmse(item_prediction, test_data_matrix)))

User-based CF RMSE: 7.623293778065466
Item-based CF RMSE: 7.622635459579946
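
To see what the masked RMSE measures, here is a small NumPy-only sketch of the same idea on hand-checkable numbers. The `masked_rmse` helper and the `truth` and `pred` arrays are hypothetical. Like `rmse` above, the helper skips every cell whose ground-truth value is 0.

```python
import numpy as np

def masked_rmse(prediction, ground_truth):
    """RMSE over the cells where ground_truth is non-zero."""
    mask = ground_truth != 0
    errors = prediction[mask] - ground_truth[mask]
    return float(np.sqrt(np.mean(errors ** 2)))

# Two rated cells (5 and 3); the zero cells are ignored.
truth = np.array([[5.0, 0.0],
                  [0.0, 3.0]])
pred = np.array([[4.0, 2.0],
                 [1.0, 5.0]])

# Errors on the rated cells are -1 and 2, so RMSE = sqrt((1 + 4) / 2).
print(masked_rmse(pred, truth))
```

Because the mask treats 0 as "no rating", explicit ratings of 0 in the test set are left out of the score as well.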


# Thank you