## Module 11 - Case study 1

Business challenge/requirement:

BookRent is the largest online and offline book rental chain in India. The company charges a fixed monthly fee plus a rental fee per book, so it makes more money when users rent more books.
As an ML expert, you have to build a recommendation engine so that each user gets book recommendations based on the behavior of similar users. This ensures that users rent books that match their individual taste.
The company is still unprofitable and is looking to improve both revenue and profit.

Key issues:

At present, many users return a book and do not take a new rental. The right recommendations will entice users to rent more books.

Data volume:
- Approx. 1 M records in the file BX-Book-Ratings.csv, plus 2 more files. Only the first 10K records are used.

Fields in data:
• user_id: unique ID of the user
• isbn: International Standard Book Number, a unique commercial book identifier (mostly digits; the last character can be X)
• rating: rating given by the user

In [97]:
import numpy as np
import pandas as pd

In [158]:
df = pd.read_csv(r'D:\E\Courses\Edureka\Assignments\Dataset\module11\BX-Book-Ratings.csv', nrows=10000)

In [159]:
df.head(2)

   user_id        isbn  rating
0   276725  034545104X       0
1   276726   155061224       5


In [163]:
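from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
# Map each ISBN string to a consecutive integer index starting at 0
df['isbn_new'] = le.fit_transform(df['isbn'])

In [164]:
# Same for user ids; refitting `le` here replaces the ISBN mapping it held
df['user_id_new'] = le.fit_transform(df['user_id'])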
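
Reusing one encoder for both columns means the ISBN mapping is no longer available once the user ids are fitted. The following is a minimal sketch, assuming encoded book indices need to be turned back into ISBNs later (for example, to display recommendations). The toy frame and the names `user_encoder` and `book_encoder` are illustrative and not part of the original notebook.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy stand-in for the ratings frame (values are illustrative only).
toy = pd.DataFrame({
    "user_id": [276725, 276726, 276726],
    "isbn": ["034545104X", "155061224", "034545104X"],
    "rating": [0, 5, 3],
})

# One encoder per column, so each keeps its own label mapping.
user_encoder = LabelEncoder()
book_encoder = LabelEncoder()
toy["user_id_new"] = user_encoder.fit_transform(toy["user_id"])
toy["isbn_new"] = book_encoder.fit_transform(toy["isbn"])

# Map encoded book indices back to the original ISBN strings.
recovered_isbns = book_encoder.inverse_transform(toy["isbn_new"].to_numpy())
```

With separate encoders, `book_encoder.inverse_transform` can translate any column index of the rating matrix back into an ISBN.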

In [165]:
df.head(2)

   user_id        isbn  rating  user_id_new  isbn_new
0   276725  034545104X       0           91       124
1   276726   155061224       5           92       840


In [166]:
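from sklearn.model_selection import train_test_split
# Matrix dimensions: number of distinct users and books in the 10K-row sample
n_users = df['user_id_new'].nunique()
n_books = df['isbn_new'].nunique()
# Hold out 25% of the ratings for evaluation (no random_state, so the split varies per run)
train_data, test_data = train_test_split(df, test_size=0.25)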

In [167]:
# User-item matrix: rows are users, columns are books, 0 means "no rating"
train_data_matrix = np.zeros((n_users, n_books))
for line in train_data.itertuples():
    # LabelEncoder output is already zero-based, so it indexes the matrix directly
    train_data_matrix[line.user_id_new, line.isbn_new] = line.rating
train_data_matrix

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])

In [174]:
# Same layout for the held-out ratings
test_data_matrix = np.zeros((n_users, n_books))
for line in test_data.itertuples():
    test_data_matrix[line.user_id_new, line.isbn_new] = line.rating
test_data_matrix

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])
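
The two loops above fill the matrices one rating at a time. As a hedged alternative, the same user-item matrix can be built in a single step with NumPy integer-array indexing. The helper name `build_rating_matrix` and the toy frame are assumptions for illustration; the column names follow the notebook.

```python
import numpy as np
import pandas as pd

def build_rating_matrix(frame, n_users, n_books):
    """Return an (n_users, n_books) array where 0 marks 'no rating'."""
    matrix = np.zeros((n_users, n_books))
    rows = frame["user_id_new"].to_numpy()  # zero-based user indices
    cols = frame["isbn_new"].to_numpy()     # zero-based book indices
    matrix[rows, cols] = frame["rating"].to_numpy()
    return matrix

# Toy data: 3 users, 3 books, 4 ratings (illustrative values only).
toy = pd.DataFrame({
    "user_id_new": [0, 1, 2, 2],
    "isbn_new": [1, 0, 1, 2],
    "rating": [5, 3, 8, 7],
})
toy_matrix = build_rating_matrix(toy, n_users=3, n_books=3)
```

On the notebook's data this would be called once with `train_data` and once with `test_data`, using the same `n_users` and `n_books` so both matrices share one layout.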

In [168]:
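from sklearn.metrics import pairwise_distances
# Note: metric='cosine' returns cosine distance (1 - cosine similarity), not similarity
user_similarity = pairwise_distances(train_data_matrix, metric='cosine')
book_similarity = pairwise_distances(train_data_matrix.T, metric='cosine')
# Mean-center each user's ratings (the mean includes the zeros for unrated books)
mean_user_rating = train_data_matrix.mean(axis=1)[:, np.newaxis]
ratings_diff = train_data_matrix - mean_user_rating
# Weighted average of the other users' centered ratings, added back to the user's mean
user_pred = mean_user_rating + user_similarity.dot(ratings_diff) / np.abs(user_similarity).sum(axis=1, keepdims=True)
user_pred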

array([[-0.00167899, -0.00167899,  0.02491675, ...,  0.00257633,
         0.00576782, -0.00167899],
       [ 0.00132366,  0.00132366,  0.02791941, ...,  0.00557898,
         0.00877047,  0.00132366],
       [-0.00167899, -0.00167899,  0.02491675, ...,  0.00257633,
         0.00576782, -0.00167899],
       ...,
       [-0.00082109, -0.00082109,  0.02577465, ...,  0.00343423,
         0.00662572, -0.00082109],
       [ 0.00132366,  0.00132366,  0.02791941, ...,  0.00557898,
         0.00877047,  0.00132366],
       [-0.00167899, -0.00167899,  0.02491675, ...,  0.00257633,
         0.00576782, -0.00167899]])
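
Because `pairwise_distances` with `metric='cosine'` yields distances, the weights in the cell above grow as users become less alike. The sketch below shows one possible variant of the same mean-centered formula that weights by cosine similarity instead, using scikit-learn's `cosine_similarity`. The function name `predict_user_based`, the toy matrix, and the zero-denominator guard are assumptions added for illustration, not the notebook's method.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def predict_user_based(ratings):
    """Mean-centered user-based prediction weighted by cosine similarity."""
    similarity = cosine_similarity(ratings)            # shape (n_users, n_users)
    user_mean = ratings.mean(axis=1, keepdims=True)    # mean over all books, zeros included
    centered = ratings - user_mean
    weight_sum = np.abs(similarity).sum(axis=1, keepdims=True)
    # Users with no ratings have zero similarity to everyone; avoid dividing by 0.
    weight_sum[weight_sum == 0] = 1.0
    return user_mean + similarity.dot(centered) / weight_sum

# Toy matrix: the third user has no ratings at all.
toy_ratings = np.array([
    [5.0, 3.0, 0.0],
    [4.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
])
toy_pred = predict_user_based(toy_ratings)
```

A user with no training ratings ends up with all-zero predictions here; whether that or a global fallback is preferable depends on the use case.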

In [170]:
# Item-based prediction: weight each user's ratings by the book-to-book matrix
books_pred = train_data_matrix.dot(book_similarity) / np.array([np.abs(book_similarity).sum(axis=1)])
books_pred

array([[0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.00299979, 0.00299979, 0.00302747, ..., 0.00300043, 0.00299979,
        0.00299979],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       ...,
       [0.00085708, 0.00085708, 0.00086499, ..., 0.00085727, 0.00085708,
        0.00085708],
       [0.00299979, 0.00299979, 0.00302747, ..., 0.00300043, 0.00299979,
        0.00299979],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ]])

In [171]:
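from sklearn.metrics import mean_squared_error
from math import sqrt
def rmse(pred, test):
    # Score only the positions that hold a rating in the test matrix
    pred = pred[test.nonzero()].flatten()
    test = test[test.nonzero()].flatten()
    # mean_squared_error expects (y_true, y_pred)
    return sqrt(mean_squared_error(test, pred))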
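
To make the masking step concrete, here is a small NumPy-only sketch of the same idea on a 2x2 example. The name `masked_rmse` and the toy arrays are hypothetical; the point is that predictions at positions without a test rating (the 9s below) do not affect the score.

```python
import numpy as np

def masked_rmse(pred, actual):
    """RMSE over the positions where `actual` holds a nonzero rating."""
    mask = actual != 0
    errors = pred[mask] - actual[mask]
    return float(np.sqrt(np.mean(errors ** 2)))

actual = np.array([[5.0, 0.0],
                   [0.0, 3.0]])
pred = np.array([[4.0, 9.0],
                 [9.0, 5.0]])

# Only (0, 0) and (1, 1) are scored: errors -1 and 2, mean square 2.5.
toy_score = masked_rmse(pred, actual)
```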

In [175]:
rmse(user_pred, test_data_matrix)

7.69019752882995

In [176]:
rmse(books_pred, test_data_matrix)

7.689775858838727
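
Since the business goal is to suggest books, a prediction matrix would typically be turned into a short list per user. The following is a rough sketch under that assumption: it ranks one user's predicted scores and skips books the user has already rated. The helper `top_n_books` and the toy arrays are illustrative and do not come from the notebook.

```python
import numpy as np

def top_n_books(pred_row, rated_row, n=3):
    """Indices of the n highest-scored books the user has not rated yet."""
    # Push already-rated books to the bottom of the ranking.
    scores = np.where(rated_row > 0, -np.inf, pred_row.astype(float))
    # Never return more books than there are unrated ones.
    n = min(n, int((rated_row == 0).sum()))
    order = np.argsort(-scores, kind="stable")
    return order[:n]

pred_row = np.array([0.2, 0.9, 0.5, 0.7])   # predicted scores for 4 books
rated_row = np.array([0.0, 8.0, 0.0, 0.0])  # the user already rated book 1
toy_top = top_n_books(pred_row, rated_row, n=2)
```

The returned values are column indices of the rating matrix; mapping them back to ISBNs would require an encoder that still holds the ISBN labels.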