Modern recommender systems should exploit all available interactions, both explicit (e.g. numerical ratings) and implicit (e.g. likes, purchases, skips, bookmarks). SVD++ was designed to take implicit interactions into account as well.
I have applied this algorithm to the 'user_product_review' dataset, which is extracted from the original dataset.
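
To make the role of the implicit term concrete, here is a minimal NumPy sketch of the SVD++ estimate as it is given in the Surprise documentation: mu + b_u + b_i + q_i . (p_u + |I_u|^(-1/2) * sum of y_j over the items j the user has rated). In Surprise the implicit signal is simply which items a user has rated, regardless of the score. The function name `svdpp_estimate` and all numbers below are hypothetical and only illustrate the formula; they are not learned parameters from this dataset.

```python
import numpy as np

def svdpp_estimate(mu, b_u, b_i, p_u, q_i, y_rated):
    """SVD++ estimate for one (user, item) pair.

    y_rated holds one row of implicit item factors y_j per item the user rated.
    """
    n_rated = y_rated.shape[0]
    if n_rated:
        # implicit-feedback term: normalised sum of the y_j vectors
        implicit = y_rated.sum(axis=0) / np.sqrt(n_rated)
    else:
        implicit = np.zeros_like(p_u)
    return mu + b_u + b_i + q_i @ (p_u + implicit)

# toy values, purely illustrative (assumed, not fitted)
mu, b_u, b_i = 4.0, 0.1, -0.2
p_u = np.array([0.5, -0.5])
q_i = np.array([1.0, 2.0])
y_rated = np.array([[0.2, 0.0], [0.2, 0.4], [0.0, 0.2], [0.4, 0.2]])

estimate = svdpp_estimate(mu, b_u, b_i, p_u, q_i, y_rated)
print(estimate)
```

Without the `implicit` term the expression reduces to the plain SVD estimate, so the only extra information SVD++ uses here is the set of items the user has interacted with.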


In [1]:
#importing libraries
import surprise 
import numpy as np
import pandas as pd

In [2]:
#importing dataset
dataset = pd.read_csv('user_product_review.csv')
dataset = dataset.drop_duplicates()
dataset.head()

Unnamed: 0,customer_id,product_id,review_score
0,3ce436f183e68e07877b285a838db11a,4244733e06e7ecb4970a6e2683c13e61,5
1,f6dd3ec061db4e3987629fe6b26e5cce,e5f2d52b802189ee658865ca93d83a8f,4
2,6489ae5e4333f3693df5ad4372dab6d3,c777355d18b72b67abbeef9df44fd0fd,5
3,d4eb9395c8c0431ee92fce09860c5a06,7634da152a4610f1595efa32f14722fc,4
4,58dbd0b2d70206bf40e62cd34e84d795,ac6c3623068f30de03045865e4e10089,5


In [3]:
#finding the range of our ratings
lower_rating  = dataset['review_score'].min()
upper_rating = dataset['review_score'].max()
print('Review range {0} to {1}'.format(lower_rating,upper_rating))

Review range 1 to 5


In [4]:
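#defining a Reader for the range of our ratings and loading the dataset
#load_from_df expects exactly three columns, in the order user, item, rating
reader = surprise.Reader(rating_scale=(1, 5))
data = surprise.Dataset.load_from_df(dataset[['customer_id', 'product_id', 'review_score']], reader)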

In [5]:
#training the SVD++ algorithm on the dataset
alg = surprise.SVDpp()
output = alg.fit(data.build_full_trainset())
#Note: the model is trained on the whole dataset here, so no ratings are held out to measure accuracy. This is not good practice.

In [6]:
#checking the predicted score for customer_id '3ce436f183e68e07877b285a838db11a' on product_id '777d2e438a1b645f3aec9bd57e92672c'
pred = alg.predict(uid='3ce436f183e68e07877b285a838db11a', iid='777d2e438a1b645f3aec9bd57e92672c')
score = pred.est
print(score)

3.8705551390071062


In [7]:
#making recommendations for customer_id '3ce436f183e68e07877b285a838db11a'
#getting the list of all product_ids
pids = dataset['product_id'].unique()
#getting the product_ids that customer_id '3ce436f183e68e07877b285a838db11a' has already rated
pids_cus = dataset.loc[dataset['customer_id'] == '3ce436f183e68e07877b285a838db11a', 'product_id']
#removing the product_ids that the customer has already rated
pids_to_pred = np.setdiff1d(pids, pids_cus)


In [11]:
#building a test set of the product_ids that customer_id '3ce436f183e68e07877b285a838db11a' has not rated
#the third value (5) is only a placeholder for the true rating that test() expects; it does not affect the estimate
testset = [('3ce436f183e68e07877b285a838db11a', pid, 5) for pid in pids_to_pred]
#making the predictions
prediction = alg.test(testset)
#first prediction
prediction[0]

Prediction(uid='3ce436f183e68e07877b285a838db11a', iid='00066f42aeeb9f3007548bb9d3f33c38', r_ui=5, est=4.274855770321975, details={'was_impossible': False})

In [14]:
pred_ratings = np.array([pred.est for pred in prediction])
#finding the index of maximum rating
i_max = pred_ratings.argmax()
#using the index to find the top item to recommend 
pid = pids_to_pred[i_max]
print('Top item for user_id 3ce436f183e68e07877b285a838db11a has product_id {0} with prediction rating {1}'.format(pid,pred_ratings[i_max]))

Top item for user_id 3ce436f183e68e07877b285a838db11a has product_id 17a019676883dce326999c11a46a14f0 with prediction rating 4.903366709591662
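
The argmax step above returns a single item. A short list of recommendations is usually more useful, and it can be sketched with `np.argsort`. The helper `top_n` and the toy arrays are hypothetical; in the notebook the inputs would be `pids_to_pred` and `pred_ratings`.

```python
import numpy as np

def top_n(item_ids, estimates, n=5):
    """Return the n (item_id, estimate) pairs with the highest estimates."""
    estimates = np.asarray(estimates, dtype=float)
    # argsort is ascending, so reverse it and keep the first n indices
    order = np.argsort(estimates)[::-1][:n]
    return [(str(item_ids[i]), float(estimates[i])) for i in order]

# toy inputs, purely illustrative
toy_ids = np.array(['a', 'b', 'c', 'd'])
toy_est = np.array([3.2, 4.9, 4.1, 2.5])

top_two = top_n(toy_ids, toy_est, n=2)
print(top_two)
```

With `n=1` this gives the same item as the argmax approach.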


In [15]:
#tuning the learning rate and regularisation of the algorithm with a 3-fold grid search
param_grid = {'lr_all': [.001, .01], 'reg_all': [.1, .5]}
gs = surprise.model_selection.GridSearchCV(surprise.SVDpp, param_grid, measures=['rmse', 'mae'], cv=3)
gs.fit(data)
#printing the combination of parameters that gave the best RMSE score
print(gs.best_params['rmse'])

{'lr_all': 0.01, 'reg_all': 0.5}


In [16]:
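#evaluating the performance of the algorithm with 5-fold cross-validation
#Note: this run uses lr_all=0.001 and the default reg_all, not the best parameters found by the grid search above
alg = surprise.SVDpp(lr_all=.001)
output = surprise.model_selection.cross_validate(alg, data, verbose=True)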

Evaluating RMSE, MAE of algorithm SVDpp on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    1.3709  1.3446  1.3610  1.3438  1.3488  1.3538  0.0105  
MAE (testset)     1.0814  1.0597  1.0708  1.0610  1.0651  1.0676  0.0079  
Fit time          7.05    7.62    7.42    8.02    7.34    7.49    0.32    
Test time         0.11    0.30    0.11    0.11    0.11    0.15    0.08    
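
To help read the RMSE and MAE rows, the sketch below computes both metrics by hand and applies them to a constant baseline that always predicts the mean rating. The functions `rmse` and `mae` and the toy ratings are hypothetical; for a fair comparison on the real data the mean should come from the training fold only, which this toy example does not do.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error: penalises large errors more heavily."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error: average size of the error in rating points."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(err)))

# toy ratings, purely illustrative
toy_true = np.array([5, 4, 1, 5, 3])
# constant baseline: predict the mean rating for every item
toy_pred = np.full(toy_true.shape, toy_true.mean())

baseline_rmse = rmse(toy_true, toy_pred)
baseline_mae = mae(toy_true, toy_pred)
print(baseline_rmse, baseline_mae)
```

A model is only adding value if its cross-validated RMSE and MAE are clearly lower than those of such a baseline on the same folds.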
