# **Working With the Surprise Library**

*Author: Kunal PATIL (AIS S20)*

Created with Google Colab

# Data Loading


Install the Surprise library

In [1]:
pip install surprise

Collecting surprise
  Downloading https://files.pythonhosted.org/packages/61/de/e5cba8682201fcf9c3719a6fdda95693468ed061945493dea2dd37c5618b/surprise-0.1-py2.py3-none-any.whl
Collecting scikit-surprise
  Downloading https://files.pythonhosted.org/packages/97/37/5d334adaf5ddd65da99fc65f6507e0e4599d092ba048f4302fe8775619e8/scikit-surprise-1.1.1.tar.gz (11.8MB)
     |████████████████████████████████| 11.8MB 290kB/s 
Building wheels for collected packages: scikit-surprise
  Building wheel for scikit-surprise (setup.py) ... done
  Created wheel for scikit-surprise: filename=scikit_surprise-1.1.1-cp36-cp36m-linux_x86_64.whl size=1618294 sha256=7b5822879be1b2743b6e4ac7c07368040684a4cedde2f79f49e219fb1c6f233c
  Stored in directory: /root/.cache/pip/wheels/78/9c/3d/41b419c9d2aff5b6e2b4c0fc8d25c538202834058f9ed110d0
Successfully built scikit-surprise
Installing collected packages: scikit-surprise, surprise
Successfully installed scikit-surprise-1.1.1 surprise-0.1


Import the necessary libraries

In [2]:
from surprise import KNNWithMeans, BaselineOnly
from surprise import Dataset, Reader
from surprise.model_selection import cross_validate
from surprise.model_selection import train_test_split
from surprise import accuracy
import pandas as pd
import numpy as np
from tabulate import tabulate

Read the file

In [3]:
df = pd.read_csv('ratings.csv')
df.columns

Index(['userId', 'movieId', 'rating', 'timestamp'], dtype='object')

Create dataset with Reader

In [4]:
reader = Reader(line_format='user item rating timestamp', sep=',', skip_lines=1)
data = Dataset.load_from_file('ratings.csv', reader)
data

<surprise.dataset.DatasetAutoFolds at 0x7f80223726d8>
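To make the `line_format`, `sep` and `skip_lines` arguments concrete, here is a rough pure-Python sketch of the kind of parsing they describe. The helper name `parse_ratings` and the sample rows are made up for illustration; Surprise's own loader is more involved than this.

```python
def parse_ratings(lines, sep=",", skip_lines=1):
    """Parse 'user item rating timestamp' rows into tuples (illustrative only)."""
    rows = []
    for line in lines[skip_lines:]:
        user, item, rating, timestamp = line.strip().split(sep)
        # IDs stay as raw strings; only the rating is converted to a number.
        rows.append((user, item, float(rating), timestamp))
    return rows


# Hypothetical sample rows in the same column order as ratings.csv
sample = [
    "userId,movieId,rating,timestamp",  # header row, skipped via skip_lines=1
    "1,31,2.5,1260759144",
    "1,1029,3.0,1260759179",
]
parsed = parse_ratings(sample)
print(parsed)
```

Note that the user and item IDs come out as strings here, so later lookups by raw ID would need string keys as well.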

Split data into train and test sets

In [5]:
trainset, testset = train_test_split(data, test_size=.25)
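Conceptually, a 75/25 split shuffles the ratings and holds out a quarter of them for testing. The sketch below shows that idea in plain Python; `simple_train_test_split` and the toy ratings are hypothetical and are not how Surprise implements it internally.

```python
import random


def simple_train_test_split(ratings, test_size=0.25, seed=0):
    """Shuffle the ratings and hold out a fraction for testing (sketch)."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = int(round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]


# Toy (user, item, rating) triples, made up for illustration
ratings = [("u%d" % (i % 4), "i%d" % i, 3.0) for i in range(8)]
train, test = simple_train_test_split(ratings)
print(len(train), len(test))
```

Because the split is random, metrics change slightly from run to run unless a seed is fixed; Surprise's `train_test_split` accepts a `random_state` argument for that purpose.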

# Model Pipelines

## User Based Model

### 1. Cosine Similarity

Train a user based model using cosine similarity

In [6]:
sim_options = {
    "name": "cosine",
    "user_based": True
}

In [7]:
algo = KNNWithMeans(sim_options=sim_options)

Fit the model

In [8]:
algo.fit(trainset)

Computing the cosine similarity matrix...
Done computing similarity matrix.


<surprise.prediction_algorithms.knns.KNNWithMeans at 0x7f8022372198>
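As a rough illustration of what one entry of the cosine similarity matrix represents, the sketch below computes the cosine similarity between two users over the items they have both rated. The function name `cosine_sim` and the toy ratings are assumptions for this example; the library computes the full matrix far more efficiently.

```python
import math


def cosine_sim(ratings_u, ratings_v):
    """Cosine similarity over the items rated by both users (sketch)."""
    common = set(ratings_u) & set(ratings_v)
    if not common:
        return 0.0  # no overlap, so no evidence of similarity
    dot = sum(ratings_u[i] * ratings_v[i] for i in common)
    norm_u = math.sqrt(sum(ratings_u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(ratings_v[i] ** 2 for i in common))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy ratings keyed by item ID, made up for illustration
alice = {"A": 4.0, "B": 3.0, "C": 5.0}
bob = {"A": 4.0, "B": 3.0, "D": 1.0}
sim = cosine_sim(alice, bob)
print(sim)
```

Only items "A" and "B" are shared here, and the two users rate them identically, so the similarity is at its maximum even though their other ratings differ.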

Make predictions with the model

In [9]:
predictions = algo.test(testset)
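For intuition about what a mean-centered k-NN prediction looks like, here is a minimal sketch: the target user's mean rating plus a similarity-weighted average of how far each neighbor's rating sits from that neighbor's own mean. The function `predict_with_means` and its inputs are hypothetical, and the similarities are taken as given.

```python
def predict_with_means(user_mean, neighbors):
    """Mean-centered k-NN estimate (sketch).

    neighbors: list of (similarity, neighbor_rating, neighbor_mean) tuples
    for neighbors who rated the target item.
    """
    # Only positively similar neighbors contribute in this sketch.
    positive = [(s, r, m) for s, r, m in neighbors if s > 0]
    weight_sum = sum(s for s, _, _ in positive)
    if weight_sum == 0:
        return user_mean  # fall back to the user's mean
    weighted_dev = sum(s * (r - m) for s, r, m in positive)
    return user_mean + weighted_dev / weight_sum


# Toy values, made up for illustration
est = predict_with_means(3.5, [(0.9, 4.0, 3.0), (0.5, 2.0, 3.0)])
print(round(est, 4))
```

Centering on each neighbor's mean compensates for users who systematically rate high or low, which is the main difference from a plain weighted average of neighbor ratings.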

Evaluate model with RMSE & MAE

In [10]:
accuracy.rmse(predictions)
accuracy.mae(predictions)

RMSE: 0.9065
MAE:  0.6915


0.6915396583359437
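The two metrics can be written out in a few lines. The sketch below uses made-up (true rating, estimated rating) pairs and hypothetical helper names to show how RMSE and MAE are computed from prediction errors.

```python
import math


def rmse(pairs):
    """Root mean squared error over (true, estimated) pairs."""
    return math.sqrt(sum((t - e) ** 2 for t, e in pairs) / len(pairs))


def mae(pairs):
    """Mean absolute error over (true, estimated) pairs."""
    return sum(abs(t - e) for t, e in pairs) / len(pairs)


# Toy (true, estimated) ratings, made up for illustration
pairs = [(4.0, 3.5), (2.0, 3.0), (5.0, 4.5)]
rmse_value = rmse(pairs)
mae_value = mae(pairs)
print(round(rmse_value, 4), round(mae_value, 4))
```

RMSE penalizes large errors more heavily than MAE and is never smaller than MAE on the same predictions, which is consistent with the 0.9065 and 0.6915 reported above.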

### 2. Pearson Correlation

Train a user-based model using the Pearson baseline correlation (`pearson_baseline`)

In [11]:
sim_options = {
    "name": "pearson_baseline",
    "shrinkage": 0,  # no shrinkage
    "user_based": True,
}

In [12]:
algo = KNNWithMeans(sim_options=sim_options)

Fit the model

In [13]:
algo.fit(trainset)

Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.


<surprise.prediction_algorithms.knns.KNNWithMeans at 0x7f802238cb00>
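The `pearson_baseline` measure differs from plain Pearson correlation in that ratings are centered on baseline estimates rather than on means, and the result can be shrunk toward zero when two users share few items. The sketch below follows that description as I understand it from the Surprise documentation; the function name, the toy baselines and the guard for fewer than two common items are assumptions of this example.

```python
import math


def pearson_baseline_sim(ratings_u, ratings_v, base_u, base_v, shrinkage=0):
    """Baseline-centered Pearson similarity with optional shrinkage (sketch)."""
    common = sorted(set(ratings_u) & set(ratings_v))
    n = len(common)
    if n < 2:
        return 0.0  # assumption: too little overlap to estimate a correlation
    dev_u = [ratings_u[i] - base_u[i] for i in common]
    dev_v = [ratings_v[i] - base_v[i] for i in common]
    denom = math.sqrt(sum(d * d for d in dev_u)) * math.sqrt(sum(d * d for d in dev_v))
    if denom == 0:
        return 0.0
    rho = sum(a * b for a, b in zip(dev_u, dev_v)) / denom
    # Shrinkage damps similarities that rest on only a few common items.
    return (n - 1) / (n - 1 + shrinkage) * rho


# Toy ratings and baseline estimates, made up for illustration
r_u = {"A": 4.0, "B": 2.0, "C": 5.0}
r_v = {"A": 5.0, "B": 1.0, "C": 4.0}
b_u = {"A": 3.0, "B": 3.0, "C": 4.0}
b_v = {"A": 3.0, "B": 3.0, "C": 3.0}
sim_no_shrink = pearson_baseline_sim(r_u, r_v, b_u, b_v, shrinkage=0)
sim_shrunk = pearson_baseline_sim(r_u, r_v, b_u, b_v, shrinkage=100)
print(round(sim_no_shrink, 4), round(sim_shrunk, 4))
```

With `shrinkage=0`, as in this notebook, the multiplier is 1 and similarities based on very few common items are taken at face value; a positive shrinkage value would pull them toward zero.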

Make predictions with the model

In [14]:
predictions = algo.test(testset)

Evaluate model with RMSE & MAE

In [15]:
accuracy.rmse(predictions)
accuracy.mae(predictions)

RMSE: 0.9175
MAE:  0.6978


0.6977995584707508

## Item Based Model

### 1. Cosine Similarity

Train an item-based model using cosine similarity

In [16]:
sim_options = {
    "name": "cosine",
    "user_based": False, 
}

In [17]:
algo = KNNWithMeans(sim_options=sim_options)

Fit the model

In [18]:
algo.fit(trainset)

Computing the cosine similarity matrix...
Done computing similarity matrix.


<surprise.prediction_algorithms.knns.KNNWithMeans at 0x7f8024355860>
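Setting `user_based` to `False` means similarities are computed between items instead of between users. One way to picture this is to regroup the ratings by item, after which the same similarity function can be applied to pairs of items. The helper `transpose_ratings` and the toy data below are hypothetical.

```python
from collections import defaultdict


def transpose_ratings(by_user):
    """Turn {user: {item: rating}} into {item: {user: rating}} (sketch)."""
    by_item = defaultdict(dict)
    for user, items in by_user.items():
        for item, rating in items.items():
            by_item[item][user] = rating
    return dict(by_item)


# Toy ratings, made up for illustration
by_user = {
    "u1": {"A": 4.0, "B": 3.0},
    "u2": {"A": 5.0},
}
by_item = transpose_ratings(by_user)
print(by_item)
```

The item-item matrix has one row per item rather than one per user, so when a dataset has many more items than users it can take noticeably longer to compute.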

Make predictions with the model

In [19]:
predictions = algo.test(testset)

Evaluate model with RMSE & MAE

In [20]:
accuracy.rmse(predictions)
accuracy.mae(predictions)

RMSE: 0.9097
MAE:  0.6933


0.693279599224944

### 2. Pearson Correlation

Train an item-based model using the Pearson baseline correlation (`pearson_baseline`)

In [21]:
sim_options = {
    "name": "pearson_baseline",
    "shrinkage": 0,  # no shrinkage
    "user_based": False,
}

In [22]:
algo = KNNWithMeans(sim_options=sim_options)

Fit the model

In [23]:
algo.fit(trainset)

Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.


<surprise.prediction_algorithms.knns.KNNWithMeans at 0x7f8041886a20>

Make predictions with the model

In [24]:
predictions = algo.test(testset)

Evaluate model with RMSE & MAE

In [25]:
accuracy.rmse(predictions)
accuracy.mae(predictions)

RMSE: 0.9045
MAE:  0.6868


0.6867927291825342

## BaselineOnly Model

Train a BaselineOnly model (user and item biases estimated with ALS)

In [26]:
bsl_options = {'method': 'als',
               'n_epochs': 5,
               'reg_u': 12,
               'reg_i': 5
               }

In [27]:
algo = BaselineOnly(bsl_options=bsl_options)

Fit the model

In [28]:
algo.fit(trainset)

Estimating biases using als...


<surprise.prediction_algorithms.baseline_only.BaselineOnly at 0x7f802441cbe0>
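To show what "estimating biases using als" can look like, here is a small pure-Python sketch of alternating updates for item and user biases around the global mean, using the same `n_epochs`, `reg_u` and `reg_i` values as above. It is written from my understanding of the approach; the function `als_baselines` and the toy ratings are assumptions, not the library's code.

```python
from collections import defaultdict


def als_baselines(ratings, n_epochs=5, reg_u=12, reg_i=5):
    """Alternately re-estimate item and user biases (sketch)."""
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    by_user, by_item = defaultdict(list), defaultdict(list)
    for u, i, r in ratings:
        by_user[u].append((i, r))
        by_item[i].append((u, r))
    bu = {u: 0.0 for u in by_user}
    bi = {i: 0.0 for i in by_item}
    for _ in range(n_epochs):
        # Item biases with user biases held fixed
        for i, rs in by_item.items():
            bi[i] = sum(r - mu - bu[u] for u, r in rs) / (reg_i + len(rs))
        # User biases with item biases held fixed
        for u, rs in by_user.items():
            bu[u] = sum(r - mu - bi[i] for i, r in rs) / (reg_u + len(rs))
    return mu, bu, bi


# Toy (user, item, rating) triples, made up for illustration
ratings = [("u1", "i1", 5.0), ("u1", "i2", 3.0), ("u2", "i1", 4.0), ("u2", "i2", 2.0)]
mu, bu, bi = als_baselines(ratings)
estimate = mu + bu["u1"] + bi["i1"]  # baseline prediction for (u1, i1)
print(mu, round(estimate, 4))
```

The regularization terms in the denominators keep the biases of rarely seen users and items close to zero, so their predictions stay near the global mean.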

Make predictions with the model

In [29]:
predictions = algo.test(testset)

Evaluate model with RMSE & MAE

In [30]:
accuracy.rmse(predictions)
accuracy.mae(predictions)

RMSE: 0.8715
MAE:  0.6683


0.6682637291760654

# Model Benchmarking

Here we benchmark the five models trained above using 2-fold cross-validation (`cross_validate`). The first four use **KNNWithMeans**; the last is **BaselineOnly**:

1.   User Based Model
    * Cosine Similarity
    * Pearson Similarity

2.   Item Based Model
    * Cosine Similarity
    * Pearson Similarity

3.   BaselineOnly Model
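As a reminder of what k-fold cross-validation does, the sketch below partitions sample indices into folds and uses each fold once as the test set while training on the rest. The generator `k_fold_indices` is a hypothetical helper; real implementations usually shuffle the data first.

```python
def k_fold_indices(n_samples, n_splits=2):
    """Yield (train_indices, test_indices) for each fold (sketch, no shuffling)."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [
        n_samples // n_splits + (1 if f < n_samples % n_splits else 0)
        for f in range(n_splits)
    ]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(5, n_splits=2))
for train_idx, test_idx in folds:
    print(train_idx, test_idx)
```

With two folds each model is fit on only about half of the ratings, so cross-validated errors can plausibly come out a little higher than those from a single 75/25 split.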


In [31]:
algo_csn_user = KNNWithMeans(sim_options={"name": "cosine", "user_based": True})
algo_prsn_user = KNNWithMeans(sim_options={"name": "pearson_baseline", "shrinkage": 0, "user_based": True})
algo_csn_item = KNNWithMeans(sim_options={"name": "cosine", "user_based": False})
algo_prsn_item = KNNWithMeans(sim_options={"name": "pearson_baseline", "shrinkage": 0, "user_based": False})   
algo_bsln = BaselineOnly(bsl_options={'method': 'als', 'n_epochs': 5, 'reg_u': 12, 'reg_i': 5})

In [32]:
dict_of_models = {'Cosine Similarity User Based Model': algo_csn_user, 
                  'Pearson Similarity User Based Model': algo_prsn_user, 
                  'Cosine Similarity Item Based Model': algo_csn_item, 
                  'Pearson Similarity Item Based Model': algo_prsn_item, 
                  'BaselineOnly Model': algo_bsln}

In [33]:
table = []
for name, model in dict_of_models.items():
    # 2-fold cross-validation; `out` holds per-fold test_rmse, test_mae, fit_time and test_time
    out = cross_validate(model, data, ['rmse', 'mae'], cv=2, verbose=True)
    # Keep the means as rounded floats (not formatted strings) so they plot as numbers later
    mean_rmse = round(float(np.mean(out['test_rmse'])), 3)
    mean_mae = round(float(np.mean(out['test_mae'])), 3)
    fit_time = round(float(np.mean(out['fit_time'])), 3)
    new_line = [name, mean_rmse, mean_mae, fit_time]
    print(tabulate([new_line], tablefmt="pipe"))
    table.append(new_line)

Computing the cosine similarity matrix...
Done computing similarity matrix.
Computing the cosine similarity matrix...
Done computing similarity matrix.
Evaluating RMSE, MAE of algorithm KNNWithMeans on 2 split(s).

                  Fold 1  Fold 2  Mean    Std     
RMSE (testset)    0.9223  0.9185  0.9204  0.0019  
MAE (testset)     0.7063  0.7025  0.7044  0.0019  
Fit time          0.31    0.36    0.34    0.03    
Test time         3.73    4.29    4.01    0.28    
|:-----------------------------------|-----:|------:|------:|
| Cosine Similarity User Based Model | 0.92 | 0.704 | 0.335 |
Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Evaluating RMSE, MAE of algorithm KNNWithMeans on 2 split(s).

                  Fold 1  Fold 2  Mean    Std     
RMSE (testset)    0.9468  0.9438  0.9453  0.0015  
MAE (te

In [34]:
header = ['Model Name',
          'RMSE',
          'MAE',
          'Fit Time'
          ]
df = pd.DataFrame(table)
df.columns = header
df

|    | Model Name                          |  RMSE |   MAE | Fit Time |
|---:|:------------------------------------|------:|------:|---------:|
|  0 | Cosine Similarity User Based Model  |  0.92 | 0.704 |    0.335 |
|  1 | Pearson Similarity User Based Model | 0.945 | 0.721 |    0.527 |
|  2 | Cosine Similarity Item Based Model  | 0.923 | 0.706 |   10.089 |
|  3 | Pearson Similarity Item Based Model |  0.93 | 0.708 |    5.816 |
|  4 | BaselineOnly Model                  | 0.878 | 0.677 |    0.124 |


## Comparative Graph 

### Graph for RMSE

In [35]:
import plotly.express as px
fig = px.bar(df, x='Model Name', y='RMSE', range_y=[0.8,1], title="RMSE")
fig.show()

### Graph for MAE

In [36]:
import plotly.express as px
fig = px.bar(df, x='Model Name', y='MAE', range_y=[0.6, 0.8], title="MAE")
fig.show()

### Graph for Fit Time

In [37]:
import plotly.express as px
fig = px.bar(df, x='Model Name', y='Fit Time', title="Fit Time")
fig.show()