# Sample Evaluation in Surprise

## Install Surprise

``` shell
$ pip install scikit-surprise
```

## Using the `fit()` method after a train-test split

```python
from surprise import SVD
from surprise import Dataset
from surprise import accuracy
from surprise.model_selection import train_test_split

# Load the movielens-100k dataset (download it if needed).
data = Dataset.load_builtin('ml-100k')

# Sample a random trainset and testset:
# the test set is made of 25% of the ratings.
trainset, testset = train_test_split(data, test_size=.25)

# We'll use the famous SVD algorithm.
algo = SVD()

# Train the algorithm on the trainset, and predict ratings for the testset.
algo.fit(trainset)
predictions = algo.test(testset)

# Then compute RMSE.
accuracy.rmse(predictions)
```

```
RMSE: 0.9350

0.9349850075768633
```
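A hold-out evaluation like the one above shuffles the ratings and holds out 25% as a test set. It then scores the predictions with $\text{RMSE} = \sqrt{\frac{1}{|T|}\sum_{(u,i) \in T} (r_{ui} - \hat{r}_{ui})^2}$, where $T$ is the test set. Below is a minimal pure-Python sketch of that idea; it is not Surprise's implementation. The toy ratings, the helper names `holdout_split` and `rmse`, and the "always predict the training mean" model are hypothetical. Rounding the test-set size up is also a choice of this sketch.

```python
import math
import random


def holdout_split(ratings, test_size=0.25, seed=0):
    """Shuffle (user, item, rating) triples and hold out a fraction as a test set."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    n_test = math.ceil(len(shuffled) * test_size)  # rounding up is this sketch's choice
    return shuffled[n_test:], shuffled[:n_test]


def rmse(pairs):
    """Root mean squared error over (true rating, estimate) pairs."""
    if not pairs:
        raise ValueError("empty prediction list")
    return math.sqrt(sum((r - est) ** 2 for r, est in pairs) / len(pairs))


# Hypothetical toy data: (raw user id, raw item id, rating).
ratings = [("u1", "i1", 4.0), ("u1", "i2", 3.0), ("u2", "i1", 5.0),
           ("u2", "i3", 2.0), ("u3", "i2", 4.0), ("u3", "i3", 1.0),
           ("u4", "i1", 3.0), ("u4", "i2", 5.0)]
train, test = holdout_split(ratings, test_size=0.25)

# Trivial hypothetical model: always predict the training-set mean rating.
mu = sum(r for _, _, r in train) / len(train)
print(len(train), len(test), round(rmse([(r, mu) for _, _, r in test]), 4))
```

Because the split is random, the RMSE depends on which ratings land in the test set. This is also why the notebook's scores change from run to run.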

Other available prediction algorithms are:

| Algorithm       | Description                                                                                                        |
|:----------------|:-------------------------------------------------------------------------------------------------------------------|
| `NormalPredictor` | Algorithm predicting a random rating based on the distribution of the training set, which is assumed to be normal. |
| `BaselineOnly`    | Algorithm predicting the baseline estimate for given user and item.                                                |
| `KNNBasic`        | A basic collaborative filtering algorithm.                                                                         |
| `KNNWithMeans`    | A basic collaborative filtering algorithm, taking into account the mean ratings of each user.                      |
| `KNNWithZScore`   | A basic collaborative filtering algorithm, taking into account the z-score normalization of each user.             |
| `KNNBaseline`     | A basic collaborative filtering algorithm taking into account a baseline rating.                                   |
| `SVDpp`           | The SVD++ algorithm, an extension of `SVD` taking into account implicit ratings.                                     |
| `NMF`             | A collaborative filtering algorithm based on Non-negative Matrix Factorization.                                    |
| `SlopeOne`        | A simple yet accurate collaborative filtering algorithm.                                                           |
| `CoClustering`    | A collaborative filtering algorithm based on co-clustering.                                                        |
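The simplest entry in the table, `NormalPredictor`, draws a random rating from a normal distribution fitted to the training ratings. Surprise's documentation describes $\hat{\mu}$ and $\hat{\sigma}$ as maximum-likelihood estimates. The plain-Python sketch below illustrates that idea. The class name and the seed are hypothetical. Clipping into a `(1, 5)` rating scale is an assumption here, mirroring Surprise's default of clipping estimates to the dataset's rating scale at prediction time.

```python
import math
import random


class NormalPredictorSketch:
    """Hypothetical illustration of the idea behind Surprise's NormalPredictor."""

    def __init__(self, rating_scale=(1, 5), seed=None):
        self.lo, self.hi = rating_scale
        self.rng = random.Random(seed)

    def fit(self, ratings):
        n = len(ratings)
        self.mu = sum(ratings) / n
        # Maximum-likelihood estimate of sigma: divide by n, not n - 1.
        self.sigma = math.sqrt(sum((r - self.mu) ** 2 for r in ratings) / n)
        return self

    def predict(self):
        # Draw from N(mu, sigma^2), then clip the draw into the rating scale.
        est = self.rng.gauss(self.mu, self.sigma)
        return min(self.hi, max(self.lo, est))


model = NormalPredictorSketch(seed=42).fit([4, 3, 5, 2, 4, 1, 3, 5])
print(model.mu, model.sigma, model.predict())
```

Since the estimate ignores both the user and the item, this algorithm is mainly useful as a lower bound when comparing the others.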

Other available accuracy metrics are `mse`, `mae`, and `fcp` (Fraction of Concordant Pairs).
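These metrics can be sketched in plain Python over 4-tuples `(uid, iid, r_ui, est)`, which match the first four fields of Surprise's `Prediction` named tuple.

- MSE is the mean of $(r_{ui} - \hat{r}_{ui})^2$.
- MAE is the mean of $|r_{ui} - \hat{r}_{ui}|$.
- For FCP, take each pair of items $(i, j)$ rated by the same user with $r_{ui} > r_{uj}$. The pair is concordant if the estimates keep that order ($\hat{r}_{ui} > \hat{r}_{uj}$) and discordant otherwise. FCP is then concordant / (concordant + discordant), pooled over users.

This is a sketch of the common FCP definition, not Surprise's `fcp`, which may aggregate per-user counts and treat ties differently. The function names and the toy predictions are hypothetical.

```python
from collections import defaultdict


def mse(preds):
    """Mean squared error over (uid, iid, r_ui, est) tuples."""
    return sum((r - est) ** 2 for _, _, r, est in preds) / len(preds)


def mae(preds):
    """Mean absolute error over (uid, iid, r_ui, est) tuples."""
    return sum(abs(r - est) for _, _, r, est in preds) / len(preds)


def fcp(preds):
    """Fraction of Concordant Pairs, with counts pooled over all users."""
    by_user = defaultdict(list)
    for uid, _, r, est in preds:
        by_user[uid].append((r, est))
    concordant = discordant = 0
    for pairs in by_user.values():
        for r_i, est_i in pairs:
            for r_j, est_j in pairs:
                if r_i > r_j:  # only pairs the user ranked strictly
                    if est_i > est_j:
                        concordant += 1
                    else:  # order reversed or tied in the estimates
                        discordant += 1
    if concordant + discordant == 0:
        raise ValueError("no user has two distinct ratings in the predictions")
    return concordant / (concordant + discordant)


# Hypothetical predictions: (uid, iid, true rating, estimate).
preds = [("u1", "i1", 5, 4.5), ("u1", "i2", 3, 3.5), ("u1", "i3", 1, 3.8),
         ("u2", "i1", 4, 4.0), ("u2", "i2", 2, 2.5)]
print(mse(preds), mae(preds), fcp(preds))
```

Unlike RMSE, MSE and MAE, FCP scores only the ranking of each user's items, not how close the estimated ratings are.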

### Evaluating the built-in Surprise algorithms

```python
# SVD, Dataset and accuracy were already imported in the first cell.
from surprise import (NormalPredictor, BaselineOnly, KNNBasic, KNNWithMeans,
                      KNNWithZScore, KNNBaseline, NMF, SlopeOne, CoClustering)
import pandas as pd
import numpy as np
```

```python
algos = [NormalPredictor(), BaselineOnly(), KNNBasic(), KNNWithMeans(),
         KNNWithZScore(), KNNBaseline(), SVD(), NMF(), SlopeOne(),
         CoClustering()]
algo_names = ["NormalPredictor", "BaselineOnly", "KNNBasic", "KNNWithMeans",
              "KNNWithZScore", "KNNBaseline", "SVD", "NMF", "SlopeOne",
              "CoClustering"]
```

```python
rmses = np.zeros(len(algos))
mses = np.zeros(len(algos))
maes = np.zeros(len(algos))

# Train every algorithm on the same trainset and score it on the same testset.
for i, (name, algo) in enumerate(zip(algo_names, algos)):
    print(name)
    algo.fit(trainset)
    predictions = algo.test(testset)
    rmses[i] = accuracy.rmse(predictions, verbose=False)
    mses[i] = accuracy.mse(predictions, verbose=False)
    maes[i] = accuracy.mae(predictions, verbose=False)
```

```
NormalPredictor
BaselineOnly
Estimating biases using als...
KNNBasic
Computing the msd similarity matrix...
Done computing similarity matrix.
KNNWithMeans
Computing the msd similarity matrix...
Done computing similarity matrix.
KNNWithZScore
Computing the msd similarity matrix...
Done computing similarity matrix.
KNNBaseline
Estimating biases using als...
Computing the msd similarity matrix...
Done computing similarity matrix.
SVD
NMF
SlopeOne
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  algo.fit(trainset)
CoClustering
```


```python
# Collect the scores in a DataFrame, one row per algorithm.
df = pd.DataFrame({'Prediction Algorithm': algo_names,
                   'RMSE': rmses, 'MSE': mses, 'MAE': maes})
df
```

|   | Prediction Algorithm | RMSE     | MSE      | MAE      |
|--:|:---------------------|---------:|---------:|---------:|
| 0 | NormalPredictor      | 1.519409 | 2.308603 | 1.220866 |
| 1 | BaselineOnly         | 0.942609 | 0.888511 | 0.745891 |
| 2 | KNNBasic             | 0.978922 | 0.958288 | 0.770645 |
| 3 | KNNWithMeans         | 0.948836 | 0.90029  | 0.745781 |
| 4 | KNNWithZScore        | 0.948678 | 0.899989 | 0.742454 |
| 5 | KNNBaseline          | 0.92974  | 0.864417 | 0.730747 |
| 6 | SVD                  | 0.936611 | 0.877241 | 0.73793  |
| 7 | NMF                  | 0.9605   | 0.92256  | 0.752521 |
| 8 | SlopeOne             | 0.943907 | 0.89096  | 0.740921 |
| 9 | CoClustering         | 0.962844 | 0.927068 | 0.75288  |
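The results table can be sanity-checked and ranked with a few lines of plain Python. Because RMSE is the square root of MSE, squaring each reported RMSE should reproduce the reported MSE up to the table's rounding. The numbers below are copied from the table. They come from a single random 75/25 split, so the ranking, especially among close scores such as KNNBaseline and SVD, may change on another run.

```python
# (RMSE, MSE) pairs copied from the results table above (one random split).
results = {
    "NormalPredictor": (1.519409, 2.308603),
    "BaselineOnly": (0.942609, 0.888511),
    "KNNBasic": (0.978922, 0.958288),
    "KNNWithMeans": (0.948836, 0.90029),
    "KNNWithZScore": (0.948678, 0.899989),
    "KNNBaseline": (0.92974, 0.864417),
    "SVD": (0.936611, 0.877241),
    "NMF": (0.9605, 0.92256),
    "SlopeOne": (0.943907, 0.89096),
    "CoClustering": (0.962844, 0.927068),
}

# RMSE**2 should equal MSE, up to rounding to 6 decimals in the table.
for name, (rmse_val, mse_val) in results.items():
    assert abs(rmse_val ** 2 - mse_val) < 1e-5, name

# Rank algorithms from lowest (best) to highest RMSE.
ranking = sorted(results, key=lambda name: results[name][0])
for position, name in enumerate(ranking, start=1):
    print(f"{position:2d}. {name:<16} RMSE={results[name][0]:.4f}")
```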


## Using the `predict()` method after training on the full dataset

```python
from surprise import KNNBasic

# Build a trainset from the whole dataset (no ratings are held out).
trainset = data.build_full_trainset()

# Build an algorithm, and train it.
algo = KNNBasic()
algo.fit(trainset)
```

```
Computing the msd similarity matrix...
Done computing similarity matrix.

<surprise.prediction_algorithms.knns.KNNBasic at 0x7fb7c3ae4d30>
```
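The log line `Computing the msd similarity matrix...` refers to Mean Squared Difference similarity, which `KNNBasic` uses by default. As described in Surprise's documentation, $\text{msd}(u, v)$ is the mean of $(r_{ui} - r_{vi})^2$ over the items both users rated, and the similarity is $1 / (\text{msd}(u, v) + 1)$. The sketch below computes it for two users stored as hypothetical item-to-rating dicts. Returning 0 when the users share no items is an assumption of this sketch.

```python
def msd_similarity(ratings_u, ratings_v):
    """Mean Squared Difference similarity between two users.

    ratings_u, ratings_v: dicts mapping item id -> rating (hypothetical format).
    """
    common = ratings_u.keys() & ratings_v.keys()  # items rated by both users
    if not common:
        return 0.0  # assumption of this sketch: no co-rated items, no similarity
    msd = sum((ratings_u[i] - ratings_v[i]) ** 2 for i in common) / len(common)
    return 1.0 / (msd + 1.0)  # the +1 avoids division by zero for identical users


alice = {"i1": 4, "i2": 3, "i3": 5}
bob = {"i1": 5, "i2": 3, "i4": 1}
print(msd_similarity(alice, bob))
```

Identical co-ratings give the maximum similarity of 1, and larger rating differences push the similarity toward 0.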

To predict the rating that user 196 gives item 302 (true rating $r_{ui} = 4$):

```python
uid = str(196)  # raw user id (as in the ratings file). Raw ids are strings!
iid = str(302)  # raw item id (as in the ratings file). Raw ids are strings!

# Get a prediction for a specific user and item.
pred = algo.predict(uid, iid, r_ui=4, verbose=True)
```

```
user: 196        item: 302        r_ui = 4.00   est = 4.06   {'actual_k': 40, 'was_impossible': False}
```
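In the details dict, `actual_k: 40` is the number of neighbors that actually contributed to the estimate; 40 is also `KNNBasic`'s default `k`. Roughly, the user-based `KNNBasic` estimate is a similarity-weighted mean of the ratings that the $k$ most similar users gave to item $i$: $\hat{r}_{ui} = \frac{\sum_{v \in N_i^k(u)} \text{sim}(u, v) \cdot r_{vi}}{\sum_{v \in N_i^k(u)} \text{sim}(u, v)}$. The sketch below implements that formula. The function name and input format are hypothetical. Two behaviors are assumptions about the library, not its actual code: skipping non-positive similarities, and failing when fewer than `min_k` neighbors remain.

```python
import heapq


def knn_basic_estimate(neighbors, k=40, min_k=1):
    """Similarity-weighted mean of the k most similar neighbors' ratings.

    neighbors: list of (similarity to user u, that neighbor's rating of item i)
    for every user who rated item i (hypothetical input format).
    Returns (estimate, actual_k).
    """
    k_nearest = heapq.nlargest(k, neighbors, key=lambda t: t[0])
    sum_sim = sum_ratings = 0.0
    actual_k = 0
    for sim, rating in k_nearest:
        if sim > 0:  # assumption: only positively similar neighbors contribute
            sum_sim += sim
            sum_ratings += sim * rating
            actual_k += 1
    if actual_k < min_k:
        # Surprise instead reports was_impossible=True and uses a default estimate.
        raise ValueError("not enough neighbors")
    return sum_ratings / sum_sim, actual_k


neighbors = [(0.9, 4), (0.5, 5), (0.1, 2), (0.0, 1)]
print(knn_basic_estimate(neighbors, k=2))
print(knn_basic_estimate(neighbors))
```

With `k=2` only the two most similar neighbors are used. With the default `k=40`, every neighbor with positive similarity counts, so `actual_k` can be smaller than `k` when few users rated the item.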
