# Surprise

Documentation:
* [Overview](https://surprise.readthedocs.io/en/stable/getting_started.html)
* [Reader](https://surprise.readthedocs.io/en/stable/reader.html)
* [KNNBasic](https://surprise.readthedocs.io/en/stable/knn_inspired.html)
* [GridSearchCV](https://surprise.readthedocs.io/en/stable/model_selection.html)
* [SVD](https://surprise.readthedocs.io/en/stable/matrix_factorization.html)

----------------------------------------------

In [None]:
# Install the Surprise library (the `surprise` package on PyPI pulls in scikit-surprise)
!pip install surprise

Collecting surprise
  Downloading surprise-0.1-py2.py3-none-any.whl (1.8 kB)
Collecting scikit-surprise (from surprise)
  Downloading scikit-surprise-1.1.3.tar.gz (771 kB)
     772.0/772.0 kB 11.0 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: scikit-surprise
  Building wheel for scikit-surprise (setup.py) ... done
  Created wheel for scikit-surprise: filename=scikit_surprise-1.1.3-cp310-cp310-linux_x86_64.whl size=3096312 sha256=db91baad98ab5a0e623ff5d9ddf9ef53efaf44c443a9dc8110f315d17b36d5c3
  Stored in directory: /root/.cache/pip/wheels/a5/ca/a8/4e28def53797fdc4363ca4af740db15a9c2f1595ebc51fb445
Successfully built scikit-surprise
Installing collected packages: scikit-surprise, surprise
Successfully installed scikit-surprise-1.1.3 surprise-0.1


## Automatic cross-validation
Surprise has a set of built-in algorithms and datasets for you to play with. In its simplest form, it only takes a few lines of code to run a cross-validation procedure:

In [None]:
from surprise import Dataset, SVD
from surprise.model_selection import cross_validate


# Load the movielens-100k dataset (download it if needed).
data = Dataset.load_builtin("ml-100k")

# We'll use the famous SVD algorithm.
algo = SVD()

# Run 5-fold cross-validation and print results
cross_validate(algo, data, measures=["RMSE", "MAE"], cv=5, verbose=True)

Dataset ml-100k could not be found. Do you want to download it? [Y/n] Y
Trying to download dataset from https://files.grouplens.org/datasets/movielens/ml-100k.zip...
Done! Dataset ml-100k has been saved to /root/.surprise_data/ml-100k
Evaluating RMSE, MAE of algorithm SVD on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.9299  0.9382  0.9367  0.9428  0.9342  0.9364  0.0043  
MAE (testset)     0.7306  0.7402  0.7389  0.7421  0.7374  0.7379  0.0040  
Fit time          1.92    1.41    1.34    1.34    1.31    1.46    0.23    
Test time         0.25    0.23    0.15    0.22    0.15    0.20    0.04    


{'test_rmse': array([0.92994542, 0.9382178 , 0.93671236, 0.94281865, 0.93421103]),
 'test_mae': array([0.73058309, 0.74020641, 0.73889782, 0.74214171, 0.7374359 ]),
 'fit_time': (1.9205353260040283,
  1.411344051361084,
  1.3413453102111816,
  1.3377370834350586,
  1.311593770980835),
 'test_time': (0.24835419654846191,
  0.22771310806274414,
  0.14618396759033203,
  0.2190074920654297,
  0.14524269104003906)}
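
The Mean and Std columns in the table above can be reproduced from the per-fold scores in the returned dictionary. The following is a minimal sketch using only the standard library. The fold values are copied from the output above, and the use of the population standard deviation is an assumption that happens to match the printed numbers.

```python
from statistics import fmean, pstdev

# Per-fold scores copied from the cross_validate() output above.
test_rmse = [0.92994542, 0.9382178, 0.93671236, 0.94281865, 0.93421103]
test_mae = [0.73058309, 0.74020641, 0.73889782, 0.74214171, 0.7374359]

# Assumption: the table reports the population standard deviation
# (dividing by the number of folds, not by folds - 1).
rmse_mean, rmse_std = fmean(test_rmse), pstdev(test_rmse)
mae_mean = fmean(test_mae)

print(f"RMSE mean={rmse_mean:.4f} std={rmse_std:.4f}")
print(f"MAE  mean={mae_mean:.4f}")
```

Because each fold is drawn at random, rerunning the cross-validation will give slightly different per-fold scores and therefore slightly different summary values.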

## Train-test split and the fit() method

If you don’t want to run a full cross-validation procedure, you can use the train_test_split() function to sample a trainset and a testset of given sizes, and then evaluate with the accuracy metric of your choosing. The fit() method trains the algorithm on the trainset, and the test() method returns the predictions made for the testset:

In [None]:
from surprise import accuracy, Dataset, SVD
from surprise.model_selection import train_test_split

# Load the movielens-100k dataset (download it if needed).
data = Dataset.load_builtin("ml-100k")

# Sample a random trainset and testset.
# The testset holds 25% of the ratings.
trainset, testset = train_test_split(data, test_size=0.25)

# We'll use the famous SVD algorithm.
algo = SVD()

# Train the algorithm on the trainset, and predict ratings for the testset
algo.fit(trainset)
predictions = algo.test(testset)

# Then compute RMSE
accuracy.rmse(predictions)

RMSE: 0.9406


0.9406495875390333
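
Conceptually, a random train-test split shuffles the ratings and holds out a fraction of them. The sketch below illustrates the idea with plain Python on made-up (user, item, rating) tuples. The helper name `simple_split` and the toy data are hypothetical, and this is not Surprise's actual implementation.

```python
import random


def simple_split(ratings, test_size=0.25, seed=0):
    """Hypothetical helper: shuffle ratings and hold out a fraction as the testset."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_size)
    # The first n_test shuffled ratings form the testset, the rest the trainset.
    return shuffled[n_test:], shuffled[:n_test]


# Toy (user, item, rating) tuples, invented for illustration only.
ratings = [(f"u{u}", f"i{i}", (u + i) % 5 + 1) for u in range(10) for i in range(10)]

train, test = simple_split(ratings, test_size=0.25)
print(len(train), len(test))
```

Since the held-out ratings are chosen at random, the RMSE reported on the testset changes from run to run unless the random state is fixed.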

Note that you can train and test an algorithm with the following one-liner:

In [None]:
predictions = algo.fit(trainset).test(testset)
predictions[:5]

[Prediction(uid='557', iid='739', r_ui=3.0, est=3.4470389200291676, details={'was_impossible': False}),
 Prediction(uid='463', iid='242', r_ui=2.0, est=3.7757362956163436, details={'was_impossible': False}),
 Prediction(uid='927', iid='374', r_ui=4.0, est=3.2157036523257547, details={'was_impossible': False}),
 Prediction(uid='847', iid='426', r_ui=2.0, est=2.3555473500956223, details={'was_impossible': False}),
 Prediction(uid='354', iid='664', r_ui=5.0, est=3.7842489915909137, details={'was_impossible': False})]
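
Each prediction pairs the true rating r_ui with the estimate est, which is all an accuracy metric needs. As a minimal sketch, RMSE and MAE can be computed by hand over the five predictions printed above. The `Pred` tuple is a hypothetical stand-in that keeps only the fields used here, and the `rmse` and `mae` helpers are illustrative rather than the library's own functions.

```python
import math
from collections import namedtuple

# Hypothetical stand-in for the Prediction tuple, with only the fields needed here.
Pred = namedtuple("Pred", ["uid", "iid", "r_ui", "est"])

# The five predictions printed above.
preds = [
    Pred("557", "739", 3.0, 3.4470389200291676),
    Pred("463", "242", 2.0, 3.7757362956163436),
    Pred("927", "374", 4.0, 3.2157036523257547),
    Pred("847", "426", 2.0, 2.3555473500956223),
    Pred("354", "664", 5.0, 3.7842489915909137),
]


def rmse(predictions):
    """Root mean squared error between true and estimated ratings."""
    return math.sqrt(sum((p.r_ui - p.est) ** 2 for p in predictions) / len(predictions))


def mae(predictions):
    """Mean absolute error between true and estimated ratings."""
    return sum(abs(p.r_ui - p.est) for p in predictions) / len(predictions)


print(f"RMSE: {rmse(preds):.4f}")
print(f"MAE:  {mae(preds):.4f}")
```

Five predictions are far too few to be representative, so these values differ from the RMSE computed over the full testset.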

## Train on a whole trainset and the predict() method

We can also fit the algorithm to the whole dataset instead of running cross-validation. The build_full_trainset() method builds a trainset object from all available ratings:

In [None]:
from surprise import Dataset, KNNBasic

# Load the movielens-100k dataset
data = Dataset.load_builtin("ml-100k")

# Retrieve the trainset.
trainset = data.build_full_trainset()

# Build an algorithm, and train it.
algo = KNNBasic()
algo.fit(trainset)

Computing the msd similarity matrix...
Done computing similarity matrix.


<surprise.prediction_algorithms.knns.KNNBasic at 0x7ff92d2f8a30>

We can now predict ratings by calling the predict() method directly. Say you are interested in user 196 and item 302 (make sure they are in the trainset!), and you know that the true rating is $r_{ui} = 4$:

In [None]:
uid = str(196)  # raw user id (as in the ratings file). They are **strings**!
iid = str(302)  # raw item id (as in the ratings file). They are **strings**!

# Get a prediction for a specific user and item.
pred = algo.predict(uid, iid, r_ui=4, verbose=True)
pred

user: 196        item: 302        r_ui = 4.00   est = 4.06   {'actual_k': 40, 'was_impossible': False}


Prediction(uid='196', iid='302', r_ui=4, est=4.06292421377939, details={'actual_k': 40, 'was_impossible': False})
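
The est value above comes from a neighborhood model. As described in the KNNBasic documentation, the estimate is a similarity-weighted average of the ratings that the nearest neighbors gave to the item, and actual_k reports how many neighbors were used. The sketch below shows that weighted average on invented similarities and ratings. The helper name `knn_basic_estimate` is hypothetical, and the neighbor search and similarity computation are left out.

```python
def knn_basic_estimate(neighbors):
    """Hypothetical helper: similarity-weighted mean of neighbor ratings.

    neighbors is a list of (similarity, rating) pairs for the nearest
    neighbors that rated the target item.
    """
    total_sim = sum(sim for sim, _ in neighbors)
    if total_sim == 0:
        raise ValueError("no usable neighbors")
    return sum(sim * rating for sim, rating in neighbors) / total_sim


# Invented (similarity, rating) pairs, for illustration only.
neighbors = [(1.0, 4.0), (0.5, 2.0), (0.5, 5.0)]

est = knn_basic_estimate(neighbors)
print(round(est, 2))
```

More similar neighbors pull the estimate toward their own rating, which is why the first neighbor's rating of 4.0 weighs twice as much as either of the others here.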

------------------------------------------