<div style="width: 100%; clear: both;">
<div style="float: left; width: 50%;">
<img src="http://www.uoc.edu/portal/_resources/common/imatges/marca_UOC/UOC_Masterbrand.jpg" align="left">
</div>
<div style="float: right; width: 50%;">
<p style="margin: 0; padding-top: 22px; text-align:right;">22.418 · Machine Learning</p>
<p style="margin: 0; text-align:right;">Bachelor's Degree in Applied Data Science</p>
<p style="margin: 0; text-align:right; padding-bottom: 100px;">Faculty of Computer Science, Multimedia and Telecommunications</p>
</div>
</div>
<div style="width:100%;">&nbsp;</div>

# Get top n recommendations
This notebook illustrates how to retrieve, for each user, the 10 items with the highest predicted rating. We first train an SVD algorithm on the MovieLens 100k dataset, then predict the ratings of all (user, item) pairs that are not in the training set, and finally keep the top-10 predictions for each user.

This notebook is based on the code from https://github.com/NicolasHug/Surprise/blob/master/examples/top_n_recommendations.py


## Imports



In [1]:
!pip install scikit-surprise

Collecting scikit-surprise
  Downloading https://files.pythonhosted.org/packages/97/37/5d334adaf5ddd65da99fc65f6507e0e4599d092ba048f4302fe8775619e8/scikit-surprise-1.1.1.tar.gz (11.8MB)
     |████████████████████████████████| 11.8MB 6.0MB/s 
Building wheels for collected packages: scikit-surprise
  Building wheel for scikit-surprise (setup.py) ... done
  Created wheel for scikit-surprise: filename=scikit_surprise-1.1.1-cp36-cp36m-linux_x86_64.whl size=1618266 sha256=25782b530b1ab5d783fab87e7d71afa65fdb8352511bd0fa26261d0f1a3d82a8
  Stored in directory: /root/.cache/pip/wheels/78/9c/3d/41b419c9d2aff5b6e2b4c0fc8d25c538202834058f9ed110d0
Successfully built scikit-surprise
Installing collected packages: scikit-surprise
Successfully installed scikit-surprise-1.1.1


In [2]:
from __future__ import (absolute_import, division, print_function,
                        unicode_literals)
from collections import defaultdict

from surprise import SVD
from surprise import Dataset

## Define the get_top_n function

We define a function that receives the list of predictions for all users and returns, for each user, the n items with the highest estimated rating.

In [3]:
def get_top_n(predictions, n=10):
    """Return the top-N recommendations for each user from a set of predictions.

    Args:
        predictions (list of Prediction objects): The list of predictions, as
            returned by the test method of an algorithm.
        n (int): The number of recommendations to output for each user.
            Default is 10.

    Returns:
        A dict where keys are user (raw) ids and values are lists of tuples:
        [(raw item id, rating estimation), ...] of size at most n.
    """

    # First map the predictions to each user.
    top_n = defaultdict(list)
    for uid, iid, true_r, est, _ in predictions:
        top_n[uid].append((iid, est))

    # Then sort the predictions for each user and keep the n highest ones.
    for uid, user_ratings in top_n.items():
        user_ratings.sort(key=lambda x: x[1], reverse=True)
        top_n[uid] = user_ratings[:n]

    return top_n
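`get_top_n` sorts each user's whole list of predictions and then slices it. When only n items per user are needed, the standard library's `heapq.nlargest` performs the same selection without a full sort. The sketch below is a hypothetical variant, `get_top_n_heap`, not part of Surprise. It assumes each prediction unpacks into five fields in the order of Surprise's `Prediction` (uid, iid, r_ui, est, details) and uses a `namedtuple` stand-in so it runs without Surprise installed.

```python
import heapq
from collections import defaultdict, namedtuple

# Stand-in for surprise.Prediction (assumed field order: uid, iid, r_ui, est, details).
Prediction = namedtuple("Prediction", ["uid", "iid", "r_ui", "est", "details"])


def get_top_n_heap(predictions, n=10):
    """Hypothetical variant of get_top_n that selects the n best items with a heap."""
    # Group (item id, estimated rating) pairs by user, exactly as get_top_n does.
    by_user = defaultdict(list)
    for uid, iid, _true_r, est, _details in predictions:
        by_user[uid].append((iid, est))

    # heapq.nlargest is documented as equivalent to sorted(..., reverse=True)[:n],
    # but it only keeps n candidates at a time instead of sorting the whole list.
    return {
        uid: heapq.nlargest(n, ratings, key=lambda pair: pair[1])
        for uid, ratings in by_user.items()
    }


# Toy predictions (made-up values) for two users.
preds = [
    Prediction("196", "318", 3.5, 4.6, {}),
    Prediction("196", "272", 3.5, 4.7, {}),
    Prediction("196", "64", 3.5, 4.1, {}),
    Prediction("22", "100", 3.5, 4.2, {}),
]
print(get_top_n_heap(preds, n=2))
# {'196': [('272', 4.7), ('318', 4.6)], '22': [('100', 4.2)]}
```

Unlike `get_top_n`, this sketch returns a plain `dict`, so looking up an unknown user raises `KeyError` instead of silently returning an empty list.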


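## Train and predict
As usual, we use a trainset to train the model and a testset to get the predictions. Here, however, the trainset is the whole MovieLens 100k dataset, and the testset is its anti-testset: every (user, item) pair that has no rating in the trainset.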

In [4]:
# First, train an SVD algorithm on the MovieLens 100k dataset, as done in previous notebooks.
data = Dataset.load_builtin('ml-100k')
trainset = data.build_full_trainset()
algo = SVD()
algo.fit(trainset)

# To recommend new items to each user, we first predict the ratings of all
# the (user, item) pairs that are NOT in the training set.
testset = trainset.build_anti_testset()
predictions = algo.test(testset)


Dataset ml-100k could not be found. Do you want to download it? [Y/n] y
Trying to download dataset from http://files.grouplens.org/datasets/movielens/ml-100k.zip...
Done! Dataset ml-100k has been saved to /root/.surprise_data/ml-100k
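To make the anti-testset concrete, here is a minimal pure-Python sketch of what `build_anti_testset` produces: every (user, item) combination without a known rating. The function name `build_anti_testset_sketch` and the list-of-triples input format are hypothetical. As we understand Surprise's documentation, the placeholder rating (`fill`) defaults to the global mean rating, and the sketch mimics that. Its size is the number of users times the number of items minus the number of known ratings; for ml-100k (943 users, 1,682 items, 100,000 ratings) that gives 943 × 1682 − 100000 = 1,486,126 pairs to predict.

```python
from itertools import product


def build_anti_testset_sketch(ratings, fill=None):
    """Hypothetical pure-Python analogue of Trainset.build_anti_testset().

    ratings: list of (user id, item id, rating) triples that are known.
    Returns a (user id, item id, fill) triple for every pair with no known rating.
    If fill is None, the global mean of the known ratings is used.
    """
    known = {(u, i) for u, i, _ in ratings}
    if fill is None:
        fill = sum(r for _, _, r in ratings) / len(ratings)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    # Keep only the user/item combinations that were never rated.
    return [(u, i, fill) for u, i in product(users, items) if (u, i) not in known]


# Toy data: 2 users x 2 items = 4 pairs, 3 of them rated, so 1 pair is left.
ratings = [("u1", "a", 5.0), ("u1", "b", 3.0), ("u2", "a", 4.0)]
print(build_anti_testset_sketch(ratings))  # [('u2', 'b', 4.0)]
```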


## Get the top n predictions


Once all the predictions have been computed with `algo.test`, the `get_top_n` function we defined returns the n best predictions for every user at once:

In [5]:
top_n = get_top_n(predictions, n=10)
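Two details matter when reading from `top_n`: the raw ids of ml-100k are strings (as the output below shows), and `get_top_n` returns a `defaultdict(list)`. Looking up a key that is not present therefore returns an empty list and inserts that key, rather than raising an error. The toy dictionary `top_n_demo` below is a hypothetical stand-in used only to illustrate this behaviour of `collections.defaultdict`.

```python
from collections import defaultdict

# Toy stand-in for the dict returned by get_top_n (raw ids are strings).
top_n_demo = defaultdict(list)
top_n_demo["196"] = [("318", 4.63), ("272", 4.62)]

print(len(top_n_demo))  # 1
print(top_n_demo[196])  # [] because the int 196 is not the string "196"
print(len(top_n_demo))  # 2: the lookup above inserted a new, empty entry

# dict.get does not insert missing keys, so it is a safer way to look up users.
print(top_n_demo.get("999", []))  # []
print(len(top_n_demo))  # still 2
```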

We can get the recommendations for user 196 with the following instruction (raw ids are strings, hence `"196"`):


In [6]:
top_n["196"]

[('318', 4.628768146903208),
 ('272', 4.621451476588522),
 ('169', 4.4992458654565235),
 ('114', 4.492765383836569),
 ('64', 4.473376394404551),
 ('187', 4.466271866864173),
 ('313', 4.438447934132743),
 ('513', 4.435112209749498),
 ('357', 4.430694607973522),
 ('657', 4.429828103427867)]

The result is a list of (item id, estimated rating) tuples, sorted by decreasing estimated rating.
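Each recommendation list should satisfy two properties: it is sorted by decreasing estimated rating, and, because the predictions come from the anti-testset, it contains no item the user had already rated. The helper `check_top_n` below is a hypothetical sanity check written for illustration; it takes a `get_top_n`-style dictionary and the known (user, item, rating) triples, here replaced by toy values.

```python
def check_top_n(top_n, known_ratings):
    """Hypothetical sanity check for a get_top_n-style dict.

    top_n: {user id: [(item id, estimated rating), ...]}
    known_ratings: iterable of (user id, item id, rating) triples used for training.
    Returns True if every list is sorted by decreasing estimate and contains
    no item that the user had already rated.
    """
    seen = {(u, i) for u, i, _ in known_ratings}
    for uid, recs in top_n.items():
        estimates = [est for _, est in recs]
        # The estimates must already be in decreasing order.
        if estimates != sorted(estimates, reverse=True):
            return False
        # No recommended item may be one the user rated in the training data.
        if any((uid, iid) in seen for iid, _ in recs):
            return False
    return True


# Toy data (made-up ids and values).
known = [("u1", "a", 3.0), ("u1", "b", 4.0)]
top_n_example = {"u1": [("c", 4.6), ("d", 4.5)]}
print(check_top_n(top_n_example, known))  # True
print(check_top_n({"u1": [("a", 4.9)]}, known))  # False: item "a" was already rated
```

With the real data, the same check could be run on the `top_n` computed above together with the triples of the original ratings.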

To see the recommendations for every user, we can iterate over `top_n` (a dictionary mapping each user id to its list of recommendations) and print the recommended item ids:

In [7]:
# Print the recommended items for each user
for uid, user_ratings in top_n.items():
    print(uid, [iid for (iid, _) in user_ratings])

196 ['318', '272', '169', '114', '64', '187', '313', '513', '357', '657']
186 ['318', '96', '133', '480', '143', '313', '513', '923', '479', '272']
22 ['22', '100', '64', '169', '69', '98', '12', '496', '318', '483']
244 ['603', '269', '272', '190', '408', '483', '606', '57', '134', '479']
166 ['603', '114', '169', '192', '272', '316', '427', '79', '474', '83']
298 ['313', '64', '272', '169', '659', '170', '528', '515', '89', '251']
115 ['169', '199', '408', '134', '179', '168', '285', '483', '276', '223']
253 ['408', '169', '313', '174', '963', '114', '511', '223', '515', '480']
305 ['519', '657', '137', '603', '647', '641', '316', '1039', '19', '659']
6 ['603', '654', '60', '179', '641', '657', '114', '753', '611', '694']
62 ['175', '408', '169', '661', '603', '150', '513', '223', '197', '1007']
286 ['474', '178', '318', '427', '515', '519', '493', '480', '9', '488']
200 ['181', '64', '114', '316', '275', '190', '408', '963', '12', '427']
210 ['408', '318', '64', '100', '169', '178',