<div style="width: 100%; clear: both;">
<div style="float: left; width: 50%;">
<img src="http://www.uoc.edu/portal/_resources/common/imatges/marca_UOC/UOC_Masterbrand.jpg" align="left">
</div>
<div style="float: right; width: 50%;">
<p style="margin: 0; padding-top: 22px; text-align:right;">22.418 · Machine Learning</p>
<p style="margin: 0; text-align:right;">Bachelor's Degree in Applied Data Science</p>
<p style="margin: 0; text-align:right; padding-bottom: 100px;">Faculty of Computer Science, Multimedia and Telecommunications</p>
</div>
</div>
<div style="width:100%;">&nbsp;</div>

# Matrix Factorization Collaborative Filtering

In this notebook, we will apply the SVD (Singular Value Decomposition) algorithm to the MovieLens dataset.
It is adapted from a notebook in the examples of the Surprise library: <br>
https://github.com/NicolasHug/Surprise/blob/master/examples/notebooks/KNNBasic_analysis.ipynb

## Imports

In [None]:
!pip install scikit-surprise

Collecting scikit-surprise
  Downloading https://files.pythonhosted.org/packages/97/37/5d334adaf5ddd65da99fc65f6507e0e4599d092ba048f4302fe8775619e8/scikit-surprise-1.1.1.tar.gz (11.8MB)
Building wheels for collected packages: scikit-surprise


In [None]:
# Standard library
import os
from collections import defaultdict

# Surprise: algorithm, dataset loading, splitting, persistence and metrics
from surprise import SVD
from surprise import Dataset
from surprise import accuracy
from surprise import dump
from surprise.model_selection import train_test_split

## Load the dataset


We load the built-in MovieLens 100k dataset and hold out 25% of the ratings for testing, as we learnt in the surprise_introduction/2_train_test_split notebook:

In [None]:
data = Dataset.load_builtin('ml-100k')
trainset, testset = train_test_split(data, test_size=.25)
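Because the split is random, the train and test sets change on every run; Surprise's train_test_split accepts a random_state argument to make it reproducible. As a rough illustration of what a 75/25 random split does, here is a minimal pure-Python sketch. The helper split_ratings and the toy ratings are hypothetical and are not how Surprise implements it internally.

```python
import random


def split_ratings(ratings, test_size=0.25, seed=0):
    """Shuffle the ratings with a fixed seed and cut off a test fraction."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_size))
    # The first n_test shuffled ratings form the test set, the rest the train set.
    return shuffled[n_test:], shuffled[:n_test]


# Toy (user id, item id, rating) tuples, made up for illustration.
toy_ratings = [
    (str(u), str(i), float((u + i) % 5 + 1)) for u in range(4) for i in range(5)
]
train, test = split_ratings(toy_ratings)
print(len(train), len(test))
```

Every rating ends up in exactly one of the two sets, so the model is never evaluated on a rating it was trained on.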

## Train


The SVD algorithm has many parameters (number of factors, number of epochs, learning rates and regularization terms, among others): <br>
https://surprise.readthedocs.io/en/stable/matrix_factorization.html#surprise.prediction_algorithms.matrix_factorization.SVD

We will work with the default configuration:

In [None]:
algo = SVD()
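According to the Surprise documentation, the SVD model estimates a rating as the global mean plus a user bias, an item bias and the dot product of the user and item factor vectors, and the estimate is clipped to the rating scale by default. The sketch below reproduces that rule in plain Python; the function predict_rating and all the numeric values are hypothetical and do not come from a trained model.

```python
def predict_rating(mu, b_u, b_i, p_u, q_i, r_min=1.0, r_max=5.0):
    """Biased matrix-factorization estimate, clipped to the rating scale."""
    interaction = sum(p * q for p, q in zip(p_u, q_i))
    est = mu + b_u + b_i + interaction
    return min(r_max, max(r_min, est))


# Hypothetical values, for illustration only.
mu = 3.5           # global mean rating
b_u = 0.25         # user bias
b_i = -0.5         # item bias
p_u = [0.5, 1.0]   # user factors (2 factors here; the Surprise default is 100)
q_i = [1.0, 0.5]   # item factors

est = predict_rating(mu, b_u, b_i, p_u, q_i)
print(est)
```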

Let's train the algorithm:

In [None]:
algo.fit(trainset)                     
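Under the hood, fit runs stochastic gradient descent over the known ratings. The following is a minimal sketch of a single update for one rating, following the update rules given in the Surprise documentation, with the default learning rate (0.005) and regularization term (0.02). The helpers predict and sgd_step and the starting values are hypothetical; the real implementation loops over all ratings for several epochs.

```python
def predict(mu, b_u, b_i, p_u, q_i):
    """Unclipped estimate: global mean + biases + factor dot product."""
    return mu + b_u + b_i + sum(p * q for p, q in zip(p_u, q_i))


def sgd_step(r_ui, mu, b_u, b_i, p_u, q_i, lr=0.005, reg=0.02):
    """Apply one regularized SGD update for a single known rating r_ui."""
    err = r_ui - predict(mu, b_u, b_i, p_u, q_i)
    new_b_u = b_u + lr * (err - reg * b_u)
    new_b_i = b_i + lr * (err - reg * b_i)
    # Both factor updates use the values from before the step.
    new_p_u = [p + lr * (err * q - reg * p) for p, q in zip(p_u, q_i)]
    new_q_i = [q + lr * (err * p - reg * q) for p, q in zip(p_u, q_i)]
    return new_b_u, new_b_i, new_p_u, new_q_i


# Hypothetical starting point: one user, one item, two factors.
mu, r_ui = 3.5, 4.0
b_u, b_i = 0.0, 0.0
p_u, q_i = [0.1, 0.1], [0.1, 0.1]

err_before = r_ui - predict(mu, b_u, b_i, p_u, q_i)
b_u, b_i, p_u, q_i = sgd_step(r_ui, mu, b_u, b_i, p_u, q_i)
err_after = r_ui - predict(mu, b_u, b_i, p_u, q_i)
print(err_before, err_after)
```

After the step, the error on this rating is slightly smaller than before; repeating such steps over all training ratings is what gradually fits the biases and factors.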


## Save the model

Let's save the trained model to disk, as we did in 6_save_load_models.ipynb from surprise_introduction:


In [None]:
file_name = os.path.expanduser('~/dump_file')
dump.dump(file_name, algo=algo)

## Load the model

In [None]:
_, loaded_algo = dump.load(file_name)
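dump.load returns a pair (predictions, algo). The underscore discards the first element, which is None here because only the algorithm was passed to dump.dump. Conceptually this is a pickle round trip, which the sketch below imitates with the standard library only. The functions save_model and load_model and the dictionary standing in for a trained model are hypothetical, not part of Surprise.

```python
import os
import pickle
import tempfile


def save_model(file_name, predictions=None, algo=None):
    """Pickle predictions and algorithm together in one file."""
    with open(file_name, "wb") as f:
        pickle.dump({"predictions": predictions, "algo": algo}, f)


def load_model(file_name):
    """Return the (predictions, algo) pair stored by save_model."""
    with open(file_name, "rb") as f:
        stored = pickle.load(f)
    return stored["predictions"], stored["algo"]


# Hypothetical stand-in for a trained algorithm object.
toy_algo = {"name": "SVD", "n_factors": 100}

path = os.path.join(tempfile.mkdtemp(), "dump_file")
save_model(path, algo=toy_algo)
loaded_predictions, loaded_toy_algo = load_model(path)
print(loaded_predictions, loaded_toy_algo)
```

As with any pickle file, a dump should only be loaded if it comes from a trusted source.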

## Predictions

In [None]:
predictions = loaded_algo.test(testset)


Let's define a function that receives the list of predictions and returns the highest-rated items for each user. <br>
We defined it in 4_get_top_n_recommendations.ipynb from surprise_introduction.


In [None]:
def get_top_n(predictions, n=10):
    """Return the top-N recommendations for each user from a set of predictions.

    Args:
        predictions(list of Prediction objects): The list of predictions, as
            returned by the test method of an algorithm.
        n(int): The number of recommendations to output for each user. Default
            is 10.

    Returns:
        A dict where keys are user (raw) ids and values are lists of tuples:
        [(raw item id, rating estimation), ...] of size at most n.
    """

    # First map the predictions to each user.
    top_n = defaultdict(list)
    for uid, iid, true_r, est, _ in predictions:
        top_n[uid].append((iid, est))

    # Then sort the predictions for each user and retrieve the n highest ones.
    for uid, user_ratings in top_n.items():
        user_ratings.sort(key=lambda x: x[1], reverse=True)
        top_n[uid] = user_ratings[:n]

    return top_n


Once all the predictions have been computed with loaded_algo.test, get_top_n returns the n best predictions for every user at once:


In [None]:
top_n = get_top_n(predictions, n=10)

The keys are raw user ids, which are strings in this dataset, so the top predictions for user 196 are retrieved with:

In [None]:
top_n["196"]
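To see the shape of the result without running the full model, here is a small self-contained sketch that applies the same grouping and sorting logic to a handful of made-up predictions. The Prediction tuple below mirrors the field layout of Surprise's predictions (uid, iid, r_ui, est, details); the ids and ratings are invented for illustration.

```python
from collections import defaultdict, namedtuple

# Same field layout as Surprise's Prediction tuples; the values are made up.
Prediction = namedtuple("Prediction", ["uid", "iid", "r_ui", "est", "details"])


def get_top_n(predictions, n=10):
    """Group (item, estimate) pairs by user and keep the n best per user."""
    top_n = defaultdict(list)
    for uid, iid, _true_r, est, _details in predictions:
        top_n[uid].append((iid, est))
    for uid, user_ratings in top_n.items():
        user_ratings.sort(key=lambda x: x[1], reverse=True)
        top_n[uid] = user_ratings[:n]
    return top_n


toy_predictions = [
    Prediction("196", "242", 3.0, 3.9, {}),
    Prediction("196", "302", 4.0, 4.4, {}),
    Prediction("196", "51", 2.0, 2.7, {}),
    Prediction("186", "302", 3.0, 3.5, {}),
]

toy_top = get_top_n(toy_predictions, n=2)
print(toy_top["196"])  # [('302', 4.4), ('242', 3.9)]
```

Note that the predictions in this notebook come from the test set, so each user's list only contains items that the user actually rated in the held-out data.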

## Accuracy measures

In [None]:
print("accuracy measures:")
accuracy.rmse(predictions)
accuracy.mse(predictions)
accuracy.mae(predictions)
accuracy.fcp(predictions)
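RMSE, MSE and MAE are all averages of the difference between the true rating and the estimate, so lower values are better. FCP (fraction of concordant pairs) measures ranking quality instead, and higher is better. A minimal sketch of the three error measures on made-up (true rating, estimate) pairs follows; the function names and the data are hypothetical and independent of Surprise.

```python
import math


def mse(pairs):
    """Mean squared error over (true rating, estimate) pairs."""
    return sum((true_r - est) ** 2 for true_r, est in pairs) / len(pairs)


def rmse(pairs):
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(pairs))


def mae(pairs):
    """Mean absolute error over (true rating, estimate) pairs."""
    return sum(abs(true_r - est) for true_r, est in pairs) / len(pairs)


# Made-up (true rating, estimated rating) pairs.
toy_pairs = [(4.0, 3.5), (3.0, 3.5), (5.0, 4.0), (2.0, 3.0)]

print("MSE: ", mse(toy_pairs))
print("RMSE:", rmse(toy_pairs))
print("MAE: ", mae(toy_pairs))
```

Because the errors are squared before averaging, RMSE penalizes large mistakes more heavily than MAE does.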



The error measures obtained (RMSE, MSE and MAE, where lower is better) are lower than the ones obtained with the user-based and item-based approaches.

In the next notebook we will compare the three approaches.