<div style="width: 100%; clear: both;">
<div style="float: left; width: 50%;">
<img src="http://www.uoc.edu/portal/_resources/common/imatges/marca_UOC/UOC_Masterbrand.jpg" align="left">
</div>
<div style="float: right; width: 50%;">
<p style="margin: 0; padding-top: 22px; text-align:right;">22.418 · Aprenentatge automàtic</p>
<p style="margin: 0; text-align:right;">Grau en Ciència de Dades Aplicada</p>
<p style="margin: 0; text-align:right; padding-bottom: 100px;">Estudis de Informàtica, Multimèdia i Telecomunicació</p>
</div>
</div>
<div style="width:100%;">&nbsp;</div>

# Precision Recall

In this notebook we will explore precision and recall as an additional measure to assess the performance of the trained algorithms.

In recommender systems, precision measures what fraction of the recommendations made by the model are good, and recall measures what fraction of the good items the model actually recommends. Specifically, if we take the top k recommendations, how do we compute precision and recall?
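
One way to write these two quantities down for a single user $u$ is sketched below. The notation is an assumption introduced here, not taken from the source: $r_{ui}$ is the true rating, $\hat r_{ui}$ the estimated rating, $t$ the relevance threshold, and $\text{top}_k(u)$ the $k$ test items of $u$ with the highest estimated rating.

$$
\text{Precision@}k(u) = \frac{\left|\{\, i \in \text{top}_k(u) : \hat r_{ui} \ge t \ \text{and}\ r_{ui} \ge t \,\}\right|}{\left|\{\, i \in \text{top}_k(u) : \hat r_{ui} \ge t \,\}\right|}
$$

$$
\text{Recall@}k(u) = \frac{\left|\{\, i \in \text{top}_k(u) : \hat r_{ui} \ge t \ \text{and}\ r_{ui} \ge t \,\}\right|}{\left|\{\, i : r_{ui} \ge t \,\}\right|}
$$

Both ratios are undefined when their denominator is 0; the function used in this notebook sets them to 0 in that case.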


## Imports

In [1]:
!pip install scikit-surprise

Collecting scikit-surprise
  Downloading https://files.pythonhosted.org/packages/97/37/5d334adaf5ddd65da99fc65f6507e0e4599d092ba048f4302fe8775619e8/scikit-surprise-1.1.1.tar.gz (11.8MB)
Building wheels for collected packages: scikit-surprise
  Building wheel for scikit-surprise (setup.py) ... done
  Created wheel for scikit-surprise: filename=scikit_surprise-1.1.1-cp36-cp36m-linux_x86_64.whl size=1618265 sha256=57e0bac72517ccb18f86f4324f5f6f93be69c5e104f90bac8c365ab258ed525a
  Stored in directory: /root/.cache/pip/wheels/78/9c/3d/41b419c9d2aff5b6e2b4c0fc8d25c538202834058f9ed110d0
Successfully built scikit-surprise
Installing collected packages: scikit-surprise
Successfully installed scikit-surprise-1.1.1


In [2]:
from __future__ import (absolute_import, division, print_function,
                        unicode_literals)
from collections import defaultdict

from surprise import Dataset
from surprise import SVD
from surprise.model_selection import train_test_split

# Precision - recall
Let's define the function that computes precision and recall at k.

In [3]:

# This function is based on the code from: https://github.com/NicolasHug/Surprise/blob/master/examples/precision_recall_at_k.py
# It has been adapted to work without k-fold iterators

def precision_recall_at_k(predictions, k=10, threshold=3.5):
    """Return precision and recall at k metrics for each user"""

    # First map the predictions to each user.
    user_est_true = defaultdict(list)
    for uid, _, true_r, est, _ in predictions:
        user_est_true[uid].append((est, true_r))

    precisions = dict()
    recalls = dict()
    for uid, user_ratings in user_est_true.items():

        # Sort user ratings by estimated value: the ones with higher value will be the first recommendations
        user_ratings.sort(key=lambda x: x[0], reverse=True)

        # Number of relevant items (we want to compare only the items that have a true rating over a certain threshold)
        n_rel = sum((true_r >= threshold) for (_, true_r) in user_ratings)

        # Number of recommended items in top k (we want to compare only the predictions of our algorithm that are over a certain threshold)
        n_rec_k = sum((est >= threshold) for (est, _) in user_ratings[:k])

        # Number of relevant and recommended items in top k (how many of the items satisfy the previous two conditions?)
        n_rel_and_rec_k = sum(((true_r >= threshold) and (est >= threshold))
                              for (est, true_r) in user_ratings[:k])

        # Precision@K: Proportion of recommended items that are relevant
        # When n_rec_k is 0, Precision is undefined. We here set it to 0.

        precisions[uid] = n_rel_and_rec_k / n_rec_k if n_rec_k != 0 else 0

        # Recall@K: Proportion of relevant items that are recommended
        # When n_rel is 0, Recall is undefined. We here set it to 0.

        recalls[uid] = n_rel_and_rec_k / n_rel if n_rel != 0 else 0

    return precisions, recalls



# Train the model and predict
Once the function is defined, we can train a model and compute its precision and recall:

In [4]:
data = Dataset.load_builtin('ml-100k')
trainset, testset = train_test_split(data, test_size=.25)
algo = SVD()

Dataset ml-100k could not be found. Do you want to download it? [Y/n] y
Trying to download dataset from http://files.grouplens.org/datasets/movielens/ml-100k.zip...
Done! Dataset ml-100k has been saved to /root/.surprise_data/ml-100k


In [5]:
algo.fit(trainset)


<surprise.prediction_algorithms.matrix_factorization.SVD at 0x7f6cf850b470>

In [6]:
predictions = algo.test(testset)


## Compute Precision - Recall at K recommendations



Once we have a prediction for each element in the test set, we want to see how many of the recommendations given by the system are good. The function precision_recall_at_k computes the precision and recall for each user. It receives the predictions and two extra parameters:
* k -> the number of items that we want the system to recommend
     * we have predictions for all the items
     * but we are only interested in the first k: the ones with the highest predicted values
* threshold -> the minimum rating that we consider a good recommendation.
    * It depends on the dataset used. ml-100k has ratings from 1 to 5, so we treat a rating of 4 or higher as a good recommendation
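
To see how k and threshold interact, the sketch below applies the same counting logic to one made-up user. The helper `precision_recall_one_user` and the `toy_user` ratings are hypothetical, introduced only for illustration; they are not part of Surprise or of the notebook's data.

```python
def precision_recall_one_user(pairs, k, threshold):
    """Precision@k and recall@k for one user.

    pairs: list of (estimated rating, true rating) tuples (hypothetical toy input).
    """
    # Rank the items by estimated rating, best first, and keep the top k.
    ranked = sorted(pairs, key=lambda p: p[0], reverse=True)
    top_k = ranked[:k]

    # Relevant items: true rating at or above the threshold (over all items).
    n_rel = sum(true_r >= threshold for _, true_r in ranked)
    # Recommended items: estimated rating at or above the threshold (top k only).
    n_rec_k = sum(est >= threshold for est, _ in top_k)
    # Items in the top k that are both recommended and relevant.
    n_hit = sum(est >= threshold and true_r >= threshold for est, true_r in top_k)

    precision = n_hit / n_rec_k if n_rec_k else 0
    recall = n_hit / n_rel if n_rel else 0
    return precision, recall


# Made-up (estimated, true) ratings for a single user.
toy_user = [(4.8, 5), (4.5, 3), (4.2, 4), (3.9, 4), (3.1, 5), (2.5, 2)]

print(precision_recall_one_user(toy_user, k=1, threshold=4))  # precision 1.0, recall 0.25
print(precision_recall_one_user(toy_user, k=3, threshold=4))  # precision 2/3, recall 0.5
print(precision_recall_one_user(toy_user, k=3, threshold=3))  # precision 1.0, recall 0.6
```

For this toy user, raising k from 3 to 5 with threshold 4 changes nothing: the fourth and fifth items have estimates below the threshold, so they are never counted as recommended.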


In [7]:
precisions, recalls = precision_recall_at_k(predictions, k=5, threshold=4)

# Precision and recall can then be averaged over all users
print("precision = " + str(sum(prec for prec in precisions.values()) / len(precisions)))
print("recall = " + str(sum(rec for rec in recalls.values()) / len(recalls)))


precision = 0.6386765746638363
recall = 0.21578671404964997


Notice that if we reduce the threshold we accept more recommendations as valid, so in this run both the precision and the recall increase:

In [8]:
precisions, recalls = precision_recall_at_k(predictions, k=5, threshold=3)

# Precision and recall can then be averaged over all users
print("precision = " + str(sum(prec for prec in precisions.values()) / len(precisions)))
print("recall = " + str(sum(rec for rec in recalls.values()) / len(recalls)))


precision = 0.9149681528662399
recall = 0.39037634083359685


If we lower k to get fewer recommendations, the precision stays similar but the recall drops:

In [9]:
precisions, recalls = precision_recall_at_k(predictions, k=1, threshold=4)

# Precision and recall can then be averaged over all users
print("precision = " + str(sum(prec for prec in precisions.values()) / len(precisions)))
print("recall = " + str(sum(rec for rec in recalls.values()) / len(recalls)))

precision = 0.6645435244161358
recall = 0.0796813491893602


On the other hand, if we increase k, the recall increases:

In [10]:
precisions, recalls = precision_recall_at_k(predictions, k=10, threshold=4)

# Precision and recall can then be averaged over all users
print("precision = " + str(sum(prec for prec in precisions.values()) / len(precisions)))
print("recall = " + str(sum(rec for rec in recalls.values()) / len(recalls)))

precision = 0.629616991878138
recall = 0.27343242788926453
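
The cells above repeat the same averaging code for each value of k. A minimal sketch of how that comparison could be wrapped in a loop follows; `average_metric` and `sweep_k` are hypothetical helper names, and `metric_fn` is assumed to have the same signature and return value as `precision_recall_at_k`.

```python
from statistics import mean


def average_metric(per_user):
    """Mean of a {user_id: value} dict; 0.0 when the dict is empty."""
    values = list(per_user.values())
    return mean(values) if values else 0.0


def sweep_k(predictions, metric_fn, ks, threshold):
    """Return one (k, mean precision, mean recall) row per value in ks.

    metric_fn is assumed to behave like precision_recall_at_k: it takes
    (predictions, k=..., threshold=...) and returns two per-user dicts.
    """
    rows = []
    for k in ks:
        precisions, recalls = metric_fn(predictions, k=k, threshold=threshold)
        rows.append((k, average_metric(precisions), average_metric(recalls)))
    return rows


# Possible usage with the objects defined in this notebook:
# for k, p, r in sweep_k(predictions, precision_recall_at_k, ks=[1, 5, 10], threshold=4):
#     print("k =", k, "precision =", p, "recall =", r)
```

Passing the metric function as an argument keeps the helper independent of Surprise, so it can be checked with a small stand-in function before running it on the real predictions.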
