# Metrics

Here are common metrics that have been designed or adapted specifically for recommendation systems.

In [1]:
import numpy as np
import pandas as pd

from IPython.display import HTML
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split

from surprise.prediction_algorithms.slope_one import SlopeOne
from surprise.model_selection import cross_validate
from surprise.dataset import Dataset
from surprise.reader import Reader
from surprise.prediction_algorithms.knns import KNNBasic

**Sources**

- Towards Data Science article <a href="https://towardsdatascience.com/evaluation-metrics-for-recommendation-systems-an-overview-71290690ecba">Evaluation Metrics for Recommendation Systems — An Overview</a>.

## Creating an example

For this page we need a task and some models to use as examples.

### Task

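The following cell generates a task that we will use as an example. Each record links a user/item pair to a rating: `<user/item> <-> rating`. In the resulting frame, users are stored in the `object` column and the rating is the binary `relevant` column.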

All the sources I checked describe how to evaluate models when the output is binary, i.e. each item is either "relevant" or "non-relevant". In practice, user/item pairs are often associated with non-binary values (ratings, or even preferences expressed as money spent), and I have worked on such tasks myself. For simplicity, though, let's start with the classical binary case.
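One common way to reduce graded feedback to the binary case is to threshold it. The sketch below assumes a 1 to 5 rating scale and treats ratings of 4 or higher as relevant; both the data and the threshold are illustrative assumptions, not part of this notebook's task.

```python
import pandas as pd

# Hypothetical explicit ratings on a 1-5 scale (illustrative data only).
ratings = pd.DataFrame({
    "object": [0, 0, 0, 1, 1, 1],
    "item":   [10, 11, 12, 10, 11, 13],
    "rating": [5, 2, 4, 1, 3, 5],
})

# Assumed threshold: ratings of 4 or higher count as "relevant".
RELEVANCE_THRESHOLD = 4
ratings["relevant"] = (ratings["rating"] >= RELEVANCE_THRESHOLD).astype(int)

print(ratings)
```

The choice of threshold changes which items count as relevant, so any metric computed afterwards depends on it as well.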

In [2]:
r_width = 10
r_height = 30
np.random.seed(10)

R, c = make_blobs(
    n_samples=r_height,
    n_features=r_width,
    centers=3,
    random_state=10,
    cluster_std=1
)
R = np.round((R-R.min())/(R.max()-R.min())).astype(int)

# generate random (row, column) positions of R intended to be left empty
combination_counts = 20
nan_combinations = np.concatenate(
    [
        np.random.randint(0, R.shape[0], [combination_counts,1]),
        np.random.randint(0, R.shape[1], [combination_counts,1])
    ],
    axis=1
)

R_frame = pd.Series(
    R.ravel(),
    index = pd.MultiIndex.from_tuples(
            [
                (j,i) 
                for j in np.arange(R.shape[1]) 
                for i in np.arange(R.shape[0])
            ],
            names = ["object", "item"]
    ),
    name = "relevant"
).reset_index()

R_frame.sample(10)

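|     | object | item | relevant |
|-----|--------|------|----------|
| 148 | 4 | 28 | 0 |
| 147 | 4 | 27 | 1 |
| 154 | 5 | 4 | 1 |
| 8 | 0 | 8 | 1 |
| 105 | 3 | 15 | 1 |
| 24 | 0 | 24 | 0 |
| 125 | 4 | 5 | 0 |
| 276 | 9 | 6 | 0 |
| 130 | 4 | 10 | 1 |
| 226 | 7 | 16 | 0 |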


Splitting the observations into train and test sets is crucial here. Prediction methods tend to reproduce the values they have already seen, so evaluating them on the training pairs would give near-perfect scores. Holding out a test set introduces realistic errors, and we need those errors to understand how the metrics under consideration behave.
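To see why evaluating on the training pairs would be misleading, here is a toy sketch. It uses a deliberately naive lookup "model" (hypothetical, not one of the `surprise` algorithms used later) on randomly generated data: it scores perfectly on the pairs it has memorized and degrades to a constant guess on held-out pairs.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Illustrative random binary relevance table (not the notebook's data).
frame = pd.DataFrame({
    "object": np.repeat(np.arange(5), 20),
    "item": np.tile(np.arange(20), 5),
    "relevant": rng.integers(0, 2, size=100),
})

train, test = train_test_split(
    frame, stratify=frame["object"], test_size=0.25, random_state=10
)

# A deliberately naive "model": it memorizes every (object, item) pair it saw
# and falls back to the training mean for unseen pairs.
memory = train.set_index(["object", "item"])["relevant"].to_dict()
fallback = train["relevant"].mean()


def predict(obj, item):
    return memory.get((obj, item), fallback)


def accuracy(df):
    preds = np.array([predict(o, i) for o, i in zip(df["object"], df["item"])])
    return float(np.mean(np.round(preds) == df["relevant"].to_numpy()))


print("train accuracy:", accuracy(train))  # perfect: pure memorization
print("test accuracy:", accuracy(test))    # every test pair is unseen
```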

In [3]:
R_train, R_test = train_test_split(
    R_frame, stratify=R_frame[["object"]], 
    test_size=0.25,
    random_state = 10
)

### Models

To compute metrics, we need predictions from some models. Here I use a basic approach from the `surprise` library, `KNNBasic`, with two different neighborhood sizes (`k=25` and `k=5`). We will then compare the metrics of the two models and see why one approach works better than the other.
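To build intuition for what the `k` parameter changes, here is a minimal user-based k-nearest-neighbors predictor in plain NumPy. It is a simplified sketch, not `surprise`'s implementation: it uses cosine similarity (whereas `KNNBasic` defaults to a mean squared difference similarity), and the helper name `knn_user_predict` and the toy matrix are hypothetical.

```python
import numpy as np


def knn_user_predict(R, mask, user, item, k):
    """Predict R[user, item] from the k most similar users who rated `item`.

    R    : (n_users, n_items) rating matrix
    mask : boolean matrix, True where a rating is known (training data)
    """
    # Candidate neighbors: other users with a known rating for `item`.
    candidates = [v for v in range(R.shape[0]) if v != user and mask[v, item]]
    if not candidates:
        return float(R[mask].mean())  # fallback: global mean of known ratings

    sims = []
    for v in candidates:
        common = mask[user] & mask[v]  # items both users rated
        if not common.any():
            sims.append(0.0)
            continue
        a, b = R[user, common], R[v, common]
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        sims.append(float(a @ b / denom) if denom > 0 else 0.0)

    sims = np.array(sims)
    top = np.argsort(sims)[::-1][:k]  # the k most similar candidates
    weights = sims[top]
    ratings = np.array([R[candidates[t], item] for t in top], dtype=float)
    if weights.sum() <= 0:
        return float(ratings.mean())
    # Similarity-weighted average of the neighbors' ratings.
    return float(weights @ ratings / weights.sum())


# Toy data: predict user 1's (hidden) rating of item 3.
R = np.array([
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [1, 0, 1, 0],
])
mask = np.ones_like(R, dtype=bool)
mask[1, 3] = False

print(knn_user_predict(R, mask, user=1, item=3, k=1))
print(knn_user_predict(R, mask, user=1, item=3, k=2))
```

With `k=1` only the most similar user (user 0) votes, giving 1.0; with `k=2` user 3 also contributes with half the weight, pulling the estimate down to about 0.67. A smaller `k` makes predictions depend on fewer, more similar neighbors.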

In [7]:
reader = Reader(rating_scale=(0,1))
surp_dataset = Dataset.load_from_df(
    R_train[["object", "item", 'relevant']], 
    reader
)
my_data_set = surp_dataset.build_full_trainset()

model = KNNBasic(k=25,verbose=False)
model = model.fit(my_data_set)
R_test["model1_results"] = R_test[["object", "item"]].apply(
    lambda row: model.predict(
        row["object"], row["item"]
    ).est, 
    axis = 1
)

model = KNNBasic(k=5, verbose=False)
model.fit(my_data_set)
R_test["model2_results"] = R_test[["object", "item"]].apply(
    lambda row: model.predict(
        row["object"], row["item"]
    ).est, 
    axis = 1
)

## recall@k

`recall@k` measures what fraction of all relevant items appear among the top $k$ recommendations, where $k$ is the number of recommendations generated for a user. More formally:

$$\text{recall@}k = \frac{\text{number of relevant items in the top } k}{\text{total number of relevant items}}$$
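Translated directly into code, the formula needs only the relevance flags of a user's items, ordered by predicted score. A minimal sketch; the function name and the choice to return NaN for a user without any relevant items are assumptions:

```python
def recall_at_k(ranked_relevance, k):
    """recall@k for one user.

    ranked_relevance: 0/1 relevance flags of the user's items, ordered by
    predicted score (highest first).
    """
    total_relevant = sum(ranked_relevance)
    if total_relevant == 0:
        return float("nan")  # undefined when the user has no relevant items
    return sum(ranked_relevance[:k]) / total_relevant


print(recall_at_k([1, 0, 1, 1, 0], 2))  # 1 of 3 relevant items in the top 2
print(recall_at_k([1, 0, 1, 1, 0], 4))  # all 3 relevant items in the top 4
```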

In [11]:
R_test.groupby("object").apply(
    lambda x: x.sort_values(
        "model1_results", 
        ascending=False
    ),
    include_groups=False
).loc[1]

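|    | item | relevant | model1_results | model2_results |
|----|------|----------|----------------|----------------|
| 58 | 28 | 1 | 0.71035 | 0.617911 |
| 48 | 18 | 1 | 0.657161 | 0.597356 |
| 43 | 13 | 1 | 0.574953 | 0.77554 |
| 34 | 4 | 1 | 0.542741 | 0.813607 |
| 52 | 22 | 0 | 0.333579 | 0.390165 |
| 31 | 1 | 1 | 0.332373 | 0.440908 |
| 42 | 12 | 0 | 0.298503 | 0.402644 |
| 55 | 25 | 0 | 0.208337 | 0.186393 |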
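As a worked example, the sketch below copies the test rows for object 1 from the table above and computes `recall@k` for both models. The helper name `recall_at_k_df` is hypothetical; it ranks a user's rows by a score column and divides the relevant items in the top $k$ by all relevant items.

```python
import pandas as pd

# Test-set rows for object 1, copied from the output above.
obj1 = pd.DataFrame({
    "item": [28, 18, 13, 4, 22, 1, 12, 25],
    "relevant": [1, 1, 1, 1, 0, 1, 0, 0],
    "model1_results": [0.71035, 0.657161, 0.574953, 0.542741,
                       0.333579, 0.332373, 0.298503, 0.208337],
    "model2_results": [0.617911, 0.597356, 0.77554, 0.813607,
                       0.390165, 0.440908, 0.402644, 0.186393],
})


def recall_at_k_df(df, score_col, k):
    """Share of the user's relevant items that land in the top k by `score_col`."""
    ranked = df.sort_values(score_col, ascending=False)
    return ranked["relevant"].head(k).sum() / ranked["relevant"].sum()


for k in (3, 5):
    print(
        f"k={k}: model1 recall={recall_at_k_df(obj1, 'model1_results', k):.2f}, "
        f"model2 recall={recall_at_k_df(obj1, 'model2_results', k):.2f}"
    )
# k=3: model1 recall=0.60, model2 recall=0.60
# k=5: model1 recall=0.80, model2 recall=1.00
```

Object 1 has five relevant items in the test set. At $k=3$ both models score 0.6, but at $k=5$ the second model places all five relevant items at the top, while the first ranks item 22 (not relevant) fifth. This is a single user; a full evaluation would apply the same function per object (for example with `groupby("object")`) and average the results.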
