# UBCF & IBCF - MultiEval Example

This notebook is adapted from the evaluation example that ships with the LKPY (LensKit for Python) package. It is largely a copy of that example, modified where necessary.

## Setup

We first need to import our libraries.

In [None]:
import sys
sys.path.insert(0,'C:\\Users\\Jacob\\Documents\\GitHub\\lenskit_confidence')

In [None]:
from lenskit.batch import MultiEval
from lenskit.crossfold import partition_users, SampleN, partition_netflix
from lenskit.algorithms import basic, als, item_knn, user_knn
from lenskit.datasets import MovieLens, Netflix
from lenskit import topn, util #, metrics
from lenskit.metrics import predict
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

Progress bars are useful:

In [None]:
from tqdm.notebook import tqdm_notebook as tqdm
tqdm.pandas()

It takes a little while to run things, and can get kinda quiet in here. Let's set up logging so we can see the logging output in the notebook's message stream:

In [None]:
util.log_to_notebook()

Then set up the data access.

In [None]:
#mlsmall = MovieLens('../data/ml-latest-small')
#mlsmall = MovieLens('../data/ml-1m')
#mlsmall = MovieLens('../data/ml-10m')
#mlsmall = MovieLens('../data/ml-20m')
data = Netflix('../data/netflix')

## Experiment

We're going to run our evaluation and store its output in the `my-eval` directory, generating rating predictions and 100-item recommendation lists:

In [None]:
eval = MultiEval('my-eval', predict = True, recommend = 100, eval_n_jobs = 4)

We're going to use a 5-fold cross-validation setup.  We save the data into a list in memory so we have access to the test data later.  In a larger experiment, you might write the partitions to disk and pass the file names to `add_datasets`.

In [None]:
pairs = list(partition_netflix(data))
#pairs = list(partition_users(mlsmall.ratings, 5, SampleN(5)))
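
To make the commented-out `partition_users(..., 5, SampleN(5))` alternative more concrete, here is a minimal pure-Python sketch of user-based partitioning: users are split into disjoint folds, and a fixed number of ratings is held out for each test user. The function name `partition_users_sketch` and the tuple layout are assumptions made for illustration. This is not LensKit's implementation, and it does not describe what `partition_netflix` does.

```python
import random

def partition_users_sketch(ratings, n_folds, n_test, seed=0):
    """Yield (train, test) lists of (user, item, rating) tuples.

    Users are split into n_folds disjoint groups; for each user in the
    current group, up to n_test ratings are held out as test data.
    """
    rng = random.Random(seed)
    users = sorted({user for user, _, _ in ratings})
    rng.shuffle(users)
    folds = [users[i::n_folds] for i in range(n_folds)]
    for fold_users in folds:
        fold_set = set(fold_users)
        by_user = {}
        train = []
        for row in ratings:
            if row[0] in fold_set:
                by_user.setdefault(row[0], []).append(row)
            else:
                train.append(row)  # non-test users go entirely to training
        test = []
        for rows in by_user.values():
            held_out = rng.sample(rows, min(n_test, len(rows)))
            held_set = set(held_out)
            test.extend(held_out)
            # the rest of a test user's ratings stay in the training data
            train.extend(r for r in rows if r not in held_set)
        yield train, test

# Toy data: 10 users, 4 ratings each.
toy = [(u, i, float((u + i) % 5 + 1)) for u in range(10) for i in range(4)]
folds = list(partition_users_sketch(toy, n_folds=5, n_test=2))
print(len(folds), [len(test) for _, test in folds])
```

Each user appears in the test data of exactly one fold, which is what later allows the test frames to be concatenated into a single truth table.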


In [None]:
eval.add_datasets(pairs, name = 'Netflix')

In [None]:
nhbr_range = [10, 25, 50, 75] #, 50] #, 75] #, 200] #, 50, 75, 100]

User-based collaborative filtering (UBCF), with weighted-average and plain-average aggregation:

In [None]:
eval.add_algorithms([user_knn.UserUser(nnbrs = f, aggregate = 'weighted-average') for f in nhbr_range], 
                    attrs = ['nnbrs'], name = 'UserKNN-Weighted')

eval.add_algorithms([user_knn.UserUser(nnbrs = f, aggregate = 'average') for f in nhbr_range], 
                    attrs = ['nnbrs'], name = 'UserKNN-Average')

Item-based collaborative filtering (IBCF), with the same two aggregation settings:

In [None]:
eval.add_algorithms([item_knn.ItemItem(nnbrs = f, aggregate = 'weighted-average') for f in nhbr_range], 
                    attrs = ['nnbrs'], name = 'ItemKNN-Weighted')

eval.add_algorithms([item_knn.ItemItem(nnbrs = f, aggregate = 'average') for f in nhbr_range], 
                    attrs = ['nnbrs'], name = 'ItemKNN-Average')
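
The two `aggregate` settings differ in how neighbor ratings are combined into a prediction. As a rough sketch, assuming the option names mean what they suggest, `'weighted-average'` weights each neighbor's rating by its similarity, while `'average'` treats all selected neighbors equally. The helper names below are hypothetical, and the package's own implementation may additionally normalize ratings before aggregating.

```python
def weighted_average(neighbors):
    """Similarity-weighted mean of (similarity, rating) pairs."""
    total_weight = sum(abs(sim) for sim, _ in neighbors)
    if total_weight == 0:
        return None  # no usable neighbors, so no prediction
    return sum(sim * rating for sim, rating in neighbors) / total_weight

def plain_average(neighbors):
    """Unweighted mean of the neighbors' ratings."""
    if not neighbors:
        return None
    return sum(rating for _, rating in neighbors) / len(neighbors)

# Three neighbors: a close one who rated high, two weaker ones who rated lower.
neighbors = [(0.9, 5.0), (0.5, 3.0), (0.1, 1.0)]
print(weighted_average(neighbors), plain_average(neighbors))
```

In this toy case the weighted prediction is pulled toward the most similar neighbor's rating, while the plain average sits in the middle.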

And finally, we will run the experiment!

In [None]:
eval.run(progress = tqdm)

## Analysis

Now that the experiment is run, we can read its outputs.

First the run metadata:

In [None]:
runs = pd.read_csv('my-eval/runs.csv')
runs.set_index('RunId', inplace = True)
runs.head()

This describes each run - a data set, partition, and algorithm combination.  To evaluate, we need to get the actual recommendations, and combine them with this:

In [None]:
recs = pd.read_parquet('my-eval/recommendations.parquet')
#del recs['RunId']
recs.head()

In [None]:
recs

Next, load the rating predictions:

In [None]:
preds = pd.read_parquet('my-eval/predictions.parquet')
preds

We're going to compute per-(run,user) evaluations of the recommendations *before* combining with metadata. 

In order to evaluate the recommendation list, we need to build a combined set of truth data. Since this is a disjoint partition of users over a single data set, we can just concatenate the individual test frames:

In [None]:
truth = pd.concat((p.test for p in pairs), ignore_index = True)
truth

Now we can set up an analysis and compute the results.

In [None]:
rla = topn.RecListAnalysis()
rla.add_metric(topn.ndcg) # precision, recall, recip_rank, dcg, ndcg
rla.add_metric(topn.precision)
#rla.add_metric(predict.rmse)
raw_ndcg = rla.compute(recs, truth)
raw_ndcg.head()
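
For intuition about the two metrics, here is a minimal sketch that computes precision and nDCG for a single user's list. It uses the textbook `log2(rank + 1)` discount, which may differ in detail from the discount LensKit applies, and the function names are illustrative rather than part of the package.

```python
import math

def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that are in the test set."""
    top = recommended[:k]
    if not top:
        return 0.0
    return sum(1 for item in top if item in relevant) / len(top)

def dcg(gains):
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))

def ndcg_at_k(recommended, truth_ratings, k):
    """DCG of the list divided by the DCG of the ideal ordering."""
    gains = [truth_ratings.get(item, 0.0) for item in recommended[:k]]
    ideal = sorted(truth_ratings.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0

user_truth = {'a': 5.0, 'b': 3.0}   # held-out test ratings for one user
user_recs = ['c', 'a', 'b', 'd']    # ranked recommendation list

print(precision_at_k(user_recs, user_truth, 4))
print(ndcg_at_k(user_recs, user_truth, 4))
```

Both test items are recommended, but the unrated item in first place pushes them down the list, so nDCG falls below 1 even though every relevant item was found.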

Next, we need to combine this with our run data, so that we know what algorithms and configurations we are evaluating:

In [None]:
### FOR NEIGHBORHOOD-BASED METHODS ONLY ###
ndcg = raw_ndcg.join(runs[['name', 'nnbrs']], on = 'RunId')
ndcg.head()

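We can compute the overall average performance for each algorithm configuration. The `fillna(0)` call keeps runs without an `nnbrs` value (for example a non-neighborhood baseline such as Popular, which is not part of this experiment) from being dropped by the group-by: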

In [None]:
### FOR NEIGHBORHOOD-BASED METHODS ONLY ###
# Select several columns with a list; the bare-tuple form is not supported by recent pandas
ndcg.fillna(0).groupby(['name', 'nnbrs'])[['ndcg', 'precision']].mean()
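
The effect of `fillna(0)` on a group-by can be seen on a toy frame: by default, pandas drops rows whose group key is NaN. The frame and its values below are invented for illustration only.

```python
import numpy as np
import pandas as pd

# One neighborhood-based configuration and one baseline with no nnbrs value.
toy = pd.DataFrame({
    'name': ['ItemKNN', 'ItemKNN', 'Popular'],
    'nnbrs': [10.0, 10.0, np.nan],
    'ndcg': [0.2, 0.4, 0.1],
})

without_fill = toy.groupby(['name', 'nnbrs'])['ndcg'].mean()
with_fill = toy.fillna(0).groupby(['name', 'nnbrs'])['ndcg'].mean()

print(len(without_fill))  # the Popular row is dropped
print(len(with_fill))     # Popular is kept under nnbrs == 0
```

Note that `fillna(0)` fills every column, so a missing metric value would also become 0 and lower the mean; filling only the key column is a more conservative choice when that matters.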

Now, we can plot this:

In [None]:
### FOR NEIGHBORHOOD-BASED METHODS ONLY ###
scores = ndcg.groupby(['name', 'nnbrs'])['ndcg'].mean().reset_index()
#pop_score = ndcg[ndcg['AlgoClass'] == 'Popular']['ndcg'].mean()
#plt.axhline(pop_score, color='grey', linestyle='--', label='Popular')
# Loop variable named `group` so it does not overwrite the `data` dataset object
for algo, group in scores.groupby('name'):
    plt.plot(group['nnbrs'], group['ndcg'], label=algo)
    
#plt.yticks(np.arange(0.002, 0.011, 0.001))
plt.legend()
plt.xlabel('nnbrs')
plt.ylabel('nDCG')

In [None]:
### FOR NEIGHBORHOOD-BASED METHODS ONLY ###
scores = ndcg.groupby(['name', 'nnbrs'])['precision'].mean().reset_index()
#pop_score = ndcg[ndcg['AlgoClass'] == 'Popular']['ndcg'].mean()
#plt.axhline(pop_score, color='grey', linestyle='--', label='Popular')
# Loop variable named `group` so it does not overwrite the `data` dataset object
for algo, group in scores.groupby('name'):
    plt.plot(group['nnbrs'], group['precision'], label=algo)
    
#plt.yticks(np.arange(0.0015, 0.006, 0.0005))
plt.legend()
plt.xlabel('nnbrs')
plt.ylabel('Precision')

In [None]:
#truth # columns: user, item, rating, timestamp
#preds # columns: RunId, user, item, rating, prediction

### FOR NEIGHBORHOOD-BASED METHODS ONLY ###
pred_acc = preds.join(runs[['name', 'nnbrs']], on = 'RunId')
pred_acc.head()


#from lenskit.metrics.predict import rmse
#rmse(preds['prediction'], preds['rating'])

In [None]:
#pred_acc.loc[pred_acc['prediction'] > 5,'prediction'] = 5
#pred_acc.loc[pred_acc['prediction'] < 1,'prediction'] = 1

pred_acc['se'] = (pred_acc['rating'] - pred_acc['prediction'])**2

In [None]:
np.sqrt(pred_acc.groupby(['name', 'nnbrs'])['se'].mean())
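
The cell above computes a pooled RMSE: squared errors are averaged over all predictions of a configuration before taking the square root. The same calculation in plain Python, on invented rows, looks roughly like this:

```python
import math
from collections import defaultdict

# (name, nnbrs, rating, prediction); values are made up for illustration
rows = [
    ('UserKNN-Weighted', 10, 4.0, 3.5),
    ('UserKNN-Weighted', 10, 2.0, 3.0),
    ('ItemKNN-Weighted', 10, 5.0, 5.0),
    ('ItemKNN-Weighted', 10, 1.0, 3.0),
]

# Collect squared errors per (name, nnbrs) configuration.
squared_errors = defaultdict(list)
for name, nnbrs, rating, prediction in rows:
    squared_errors[(name, nnbrs)].append((rating - prediction) ** 2)

# RMSE is the square root of the mean squared error in each group.
rmse_by_config = {
    key: math.sqrt(sum(errors) / len(errors))
    for key, errors in squared_errors.items()
}
print(rmse_by_config)
```

Pooling gives users with many test ratings more influence on the result; averaging per-user RMSE values instead would weight every user equally.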

In [None]:
### FOR NEIGHBORHOOD-BASED METHODS ONLY ###
knn_pred_scores = np.sqrt(pred_acc.groupby(['name', 'nnbrs'])['se'].mean()).reset_index()
knn_pred_scores.head()
#pop_score = ndcg[ndcg['AlgoClass'] == 'Popular']['ndcg'].mean()
#plt.axhline(pop_score, color='grey', linestyle='--', label='Popular')
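# After the square root, the 'se' column holds the RMSE of each configuration.
# Loop variable named `group` so it does not overwrite the `data` dataset object.
for algo, group in knn_pred_scores.groupby('name'):
    plt.plot(group['nnbrs'], group['se'], label=algo)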
plt.legend()
plt.xlabel('nnbrs')
plt.ylabel('RMSE')