# UBCF & IBCF - Baseline Model Runs

Much of this structure and organization is borrowed from the LensKit sample evaluation walkthrough.

## Setup

In [1]:
import sys
# Put the locally installed lenskit_confidence module on the import path.
sys.path.insert(0,'C:\\Users\\Jacob\\Documents\\GitHub\\lenskit_confidence') # machine-specific path; adjust for your own checkout

In [2]:
from lenskit.batch import MultiEval
from lenskit.crossfold import partition_users, SampleN
from lenskit.algorithms import item_knn, user_knn
from lenskit.datasets import MovieLens
from lenskit import topn, util 
from lenskit.metrics import predict
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

Setting up a progress bar...

In [3]:
from tqdm.notebook import tqdm_notebook as tqdm
tqdm.pandas()



Set up logging to the notebook...

In [4]:
util.log_to_notebook()

[   INFO] lenskit.util.log notebook logging configured


Pick a dataset to run...

In [5]:
data = MovieLens('../data/ml-1m')
#data = MovieLens('../data/ml-10m')
#data = MovieLens('../data/ml-20m')
#data = MovieLens('../data/jester')

## Experiment

Run the experiment and store its output in the `my-eval` directory.

We're not producing predictions; we generate 20-item recommendation lists and set up 4 workers.

In [6]:
eval = MultiEval('my-eval', predict = False, recommend = 20, eval_n_jobs = 4) 

We'll use 5-fold CV, partitioning users and putting 5 ratings per user in the test set.  

In [7]:
pairs = list(partition_users(data.ratings, 5, SampleN(5)))

[   INFO] lenskit.crossfold partitioning 1000209 rows for 6040 users into 5 partitions
[   INFO] lenskit.crossfold fold 0: selecting test ratings
[   INFO] lenskit.crossfold fold 0: partitioning training data
[   INFO] lenskit.crossfold fold 1: selecting test ratings
[   INFO] lenskit.crossfold fold 1: partitioning training data
[   INFO] lenskit.crossfold fold 2: selecting test ratings
[   INFO] lenskit.crossfold fold 2: partitioning training data
[   INFO] lenskit.crossfold fold 3: selecting test ratings
[   INFO] lenskit.crossfold fold 3: partitioning training data
[   INFO] lenskit.crossfold fold 4: selecting test ratings
[   INFO] lenskit.crossfold fold 4: partitioning training data
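To make the splitting scheme concrete, here is a minimal pure-Python sketch of user-partitioned cross-validation. It is an illustration under stated assumptions, not LensKit's implementation: `partition_users_sketch`, its parameters, and the toy data are all hypothetical names introduced here.

```python
import random

def partition_users_sketch(ratings, n_folds, n_test, seed=42):
    """Split (user, item, rating) tuples into n_folds (train, test) pairs.

    Hypothetical helper for illustration; not LensKit's implementation.
    """
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    rng.shuffle(users)
    # Deal the users round-robin into disjoint folds.
    folds = [set(users[i::n_folds]) for i in range(n_folds)]

    pairs = []
    for test_users in folds:
        by_user = {}
        for row in ratings:
            if row[0] in test_users:
                by_user.setdefault(row[0], []).append(row)
        # Hold out up to n_test ratings for each test user.
        test = []
        for u in sorted(by_user):
            test.extend(rng.sample(by_user[u], min(n_test, len(by_user[u]))))
        held_out = set(test)
        # Everything that was not held out is training data for this fold.
        train = [row for row in ratings if row not in held_out]
        pairs.append((train, test))
    return pairs

# Toy data: 10 users who each rated 8 items.
toy_ratings = [(u, i, float((u + i) % 5 + 1)) for u in range(10) for i in range(8)]
toy_pairs = partition_users_sketch(toy_ratings, n_folds=5, n_test=5)
for fold, (train, test) in enumerate(toy_pairs):
    print(fold, len(train), len(test))
```

The same arithmetic is visible in this notebook's logs: 6040 users over 5 folds gives 1208 test users per fold, 1208 x 5 = 6040 held-out ratings per fold, and 1000209 - 6040 = 994169 training ratings.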


Add the dataset to MultiEval with `add_datasets`.

In [8]:
eval.add_datasets(pairs, name = 'ML1M') # give the added dataset a name

In [9]:
nhbr_range = [25] # We'll use just K=25 for our sample evaluation 

Add the algorithms to MultiEval with `add_algorithms`; the two baselines, UBCF and IBCF, are added below.

UBCF

In [10]:
eval.add_algorithms([user_knn.UserUser(nnbrs = f, aggregate = 'average') for f in nhbr_range], 
                    attrs = ['nnbrs'], name = 'UserKNN-Average')

IBCF

In [11]:
eval.add_algorithms([item_knn.ItemItem(nnbrs = f, aggregate = 'average') for f in nhbr_range], 
                    attrs = ['nnbrs'], name = 'ItemKNN-Average')
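For intuition, the sketch below shows one way an item-based neighborhood score with plain averaging could be computed. It is an assumption-laden illustration: the notebook does not show what `aggregate = 'average'` does inside the local `lenskit_confidence` module, so the cosine similarity, the unweighted mean, and the helper names `cosine` and `item_knn_average` are all hypothetical choices made here.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def item_knn_average(ratings, user, item, k):
    """Score `item` for `user` as the plain mean of the user's ratings on
    the k most similar items they have rated (hypothetical helper).

    ratings maps item -> {user: rating}.
    """
    target = ratings[item]
    sims = []
    for other, vec in ratings.items():
        if other != item and user in vec:
            s = cosine(target, vec)
            if s > 0:
                sims.append((s, vec[user]))
    if not sims:
        return None  # no usable neighbors, so no score
    sims.sort(key=lambda pair: pair[0], reverse=True)
    top = sims[:k]
    # Assumed meaning of "average": an unweighted mean over the neighbors.
    return sum(r for _, r in top) / len(top)

toy = {
    "A": {"u1": 5.0, "u2": 4.0},
    "B": {"u1": 4.0, "u2": 5.0, "u3": 4.0},
    "C": {"u2": 1.0, "u3": 2.0},
}
print(item_knn_average(toy, "u3", "A", k=1))
print(item_knn_average(toy, "u3", "A", k=2))
```

A user-based variant would swap the roles of users and items: find the k most similar users who rated the target item and average their ratings.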

Run the experiment...

In [12]:
eval.run(progress = tqdm)


[   INFO] lenskit.batch._multi starting run 1: UserUser(nnbrs=25, min_sim=0) on ML1M:1
[   INFO] lenskit.batch._multi adapting UserUser(nnbrs=25, min_sim=0) into a recommender
[   INFO] lenskit.batch._multi training algorithm UserUser(nnbrs=25, min_sim=0) on 994169 ratings
[   INFO] lenskit.algorithms.basic trained unrated candidate selector for 994169 ratings
[   INFO] lenskit.batch._multi trained algorithm UserUser(nnbrs=25, min_sim=0) in 6.29s
[   INFO] lenskit.batch._multi generating recommendations for 1208 users for TopN/UserUser(nnbrs=25, min_sim=0)
[   INFO] lenskit.sharing.shm serialized TopN/UserUser(nnbrs=25, min_sim=0) to 1112 pickle bytes with 13 buffers of 28104104 bytes
[   INFO] lenskit.util.parallel setting up ProcessPoolExecutor w/ 4 workers
[   INFO] lenskit.batch._recommend recommending with TopN/UserUser(nnbrs=25, min_sim=0) for 1208 users (n_jobs=4)
[   INFO] lenskit.batch._recommend recommended for 1208 users in 25.61s
[   INFO] lenskit.batch._multi generated rec

[   INFO] lenskit.algorithms.item_knn [ 302ms] computed means for 3706 items
[   INFO] lenskit.algorithms.item_knn [ 431ms] normalized rating matrix columns
[   INFO] lenskit.algorithms.item_knn [ 435ms] computing similarity matrix
[   INFO] lenskit.algorithms.item_knn [ 507ms] splitting 3706 items (992339 ratings) into 4 blocks
[   INFO] lenskit.algorithms.item_knn [4.57s] computed 7917458 similarities for 3706 items in 4 blocks
[   INFO] lenskit.algorithms.item_knn [4.67s] sorting similarity matrix with 7917458 entries
[   INFO] lenskit.algorithms.item_knn [4.93s] got neighborhoods for 3560 of 3706 items
[   INFO] lenskit.algorithms.item_knn [4.93s] computed 7917458 neighbor pairs
[   INFO] lenskit.algorithms.item_knn [5.40s] transposed matrix for optimization
[   INFO] lenskit.algorithms.basic trained unrated candidate selector for 994169 ratings
[   INFO] lenskit.batch._multi trained algorithm ItemItem(nnbrs=25, msize=None) in 5.58s
[   INFO] lenskit.batch._multi generating recomme

[   INFO] lenskit.batch._multi finished run 10: ItemItem(nnbrs=25, msize=None) on ML1M:5



## Analyzing Results

We need to read in experiment outputs.

First the run metadata:

In [13]:
runs = pd.read_csv('my-eval/runs.csv')
runs.set_index('RunId', inplace = True)
runs.head() # a quick visual check

RunId,DataSet,Partition,AlgoClass,AlgoStr,name,nnbrs,TrainTime,PredTime,RecTime
1,ML1M,1,UserUser,"UserUser(nnbrs=25, min_sim=0)",UserKNN-Average,25,6.288795,,26.339011
2,ML1M,1,ItemItem,"ItemItem(nnbrs=25, msize=None)",ItemKNN-Average,25,12.135656,,26.231404
3,ML1M,2,UserUser,"UserUser(nnbrs=25, min_sim=0)",UserKNN-Average,25,0.479742,,28.160372
4,ML1M,2,ItemItem,"ItemItem(nnbrs=25, msize=None)",ItemKNN-Average,25,5.410119,,26.119443
5,ML1M,3,UserUser,"UserUser(nnbrs=25, min_sim=0)",UserKNN-Average,25,0.45043,,26.274025


This describes each run: a data set, partition, and algorithm combination. To evaluate, we need the actual recommendations, which we will then combine with this metadata:

In [14]:
recs = pd.read_parquet('my-eval/recommendations.parquet')
recs.head()

,item,score,user,rank,RunId
0,3382,7.026087,3,1,1
1,572,5.817902,3,2,1
2,3517,5.274229,3,3,1
3,2999,5.148148,3,4,1
4,853,5.129498,3,5,1


Getting the predictions... (kept for posterity; we are not generating test-set predictions in this run)

In [None]:
#preds = pd.read_parquet('my-eval/predictions.parquet')
#preds

We're going to compute per-(run,user) evaluations of the recommendations *before* combining with metadata. 

In order to evaluate the recommendation list, we need to build a combined set of truth data. Since this is a disjoint partition of users over a single data set, we can just concatenate the individual test frames:

In [15]:
truth = pd.concat((p.test for p in pairs), ignore_index = True)
truth

,Unnamed: 0,user,item,rating,timestamp
0,193,3,1079,5.0,978298296
1,212,3,1196,4.0,978297539
2,210,3,1266,5.0,978297396
3,221,3,1304,5.0,978298166
4,223,3,2470,4.0,978297777
...,...,...,...,...,...
30195,998347,6034,1260,5.0,956712333
30196,998340,6034,2186,4.0,956712333
30197,998348,6034,1267,5.0,956712333
30198,998343,6034,344,2.0,956711771
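The disjointness argument can be checked directly. The sketch below uses a few toy test folds (the first two rows echo the output above; the rest are made-up placeholders) to confirm that no user appears in more than one fold before concatenating.

```python
from itertools import chain

# Hypothetical per-fold test sets as (user, item, rating) tuples.
fold_tests = [
    [(3, 1079, 5.0), (3, 1196, 4.0)],
    [(13, 260, 5.0), (13, 1210, 4.0)],
    [(14, 2858, 3.0)],
]

# Users per fold; a user-partitioned split should never repeat a user.
fold_users = [{user for user, _, _ in test} for test in fold_tests]
total_users = sum(len(users) for users in fold_users)
is_disjoint = len(set().union(*fold_users)) == total_users

# With disjoint folds, plain concatenation gives one truth row per held-out rating.
truth_rows = list(chain.from_iterable(fold_tests))
print(is_disjoint, len(truth_rows))
```

If the folds overlapped, the same (user, item) pair could appear twice in the truth data, and concatenation alone would no longer be safe.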


In [None]:
truth.to_csv('my-eval/truth.csv') # saving truth values to a csv for future evaluation
# truth = pd.read_csv('my-eval/truth.csv')

In [16]:
truth = truth[['user', 'item', 'rating']] # just grabbing what we need

In [17]:
truth.head() # a visual check

,user,item,rating
0,3,1079,5.0
1,3,1196,4.0
2,3,1266,5.0
3,3,1304,5.0
4,3,2470,4.0


Now we can set up an analysis and compute the results.

In [18]:
rla = topn.RecListAnalysis()
rla.add_metric(topn.ndcg) # other available metrics: precision, recall, recip_rank, dcg
rla.add_metric(topn.precision)
topn_compute = rla.compute(recs, truth)
topn_compute.head()

[   INFO] lenskit.topn analyzing 241600 recommendations (30200 truth rows)
[   INFO] lenskit.topn using rec key columns ['RunId', 'user']
[   INFO] lenskit.topn using truth key columns ['user']
[   INFO] lenskit.topn collecting truth data
[   INFO] lenskit.topn collecting metric results
[   INFO] lenskit.sharing.shm serialized <lenskit.topn._RLAJob object at 0x0000022D2C87B3D0> to 1474968 pickle bytes with 12083 buffers of 8214400 bytes
[   INFO] lenskit.util.parallel setting up ProcessPoolExecutor w/ 2 workers
[   INFO] lenskit.topn measured 12080 lists in 30.39s


RunId,user,nrecs,ndcg,precision
1,3,20.0,0.0,0.0
1,13,20.0,0.079206,0.05
1,14,20.0,0.0,0.0
1,17,20.0,0.0,0.0
1,18,20.0,0.0,0.0
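As a rough guide to reading these columns, the following sketch computes precision and a binary-relevance nDCG for a single recommendation list. It is a simplified stand-in, not LensKit's `topn` code: the helper names are hypothetical, and LensKit's `ndcg` may weight hits by rating and use a different discount, so this sketch is not expected to reproduce the nDCG values above.

```python
import math

def precision_at_k(recs, relevant):
    """Fraction of recommended items that appear in the user's test set."""
    if not recs:
        return None
    return sum(1 for item in recs if item in relevant) / len(recs)

def binary_ndcg(recs, relevant):
    """nDCG with gain 1 for each test item and a log2(rank + 1) discount."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, item in enumerate(recs, start=1) if item in relevant)
    # The ideal list places every relevant item at the top of the ranking.
    n_ideal = min(len(relevant), len(recs))
    ideal = sum(1.0 / math.log2(rank + 1) for rank in range(1, n_ideal + 1))
    return dcg / ideal if ideal > 0 else 0.0

# A 20-item list with one hit at rank 4, against 5 held-out test items.
recs_list = list(range(100, 120))
relevant_items = {103, 900, 901, 902, 903}
print(precision_at_k(recs_list, relevant_items))
print(binary_ndcg(recs_list, relevant_items))
```

With 20 recommendations and 5 test items per user, a single hit gives a precision of 1/20 = 0.05, which is the value shown for user 13, and the largest possible precision is 5/20 = 0.25.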


Next, we need to combine this with our run data, so that we know what algorithms and configurations we are evaluating:

In [19]:
topn_results = topn_compute.join(runs[['name', 'nnbrs']], on = 'RunId')
topn_results.head()

RunId,user,nrecs,ndcg,precision,name,nnbrs
1,3,20.0,0.0,0.0,UserKNN-Average,25
1,13,20.0,0.079206,0.05,UserKNN-Average,25
1,14,20.0,0.0,0.0,UserKNN-Average,25
1,17,20.0,0.0,0.0,UserKNN-Average,25
1,18,20.0,0.0,0.0,UserKNN-Average,25


We can compute the overall average performance for each algorithm configuration:

In [20]:
topn_results.fillna(0).groupby(['name', 'nnbrs'])[['ndcg', 'precision']].mean()


name,nnbrs,ndcg,precision
ItemKNN-Average,25,0.01506,0.007392
UserKNN-Average,25,0.009853,0.004851
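The aggregation step amounts to a grouped mean in which missing metric values count as zero. A dependency-free sketch of that logic follows; the rows are made-up placeholders rather than results from this experiment, and `summary` is a hypothetical name.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-(run, user) results; None marks a missing metric value.
rows = [
    {"name": "ItemKNN-Average", "nnbrs": 25, "ndcg": 0.10, "precision": 0.05},
    {"name": "ItemKNN-Average", "nnbrs": 25, "ndcg": None, "precision": None},
    {"name": "UserKNN-Average", "nnbrs": 25, "ndcg": 0.0, "precision": 0.0},
    {"name": "UserKNN-Average", "nnbrs": 25, "ndcg": 0.04, "precision": 0.05},
]

# Group rows by algorithm configuration.
groups = defaultdict(list)
for row in rows:
    groups[(row["name"], row["nnbrs"])].append(row)

summary = {}
for key, members in groups.items():
    summary[key] = {
        # Replace missing values with 0 before averaging, like fillna(0).
        metric: mean(m[metric] if m[metric] is not None else 0.0 for m in members)
        for metric in ("ndcg", "precision")
    }

for key, metrics in sorted(summary.items()):
    print(key, metrics)
```

Counting missing values as zero penalizes a configuration for every user it could not be scored on, rather than silently dropping those users from the mean.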
