LensKit's evaluation support is based on post-processing the output of recommenders and predictors. The batch utilities provide support for generating these outputs.
We generally recommend using Jupyter notebooks for evaluation.
We typically store the output of recommendation runs in LensKit experiments as CSV or Parquet files. The lenskit.batch.MultiEval class runs a set of algorithms over a set of data sets, and stores the results in a collection of Parquet files in a specified output directory.
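For instance, a sweep over neighborhood counts for a user-based k-NN recommender might be set up like this. This is a minimal sketch, not the only way to use MultiEval: the ratings variable is an assumed data frame with user, item, and rating columns, and the exact parameter and attribute names (such as nnbrs) may differ between LensKit versions:

from lenskit.batch import MultiEval
from lenskit.crossfold import partition_users, SampleN
from lenskit.algorithms import user_knn

# `ratings` is assumed: a data frame of user, item, rating
sweep = MultiEval('eval-dir', recommend=20)
# 5 train-test partitions, holding out 5 ratings per test user
sweep.add_datasets(partition_users(ratings, 5, SampleN(5)), name='ML-100K')
# one run per neighborhood count; attrs records each algorithm's
# nnbrs value as a column in runs.parquet
sweep.add_algorithms([user_knn.UserUser(n) for n in [10, 20, 30, 40]],
                     attrs=['nnbrs'], name='UserUser')
sweep.run()

The attribute names listed in attrs become columns in runs.parquet; that is how a column such as max_neighbors in the example below gets recorded.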
There are several files:
runs.parquet
The runs, one row per algorithm-dataset combination. This file contains the name and any associated properties of each algorithm and data set run, such as a feature count.
recommendations.parquet
The recommendations, with columns RunId, user, rank, item, and rating.
predictions.parquet
The rating predictions, if the test data includes ratings.
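The predictions can be analyzed in the same post-processing style. For example, a per-run RMSE might be computed like this (a sketch; the prediction and rating column names alongside RunId, user, and item are assumptions on my part):

import pandas as pd
from lenskit.metrics.predict import rmse

preds = pd.read_parquet('eval-dir/predictions.parquet')
# RMSE of each run's predictions against the held-out ratings
run_rmse = preds.groupby('RunId').apply(lambda df: rmse(df.prediction, df.rating))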
For example, if you want to examine nDCG by neighborhood count for a set of runs on a single data set, you can do:
import pandas as pd
from lenskit.metrics import topn as lm
runs = pd.read_parquet('eval-dir/runs.parquet')
recs = pd.read_parquet('eval-dir/recommendations.parquet')
meta = runs.loc[:, ['RunId', 'max_neighbors']]
# compute each user's nDCG
user_ndcg = recs.groupby(['RunId', 'user']).rating.apply(lm.ndcg)
user_ndcg = user_ndcg.reset_index(name='nDCG')
# combine with run metadata to get each run's neighborhood count
user_ndcg = pd.merge(user_ndcg, meta, on='RunId')
# group and aggregate
nbr_ndcg = user_ndcg.groupby('max_neighbors').nDCG.mean()
nbr_ndcg.plot()
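Since nbr_ndcg is a series indexed by neighborhood count, the final line draws a line chart of mean nDCG against neighborhood count. In a Jupyter notebook (as recommended above) the chart renders inline; in a plain script you will also need to call matplotlib.pyplot.show().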