
Batch-Running Recommenders
==========================

.. highlight:: python

.. module:: lenskit.batch

The functions in :py:mod:`lenskit.batch` enable you to generate many recommendations or predictions at the same time, useful for evaluations and experiments.

The batch functions can parallelize over users with the optional ``n_jobs`` parameter, or the ``LK_NUM_PROCS`` environment variable.
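
For instance, assuming ``algo`` is a fitted recommender and ``users`` is a list of user IDs, a call could request four worker processes like this (a minimal sketch)::

    from lenskit import batch

    # parallelize the recommendation loop across 4 worker processes;
    # omitting n_jobs falls back to LK_NUM_PROCS or LensKit's default
    recs = batch.recommend(algo, users, 10, n_jobs=4)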

.. note::
    Scripts calling the batch recommendation or prediction facilities must be protected;
    that is, they should not directly perform their work when run, but should define
    functions and call a ``main`` function when run as a script, with a block like this
    at the end of the file::

        def main():
            # do the actual work

        if __name__ == '__main__':
            main()

    If you are using the batch functions from a Jupyter notebook, you should be fine; the
    Jupyter programs are appropriately protected.
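
For instance, a complete protected batch script might look like the following sketch; the file names and algorithm choice here are illustrative placeholders, not part of the batch API::

    import pandas as pd
    from lenskit import batch
    from lenskit.algorithms import Recommender, item_knn

    def main():
        # load training ratings (columns: user, item, rating); placeholder path
        train = pd.read_csv('train-ratings.csv')

        # fit an item-item k-NN recommender (any LensKit algorithm works here)
        algo = Recommender.adapt(item_knn.ItemItem(20))
        algo.fit(train)

        # recommend 10 items for each training user, parallelized over users
        recs = batch.recommend(algo, train['user'].unique(), 10, n_jobs=4)
        recs.to_csv('recommendations.csv', index=False)

    if __name__ == '__main__':
        main()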

Recommendation
--------------

.. autofunction:: recommend

Rating Prediction
-----------------

.. autofunction:: predict
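
As a sketch, assuming ``algo`` is a fitted rating predictor and ``test`` is a data frame of user-item pairs::

    from lenskit import batch

    # predict a rating for every (user, item) pair in the test frame; the
    # result carries the pairs along with a prediction column
    preds = batch.predict(algo, test)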

Isolated Training
-----------------

This function isn't a batch function per se, as it doesn't perform multiple operations, but it is primarily useful with batch operations. The :py:func:`train_isolated` function trains an algorithm in a subprocess, so all temporary resources are released by virtue of the training process exiting. It returns a shared memory serialization of the trained model, which can be passed directly to :py:func:`recommend` or :py:func:`predict` in lieu of an algorithm object, to reduce the total memory consumption.

Example usage::

    from lenskit import batch
    from lenskit.algorithms import Recommender
    from lenskit.algorithms.als import BiasedMF

    algo = BiasedMF(50)
    algo = Recommender.adapt(algo)
    # train in a subprocess and get back a shared-memory model
    algo = batch.train_isolated(algo, train_ratings)
    preds = batch.predict(algo, test_ratings)

.. autofunction:: train_isolated

Scripting Evaluation
--------------------

The :py:class:`MultiEval` class is useful for building scripts that evaluate multiple algorithms or algorithm variants simultaneously, across multiple data sets. It can extract parameters from algorithms and include them in the output, which is useful for hyperparameter search.

For example::

    from lenskit.batch import MultiEval
    from lenskit.crossfold import partition_users, SampleN
    from lenskit.algorithms import basic, als
    from lenskit.datasets import MovieLens
    from lenskit import topn
    import pandas as pd

    ml = MovieLens('ml-latest-small')

    eval = MultiEval('my-eval', recommend=20)
    eval.add_datasets(partition_users(ml.ratings, 5, SampleN(5)), name='ML-Small')
    eval.add_algorithms(basic.Popular(), name='Pop')
    eval.add_algorithms([als.BiasedMF(f) for f in [20, 30, 40, 50]],
                        attrs=['features'], name='ALS')
    eval.run()

The ``my-eval/runs.csv`` file will then contain the results of running these algorithms on this data set. A more complete example is available in the MultiEval notebook.
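
As a sketch of inspecting the output with pandas (``runs.csv`` is the file mentioned above; the other files MultiEval writes may vary by LensKit version)::

    import pandas as pd

    # each row describes one run: one algorithm variant on one data partition,
    # including any attributes requested via attrs= (e.g. the ALS feature count)
    runs = pd.read_csv('my-eval/runs.csv')
    print(runs.head())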

.. autoclass:: MultiEval