lenskit.batch
The functions in lenskit.batch enable you to generate many recommendations or predictions at the same time, which is useful for evaluations and experiments.
The batch functions can parallelize over users with the optional n_jobs parameter or the LK_NUM_PROCS environment variable.
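As a rough illustration of how an explicit parameter and an environment variable can interact, the sketch below resolves a worker count from either source. The helper name resolve_n_jobs, the precedence order, and the fallback of 1 are assumptions made for this example; they are not taken from LensKit's implementation.

```python
import os


def resolve_n_jobs(n_jobs=None, env_var="LK_NUM_PROCS", default=1):
    """Hypothetical helper: choose a worker count.

    Assumed precedence: an explicit n_jobs wins, then the environment
    variable, then the default.
    """
    if n_jobs is not None:
        return int(n_jobs)
    value = os.environ.get(env_var)
    if value:
        return int(value)
    return default


# Simulate a user exporting LK_NUM_PROCS=4 before running a script.
os.environ["LK_NUM_PROCS"] = "4"
from_env = resolve_n_jobs()
explicit = resolve_n_jobs(n_jobs=2)
```

Setting the variable once in the shell configures every batch call in a script, while an explicit argument allows a per-call override.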
Note
Scripts calling the batch recommendation or prediction facilities must be protected; that is, they should not directly perform their work when run, but should define functions and call a main function when run as a script, with a block like this at the end of the file:
def main():
    # do the actual work
    ...

if __name__ == '__main__':
    main()
If you are using the batch functions from a Jupyter notebook, you should be fine; the Jupyter programs are appropriately protected.
recommend
predict
The train_isolated function isn't a batch function per se, as it doesn't perform multiple operations, but it is primarily useful with batch operations. It trains an algorithm in a subprocess, so all temporary resources are released when the training process exits. It returns a shared memory serialization of the trained model, which can be passed directly to recommend or predict in lieu of an algorithm object to reduce total memory consumption.
Example usage:
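from lenskit import batch
from lenskit.algorithms import Recommender
from lenskit.algorithms.als import BiasedMF
# train_ratings and test_ratings are assumed to be rating data frames prepared elsewhere
algo = BiasedMF(50)
algo = Recommender.adapt(algo)
algo = batch.train_isolated(algo, train_ratings)
preds = batch.predict(algo, test_ratings)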
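The idea behind isolated training can be illustrated with the standard library alone. The sketch below is not LensKit's implementation: it "trains" a toy model (a global mean) in a child Python process and hands back pickled bytes over a pipe rather than a shared memory serialization. The helper train_in_subprocess and the toy model are hypothetical.

```python
import json
import pickle
import subprocess
import sys

# Program run by the child process: read ratings as JSON from stdin,
# "train" a toy model (a global mean), and write the pickled model to stdout.
_TRAIN_SCRIPT = """
import json, pickle, sys
ratings = json.load(sys.stdin)
model = {"global_mean": sum(ratings) / len(ratings)}
sys.stdout.buffer.write(pickle.dumps(model))
"""


def train_in_subprocess(ratings):
    """Hypothetical helper: train a toy model in a separate Python process."""
    result = subprocess.run(
        [sys.executable, "-c", _TRAIN_SCRIPT],
        input=json.dumps(ratings).encode("utf-8"),
        stdout=subprocess.PIPE,
        check=True,
    )
    # The child has exited by now, so any temporary memory it used has been
    # returned to the operating system; only the serialized model remains.
    return result.stdout


model_bytes = train_in_subprocess([4.0, 3.0, 5.0])
model = pickle.loads(model_bytes)
```

The parent process only ever holds the compact serialized result, which is the property that makes this pattern attractive before a memory-hungry batch run.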
train_isolated
The MultiEval class is useful for building scripts that evaluate multiple algorithms or algorithm variants simultaneously across multiple data sets. It can extract parameters from algorithms and include them in the output, which is useful for hyperparameter search.
For example:
from lenskit.batch import MultiEval
from lenskit.crossfold import partition_users, SampleN
from lenskit.algorithms import basic, als
from lenskit.datasets import MovieLens
from lenskit import topn
import pandas as pd
ml = MovieLens('ml-latest-small')
eval = MultiEval('my-eval', recommend=20)
eval.add_datasets(partition_users(ml.ratings, 5, SampleN(5)), name='ML-Small')
eval.add_algorithms(basic.Popular(), name='Pop')
eval.add_algorithms([als.BiasedMF(f) for f in [20, 30, 40, 50]],
                    attrs=['features'], name='ALS')
eval.run()
The my-eval/runs.csv file will then contain the results of running these algorithms on this data set. A more complete example is available in the MultiEval notebook.
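The attrs option used above can be pictured as reading named attributes from each algorithm object and copying them into that run's output record. The sketch below shows this idea with plain Python; ToyALS, describe_run, and the record keys are hypothetical stand-ins, not LensKit classes or the actual columns of runs.csv.

```python
class ToyALS:
    """Stand-in for an algorithm object with one tunable parameter."""

    def __init__(self, features):
        self.features = features


def describe_run(algo, name, attrs=()):
    """Hypothetical helper: build one output record for an algorithm."""
    record = {"name": name, "algorithm": type(algo).__name__}
    for attr in attrs:
        # Copy each requested parameter off the algorithm object;
        # a missing attribute is recorded as None.
        record[attr] = getattr(algo, attr, None)
    return record


# One record per hyperparameter setting, mirroring the sweep over feature counts.
rows = [describe_run(ToyALS(f), "ALS", attrs=["features"]) for f in [20, 30, 40, 50]]
```

With the parameter stored next to each result, a hyperparameter search reduces to grouping the output by that column and comparing metrics.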
MultiEval