This repository contains the full source code for reproducing the results from the HybridSVD paper. To run it on your own machine, prepare a conda environment according to this configuration file, which lists all required packages along with their versions.
You can also run the experiments interactively in your browser with the help of the Binder cloud service. Simply click the badge below to get started:
This will launch an interactive JupyterLab environment with access to all repository files. By default it opens the HybridSVD.ipynb notebook, which contains the code for the HybridSVD model evaluated on the Movielens and BookCrossing datasets.
Due to restrictions on Binder's cloud resources, only small datasets, e.g., Movielens-1M or Amazon Video Games, allow performing the full experiments without interruption. Attempts to work with larger files will likely crash the environment. Originally, all experiments were conducted on HPC servers with far more hardware resources. It is therefore advised to make the following modifications in order to run the Jupyter notebooks safely in the Binder cloud:
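Before applying them, you can quickly check how much RAM the container actually provides. The helper below is not part of the repository; it is only a suggested stand-alone precaution that parses Linux's `/proc/meminfo` (available inside Binder containers), and the 2 GB default threshold is an assumption, not an official Binder quota:

```python
def parse_total_ram_gb(meminfo_text):
    """Extract total RAM in GiB from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith('MemTotal:'):
            # Second field is the amount in kB; convert kB -> GiB.
            return int(line.split()[1]) / 1024**2
    raise ValueError('MemTotal not found')

def enough_memory(min_gb=2.0, meminfo_path='/proc/meminfo'):
    """Return True if total system RAM meets the threshold (Linux only)."""
    with open(meminfo_path) as f:
        return parse_total_ram_gb(f.read()) >= min_gb

# Demo on a sample /proc/meminfo line (2 GiB expressed in kB):
print(parse_total_ram_gb('MemTotal: 2097152 kB'))  # -> 2.0
```

If `enough_memory()` returns `False`, stick to the small-dataset modifications described below.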
Experiments with this dataset are available in the following files:
- Baselines.ipynb
- HybridSVD.ipynb
- FactorizationMachines.ipynb
- LCE.ipynb
- ScaledSVD.ipynb
- ScaledHybridSVD.ipynb
You need to change the `data_labels` variable in the "Experiment setup" section of each notebook from

```python
data_labels = ['ML1M', 'ML10M', 'BX']
```

to

```python
data_labels = ['ML1M']
```
Accordingly, do not run cells under the Movielens10M and BookCrossing headers (these datasets are not provided in the cloud environment). Also make sure that the first argument to `get_movielens_data` is `../datasets/movielens/ml-1m.zip` (originally the notebooks were executed on several machines, which is why the path may vary); i.e., the call should start as:

```python
data_dict[lbl], meta_dict[lbl] = get_movielens_data('../datasets/movielens/ml-1m.zip',
                                                    <other arguments>
```
Experiments with this dataset are available in the following files:
- Baselines_AMZ.ipynb
- HybridSVD_AMZ.ipynb
- FactorizationMachines_AMZ.ipynb
- LCE_AMZ.ipynb
- ScaledSVD_AMZ.ipynb
- ScaledHybridSVD_AMZ.ipynb
You need to change the `data_labels` variable in the "Experiment setup" section from

```python
data_labels = ['AMZe', 'AMZvg']
```

to

```python
data_labels = ['AMZvg']
```
Accordingly, do not run cells under the AMZe header. Again, make sure to provide the correct input arguments to `get_amazon_data`. In this case they are:

```python
data_dict[lbl], meta_dict[lbl] = get_amazon_data('../datasets/amazon/ratings_Video_Games.csv',
                                                 meta_path='../datasets/amazon/meta/meta_Video_Games.json.gz',
                                                 <other arguments>
```
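Before running either set of notebooks, it may help to verify that the expected dataset files are actually present in the cloud environment. This small sanity check is not part of the original notebooks; the paths are the ones used in the calls above and assume you run it from a notebook's working directory:

```python
import os

# Paths as used in the notebook calls above; adjust if your layout differs.
expected_files = [
    '../datasets/movielens/ml-1m.zip',
    '../datasets/amazon/ratings_Video_Games.csv',
    '../datasets/amazon/meta/meta_Video_Games.json.gz',
]

missing = [path for path in expected_files if not os.path.isfile(path)]
if missing:
    print('Missing dataset files:', missing)
else:
    print('All expected dataset files are in place.')
```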
Keep in mind that some models require much longer training time than others. For example, the whole HybridSVD experiment in both the standard and cold start scenarios on the Movielens-1M dataset completes even before the initial tuning of Factorization Machines for the standard scenario is done. As Binder automatically shuts down long-running tasks, you may not be able to perform all computations before the timeout. To reduce the risk of such a shutdown, you may want to run different notebooks (different models) in independent Binder sessions. You may also want to reduce the number of points considered in the random grid search for tuning non-SVD-based models. For example, in the FM case you can change the `ntrials=60` input to `ntrials=30` in the `fine_tune_fm(model, params, label, ntrials=60)` function calls. This may, however, slightly decrease the resulting quality of FM.
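To see why halving the number of trials roughly halves tuning time at only a modest risk to quality, consider this toy random search. It is a self-contained illustration, not the repository's `fine_tune_fm` implementation, and the objective function and grid values are made up:

```python
import random

def random_search(objective, grid, ntrials, seed=0):
    """Evaluate `objective` at `ntrials` random grid points and
    return the best (score, params) pair found."""
    rng = random.Random(seed)
    best = None
    for _ in range(ntrials):
        params = {name: rng.choice(values) for name, values in grid.items()}
        score = objective(params)
        if best is None or score > best[0]:
            best = (score, params)
    return best

# Toy objective peaking at rank=40, regularization=0.1 (purely illustrative).
def toy_objective(params):
    return -abs(params['rank'] - 40) - 10 * abs(params['regularization'] - 0.1)

grid = {'rank': [10, 20, 40, 80], 'regularization': [0.01, 0.1, 1.0]}
print(random_search(toy_objective, grid, ntrials=30))
```

Each trial costs one model evaluation, so runtime scales linearly with `ntrials`, while the best score found can only improve slowly as more points are sampled.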
Alternatively, you can skip the parameter tuning sections for long-running models and reuse a previously found set of nearly optimal hyper-parameters. They are printed at the end of each model tuning section. You can also find them in the View optimal parameters notebook.
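A minimal sketch of that workflow is shown below. The parameter names and values are placeholders, not the actual tuned results; copy the real values from the tuning output or the View optimal parameters notebook:

```python
# Hypothetical store of previously found hyper-parameters, keyed by
# dataset label. Values below are placeholders, not real tuned results.
tuned_params = {
    'ML1M': {'rank': 40, 'regularization': 0.1},
}

def get_params(label, tuned=tuned_params):
    """Look up stored hyper-parameters for a dataset label."""
    try:
        return dict(tuned[label])  # copy, so callers can modify safely
    except KeyError:
        raise KeyError(f'No stored parameters for {label!r}; '
                       'run the tuning section instead.')

print(get_params('ML1M'))
```

With such a lookup in place, the model-fitting cells can run directly on the stored values while the tuning cells are skipped entirely.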