cu2rec: CUDA Meets Recommender Systems
cu2rec is a matrix factorization library designed to accelerate the training of recommender system models on GPUs using CUDA. It implements parallel Stochastic Gradient Descent (SGD) for training the matrix factorization model.
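To make the training step concrete, here is a minimal, serial sketch of SGD on a matrix factorization model in plain Python. It is not cu2rec's CUDA implementation: the bias terms are an assumption based on the bias and Q files that the prediction command takes, and the function names, hyperparameters and toy ratings are all hypothetical. A parallel version would apply the same per-rating update to many ratings at once.

```python
import math
import random


def predict(model, u, i):
    """Predicted rating: global bias + user bias + item bias + p_u . q_i."""
    mu, bu, bi, P, Q = model
    return mu + bu[u] + bi[i] + sum(p * q for p, q in zip(P[u], Q[i]))


def rmse(model, ratings):
    """Root mean squared error over (user, item, rating) triples."""
    se = sum((r - predict(model, u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(se / len(ratings))


def init_model(ratings, n_users, n_items, n_factors, rng):
    """Global bias = mean rating, zero biases, small random factors."""
    mu = sum(r for _, _, r in ratings) / len(ratings)
    bu = [0.0] * n_users
    bi = [0.0] * n_items
    P = [[rng.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(n_items)]
    return mu, bu, bi, P, Q


def sgd_epoch(model, ratings, lr, reg, rng):
    """One pass over the ratings in random order, updating in place."""
    mu, bu, bi, P, Q = model
    order = list(ratings)
    rng.shuffle(order)
    for u, i, r in order:
        err = r - predict(model, u, i)
        bu[u] += lr * (err - reg * bu[u])
        bi[i] += lr * (err - reg * bi[i])
        for f in range(len(P[u])):
            pu, qi = P[u][f], Q[i][f]  # use the old values for both updates
            P[u][f] += lr * (err * qi - reg * pu)
            Q[i][f] += lr * (err * pu - reg * qi)


# Toy data with 0-based ids (hypothetical); ids mapped to start at 1
# would need 1 subtracted before indexing.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
rng = random.Random(0)
model = init_model(ratings, n_users=3, n_items=3, n_factors=4, rng=rng)
rmse_before = rmse(model, ratings)
for _ in range(200):
    sgd_epoch(model, ratings, lr=0.02, reg=0.02, rng=rng)
rmse_after = rmse(model, ratings)
print(rmse_before, rmse_after)
```

The training error printed second should be lower than the first; on real data you would track the error on a held-out test set instead.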
The input data should be a CSV file with a header row and lines in the form
userId,itemId,rating
If the user ids and the item ids are not sequential, run
python preprocessing/map_items.py <ratings_file>
to convert the user ids and item ids into sequential integers, starting with 1.
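For illustration, the kind of remapping described above can be sketched in a few lines of Python. This is not the code in preprocessing/map_items.py; the function name, the sample rows, and the choice to number ids in order of first appearance are assumptions made for the example.

```python
import csv
import io


def map_ids(rows):
    """Map arbitrary user/item ids to sequential integers starting at 1.

    Ids are numbered in order of first appearance (an assumption of this
    sketch, not necessarily what map_items.py does).
    """
    user_map, item_map = {}, {}
    mapped = []
    for user, item, rating in rows:
        u = user_map.setdefault(user, len(user_map) + 1)
        i = item_map.setdefault(item, len(item_map) + 1)
        mapped.append((u, i, rating))
    return mapped, user_map, item_map


# A tiny ratings file in the userId,itemId,rating format, with a header.
raw = "userId,itemId,rating\n42,1193,5.0\n42,661,3.0\n7,1193,4.0\n"
reader = csv.reader(io.StringIO(raw))
header = next(reader)  # skip the header row
mapped, user_map, item_map = map_ids(reader)
print(mapped)
```

Keeping the two dictionaries around lets you translate mapped ids back to the original ones after training.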
Once you have a mapped CSV, you can use
python preprocessing/split_to_test_train.py <mapped_file> <test_ratio>
to split the data into training and test sets to use with
Alternatively, you can also use the datasets below:
- Download the MovieLens data here and save in
python preprocessing/map_items.py <ratings_file> to create a user-item mapped ratings file.
python preprocessing/split_to_test_train.py <mapped_file> <test_ratio> to split it into training and test files.
- Download the Netflix dataset here and place it under
python preprocessing/map_netflix.py to create the mapped training and test files.
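The role of the test_ratio argument used when splitting can be illustrated with a small Python sketch. It assumes a uniform random split over all ratings, which may differ from what preprocessing/split_to_test_train.py actually does; the function name and the seed parameter are hypothetical.

```python
import random


def split_ratings(rows, test_ratio, seed=0):
    """Randomly hold out a fraction of the ratings as a test set.

    Assumes a uniform random split over all rows; returns (train, test).
    """
    rng = random.Random(seed)
    shuffled = list(rows)  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


# Ten toy (userId, itemId, rating) rows with already-mapped ids.
rows = [(u, i, 3.0) for u in range(1, 6) for i in range(1, 3)]
train, test = split_ratings(rows, test_ratio=0.2)
print(len(train), len(test))
```

A uniform split like this can leave some users or items without any training ratings, so a real splitter may need extra handling for that case.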
- SSH into Prince or cuda2 using NYU credentials
srun -t5:00:00 --mem=30000 --gres=gpu:1 --pty /bin/bash
module load cuda/9.2.88
cd matrix_factorization && make
The makefile compiles for compute capability 5.2. If your GPU does not support compute capability 5.2, change the makefile to compile for your device's compute capability. The code has been tested on compute capabilities as low as 3.5.
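As an illustration of that change, nvcc names a target architecture by dropping the dot from the compute capability, so 5.2 becomes sm_52. The helper below is hypothetical and only shows that naming rule. How the makefile spells its architecture flags is not shown here, so check it before editing.

```python
def nvcc_arch_flag(compute_capability):
    """Turn a compute capability such as "5.2" into an nvcc -arch flag."""
    major, minor = compute_capability.split(".")
    return f"-arch=sm_{int(major)}{int(minor)}"


print(nvcc_arch_flag("5.2"))
print(nvcc_arch_flag("3.5"))
```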
bin/mf -c <config_file> <ratings_file_train> <ratings_file_test>
Running all possible configurations
In order to run all of the experiments mentioned in the report, you can cd experiments and run the included bash scripts. cu2rec.sh will give you the total runtimes and error metrics for all configurations, while cu2rec_prof.sh will give you all the nvprof results. Make sure you have all the data as described in the data section.
Getting recommendations for a user
- Make sure you get the user data into the same ratings format as MovieLens.
bin/predict -c <config_file> -i <trained_item_bias_file> -g <trained_global_bias_file> -q <trained_Q_file> <ratings_file>
- If you want to run all tests,