<i>Copyright (c) Recommenders contributors.</i>

<i>Licensed under the MIT License.</i>

# Benchmark with Movielens dataset

This notebook compares collaborative filtering algorithms available in this repository, such as Spark ALS, Surprise SVD, and SAR, on the Movielens dataset. These algorithms are applicable to a variety of recommendation tasks, including product and news recommendation.

The main purpose of this notebook is not to produce comprehensive benchmarking results on multiple datasets. Rather, it is intended to illustrate how one could evaluate different recommender algorithms using the tools in this repository.

## Experimental setup

* Objective
  * To compare how each collaborative filtering algorithm performs in predicting ratings and recommending relevant items.

* Environment
  * The comparison is run on an [Azure Data Science Virtual Machine](https://azure.microsoft.com/en-us/services/virtual-machines/data-science-virtual-machines/) (DSVM).
  * The virtual machine size is a [Standard_NC6s_v2](https://learn.microsoft.com/es-es/azure/virtual-machines/ncv2-series) with 6 CPUs, 112 GB of RAM, and one NVIDIA Tesla P100 GPU with 16 GB of memory.
  * A single-node DSVM is not suited to benchmarking at scale. Scaling the compute up or out is necessary to run larger benchmarks in a run-time-efficient way without memory issues.
  * **NOTE ABOUT THE DEPENDENCIES TO INSTALL**: This notebook uses CPU, GPU and PySpark algorithms, so make sure you install the `full environment` as detailed in the [SETUP.md](../../SETUP.md). 
  
* Datasets
  * [Movielens 100K](https://grouplens.org/datasets/movielens/100k/).
  * [Movielens 1M](https://grouplens.org/datasets/movielens/1m/).

* Data split
  * The data is split into train and test sets.
  * The split ratios are 75-25 for train and test datasets.
  * The splitting is stratified based on items. 

* Model training
  * A recommendation model is trained by using each of the collaborative filtering algorithms. 
  * Empirical parameter values reported [here](http://mymedialite.net/examples/datasets.html) are used in this notebook. More exhaustive hyperparameter tuning would be required to further optimize results.

* Evaluation metrics
  * Ranking metrics:
    * Precision@k.
    * Recall@k.
    * Normalized discounted cumulative gain@k (NDCG@k).
    * Mean average precision (MAP).
    * In the evaluation metrics above, k = 10. 
  * Rating metrics:
    * Root mean squared error (RMSE).
    * Mean absolute error (MAE).
    * R squared.
    * Explained variance.
  * Run-time performance:
    * Elapsed time for training a model and for using it to predict ratings or recommend k items.
    * The time may vary across different machines. 
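
As a rough illustration of the metrics listed above, the following sketch computes Precision@k and Recall@k for a single user, and RMSE and MAE for a handful of rating predictions. It uses the textbook definitions on made-up toy data. The function names `precision_recall_at_k` and `rmse_mae` are hypothetical and are not part of this repository, whose own evaluators may differ in details such as how scores are averaged across users.

```python
import math


def precision_recall_at_k(recommended, relevant, k=10):
    """Return (precision@k, recall@k) for one user.

    recommended: items ordered from most to least relevant according to the model.
    relevant: set of items the user actually interacted with in the test set.
    """
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def rmse_mae(y_true, y_pred):
    """Return (RMSE, MAE) for paired true and predicted ratings."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(abs(e) for e in errors) / len(errors)
    return rmse, mae


# Toy ranking example (illustrative item IDs, not Movielens data):
# 2 of the 5 recommended items are among the 4 relevant ones.
recommended = ["m1", "m2", "m3", "m4", "m5"]
relevant = {"m2", "m5", "m7", "m9"}
precision, recall = precision_recall_at_k(recommended, relevant, k=5)

# Toy rating example: the prediction errors are 0.5, 0.0, 1.0 and -1.0.
rmse, mae = rmse_mae([4.0, 3.0, 5.0, 2.0], [3.5, 3.0, 4.0, 3.0])

print(f"Precision@5: {precision:.3f}, Recall@5: {recall:.3f}")
print(f"RMSE: {rmse:.3f}, MAE: {mae:.3f}")
```

Ranking metrics reward placing relevant items in the top k, whereas rating metrics measure how close the predicted ratings are to the observed ones, which is why the two groups are reported separately.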

## Global settings

In [2]:
import warnings
warnings.filterwarnings("ignore")
import logging
logging.basicConfig(level=logging.ERROR) 

In [None]:
import os
import sys
import numpy as np
import pandas as pd
import surprise
import cornac

try:
    import pyspark
except ImportError:
    pass  # skip this import if we are not in a Spark environment

try:
    import tensorflow as tf # NOTE: TF needs to be imported before PyTorch, otherwise we get an error
    tf.get_logger().setLevel('ERROR') # only show error messages
    import torch
    import fastai
except ImportError:
    pass  # skip this import if we are not in a GPU environment

current_path = os.path.join(os.getcwd(), "examples", "06_benchmarks") # To execute the notebook programmatically from root folder
sys.path.append(current_path)
from benchmark_utils import * 

from recommenders.datasets import movielens
from recommenders.utils.general_utils import get_number_processors
from recommenders.datasets.python_splitters import python_stratified_split
try:
    from recommenders.utils.spark_utils import start_or_get_spark
except ImportError:
    pass  # skip this import if we are not in a Spark environment
try:
    from recommenders.utils.gpu_utils import get_cuda_version, get_cudnn_version
    from recommenders.models.fastai.fastai_utils import hide_fastai_progress_bar
    hide_fastai_progress_bar()
except ImportError:
    pass  # skip this import if we are not in a GPU environment
from recommenders.utils.notebook_utils import store_metadata


print(f"System version: {sys.version}")
print(f"Number of cores: {get_number_processors()}")
print(f"NumPy version: {np.__version__}")
print(f"Pandas version: {pd.__version__}")
print(f"Surprise version: {surprise.__version__}")
print(f"Cornac version: {cornac.__version__}")
try:
    print(f"PySpark version: {pyspark.__version__}")
except NameError:
    pass  # PySpark was not imported, so there is no version to print
try:
    print(f"CUDA version: {get_cuda_version()}")
    print(f"CuDNN version: {get_cudnn_version()}")
    print(f"TensorFlow version: {tf.__version__}")
    print(f"PyTorch version: {torch.__version__}")
    print(f"Fast AI version: {fastai.__version__}")
except NameError:
    pass  # GPU libraries were not imported, so there are no versions to print

System version: 3.7.13 (default, Mar 29 2022, 02:18:16) 
[GCC 7.5.0]
Number of cores: 6
NumPy version: 1.21.6
Pandas version: 1.3.5
Surprise version: 1.1.1
Cornac version: 1.14.2
PySpark version: 3.2.2
CUDA version: 10.2
CuDNN version: 7605
TensorFlow version: 2.7.4
PyTorch version: 1.12.1+cu102
Fast AI version: 1.0.61


In [6]:
try:
    spark = start_or_get_spark("PySpark", memory="32g")
    spark.conf.set("spark.sql.analyzer.failAmbiguousSelfJoin", "false")
except NameError:
    pass  # skip Spark session setup if we are not in a Spark environment

In [9]:
# fix random seeds to make sure our runs are reproducible
np.random.seed(SEED)
try:
    tf.random.set_seed(SEED)
    torch.manual_seed(SEED)
    torch.cuda.manual_seed_all(SEED)
except NameError:
    pass  # skip framework seeding if we are not in a GPU environment

## Parameters

In [None]:
data_sizes = ["100k", "1m"] # Movielens data size: 100k, 1m, 10m, or 20m
algorithms = ["als", "svd", "sar", "ncf", "fastai", "bpr", "bivae", "lightgcn"]

In [6]:
environments = {
    "als": "pyspark",
    "sar": "python_cpu",
    "svd": "python_cpu",
    "fastai": "python_gpu",
    "ncf": "python_gpu",
    "bpr": "python_cpu",
    "bivae": "python_gpu",
    "lightgcn": "python_gpu",
}

metrics = {
    "als": ["rating", "ranking"],
    "sar": ["ranking"],
    "svd": ["rating", "ranking"],
    "fastai": ["rating", "ranking"],
    "ncf": ["ranking"],
    "bpr": ["ranking"],
    "bivae": ["ranking"],
    "lightgcn": ["ranking"]
}

Algorithm parameters

In [7]:
als_params = {
    "rank": 10,
    "maxIter": 20,
    "implicitPrefs": False,
    "alpha": 0.1,
    "regParam": 0.05,
    "coldStartStrategy": "drop",
    "nonnegative": False,
    "userCol": DEFAULT_USER_COL,
    "itemCol": DEFAULT_ITEM_COL,
    "ratingCol": DEFAULT_RATING_COL,
}

sar_params = {
    "similarity_type": "jaccard",
    "time_decay_coefficient": 30,
    "time_now": None,
    "timedecay_formula": True,
    "col_user": DEFAULT_USER_COL,
    "col_item": DEFAULT_ITEM_COL,
    "col_rating": DEFAULT_RATING_COL,
    "col_timestamp": DEFAULT_TIMESTAMP_COL,
}

svd_params = {
    "n_factors": 150,
    "n_epochs": 15,
    "lr_all": 0.005,
    "reg_all": 0.02,
    "random_state": SEED,
    "verbose": False
}

fastai_params = {
    "n_factors": 40, 
    "y_range": [0,5.5], 
    "wd": 1e-1,
    "lr_max": 5e-3,
    "epochs": 15
}

ncf_params = {
    "model_type": "NeuMF",
    "n_factors": 4,
    "layer_sizes": [16, 8, 4],
    "n_epochs": 15,
    "batch_size": 1024,
    "learning_rate": 1e-3,
    "verbose": 10
}

bpr_params = {
    "k": 200,
    "max_iter": 200,
    "learning_rate": 0.01,
    "lambda_reg": 1e-3,
    "seed": SEED,
    "verbose": False
}

bivae_params = {
    "k": 100,
    "encoder_structure": [200],
    "act_fn": "tanh",
    "likelihood": "pois",
    "n_epochs": 500,
    "batch_size": 1024,
    "learning_rate": 0.001,
    "seed": SEED,
    "use_gpu": True,
    "verbose": False
}

lightgcn_param = {
    "model_type": "lightgcn",
    "n_layers": 3,
    "batch_size": 1024,
    "embed_size": 64,
    "decay": 0.0001,
    "epochs": 20,
    "learning_rate": 0.005,
    "eval_epoch": 5,
    "top_k": DEFAULT_K,
    "metrics": ["recall", "ndcg", "precision", "map"],
    "save_model":False,
    "MODEL_DIR":".",
}

params = {
    "als": als_params,
    "sar": sar_params,
    "svd": svd_params,
    "fastai": fastai_params,
    "ncf": ncf_params,
    "bpr": bpr_params,
    "bivae": bivae_params,
    "lightgcn": lightgcn_param,
}

In [8]:
prepare_training_data = {
    "als": prepare_training_als,
    "sar": prepare_training_sar,
    "svd": prepare_training_svd,
    "fastai": prepare_training_fastai,
    "ncf": prepare_training_ncf,
    "bpr": prepare_training_cornac,
    "bivae": prepare_training_cornac,
    "lightgcn": prepare_training_lightgcn,
}

In [9]:
prepare_metrics_data = {
    "als": lambda train, test: prepare_metrics_als(train, test),
    "fastai": lambda train, test: prepare_metrics_fastai(train, test),    
}

In [10]:
trainer = {
    "als": lambda params, data: train_als(params, data),
    "svd": lambda params, data: train_svd(params, data),
    "sar": lambda params, data: train_sar(params, data), 
    "fastai": lambda params, data: train_fastai(params, data),
    "ncf": lambda params, data: train_ncf(params, data),
    "bpr": lambda params, data: train_bpr(params, data),
    "bivae": lambda params, data: train_bivae(params, data),
    "lightgcn": lambda params, data: train_lightgcn(params, data),
}

In [11]:
rating_predictor = {
    "als": lambda model, test: predict_als(model, test),
    "svd": lambda model, test: predict_svd(model, test),
    "fastai": lambda model, test: predict_fastai(model, test),
}

In [12]:
ranking_predictor = {
    "als": lambda model, test, train: recommend_k_als(model, test, train),
    "sar": lambda model, test, train: recommend_k_sar(model, test, train),
    "svd": lambda model, test, train: recommend_k_svd(model, test, train),
    "fastai": lambda model, test, train: recommend_k_fastai(model, test, train),
    "ncf": lambda model, test, train: recommend_k_ncf(model, test, train),
    "bpr": lambda model, test, train: recommend_k_cornac(model, test, train),
    "bivae": lambda model, test, train: recommend_k_cornac(model, test, train),
    "lightgcn": lambda model, test, train: recommend_k_lightgcn(model, test, train),
}

In [13]:
rating_evaluator = {
    "als": lambda test, predictions: rating_metrics_pyspark(test, predictions),
    "svd": lambda test, predictions: rating_metrics_python(test, predictions),
    "fastai": lambda test, predictions: rating_metrics_python(test, predictions)
}
    
    
ranking_evaluator = {
    "als": lambda test, predictions, k: ranking_metrics_pyspark(test, predictions, k),
    "sar": lambda test, predictions, k: ranking_metrics_python(test, predictions, k),
    "svd": lambda test, predictions, k: ranking_metrics_python(test, predictions, k),
    "fastai": lambda test, predictions, k: ranking_metrics_python(test, predictions, k),
    "ncf": lambda test, predictions, k: ranking_metrics_python(test, predictions, k),
    "bpr": lambda test, predictions, k: ranking_metrics_python(test, predictions, k),
    "bivae": lambda test, predictions, k: ranking_metrics_python(test, predictions, k),
    "lightgcn": lambda test, predictions, k: ranking_metrics_python(test, predictions, k),
}

In [14]:
def generate_summary(data, algo, k, train_time, time_rating, rating_metrics, time_ranking, ranking_metrics):
    summary = {"Data": data, "Algo": algo, "K": k, "Train time (s)": train_time, "Predicting time (s)": time_rating, "Recommending time (s)": time_ranking}
    if rating_metrics is None:
        rating_metrics = {
            "RMSE": np.nan,
            "MAE": np.nan,
            "R2": np.nan,
            "Explained Variance": np.nan,
        }
    if ranking_metrics is None:
        ranking_metrics = {
            "MAP": np.nan,
            "nDCG@k": np.nan,
            "Precision@k": np.nan,
            "Recall@k": np.nan,
        }
    summary.update(rating_metrics)
    summary.update(ranking_metrics)
    return summary

## Benchmark loop
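
The loop in this section looks up each algorithm's functions in the dictionaries defined earlier, falls back to an identity function when no algorithm-specific preparation step is registered, and expects every trainer to return the model together with its elapsed time. The sketch below shows that pattern in isolation. All names in it (`train_mean_model`, `toy_trainer`, `toy_prepare`, the `"mean"` algorithm) are hypothetical stand-ins, since the real helpers live in `benchmark_utils`, which is not shown in this notebook.

```python
import time


def train_mean_model(params, data):
    """Hypothetical trainer: 'fits' a constant predictor and reports elapsed seconds."""
    start = time.perf_counter()
    model = {"mean": sum(data) / len(data), **params}
    elapsed = time.perf_counter() - start
    return model, elapsed


# Dispatch tables keyed by algorithm name, mirroring the dictionaries above.
toy_trainer = {"mean": train_mean_model}
toy_prepare = {}  # no algorithm-specific preparation registered


def identity(train, test):
    """Fallback used when an algorithm needs no special data preparation."""
    return train, test


algo = "mean"
train, test = toy_prepare.get(algo, identity)([1.0, 2.0, 3.0], [4.0])
model, seconds = toy_trainer[algo]({"note": "toy"}, train)

print(f"Model: {model}")
print(f"Training time: {seconds:.4f}s")
```

Keeping the per-algorithm functions in dictionaries lets the loop body stay the same for every algorithm, so adding one only requires registering its entries.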

In [16]:
%%time

# For each data size and each algorithm, a recommender is evaluated. 
cols = ["Data", "Algo", "K", "Train time (s)", "Predicting time (s)", "RMSE", "MAE", "R2", "Explained Variance", "Recommending time (s)", "MAP", "nDCG@k", "Precision@k", "Recall@k"]
df_results = pd.DataFrame(columns=cols)

for data_size in data_sizes:
    # Load the dataset
    df = movielens.load_pandas_df(
        size=data_size,
        header=[DEFAULT_USER_COL, DEFAULT_ITEM_COL, DEFAULT_RATING_COL, DEFAULT_TIMESTAMP_COL]
    )
    print("Size of Movielens {}: {}".format(data_size, df.shape))
    
    # Split the dataset
    df_train, df_test = python_stratified_split(df,
                                                ratio=0.75, 
                                                min_rating=1, 
                                                filter_by="item", 
                                                col_user=DEFAULT_USER_COL, 
                                                col_item=DEFAULT_ITEM_COL
                                                )
   
    # Loop through the algos
    for algo in algorithms:
        print(f"\nComputing {algo} algorithm on Movielens {data_size}")
          
        # Data prep for training set
        train = prepare_training_data.get(algo, lambda x,y:(x,y))(df_train, df_test)
        
        # Get model parameters
        model_params = params[algo]
          
        # Train the model
        model, time_train = trainer[algo](model_params, train)
        print(f"Training time: {time_train}s")
                
        # Predict and evaluate
        train, test = prepare_metrics_data.get(algo, lambda x,y:(x,y))(df_train, df_test)
        
        if "rating" in metrics[algo]:   
            # Predict for rating
            preds, time_rating = rating_predictor[algo](model, test)
            print(f"Rating prediction time: {time_rating}s")
            
            # Evaluate for rating
            ratings = rating_evaluator[algo](test, preds)
        else:
            ratings = None
            time_rating = np.nan
        
        if "ranking" in metrics[algo]:
            # Predict for ranking
            top_k_scores, time_ranking = ranking_predictor[algo](model, test, train)
            print(f"Ranking prediction time: {time_ranking}s")
            
            # Evaluate for ranking
            rankings = ranking_evaluator[algo](test, top_k_scores, DEFAULT_K)
        else:
            rankings = None
            time_ranking = np.nan
            
        # Record results
        summary = generate_summary(data_size, algo, DEFAULT_K, time_train, time_rating, ratings, time_ranking, rankings)
        df_results.loc[df_results.shape[0] + 1] = summary
        
print("\nComputation finished")


100%|██████████| 4.81k/4.81k [00:00<00:00, 12.5kKB/s]


Size of Movielens 100k: (100000, 4)

Computing als algorithm on Movielens 100k


                                                                                

Training time: 6.8526s
Rating prediction time: 0.0587s


22/10/19 08:58:41 WARN Column: Constructing trivially true equals predicate, 'userID#225 = userID#225'. Perhaps you need to use aliases.


Ranking prediction time: 0.0782s


                                                                                


Computing svd algorithm on Movielens 100k
Training time: 4.0902s
Rating prediction time: 0.2698s
Ranking prediction time: 13.8704s

Computing sar algorithm on Movielens 100k
Training time: 0.3344s
Ranking prediction time: 0.0836s

Computing ncf algorithm on Movielens 100k
Training time: 66.8339s
Ranking prediction time: 3.5393s

Computing fastai algorithm on Movielens 100k
Training time: 69.7469s
Rating prediction time: 0.0338s
Ranking prediction time: 2.7415s

Computing bpr algorithm on Movielens 100k
Training time: 5.8205s
Ranking prediction time: 1.9365s

Computing bivae algorithm on Movielens 100k
Training time: 11.4762s
Ranking prediction time: 1.4382s

Computing lightgcn algorithm on Movielens 100k
Already create adjacency matrix.
Already normalize adjacency matrix.
Using xavier initialization.
Epoch 1 (train)0.9s: train loss = 0.47340 = (mf)0.47316 + (embed)0.00024
Epoch 2 (train)0.8s: train loss = 0.28803 = (mf)0.28739 + (embed)0.00064
Epoch 3 (train)0.8s: train loss = 0.25425

100%|██████████| 5.78k/5.78k [00:00<00:00, 15.4kKB/s]


Size of Movielens 1m: (1000209, 4)

Computing als algorithm on Movielens 1m


22/10/19 09:03:02 WARN TaskSetManager: Stage 588 contains a task of very large size (2759 KiB). The maximum recommended task size is 1000 KiB.
22/10/19 09:03:02 WARN TaskSetManager: Stage 589 contains a task of very large size (2759 KiB). The maximum recommended task size is 1000 KiB.


Training time: 7.0365s
Rating prediction time: 0.0355s


22/10/19 09:03:19 WARN Column: Constructing trivially true equals predicate, 'userID#2403 = userID#2403'. Perhaps you need to use aliases.
22/10/19 09:03:19 WARN TaskSetManager: Stage 1045 contains a task of very large size (2759 KiB). The maximum recommended task size is 1000 KiB.


Ranking prediction time: 0.0491s


22/10/19 09:03:19 WARN TaskSetManager: Stage 1046 contains a task of very large size (2759 KiB). The maximum recommended task size is 1000 KiB.
22/10/19 09:03:20 WARN TaskSetManager: Stage 1092 contains a task of very large size (2759 KiB). The maximum recommended task size is 1000 KiB.
                                                                                


Computing svd algorithm on Movielens 1m
Training time: 41.6351s
Rating prediction time: 2.8386s
Ranking prediction time: 190.6115s

Computing sar algorithm on Movielens 1m
Training time: 3.2292s
Ranking prediction time: 1.9796s

Computing ncf algorithm on Movielens 1m
Training time: 816.1049s
Ranking prediction time: 48.9872s

Computing fastai algorithm on Movielens 1m
Training time: 663.2788s
Rating prediction time: 0.3985s
Ranking prediction time: 47.2290s

Computing bpr algorithm on Movielens 1m
Training time: 66.0371s
Ranking prediction time: 27.4882s

Computing bivae algorithm on Movielens 1m
Training time: 152.0912s
Ranking prediction time: 27.5727s

Computing lightgcn algorithm on Movielens 1m
Already create adjacency matrix.
Already normalize adjacency matrix.
Using xavier initialization.
Epoch 1 (train)23.2s: train loss = 0.34771 = (mf)0.34712 + (embed)0.00059
Epoch 2 (train)22.8s: train loss = 0.27739 = (mf)0.27605 + (embed)0.00134
Epoch 3 (train)22.8s: train loss = 0.22916 

## Results

In [17]:
df_results

| | Data | Algo | K | Train time (s) | Predicting time (s) | RMSE | MAE | R2 | Explained Variance | Recommending time (s) | MAP | nDCG@k | Precision@k | Recall@k |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 100k | als | 10 | 6.8526 | 0.0587 | 0.962863 | 0.747088 | 0.258405 | 0.255016 | 0.0782 | 0.004697 | 0.046619 | 0.049629 | 0.016688 |
| 2 | 100k | svd | 10 | 4.0902 | 0.2698 | 0.938681 | 0.74269 | 0.291967 | 0.291971 | 13.8704 | 0.012873 | 0.09593 | 0.091198 | 0.032783 |
| 3 | 100k | sar | 10 | 0.3344 | | | | | | 0.0836 | 0.113028 | 0.388321 | 0.333828 | 0.183179 |
| 4 | 100k | ncf | 10 | 66.8339 | | | | | | 3.5393 | 0.108609 | 0.398754 | 0.349735 | 0.181576 |
| 5 | 100k | fastai | 10 | 69.7469 | 0.0338 | 0.942754 | 0.745138 | 0.28581 | 0.288468 | 2.7415 | 0.025896 | 0.151481 | 0.131813 | 0.054491 |
| 6 | 100k | bpr | 10 | 5.8205 | | | | | | 1.9365 | 0.132478 | 0.441997 | 0.388229 | 0.212522 |
| 7 | 100k | bivae | 10 | 11.4762 | | | | | | 1.4382 | 0.146126 | 0.475077 | 0.411771 | 0.219145 |
| 8 | 100k | lightgcn | 10 | 16.229 | | | | | | 0.0451 | 0.121633 | 0.417629 | 0.360976 | 0.196052 |
| 9 | 1m | als | 10 | 7.0365 | 0.0355 | 0.858791 | 0.677568 | 0.413262 | 0.408737 | 0.0491 | 0.002683 | 0.030447 | 0.036707 | 0.011461 |
| 10 | 1m | svd | 10 | 41.6351 | 2.8386 | 0.883017 | 0.695366 | 0.37491 | 0.374911 | 190.6115 | 0.008828 | 0.08932 | 0.082856 | 0.021582 |


In [None]:
# Record results for tests - ignore this cell
for algo in algorithms:
    store_metadata(algo, df_results.loc[df_results["Algo"] == algo, "nDCG@k"].values[0])
