<i>Copyright (c) Microsoft Corporation. All rights reserved.</i>

<i>Licensed under the MIT License.</i>

# Benchmark with Movielens dataset

This illustrative comparison applies to collaborative filtering algorithms available in this repository such as Spark ALS, Surprise SVD, SAR and others using the Movielens dataset. These algorithms are usable in a variety of recommendation tasks, including product or news recommendations.

The main purpose of this notebook is not to produce comprehensive benchmarking results on multiple datasets. Rather, it is intended to illustrate how one could evaluate different recommender algorithms using tools in this repository.

## Experimentation setup:

* Objective
  * To compare how each collaborative filtering algorithm performs in predicting ratings and recommending relevant items.

* Environment
  * The comparison is run on an [Azure Data Science Virtual Machine](https://azure.microsoft.com/en-us/services/virtual-machines/data-science-virtual-machines/) (DSVM).
  * The virtual machine size is a [Standard_NC6s_v2](https://learn.microsoft.com/es-es/azure/virtual-machines/ncv2-series) with 6 CPUs, 112 GB of RAM, and 1 NVIDIA Tesla P100 GPU with 16 GB of memory.
  * Note that a single-node DSVM is not intended for scalable benchmarking. Scaling the compute instances up or out is necessary to run the benchmark in a run-time-efficient way without memory issues.
  * **NOTE ABOUT THE DEPENDENCIES TO INSTALL**: This notebook uses CPU, GPU and PySpark algorithms, so make sure you install the `full environment` as detailed in the [SETUP.md](../../SETUP.md). 
  
* Datasets
  * [Movielens 100K](https://grouplens.org/datasets/movielens/100k/).
  * [Movielens 1M](https://grouplens.org/datasets/movielens/1m/).

* Data split
  * The data is split into train and test sets.
  * The split ratios are 75-25 for train and test datasets.
  * The splitting is stratified based on items. 

* Model training
  * A recommendation model is trained by using each of the collaborative filtering algorithms. 
  * Empirical parameter values reported [here](http://mymedialite.net/examples/datasets.html) are used in this notebook. More exhaustive hyperparameter tuning would be required to further optimize results.

* Evaluation metrics
  * Ranking metrics:
    * Precision@k.
    * Recall@k.
    * Normalized discounted cumulative gain@k (NDCG@k).
    * Mean average precision (MAP).
    * In the evaluation metrics above, k = 10. 
  * Rating metrics:
    * Root mean squared error (RMSE).
    * Mean absolute error (MAE).
    * R squared.
    * Explained variance.
  * Run-time performance:
    * Elapsed time for training a model and for using it to predict ratings or recommend k items.
    * The time may vary across different machines. 
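
The ranking and rating metrics listed above can be sketched in plain Python. This is a minimal illustration using textbook definitions with binary relevance, not the evaluators shipped in this repository. The function names and the toy data are assumptions made for this example, and the library implementations may differ in details such as normalization for users with fewer than k relevant items.

```python
import math


def precision_at_k(recommended, relevant, k=10):
    """Fraction of the top-k recommended items that are relevant."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def recall_at_k(recommended, relevant, k=10):
    """Fraction of the relevant items that appear in the top-k recommendations."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant) if relevant else 0.0


def ndcg_at_k(recommended, relevant, k=10):
    """Binary-relevance nDCG: discounted gain divided by the best achievable gain."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(recommended[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted ratings."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error between true and predicted ratings."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data (made up for illustration): one user's ranked list and held-out items
recommended = ["a", "b", "c", "d"]
relevant = {"a", "c", "e"}

print(precision_at_k(recommended, relevant, k=4))
print(recall_at_k(recommended, relevant, k=4))
print(ndcg_at_k(recommended, relevant, k=4))
print(rmse([4, 3], [3, 5]), mae([4, 3], [3, 5]))
```

With k = 4, two of the four recommended items are relevant, so precision is 0.5 and recall is 2/3. In a benchmark these per-user values would be averaged over all users in the test set.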

## Global settings

In [1]:
import warnings
warnings.filterwarnings("ignore")
import logging
logging.basicConfig(level=logging.ERROR) 

In [2]:
import os
import sys
import json
import pandas as pd
import numpy as np
import seaborn as sns
import pyspark
import torch
import fastai
import tensorflow as tf
tf.get_logger().setLevel('ERROR') # only show error messages
import surprise
import cornac

from recommenders.utils.spark_utils import start_or_get_spark
from recommenders.utils.general_utils import get_number_processors
from recommenders.utils.gpu_utils import get_cuda_version, get_cudnn_version
from recommenders.datasets import movielens
from recommenders.datasets.python_splitters import python_stratified_split
from recommenders.models.fastai.fastai_utils import hide_fastai_progress_bar

from benchmark_utils import * 

print(f"System version: {sys.version}")
print(f"NumPy version: {np.__version__}")
print(f"Pandas version: {pd.__version__}")
print(f"PySpark version: {pyspark.__version__}")
print(f"Surprise version: {surprise.__version__}")
print(f"PyTorch version: {torch.__version__}")
print(f"Fast AI version: {fastai.__version__}")
print(f"Cornac version: {cornac.__version__}")
print(f"TensorFlow version: {tf.__version__}")
print(f"CUDA version: {get_cuda_version()}")
print(f"CuDNN version: {get_cudnn_version()}")
print(f"Number of cores: {get_number_processors()}")

%load_ext autoreload
%autoreload 2

System version: 3.7.13 (default, Mar 29 2022, 02:18:16) 
[GCC 7.5.0]
NumPy version: 1.21.6
Pandas version: 1.3.5
PySpark version: 3.2.2
Surprise version: 1.1.1
PyTorch version: 1.12.1+cu102
Fast AI version: 1.0.61
Cornac version: 1.14.2
TensorFlow version: 2.7.4
CUDA version: 10.2
CuDNN version: 7605
Number of cores: 6


In [3]:
spark = start_or_get_spark("PySpark", memory="32g")
spark.conf.set("spark.sql.analyzer.failAmbiguousSelfJoin", "false")

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
22/10/04 09:45:39 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable


In [4]:
# Hide fastai progress bar
hide_fastai_progress_bar()

In [5]:
# Fix random seeds to make sure our runs are reproducible
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed_all(SEED)

## Parameters

In [6]:
environments = {
    "als": "pyspark",
    "sar": "python_cpu",
    "svd": "python_cpu",
    "fastai": "python_gpu",
    "ncf": "python_gpu",
    "bpr": "python_cpu",
    "bivae": "python_gpu",
    "lightgcn": "python_gpu",
}

metrics = {
    "als": ["rating", "ranking"],
    "sar": ["ranking"],
    "svd": ["rating", "ranking"],
    "fastai": ["rating", "ranking"],
    "ncf": ["ranking"],
    "bpr": ["ranking"],
    "bivae": ["ranking"],
    "lightgcn": ["ranking"]
}

Algorithm parameters

In [7]:
als_params = {
    "rank": 10,
    "maxIter": 20,
    "implicitPrefs": False,
    "alpha": 0.1,
    "regParam": 0.05,
    "coldStartStrategy": "drop",
    "nonnegative": False,
    "userCol": DEFAULT_USER_COL,
    "itemCol": DEFAULT_ITEM_COL,
    "ratingCol": DEFAULT_RATING_COL,
}

sar_params = {
    "similarity_type": "jaccard",
    "time_decay_coefficient": 30,
    "time_now": None,
    "timedecay_formula": True,
    "col_user": DEFAULT_USER_COL,
    "col_item": DEFAULT_ITEM_COL,
    "col_rating": DEFAULT_RATING_COL,
    "col_timestamp": DEFAULT_TIMESTAMP_COL,
}

svd_params = {
    "n_factors": 150,
    "n_epochs": 15,
    "lr_all": 0.005,
    "reg_all": 0.02,
    "random_state": SEED,
    "verbose": False
}

fastai_params = {
    "n_factors": 40, 
    "y_range": [0,5.5], 
    "wd": 1e-1,
    "max_lr": 5e-3,
    "epochs": 15
}

ncf_params = {
    "model_type": "NeuMF",
    "n_factors": 4,
    "layer_sizes": [16, 8, 4],
    "n_epochs": 15,
    "batch_size": 1024,
    "learning_rate": 1e-3,
    "verbose": 10
}

bpr_params = {
    "k": 200,
    "max_iter": 200,
    "learning_rate": 0.01,
    "lambda_reg": 1e-3,
    "seed": SEED,
    "verbose": False
}

bivae_params = {
    "k": 100,
    "encoder_structure": [200],
    "act_fn": "tanh",
    "likelihood": "pois",
    "n_epochs": 500,
    "batch_size": 1024,
    "learning_rate": 0.001,
    "seed": SEED,
    "use_gpu": True,
    "verbose": False
}

lightgcn_param = {
    "yaml_file": os.path.join("..","..","recommenders", "models", "deeprec", "config", "lightgcn.yaml"),
    "n_layers": 3,
    "batch_size": 1024,
    "epochs": 15,
    "learning_rate": 0.005,
    "eval_epoch": 5,
    "top_k": DEFAULT_K,
}

params = {
    "als": als_params,
    "sar": sar_params,
    "svd": svd_params,
    "fastai": fastai_params,
    "ncf": ncf_params,
    "bpr": bpr_params,
    "bivae": bivae_params,
    "lightgcn": lightgcn_param,
}

In [8]:
prepare_training_data = {
    "als": prepare_training_als,
    "sar": prepare_training_sar,
    "svd": prepare_training_svd,
    "fastai": prepare_training_fastai,
    "ncf": prepare_training_ncf,
    "bpr": prepare_training_cornac,
    "bivae": prepare_training_cornac,
    "lightgcn": prepare_training_lightgcn,
}

In [9]:
prepare_metrics_data = {
    "als": lambda train, test: prepare_metrics_als(train, test),
    "fastai": lambda train, test: prepare_metrics_fastai(train, test),    
}
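
Only `als` and `fastai` have an entry in `prepare_metrics_data`. The benchmark loop looks up the algorithm with `dict.get` and passes an identity lambda as the default, so every other algorithm receives the train and test sets unchanged. A minimal sketch of that fallback pattern, using a hypothetical converter (`to_tagged`) and toy inputs rather than the real `prepare_metrics_*` helpers:

```python
def to_tagged(train, test):
    # Hypothetical stand-in for a framework-specific conversion step
    return ("converted", train), ("converted", test)


# Dispatch table with an entry for only one algorithm
prepare_demo = {"als": to_tagged}


def prepare(algo, train, test):
    # Fall back to the identity function when the algorithm has no entry
    return prepare_demo.get(algo, lambda x, y: (x, y))(train, test)


print(prepare("als", "train_df", "test_df"))  # goes through the converter
print(prepare("sar", "train_df", "test_df"))  # returned unchanged
```

The design keeps the loop body free of per-algorithm `if` branches: adding an algorithm that needs special handling only requires a new dictionary entry.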

In [10]:
trainer = {
    "als": lambda params, data: train_als(params, data),
    "svd": lambda params, data: train_svd(params, data),
    "sar": lambda params, data: train_sar(params, data), 
    "fastai": lambda params, data: train_fastai(params, data),
    "ncf": lambda params, data: train_ncf(params, data),
    "bpr": lambda params, data: train_bpr(params, data),
    "bivae": lambda params, data: train_bivae(params, data),
    "lightgcn": lambda params, data: train_lightgcn(params, data),
}

In [11]:
rating_predictor = {
    "als": lambda model, test: predict_als(model, test),
    "svd": lambda model, test: predict_svd(model, test),
    "fastai": lambda model, test: predict_fastai(model, test),
}

In [12]:
ranking_predictor = {
    "als": lambda model, test, train: recommend_k_als(model, test, train),
    "sar": lambda model, test, train: recommend_k_sar(model, test, train),
    "svd": lambda model, test, train: recommend_k_svd(model, test, train),
    "fastai": lambda model, test, train: recommend_k_fastai(model, test, train),
    "ncf": lambda model, test, train: recommend_k_ncf(model, test, train),
    "bpr": lambda model, test, train: recommend_k_cornac(model, test, train),
    "bivae": lambda model, test, train: recommend_k_cornac(model, test, train),
    "lightgcn": lambda model, test, train: recommend_k_lightgcn(model, test, train),
}

In [13]:
rating_evaluator = {
    "als": lambda test, predictions: rating_metrics_pyspark(test, predictions),
    "svd": lambda test, predictions: rating_metrics_python(test, predictions),
    "fastai": lambda test, predictions: rating_metrics_python(test, predictions)
}
    
    
ranking_evaluator = {
    "als": lambda test, predictions, k: ranking_metrics_pyspark(test, predictions, k),
    "sar": lambda test, predictions, k: ranking_metrics_python(test, predictions, k),
    "svd": lambda test, predictions, k: ranking_metrics_python(test, predictions, k),
    "fastai": lambda test, predictions, k: ranking_metrics_python(test, predictions, k),
    "ncf": lambda test, predictions, k: ranking_metrics_python(test, predictions, k),
    "bpr": lambda test, predictions, k: ranking_metrics_python(test, predictions, k),
    "bivae": lambda test, predictions, k: ranking_metrics_python(test, predictions, k),
    "lightgcn": lambda test, predictions, k: ranking_metrics_python(test, predictions, k),
}

In [14]:
def generate_summary(data, algo, k, train_time, time_rating, rating_metrics, time_ranking, ranking_metrics):
    summary = {"Data": data, "Algo": algo, "K": k, "Train time (s)": train_time, "Predicting time (s)": time_rating, "Recommending time (s)": time_ranking}
    if rating_metrics is None:
        rating_metrics = {
            "RMSE": np.nan,
            "MAE": np.nan,
            "R2": np.nan,
            "Explained Variance": np.nan,
        }
    if ranking_metrics is None:
        ranking_metrics = {
            "MAP": np.nan,
            "nDCG@k": np.nan,
            "Precision@k": np.nan,
            "Recall@k": np.nan,
        }
    summary.update(rating_metrics)
    summary.update(ranking_metrics)
    return summary
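
To illustrate what `generate_summary` produces for an algorithm that reports only one metric group, here is a condensed sketch of the same NaN-filling idea. The helper name `summarize_run` and the metric values are hypothetical and exist only for this example.

```python
import math

RATING_KEYS = ["RMSE", "MAE", "R2", "Explained Variance"]
RANKING_KEYS = ["MAP", "nDCG@k", "Precision@k", "Recall@k"]


def summarize_run(algo, rating_metrics=None, ranking_metrics=None):
    # Condensed, hypothetical variant of generate_summary:
    # a metric group that was not computed is filled with NaN
    if rating_metrics is None:
        rating_metrics = dict.fromkeys(RATING_KEYS, math.nan)
    if ranking_metrics is None:
        ranking_metrics = dict.fromkeys(RANKING_KEYS, math.nan)
    row = {"Algo": algo}
    row.update(rating_metrics)
    row.update(ranking_metrics)
    return row


# Made-up ranking values for a ranking-only algorithm
row = summarize_run(
    "sar",
    ranking_metrics={"MAP": 0.1, "nDCG@k": 0.3, "Precision@k": 0.2, "Recall@k": 0.15},
)
print(row)
```

Because every row carries the same keys, results from rating-only, ranking-only, and mixed algorithms can share one results table, with NaN marking metrics that do not apply.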

## Benchmark loop

In [15]:
data_sizes = ["100k"]#, "1m"] # Movielens data size: 100k, 1m, 10m, or 20m
algorithms = ["als", "svd", "sar", "ncf", "fastai", "bpr", "bivae", "lightgcn"]
#algorithms = ["als", "svd", "sar", "fastai", "bpr", "bivae"]
algorithms = ["lightgcn"]

In [16]:
%%time

# For each data size and each algorithm, a recommender is evaluated. 
cols = ["Data", "Algo", "K", "Train time (s)", "Predicting time (s)", "RMSE", "MAE", "R2", "Explained Variance", "Recommending time (s)", "MAP", "nDCG@k", "Precision@k", "Recall@k"]
df_results = pd.DataFrame(columns=cols)

for data_size in data_sizes:
    # Load the dataset
    df = movielens.load_pandas_df(
        size=data_size,
        header=[DEFAULT_USER_COL, DEFAULT_ITEM_COL, DEFAULT_RATING_COL, DEFAULT_TIMESTAMP_COL]
    )
    print("Size of Movielens {}: {}".format(data_size, df.shape))
    
    # Split the dataset
    df_train, df_test = python_stratified_split(df,
                                                ratio=0.75, 
                                                min_rating=1, 
                                                filter_by="item", 
                                                col_user=DEFAULT_USER_COL, 
                                                col_item=DEFAULT_ITEM_COL
                                                )
   
    # Loop through the algos
    for algo in algorithms:
        print(f"\nComputing {algo} algorithm on Movielens {data_size}")
          
        # Data prep for training set
        train = prepare_training_data.get(algo, lambda x,y:(x,y))(df_train, df_test)
        
        # Get model parameters
        model_params = params[algo]
          
        # Train the model
        model, time_train = trainer[algo](model_params, train)
        print(f"Training time: {time_train}s")
                
        # Predict and evaluate
        train, test = prepare_metrics_data.get(algo, lambda x,y:(x,y))(df_train, df_test)
        
        if "rating" in metrics[algo]:   
            # Predict for rating
            preds, time_rating = rating_predictor[algo](model, test)
            print(f"Rating prediction time: {time_rating}s")
            
            # Evaluate for rating
            ratings = rating_evaluator[algo](test, preds)
        else:
            ratings = None
            time_rating = np.nan
        
        if "ranking" in metrics[algo]:
            # Predict for ranking
            top_k_scores, time_ranking = ranking_predictor[algo](model, test, train)
            print(f"Ranking prediction time: {time_ranking}s")
            
            # Evaluate for ranking
            rankings = ranking_evaluator[algo](test, top_k_scores, DEFAULT_K)
        else:
            rankings = None
            time_ranking = np.nan
            
        # Record results
        summary = generate_summary(data_size, algo, DEFAULT_K, time_train, time_rating, ratings, time_ranking, rankings)
        df_results.loc[df_results.shape[0] + 1] = summary
        
print("\nComputation finished")


100%|██████████| 4.81k/4.81k [00:00<00:00, 10.3kKB/s]


Size of Movielens 100k: (100000, 4)

Computing lightgcn algorithm on Movielens 100k
Already create adjacency matrix.
Already normalize adjacency matrix.
Using xavier initialization.
Epoch 1 (train)1.0s: train loss = 0.47360 = (mf)0.47336 + (embed)0.00024
Epoch 2 (train)0.8s: train loss = 0.29019 = (mf)0.28956 + (embed)0.00063
Epoch 3 (train)0.8s: train loss = 0.25493 = (mf)0.25413 + (embed)0.00079
Epoch 4 (train)0.9s: train loss = 0.23511 = (mf)0.23413 + (embed)0.00098


InternalError: 2 root error(s) found.
  (0) INTERNAL: Failed initializing math mode
	 [[node MatMul
 (defined at /home/hoaphumanoid/notebooks/repos/recommenders/recommenders/models/deeprec/models/graphrec/lightgcn.py:102)
]]
	 [[MatMul/_49]]
  (1) INTERNAL: Failed initializing math mode
	 [[node MatMul
 (defined at /home/hoaphumanoid/notebooks/repos/recommenders/recommenders/models/deeprec/models/graphrec/lightgcn.py:102)
]]
0 successful operations.
0 derived errors ignored.

Errors may have originated from an input operation.
Input Source operations connected to node MatMul:
In[0] embedding_lookup/Identity (defined at /home/hoaphumanoid/notebooks/repos/recommenders/recommenders/models/deeprec/models/graphrec/lightgcn.py:80)	
In[1] embedding_lookup_1/Identity (defined at /home/hoaphumanoid/notebooks/repos/recommenders/recommenders/models/deeprec/models/graphrec/lightgcn.py:83)

Operation defined at: (most recent call last)
>>>   File "/anaconda/envs/reco/lib/python3.7/runpy.py", line 193, in _run_module_as_main
>>>     "__main__", mod_spec)
>>> 
>>>   File "/anaconda/envs/reco/lib/python3.7/runpy.py", line 85, in _run_code
>>>     exec(code, run_globals)
>>> 
>>>   File "/anaconda/envs/reco/lib/python3.7/site-packages/ipykernel_launcher.py", line 17, in <module>
>>>     app.launch_new_instance()
>>> 
>>>   File "/anaconda/envs/reco/lib/python3.7/site-packages/traitlets/config/application.py", line 978, in launch_instance
>>>     app.start()
>>> 
>>>   File "/anaconda/envs/reco/lib/python3.7/site-packages/ipykernel/kernelapp.py", line 712, in start
>>>     self.io_loop.start()
>>> 
>>>   File "/anaconda/envs/reco/lib/python3.7/site-packages/tornado/platform/asyncio.py", line 199, in start
>>>     self.asyncio_loop.run_forever()
>>> 
>>>   File "/anaconda/envs/reco/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
>>>     self._run_once()
>>> 
>>>   File "/anaconda/envs/reco/lib/python3.7/asyncio/base_events.py", line 1786, in _run_once
>>>     handle._run()
>>> 
>>>   File "/anaconda/envs/reco/lib/python3.7/asyncio/events.py", line 88, in _run
>>>     self._context.run(self._callback, *self._args)
>>> 
>>>   File "/anaconda/envs/reco/lib/python3.7/site-packages/ipykernel/kernelbase.py", line 510, in dispatch_queue
>>>     await self.process_one()
>>> 
>>>   File "/anaconda/envs/reco/lib/python3.7/site-packages/ipykernel/kernelbase.py", line 499, in process_one
>>>     await dispatch(*args)
>>> 
>>>   File "/anaconda/envs/reco/lib/python3.7/site-packages/ipykernel/kernelbase.py", line 406, in dispatch_shell
>>>     await result
>>> 
>>>   File "/anaconda/envs/reco/lib/python3.7/site-packages/ipykernel/kernelbase.py", line 730, in execute_request
>>>     reply_content = await reply_content
>>> 
>>>   File "/anaconda/envs/reco/lib/python3.7/site-packages/ipykernel/ipkernel.py", line 387, in do_execute
>>>     cell_id=cell_id,
>>> 
>>>   File "/anaconda/envs/reco/lib/python3.7/site-packages/ipykernel/zmqshell.py", line 528, in run_cell
>>>     return super().run_cell(*args, **kwargs)
>>> 
>>>   File "/anaconda/envs/reco/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 2976, in run_cell
>>>     raw_cell, store_history, silent, shell_futures, cell_id
>>> 
>>>   File "/anaconda/envs/reco/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3030, in _run_cell
>>>     return runner(coro)
>>> 
>>>   File "/anaconda/envs/reco/lib/python3.7/site-packages/IPython/core/async_helpers.py", line 78, in _pseudo_sync_runner
>>>     coro.send(None)
>>> 
>>>   File "/anaconda/envs/reco/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3258, in run_cell_async
>>>     interactivity=interactivity, compiler=compiler, result=result)
>>> 
>>>   File "/anaconda/envs/reco/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3473, in run_ast_nodes
>>>     if (await self.run_code(code, result,  async_=asy)):
>>> 
>>>   File "/anaconda/envs/reco/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3553, in run_code
>>>     exec(code_obj, self.user_global_ns, self.user_ns)
>>> 
>>>   File "/tmp/ipykernel_24736/1409672059.py", line 1, in <module>
>>>     get_ipython().run_cell_magic('time', '', '\n# For each data size and each algorithm, a recommender is evaluated. \ncols = ["Data", "Algo", "K", "Train time (s)", "Predicting time (s)", "RMSE", "MAE", "R2", "Explained Variance", "Recommending time (s)", "MAP", "nDCG@k", "Precision@k", "Recall@k"]\ndf_results = pd.DataFrame(columns=cols)\n\nfor data_size in data_sizes:\n    # Load the dataset\n    df = movielens.load_pandas_df(\n        size=data_size,\n        header=[DEFAULT_USER_COL, DEFAULT_ITEM_COL, DEFAULT_RATING_COL, DEFAULT_TIMESTAMP_COL]\n    )\n    print("Size of Movielens {}: {}".format(data_size, df.shape))\n    \n    # Split the dataset\n    df_train, df_test = python_stratified_split(df,\n                                                ratio=0.75, \n                                                min_rating=1, \n                                                filter_by="item", \n                                                col_user=DEFAULT_USER_COL, \n                                                col_item=DEFAULT_ITEM_COL\n                                                )\n   \n    # Loop through the algos\n    for algo in algorithms:\n        print(f"\\nComputing {algo} algorithm on Movielens {data_size}")\n          \n        # Data prep for training set\n        train = prepare_training_data.get(algo, lambda x,y:(x,y))(df_train, df_test)\n        \n        # Get model parameters\n        model_params = params[algo]\n          \n        # Train the model\n        model, time_train = trainer[algo](model_params, train)\n        print(f"Training time: {time_train}s")\n                \n        # Predict and evaluate\n        train, test = prepare_metrics_data.get(algo, lambda x,y:(x,y))(df_train, df_test)\n        \n        if "rating" in metrics[algo]:   \n            # Predict for rating\n            preds, time_rating = rating_predictor[algo](model, test)\n            print(f"Rating prediction time: {time_rating}s")\n            \n        
    # Evaluate for rating\n            ratings = rating_evaluator[algo](test, preds)\n        else:\n            ratings = None\n            time_rating = np.nan\n        \n        if "ranking" in metrics[algo]:\n            # Predict for ranking\n            top_k_scores, time_ranking = ranking_predictor[algo](model, test, train)\n            print(f"Ranking prediction time: {time_ranking}s")\n            \n            # Evaluate for rating\n            rankings = ranking_evaluator[algo](test, top_k_scores, DEFAULT_K)\n        else:\n            rankings = None\n            time_ranking = np.nan\n            \n        # Record results\n        summary = generate_summary(data_size, algo, DEFAULT_K, time_train, time_rating, ratings, time_ranking, rankings)\n        df_results.loc[df_results.shape[0] + 1] = summary\n        \nprint("\\nComputation finished")\n')
>>> 
>>>   File "/anaconda/envs/reco/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 2473, in run_cell_magic
>>>     result = fn(*args, **kwargs)
>>> 
>>>   File "/anaconda/envs/reco/lib/python3.7/site-packages/decorator.py", line 232, in fun
>>>     return caller(func, *(extras + args), **kw)
>>> 
>>>   File "/anaconda/envs/reco/lib/python3.7/site-packages/IPython/core/magic.py", line 187, in <lambda>
>>>     call = lambda f, *a, **k: f(*a, **k)
>>> 
>>>   File "/anaconda/envs/reco/lib/python3.7/site-packages/IPython/core/magics/execution.py", line 1335, in time
>>>     exec(code, glob, local_ns)
>>> 
>>>   File "<timed exec>", line 33, in <module>
>>> 
>>>   File "/tmp/ipykernel_24736/1119464800.py", line 9, in <lambda>
>>>     "lightgcn": lambda params, data: train_lightgcn(params, data),
>>> 
>>>   File "/home/hoaphumanoid/notebooks/repos/recommenders/examples/06_benchmarks/benchmark_utils.py", line 347, in train_lightgcn
>>>     model = LightGCN(hparams, data)
>>> 
>>>   File "/home/hoaphumanoid/notebooks/repos/recommenders/recommenders/models/deeprec/models/graphrec/lightgcn.py", line 102, in __init__
>>>     transpose_b=True,
>>> 

Original stack trace for 'MatMul':
  File "/anaconda/envs/reco/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/anaconda/envs/reco/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/anaconda/envs/reco/lib/python3.7/site-packages/ipykernel_launcher.py", line 17, in <module>
    app.launch_new_instance()
  File "/anaconda/envs/reco/lib/python3.7/site-packages/traitlets/config/application.py", line 978, in launch_instance
    app.start()
  File "/anaconda/envs/reco/lib/python3.7/site-packages/ipykernel/kernelapp.py", line 712, in start
    self.io_loop.start()
  File "/anaconda/envs/reco/lib/python3.7/site-packages/tornado/platform/asyncio.py", line 199, in start
    self.asyncio_loop.run_forever()
  File "/anaconda/envs/reco/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
    self._run_once()
  File "/anaconda/envs/reco/lib/python3.7/asyncio/base_events.py", line 1786, in _run_once
    handle._run()
  File "/anaconda/envs/reco/lib/python3.7/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
  File "/anaconda/envs/reco/lib/python3.7/site-packages/ipykernel/kernelbase.py", line 510, in dispatch_queue
    await self.process_one()
  File "/anaconda/envs/reco/lib/python3.7/site-packages/ipykernel/kernelbase.py", line 499, in process_one
    await dispatch(*args)
  File "/anaconda/envs/reco/lib/python3.7/site-packages/ipykernel/kernelbase.py", line 406, in dispatch_shell
    await result
  File "/anaconda/envs/reco/lib/python3.7/site-packages/ipykernel/kernelbase.py", line 730, in execute_request
    reply_content = await reply_content
  File "/anaconda/envs/reco/lib/python3.7/site-packages/ipykernel/ipkernel.py", line 387, in do_execute
    cell_id=cell_id,
  File "/anaconda/envs/reco/lib/python3.7/site-packages/ipykernel/zmqshell.py", line 528, in run_cell
    return super().run_cell(*args, **kwargs)
  File "/anaconda/envs/reco/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 2976, in run_cell
    raw_cell, store_history, silent, shell_futures, cell_id
  File "/anaconda/envs/reco/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3030, in _run_cell
    return runner(coro)
  File "/anaconda/envs/reco/lib/python3.7/site-packages/IPython/core/async_helpers.py", line 78, in _pseudo_sync_runner
    coro.send(None)
  File "/anaconda/envs/reco/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3258, in run_cell_async
    interactivity=interactivity, compiler=compiler, result=result)
  File "/anaconda/envs/reco/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3473, in run_ast_nodes
    if (await self.run_code(code, result,  async_=asy)):
  File "/anaconda/envs/reco/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3553, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "/tmp/ipykernel_24736/1409672059.py", line 1, in <module>
    get_ipython().run_cell_magic('time', '', '\n# For each data size and each algorithm, a recommender is evaluated. \ncols = ["Data", "Algo", "K", "Train time (s)", "Predicting time (s)", "RMSE", "MAE", "R2", "Explained Variance", "Recommending time (s)", "MAP", "nDCG@k", "Precision@k", "Recall@k"]\ndf_results = pd.DataFrame(columns=cols)\n\nfor data_size in data_sizes:\n    # Load the dataset\n    df = movielens.load_pandas_df(\n        size=data_size,\n        header=[DEFAULT_USER_COL, DEFAULT_ITEM_COL, DEFAULT_RATING_COL, DEFAULT_TIMESTAMP_COL]\n    )\n    print("Size of Movielens {}: {}".format(data_size, df.shape))\n    \n    # Split the dataset\n    df_train, df_test = python_stratified_split(df,\n                                                ratio=0.75, \n                                                min_rating=1, \n                                                filter_by="item", \n                                                col_user=DEFAULT_USER_COL, \n                                                col_item=DEFAULT_ITEM_COL\n                                                )\n   \n    # Loop through the algos\n    for algo in algorithms:\n        print(f"\\nComputing {algo} algorithm on Movielens {data_size}")\n          \n        # Data prep for training set\n        train = prepare_training_data.get(algo, lambda x,y:(x,y))(df_train, df_test)\n        \n        # Get model parameters\n        model_params = params[algo]\n          \n        # Train the model\n        model, time_train = trainer[algo](model_params, train)\n        print(f"Training time: {time_train}s")\n                \n        # Predict and evaluate\n        train, test = prepare_metrics_data.get(algo, lambda x,y:(x,y))(df_train, df_test)\n        \n        if "rating" in metrics[algo]:   \n            # Predict for rating\n            preds, time_rating = rating_predictor[algo](model, test)\n            print(f"Rating prediction time: {time_rating}s")\n            \n            
# Evaluate for rating\n            ratings = rating_evaluator[algo](test, preds)\n        else:\n            ratings = None\n            time_rating = np.nan\n        \n        if "ranking" in metrics[algo]:\n            # Predict for ranking\n            top_k_scores, time_ranking = ranking_predictor[algo](model, test, train)\n            print(f"Ranking prediction time: {time_ranking}s")\n            \n            # Evaluate for rating\n            rankings = ranking_evaluator[algo](test, top_k_scores, DEFAULT_K)\n        else:\n            rankings = None\n            time_ranking = np.nan\n            \n        # Record results\n        summary = generate_summary(data_size, algo, DEFAULT_K, time_train, time_rating, ratings, time_ranking, rankings)\n        df_results.loc[df_results.shape[0] + 1] = summary\n        \nprint("\\nComputation finished")\n')
  File "/anaconda/envs/reco/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 2473, in run_cell_magic
    result = fn(*args, **kwargs)
  File "/anaconda/envs/reco/lib/python3.7/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/anaconda/envs/reco/lib/python3.7/site-packages/IPython/core/magic.py", line 187, in <lambda>
    call = lambda f, *a, **k: f(*a, **k)
  File "/anaconda/envs/reco/lib/python3.7/site-packages/IPython/core/magics/execution.py", line 1335, in time
    exec(code, glob, local_ns)
  File "<timed exec>", line 33, in <module>
  File "/tmp/ipykernel_24736/1119464800.py", line 9, in <lambda>
    "lightgcn": lambda params, data: train_lightgcn(params, data),
  File "/home/hoaphumanoid/notebooks/repos/recommenders/examples/06_benchmarks/benchmark_utils.py", line 347, in train_lightgcn
    model = LightGCN(hparams, data)
  File "/home/hoaphumanoid/notebooks/repos/recommenders/recommenders/models/deeprec/models/graphrec/lightgcn.py", line 102, in __init__
    transpose_b=True,
  File "/anaconda/envs/reco/lib/python3.7/site-packages/tensorflow/python/util/traceback_utils.py", line 150, in error_handler
    return fn(*args, **kwargs)
  File "/anaconda/envs/reco/lib/python3.7/site-packages/tensorflow/python/util/dispatch.py", line 1096, in op_dispatch_handler
    return dispatch_target(*args, **kwargs)
  File "/anaconda/envs/reco/lib/python3.7/site-packages/tensorflow/python/ops/math_ops.py", line 3701, in matmul
    a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
  File "/anaconda/envs/reco/lib/python3.7/site-packages/tensorflow/python/ops/gen_math_ops.py", line 6036, in mat_mul
    name=name)
  File "/anaconda/envs/reco/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py", line 746, in _apply_op_helper
    attrs=attr_protos, op_def=op_def)
  File "/anaconda/envs/reco/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3705, in _create_op_internal
    op_def=op_def)
  File "/anaconda/envs/reco/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 2101, in __init__
    self._traceback = tf_stack.extract_stack_for_node(self._c_op)
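
The traceback above shows one algorithm (`lightgcn`) failing while its model is being built inside the benchmark loop. If a failure like this interrupts the loop, one option is to wrap each training call and record the error so the remaining algorithms still run. The following is a minimal sketch under stated assumptions, not the notebook's actual code: `run_benchmark`, `fake_trainers`, `_train_ok` and `_train_broken` are hypothetical names introduced only for illustration, and the toy training functions stand in for the real trainers.

```python
import time


def run_benchmark(trainers, params, data):
    """Train every algorithm, recording failures instead of stopping.

    `trainers` maps an algorithm name to a callable taking (params, data).
    All names in this sketch are hypothetical.
    """
    results = {}
    for algo, train_fn in trainers.items():
        start = time.perf_counter()
        try:
            model = train_fn(params.get(algo, {}), data)
            results[algo] = {"status": "ok", "model": model, "error": None}
        except Exception as exc:
            # Broad on purpose: any framework error is recorded, not re-raised.
            results[algo] = {"status": "failed", "model": None, "error": repr(exc)}
        # Wall-clock time spent in the training call, whether it succeeded or not.
        results[algo]["train_time_s"] = time.perf_counter() - start
    return results


def _train_ok(params, data):
    # Stand-in for a trainer that succeeds.
    return {"n_rows": len(data)}


def _train_broken(params, data):
    # Stand-in for a trainer that fails during model construction.
    raise RuntimeError("simulated failure while building the model")


fake_trainers = {"algo_a": _train_ok, "algo_b": _train_broken, "algo_c": _train_ok}
outcome = run_benchmark(fake_trainers, params={}, data=[1, 2, 3])
for name, info in outcome.items():
    print(name, info["status"], info["error"])
```

With this pattern, a failed algorithm could be written to the results table with `NaN` metrics, so the run still reports every algorithm that did complete.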


## Results

In [17]:
df_results

Unnamed: 0,Data,Algo,K,Train time (s),Predicting time (s),RMSE,MAE,R2,Explained Variance,Recommending time (s),MAP,nDCG@k,Precision@k,Recall@k
