# Benchmarking Collaborative Filtering Recommendation Algorithms

This benchmark covers the collaborative filtering algorithms available in the Microsoft/Recommenders repository, such as Spark ALS, Surprise SVD, and Microsoft SAR.

## Experimentation setup
* Objective
  * To compare how well each collaborative filtering algorithm performs at recommending a list of items.
* Datasets
  * MovieLens 100K.
  * MovieLens 1M.
  * MovieLens 10M.
  * MovieLens 20M.
* Data split
  * The data is split into training, validation, and test sets.
  * The split ratios are 60-20-20 for the training, validation, and test sets, respectively.
  * The split is chronological and stratified: each user's ratings are ordered by timestamp and divided according to the split ratios, so the same set of users appears in the training, validation, and test sets.
  * Only users with more than 10 ratings are kept, so that every user has enough ratings to populate all three sets.
* Model training
  * A recommendation model is trained with each of the collaborative filtering algorithms.
  * Cross-validation is known to be tricky for validating recommendation models, so it is not used here. Instead, a sweep over a grid of hyperparameters selects the best model on the validation set. The evaluation metric of interest varies with the business scenario; this benchmark uses recall to select the best model.
  * The hyperparameter ranges are chosen empirically, which may affect the final evaluation results.
* Evaluation metrics
  * Precision@k.
  * Recall@k.
  * Normalized discounted cumulative gain@k (NDCG@k).
  * Mean average precision (MAP).
  * In the evaluation metrics above, k = 10. 
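For reference, the four ranking metrics listed above can be sketched in plain Python for a single user. This is a minimal illustration using common binary-relevance definitions; the function names are hypothetical, and `SparkRankingEvaluation` may differ in details such as how it decides which items count as relevant.

```python
import math

def precision_at_k(recommended, relevant, k=10):
    """Fraction of the top-k recommended items that are relevant."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k

def recall_at_k(recommended, relevant, k=10):
    """Fraction of the relevant items that appear in the top-k recommendations."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)

def ndcg_at_k(recommended, relevant, k=10):
    """Binary-relevance DCG of the top-k list divided by the DCG of an ideal list."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(recommended[:k])
        if item in relevant
    )
    ideal = sum(1.0 / math.log2(rank + 2) for rank in range(min(len(relevant), k)))
    return dcg / ideal if ideal > 0 else 0.0

def average_precision(recommended, relevant):
    """Sum of the precision at each hit, divided by the number of relevant items."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, item in enumerate(recommended, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)

# Toy example (made-up items): two of the four recommendations are relevant.
recommended = ["a", "b", "c", "d"]
relevant = {"a", "c", "x"}
print(precision_at_k(recommended, relevant, k=4))
print(recall_at_k(recommended, relevant, k=4))
print(ndcg_at_k(recommended, relevant, k=4))
print(average_precision(recommended, relevant))
```

MAP is then the mean of `average_precision` over all users, and the other three metrics are likewise averaged across users.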

## 0 Global settings

In [19]:
# set the environment path to find Recommenders
import sys
sys.path.append("../../")
import os
import numpy as np
from zipfile import ZipFile
import papermill as pm
import pyspark
from pyspark.ml.recommendation import ALS
import pyspark.sql.functions as F
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField
from pyspark.sql.types import StringType, FloatType, IntegerType

from reco_utils.dataset.url_utils import maybe_download
from reco_utils.dataset.movielens import load_spark_df
from reco_utils.dataset.spark_splitters import spark_chrono_split
from reco_utils.evaluation.spark_evaluation import SparkRankingEvaluation
from reco_utils.evaluation.parameter_sweep import generate_param_grid

print("System version: {}".format(sys.version))
print("Spark version: {}".format(pyspark.__version__))

System version: 3.6.0 | packaged by conda-forge | (default, Feb  9 2017, 14:36:55) 
[GCC 4.8.2 20140120 (Red Hat 4.8.2-15)]
Spark version: 2.2.1


In [3]:
# Configure Spark
spark = SparkSession \
    .builder \
    .appName("ALS pySpark") \
    .master("local[*]") \
    .config("spark.driver.memory", "2g")\
    .config("spark.executor.cores", "32")\
    .config("spark.executor.memory", "8g")\
    .config("spark.memory.fraction", "0.9")\
    .config("spark.memory.storageFraction", "0.3")\
    .config("spark.executor.instances", 1)\
    .config("spark.executor.heartbeatInterval", "36000s")\
    .config("spark.network.timeout", "10000000s")\
    .config("spark.driver.maxResultSize", "50g")\
    .getOrCreate()

In [32]:
# top k items to recommend
TOP_K = 10

# Select Movielens data size: 100k, 1m, 10m, or 20m
MOVIELENS_DATA_SIZE = '100k'

# Metric for selection
SELECTION_METRIC = "recall"

In [31]:
# Helper functions
def _find_key_value(d, key):
    '''
    A private method to find value to a key in a nested dictionary. This is used to get values of metrics.
    It can also be exported as a general utility function if necessary.
    '''
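    if key in d:
        return d[key]
    for k, v in d.items():
        if isinstance(v, dict):
            # Recurse into nested dictionaries (plain function call; there is no `self` here)
            value = _find_key_value(v, key)
            if value is not None:
                return value
    # Key not found at any nesting level
    return None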

def _is_better(a, b, metric):
    '''
    Check if a is better than b measured by metric. 

    return: True if a is "better" than b.
    '''
    # Error metrics: lower is better
    if metric in ["rmse"]:
        return a < b
    # Goodness-of-fit and ranking metrics: higher is better
    elif metric in ["rsquared", "map", "ndcg", "recall", "precision", "diversity"]:
        return a > b
    else:
        raise ValueError("Metric {} not recognised.".format(metric))

## 1 Prepare data

In [11]:
# Set data schema
headers = {
    "col_user": "UserId",
    "col_item": "MovieId",
    "col_timestamp": "Timestamp"
}

In [4]:
# Download data
dfs = load_spark_df(spark=spark, size=MOVIELENS_DATA_SIZE)

In [12]:
# Split data according to the experimentation protocol (chronological, per user, 60-20-20).
dfs_train, dfs_valid, dfs_test = spark_chrono_split(
    dfs,
    filter_by="user", 
    min_rating=10,
    ratio=[0.6, 0.2, 0.2],
    **headers
)
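To make the chronological, stratified split concrete, here is a minimal pure-Python sketch of the idea. It is an illustration only, not the implementation of `spark_chrono_split`; the function name `chrono_split_per_user` and the toy ratings are hypothetical, and the rounding of split boundaries is an assumption.

```python
from collections import defaultdict

def chrono_split_per_user(ratings, ratios=(0.6, 0.2, 0.2)):
    """Split (user, item, rating, timestamp) tuples per user, oldest ratings first."""
    by_user = defaultdict(list)
    for row in ratings:
        by_user[row[0]].append(row)

    splits = [[] for _ in ratios]
    for rows in by_user.values():
        rows.sort(key=lambda r: r[3])  # chronological order within the user
        n, start, cumulative = len(rows), 0, 0.0
        for i, ratio in enumerate(ratios):
            cumulative += ratio
            # The last part takes whatever remains, so no rating is dropped.
            end = n if i == len(ratios) - 1 else round(n * cumulative)
            splits[i].extend(rows[start:end])
            start = end
    return splits

# Toy data: two users with 10 ratings each; a smaller item id means a later timestamp.
ratings = [(user, item, 4.0, 100 - item) for user in ("u1", "u2") for item in range(10)]
train, valid, test = chrono_split_per_user(ratings)
print(len(train), len(valid), len(test))  # 12 4 4
```

Because the split is applied within each user, every user contributes their oldest ratings to the training set and their newest ratings to the test set, which is what keeps the same users present in all three sets.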

## 2 Train model

In [13]:
cf_algorithms = ["als", "sar", "svd"]

In [25]:
cf_params = {
    "als": {
        "rank": [10, 15],
        "regParam": 0.01,
        "alpha": 0.1
    },
    "sar": {
        "time_decay": [10, 20, 30],
        "similarity": ["jaccard", "cosine"]
    },
    "svd": {
        "rank": [40, 60, 80]
    }
}

In [26]:
algo = cf_algorithms[0]

In [27]:
algo_params = cf_params[algo]

In [28]:
params = generate_param_grid(algo_params)

In [30]:
params

[{'rank': 10, 'regParam': 0.01, 'alpha': 0.1},
 {'rank': 15, 'regParam': 0.01, 'alpha': 0.1}]
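The output above is the Cartesian product of the candidate values, with scalar entries treated as single-value lists. A minimal sketch of that expansion follows; `expand_param_grid` is a hypothetical stand-in and is not claimed to be how `generate_param_grid` is implemented.

```python
from itertools import product

def expand_param_grid(params):
    """Expand a dict of candidate values into a list of parameter combinations."""
    names = list(params)
    # Wrap scalars so that every parameter has a list of candidate values.
    value_lists = [v if isinstance(v, list) else [v] for v in params.values()]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]

als_grid = expand_param_grid({"rank": [10, 15], "regParam": 0.01, "alpha": 0.1})
sar_grid = expand_param_grid(
    {"time_decay": [10, 20, 30], "similarity": ["jaccard", "cosine"]}
)
print(als_grid)
print(len(sar_grid))  # 3 time_decay values x 2 similarity values = 6 combinations
```

The grid size grows multiplicatively with each swept parameter, which is one reason the hyperparameter ranges in this benchmark are kept small.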

In [33]:
import pandas as pd  # used for the final results table


def _recommend_top_k(model, dfs_users, k):
    '''
    Return the top-k recommendations for the users in dfs_users
    as (UserId, MovieId, prediction) rows.
    '''
    # ALSModel.recommendForUserSubset was added in Spark 2.3; on the Spark 2.2.1 session above it raises
    # "AttributeError: 'ALSModel' object has no attribute 'recommendForUserSubset'".
    # recommendForAllUsers (available since Spark 2.2) is used instead and restricted to the users of interest.
    users = dfs_users.select("UserId").distinct()
    dfs_rec = model.recommendForAllUsers(k).join(users, on="UserId")
    return dfs_rec.select("UserId", F.explode("recommendations").alias("r")) \
        .select("UserId", "r.MovieId", F.col("r.rating").alias("prediction"))


def _evaluate(dfs_true, dfs_pred, k):
    '''
    Compute the ranking metrics of the predictions against the true ratings.
    '''
    rank_eval = SparkRankingEvaluation(
        rating_true=dfs_true,
        rating_pred=dfs_pred,
        k=k,
        relevancy_method="top_k",
        col_user=headers["col_user"],
        col_item=headers["col_item"],
        col_rating="Rating",
        col_prediction="prediction"
    )
    return {
        "K": rank_eval.k,
        "map": rank_eval.map_at_k(),
        "ndcg": rank_eval.ndcg_at_k(),
        "precision": rank_eval.precision_at_k(),
        "recall": rank_eval.recall_at_k()
    }


param_list = []
metric_list = []
metric_best = None

# Sweep the hyperparameter grid and select the best model on the validation set
for idx, param in enumerate(params):
    als = ALS(
        maxIter=15,
        implicitPrefs=True,
        coldStartStrategy='drop',
        userCol="UserId",
        itemCol="MovieId",
        ratingCol="Rating",
        nonnegative=False,
        **param
    )

    model = als.fit(dfs_train)

    dfs_pred = _recommend_top_k(model, dfs_valid, TOP_K)
    results = _evaluate(dfs_valid, dfs_pred, TOP_K)

    param_list.append(param)
    metric_list.append(results)

    metric_current = results[SELECTION_METRIC]
    if metric_best is None or _is_better(metric_current, metric_best, SELECTION_METRIC):
        metric_best = metric_current
        results_best = results
        param_best = param
        rec_best = model
        idx_best = idx

# Evaluate the selected model once, on the held-out test set
dfs_pred = _recommend_top_k(rec_best, dfs_test, TOP_K)
results_final = _evaluate(dfs_test, dfs_pred, TOP_K)

df_results = pd.DataFrame(
    {
        "K": results_final["K"],
        "MAP": results_final["map"],
        "nDCG@k": results_final["ndcg"],
        "Precision@k": results_final["precision"],
        "Recall@k": results_final["recall"]
    },
    index=[0]
)