<i>Copyright (c) Microsoft Corporation. All rights reserved.<br>
Licensed under the MIT License.</i>
<br>
# Model Comparison between SVD and NCF Using the Neural Network Intelligence Toolkit

This notebook shows how to use the **[Neural Network Intelligence](https://nni.readthedocs.io/en/latest/) toolkit (NNI)** for tuning hyperparameters for the Neural Collaborative Filtering Model and Surprise SVD model.

To learn about the tuners NNI offers, see the [built-in tuner documentation](https://nni.readthedocs.io/en/latest/Tuner/BuiltinTuner.html). To see how each tuner performs on the Surprise SVD model, see [this notebook](./nni_surprise_svd.ipynb).

NNI is a toolkit that helps users design and tune machine learning models (e.g., hyperparameters), neural network architectures, or the parameters of complex systems in an efficient and automatic way. NNI has several appealing properties: ease of use, scalability, flexibility and efficiency. NNI can be executed in a distributed way on a local machine, a remote server, or a large-scale training platform such as OpenPAI or Kubernetes.

In this notebook, we see how NNI works with two different model types and how their hyperparameter search spaces, YAML config files, and training scripts differ.

- [Surprise SVD Training Script](../../reco_utils/tuning/nni/svd_training.py)
- [NCF Training Script](../../reco_utils/tuning/nni/ncf_training.py)

In all experiments, we maximize precision@10. 
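As a reminder of what is being optimized, precision@k is the fraction of a user's top-k recommended items that appear in that user's held-out set. The sketch below computes it for a single user in plain Python. The function name and the toy data are illustrative assumptions; this is not the `reco_utils` implementation, which works on DataFrames across all users.

```python
def precision_at_k_single_user(recommended, relevant, k=10):
    """Fraction of the top-k recommended items that are relevant."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k


# Toy data: ranked recommendations and the items the user actually interacted with
toy_recommended = [242, 302, 377, 51, 346, 10, 11, 12, 13, 14]
toy_relevant = {302, 346, 999}

toy_score = precision_at_k_single_user(toy_recommended, toy_relevant, k=10)
print(toy_score)  # 2 hits out of 10 recommendations
```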

For this notebook we use a _local machine_ as the training platform (this can be any machine running the `reco_base` conda environment). In this case, NNI uses the available processors of the machine to parallelize the trials, subject to the value of `trialConcurrency` we specify in the configuration. Our runs and the results we report were obtained on a [Standard_D16_v3 virtual machine](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-general#dv3-series-1) with 16 vcpus and 64 GB memory.

### 1. Global Settings

In [1]:
import sys
sys.path.append("../../")
import json
import os
import surprise
import papermill as pm
import pandas as pd
import shutil
import subprocess
import tensorflow as tf
import yaml
import pkg_resources
from tempfile import TemporaryDirectory

import reco_utils
from reco_utils.common.timer import Timer
from reco_utils.dataset import movielens
from reco_utils.dataset.python_splitters import python_chrono_split
from reco_utils.evaluation.python_evaluation import rmse, precision_at_k, ndcg_at_k
from reco_utils.tuning.nni.nni_utils import (check_experiment_status, check_stopped, check_metrics_written, get_trials,
                                      stop_nni, start_nni)
from reco_utils.recommender.ncf.dataset import Dataset as NCFDataset
from reco_utils.recommender.ncf.ncf_singlenode import NCF
from reco_utils.recommender.surprise.surprise_utils import predict, compute_ranking_predictions
# from reco_utils.evaluation.python_evaluation import (rmse, mae, rsquared, exp_var, map_at_k, ndcg_at_k, precision_at_k, 
#                                                      recall_at_k, get_top_k_items)
from reco_utils.common.constants import SEED as DEFAULT_SEED

print("System version: {}".format(sys.version))
print("Surprise version: {}".format(surprise.__version__))
print("NNI version: {}".format(pkg_resources.get_distribution("nni").version))

tmp_dir = TemporaryDirectory()

%load_ext autoreload
%autoreload 2

System version: 3.6.10 |Anaconda, Inc.| (default, Mar 25 2020, 23:51:54) 
[GCC 7.3.0]
Surprise version: 1.1.0
NNI version: 1.5


### 2. Prepare Dataset
1. Download data and split into training, validation and test sets
2. Store the data sets to a local directory.

In [2]:
# Parameters used by papermill
# Select Movielens data size: 100k, 1m
MOVIELENS_DATA_SIZE = '100k'
SURPRISE_READER = 'ml-100k'
TMP_DIR = tmp_dir.name
NUM_EPOCHS = 30
MAX_TRIAL_NUM = 30
# time (in seconds) to wait for each tuning experiment to complete
WAITING_TIME = 20
MAX_RETRIES = MAX_TRIAL_NUM*4 # it is recommended to have MAX_RETRIES>=4*MAX_TRIAL_NUM
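Assuming the NNI helper functions check every `wait` seconds for at most `max_retries` attempts, the settings above allow up to 20 × 120 = 2400 seconds (40 minutes) per experiment. The following is a generic sketch of such a polling loop. `poll_until` and its arguments are hypothetical names, and this is not the code in `nni_utils`.

```python
import time


def poll_until(condition, wait=20, max_retries=120, sleep=time.sleep):
    """Call condition() until it returns True or the retries run out.

    Returns the number of failed checks before success.
    """
    for attempt in range(max_retries):
        if condition():
            return attempt
        sleep(wait)
    raise TimeoutError("condition not met after {} retries".format(max_retries))


# Toy condition that becomes true on the third check; sleeping is stubbed out
toy_checks = iter([False, False, True])
attempts = poll_until(lambda: next(toy_checks), wait=20, max_retries=120,
                      sleep=lambda seconds: None)

# Upper bound on the total waiting time with the values used in this notebook
max_wait_seconds = 20 * 120
print(attempts, max_wait_seconds)
```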

In [3]:
# Note: The NCF model can incorporate
df = movielens.load_pandas_df(
    size=MOVIELENS_DATA_SIZE,
    header=["userID", "itemID", "rating", "timestamp"]
)

df.head()

100%|██████████| 4.81k/4.81k [00:00<00:00, 11.0kKB/s]


| | userID | itemID | rating | timestamp |
|---|---|---|---|---|
| 0 | 196 | 242 | 3.0 | 881250949 |
| 1 | 186 | 302 | 3.0 | 891717742 |
| 2 | 22 | 377 | 1.0 | 878887116 |
| 3 | 244 | 51 | 2.0 | 880606923 |
| 4 | 166 | 346 | 1.0 | 886397596 |


In [4]:
train, validation, test = python_chrono_split(df, [0.7, 0.15, 0.15])
train = train.drop(['timestamp'], axis=1)
validation = validation.drop(['timestamp'], axis=1)
test = test.drop(['timestamp'], axis=1)
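The idea behind a chronological split is that, for each user, the oldest interactions go to the training set and the most recent ones to the validation and test sets. The sketch below illustrates this in plain Python on tuples of `(user, item, rating, timestamp)`. The function name and rounding rule are assumptions made for illustration, and the actual `python_chrono_split` may differ in its details.

```python
from collections import defaultdict


def chrono_split_per_user(rows, ratios=(0.7, 0.15, 0.15)):
    """Split (user, item, rating, timestamp) rows per user, oldest first."""
    by_user = defaultdict(list)
    for row in rows:
        by_user[row[0]].append(row)

    splits = [[] for _ in ratios]
    for user_rows in by_user.values():
        user_rows.sort(key=lambda r: r[3])  # sort by timestamp, oldest first
        n = len(user_rows)
        start, cumulative = 0, 0.0
        for i, ratio in enumerate(ratios):
            cumulative += ratio
            # The last split takes whatever remains to avoid rounding gaps
            end = n if i == len(ratios) - 1 else int(round(cumulative * n))
            splits[i].extend(user_rows[start:end])
            start = end
    return splits


# Toy data: one user with 20 ratings given in reverse chronological order
toy_rows = [(1, item, 3.0, 100 - item) for item in range(20)]
toy_train, toy_val, toy_test = chrono_split_per_user(toy_rows)
print(len(toy_train), len(toy_val), len(toy_test))
```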

In [5]:
LOG_DIR = os.path.join(TMP_DIR, "experiments")
os.makedirs(LOG_DIR, exist_ok=True)

DATA_DIR = os.path.join(TMP_DIR, "data") 
os.makedirs(DATA_DIR, exist_ok=True)

TRAIN_FILE_NAME = "movielens_" + MOVIELENS_DATA_SIZE + "_train.pkl"
train.to_pickle(os.path.join(DATA_DIR, TRAIN_FILE_NAME))

VAL_FILE_NAME = "movielens_" + MOVIELENS_DATA_SIZE + "_val.pkl"
validation.to_pickle(os.path.join(DATA_DIR, VAL_FILE_NAME))

TEST_FILE_NAME = "movielens_" + MOVIELENS_DATA_SIZE + "_test.pkl"
test.to_pickle(os.path.join(DATA_DIR, TEST_FILE_NAME))

### 3. Prepare Hyperparameter Tuning 

To run an experiment on NNI, we need a general training script for our model of choice.
A training script typically has the following components:
1. Argument parsing for the fixed parameters (dataset location, metrics to use)
2. Data preprocessing steps specific to the model
3. Fitting the model on the train set
4. Evaluating the model on the validation set on each metric (ranking and rating)
5. Saving the metrics and the model
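The components above can be sketched as a small skeleton. This is a simplified illustration under stated assumptions, not the contents of `svd_training.py` or `ncf_training.py`: `parse_fixed_args`, `run_trial`, and the toy `fit`/`evaluate` callables are hypothetical names. In a real NNI trial, the tuned parameters would come from `nni.get_next_parameter()` and the resulting dictionary would be passed to `nni.report_final_result()`, which expects the optimized value under the key `"default"`.

```python
import argparse
import json


def parse_fixed_args(argv):
    # 1. Fixed parameters come from the command line
    parser = argparse.ArgumentParser()
    parser.add_argument("--datastore", default=".")
    parser.add_argument("--primary-metric", dest="primary_metric",
                        default="precision_at_k")
    parser.add_argument("--k", type=int, default=10)
    parser.add_argument("--epochs", type=int, default=30)
    return parser.parse_args(argv)


def run_trial(argv, tuned_params, fit, evaluate):
    args = parse_fixed_args(argv)
    # 2-3. Preprocess the data and fit the model (model-specific callable)
    model = fit(args, tuned_params)
    # 4. Evaluate on the validation set
    metrics = evaluate(model, args)
    # 5. Expose the primary metric as "default" so a tuner can optimize it
    report = dict(metrics)
    report["default"] = metrics[args.primary_metric]
    return report


# Toy stand-ins for the model-specific parts (made-up metric values)
toy_fit = lambda args, params: {"n_factors": params["n_factors"]}
toy_evaluate = lambda model, args: {"precision_at_k": 0.1, "rmse": 1.0}

toy_report = run_trial(["--k", "10"], {"n_factors": 50}, toy_fit, toy_evaluate)
print(json.dumps(toy_report, sort_keys=True))
```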

To use NNI we also need a hyperparameter search space. Only the hyperparameters we want to tune are required in the dictionary. NNI supports different methods of [hyperparameter sampling](https://nni.readthedocs.io/en/latest/Tutorial/SearchSpaceSpec.html).
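To make the `_type` and `_value` fields concrete, the sketch below draws one random configuration from a small search space using the two sampling types that appear in this notebook, `choice` and `uniform`. It only illustrates what the specification means. The helper name and toy space are assumptions, and a tuner such as TPE proposes configurations based on earlier trials rather than sampling independently like this.

```python
import random


def sample_search_space(space, rng):
    """Draw one configuration from a search space dict (choice/uniform only)."""
    params = {}
    for name, spec in space.items():
        if spec["_type"] == "choice":
            # Pick one of the listed values
            params[name] = rng.choice(spec["_value"])
        elif spec["_type"] == "uniform":
            # Draw a float between the lower and upper bound
            low, high = spec["_value"]
            params[name] = rng.uniform(low, high)
        else:
            raise ValueError("unsupported _type: {}".format(spec["_type"]))
    return params


toy_space = {
    "n_factors": {"_type": "choice", "_value": [10, 50, 100]},
    "lr": {"_type": "uniform", "_value": [1e-6, 0.1]},
}
toy_sample = sample_search_space(toy_space, random.Random(0))
print(toy_sample)
```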

The `script_params` below are the parameters of the training script that are fixed (unlike `hyper_params` which are tuned). In particular, `VERBOSE, BIASED, RANDOM_STATE, NUM_EPOCHS` are parameters used in the [SVD method](../02_model/surprise_svd_deep_dive.ipynb) and `REMOVE_SEEN` removes the training data from the recommended items. 

In [6]:
PRIMARY_METRIC = "precision_at_k"
RATING_METRICS = ["rmse"]
RANKING_METRICS = ["precision_at_k", "ndcg_at_k"]  
USERCOL = "userID"
ITEMCOL = "itemID"
REMOVE_SEEN = True
K = 10
RANDOM_STATE = 42
VERBOSE = True
BIASED = True

script_params = " ".join([
    "--datastore", DATA_DIR,
    "--train-datapath", TRAIN_FILE_NAME,
    "--validation-datapath", VAL_FILE_NAME,
    "--surprise-reader", SURPRISE_READER,
    "--rating-metrics", " ".join(RATING_METRICS),
    "--ranking-metrics", " ".join(RANKING_METRICS),
    "--usercol", USERCOL,
    "--itemcol", ITEMCOL,
    "--k", str(K),
    "--random-state", str(RANDOM_STATE),
    "--epochs", str(NUM_EPOCHS),
    "--primary-metric", PRIMARY_METRIC
])

if BIASED:
    script_params += " --biased"
if VERBOSE:
    script_params += " --verbose"
if REMOVE_SEEN:
    script_params += " --remove-seen"

In [7]:
# hyperparameters search space
# We do not set 'lr_all' and 'reg_all' because they will be overridden by the other lr_ and reg_ parameters

svd_hyper_params = {
    'n_factors': {"_type": "choice", "_value": [10, 50, 100, 150, 200]},
    'init_mean': {"_type": "uniform", "_value": [-0.5, 0.5]},
    'init_std_dev': {"_type": "uniform", "_value": [0.01, 0.2]},
    'lr_bu': {"_type": "uniform", "_value": [1e-6, 0.1]}, 
    'lr_bi': {"_type": "uniform", "_value": [1e-6, 0.1]}, 
    'lr_pu': {"_type": "uniform", "_value": [1e-6, 0.1]}, 
    'lr_qi': {"_type": "uniform", "_value": [1e-6, 0.1]}, 
    'reg_bu': {"_type": "uniform", "_value": [1e-6, 1]},
    'reg_bi': {"_type": "uniform", "_value": [1e-6, 1]}, 
    'reg_pu': {"_type": "uniform", "_value": [1e-6, 1]}, 
    'reg_qi': {"_type": "uniform", "_value": [1e-6, 1]}
}

In [8]:
with open(os.path.join(TMP_DIR, 'search_space_svd.json'), 'w') as fp:
    json.dump(svd_hyper_params, fp)

We also create a yaml file for the configuration of the trials and the tuning algorithm to be used (in this experiment we use the [TPE tuner](https://nni.readthedocs.io/en/latest/hyperoptTuner.html)). 

In [9]:
config = {
    "authorName": "default",
    "experimentName": "surprise_svd",
    "trialConcurrency": 8,
    "maxExecDuration": "1h",
    "maxTrialNum": MAX_TRIAL_NUM,
    "trainingServicePlatform": "local",
    # The path to Search Space
    "searchSpacePath": "search_space_svd.json",
    "useAnnotation": False,
    "logDir": LOG_DIR,
    "tuner": {
        "builtinTunerName": "TPE",
        "classArgs": {
            #choice: maximize, minimize
            "optimize_mode": "maximize"
        }
    },
    # The path and the running command of trial
    "trial":  {
      "command": sys.prefix + "/bin/python svd_training.py" + " " + script_params,
      "codeDir": os.path.join(os.path.split(os.path.abspath(reco_utils.__file__))[0], "tuning", "nni"),
      "gpuNum": 0
    }
}
 
with open(os.path.join(TMP_DIR, "config_svd.yml"), "w") as fp:
    fp.write(yaml.dump(config, default_flow_style=False))

Next, we specify the search space for the NCF hyperparameters.

In [10]:
ncf_hyper_params = {
    'n_factors': {"_type": "choice", "_value": [8, 12, 16, 24, 40]},
    'learning_rate': {"_type": "uniform", "_value": [1e-6, 1]},
}

In [11]:
with open(os.path.join(TMP_DIR, 'search_space_ncf.json'), 'w') as fp:
    json.dump(ncf_hyper_params, fp)

Our NCF config.yml file follows the same structure as the SVD config.yml. The only differences are:
- Experiment name
- Hyperparameter Search Space
- The executed command for the trial (Note: The script parameters have been configured to work for both training scripts)
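One way to make these differences explicit is to build both configurations from a single helper and compare them. The sketch below does this with a hypothetical `make_nni_config` function; the commands and directories are placeholders rather than the paths used in this notebook.

```python
def make_nni_config(experiment_name, search_space_path, trial_command,
                    code_dir, log_dir, max_trial_num=30, trial_concurrency=8):
    """Build an experiment config dict with the fields used in this notebook."""
    return {
        "authorName": "default",
        "experimentName": experiment_name,
        "trialConcurrency": trial_concurrency,
        "maxExecDuration": "1h",
        "maxTrialNum": max_trial_num,
        "trainingServicePlatform": "local",
        "searchSpacePath": search_space_path,
        "useAnnotation": False,
        "logDir": log_dir,
        "tuner": {
            "builtinTunerName": "TPE",
            "classArgs": {"optimize_mode": "maximize"},
        },
        "trial": {"command": trial_command, "codeDir": code_dir, "gpuNum": 0},
    }


# Placeholder commands and directories, for illustration only
toy_svd = make_nni_config("surprise_svd", "search_space_svd.json",
                          "python svd_training.py", "/path/to/code", "/tmp/logs")
toy_ncf = make_nni_config("tensorflow_ncf", "search_space_ncf.json",
                          "python ncf_training.py", "/path/to/code", "/tmp/logs")

# Top-level keys whose values differ between the two experiments
differing_keys = sorted(key for key in toy_svd if toy_svd[key] != toy_ncf[key])
print(differing_keys)
```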

In [12]:
config = {
    "authorName": "default",
    "experimentName": "tensorflow_ncf",
    "trialConcurrency": 8,
    "maxExecDuration": "1h",
    "maxTrialNum": MAX_TRIAL_NUM,
    "trainingServicePlatform": "local",
    # The path to Search Space
    "searchSpacePath": "search_space_ncf.json",
    "useAnnotation": False,
    "logDir": LOG_DIR,
    "tuner": {
        "builtinTunerName": "TPE",
        "classArgs": {
            #choice: maximize, minimize
            "optimize_mode": "maximize"
        }
    },
    # The path and the running command of trial
    "trial":  {
      "command": sys.prefix + "/bin/python ncf_training.py" + " " + script_params,
      "codeDir": os.path.join(os.path.split(os.path.abspath(reco_utils.__file__))[0], "tuning", "nni"),
      "gpuNum": 0
    }
}
 
with open(os.path.join(TMP_DIR, "config_ncf.yml"), "w") as fp:
    fp.write(yaml.dump(config, default_flow_style=False))

### 4. Execute NNI Trials

The conda environment comes with NNI installed, which includes the command line tool `nnictl` for controlling and getting information about NNI experiments. <br>
To start the NNI tuning trials from the command line, execute the following command: <br>
`nnictl create --config <path of config.yml>` <br>


The `start_nni` function will run the `nnictl create` command. To find the URL for an active experiment you can run `nnictl webui url` on your terminal.

In this notebook the SVD and NCF models are trained sequentially on different NNI experiments. While NNI can run two separate experiments simultaneously by adding the `--port <port_num>` flag to `nnictl create`, the total training time will probably be the same as running the experiments sequentially since these are CPU bound processes.
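For reference, the command line described above can be assembled as follows. This is a small sketch, not the `start_nni` implementation: `nnictl_create_command` is a hypothetical helper, the port number is arbitrary, and the resulting list could be passed to `subprocess.run(cmd, check=True)` on a machine where `nnictl` is installed.

```python
def nnictl_create_command(config_path, port=None):
    """Return the argument list for `nnictl create`, optionally with a port."""
    cmd = ["nnictl", "create", "--config", config_path]
    if port is not None:
        cmd += ["--port", str(port)]
    return cmd


# Two experiments on different ports (arbitrary port number for the second one)
cmd_svd = nnictl_create_command("config_svd.yml")
cmd_ncf = nnictl_create_command("config_ncf.yml", port=8081)
print(" ".join(cmd_svd))
print(" ".join(cmd_ncf))
```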

In [13]:
stop_nni()
config_path_svd = os.path.join(TMP_DIR, 'config_svd.yml')
with Timer() as time_svd:
    start_nni(config_path_svd, wait=WAITING_TIME, max_retries=MAX_RETRIES)

In [14]:
check_metrics_written(wait=WAITING_TIME, max_retries=MAX_RETRIES)
trials_svd, best_metrics_svd, best_params_svd, best_trial_path_svd = get_trials('maximize')

In [15]:
best_metrics_svd

{'rmse': 1.0153957155486038,
 'ndcg_at_k': 0.027899886410413542,
 'precision_at_k': 0.024708377518557794}

In [16]:
best_params_svd

{'parameter_id': 22,
 'parameter_source': 'algorithm',
 'parameters': {'n_factors': 100,
  'init_mean': 0.11174254678268791,
  'init_std_dev': 0.1977986447321912,
  'lr_bu': 0.008056394852767297,
  'lr_bi': 0.0008200509033991312,
  'lr_pu': 0.006790668636338857,
  'lr_qi': 0.09214174394023733,
  'reg_bu': 0.7866760462023896,
  'reg_bi': 0.23022204903863294,
  'reg_pu': 0.9432527716016627,
  'reg_qi': 0.9808218483424879},
 'parameter_index': 0}

In [None]:
stop_nni()
config_path_ncf = os.path.join(TMP_DIR, 'config_ncf.yml')
with Timer() as time_ncf:
    start_nni(config_path_ncf, wait=WAITING_TIME, max_retries=MAX_RETRIES)

In [None]:
check_metrics_written(wait=WAITING_TIME, max_retries=MAX_RETRIES)
trials_ncf, best_metrics_ncf, best_params_ncf, best_trial_path_ncf = get_trials('maximize')

In [None]:
best_metrics_ncf

In [None]:
best_params_ncf

### 5. Show Results

The metrics for each model type are reported on the validation set. At this point we can compare the metrics for each model and select the one with the best score on the primary metric(s) of interest.
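Selecting a model by its primary metric can be written in a few lines. The sketch below uses a hypothetical `select_best` helper and made-up metric values for two placeholder models; it does not reflect results from this notebook.

```python
def select_best(candidates, primary_metric, mode="maximize"):
    """Return the name of the candidate with the best primary metric."""
    pick = max if mode == "maximize" else min
    return pick(candidates, key=lambda name: candidates[name][primary_metric])


# Made-up validation metrics for two placeholder models
toy_candidates = {
    "model_a": {"precision_at_k": 0.02, "rmse": 1.0},
    "model_b": {"precision_at_k": 0.05, "rmse": 1.4},
}

best_by_precision = select_best(toy_candidates, "precision_at_k", mode="maximize")
best_by_rmse = select_best(toy_candidates, "rmse", mode="minimize")
print(best_by_precision, best_by_rmse)
```

Note that the two criteria pick different models in this toy example, which is why the primary metric should be fixed before tuning.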

In [None]:
def combine_metrics_dicts(*metrics):
    # Stack one single-row DataFrame per metrics dict
    # (pd.concat replaces DataFrame.append, which was removed in pandas 2.0)
    return pd.concat([pd.DataFrame(metric, index=[0]) for metric in metrics])

best_metrics_svd['name'] = 'svd'
best_metrics_ncf['name'] = 'ncf'
combine_metrics_dicts(best_metrics_svd, best_metrics_ncf)

Once we select our model based on the validation set, we can evaluate it on the test set using the best hyperparameters found for it. In this case we simply choose the NCF model; your results may differ depending on your own tests. We retrain the model on both the train and validation sets and then predict on the test set.

In [None]:
train_and_validation = pd.concat([train, validation]).reset_index(drop=True)
data = NCFDataset(train_and_validation, test, seed=DEFAULT_SEED)

In [None]:
model = NCF(
    n_users=data.n_users, 
    n_items=data.n_items,
    model_type="NeuMF",
    n_factors=best_params_ncf["parameters"]["n_factors"],
    n_epochs=NUM_EPOCHS,
    learning_rate=best_params_ncf["parameters"]["learning_rate"],
    verbose=True,
    seed=DEFAULT_SEED
)

In [None]:
model.fit(data)

In [None]:
def compute_test_results(model, train, test):
    test_results = {}
    
    # Rating Metrics
    predictions = [[row.userID, row.itemID, model.predict(row.userID, row.itemID)]
           for (_, row) in test.iterrows()]

    predictions = pd.DataFrame(predictions, columns=['userID', 'itemID', 'prediction'])
    predictions = predictions.astype({'userID': 'int64', 'itemID': 'int64', 'prediction': 'float64'})

    for metric in RATING_METRICS:
        test_results[metric] = eval(metric)(test, predictions)
        
    # Ranking Metrics
    users, items, preds = [], [], []
    item = list(train.itemID.unique())
    for user in train.userID.unique():
        user = [user] * len(item) 
        users.extend(user)
        items.extend(item)
        preds.extend(list(model.predict(user, item, is_list=True)))

    all_predictions = pd.DataFrame(data={"userID": users, "itemID": items, "prediction": preds})

    merged = pd.merge(train, all_predictions, on=["userID", "itemID"], how="outer")
    all_predictions = merged[merged.rating.isnull()].drop('rating', axis=1)

    for metric in RANKING_METRICS:
        test_results[metric] = eval(metric)(test, all_predictions, col_prediction='prediction', k=K)
        
    return test_results

In [None]:
test_results = compute_test_results(model, train_and_validation, test)
print(test_results)

The tuner optimizes only the primary metric (precision@10). NDCG@10 is also a ranking metric and tends to correlate well with precision@10. RMSE, on the other hand, need not correlate with it and is not what the tuner optimizes, since finding the top k recommendations in the right order is a different task from predicting ratings (high and low) accurately.
These results are not necessarily consistent and may change when the experiments are run multiple times. Since the tuner relies on randomized sampling, a larger number of trials is required to get more consistent metrics.
In addition, tuning algorithms themselves come with parameters, which can affect their performance.
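The point that rating accuracy and ranking quality are different tasks can be seen in a tiny worked example. The sketch below uses made-up ratings for four items and two hypothetical predictors: one is close to the true ratings on average but orders the items wrongly, the other is far off in value but orders them correctly. The helper names are illustrative and unrelated to the `reco_utils` metrics.

```python
import math

# Made-up true ratings for one user; the two best items count as relevant
toy_true = {"A": 5.0, "B": 4.0, "C": 2.0, "D": 1.0}
toy_relevant_items = {"A", "B"}

# Predictor 1: values near the middle of the scale, but in the wrong order
pred_flat = {"A": 3.4, "B": 3.4, "C": 3.5, "D": 3.6}
# Predictor 2: values far from the true ratings, but in the right order
pred_shifted = {"A": 10.0, "B": 9.0, "C": 8.0, "D": 7.0}


def toy_rmse(pred):
    """Root mean squared error against the true ratings."""
    errors = [(pred[item] - toy_true[item]) ** 2 for item in toy_true]
    return math.sqrt(sum(errors) / len(errors))


def toy_precision_at_k(pred, k):
    """Fraction of the k highest-scored items that are relevant."""
    top_k = sorted(pred, key=pred.get, reverse=True)[:k]
    return sum(1 for item in top_k if item in toy_relevant_items) / k


for name, pred in [("flat", pred_flat), ("shifted", pred_shifted)]:
    print(name, round(toy_rmse(pred), 3), toy_precision_at_k(pred, k=2))
```

Here the first predictor has the lower RMSE but a precision@2 of 0, while the second has the higher RMSE and a precision@2 of 1.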

In [None]:
# Stop the NNI experiment 
stop_nni()

In [None]:
tmp_dir.cleanup()

### 6. Concluding Remarks

In this notebook we showed how to use the NNI framework on different models. Comparing the two training scripts should help you identify which components need to be modified to run another model with NNI.

In practice, an AutoML framework like NNI is just a tool to help you explore a large space of hyperparameters quickly with a prescribed level of randomization. In addition to using NNI, we recommend training baseline models with typical hyperparameter choices (learning rates of 0.005 or 0.001, regularization rates of 0.05 or 0.01, etc.) to draw more meaningful comparisons between model performances. This may help determine whether the tuner has overfit a model or whether there is a statistically significant improvement.

Another thing to note is the added computational cost required to train models using an AutoML framework. In this case, it takes about 1 minute to train each of the models on a [Standard_NC6 VM](https://docs.microsoft.com/en-us/azure/virtual-machines/nc-series). With this in mind, while NNI can easily train hundreds of models over all hyperparameters for a model, in practice it may be beneficial to choose a subset of the hyperparameters that are deemed most important and to tune those. Too small a hyperparameter search space may restrict our exploration, but too large a space may lead to a specific combination of hyperparameters exploiting random noise in the data.

For examples of scaling larger tuning workloads on clusters of machines, see [the notebooks](./README.md) that employ the [Azure Machine Learning service](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-tune-hyperparameters).

### 7. References

Recommenders Repo References
* [SVD deep-dive notebook](../02_model/surprise_svd_deep_dive.ipynb)
* [NCF deep-dive notebook](../02_model/ncf_deep_dive.ipynb)
* [SVD + NNI model optimization](./nni_surprise_svd.ipynb)

External References
* [Surprise Docs | Matrix factorization algorithms](https://surprise.readthedocs.io/en/stable/matrix_factorization.html) 
* [NNI Docs | Neural Network Intelligence toolkit](https://github.com/Microsoft/nni)