<i>Copyright (c) Microsoft Corporation. All rights reserved.</i>

<i>Licensed under the MIT License.</i>

# Wide and Deep Model for Movie Recommendation

<br>

A linear model with a wide set of cross-product features can memorize feature interactions, while deep neural networks (DNNs) can generalize feature patterns through low-dimensional dense embeddings learned for the sparse features. [**Wide-and-deep**](https://arxiv.org/abs/1606.07792) learning jointly trains a wide linear model and a deep neural network to combine the benefits of memorization and generalization for recommender systems.
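
As a rough illustration of how the two parts combine, the sketch below adds a memorized per-pair weight (the wide part) to the output of a tiny one-hidden-layer network over concatenated embeddings (the deep part). All names, weights, and dimensions are made up for illustration; this is not the TensorFlow implementation, which learns both parts jointly from data.

```python
def wide_score(cross_weights, user, item):
    """Wide part: look up a memorized weight for the exact (user, item) pair."""
    return cross_weights.get((user, item), 0.0)


def deep_score(user_emb, item_emb, hidden_weights, output_weights):
    """Deep part: one ReLU hidden layer over the concatenated embeddings."""
    x = user_emb + item_emb  # list concatenation of the two embeddings
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in hidden_weights]
    return sum(w * h for w, h in zip(output_weights, hidden))


def wide_deep_predict(cross_weights, user, item, user_embs, item_embs,
                      hidden_weights, output_weights):
    """The final prediction is the sum of the wide and deep outputs."""
    return (wide_score(cross_weights, user, item)
            + deep_score(user_embs[user], item_embs[item], hidden_weights, output_weights))


# Hypothetical toy parameters (in a real model these are learned)
cross_weights = {("u1", "m1"): 0.25}
user_embs = {"u1": [1.0, 0.0]}
item_embs = {"m1": [0.5, -1.0], "m2": [0.5, -1.0]}
hidden_weights = [[1.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
output_weights = [2.0, 3.0]

seen_pair = wide_deep_predict(cross_weights, "u1", "m1", user_embs, item_embs,
                              hidden_weights, output_weights)
unseen_pair = wide_deep_predict(cross_weights, "u1", "m2", user_embs, item_embs,
                                hidden_weights, output_weights)
print(seen_pair, unseen_pair)
```

The pair known to the wide part receives an extra memorized contribution, while the unseen pair is still scored from the embeddings alone.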

This notebook shows how to build and test the wide-and-deep model using the [TensorFlow high-level Estimator API (v1.12)](https://www.tensorflow.org/api_docs/python/tf/estimator/DNNLinearCombinedRegressor). With the [movie recommendation dataset](https://grouplens.org/datasets/movielens/), we quickly demonstrate the following topics:
1. Prepare the data
2. Build the model
3. Use a log hook to estimate performance while training
4. Test and export the model

> Note: The output cells in this notebook come from a run on an Azure DSVM (Data Science Virtual Machine) of size *Standard NC6*.

In [1]:
import sys
sys.path.append("../../")

import os
import shutil
import itertools

import papermill as pm
import pandas as pd
import numpy as np
import sklearn.preprocessing

import tensorflow as tf
from tensorflow.python.client import device_lib

from reco_utils.common import tf_utils
from reco_utils.dataset import movielens
from reco_utils.dataset.pandas_df_utils import user_item_pairs, filter_by
from reco_utils.dataset.python_splitters import python_random_split
from reco_utils.evaluation.python_evaluation import (
    rmse, mae, rsquared, exp_var,
    map_at_k, ndcg_at_k, precision_at_k, recall_at_k
)

print("Tensorflow Version:", tf.VERSION)

devices = device_lib.list_local_devices()
[x.name for x in devices]

SUPPORTED_RANKING_METRICS = {
    'map': map_at_k,
    'ndcg': ndcg_at_k,
    'precision': precision_at_k,
    'recall': recall_at_k
}
SUPPORTED_RATING_METRICS = {
    'rmse': rmse,
    'mae': mae,
    'rsquared': rsquared,
    'exp_var': exp_var
}

Tensorflow Version: 1.12.0


In [2]:
"""Parameters. This cell is being used to pass parameters from other scripts via papermill"""

# Recommend top k items
TOP_K = 10
# Select Movielens data size: 100k, 1m, 10m, or 20m
MOVIELENS_DATA_SIZE = '100k'
METRICS = [
    'rmse', 'mae', 'rsquared', 'exp_var',
    'map', 'ndcg', 'precision', 'recall'
]
EVALUATE_WHILE_TRAINING = True  # If true, use session hook to evaluate model while training
# Data column names
USER_COL = 'UserId'
ITEM_COL = 'MovieId'
RATING_COL = 'Rating'
ITEM_FEAT_COL = 'Genres'

# Paths to prepared train and test set pickle files. If None, the data is loaded and split in this notebook.
DATA_DIR = None
TRAIN_PICKLE_PATH = None
TEST_PICKLE_PATH = None
EXPORT_DIR_BASE = './outputs/model'

"""Hyperparameters"""
MODEL_TYPE = 'wide_deep'
EPOCHS = 50
BATCH_SIZE = 256
# Wide (linear) model hyperparameters
LINEAR_OPTIMIZER = 'Ftrl'
LINEAR_OPTIMIZER_LR = 0.0001  # Learning rate
LINEAR_L1_REG = 0.0           # L1 Regularization rate for FtrlOptimizer
# DNN model hyperparameters
DNN_OPTIMIZER = 'Adagrad'
DNN_OPTIMIZER_LR = 0.1
DNN_HIDDEN_LAYER_1 = 0     # Set 0 to not use this layer
DNN_HIDDEN_LAYER_2 = 1024  # Set 0 to not use this layer
DNN_HIDDEN_LAYER_3 = 32    # Set 0 to not use this layer
DNN_HIDDEN_LAYER_4 = 1024  # With this setting, DNN hidden units will be = [1024, 32, 1024]
DNN_USER_DIM = 64
DNN_ITEM_DIM = 4
DNN_DROPOUT = 0.3
DNN_BATCH_NORM = 1        # 1 to use batch normalization, 0 if not.

MODEL_DIR = 'model_checkpoints'

### 1. Prepare Data

#### 1.1 Movie Rating and Genres Data
First, download the [MovieLens](https://grouplens.org/datasets/movielens/) data. Each movie in the dataset is tagged with one or more genres, out of 19 genres in total, including '*unknown*'. We load the *movie genres* to use them as item features.

In [3]:
data_loaded = False
# If local paths of the train and test sets are given, use them
if TRAIN_PICKLE_PATH is not None and TEST_PICKLE_PATH is not None:
    # Default to the paths as given; prepend DATA_DIR only when it is set
    train_pickle_path = TRAIN_PICKLE_PATH
    test_pickle_path = TEST_PICKLE_PATH
    if DATA_DIR is not None:
        train_pickle_path = os.path.join(DATA_DIR, TRAIN_PICKLE_PATH)
        test_pickle_path = os.path.join(DATA_DIR, TEST_PICKLE_PATH)
    train = pd.read_pickle(path=train_pickle_path)
    test = pd.read_pickle(path=test_pickle_path)
    data = pd.concat([train, test])
    data_loaded = True

In [4]:
if not data_loaded:
    # The genres of each movie are returned as a '|'-separated string, e.g. "Animation|Children's|Comedy".
    data = movielens.load_pandas_df(
        size=MOVIELENS_DATA_SIZE,
        header=[USER_COL, ITEM_COL, RATING_COL],
        genres_col='Genres_string'  # load genres as a temporary column 'Genres_string'
    )
    display(data.head())

Unnamed: 0,UserId,MovieId,Rating,Genres_string
0,196,242,3.0,Comedy
1,63,242,3.0,Comedy
2,226,242,5.0,Comedy
3,154,242,3.0,Comedy
4,306,242,5.0,Comedy


#### 1.2 Encode Item Features (Genres)
To use genres in our model, we multi-hot-encode them with scikit-learn's [MultiLabelBinarizer](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MultiLabelBinarizer.html).

For example, *Movie id=2355* has three genres, *Animation|Children's|Comedy*, which are converted into an integer array of indicator values, one per genre, such as `[0, 0, 1, 1, 1, 0, 0, 0, ...]`. In a later step, we convert this into a float array and feed it into the model.
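
To make the encoding concrete, here is a minimal pure-Python sketch of what multi-hot encoding computes: one indicator per genre, with genres ordered alphabetically. The helper `multi_hot_encode` is a hypothetical name used only for illustration; the notebook itself relies on `MultiLabelBinarizer`.

```python
def multi_hot_encode(genre_strings, sep="|"):
    """Return (sorted genre list, one 0/1 indicator row per input string)."""
    label_lists = [s.split(sep) for s in genre_strings]
    # Collect every distinct genre and fix a stable (alphabetical) column order
    classes = sorted({g for labels in label_lists for g in labels})
    index = {g: i for i, g in enumerate(classes)}
    encoded = []
    for labels in label_lists:
        row = [0] * len(classes)
        for g in labels:
            row[index[g]] = 1
        encoded.append(row)
    return classes, encoded


classes, encoded = multi_hot_encode(
    ["Animation|Children's|Comedy", "Crime|Drama", "Comedy"]
)
print(classes)
print(encoded)
```

With only five distinct genres in this toy input, each row has five entries; on the full dataset each row has one entry per genre.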

In [5]:
if not data_loaded:
    # Encode 'genres' into int array (multi-hot representation) to use as item features
    genres_encoder = sklearn.preprocessing.MultiLabelBinarizer()
    data[ITEM_FEAT_COL] = genres_encoder.fit_transform(
        data['Genres_string'].apply(lambda s: s.split("|"))
    ).tolist()
    print("Genres:", genres_encoder.classes_)
    display(data.drop_duplicates(ITEM_COL)[[ITEM_COL, 'Genres_string', ITEM_FEAT_COL]].head())

Genres: ['Action' 'Adventure' 'Animation' "Children's" 'Comedy' 'Crime'
 'Documentary' 'Drama' 'Fantasy' 'Film-Noir' 'Horror' 'Musical' 'Mystery'
 'Romance' 'Sci-Fi' 'Thriller' 'War' 'Western' 'unknown']


Unnamed: 0,MovieId,Genres_string,Genres
0,242,Comedy,"[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
117,302,Crime|Film-Noir|Mystery|Thriller,"[0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, ..."
414,377,Children's|Comedy,"[0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
427,51,Drama|Romance|War|Western,"[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, ..."
508,346,Crime|Drama,"[0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, ..."


#### 1.3 Train and Test Split

In [6]:
if not data_loaded:
    train, test = python_random_split(
        data.drop('Genres_string', axis=1),  # We don't need Genres original string column
        ratio=0.75,
        seed=123 
    )
    data_loaded = True

print("Train = {}, test = {}".format(len(train), len(test)))

Train = 75000, test = 25000


In [7]:
# Unique items
if ITEM_FEAT_COL is None:
    items = data.drop_duplicates(ITEM_COL)[[ITEM_COL]].reset_index(drop=True)
    item_feat_shape = None
else:
    items = data.drop_duplicates(ITEM_COL)[[ITEM_COL, ITEM_FEAT_COL]].reset_index(drop=True)
    item_feat_shape = len(items[ITEM_FEAT_COL][0])
# Unique users
users = data.drop_duplicates(USER_COL)[[USER_COL]].reset_index(drop=True)

print("Num items = {}, num users = {}".format(len(items), len(users)))

Num items = 1682, num users = 943


### 2. Build Model

The wide-and-deep model consists of a linear model and a DNN. We use the following hyperparameters and feature sets for the model:

<br> | <center>Wide (linear) model</center> | <center>Deep neural networks</center>
---|---|---
Feature set | <ul><li>User-item cross product features<br>to capture how their co-occurrence<br>correlates with the target rating</li></ul> | <ul><li>Deep, lower-dimensional embedding vectors<br>for every user and item</li><li>Item feature vector</li></ul>
Hyperparameters | <ul><li>[FTRL optimizer](https://www.eecs.tufts.edu/~dsculley/papers/ad-click-prediction.pdf)<br>learning rate = 0.07<br>l1 regularization = 0.015</li></ul> | <ul><li>[Adagrad optimizer](http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf)<br>learning rate = 0.018</li><li>Hidden units = [64, 128, 32]</li><li>Dropout rate = 0.2</li><li>Use batch normalization</li><li>User embedding vector size = 32</li><li>Item embedding vector size = 16 </li></ul>

<br>

The hyperparameters were found on the *MovieLens 100k* **train set** (split with the same `seed` used in this notebook). We used the **Azure Machine Learning service** ([AzureML](https://azure.microsoft.com/en-us/services/machine-learning-service/)) for hyperparameter tuning. For details, see [aml_hyperparameter_tuning](../04_model_select_and_optimize/hypertune_aml_wide_and_deep_quickstart.ipynb).
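
The user-item cross-product feature of the wide model can be pictured as hashing each (user, item) pair into a fixed number of buckets, where every bucket owns one linear weight. The sketch below illustrates the idea under stated assumptions: `cross_bucket` is a hypothetical helper, MD5 is only a stand-in for TensorFlow's internal hashing, and the bucket count of 1000 is arbitrary. The parameter name `hash_bucket_size` mirrors the one used by `tf.feature_column.crossed_column`.

```python
import hashlib


def cross_bucket(user_id, item_id, hash_bucket_size=1000):
    """Map a (user, item) pair to one of `hash_bucket_size` buckets.

    MD5 is only a stand-in here; TensorFlow uses its own hashing internally.
    """
    key = "{}_X_{}".format(user_id, item_id).encode("utf-8")
    return int(hashlib.md5(key).hexdigest(), 16) % hash_bucket_size


# Each bucket owns one weight in the linear (wide) model
wide_weights = [0.0] * 1000
bucket = cross_bucket(196, 242)
wide_weights[bucket] += 0.1  # hypothetical weight update for this pair
print(bucket, wide_weights[bucket])
```

Because distinct pairs can collide in the same bucket, the bucket count trades memory for how precisely individual pairs are memorized.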

In [8]:
train_steps = EPOCHS * len(train) // BATCH_SIZE

# Clean up the previous model dir if it already exists. Otherwise, training would continue on top of the existing checkpoints.
shutil.rmtree(MODEL_DIR, ignore_errors=True)

DNN_HIDDEN_UNITS = []
if DNN_HIDDEN_LAYER_1 > 0:
    DNN_HIDDEN_UNITS.append(DNN_HIDDEN_LAYER_1)
if DNN_HIDDEN_LAYER_2 > 0:
    DNN_HIDDEN_UNITS.append(DNN_HIDDEN_LAYER_2)
if DNN_HIDDEN_LAYER_3 > 0:
    DNN_HIDDEN_UNITS.append(DNN_HIDDEN_LAYER_3)
if DNN_HIDDEN_LAYER_4 > 0:
    DNN_HIDDEN_UNITS.append(DNN_HIDDEN_LAYER_4)

if MODEL_TYPE in ('deep', 'wide_deep'):  # compare by value; `is` checks identity and is unreliable for strings
    print("DNN hidden units =", DNN_HIDDEN_UNITS)
    print("Embedding {} users to {}-dim vector".format(len(users), DNN_USER_DIM))
    print("Embedding {} items to {}-dim vector\n".format(len(items), DNN_ITEM_DIM))

save_checkpoints_steps = max(1, train_steps // 5)
    
# Model type is tf.estimator.DNNLinearCombinedRegressor, known as 'wide-and-deep'
model, wide_columns, deep_columns = tf_utils.build_model(
    users=users[USER_COL].values,
    items=items[ITEM_COL].values,
    model_dir=MODEL_DIR,
    model_type=MODEL_TYPE,
    linear_optimizer=LINEAR_OPTIMIZER,
    linear_optimizer_lr=LINEAR_OPTIMIZER_LR,
    linear_l1_reg=LINEAR_L1_REG,
    dnn_optimizer=DNN_OPTIMIZER,
    dnn_optimizer_lr=DNN_OPTIMIZER_LR,
    dnn_hidden_units=DNN_HIDDEN_UNITS,
    dnn_user_dim=DNN_USER_DIM,
    dnn_item_dim=DNN_ITEM_DIM,
    dnn_dropout=DNN_DROPOUT,
    dnn_batch_norm=(DNN_BATCH_NORM==1),
    user_col=USER_COL,
    item_col=ITEM_COL,
    item_feat_col=ITEM_FEAT_COL,
    item_feat_shape=item_feat_shape,
    log_every_n_iter=max(1, train_steps//20),  # log 20 times
    save_checkpoints_steps=save_checkpoints_steps
)

# Wide columns are the features for wide model, and deep columns are for DNN
print("\nFeature specs:")
for c in wide_columns + deep_columns:
    print(str(c)[:100], "...")

DNN hidden units = [1024, 32, 1024]
Embedding 943 users to 64-dim vector
Embedding 1682 items to 4-dim vector

INFO:tensorflow:Using config: {'_model_dir': 'model_checkpoints', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 2929, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 732, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f4bc7de5550>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}

Feature specs:
_CrossedColumn(keys=(_VocabularyListCategoricalCol

### 3. Train and Evaluate Model

Now we are all set to train the model. Here, we show how to use session hooks to track model performance while training. Our custom hook `tf_utils.evaluation_log_hook` estimates the model performance on the given data based on the specified evaluation functions. Note that we pass the test set to evaluate the model on rating metrics, while we use the <span id="ranking-pool">ranking-pool (all the user-item pairs)</span> for ranking metrics.

> Note: The TensorFlow Estimator's default loss calculates Mean Squared Error. The square root of the loss is the same as [RMSE](https://en.wikipedia.org/wiki/Root-mean-square_deviation).
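
The relationship between the two quantities can be checked by hand. The following sketch uses made-up ratings and predictions; the function names are illustrative and unrelated to the `rmse` helper imported from `reco_utils`.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """RMSE is simply the square root of the mean squared error."""
    return math.sqrt(mean_squared_error(y_true, y_pred))


# Hypothetical ratings and predictions, each off by exactly 2
y_true = [3.0, 5.0, 1.0, 4.0]
y_pred = [1.0, 3.0, 3.0, 2.0]

mse_value = mean_squared_error(y_true, y_pred)
rmse_value = root_mean_squared_error(y_true, y_pred)
print(mse_value, rmse_value)
```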

In [9]:
ranking_metrics = {}
rating_metrics = {}
for m in METRICS:
    if m in SUPPORTED_RANKING_METRICS:
        ranking_metrics[m] = SUPPORTED_RANKING_METRICS[m]
    elif m in SUPPORTED_RATING_METRICS:
        rating_metrics[m] = SUPPORTED_RATING_METRICS[m]
        
cols = {
    'col_user': USER_COL,
    'col_item': ITEM_COL,
    'col_rating': RATING_COL,
    'col_prediction': 'prediction'
}

# Prepare ranking evaluation set, i.e. get the cross join of all user-item pairs
ranking_pool = user_item_pairs(
    user_df=users,
    item_df=items,
    user_col=USER_COL,
    item_col=ITEM_COL,
    user_item_filter_df=train,  # Remove seen items
    shuffle=True
)

In [10]:
""" Training hooks to track training performance (evaluate on 'train' data) 
"""
hooks = []
evaluation_logger = None
if EVALUATE_WHILE_TRAINING:
    class EvaluationLogger(tf_utils.Logger):
        def __init__(self):
            self.eval_log = {}

        def log(self, metric, value):
            if metric not in self.eval_log:
                self.eval_log[metric] = []
            self.eval_log[metric].append(value)
            print("eval_{} = {}".format(metric, value))

        def get_log(self):
            return self.eval_log

    evaluation_logger = EvaluationLogger()

    if len(ranking_metrics) > 0:
        hooks.append(
            tf_utils.evaluation_log_hook(
                model,
                logger=evaluation_logger,
                true_df=test,
                y_col=RATING_COL,
                eval_df=ranking_pool,
                every_n_iter=save_checkpoints_steps,
                model_dir=MODEL_DIR,
                eval_fns=list(ranking_metrics.values()),
                **{**cols, 'k': TOP_K}
            )
        )
    if len(rating_metrics) > 0:
        hooks.append(
            tf_utils.evaluation_log_hook(
                model,
                logger=evaluation_logger,
                true_df=test,
                y_col=RATING_COL,
                eval_df=test.drop(RATING_COL, axis=1),
                every_n_iter=save_checkpoints_steps,
                model_dir=MODEL_DIR,
                eval_fns=list(rating_metrics.values()),
                **cols
            )
        )

print("Training steps = {}, Batch size = {} (num epochs = {})".format(train_steps, BATCH_SIZE, EPOCHS))

train_fn = tf_utils.pandas_input_fn(
    df=train,
    y_col=RATING_COL,
    batch_size=BATCH_SIZE,
    num_epochs=None,  # None == run forever. We use steps=train_steps instead.
    shuffle=True
)

tf.logging.set_verbosity(tf.logging.INFO)

model.train(
    input_fn=train_fn,
    hooks=hooks,
    steps=train_steps
)

if evaluation_logger is not None:
    for m, v in evaluation_logger.get_log().items():
        pm.record("eval_{}".format(m), v)

Training steps = 14648, Batch size = 256 (num epochs = 50)
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
INFO:tensorflow:Saving checkpoints for 0 into model_checkpoints/model.ckpt.
INFO:tensorflow:loss = 3800.711, step = 0
INFO:tensorflow:global_step/sec: 111.266
INFO:tensorflow:loss = 364.64487, step = 732 (6.579 sec)
INFO:tensorflow:global_step/sec: 118.227
INFO:tensorflow:loss = 324.58643, step = 1464 (6.195 sec)
INFO:tensorflow:global_step/sec: 118.4
INFO:tensorflow:loss = 264.91138, step = 2196 (6.180 sec)
INFO:tensorflow:Saving checkpoints for 2929

#### 3.2 TensorBoard

Once training is done, you can browse the details of the training results, as well as the metrics we logged, in [TensorBoard](https://www.tensorflow.org/guide/summaries_and_tensorboard).

[]()|[]()|[]()
:---:|:---:|:---:
![](./images/tensorboard_0.png)  |  ![](./images/tensorboard_1.png) | ![](./images/tensorboard_2.png)

To open TensorBoard, open a terminal in the same directory as this notebook, run `tensorboard --logdir=model_checkpoints`, and open http://localhost:6006 in a browser.



### 4. Test and Export Model

#### 4.1 Item rating prediction

In [11]:
if len(rating_metrics) > 0:
    predictions = list(model.predict(input_fn=tf_utils.pandas_input_fn(df=test)))
    prediction_df = test.drop(RATING_COL, axis=1)
    prediction_df['prediction'] = [p['predictions'][0] for p in predictions]
    prediction_df['prediction'].describe()
    
    for m, fn in rating_metrics.items():
        result = fn(test, prediction_df, **cols)
        pm.record(m, result)
        print(m, "=", result)

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from model_checkpoints/model.ckpt-14648
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.


rmse = 0.9398877451087075


mae = 0.7429712194347382


rsquared = 0.30430762586623006


exp_var = 0.3046061620378063


#### 4.2 Recommend k items
For top-k recommendation evaluation, we use the ranking pool (all the user-item pairs) we prepared at the [training step](#ranking-pool). The difference is that we remove users' seen items from the pool in this step, which is more natural for the movie recommendation scenario.
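
A minimal sketch of this idea, using plain Python and made-up ids and scores: build every user-item pair, drop the pairs already seen in training, then keep the k highest-scored items per user. The helpers `build_ranking_pool` and `top_k_per_user` are hypothetical names for illustration and are not part of `reco_utils`.

```python
from itertools import product


def build_ranking_pool(users, items, seen_pairs):
    """All (user, item) pairs except those the user has already seen."""
    return [(u, i) for u, i in product(users, items) if (u, i) not in seen_pairs]


def top_k_per_user(scores, k):
    """Given {(user, item): score}, return {user: [k best items, best first]}."""
    per_user = {}
    for (u, i), s in scores.items():
        per_user.setdefault(u, []).append((s, i))
    return {
        u: [i for _, i in sorted(pairs, key=lambda t: t[0], reverse=True)[:k]]
        for u, pairs in per_user.items()
    }


users = [1, 2]
items = [10, 20, 30]
seen = {(1, 10), (2, 30)}  # hypothetical training interactions

pool = build_ranking_pool(users, items, seen)
# Hypothetical model scores for every pair left in the pool
scores = {(1, 20): 3.5, (1, 30): 4.2, (2, 10): 2.0, (2, 20): 4.8}
recommendations = top_k_per_user(scores, k=1)
print(pool)
print(recommendations)
```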

In [12]:
if len(ranking_metrics) > 0:
    predictions = list(model.predict(input_fn=tf_utils.pandas_input_fn(df=ranking_pool)))
    prediction_df = ranking_pool.copy()
    prediction_df['prediction'] = [p['predictions'][0] for p in predictions]

    for m, fn in ranking_metrics.items():
        result = fn(test, prediction_df, **{**cols, 'k': TOP_K})
        name = "{}@{}".format(m, TOP_K)
        pm.record(name, result)
        print(name, "=", result)

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from model_checkpoints/model.ckpt-14648
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.


map@10 = 0.006098010993278619


ndcg@10 = 0.045067671395342554


precision@10 = 0.0435244161358811


recall@10 = 0.0175208404598925


#### 4.3 Export Model
Finally, we export the model so that we can load it later for re-training, evaluation, and prediction.
Examples of how to load, re-train, and evaluate the saved model can be found in the [aml_hyperparameter_tuning](../04_model_select_and_optimize/hypertune_aml_wide_and_deep_quickstart.ipynb) notebook.

In [None]:
os.makedirs(EXPORT_DIR_BASE, exist_ok=True)

In [None]:
tf.logging.set_verbosity(tf.logging.ERROR)

train_rcvr_fn = tf.contrib.estimator.build_supervised_input_receiver_fn_from_input_fn(
    train_fn
)
eval_rcvr_fn = tf.contrib.estimator.build_supervised_input_receiver_fn_from_input_fn(
    tf_utils.pandas_input_fn(df=test, y_col=RATING_COL)
)
serve_rcvr_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(
    tf.feature_column.make_parse_example_spec(wide_columns+deep_columns)
)
rcvr_fn_map = {
    tf.estimator.ModeKeys.TRAIN: train_rcvr_fn,
    tf.estimator.ModeKeys.EVAL: eval_rcvr_fn,
    tf.estimator.ModeKeys.PREDICT: serve_rcvr_fn
}

export_dir = tf.contrib.estimator.export_all_saved_models(
    model,
    export_dir_base=EXPORT_DIR_BASE,
    input_receiver_fn_map=rcvr_fn_map
)
pm.record('saved_model_dir', str(export_dir))
print("Model exported to", str(export_dir))

#### Cleanup

In [None]:
"""
Do not delete the EXPORT_DIR_BASE directory directly, since the hypertune_aml_wide_and_deep_quickstart
notebook uses this notebook to train and export the model.
Instead, when testing, use the same name for both MODEL_DIR and EXPORT_DIR_BASE so that both can be cleaned up.
"""
shutil.rmtree(MODEL_DIR, ignore_errors=True)