# Triplet Loss for Implicit Feedback Neural Recommender Systems

This notebook is based on the Deep Learning course from the Master Datascience Paris Saclay. Materials of the course can be found [here](https://github.com/m2dsupsdlclass/lectures-labs). 

**Goals**

* Demonstrate how it is possible to build a bi-linear recommender system only using positive feedback data.
* Train deeper architectures following the same design principles.

This notebook is inspired by Maciej Kula's [Recommendations in Keras using triplet loss](https://github.com/maciejkula/triplet_recommendations_keras). Contrary to Maciej Kula's work, we won't use the [Bayesian Personalized Ranking](https://arxiv.org/ftp/arxiv/papers/1205/1205.2618.pdf) loss but will instead introduce the more common margin-based comparator.

**Dataset used**

* Anime Recommendations Database from Kaggle [link](https://www.kaggle.com/CooperUnion/anime-recommendations-database).

In [None]:
# Load libraries
import umap

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import plotly.express as px
import tensorflow as tf

from collections import deque

from sklearn.manifold import TSNE
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import QuantileTransformer

from tensorflow.keras import layers
from tensorflow.keras.layers import (Concatenate, Dense, Dot, Dropout,
                                     Embedding, Flatten, Input, Lambda)
from tensorflow.keras.models import Model
from tensorflow.keras.regularizers import l2

## Load and preprocess the data

### Ratings file

After loading the data, each line of the dataframe contains:
 * user_id - non identifiable randomly generated user id.
 * anime_id - the anime that this user has rated.
 * rating - rating out of $10$ this user has assigned ($-1$ if the user watched it but did not assign a rating).

In [None]:
# Load and preprocess rating files
df_raw = pd.read_csv('../input/anime-recommendations-database/rating.csv')

In [None]:
print(f"Shape of the ratings data: {df_raw.shape}.") 

In [None]:
df_raw.head(5)

### Anime metadata file

The anime metadata file contains the following metadata: 
 * anime_id - myanimelist.net's unique id identifying an anime.
 * name - full name of the anime.
 * genre - comma separated list of genres for this anime.
 * type - movie, TV, OVA, etc.
 * episodes - how many episodes in this show ($1$ if it's a movie).
 * rating - average rating out of $10$ for this anime.
 * members - number of community members that are in this anime's group.

In [None]:
# Load metadata file
metadata = pd.read_csv('../input/anime-recommendations-database/anime.csv')

In [None]:
print(f"Shape of the metadata: {metadata.shape}.")

In [None]:
metadata.head(5)

## Merge ratings and metadata

Let's enrich the raw ratings with the collected items metadata by merging the two dataframes on `anime_id`.

In [None]:
ratings = df_raw.merge(metadata.loc[:, ['name', 'anime_id', 'type', 'episodes']], left_on='anime_id', right_on='anime_id')

In [None]:
print(f"Shape of the complete data: {ratings.shape}.")

In [None]:
ratings.head(5)

### Data preprocessing

To get a good understanding of the distribution of the data, we compute the following statistics:
* the number of users
* the number of items
* the rating distribution
* the popularity of each anime

In [None]:
print(f"Number of unique users: {ratings['user_id'].unique().size}.")

In [None]:
print(f"Number of unique animes: {ratings['anime_id'].unique().size}.")

In [None]:
# Histogram of the ratings
x, height = np.unique(ratings['rating'], return_counts=True)

fig, ax = plt.subplots()
ax.bar(x, height, align='center')
ax.set(xticks=np.arange(-1, 11), xlim=[-1.5, 10.5])
plt.show()

Now, let's compute the popularity of each anime, defined as the number of ratings.

In [None]:
# Count the number of ratings for each anime
popularity = ratings.groupby('anime_id').size().reset_index(name='popularity')
metadata = metadata.merge(popularity, left_on='anime_id', right_on='anime_id')

## Speed-up the computation

In order to speed up the computation, we will subset the dataset using four criteria:
* Remove the $-1$ ratings (users who watched the anime without rating it).
* Keep only the ratings from users whose `user_id` is less than $10000$.
* Keep only TV shows (because I like TV shows).
* Keep only the most popular ones (more than $5000$ ratings).

In [None]:
# Get most popular anime id and TV shows
metadata_5000 = metadata.loc[(metadata['popularity'] > 5000) & (metadata['type'] == 'TV')]
# Remove -1 ratings and user id less than 10000
ratings = ratings[(ratings['rating'] > -1) & (ratings['user_id'] < 10000)]

## Clean id

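Add a new column to `metadata_5000` that maps each anime to a contiguous id starting at $0$, so that it can be used directly as an index into an embedding layer. The same re-indexing is then applied to the user ids.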

In [None]:
# Create a dataframe for anime_id
metadata_5000 = metadata_5000.assign(new_anime_id=pd.Series(np.arange(metadata_5000.shape[0])).values)
metadata_5000_indexed = metadata_5000.set_index('new_anime_id')

In [None]:
# Merge the dataframe
ratings = ratings.merge(metadata_5000.loc[:, ['anime_id', 'new_anime_id', 'popularity']], left_on='anime_id', right_on='anime_id')

In [None]:
# Create a dataframe for user_id
user = pd.DataFrame({'user_id': np.unique(ratings['user_id'])})
user = user.assign(new_user_id=pd.Series(np.arange(user.shape[0])).values)

In [None]:
# Merge the dataframe
ratings = ratings.merge(user, left_on='user_id', right_on='user_id')

In [None]:
ratings.head(5)

In [None]:
print(f'Shape of the rating dataset: {ratings.shape}.')

In [None]:
MAX_USER_ID = ratings['new_user_id'].max()
MAX_ITEM_ID = ratings['new_anime_id'].max()

N_USERS = MAX_USER_ID + 1
N_ITEMS = MAX_ITEM_ID + 1

In [None]:
print(f'Number of users: {N_USERS} / Number of animes: {N_ITEMS}')

Later in the analysis, we will assume that this popularity does not come from the ratings themselves but from an external metadata, *e.g.* box office numbers in the month after the release in movie theaters.

### Split the dataset into train/test sets

Let's split the enriched data into train and test sets to make it possible to do predictive modeling.

In [None]:
train, test = train_test_split(ratings, test_size=0.2, random_state=42)

user_id_train = np.array(train['new_user_id'])
anime_id_train = np.array(train['new_anime_id'])
ratings_train = np.array(train['rating'])

user_id_test = np.array(test['new_user_id'])
anime_id_test = np.array(test['new_anime_id'])
ratings_test = np.array(test['rating'])

### Implicit feedback data

Consider ratings greater than or equal to $8$ as positive feedback and ignore the rest.

In [None]:
train_pos = train.query('rating > 7')
test_pos = test.query('rating > 7')

Because the median rating is around $8$, this cut will remove approximately half of the ratings from the datasets.

## The Triplet Loss

The following section demonstrates how to build a low-rank quadratic interaction model between users and anime. The similarity score between a user and an anime is defined by the unnormalized dot product of their respective embeddings. The matching scores can be used to rank items to recommend to a specific user.

Training of the model parameters is achieved by randomly sampling negative animes not seen by a pre-selected anchor user. We want the model embedding matrices to be such that the similarity between the user vector and the negative vector is smaller than the similarity between the user vector and the positive item vector. Furthermore, we use a margin to push the negative item further apart from the anchor user.
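
Written out, the margin-based comparator described above takes the following form. The symbols are notation introduced here for illustration only: $s(u, i)$ is the similarity score between user $u$ and item $i$, $p$ and $n$ are the positive and negative items, and $\alpha$ is the margin. The expression matches what the `MarginLoss` layer defined in the next cell computes.

$$L(u, p, n) = \max\big(0,\ \alpha + s(u, n) - s(u, p)\big)$$

The loss is zero as soon as the positive similarity exceeds the negative one by at least $\alpha$, so such triplets contribute no gradient.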

Here is the architecture of such a triplet model. The triplet name comes from the fact that the loss to optimize is defined for the triple `(anchor_user, positive_item, negative_item)`:
![](https://raw.githubusercontent.com/m2dsupsdlclass/lectures-labs/3cb7df3a75b144b4cb812bc6eacec8e27daa5214/labs/03_neural_recsys/images/rec_archi_implicit_2.svg)

We call this model a triplet model with bi-linear interactions because the similarity between a user and an anime is captured by a dot product of the first level embedding vectors. This is therefore not a deep architecture.

In [None]:
def identity_loss(y_true, y_pred):
    """Ignore y_true and return the mean of y_pred.
    This is a hack to work-around the design of the Keras API that is
    not really suited to train networks with a triplet loss by default.
    """
    return tf.reduce_mean(y_pred)


class MarginLoss(layers.Layer):
    """Define the loss for the triplet architecture
    
    Parameters
    ----------
    margin: float, default=1.
        Define a margin (alpha)
    """
    def __init__(self, margin=1.):
        super().__init__()
        self.margin = margin
        
    def call(self, inputs):
        pos_pair_similarity = inputs[0]
        neg_pair_similarity = inputs[1]
        
        diff = neg_pair_similarity - pos_pair_similarity
        return tf.maximum(diff + self.margin, 0)

Here is the actual code that builds the model(s) with shared weights. Note that here we use the cosine similarity instead of unnormalized dot products (both seem to yield comparable results).
The triplet model is used to train the weights of the companion similarity model. The triplet model takes one user, one positive anime (relative to the selected user) and one negative anime, and is trained with the comparator loss. The similarity model takes one user and one anime as input and returns a compatibility score (*aka* the match score).
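
To make the difference between the two similarity choices concrete, here is a minimal pure-Python sketch. It assumes, as the Keras documentation describes, that `Dot(axes=1, normalize=True)` L2-normalizes both inputs before taking the dot product, which yields the cosine similarity. The helper names and the toy vectors are hypothetical and only serve the illustration.

```python
import math


def dot(u, v):
    """Unnormalized dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine_similarity(u, v):
    """Dot product of the L2-normalized vectors (always in [-1, 1])."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


# Hypothetical 3-dimensional embeddings, chosen only for illustration.
user = [1.0, 2.0, 2.0]
item_a = [2.0, 4.0, 4.0]   # same direction as the user, twice the norm
item_b = [2.0, -1.0, 0.0]  # orthogonal to the user

print(dot(user, item_a), cosine_similarity(user, item_a))
print(dot(user, item_b), cosine_similarity(user, item_b))
```

The unnormalized dot product grows with the norm of the embeddings, whereas the cosine similarity only depends on their direction. With a cosine score the difference between two similarities can never exceed $2$, which is worth keeping in mind when choosing the margin.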

In [None]:
class TripletModel(Model):
    """Define the triplet model architecture
    
    Parameters
    ----------
    embedding_size: integer
        Size of the embedding vector
    n_users: integer
        Number of users in the dataset
    n_items: integer
        Number of items in the dataset
    l2_reg: float or None
        Quantity of regularization
    margin: float
        Margin for the loss
        
    Arguments
    ---------
    margin: float
        Margin for the loss
    user_embedding: Embedding
        Embedding layer of user 
    item_embedding: Embedding
        Embedding layer of item
    flatten: Flatten
        Flatten layer
    dot: Dot
        Dot layer
    margin_loss: MarginLoss
        Loss layer
    """
    def __init__(self, n_users, n_items, embedding_size=64, l2_reg=None, margin=1.):
        super().__init__(name='TripletModel')
        
        # Define hyperparameters
        self.margin = margin
        # Treat both None (the default) and 0 as "no regularization"
        l2_reg = l2(l2_reg) if l2_reg else None
        
        # Define Embedding layers
        self.user_embedding = Embedding(output_dim=embedding_size,
                                        input_dim=n_users,
                                        input_length=1,
                                        input_shape=(1,),
                                        name='user_embedding',
                                        embeddings_regularizer=l2_reg)
        # The following embedding parameters will be shared to encode
        # both the positive and negative items.
        self.item_embedding = Embedding(output_dim=embedding_size,
                                        input_dim=n_items,
                                        input_length=1,
                                        name='item_embedding',
                                        embeddings_regularizer=l2_reg)
        
        # The two following layers are without parameters, and can
        # therefore be used for both positive and negative items.
        self.flatten = Flatten()
        self.dot = Dot(axes=1, normalize=True)
        
        # Define the loss
        self.margin_loss = MarginLoss(margin)
        
    def call(self, inputs, training=False, y=None, **kwargs):
        """
        Parameters
        ----------
        inputs: list with three elements
            First element corresponds to the users
            Second element corresponds to the positive items
            Third element corresponds to the negative items
        """
        user_input = inputs[0]
        item_pos_input = inputs[1]
        item_neg_input = inputs[2]
        
        # Create embeddings
        user_embedding = self.user_embedding(user_input)
        user_embedding = self.flatten(user_embedding)
        
        item_pos_embedding = self.item_embedding(item_pos_input)
        item_pos_embedding = self.flatten(item_pos_embedding)

        item_neg_embedding = self.item_embedding(item_neg_input)
        item_neg_embedding = self.flatten(item_neg_embedding)
        
        # Similarity computation between embeddings
        pos_similarity = self.dot([user_embedding, item_pos_embedding])
        neg_similarity = self.dot([user_embedding, item_neg_embedding])

        return self.margin_loss([pos_similarity, neg_similarity])

In [None]:
# Define parameters
EMBEDDING_SIZE = 64
L2_REG = 1e-6

# Define a triplet model
triplet_model = TripletModel(N_USERS, N_ITEMS, EMBEDDING_SIZE, L2_REG)

In [None]:
class MatchModel(Model):
    """Define the match model architecture
    
    Parameters
    ----------
    user_layer: Embedding
        User layer from TripletModel
    item_layer: Embedding
        Item layer from TripletModel
        
    Arguments
    ---------
    user_embedding: Embedding
        Embedding layer of user 
    item_embedding: Embedding
        Embedding layer of item
    flatten: Flatten
        Flatten layer
    dot: Dot
        Dot layer
    """
    def __init__(self, user_layer, item_layer):
        super().__init__(name='MatchModel')

        # Reuse the layer from the triplet model
        self.user_embedding = user_layer
        self.item_embedding = item_layer
        
        self.flatten = Flatten()
        self.dot = Dot(axes=1, normalize=True)
        
    def call(self, inputs, **kwargs):
        """
        Parameters
        ----------
        inputs: list with two elements
            First element corresponds to the users
            Second element corresponds to the positive items
        """
        user_input = inputs[0]
        item_pos_input = inputs[1]
        
        # Create embeddings
        user_embedding = self.user_embedding(user_input)
        user_embedding = self.flatten(user_embedding)
        
        item_pos_embedding = self.item_embedding(item_pos_input)
        item_pos_embedding = self.flatten(item_pos_embedding)
                
        # Similarity computation between embeddings
        pos_similarity = self.dot([user_embedding, item_pos_embedding])
        
        return pos_similarity

In [None]:
# Define a match model
match_model = MatchModel(triplet_model.user_embedding, triplet_model.item_embedding)

Note that the `triplet_model` and the `match_model` have the same number of parameters, since they share both the user and the anime embeddings. Their only difference is that the latter doesn't compute the negative similarity.

### Quality of ranked recommendations

Now that we have a randomly initialized model, we can start computing random recommendations. To assess their quality, we do the following for each user:
* compute matching scores for animes (except the ones that the user has already seen in the training set);
* compare to the positive feedback actually collected on the test set using the ROC AUC ranking metric;
* average ROC AUC scores across users to get the average performance of the recommender model on the test set.

In [None]:
def average_roc_auc(model, data_train, data_test):
    """Compute the ROC AUC for each user and average over users.
    
    Parameters
    ----------
    model: MatchModel
        A MatchModel to evaluate
    data_train: pd.DataFrame
        Train set (positive feedback only)
    data_test: pd.DataFrame
        Test set (positive feedback only)
        
    Return
    ------
    Average ROC AUC scores across users
    """
    max_user_id = max(data_train['new_user_id'].max(),
                      data_test['new_user_id'].max())
    max_anime_id = max(data_train['new_anime_id'].max(),
                       data_test['new_anime_id'].max())
    
    user_auc_scores = []
    for user_id in range(max_user_id + 1):  # re-indexed user ids start at 0
        pos_item_train = data_train[data_train['new_user_id'] == user_id]
        pos_item_test = data_test[data_test['new_user_id'] == user_id]
        
        # Rank all the items except the ones already seen in the
        # training set (re-indexed item ids start at 0)
        all_item_idx = np.arange(max_anime_id + 1)
        items_to_rank = np.setdiff1d(all_item_idx,
                                     pos_item_train['new_anime_id'].values)
        
        # Ground truth: return 1 for each item positively present in
        # the test set and 0 otherwise
        expected = np.isin(items_to_rank,
                           pos_item_test['new_anime_id'].values)
        
        # At least one positive test value to rank
        if np.sum(expected) >= 1:
            repeated_user_id = np.empty_like(items_to_rank)
            repeated_user_id.fill(user_id)
            
            # Make prediction
            predicted = model.predict([repeated_user_id, items_to_rank], batch_size=4096)
            
            # Compute AUC scores
            user_auc_scores.append(roc_auc_score(expected, predicted))
        
    return sum(user_auc_scores) / len(user_auc_scores)


By default, the model should make predictions that rank the items in random order. The **ROC AUC score** is a ranking score that represents the **probability that a uniformly sampled (positive, negative) pair of items is ordered correctly**. A random (untrained) model should yield $0.50$ ROC AUC on average.
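
The pairwise interpretation of the ROC AUC can be checked by hand on a tiny example. The sketch below is only an illustration in pure Python, with made-up labels and scores and a hypothetical helper name. It counts the correctly ordered (positive, negative) pairs directly, which is the quantity that `roc_auc_score` computes more efficiently.

```python
def pairwise_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Toy example (made-up labels and scores for a single user).
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.1]
auc = pairwise_auc(labels, scores)
print(auc)  # 5 of the 6 pairs are correctly ordered
```

A model that gives the same score to every item ties on all pairs and therefore gets exactly $0.5$ with this definition.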

In [None]:
%%time
print(f'Average ROC AUC on the untrained model: {average_roc_auc(match_model, train_pos, test_pos)}.')

### Training the Triplet Model

Let's now fit the parameters of the model by sampling triplets: for each user, select an anime in the positive feedback set of that user and randomly sample another anime to serve as negative item.

Note that this sampling scheme could be improved by excluding from the sampled negatives the items that are marked as positive for that user in the data, which would remove some label noise. In practice, this does not seem to be a problem though.
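
For reference, here is a minimal sketch of such a filtered sampling scheme, written with the standard library only. It is not the approach used in this notebook. It assumes item ids run from $0$ to `n_items - 1`, and the function name, the `positives` dictionary and all values are hypothetical.

```python
import random


def sample_negative(user_positives, n_items, rng):
    """Draw one item id in [0, n_items) that is not in user_positives."""
    if len(user_positives) >= n_items:
        raise ValueError("user has no candidate negative item")
    while True:
        candidate = rng.randrange(n_items)
        if candidate not in user_positives:
            return candidate


# Hypothetical positive feedback: user id -> set of positive item ids.
positives = {0: {0, 1, 2}, 1: {3}}
n_items = 5
rng = random.Random(0)

# Build one (user, positive item, negative item) triplet per positive pair.
triplets = []
for user_id, items in positives.items():
    for pos_item in sorted(items):
        neg_item = sample_negative(items, n_items, rng)
        triplets.append((user_id, pos_item, neg_item))

print(triplets)
```

Rejection sampling like this is cheap when each user has rated only a small fraction of the catalogue, because most draws are accepted on the first try.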

In [None]:
def sample_triplets(pos_data, max_item_id, random_seed=42):
    """Sample negative items at random
    
    Parameters
    ----------
    pos_data: pd.DataFrame
        Dataframe of positive items
    max_item_id: integer
        Maximum item id in the complete dataframe
    random_seed: integer, default=42
        Seed of the random number generator
    
    Return
    ------
    A list with entries user_ids, pos_item_ids and neg_item_ids
    """
    rng = np.random.RandomState(random_seed)
    
    user_ids = pos_data['new_user_id'].values.astype('int64')
    pos_item_ids = pos_data['new_anime_id'].values.astype('int64')
    # Re-indexed item ids start at 0, so sample in [0, max_item_id]
    neg_item_ids = rng.randint(low=0,
                               high=max_item_id + 1,
                               size=len(user_ids), dtype='int64')
    return [user_ids, pos_item_ids, neg_item_ids]

In [None]:
# Define parameters
N_EPOCHS = 10
BATCH_SIZE = 64

# We plug the identity loss and a fake target variable ignored by 
# the model to be able to use the Keras API to train the model.
fake_y = np.ones_like(train_pos["new_user_id"], dtype='int64')
    
triplet_model.compile(loss=identity_loss, optimizer='adam')
    
for i in range(N_EPOCHS):
    # Sample new negative items to build different triplets at each epoch
    triplet_inputs = sample_triplets(train_pos, MAX_ITEM_ID, random_seed=i)
        
    # Fit the model incrementally by doing a single pass over the sampled triplets
    triplet_model.fit(x=triplet_inputs, y=fake_y,
                      shuffle=True, batch_size=BATCH_SIZE, epochs=1)

In [None]:
# Evaluate the convergence of the model. Ideally, we should prepare a
# validation set and compute this at each epoch but this is too slow.
test_auc = average_roc_auc(match_model, train_pos, test_pos)
print(f'Average ROC AUC on the trained model: {test_auc}.')

In [None]:
# Print summary of triplet model
triplet_model.summary()

In [None]:
# Print summary of match model
match_model.summary()

Both models have exactly the same number of parameters, namely the parameters of the two embeddings:
* user embedding: $n_{users} \times embedding_{dim}$
* item embedding: $n_{items} \times embedding_{dim}$

The triplet model uses the same item embedding twice, once to compute the positive similarity and the other time to compute the negative similarity. However, because two nodes in the computation graph share the same instance of the item embedding layer, the item embedding weight matrix is shared by the two branches of the graph and therefore the total number of parameters for each model is in both cases:
$$n_{users} \times embedding_{dim} + n_{items} \times embedding_{dim}$$

## Training a Deep Matching Model on Implicit Feedback

Instead of using a hard-coded cosine similarity to predict the match of a `(user_id, item_id)` pair, we can specify a deep neural network that parametrizes the similarity. The parameters of that matching model are also trained with the margin comparator loss.
![](https://github.com/m2dsupsdlclass/lectures-labs/raw/3cb7df3a75b144b4cb812bc6eacec8e27daa5214/labs/03_neural_recsys/images/rec_archi_implicit_1.svg)

**Goals**
* Implement a `(deep_match_model, deep_triplet_model)` pair of models for the architecture described in the schema. The last layer of the embedded Multi Layer Perceptron outputs a single scalar that encodes the similarity between a user and a candidate item.
* Evaluate the resulting model by computing the per-user average ROC AUC score on the test feedback data:
    * Check that the AUC ROC score is close to $0.5$ for a randomly initialized model.
   
**Hints**
* It is possible to reuse the code to create embeddings from the previous model definition.
* The concatenation between user and the positive item embedding can be obtained with the `Concatenate` layer:
        concat = Concatenate()
        positive_embeddings_pair = concat([user_embedding, positive_item_embedding])
        negative_embeddings_pair = concat([user_embedding, negative_item_embedding])
* Those embedding pairs should be fed to a shared MLP instance to compute the similarity scores.

In [None]:
class MLP(layers.Layer):
    """Define the MLP layer for the triplet architecture
    
    Parameters
    ----------
    n_hidden: Integer, default=1
        Number of hidden layers
    hidden_size: list of size `n_hidden`
        Output size of each hidden layer
    p_dropout: float, default=0.
        Probability for the Dropout layer
    l2_reg: float, default=None
        Regularizer
        
    Argument
    --------
    layers: list of Layer
        The different layers used in the MLP
    """
    def __init__(self, n_hidden=1, hidden_size=[64], p_dropout=0., l2_reg=None):
        super().__init__()
        
        self.layers = [Dropout(p_dropout)]
        
        for i in range(n_hidden):
            self.layers.append(Dense(hidden_size[i], 
                                     activation='relu', 
                                     kernel_regularizer=l2_reg))
            self.layers.append(Dropout(p_dropout))
        
        self.layers.append(Dense(1, 
                                 activation='relu', 
                                 kernel_regularizer=l2_reg))
        
    def call(self, x, training=False):
        for layer in self.layers:
            if isinstance(layer, Dropout):
                x = layer(x, training=training)
            else:
                x = layer(x)
        return x
    
    
class DeepTripletModel(Model):
    """Define the deep triplet model architecture
    
    Parameters
    ----------
    embedding_size_user: integer
        Size of the embedding vector for the user
    embedding_size_item: integer
        Size of the embedding vector for the item
    n_users: integer
        Number of users in the dataset
    n_items: integer
        Number of items in the dataset
    n_hidden: Integer, default=1
        Number of hidden layers
    hidden_size: list of size `n_hidden`
        Output size of each hidden layer
    l2_reg: float or None
        Quantity of regularization
    margin: float
        Margin for the loss
    p_dropout: float, default=0.
        Probability for the Dropout layer
        
    Arguments
    ---------
    margin: float
        Margin for the loss
    user_embedding: Embedding
        Embedding layer of user 
    item_embedding: Embedding
        Embedding layer of item
    flatten: Flatten
        Flatten layer
    concat: Concatenate
        Concatenate layer
    mlp: MLP
        MLP layer
    margin_loss: MarginLoss
        Loss layer
    """
    def __init__(self, n_users, n_items, 
                 embedding_size_user=64, embedding_size_item=64, 
                 n_hidden=1, hidden_size=[64], 
                 l2_reg=None, margin=1., p_dropout=0.):
        super().__init__(name='DeepTripletModel')
        
        # Define hyperparameters
        self.margin = margin
        # Treat both None (the default) and 0 as "no regularization"
        l2_reg = l2(l2_reg) if l2_reg else None
        
        # Define Embedding layers
        self.user_embedding = Embedding(output_dim=embedding_size_user,
                                        input_dim=n_users,
                                        input_length=1,
                                        input_shape=(1,),
                                        name='user_embedding',
                                        embeddings_regularizer=l2_reg)
        # The following embedding parameters will be shared to encode
        # both the positive and negative items.
        self.item_embedding = Embedding(output_dim=embedding_size_item,
                                        input_dim=n_items,
                                        input_length=1,
                                        name='item_embedding',
                                        embeddings_regularizer=l2_reg)
        
        # The two following layers are without parameters, and can
        # therefore be used for both positive and negative items.
        self.flatten = Flatten()
        self.concat = Concatenate()
        
        # Define the MLP
        self.mlp = MLP(n_hidden, hidden_size, p_dropout, l2_reg)
        
        # Define the loss
        self.margin_loss = MarginLoss(margin)
        
    def call(self, inputs, training=False, **kwargs):
        """
        Parameters
        ----------
        inputs: list with three elements
            First element corresponds to the users
            Second element corresponds to the positive items
            Third element corresponds to the negative items
        """
        user_input = inputs[0]
        item_pos_input = inputs[1]
        item_neg_input = inputs[2]
        
        # Create embeddings
        user_embedding = self.user_embedding(user_input)
        user_embedding = self.flatten(user_embedding)
        
        item_pos_embedding = self.item_embedding(item_pos_input)
        item_pos_embedding = self.flatten(item_pos_embedding)

        item_neg_embedding = self.item_embedding(item_neg_input)
        item_neg_embedding = self.flatten(item_neg_embedding)
        
        # Concatenate embeddings
        pos_embeddings_pair = self.concat([user_embedding, item_pos_embedding])
        neg_embeddings_pair = self.concat([user_embedding, item_neg_embedding])
        
        # Pass through the MLP
        pos_similarity = self.mlp(pos_embeddings_pair)
        neg_similarity = self.mlp(neg_embeddings_pair)
        
        return self.margin_loss([pos_similarity, neg_similarity])

    
class DeepMatchModel(Model):
    """Define the deep match model architecture
    
    Parameters
    ----------
    user_layer: Embedding
        User layer from TripletModel
    item_layer: Embedding
        Item layer from TripletModel
    mlp: MLP
        MLP layer from TripletModel

    Arguments
    ---------
    user_embedding: Embedding
        Embedding layer of user 
    item_embedding: Embedding
        Embedding layer of item
    mlp: MLP
        MLP layer
    flatten: Flatten
        Flatten layer
    concat: Concatenate
        Concatenate layer
    """
    def __init__(self, user_layer, item_layer, mlp):
        super().__init__(name='DeepMatchModel')

        # Reuse the layer from the triplet model
        self.user_embedding = user_layer
        self.item_embedding = item_layer
        self.mlp = mlp
        
        self.flatten = Flatten()
        self.concat = Concatenate()
        
    def call(self, inputs, **kwargs):
        """
        Parameters
        ----------
        inputs: list with two elements
            First element corresponds to the users
            Second element corresponds to the positive items
        """
        user_input = inputs[0]
        item_pos_input = inputs[1]
        
        # Create embeddings
        user_embedding = self.user_embedding(user_input)
        user_embedding = self.flatten(user_embedding)
        
        item_pos_embedding = self.item_embedding(item_pos_input)
        item_pos_embedding = self.flatten(item_pos_embedding)
        
        pos_embeddings_pair = self.concat([user_embedding, item_pos_embedding])
        
        # Similarity computation between embeddings
        pos_similarity = self.mlp(pos_embeddings_pair)
        
        return pos_similarity

In [None]:
# Define and train the model
HYPER_PARAM = dict( 
    embedding_size_user=32, 
    embedding_size_item=64, 
    n_hidden=1, 
    hidden_size=[128], 
    l2_reg=0., 
    margin=0.5, 
    p_dropout=0.1)

deep_triplet_model = DeepTripletModel(N_USERS, N_ITEMS, **HYPER_PARAM)
deep_match_model = DeepMatchModel(deep_triplet_model.user_embedding, 
                                  deep_triplet_model.item_embedding, 
                                  deep_triplet_model.mlp)

In [None]:
print(f'Average ROC AUC on the untrained model: {average_roc_auc(deep_match_model, train_pos, test_pos)}.')

In [None]:
# Define parameters
N_EPOCHS = 20

# We plug the identity loss and a fake target variable ignored by 
# the model to be able to use the Keras API to train the model.
fake_y = np.ones_like(train_pos["new_user_id"], dtype='int64')
    
deep_triplet_model.compile(loss=identity_loss, optimizer='adam')
    
for i in range(N_EPOCHS):
    # Sample new negative items to build different triplets at each epoch
    triplet_inputs = sample_triplets(train_pos, MAX_ITEM_ID, random_seed=i)
        
    # Fit the model incrementally by doing a single pass over the sampled triplets
    deep_triplet_model.fit(x=triplet_inputs, y=fake_y,
                      shuffle=True, batch_size=BATCH_SIZE, epochs=1)

In [None]:
# Evaluate the convergence of the model. Ideally, we should prepare a
# validation set and compute this at each epoch but this is too slow.
test_auc = average_roc_auc(deep_match_model, train_pos, test_pos)
print(f'Average ROC AUC on the trained model: {test_auc}.')

In [None]:
deep_triplet_model.summary()

In [None]:
deep_match_model.summary()

Both models have again exactly the same number of parameters, namely the parameters of the two embeddings:
- user embedding: $n_{users} \times user_{dim}$
- item embedding: $n_{items} \times item_{dim}$

and the parameters of the MLP model used to compute the similarity score of a `(user, item)` pair:
- first hidden layer weights: $(user_{dim} + item_{dim}) \times hidden_{size}$
- first hidden biases: $hidden_{size}$
- extra hidden layers weights: $hidden_{size} \times hidden_{size}$
- extra hidden layers biases: $hidden_{size}$
- output layer weights: $hidden_{size} \times 1$
- output layer biases: $1$

The triplet model uses the same item embedding layer twice and the same MLP instance twice: once to compute the positive similarity and the other time to compute the negative similarity. However because those two lanes in the computation graph share the same instances for the item embedding layer and for the MLP, their parameters are shared.
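
The count above can be reproduced with a few lines of arithmetic and compared with the output of `summary()`. The sketch below is only an illustration: the function name and the dataset sizes (`n_users=1000`, `n_items=200`) are hypothetical, while the layer sizes follow the `HYPER_PARAM` dictionary used earlier.

```python
def count_parameters(n_users, n_items, user_dim, item_dim, hidden_sizes):
    """Count trainable parameters: two embeddings plus a dense MLP."""
    n_params = n_users * user_dim + n_items * item_dim
    input_size = user_dim + item_dim
    for size in hidden_sizes:
        n_params += input_size * size + size  # weights + biases
        input_size = size
    n_params += input_size * 1 + 1  # output layer weights + bias
    return n_params


# Hypothetical dataset sizes; the layer sizes follow HYPER_PARAM.
total = count_parameters(n_users=1000, n_items=200,
                         user_dim=32, item_dim=64, hidden_sizes=[128])
print(total)
```

Because the MLP and the item embedding are shared between the positive and negative branches, each of them is counted only once.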

## Possible Extensions

We may implement any of the following ideas.

### Leverage User and Item metadata

As we did for the Explicit Feedback model, it's also possible to extend our models to take additional user and item metadata as side information when computing the match score.

### Better Ranking Metrics

In this notebook, we evaluated the quality of the ranked recommendations using the ROC AUC metric. This score reflects the ability of the model to correctly rank any pair of items (sampled uniformly at random among all possible items).

In practice, recommender systems will only display a few recommendations to the user (typically 1 to 10). It is typically more informative to use an evaluation metric that characterizes the quality of the top ranked items and attributes less or no importance to items that are not good recommendations for a specific user. Popular ranking metrics therefore include the *Precision at k* and the *Mean Average Precision*.
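
As a rough illustration of these two metrics, here is a minimal pure-Python sketch for a single user. The function names and the toy ranking are hypothetical. The average precision is normalized by the number of relevant items, which is one common convention among several, and the *Mean Average Precision* is then the mean of this value over users.

```python
def precision_at_k(ranked_items, relevant_items, k):
    """Fraction of the top-k ranked items that are relevant."""
    top_k = ranked_items[:k]
    return sum(item in relevant_items for item in top_k) / k


def average_precision(ranked_items, relevant_items):
    """Mean of precision@i over the ranks i that hold a relevant item."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked_items, start=1):
        if item in relevant_items:
            hits += 1
            total += hits / rank
    return total / len(relevant_items) if relevant_items else 0.0


# Made-up ranking for one user; items 'a' and 'c' are the relevant ones.
ranked = ['a', 'b', 'c', 'd']
relevant = {'a', 'c'}
print(precision_at_k(ranked, relevant, k=2))
print(average_precision(ranked, relevant))
```

Unlike the ROC AUC, both quantities only reward relevant items that appear near the top of the list.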

### Hard Negatives Sampling

In this experiment, we sampled negative items uniformly at random. However, after training the model for a while, it is possible that the vast majority of sampled negatives have a similarity already much lower than the positive pair and that the margin comparator loss sets the majority of the gradients to zero effectively wasting a lot of computation.

Given the current state of the recsys model, we could sample harder negatives with a larger likelihood, in order to train the model closer to its decision boundary. This strategy is implemented in the WARP loss [1].
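
One simple way to bias the sampling toward harder negatives is to draw a few candidates uniformly and keep the one that the current model scores highest. The sketch below illustrates this idea with the standard library only. It is a simplified scheme, not the WARP procedure of [1], and `sample_hard_negative`, `toy_score` and all values are hypothetical. In particular, `toy_score` merely stands in for the trained match model.

```python
import random


def sample_hard_negative(score_fn, user_id, user_positives, n_items,
                         n_candidates, rng):
    """Among uniformly drawn non-positive items, return the highest-scoring one.

    Assumes the user has fewer than n_items positive items.
    """
    candidates = []
    while len(candidates) < n_candidates:
        item = rng.randrange(n_items)
        if item not in user_positives:
            candidates.append(item)
    return max(candidates, key=lambda item: score_fn(user_id, item))


# Hypothetical scoring function standing in for the trained match model.
def toy_score(user_id, item_id):
    return -abs(item_id - 7)  # items close to id 7 score highest


rng = random.Random(0)
hard_neg = sample_hard_negative(toy_score, user_id=0, user_positives={7},
                                n_items=10, n_candidates=5, rng=rng)
print(hard_neg)
```

Increasing `n_candidates` makes the selected negatives harder on average, at the cost of more scoring calls per triplet.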

The main drawback of hard negative sampling is that it increases the risk of severe overfitting if a significant fraction of the labels are noisy.

### Factorization Machines

A very popular recommender systems model is called Factorization Machines [2][3]. They too use low rank vector representations of the inputs but they do not use a cosine similarity or a neural network to model user/item compatibility.

It is possible to adapt our previous code written with Keras to replace the cosine similarity / MLP with the low rank FM quadratic interactions by reading through [this gentle introduction](http://tech.nextroll.com/blog/data-science/2015/08/25/factorization-machines.html).
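
To give an idea of what these quadratic interactions look like, here is a minimal pure-Python sketch of the second-order FM term from [2], $\sum_{i<j} \langle v_i, v_j \rangle x_i x_j$, computed both naively and with the usual linear-time reformulation. It is an illustration only: the function names, the toy input and the factor values are made up, and the global bias and linear terms of the full FM model are left out.

```python
def fm_pairwise_naive(x, v):
    """Sum over i < j of <v[i], v[j]> * x[i] * x[j]."""
    n = len(x)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dot_ij = sum(a * b for a, b in zip(v[i], v[j]))
            total += dot_ij * x[i] * x[j]
    return total


def fm_pairwise_fast(x, v):
    """Same quantity in O(n * k) using the sum-of-squares identity."""
    k = len(v[0])
    total = 0.0
    for f in range(k):
        s = sum(v[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(len(x)))
        total += 0.5 * (s * s - s_sq)
    return total


# Toy input: one-hot user (2 users) concatenated with one-hot item (2 items).
x = [1.0, 0.0, 0.0, 1.0]
v = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]  # made-up rank-2 factors
print(fm_pairwise_naive(x, v), fm_pairwise_fast(x, v))
```

With only a one-hot user and a one-hot item as input, the pairwise term reduces to the dot product of the corresponding user and item factors, that is, the bi-linear model from the beginning of this notebook. The extra expressiveness of FM comes from adding further input features.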

If you choose to do so, you can compare the quality of the predictions with those obtained by the [pywFM project](https://github.com/jfloff/pywFM) which provides a Python wrapper for the [official libFM C++ implementation](http://www.libfm.org/). Maciej Kula also maintains LightFM, which implements an efficient and well documented variant in Cython and Python.

### References:

[1] Wsabie: Scaling Up To Large Vocabulary Image Annotation
Jason Weston, Samy Bengio, Nicolas Usunier, 2011
https://research.google.com/pubs/pub37180.html

[2] Factorization Machines, Steffen Rendle, 2010
https://www.ismll.uni-hildesheim.de/pub/pdfs/Rendle2010FM.pdf

[3] Factorization Machines with libFM, Steffen Rendle, 2012
in ACM Trans. Intell. Syst. Technol., 3(3), May.
http://doi.acm.org/10.1145/2168752.2168771

