# Explicit Feedback Neural Recommender Systems

Goals:
- Understand recommender systems
- Build different model architectures using TensorFlow
- Retrieve Embeddings and visualize them
- Add metadata information as input to the model


This notebook is inspired by a notebook by Olivier Grisel (https://github.com/ogrisel), who used Keras
to build the models. We will use the basic TensorFlow APIs instead.

In [None]:
import pandas as pd
import numpy as np
import os
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
import tensorflow as tf
# Note: tensorflow.contrib is only available in TensorFlow 1.x
from tensorflow.contrib import layers
from tensorflow.python.estimator.inputs import numpy_io
from tensorflow.contrib.learn import *
%matplotlib inline 

In [None]:
print('Tensorflow Version : {0}'.format(tf.__version__))

### Ratings file

Each line of the ratings file describes one rating:
- a user id
- an item (movie) id
- a rating from 1 to 5 stars
- a timestamp

In [None]:
# Base Path for MovieLens dataset
ML_100K_PATH = os.path.join('processed','ml-100k','ml-100k')

In [None]:
df_raw_ratings = pd.read_csv(os.path.join(ML_100K_PATH, 'u.data'), sep='\t',
                      names=["user_id", "item_id", "rating", "timestamp"])
df_raw_ratings.head()

### Item metadata file


In [None]:
m_cols = ['item_id', 'title', 'release_date', 'video_release_date', 'imdb_url']
# Loading only 5 columns
df_items = pd.read_csv(os.path.join(ML_100K_PATH, 'u.item'), sep='|',
                    names=m_cols, usecols=range(5), encoding='latin-1')
df_items.head()

In [None]:
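def get_release_year(x):
    # release_date looks like '01-Jan-1995'; the year is the third dash-separated field
    splits = str(x).split('-')
    if len(splits) == 3:
        return int(splits[2])
    # fall back to a default year when the date is missing or malformed
    return 1920

df_items['release_year'] = df_items['release_date'].map(get_release_year)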

In [None]:
df_items.head()

## Merge Ratings with Item Metadata

In [None]:
# item_id is the only column shared by both frames, so it is the join key
df_all_ratings = pd.merge(df_items, df_raw_ratings, on='item_id')

In [None]:
df_all_ratings.head()

## Data Preprocessing

To get a good picture of the data distribution, we compute the following statistics:
- the number of users
- the number of items
- the rating distribution

In [None]:
# Largest user id (the embedding tables below are sized max_user_id + 1)
max_user_id = df_all_ratings['user_id'].max()
max_user_id

In [None]:
# Largest item id (the embedding tables below are sized max_item_id + 1)
max_item_id = df_all_ratings['item_id'].max()
max_item_id

In [None]:
df_all_ratings.groupby('rating')['rating'].count().plot(kind='bar', rot=0);

In [None]:
# ratings
df_all_ratings['rating'].describe()

### Add Popularity

In [None]:
popularity = df_all_ratings.groupby('item_id').size().reset_index(name='popularity')

Enrich the ratings data with the popularity as additional metadata.

In [None]:
df_all_ratings = pd.merge(df_all_ratings, popularity, on='item_id')
df_all_ratings.head()

In [None]:
df_all_ratings.nlargest(10, 'popularity')

Later in the analysis we will assume that this popularity does not come from the ratings themselves but from external metadata, e.g. box office numbers in the month after the theatrical release.

### Train Test Validation Split
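
The next cell splits twice with `test_size=0.2`, so the validation set is 20% of the remaining 80%, not 20% of the full data. Here is a minimal sketch of the resulting proportions, assuming the 100,000 ratings of ml-100k. `split_sizes` is a hypothetical helper written for this illustration, and scikit-learn's own rounding may differ by a row on other dataset sizes:

```python
def split_sizes(n_rows, test_size=0.2, val_size=0.2):
    """Row counts produced by two chained hold-out splits (illustrative helper)."""
    n_test = int(round(n_rows * test_size))
    n_train_val = n_rows - n_test
    # the second split is applied to the train+validation part only
    n_val = int(round(n_train_val * val_size))
    n_train = n_train_val - n_val
    return n_train, n_val, n_test

n_train, n_val, n_test = split_sizes(100000)
print(n_train, n_val, n_test)  # 64000 16000 20000
```

The models are therefore fitted on about 64% of the ratings, monitored on 16%, and evaluated on the remaining 20%.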

In [None]:
# Split All ratings into train_val and test
ratings_train_val, ratings_test = train_test_split(df_all_ratings, test_size=0.2, random_state=0)
# Split train_val into training and validation set
ratings_train, ratings_val = train_test_split(ratings_train_val, test_size=0.2, random_state=0)

print('Total rating rows count: {0} '.format(len(df_all_ratings)))
print('Total training rows count: {0} '.format(len(ratings_train)))
print('Total validation rows count: {0} '.format(len(ratings_val)))
print('Total test rows count: {0} '.format(len(ratings_test)))


In [None]:
ratings_train.info()

# Explicit feedback: supervised ratings prediction

For each (user, item) pair, we try to predict the rating the user would give to the item.

This is the classical setup for building recommender systems from offline data with an explicit supervision signal.

## Predicting ratings as a regression problem

The code below implements the following architecture:

![alt text](images/rec_archi_1.svg "Title")


### Matrix Factorization
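
Matrix factorization scores a (user, item) pair with the dot product of the two embedding vectors. The following NumPy sketch mirrors the computation of the TensorFlow graph in the next cell on toy data. The sizes, ids and ratings are made up for illustration, and plain array indexing stands in for `tf.nn.embedding_lookup`:

```python
import numpy as np

rng = np.random.RandomState(0)
n_users, n_items, embedding_size = 4, 5, 3  # toy sizes, not the MovieLens ones
reg_param = 0.01

# one row per user / item, randomly initialized (the notebook uses Xavier init)
user_w = rng.normal(scale=0.1, size=(n_users, embedding_size))
item_w = rng.normal(scale=0.1, size=(n_items, embedding_size))

# a made-up mini-batch of (user, item, rating) triples
users = np.array([0, 2, 3])
items = np.array([1, 1, 4])
ratings = np.array([4.0, 3.0, 5.0])

user_embedding = user_w[users]  # shape (3, embedding_size)
item_embedding = item_w[items]
pred = np.sum(user_embedding * item_embedding, axis=1)  # one dot product per row

# tf.nn.l2_loss(t) computes sum(t ** 2) / 2
data_loss = np.sum((pred - ratings) ** 2) / 2
reg_loss = reg_param * (np.sum(user_w ** 2) / 2 + np.sum(item_w ** 2) / 2)
loss = data_loss + reg_loss
rmse = np.sqrt(np.mean((pred - ratings) ** 2))
print(pred.shape, loss, rmse)
```

Training adjusts `user_w` and `item_w` so that these dot products move towards the observed ratings.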

In [None]:
embedding_size = 30 # embedding size
reg_param = 0.01 # regularization parameter lambda
learning_rate = 0.01 # learning rate 


# create tensorflow graph
g = tf.Graph()
with g.as_default():
    # setting up random seed
    tf.set_random_seed(1234)
    
    # placeholders
    users = tf.placeholder(shape=[None], dtype=tf.int64)
    items = tf.placeholder(shape=[None], dtype=tf.int64)
    ratings = tf.placeholder(shape=[None], dtype=tf.float32)
    
    # variables
    with tf.variable_scope("embedding"):
        user_weight = tf.get_variable("user_w"
                                      , shape=[max_user_id + 1, embedding_size]
                                      , dtype=tf.float32
                                      , initializer=layers.xavier_initializer())

        item_weight = tf.get_variable("item_w"
                                       , shape=[max_item_id + 1, embedding_size]
                                       , dtype=tf.float32
                                       , initializer=layers.xavier_initializer())
    # prediction
    with tf.name_scope("inference"):
        user_embedding = tf.nn.embedding_lookup(user_weight, users)
        item_embedding = tf.nn.embedding_lookup(item_weight, items)
        pred = tf.reduce_sum(tf.multiply(user_embedding, item_embedding), 1) 
        
    # loss 
    with tf.name_scope("loss"):
        reg_loss = tf.contrib.layers.apply_regularization(layers.l2_regularizer(scale=reg_param),
                                               weights_list=[user_weight, item_weight])
        loss = tf.nn.l2_loss(pred - ratings) + reg_loss
        train_ops = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss)
        rmse = tf.sqrt(tf.reduce_mean(tf.pow(pred - ratings, 2)))

        

In [None]:

def train_model():
    # Training 
    epochs = 1000 # number of iterations 
    losses_train = []
    losses_val = []



    with tf.Session(graph=g) as sess:
        # initializer
        sess.run(tf.global_variables_initializer())


        train_input_dict = {  users: ratings_train['user_id']
                            , items: ratings_train['item_id']
                            , ratings: ratings_train['rating']}
        val_input_dict =  {  users: ratings_val['user_id']
                            , items: ratings_val['item_id']
                            , ratings: ratings_val['rating']}

        test_input_dict =  {  users: ratings_test['user_id']
                            , items: ratings_test['item_id']
                            , ratings: ratings_test['rating']}

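        def check_overfit(validation_loss):
            # Early-stopping heuristic: look at the last 4 recorded validation losses
            # and keep training if at least 2 of them improved on the previous value.
            n = len(validation_loss)
            if n < 5:
                return False
            count = 0
            for i in range(n-4, n):
                if validation_loss[i] < validation_loss[i-1]:
                    count += 1
                if count >= 2:
                    return False
            return True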

        for i in range(epochs):
            # run the training operation
            sess.run([train_ops], feed_dict=train_input_dict)

            # show intermediate results 
            if i % 5 == 0:
                loss_train = sess.run(loss, feed_dict=train_input_dict)
                loss_val = sess.run(loss, feed_dict=val_input_dict)
                losses_train.append(loss_train)
                losses_val.append(loss_val)


                # check early stopping 
                if(check_overfit(losses_val)):
                    print('overfit !')
                    break

                print("iteration : {0} train loss: {1:.3f} , valid loss {2:.3f}".format(i,loss_train, loss_val))

        # calculate RMSE on the test dataset
        print('RMSE on test dataset : {0:.4f}'.format(sess.run(rmse, feed_dict=test_input_dict)))

        plt.plot(losses_train, label='train')
        plt.plot(losses_val, label='validation')
        #plt.ylim(0, 50000)
        plt.legend(loc='best')
        plt.title('Loss');

In [None]:
train_model()
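
The early-stopping rule used inside `train_model` can be tried in isolation. The sketch below copies `check_overfit` to module level and applies it to made-up loss histories (the numbers are illustrative, not results from this notebook):

```python
def check_overfit(validation_loss):
    """Return True when fewer than 2 of the last 4 recorded losses improved."""
    n = len(validation_loss)
    if n < 5:
        return False  # not enough history yet
    count = 0
    for i in range(n - 4, n):
        if validation_loss[i] < validation_loss[i - 1]:
            count += 1
        if count >= 2:
            return False
    return True

improving = [5.0, 4.0, 3.5, 3.2, 3.1]  # keeps decreasing
worsening = [3.0, 2.9, 3.1, 3.3, 3.6]  # one improvement, then it rises
too_short = [3.0, 3.5, 4.0]            # fewer than 5 checkpoints

print(check_overfit(improving))  # False
print(check_overfit(worsening))  # True
print(check_overfit(too_short))  # False
```

Because the validation loss is only recorded every 5 iterations, the rule effectively looks back over the last 20 training iterations.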

### Matrix Factorization with Biases
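
Bias terms let the model capture that some users rate generously and some items receive high ratings regardless of who rates them. Here is a minimal NumPy sketch of the prediction with per-user and per-item biases on made-up toy data. The bias value is illustrative; the graph in the next cell initializes all biases to zero and learns them:

```python
import numpy as np

rng = np.random.RandomState(0)
n_users, n_items, embedding_size = 4, 5, 3  # toy sizes

user_w = rng.normal(scale=0.1, size=(n_users, embedding_size))
item_w = rng.normal(scale=0.1, size=(n_items, embedding_size))
user_bias = np.zeros(n_users)
item_bias = np.zeros(n_items)
item_bias[1] = 0.5  # pretend item 1 tends to receive higher ratings

users = np.array([0, 2, 3])
items = np.array([1, 1, 4])

# dot product of the embeddings plus one scalar bias per user and per item
dot = np.sum(user_w[users] * item_w[items], axis=1)
pred = dot + user_bias[users] + item_bias[items]

print(pred - dot)  # contribution of the biases to each prediction
```

Only the two ratings of item 1 are shifted; the interaction term `dot` is unchanged.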

In [None]:
embedding_size = 30 # embedding size
reg_param = 0.01 # regularization parameter lambda
learning_rate = 0.01 # learning rate 


# create tensorflow graph
g = tf.Graph()
with g.as_default():
    
    tf.set_random_seed(1234)
    
    # placeholders
    users = tf.placeholder(shape=[None], dtype=tf.int64)
    items = tf.placeholder(shape=[None], dtype=tf.int64)
    ratings = tf.placeholder(shape=[None], dtype=tf.float32)
    
    # variables
    with tf.variable_scope("embedding"):
        user_weight = tf.get_variable("user_w"
                                      , shape=[max_user_id + 1, embedding_size]
                                      , dtype=tf.float32
                                      , initializer=layers.xavier_initializer())

        item_weight = tf.get_variable("item_w"
                                       , shape=[max_item_id + 1, embedding_size]
                                       , dtype=tf.float32
                                       , initializer=layers.xavier_initializer())
        
        user_bias = tf.get_variable("user_b"
                                , shape=[max_user_id + 1]
                                , dtype=tf.float32
                                , initializer=tf.zeros_initializer)
        
        item_bias = tf.get_variable("item_b"
                                 , shape=[max_item_id + 1]
                                 , dtype=tf.float32
                                 , initializer=tf.zeros_initializer)
        
    # prediction
    with tf.name_scope("inference"):
        user_embedding = tf.nn.embedding_lookup(user_weight, users)
        item_embedding = tf.nn.embedding_lookup(item_weight, items)
        user_b = tf.nn.embedding_lookup(user_bias, users)
        item_b = tf.nn.embedding_lookup(item_bias, items)
        pred = tf.reduce_sum(tf.multiply(user_embedding, item_embedding), 1) + user_b + item_b
        
    # loss 
    with tf.name_scope("loss"):
        reg_loss = tf.contrib.layers.apply_regularization(layers.l2_regularizer(scale=reg_param),
                                               weights_list=[user_weight, item_weight])
        loss = tf.nn.l2_loss(pred - ratings) + reg_loss
        train_ops = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss)
        rmse = tf.sqrt(tf.reduce_mean(tf.pow(pred - ratings, 2)))

   

In [None]:
train_model()

## A Deep recommender model

We can also use deep learning models with multiple layers (fully connected and dropout) for the recommender system.

![alt text](images/rec_archi_2.svg "Title")
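
As a rough sketch of what this architecture computes, here is the forward pass in NumPy with random weights. The shapes follow the TensorFlow cell below (`embedding_size = 50`, `num_hidden = 64`); the batch size, the random values and the omission of dropout are simplifications made for this illustration:

```python
import numpy as np

rng = np.random.RandomState(0)
batch_size, embedding_size, num_hidden = 4, 50, 64

# stand-ins for the looked-up user and item embeddings of one mini-batch
user_embedding = rng.normal(size=(batch_size, embedding_size))
item_embedding = rng.normal(size=(batch_size, embedding_size))

# concatenation: one input vector of length 2 * embedding_size per example
input_vecs = np.concatenate([user_embedding, item_embedding], axis=1)

# fully connected layer followed by ReLU
W_fc_1 = rng.normal(scale=0.1, size=(2 * embedding_size, num_hidden))
b_fc_1 = np.full(num_hidden, 0.1)
hidden_output = np.maximum(input_vecs @ W_fc_1 + b_fc_1, 0.0)

# linear output layer: one predicted rating per example
W_fc_2 = rng.normal(scale=0.1, size=(num_hidden, 1))
b_fc_2 = np.full(1, 0.1)
pred = hidden_output @ W_fc_2 + b_fc_2

print(input_vecs.shape, hidden_output.shape, pred.shape)  # (4, 100) (4, 64) (4, 1)
```

Unlike the plain dot product of matrix factorization, the hidden layer can model non-linear interactions between the user and item factors.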


To build this model we need one new operation: a concatenation (`tf.concat`) that joins the user and item embeddings into a single input vector.

In [None]:
embedding_size = 50
reg_param = 0.01
learning_rate = 0.01
n_users = max_user_id + 1
n_items = max_item_id + 1

g = tf.Graph()
with g.as_default():
    
    tf.set_random_seed(1234)

    users = tf.placeholder(shape=[None,1], dtype=tf.int64, name='input_users')
    items = tf.placeholder(shape=[None,1], dtype=tf.int64, name='input_items')
    ratings = tf.placeholder(shape=[None,1], dtype=tf.float32, name='input_ratings')
    
    l2_loss = tf.constant(0.0)
    
    # embedding layer
    with tf.variable_scope("embedding"):
        user_weights = tf.get_variable("user_w"
                                      , shape=[n_users, embedding_size]
                                      , dtype=tf.float32
                                      , initializer=layers.xavier_initializer())
        
        item_weights = tf.get_variable("item_w"
                                       , shape=[n_items, embedding_size]
                                       , dtype=tf.float32
                                       , initializer=layers.xavier_initializer())
        
        user_embedding = tf.squeeze(tf.nn.embedding_lookup(user_weights, users),axis=1, name='user_embedding')
        item_embedding = tf.squeeze(tf.nn.embedding_lookup(item_weights, items),axis=1, name='item_embedding')
        
        l2_loss += tf.nn.l2_loss(user_weights)
        l2_loss += tf.nn.l2_loss(item_weights)
        
        
        print(user_embedding)
        print(item_embedding)
        
    
    # combine inputs
    with tf.name_scope('concatenation'):
        input_vecs = tf.concat([user_embedding, item_embedding], axis=1)
        print(input_vecs)
        
    # fc-1
    num_hidden = 64
    with tf.name_scope("fc_1"):
        W_fc_1 = tf.get_variable(
            "W_hidden",
            shape=[2*embedding_size, num_hidden],
            initializer=tf.contrib.layers.xavier_initializer())
        b_fc_1 = tf.Variable(tf.constant(0.1, shape=[num_hidden]), name="b")
        hidden_output = tf.nn.relu(tf.nn.xw_plus_b(input_vecs, W_fc_1, b_fc_1), name='hidden_output')
        l2_loss += tf.nn.l2_loss(W_fc_1)
        print(hidden_output)
        
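    # dropout (in TF 1.x the second argument is keep_prob, so 0.99 drops only 1% of the units;
    # the op is always active here, including when validation and test losses are evaluated)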
    with tf.name_scope("dropout"):
        h_drop = tf.nn.dropout(hidden_output, 0.99, name="hidden_output_drop")
        print(h_drop)
    
    # fc-2
    with tf.name_scope("fc_2"):
        W_fc_2 = tf.get_variable(
            "W_output",
            shape=[num_hidden,1],
            initializer=tf.contrib.layers.xavier_initializer())
        b_fc_2 = tf.Variable(tf.constant(0.1, shape=[1]), name="b")
        pred = tf.nn.xw_plus_b(h_drop, W_fc_2, b_fc_2, name='pred')
        l2_loss += tf.nn.l2_loss(W_fc_2)
        print(pred)

    # loss
    with tf.name_scope("loss"):
        loss = tf.nn.l2_loss(pred - ratings) + reg_param * l2_loss
        train_ops = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss)
        rmse = tf.sqrt(tf.reduce_mean(tf.pow(pred - ratings, 2)))

        

In [None]:
def train_model_deep():
    losses_train = []
    losses_val = []
    epochs = 1000

    with tf.Session(graph=g) as sess:
        sess.run(tf.global_variables_initializer())
        train_input_dict = {users: ratings_train['user_id'].values.reshape([-1,1])
            , items: ratings_train['item_id'].values.reshape([-1,1])
            , ratings: ratings_train['rating'].values.reshape([-1,1])}

        val_input_dict = {users: ratings_val['user_id'].values.reshape([-1,1])
            , items: ratings_val['item_id'].values.reshape([-1,1])
            , ratings: ratings_val['rating'].values.reshape([-1,1])}

        test_input_dict = {users: ratings_test['user_id'].values.reshape([-1,1])
            , items: ratings_test['item_id'].values.reshape([-1,1])
            , ratings: ratings_test['rating'].values.reshape([-1,1])}

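        def check_overfit(validation_loss):
            # Early-stopping heuristic: look at the last 4 recorded validation losses
            # and keep training if at least 3 of them improved on the previous value.
            n = len(validation_loss)
            if n < 5:
                return False
            count = 0
            for i in range(n-4, n):
                if validation_loss[i] < validation_loss[i-1]:
                    count += 1
                if count >= 3:
                    return False
            return True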



        for i in range(epochs):
            sess.run([train_ops], feed_dict=train_input_dict)
            if i % 10 == 0:
                loss_train = sess.run(loss, feed_dict=train_input_dict)
                loss_val = sess.run(loss, feed_dict=val_input_dict)
                losses_train.append(loss_train)
                losses_val.append(loss_val)

                # check early stopping 
                if(check_overfit(losses_val)):
                    print('overfit !')
                    break

                print("iteration : %d train loss: %.3f , valid loss %.3f" % (i,loss_train, loss_val))

        # calculate RMSE on the test dataset
        print('RMSE on test dataset : {0:.4f}'.format(sess.run(rmse, feed_dict=test_input_dict)))

        # user and item embedding
        user_embedding_variable = [v for v in tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES) if v.name.endswith('embedding/user_w:0')][0]
        item_embedding_variable = [v for v in tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES) if v.name.endswith('embedding/item_w:0')][0]
        user_embedding_weights, item_embedding_weights = sess.run([user_embedding_variable,item_embedding_variable])
        
        
        # plot train and validation loss
        plt.plot(losses_train, label='train')
        plt.plot(losses_val, label='validation')
        plt.legend(loc='best')
        plt.title('Loss');
        
        return user_embedding_weights, item_embedding_weights 

In [None]:
user_embedding_weights, item_embedding_weights  = train_model_deep()

### Model Embeddings

In [None]:
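# Item ids start at 1, so row 0 of the embedding matrix is unused
# and row 1 holds the embedding of the item with item_id == 1.
first_item_id = 1
print("First item name from metadata:", df_items.loc[df_items["item_id"] == first_item_id, "title"].iloc[0])
print("Embedding vector for the first item:")
print(item_embedding_weights[first_item_id])
print("shape:", item_embedding_weights[first_item_id].shape)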

### Visualizing embeddings using t-SNE

- We use scikit-learn to visualize the item embeddings.
- Try different perplexities, and visualize the user embeddings as well.
- Check the impact of different perplexity values. Here is a very nice tutorial if you want the details: https://distill.pub/2016/misread-tsne/

In [None]:
from sklearn.manifold import TSNE

item_tsne = TSNE(perplexity=50).fit_transform(item_embedding_weights)

In [None]:
import matplotlib.pyplot as plt

plt.figure(figsize=(10, 10))
plt.scatter(item_tsne[:, 0], item_tsne[:, 1]);
plt.xticks(()); plt.yticks(());
plt.show()

## Using item metadata in the model

Using a framework similar to the previous one, we will build another deep model that can also leverage additional metadata. The resulting system is therefore a **Hybrid Recommender System** that does both **Collaborative Filtering** and **Content-based recommendations**.



![alt text](images/rec_archi_3.svg "Title")
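
Concretely, the metadata features are rescaled to a common range and appended to the two embeddings before the first dense layer. Here is a minimal NumPy sketch with made-up values. `rank_scale` is a hypothetical, simplified stand-in for the `QuantileTransformer` used later in this notebook (it ignores ties and does no interpolation):

```python
import numpy as np

def rank_scale(column):
    """Map each value to its rank, rescaled to [0, 1] (illustrative helper)."""
    ranks = np.argsort(np.argsort(column))
    return ranks / (len(column) - 1)

# made-up metadata for a mini-batch of 4 ratings
popularity = np.array([583, 5, 120, 300])
release_year = np.array([1977, 1995, 1940, 1996])
meta = np.stack([rank_scale(popularity), rank_scale(release_year)], axis=1)

embedding_size = 50
rng = np.random.RandomState(0)
user_embedding = rng.normal(size=(4, embedding_size))
item_embedding = rng.normal(size=(4, embedding_size))

# the dense layer now sees 2 * embedding_size + meta_size input features
input_vecs = np.concatenate([user_embedding, item_embedding, meta], axis=1)
print(meta.min(), meta.max(), input_vecs.shape)  # 0.0 1.0 (4, 102)
```

Rescaling matters because raw popularity counts and release years are orders of magnitude larger than typical embedding values and would otherwise dominate the input of the dense layer.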

In [None]:
embedding_size = 50
reg_param = 0.01
learning_rate = 0.01
n_users = max_user_id + 1
n_items = max_item_id + 1
meta_size = 2

g = tf.Graph()
with g.as_default():

    tf.set_random_seed(1234)
    
    users = tf.placeholder(shape=[None,1], dtype=tf.int64, name='input_users')
    items = tf.placeholder(shape=[None,1], dtype=tf.int64, name='input_items')
    meta = tf.placeholder(shape=[None,meta_size], dtype=tf.float32, name='input_metadata')
    ratings = tf.placeholder(shape=[None,1], dtype=tf.float32, name='input_ratings')
    
    l2_loss = tf.constant(0.0)
    
    # embedding layer
    with tf.variable_scope("embedding"):
        user_weights = tf.get_variable("user_w"
                                      , shape=[n_users, embedding_size]
                                      , dtype=tf.float32
                                      , initializer=layers.xavier_initializer())
        
        item_weights = tf.get_variable("item_w"
                                       , shape=[n_items, embedding_size]
                                       , dtype=tf.float32
                                       , initializer=layers.xavier_initializer())
        
        
        
        user_embedding = tf.squeeze(tf.nn.embedding_lookup(user_weights, users),axis=1, name='user_embedding')
        item_embedding = tf.squeeze(tf.nn.embedding_lookup(item_weights, items),axis=1, name='item_embedding')
        
        l2_loss += tf.nn.l2_loss(user_weights)
        l2_loss += tf.nn.l2_loss(item_weights)
        
        
        print(user_embedding)
        print(item_embedding)
        
    
    # combine inputs
    with tf.name_scope('concatenation'):
        input_vecs = tf.concat([user_embedding, item_embedding, meta], axis=1)
        print(input_vecs)
        
    # fc-1
    num_hidden = 64
    with tf.name_scope("fc_1"):
        W_fc_1 = tf.get_variable(
            "W_hidden",
            shape=[2*embedding_size + meta_size, num_hidden],
            initializer=tf.contrib.layers.xavier_initializer())
        b_fc_1 = tf.Variable(tf.constant(0.1, shape=[num_hidden]), name="b")
        hidden_output = tf.nn.relu(tf.nn.xw_plus_b(input_vecs, W_fc_1, b_fc_1), name='hidden_output')
        l2_loss += tf.nn.l2_loss(W_fc_1)
        print(hidden_output)
    
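    # dropout (in TF 1.x the second argument is keep_prob, so 0.99 drops only 1% of the units;
    # the op is always active here, including when validation and test losses are evaluated)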
    with tf.name_scope("dropout"):
        h_drop = tf.nn.dropout(hidden_output, 0.99, name="hidden_output_drop")
        print(h_drop)
    
    # fc-2
    with tf.name_scope("fc_2"):
        W_fc_2 = tf.get_variable(
            "W_output",
            shape=[num_hidden,1],
            initializer=tf.contrib.layers.xavier_initializer())
        b_fc_2 = tf.Variable(tf.constant(0.1, shape=[1]), name="b")
        pred = tf.nn.xw_plus_b(h_drop, W_fc_2, b_fc_2, name='pred')
        l2_loss += tf.nn.l2_loss(W_fc_2)
        print(pred)

    # loss
    with tf.name_scope("loss"):
        loss = tf.nn.l2_loss(pred - ratings) + reg_param * l2_loss
        train_ops = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss)
        rmse = tf.sqrt(tf.reduce_mean(tf.pow(pred - ratings, 2)))

In [None]:
from sklearn.preprocessing import QuantileTransformer

meta_columns = ['popularity', 'release_year']

scaler = QuantileTransformer()
item_meta_train = scaler.fit_transform(ratings_train[meta_columns])
item_meta_val = scaler.transform(ratings_val[meta_columns])
item_meta_test = scaler.transform(ratings_test[meta_columns])

In [None]:
def train_model_deep_meta():

    losses_train = []
    losses_val = []
    epochs = 1000

    with tf.Session(graph=g) as sess:
        sess.run(tf.global_variables_initializer())
        train_input_dict = {users: ratings_train['user_id'].values.reshape([-1,1])
            , items: ratings_train['item_id'].values.reshape([-1,1])
            , ratings: ratings_train['rating'].values.reshape([-1,1])
                           ,meta: item_meta_train}

        val_input_dict = {users: ratings_val['user_id'].values.reshape([-1,1])
            , items: ratings_val['item_id'].values.reshape([-1,1])
            , ratings: ratings_val['rating'].values.reshape([-1,1])
                         ,meta : item_meta_val}

        test_input_dict = {users: ratings_test['user_id'].values.reshape([-1,1])
            , items: ratings_test['item_id'].values.reshape([-1,1])
            , ratings: ratings_test['rating'].values.reshape([-1,1])
                          ,meta : item_meta_test}
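        def check_overfit(validation_loss):
            # Early-stopping heuristic: look at the last 4 recorded validation losses
            # and keep training if at least 3 of them improved on the previous value.
            n = len(validation_loss)
            if n < 5:
                return False
            count = 0
            for i in range(n-4, n):
                if validation_loss[i] < validation_loss[i-1]:
                    count += 1
                if count >= 3:
                    return False
            return True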


        for i in range(epochs):
            sess.run([train_ops], feed_dict=train_input_dict)
            if i % 10 == 0:
                loss_train = sess.run(loss, feed_dict=train_input_dict)
                loss_val = sess.run(loss, feed_dict=val_input_dict)
                losses_train.append(loss_train)
                losses_val.append(loss_val)

                # check early stopping
                if(check_overfit(losses_val)):
                    print('overfit !')
                    break
                print("iteration : %d train loss: %.3f , valid loss %.3f" % (i,loss_train, loss_val))
        
        # plot train and validation loss
        plt.plot(losses_train, label='train')
        plt.plot(losses_val, label='validation')
        plt.legend(loc='best')
        plt.title('Loss');
        
        # calculate RMSE on the test dataset
        print('RMSE on test dataset : {0:.4f}'.format(sess.run(rmse, feed_dict=test_input_dict)))
        

In [None]:
train_model_deep_meta()