# Neural Models for Collaborative Filtering

With the popularity of neural network based methods soaring in recent years due to their success in various machine learning tasks, we discuss the efficacy of neural networks for recommendation problems. By recommendation problem, we mean the problem of ranking items for users based on a user's affinity towards an item. Collaborative filtering based recommender systems assume that a user's preferences over items can be predicted from 'similar' users' preferences. 


In collaborative filtering approaches, we embed users and items in a low-dimensional space such that similar users/items are mapped to nearby points in the embedding space. Matrix factorization is a very popular technique for obtaining low-dimensional embedding for recommendation. Matrix factorization can only learn linear mapping to the low-dimensional space, whereas the inductive biases in the user-item interaction can be non-linear. 

Neural networks are effective in learning non-linearities in the data via activation functions. Moreover, deeper networks can potentially capture more intricate features. Motivated by this, we will explore simple neural network apporaches for recommendation tasks. 

We will use the MovieLens data which contains ratings of movies by users. The data which is publically available from [MovieLens Website](https://grouplens.org/datasets/movielens/). We are using 100k data which has 100k ratings. 

In [1]:
import pandas as pd
import numpy as np
import operator
from collections import defaultdict
from scipy.spatial.distance import cosine
from sklearn.metrics.pairwise import cosine_similarity

In [155]:
from keras.layers import Input, Embedding, Dot, Add, Flatten, Lambda, Dense, Concatenate, Dropout
from keras.models import Model
from keras.initializers import RandomNormal
from keras.regularizers import l2
from keras.callbacks import EarlyStopping
from keras import backend as K
from keras.optimizers import SGD, Adam

In [3]:
data_path = './data/'

In [4]:
rating_df = pd.read_csv(data_path + 'ratings.csv', sep=',', header=0)

In [5]:
rating_df.head()

Unnamed: 0,userId,movieId,rating,timestamp
0,1,1,4.0,964982703
1,1,3,4.0,964981247
2,1,6,4.0,964982224
3,1,47,5.0,964983815
4,1,50,5.0,964982931


In [6]:
rating_df.shape

(100836, 4)

In [7]:
rating_df.userId.nunique(), rating_df.movieId.nunique()

(610, 9724)

In [8]:
max(rating_df.userId), max(rating_df.movieId)

(610, 193609)

In [9]:
min(rating_df.userId), min(rating_df.movieId)

(1, 1)

### Movies Data

In [10]:
movie_df = pd.read_csv(data_path + 'movies.csv', sep=',', header=0)

In [11]:
movie_df.head()

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy


### Train/Test Split

In [12]:
train_df = rating_df.sample(frac=0.7,random_state=200) #random state is a seed value
test_df = rating_df.drop(train_df.index)

In [13]:
train_df.shape, test_df.shape

((70585, 4), (30251, 4))

In [14]:
test_users = test_df.userId.unique()
test_movies = test_df.movieId.unique()

In [15]:
test_users.shape, test_movies.shape

((610,), (6137,))

### Data Preprocessing

We first create a mapping of userids and movieids to integer indices for easier manipulation. 

In [16]:
user_item_mat = train_df[['userId', 'movieId', 'rating']]
user_item_mat.head()

Unnamed: 0,userId,movieId,rating
73648,474,1984,2.0
55731,368,2808,2.0
74592,474,6341,3.5
34198,232,6298,3.0
80322,506,68269,4.0


In [17]:
rating_matrix = user_item_mat.copy().values

user2index = dict()
index2user = dict()
userIndex = 0

item2index = dict()
index2item = dict()
itemIndex = 0

for (i, (user, item, rating)) in enumerate(user_item_mat.values):
    user = int(user)
    item = int(item)

    if user not in user2index:
        user2index[user] = userIndex
        index2user[userIndex] = user
        userIndex += 1

    if item not in item2index:
        item2index[item] = itemIndex
        index2item[itemIndex] = item
        itemIndex += 1

    user = user2index[user]
    item = item2index[item]

    rating_matrix[i] = [user, item, rating]

In [18]:
rating_matrix[0]

array([0., 0., 2.])

In [19]:
rating_matrix.shape

(70585, 3)

In [20]:
user_item_mat.values[0]

array([ 474., 1984.,    2.])

In [21]:
np.random.shuffle(rating_matrix);

user_matrix = rating_matrix[:, 0].reshape(-1, 1).astype(int)
item_matrix = rating_matrix[:, 1].reshape(-1, 1).astype(int)

label_matrix = rating_matrix[:, 2].reshape(-1, 1)

In [23]:
userCount = int(rating_matrix[: , 0].max()) + 1
itemCount = int(rating_matrix[: , 1].max()) + 1

In [24]:
userIndexMatrix = np.arange(userCount, dtype = int).reshape(-1, 1)
itemIndexMatrix = np.arange(itemCount, dtype = int).reshape(-1, 1)

## Neural Matrix Factorization Model

Before we delve into deeper architectures, let's discuss neural formulation of matrix factorization. 

Embedding dimension is chosen to be 50 for no particular reason. 

In [106]:
# hyperparamaters

embeddingDimension = 50
epochs = 50
batch_size = 32
regularizationScale = 0.0
# lr = 0.01

### MF Architecture

Input user and item layers with 1 unit each - the index.  

In [107]:
userInputLayer = Input(shape = (1, ), dtype = "int32")
itemInputLayer = Input(shape = (1, ), dtype = "int32")

User and Item embedding layers. The embeddings are learnable and are initialized randomly. We will later use the learned embeddings as features. Input dimension for user/item embedding layer is equal to the number of users/items (i.e. OHE). 

In [108]:
userEmbeddingLayer = Embedding(input_dim = userCount, output_dim = embeddingDimension, input_length = 1, 
                embeddings_regularizer = l2(regularizationScale), embeddings_initializer = RandomNormal())(userInputLayer)
userEmbeddingLayer = Flatten()(userEmbeddingLayer)

In [109]:
itemEmbeddingLayer = Embedding(input_dim = itemCount, output_dim = embeddingDimension, input_length = 1, 
                embeddings_regularizer = l2(regularizationScale), embeddings_initializer = RandomNormal())(itemInputLayer);
itemEmbeddingLayer = Flatten()(itemEmbeddingLayer)

For a user and item pair, the rating is given by the dot product of the user and the item embedding. Matrix factorization leans embeddings such that the dot-products are as close to the non-zero ratings as possible.  

In [110]:
dotLayer = Dot(axes = -1)([userEmbeddingLayer, itemEmbeddingLayer])

In [111]:
model = Model(inputs = [userInputLayer, itemInputLayer], outputs = dotLayer)

In [112]:
model.summary()

Model: "model_7"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_5 (InputLayer)            (None, 1)            0                                            
__________________________________________________________________________________________________
input_6 (InputLayer)            (None, 1)            0                                            
__________________________________________________________________________________________________
embedding_5 (Embedding)         (None, 1, 50)        30500       input_5[0][0]                    
__________________________________________________________________________________________________
embedding_6 (Embedding)         (None, 1, 50)        425950      input_6[0][0]                    
____________________________________________________________________________________________

Root Mean Squre Error (RMSE) as the loss function.

In [113]:
def getRMSE(labelMatrix, predictionMatrix):
        return K.sqrt(K.mean(K.square(labelMatrix - predictionMatrix)))

In [115]:
model.compile(optimizer = "adam", loss = "mean_squared_error", metrics = [getRMSE])

Train the model

In [116]:
model.fit([user_matrix, item_matrix], label_matrix, epochs = epochs, batch_size = batch_size)

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50


<keras.callbacks.History at 0x1a3c88b6d8>

(Sub) Models for extracting user and item embeddings. 

In [117]:
userEmbeddingOutputModel = Model(inputs = userInputLayer, outputs = userEmbeddingLayer)
userEmbeddingMatrix = userEmbeddingOutputModel.predict(userIndexMatrix)

itemEmbeddingOutputModel = Model(inputs = itemInputLayer, outputs = itemEmbeddingLayer)
itemEmbeddingMatrix = itemEmbeddingOutputModel.predict(itemIndexMatrix)

In [118]:
userEmbeddingMatrix.shape

(610, 50)

In [119]:
itemEmbeddingMatrix.shape

(8519, 50)

### Evaluation

In [120]:
test_df.head()

Unnamed: 0,userId,movieId,rating,timestamp
0,1,1,4.0,964982703
3,1,47,5.0,964983815
4,1,50,5.0,964982931
7,1,110,4.0,964982176
8,1,151,5.0,964984041


In [121]:
usr = user2index[1]
mve = item2index[1]
np.dot(userEmbeddingMatrix[usr], itemEmbeddingMatrix[mve])

4.496409

In [122]:
usr = user2index[1]
mve = item2index[47]
np.dot(userEmbeddingMatrix[usr], itemEmbeddingMatrix[mve])

3.349145

In [123]:
sq_errs = []
for x in test_df.values:
    try:
        usr = user2index[x[0]]
        mve = item2index[x[1]]
        rating = x[2]
        pred = np.dot(userEmbeddingMatrix[usr], itemEmbeddingMatrix[mve])
        error = (rating - pred)**2
        sq_errs.append(error)
    except:
        pass

In [124]:
len(sq_errs)

28867

In [126]:
rmse_test = np.sqrt(np.mean(sq_errs))

In [127]:
rmse_test

1.0823980968750455

## Neural CF

Having built a simple neural network for matrix factorization, we are ready to built more interesting models. We can explore various architectures and play with the corresponding models.  

Similar to the ideas presented in Neural CF paper, we can replace the inner-product based output layer with a multi-layer perceptron. Instead of dot output, the user & the item embeddings are concated, and is passed through a couple of feed forward layers with activations. The output layer is just one neuron for the rating. 

In [267]:
user_item_concat_layer = Concatenate()([userEmbeddingLayer, itemEmbeddingLayer])

We will add dropout in subsequent layers to prevent overfitting which can creep in due to high sparsity. The hyperparameters like dropout probability are not tuned, 0.5 is chosen here. 

In [268]:
fc1 = Dense(50, activation='relu')(user_item_concat_layer)
fc1 = Dropout(0.5)(fc1)

In [269]:
fc2 = Dense(32, activation='relu')(fc1)
fc2 = Dropout(0.5)(fc2)

In [270]:
fc3 = Dense(16, activation='relu')(fc2)
fc3 = Dropout(0.5)(fc3)

In [271]:
out = Dense(1)(fc3)

In [272]:
NCF = Model(inputs = [userInputLayer, itemInputLayer], outputs = out)

In [273]:
NCF.compile(optimizer = "adam", loss = "mean_squared_error", metrics = [getRMSE])

In [274]:
NCF.summary()

Model: "model_19"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_5 (InputLayer)            (None, 1)            0                                            
__________________________________________________________________________________________________
input_6 (InputLayer)            (None, 1)            0                                            
__________________________________________________________________________________________________
embedding_5 (Embedding)         (None, 1, 50)        30500       input_5[0][0]                    
__________________________________________________________________________________________________
embedding_6 (Embedding)         (None, 1, 50)        425950      input_6[0][0]                    
___________________________________________________________________________________________

In [275]:
NCF.fit([user_matrix, item_matrix], label_matrix, epochs = epochs, batch_size = batch_size)

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50


<keras.callbacks.History at 0x1a40343198>

In [276]:
userEmbeddingOutputModel = Model(inputs = userInputLayer, outputs = userEmbeddingLayer)
userEmbeddingMatrix = userEmbeddingOutputModel.predict(userIndexMatrix)

itemEmbeddingOutputModel = Model(inputs = itemInputLayer, outputs = itemEmbeddingLayer)
itemEmbeddingMatrix = itemEmbeddingOutputModel.predict(itemIndexMatrix)

In [277]:
user_matrix = rating_matrix[:, 0].reshape(-1, 1).astype(int)
item_matrix = rating_matrix[:, 1].reshape(-1, 1).astype(int)

label_matrix = rating_matrix[:, 2].reshape(-1, 1)

In [278]:
test_df.head()

Unnamed: 0,userId,movieId,rating
0,1,1,4.0
3,1,47,5.0
4,1,50,5.0
7,1,110,4.0
8,1,151,5.0


In [279]:
test_df = test_df[['userId', 'movieId', 'rating']]
test_matrix = test_df.copy().values

cnt = 0

for (i, (usr, item, rating)) in enumerate(test_df.values):
    user = int(user)
    item = int(item)

    if (user in user2index) and (item in item2index):
        user = user2index[user]
        item = item2index[item]
        test_matrix[i] = [user, item, rating]
    else:
       cnt += 1

In [280]:
cnt

1384

In [281]:
test_matrix[0]

array([305., 638.,   4.])

In [282]:
x = [int(test_matrix[0][0])]
y = [int(test_matrix[0][1])]
NCF.predict([x,y])

array([[3.868872]], dtype=float32)

In [283]:
x = test_df.values[0]
t = [user2index[x[0]]]
s = [item2index[x[1]]]

In [284]:
NCF.predict([t,s]).reshape(1,)[0]

4.3693476

In [285]:
sq_errs = []
for x in test_df.values:
    usr = x[0]
    mve = x[1]
    rating = x[2]
    if (usr in user2index) and (mve in item2index):
        t = [user2index[usr]]
        s = [item2index[mve]]
        pred = NCF.predict([t,s])[0][0]
        error = (rating - pred) ** 2
        sq_errs.append(error) 

In [286]:
np.sqrt(np.mean(sq_errs))

0.8785065956414966

The RMSE score is lower than what we obtained for matrix factorization - 0.87 v/s 1.08 

In the original NCF paper was proposed to handle implicit feedback. The authors converted the explicit user provided ratings to binary outcomes - whether a user has rated the movie or not. This converts the numerical regression problem into a binary classification. The architecture is same as the one described above except the sigmoid output layer instead of the rating. 