<a href="https://colab.research.google.com/github/hungtangwei/Recommender_System_MF_GMF_MLP_Neumf_ALS/blob/master/Recommender_System_MF_GMF_MLP_Neumf_ALS.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Table of contents

* [Task 1:  Recommender System Challenge](#Task1)
    * [Introduction](#introduction1)
    * [Import libraries](#libraries1)
    * [Read the data set](#readdata1)
    * [Alternating Least Squares Model](#als)
    * [Matrix factorization Model (With Bias)](#mf)
    * [Neural matrix factorization Model](#neumf)
    * [Compare model and result conclusion](#conclusion1)


## Task 1: Recommender System Challenge 
<a id="Task1"></a>

### Introduction
<a id="introduction1"></a>
In Task 1, we build a recommender system that recommends a list of items to each user. The dataset was collected from an online social network platform and records interactions between users and items: whenever a user engages with an item, a record is added to the dataset.

To finish this task, we use three models to build the recommender system: an alternating least squares model, a matrix factorization model with bias, and a neural matrix factorization model.

### Import libraries
<a id="libraries1"></a>
In this part, we import the libraries needed for the following tasks.

In [None]:
#!pip install implicit
import implicit
from sklearn.preprocessing import MinMaxScaler
from sklearn import metrics
import scipy.sparse as sparse
import pandas as pd
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

### Read the data set
<a id="readdata1"></a>
In this part, I read the train, validation, and test data sets.

The training data contains a set of interactions between users and items, the test data contains a list of 100 candidate items per user, and the validation data is similar to the test data but also includes ratings.

However, the training data contains only ratings of 1. I therefore combine the validation data with the original training data to form the new training set.

In [None]:
# read the train, validation, and test data set
df1= pd.read_csv('train_data.csv')
val= pd.read_csv('validation_data.csv')
test = pd.read_csv("test_data.csv")
train=pd.concat([df1,val])
train=train.sort_values(by=['user_id'],ascending=[True])

#drop the duplicate data
train = train.drop_duplicates()
val = val.drop_duplicates()
test = test.drop_duplicates()

In [None]:
# calculate the number of users and items
num_users = len(train.user_id.unique())
num_items = len(train.item_id.unique())
print(num_users, num_items) 

2239 2174


### Alternating Least Squares model
<a id="als"></a>

The first model is Alternating Least Squares (ALS), a form of matrix factorization that reduces the user-item matrix to a much smaller number of dimensions, called latent or hidden features.

This ALS recommendation model uses the training and test data sets.
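
To make the alternating updates concrete, here is a minimal NumPy sketch on a small dense toy matrix. It is a plain regularized ALS that treats every entry as observed, not the confidence-weighted implicit-feedback variant provided by the `implicit` library; the function name `toy_als` and the toy matrix are assumptions made for illustration.

```python
import numpy as np

def toy_als(R, factors=2, reg=0.1, iterations=20, seed=0):
    """Plain regularized ALS on a dense matrix R (users x items)."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, factors))
    V = rng.normal(scale=0.1, size=(n_items, factors))
    eye = reg * np.eye(factors)
    for _ in range(iterations):
        # Fix V and solve the ridge-regression problem for all user factors.
        U = np.linalg.solve(V.T @ V + eye, V.T @ R.T).T
        # Fix U and solve the same problem for all item factors.
        V = np.linalg.solve(U.T @ U + eye, U.T @ R).T
    return U, V

# Toy interaction matrix with two obvious groups of users and items.
R = np.array([[1., 1., 0., 0.],
              [1., 1., 0., 0.],
              [0., 0., 1., 1.]])
U, V = toy_als(R, iterations=50)
error = np.abs(R - U @ V.T).max()
print(U.shape, V.shape, round(float(error), 3))
```

Each half-step is an ordinary least-squares solve, which is why the method scales to sparse data once the solves are restricted to observed entries.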

In [None]:
#convert to sparse matrix
sparse_content_person = sparse.csr_matrix((train['rating'].astype(float), (train['item_id'], train['user_id'])))
sparse_person_content = sparse.csr_matrix((train['rating'].astype(float), (train['user_id'], train['item_id'])))

In [None]:
#fit the model
np.random.seed(25)
alpha = 15
data = (sparse_content_person * alpha).astype('double')
model_ALS = implicit.als.AlternatingLeastSquares(factors=7, regularization=0.1, iterations=100,use_gpu=False)
model_ALS.fit(data)



In [None]:
#set the test user list
test_user_list=list(set(test.user_id.tolist()))
user_item_dict={}
for i in test_user_list:
    mask = (test["user_id"] == i)
    user_item_dict[i]=test[mask].item_id.tolist()

In [None]:
final_ALS=[]
# get the recommendations for each test user
for i in test_user_list:
    user_id=i
    recommend_ALS=model_ALS.recommend(userid=i,user_items=sparse_person_content,N=1000)
    count_ALS=0
    for j in recommend_ALS:
        item_id_ALS=j[0]
        if item_id_ALS in user_item_dict[i] and count_ALS<10:
            count_ALS+=1
            final_ALS.append(item_id_ALS)
        elif count_ALS >= 10:
            # ten candidates found for this user, so stop scanning
            break

In [None]:
#output the csv file
final_id=test_user_list*10
final_id.sort()
df_final_ALS=pd.DataFrame({'user_id':final_id,'item_id':final_ALS})
df_final_ALS.to_csv('29375932.csv',index=False)
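
The export above assumes that every test user ends up with exactly ten matched items. As a hedged alternative, the sketch below pairs each user with that user's own filtered recommendations, so the two columns stay aligned even when fewer than ten candidates match. The helper name `top_n_candidates` and the toy inputs are assumptions for illustration; the ranked lists are plain Python lists of item ids rather than the output of `model_ALS.recommend`.

```python
def top_n_candidates(ranked_items, candidates, n=10):
    """Keep the first n ranked items that also appear in the candidate list."""
    allowed = set(candidates)  # set membership is cheaper than list membership
    picked = []
    for item in ranked_items:
        if item in allowed:
            picked.append(item)
            if len(picked) == n:
                break  # stop as soon as n items are found
    return picked

# Toy data: ranked recommendations and candidate lists per user.
ranked = {0: [5, 3, 9, 1, 7], 1: [2, 8, 4]}
candidate_lists = {0: [1, 3, 7], 1: [4, 6]}

rows = []
for user in sorted(ranked):
    for item in top_n_candidates(ranked[user], candidate_lists[user], n=2):
        rows.append((user, item))  # user and item are appended together

print(rows)  # [(0, 3), (0, 1), (1, 4)]
```

Building `(user, item)` pairs in one pass avoids constructing the id column separately from the item column.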

### Matrix factorization model (With Bias)
<a id="mf"></a>

The second model is matrix factorization, an algorithm that decomposes the user-item interaction matrix into the product of two lower-dimensional rectangular matrices. Because ratings vary across users and items, I also add a user bias and an item bias to this model.

This model uses the train, val, and test datasets.
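
Written out, the biased factorization described above predicts a rating as the dot product of the user and item embeddings plus the two bias terms, and is trained with a mean squared error loss. The symbols $p_u$, $q_i$, $b_u$, $b_i$ and $\mathcal{D}$ are notation chosen here (user embedding, item embedding, user bias, item bias, and the set of training pairs); they are not names used in the notebook.

$$
\hat{r}_{ui} = p_u^\top q_i + b_u + b_i,
\qquad
\mathcal{L} = \frac{1}{|\mathcal{D}|} \sum_{(u,i) \in \mathcal{D}} \left(\hat{r}_{ui} - r_{ui}\right)^2
$$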

In [None]:
# define a class of matrix factorization model 
class MF_bias(nn.Module):
    def __init__(self, num_users, num_items, emb_size=100):
        super(MF_bias, self).__init__()
        self.embedding_user = nn.Embedding(num_users, emb_size)
        self.user_bias = nn.Embedding(num_users, 1)
        self.embedding_item= nn.Embedding(num_items, emb_size)
        self.item_bias = nn.Embedding(num_items, 1)
        self.embedding_user.weight.data.uniform_(0,0.05)
        self.embedding_item.weight.data.uniform_(0,0.05)
        self.user_bias.weight.data.uniform_(-0.01,0.01)
        self.item_bias.weight.data.uniform_(-0.01,0.01)
        
    def forward(self, u, v):
        U = self.embedding_user(u)
        V = self.embedding_item(v)
        b_u = self.user_bias(u).squeeze()
        b_v = self.item_bias(v).squeeze()
        rating=(U*V).sum(1) +  b_u  + b_v
        return rating

In [None]:
# train the model
def train_epocs(model, epochs=30, lr=0.01, wd=0.0, unsqueeze=False):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=wd)
    model.train()
    for i in range(epochs):
        users = torch.LongTensor(train.user_id.values)#.cuda()
        items = torch.LongTensor(train.item_id.values)#.cuda()
        ratings = torch.FloatTensor(train.rating.values)#.cuda()
        if unsqueeze:
            ratings = ratings.unsqueeze(1)
        y_hat = model(users, items)
        loss = F.mse_loss(y_hat, ratings)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        print(loss.item()) 
    test_loss(model, unsqueeze)

In [None]:
def test_loss(model, unsqueeze=False):
    model.eval()
    users = torch.LongTensor(val.user_id.values)#.cuda()
    items = torch.LongTensor(val.item_id.values)#.cuda()
    ratings = torch.FloatTensor(val.rating.values)#.cuda()
    if unsqueeze:
        ratings = ratings.unsqueeze(1)
    y_hat = model(users, items)
    loss = F.mse_loss(y_hat, ratings)
    print("test loss %.3f " % loss.item())

In [None]:
# build the model
model_mf = MF_bias(num_users, num_items, emb_size=100)#.cuda()

In [None]:
# train the model
#train_epocs(model_mf, epochs=1700, lr=0.0001, wd=1e-5) # with CUDA, 1700 epochs is feasible
train_epocs(model_mf, epochs=20, lr=0.0001, wd=1e-5)

0.11249944567680359
0.11241340637207031
0.11232789605855942
0.11224278062582016
0.11215800791978836
0.11207377910614014
0.11199000477790833
0.11190664768218994
0.11182374507188797
0.1117413341999054
0.1116594672203064
0.11157800257205963
0.11149706691503525
0.11141666024923325
0.11133672297000885
0.11125726997852325
0.11117841303348541
0.1110999658703804
0.11102209240198135
0.1109447181224823
test loss 0.014 


In [None]:
# predict the ratings and keep the top 10 items per user (CSV export is commented out)
model_mf.eval()
with torch.no_grad():
    test_users = torch.LongTensor(test.user_id.values)#.cuda()
    test_items = torch.LongTensor(test.item_id.values)#.cuda()
    y_hat_mf = model_mf(test_users, test_items)
    rating_mf = [element.item() for element in y_hat_mf.flatten()]
    df_mf=pd.DataFrame({'user_id':test.user_id,'item_id':test.item_id,'rating':rating_mf})
    df_mf_sort=df_mf.sort_values(by=['user_id','rating'],ascending=[True,False])
    final_mf=df_mf_sort.groupby('user_id').head(10)
    final_mf=final_mf.drop(columns=['rating'])
    #final_mf.to_csv('final_mf.csv',index=False)

### Neural matrix factorization model
<a id="neumf"></a>

The final model is neural matrix factorization (NeuMF), which fuses generalized matrix factorization and a multi-layer perceptron so that they reinforce each other and better model the complex user-item interactions.

In addition, I pretrain the generalized matrix factorization (GMF) model and the multi-layer perceptron (MLP) model, and then load the weights from both to initialize the NeuMF model.

This model uses the train, val, and test datasets.
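
As a compact summary of the fusion and of the pretrained initialization, the formulas below follow the order used in the code, with the MLP part first and the GMF part second. The symbols $\phi^{\mathrm{MLP}}_{ui}$, $\phi^{\mathrm{GMF}}_{ui}$, $h$ and $c$ are notation assumed here for the two branch outputs and for the weight and bias of the final linear layer.

$$
\hat{y}_{ui} = \sigma\!\left( h^\top \begin{bmatrix} \phi^{\mathrm{MLP}}_{ui} \\ \phi^{\mathrm{GMF}}_{ui} \end{bmatrix} + c \right),
\qquad
h \leftarrow \tfrac{1}{2} \begin{bmatrix} h^{\mathrm{MLP}} \\ h^{\mathrm{GMF}} \end{bmatrix},
\qquad
c \leftarrow \tfrac{1}{2} \left( c^{\mathrm{MLP}} + c^{\mathrm{GMF}} \right)
$$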

#### Generalized Matrix Factorization model (Pre-train model)

In this model, I implement a generalized version of MF under neural collaborative filtering that applies a sigmoid function to the output. In addition, I add user biases and item biases to this model. Finally, I save the model weights.
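
In formula form, the GMF variant used here can be sketched as follows. The symbols are assumptions chosen for this note: $p_u$ and $q_i$ are the user and item embeddings, $b_u$ and $b_i$ the scalar biases (broadcast over all latent dimensions via the all-ones vector $\mathbf{1}$), $w$ and $c$ the weight and bias of the output layer, and $\odot$ the element-wise product.

$$
\hat{y}_{ui} = \sigma\!\left( w^\top \left( p_u \odot q_i + (b_u + b_i)\,\mathbf{1} \right) + c \right)
$$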

In [None]:
import torch
# define the class of GMF model
class GMF(torch.nn.Module):
    def __init__(self, num_users,num_items,latent_dim=100):
        super(GMF, self).__init__()
        self.embedding_user = torch.nn.Embedding(num_embeddings=num_users, embedding_dim=latent_dim)
        self.embedding_item = torch.nn.Embedding(num_embeddings=num_items, embedding_dim=latent_dim)
        # add the user and item bias
        self.user_bias = nn.Embedding(num_users, 1)
        self.item_bias = nn.Embedding(num_items, 1)
        # initialize the embeddings and biases from uniform distributions
        self.embedding_user.weight.data.uniform_(0,0.05)
        self.embedding_item.weight.data.uniform_(0,0.05)
        self.user_bias.weight.data.uniform_(-0.01,0.01)
        self.item_bias.weight.data.uniform_(-0.01,0.01)
        self.affine_output = torch.nn.Linear(in_features=latent_dim, out_features=1)
        # set sigmoid function
        self.logistic = torch.nn.Sigmoid()

    def forward(self, user_indices, item_indices):
        user_embedding = self.embedding_user(user_indices)
        item_embedding = self.embedding_item(item_indices)
        item_bias_mf = self.item_bias(item_indices)
        user_bias_mf = self.user_bias(user_indices)
        element_product = torch.mul(user_embedding, item_embedding)
        element_product = element_product + item_bias_mf+user_bias_mf
        logits = self.affine_output(element_product)
        rating = self.logistic(logits)
        return rating

In [None]:
# define the function of training model
def train_epocs_gmf(model, epochs=10, lr=1e-3, wd=0.0, unsqueeze=False):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=wd)
    model.train()
    for i in range(epochs):
        users = torch.LongTensor(train.user_id.values)#.cuda()
        items = torch.LongTensor(train.item_id.values)#.cuda()
        ratings = torch.FloatTensor(train.rating.values)#.cuda()
        if unsqueeze:
            ratings = ratings.unsqueeze(1)
        y_hat = model(users, items)
        loss = F.mse_loss(y_hat, ratings)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        print(loss.item()) 
    test_loss(model, unsqueeze)
    print('save')
    # save the weight of model
    torch.save(model.state_dict(), 'gmf_model.pt')

In [None]:
# build the gmf model
gmf_model = GMF(num_users, num_items)#.cuda()

In [None]:
# train the gmf model
#train_epocs_gmf(gmf_model, epochs=1500, lr=0.0001, wd=1e-5, unsqueeze=True) # with CUDA, 1500 epochs is feasible
train_epocs_gmf(gmf_model, epochs=20, lr=0.0001, wd=1e-5, unsqueeze=True)


0.2544608414173126
0.2544329762458801
0.2544041872024536
0.25437474250793457
0.254344642162323
0.2543134093284607
0.2542816698551178
0.25424909591674805
0.25421568751335144
0.254181444644928
0.2541462481021881
0.25411009788513184
0.2540731430053711
0.2540351450443268
0.25399625301361084
0.2539563477039337
0.2539154887199402
0.25387343764305115
0.2538304030895233
0.2537863552570343
test loss 0.255 
save


#### Multi-Layer Perceptron Model (Pre-train model)

In this model, hidden layers are stacked on top of the concatenated user and item embedding vectors, using a standard multi-layer perceptron to learn the interaction between user and item latent features. I also add user biases and item biases to this model. Finally, I save the model weights.
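
The layer sizes follow directly from the embedding width. The short sketch below uses the default values from this notebook (`latent_dim=32`, `layers=[64,32,16,8]`); the names `concat_width`, `hidden_shapes` and `output_shape` are introduced here only for illustration.

```python
latent_dim = 32
layers = [64, 32, 16, 8]

# Concatenating a user and an item embedding gives the first layer's input width.
concat_width = 2 * latent_dim
assert concat_width == layers[0]

# Consecutive pairs give (in_features, out_features) of each hidden Linear layer.
hidden_shapes = list(zip(layers[:-1], layers[1:]))
print(hidden_shapes)  # [(64, 32), (32, 16), (16, 8)]

# The final affine layer maps the last hidden width to a single logit.
output_shape = (layers[-1], 1)
print(output_shape)  # (8, 1)
```

If `latent_dim` is changed, `layers[0]` has to be changed to twice that value, otherwise the first linear layer rejects the concatenated vector.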

In [None]:
# define the class of MLP model
class MLP(torch.nn.Module):
    def __init__(self, num_users,num_items,latent_dim=32,layers=[64,32,16,8]):
        super(MLP, self).__init__()
        self.embedding_user = torch.nn.Embedding(num_embeddings=num_users, embedding_dim=latent_dim)
        self.embedding_item = torch.nn.Embedding(num_embeddings=num_items, embedding_dim=latent_dim)
        self.user_bias = nn.Embedding(num_users, 1)
        self.item_bias = nn.Embedding(num_items, 1)
        self.embedding_user.weight.data.uniform_(0,0.05)
        self.embedding_item.weight.data.uniform_(0,0.05)
        self.user_bias.weight.data.uniform_(-0.01,0.01)
        self.item_bias.weight.data.uniform_(-0.01,0.01)

        self.fc_layers = torch.nn.ModuleList()
        for idx, (in_size, out_size) in enumerate(zip(layers[:-1], layers[1:])):
            self.fc_layers.append(torch.nn.Linear(in_size, out_size))
        self.affine_output = torch.nn.Linear(in_features=layers[-1], out_features=1)
        self.logistic = torch.nn.Sigmoid()

    def forward(self, user_indices, item_indices):
        user_embedding = self.embedding_user(user_indices)
        item_embedding = self.embedding_item(item_indices)
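        # pass training=self.training so dropout is switched off by model.eval()
        user_embedding = F.dropout(user_embedding, 0.1, training=self.training)
        item_embedding = F.dropout(item_embedding, 0.1, training=self.training)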
        item_bias_mlp = self.item_bias(item_indices)
        user_bias_mlp = self.user_bias(user_indices)
        vector = torch.cat([user_embedding, item_embedding + item_bias_mlp +  user_bias_mlp], dim=-1)
        for idx, _ in enumerate(range(len(self.fc_layers))):
            vector = self.fc_layers[idx](vector)
            vector = torch.nn.ReLU()(vector)
            vector = F.dropout(vector, 0.1, training=self.training)

        logits = self.affine_output(vector)
        rating = self.logistic(logits)
        return rating

In [None]:
# define the train function
def train_epocs_mlp(model, epochs=10, lr=0.01, wd=0.0, unsqueeze=False):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=wd)
    model.train()
    for i in range(epochs):
        users = torch.LongTensor(train.user_id.values)#.cuda()
        items = torch.LongTensor(train.item_id.values)#.cuda()
        ratings = torch.FloatTensor(train.rating.values)#.cuda()
        if unsqueeze:
            ratings = ratings.unsqueeze(1)
        y_hat = model(users, items)
        loss = F.mse_loss(y_hat, ratings)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        print(loss.item()) 
    test_loss(model, unsqueeze)
    print('save')
    # save the model
    torch.save(model.state_dict(), 'mlp_model.pt')

In [None]:
# build the mlp model
mlp_model = MLP(num_users, num_items)#.cuda()

In [None]:
# train the model
#train_epocs_mlp(mlp_model, epochs=1000, lr=0.001, wd=1e-6, unsqueeze=True) # with CUDA, 1000 epochs is feasible
train_epocs_mlp(mlp_model, epochs=20, lr=0.001, wd=1e-6, unsqueeze=True)

0.27081772685050964
0.2700363099575043
0.2692871391773224
0.2685478627681732
0.2678098976612091
0.26705998182296753
0.26635944843292236
0.2656811773777008
0.26500770449638367
0.2643532454967499
0.26372426748275757
0.26307597756385803
0.2623986601829529
0.2617660164833069
0.26110222935676575
0.2604369521141052
0.25979703664779663
0.2591019570827484
0.25841549038887024
0.2577289938926697
test loss 0.259 
save


#### Neural matrix factorization Model

In [None]:
class NeuMF(torch.nn.Module):
    def __init__(self, num_users, num_items, latent_dim_mf=100, latent_dim_mlp=32, layers=[64,32,16,8]):
        super(NeuMF, self).__init__()

        # Part of GMF
        self.embedding_user_mf = torch.nn.Embedding(num_embeddings=num_users, embedding_dim=latent_dim_mf)
        self.embedding_item_mf = torch.nn.Embedding(num_embeddings=num_items, embedding_dim=latent_dim_mf)
        self.embedding_user_mf_bias = nn.Embedding(num_users, 1)
        self.embedding_item_mf_bias = nn.Embedding(num_items, 1)
        self.embedding_user_mf_bias.weight.data.uniform_(-0.01,0.01)
        self.embedding_item_mf_bias.weight.data.uniform_(-0.01,0.01)

        # Part of MLP
        self.embedding_user_mlp = torch.nn.Embedding(num_embeddings=num_users, embedding_dim=latent_dim_mlp)
        self.embedding_item_mlp = torch.nn.Embedding(num_embeddings=num_items, embedding_dim=latent_dim_mlp)
        self.embedding_user_mlp_bias = nn.Embedding(num_users, 1)
        self.embedding_item_mlp_bias = nn.Embedding(num_items, 1)
        self.embedding_user_mlp_bias.weight.data.uniform_(-0.01,0.01)
        self.embedding_item_mlp_bias.weight.data.uniform_(-0.01,0.01)
        self.fc_layers = torch.nn.ModuleList()
        for idx, (in_size, out_size) in enumerate(zip(layers[:-1], layers[1:])):
            self.fc_layers.append(torch.nn.Linear(in_size, out_size))

        self.affine_output = torch.nn.Linear(in_features=layers[-1] + latent_dim_mf, out_features=1)
        self.logistic = torch.nn.Sigmoid()

    def forward(self, user_indices, item_indices):

        # Part of GMF
        user_embedding_mf = self.embedding_user_mf(user_indices)
        item_embedding_mf = self.embedding_item_mf(item_indices)
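        # pass training=self.training so dropout is switched off by model.eval()
        user_embedding_mf = F.dropout(user_embedding_mf, 0.1, training=self.training)
        item_embedding_mf = F.dropout(item_embedding_mf, 0.1, training=self.training)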
        item_bias_mf = self.embedding_item_mf_bias(item_indices)
        user_bias_mf = self.embedding_user_mf_bias(user_indices)
        mf_vector =torch.mul(user_embedding_mf, item_embedding_mf)
        mf_vector = mf_vector + item_bias_mf+user_bias_mf
        
        # Part of MLP
        user_embedding_mlp = self.embedding_user_mlp(user_indices)
        item_embedding_mlp = self.embedding_item_mlp(item_indices)
        user_embedding_mlp = F.dropout(user_embedding_mlp, 0.1, training=self.training)
        item_embedding_mlp = F.dropout(item_embedding_mlp, 0.1, training=self.training)
        item_bias_mlp = self.embedding_item_mlp_bias(item_indices)
        user_bias_mlp = self.embedding_user_mlp_bias(user_indices)
        mlp_vector = torch.cat([user_embedding_mlp, item_embedding_mlp + item_bias_mlp +  user_bias_mlp], dim=-1)
        for idx, _ in enumerate(range(len(self.fc_layers))):
            mlp_vector = self.fc_layers[idx](mlp_vector)
            mlp_vector = torch.nn.ReLU()(mlp_vector)
            mlp_vector = F.dropout(mlp_vector, 0.1, training=self.training)

        # Fusion of GMF and MLP
        vector = torch.cat([F.dropout(mlp_vector, 0.1, training=self.training), F.dropout(mf_vector, 0.1, training=self.training)], dim=-1)
        logits = self.affine_output(vector)
        rating = self.logistic(logits)
        return rating

    # load the weights from the GMF and MLP models
    def load_pretrain_weights(self):
        
        # Part of GMF
        gmf_model = GMF(num_users,num_items,latent_dim=100)#.cuda()
        gmf_model.load_state_dict(torch.load('gmf_model.pt'))
        self.embedding_user_mf.weight.data = gmf_model.embedding_user.weight.data
        self.embedding_item_mf.weight.data = gmf_model.embedding_item.weight.data

        # Part of MLP
        mlp_model = MLP(num_users,num_items,latent_dim=32,layers=[64,32,16,8])#.cuda()
        mlp_model.load_state_dict(torch.load('mlp_model.pt'))
        self.embedding_user_mlp.weight.data = mlp_model.embedding_user.weight.data
        self.embedding_item_mlp.weight.data = mlp_model.embedding_item.weight.data        
        for idx in range(len(self.fc_layers)):
            self.fc_layers[idx].weight.data = mlp_model.fc_layers[idx].weight.data
        
        # Concatenate weights of the two models.
        self.affine_output.weight.data = 0.5 * torch.cat([mlp_model.affine_output.weight.data, gmf_model.affine_output.weight.data], dim=-1)
        self.affine_output.bias.data = 0.5 * (mlp_model.affine_output.bias.data + gmf_model.affine_output.bias.data)


In [None]:
# define the train function
def train_epocs_NeuMF(model, epochs=10, lr=0.01, wd=0.0, unsqueeze=False):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=wd)
    model.train()
    for i in range(epochs):
        users = torch.LongTensor(train.user_id.values)#.cuda()
        items = torch.LongTensor(train.item_id.values)#.cuda()
        ratings = torch.FloatTensor(train.rating.values)#.cuda()
        if unsqueeze:
            ratings = ratings.unsqueeze(1)
        y_hat = model(users, items)
        loss = F.mse_loss(y_hat, ratings)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        print(loss.item()) 
    test_loss(model, unsqueeze)
    print('save')
    # save the model passed to this function rather than the global variable
    torch.save(model.state_dict(), 'NeuMF_model.pt')

In [None]:
# build the NeuMF model
NeuMF_model = NeuMF(num_users, num_items,layers=[64,32,16,8])#.cuda()

In [None]:
# load the pre-training models of GMF and MLP
NeuMF_model.load_pretrain_weights()

In [None]:
# train the model
#train_epocs_NeuMF(NeuMF_model, epochs=1500, lr=0.0001, wd=1e-6, unsqueeze=True) # with CUDA, 1500 epochs is feasible
train_epocs_NeuMF(NeuMF_model, epochs=20, lr=0.0001, wd=1e-6, unsqueeze=True)

0.2685481607913971
0.26849186420440674
0.26842349767684937
0.26835474371910095
0.2683005630970001
0.2682367265224457
0.2681454122066498
0.26808658242225647
0.2680085599422455
0.2679571211338043
0.26787233352661133
0.26780810952186584
0.26772287487983704
0.267640084028244
0.26756560802459717
0.26749274134635925
0.26740965247154236
0.2673214375972748
0.2672480344772339
0.26716357469558716
test loss 0.272 
save


In [None]:
# get the predicted ratings and keep the top 10 items per user (CSV export is commented out)
NeuMF_model.eval()
with torch.no_grad():
    test_users = torch.LongTensor(test.user_id.values)#.cuda()
    test_items = torch.LongTensor(test.item_id.values)#.cuda()
    y_hat_NeuMF = NeuMF_model(test_users, test_items)
    rating_NeuMF = [element.item() for element in y_hat_NeuMF.flatten()]
    df_NeuMF=pd.DataFrame({'user_id':test.user_id,'item_id':test.item_id,'rating':rating_NeuMF})
    df_NeuMF_sort=df_NeuMF.sort_values(by=['user_id','rating'],ascending=[True,False])
    final_NeuMF=df_NeuMF_sort.groupby('user_id').head(10)
    final_NeuMF=final_NeuMF.drop(columns=['rating'])
    #final_NeuMF.to_csv('final_NeuMF.csv',index=False)

### Compare model and result conclusion
<a id="conclusion1"></a>

The final Normalized Discounted Cumulative Gain (NDCG) scores for these three models are:
* Alternating Least Squares model: 0.21
* Matrix factorization model (With Bias): 0.15
* Neural matrix factorization model: 0.14

According to NDCG, the Alternating Least Squares model achieves the highest score, which suggests that it is the most suitable model in this case. A likely reason is that the ratings are only 0 and 1 and the data set is small, so the more complex models do not perform well. This is why I chose the ALS model for my Kaggle submission.
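
For reference, the sketch below shows one common way to compute NDCG for a single ranked list with binary relevance. The exact definition used by the competition (cutoff, handling of ties, how the ideal ranking is formed) is not stated in this notebook, so the cutoff `k=10` and the function names `dcg_at_k` and `ndcg_at_k` are assumptions.

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain of the first k items of a ranked list."""
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    """DCG divided by the DCG of the same relevances in the ideal order."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# A relevant item at rank 1 scores 1.0; pushing it to rank 2 lowers the score.
print(ndcg_at_k([1, 0, 0]))            # 1.0
print(round(ndcg_at_k([0, 1, 0]), 4))  # 0.6309
```

Averaging `ndcg_at_k` over all test users gives a single score per model that can be compared in the same way as the figures above.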