# Neural Collaborative Filtering (NCF)

This notebook implements Neural Collaborative Filtering (NCF), a deep-neural-network algorithm that addresses a key problem in recommendation, collaborative filtering, on the basis of implicit feedback.

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import sys
sys.path.append("../../")
import time
import pandas as pd
import tensorflow as tf

from reco_utils.recommender.ncf.ncf_singlenode import NCF
from reco_utils.recommender.ncf.dataset import Dataset as NCFDataset
from reco_utils.dataset import movielens
from reco_utils.common.notebook_utils import is_jupyter
from reco_utils.dataset.python_splitters import python_chrono_split
from reco_utils.evaluation.python_evaluation import (rmse, mae, rsquared, exp_var, map_at_k, ndcg_at_k, precision_at_k, 
                                                     recall_at_k, get_top_k_items)

print("System version: {}".format(sys.version))
print("Pandas version: {}".format(pd.__version__))
print("Tensorflow version: {}".format(tf.__version__))

System version: 3.6.8 |Anaconda, Inc.| (default, Dec 29 2018, 19:04:46) 
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]
Pandas version: 0.25.3
Tensorflow version: 1.12.0


## Data Processing

### Load and split data
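The NCF paper evaluates item recommendation with a leave-one-out protocol: for each user, the latest interaction is held out as the test set and the remaining data is used for training.

This notebook instead splits each user's interactions chronologically with `python_chrono_split`, using a 0.75 training ratio, so the most recent interactions of each user form the test set.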
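
For comparison, the leave-one-out protocol can be sketched in a few lines of pandas. This is only an illustration on made-up data: the helper name `leave_one_out_split` and the toy values are assumptions, and it is not the splitter used later in this notebook.

```python
import pandas as pd

def leave_one_out_split(df, col_user="userID", col_timestamp="timestamp"):
    """Hold out each user's most recent interaction as the test set."""
    # Sort so that the last row of each user is that user's latest interaction.
    ordered = df.sort_values([col_user, col_timestamp], kind="mergesort")
    # duplicated(keep="last") is False only for the final row of each user.
    is_latest = ~ordered.duplicated(subset=col_user, keep="last")
    return ordered[~is_latest], ordered[is_latest]

# Toy interactions (made-up values, for illustration only).
toy = pd.DataFrame({
    "userID": [1, 1, 1, 2, 2],
    "itemID": [10, 11, 12, 10, 13],
    "timestamp": [100, 300, 200, 50, 60],
})
loo_train, loo_test = leave_one_out_split(toy)
print(loo_test)
```

Each user contributes exactly one row to `loo_test`, so the test set size equals the number of users.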

In [42]:
import numpy as np

In [4]:
path = '/Users/guoyixin/Desktop/NEU/7374/hetrec2011-lastfm-2k/'

In [69]:
ua_time = pd.read_table(path + 'user_taggedartists-timestamps.dat', sep='\t', header=0, engine='python')

In [70]:
ua_time.head()

Unnamed: 0,userID,artistID,tagID,timestamp
0,2,52,13,1238536800000
1,2,52,15,1238536800000
2,2,52,18,1238536800000
3,2,52,21,1238536800000
4,2,52,41,1238536800000


In [71]:
ua_count = ua_time.groupby(['userID','artistID']).agg({'timestamp':'count'}).reset_index()
ua_count.columns = ['userID','artistID','rating']
ua_count

Unnamed: 0,userID,artistID,rating
0,2,52,5
1,2,63,4
2,2,73,8
3,2,94,8
4,2,96,2
...,...,...,...
71059,2100,3855,1
71060,2100,6658,3
71061,2100,8322,4
71062,2100,13978,1


In [72]:
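# Caution: joining on userID alone pairs each (user, artist) count with every timestamp of that user.
ua_rating_time = pd.merge(ua_count, ua_time, on='userID')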

In [73]:
ua_rating_time = ua_rating_time.drop(columns=['artistID_y', 'tagID'])

In [74]:
ua_rating_time = ua_rating_time.drop_duplicates()

In [75]:
ua_rating_time.columns = ['userID','itemID','rating','timestamp']
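
An alternative that keeps exactly one row per (user, artist) pair is to compute the count and a single timestamp in the same `groupby`. The sketch below runs on a made-up tagging log with the same columns as `ua_time`. Using the most recent tag event as the interaction time is an assumption, not something the original data set prescribes.

```python
import pandas as pd

# Toy tagging log with the same columns as ua_time (values are made up).
tags = pd.DataFrame({
    "userID":    [2, 2, 2, 2, 3],
    "artistID":  [52, 52, 63, 63, 52],
    "tagID":     [13, 15, 13, 18, 13],
    "timestamp": [1000, 2000, 1500, 1500, 3000],
})

# One row per (user, artist): the number of tag events becomes the rating and
# the most recent tag event becomes the interaction timestamp (an assumption).
interactions = (
    tags.groupby(["userID", "artistID"])
        .agg(rating=("timestamp", "count"), timestamp=("timestamp", "max"))
        .reset_index()
        .rename(columns={"artistID": "itemID"})
)
print(interactions)
```

Because the aggregation never joins back to the raw log, the result cannot grow beyond the number of distinct (user, artist) pairs.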

In [77]:
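# Keep only users with userID < 100; copy so later column assignments do not act on a view.
ua_rating_time = ua_rating_time.loc[ua_rating_time['userID'] < 100].copy()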
ua_rating_time

Unnamed: 0,userID,itemID,rating,timestamp
0,2,52,5,1238536800000
27,2,52,5,1241128800000
45,2,63,4,1238536800000
72,2,63,4,1241128800000
90,2,73,8,1238536800000
...,...,...,...,...
1931344,99,2605,1,1285884000000
1931349,99,2605,1,1241128800000
1931357,99,2605,1,1254348000000
1931358,99,2605,1,1249077600000


In [100]:
# Min-max scale the interaction counts to the range [1, 6].
rating = lambda x: ((x - np.min(x)) * 5 / (np.max(x) - np.min(x))) + 1

In [101]:
ua_rating_time['rating'] = ua_rating_time[['rating']].apply(rating)

In [110]:
ua_rating_time.describe()

Unnamed: 0,userID,itemID,rating,timestamp
count,72414.0,72414.0,72414.0,72414.0
mean,54.633041,3746.416259,1.331838,1253136000000.0
std,26.424758,4422.275449,0.658253,161375700000.0
min,2.0,1.0,1.0,-428720400000.0
25%,39.0,863.0,1.0,1235862000000.0
50%,49.0,1797.0,1.131579,1277935000000.0
75%,90.0,5074.0,1.263158,1304596000000.0
max,99.0,18707.0,6.0,1304686000000.0
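
The summary above shows that the scaled ratings run from 1 to 6, which follows directly from the min-max formula: the smallest count maps to 1 and the largest to 1 + 5 = 6. A small worked example, with made-up counts and a hypothetical helper `min_max_scale` that generalizes the lambda:

```python
import numpy as np

def min_max_scale(x, low=1.0, high=6.0):
    """Linearly rescale x so that min(x) maps to `low` and max(x) maps to `high`."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    if span == 0:
        # All values identical: avoid division by zero and return the lower bound.
        return np.full_like(x, low)
    return (x - x.min()) * (high - low) / span + low

counts = np.array([1, 2, 3, 39])  # made-up interaction counts
scaled = min_max_scale(counts)
print(scaled)
```

With these counts each extra interaction adds 5/38 to the rating. Passing `high=5.0` would give a 1 to 5 scale instead.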


### Split data
We use `python_chrono_split` to split each user's interactions chronologically, with a training ratio of 0.75. Because ranking all items for every user is too time-consuming during evaluation, a common strategy is to randomly sample 100 items that the user has not interacted with and rank the test item among those 100 items. The test samples are constructed by `NCFDataset`.
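
The sampled-ranking strategy can be sketched as follows. This is a minimal illustration, not the logic inside `NCFDataset`: the function `rank_among_negatives`, its parameters, and the toy scoring function are all assumptions.

```python
import numpy as np

def rank_among_negatives(score_fn, user, test_item, all_items, seen_items,
                         n_negatives=100, rng=None):
    """Return the 1-based rank of test_item among sampled unseen items."""
    rng = np.random.default_rng() if rng is None else rng
    # Candidates are items the user has not interacted with, excluding the test item.
    candidates = np.setdiff1d(all_items, list(seen_items) + [test_item])
    n = min(n_negatives, len(candidates))
    negatives = rng.choice(candidates, size=n, replace=False)
    pool = np.concatenate(([test_item], negatives))
    scores = np.array([score_fn(user, item) for item in pool])
    # Rank = 1 + number of negatives that score strictly higher than the test item.
    return 1 + int(np.sum(scores[1:] > scores[0]))

# Toy scorer that prefers items close to 500 (purely illustrative).
toy_score = lambda user, item: -abs(item - 500)
rank = rank_among_negatives(toy_score, user=7, test_item=500,
                            all_items=np.arange(1000), seen_items={1, 2, 3},
                            rng=np.random.default_rng(0))
hit_at_10 = rank <= 10
print(rank, hit_at_10)
```

Averaging `hit_at_10` over all users gives a hit ratio at 10 under this sampled protocol.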

In [103]:
# top k items to recommend
TOP_K = 10

# Model parameters
EPOCHS = 50
BATCH_SIZE = 256

SEED = 42

In [104]:
train, test = python_chrono_split(ua_rating_time, 0.75)

In [105]:
data = NCFDataset(train=train, test=test, seed=SEED)

## Train the NCF model on the training data and get the top-k recommendations for the test data

NCF accepts implicit feedback and generates a propensity score between 0 and 1 for each item to be recommended to a user. A recommended item list can then be generated from these scores. Note that this quickstart notebook uses a small number of epochs to reduce training time. As a consequence, model performance will be slightly lower.
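
To see where a score between 0 and 1 comes from, here is a minimal NumPy sketch of a NeuMF-style forward pass for one user-item pair: a GMF branch (element-wise product of embeddings), an MLP branch (concatenated embeddings passed through ReLU layers), and a sigmoid over the fused vector. The embedding sizes, the random weights, and the function name `neumf_score` are assumptions for illustration and do not mirror the internals of the `NCF` class.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neumf_score(p_gmf, q_gmf, p_mlp, q_mlp, mlp_layers, h):
    """Score one user-item pair with a NeuMF-style forward pass."""
    gmf_out = p_gmf * q_gmf                  # GMF branch: element-wise product
    x = np.concatenate([p_mlp, q_mlp])       # MLP branch: concatenated embeddings
    for W, b in mlp_layers:
        x = np.maximum(0.0, W @ x + b)       # ReLU hidden layers
    fused = np.concatenate([gmf_out, x])     # fuse both branches
    return sigmoid(h @ fused)                # squash to the (0, 1) range

rng = np.random.default_rng(0)
n_factors, mlp_dim = 4, 8                    # hypothetical embedding sizes
p_gmf, q_gmf = rng.normal(size=n_factors), rng.normal(size=n_factors)
p_mlp, q_mlp = rng.normal(size=mlp_dim), rng.normal(size=mlp_dim)
# Two hidden layers, 16 -> 8 -> 4, with small random weights as stand-ins.
mlp_layers = [(0.1 * rng.normal(size=(8, 16)), np.zeros(8)),
              (0.1 * rng.normal(size=(4, 8)), np.zeros(4))]
h = rng.normal(size=n_factors + 4)           # output-layer weights
score = neumf_score(p_gmf, q_gmf, p_mlp, q_mlp, mlp_layers, h)
print(score)
```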

### Pre-training

To improve the performance of NeuMF, we can adopt a pre-training strategy. We first train GMF and MLP from random initializations until convergence, then use their parameters to initialize the corresponding parts of NeuMF. At the output layer, the weights of the two models are concatenated as

$$h^{NCF} \leftarrow \begin{bmatrix} \alpha h^{GMF} \\ (1 - \alpha) h^{MLP} \end{bmatrix}$$

where $h^{GMF}$ and $h^{MLP}$ denote the $h$ vectors of the pre-trained GMF and MLP models, respectively, and $\alpha$ is a
hyper-parameter that determines the trade-off between the two pre-trained models. We set $\alpha = 0.5$.
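
The concatenation rule can be written out directly in NumPy. This is a sketch under stated assumptions: the helper `init_neumf_output_weights` is hypothetical, and the two weight vectors are random stand-ins for pre-trained values rather than parameters read from a trained model.

```python
import numpy as np

def init_neumf_output_weights(h_gmf, h_mlp, alpha=0.5):
    """Concatenate pre-trained output weights as [alpha * h_gmf ; (1 - alpha) * h_mlp]."""
    h_gmf = np.asarray(h_gmf, dtype=float)
    h_mlp = np.asarray(h_mlp, dtype=float)
    return np.concatenate([alpha * h_gmf, (1.0 - alpha) * h_mlp])

# Random stand-ins for the pre-trained output-layer weights.
rng = np.random.default_rng(42)
h_gmf = rng.normal(size=4)
h_mlp = rng.normal(size=4)
h_ncf = init_neumf_output_weights(h_gmf, h_mlp, alpha=0.5)
print(h_ncf.shape)
```

With $\alpha = 0.5$ both pre-trained models contribute equally. Values closer to 1 weight the GMF part more heavily.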

In [106]:
model = NCF(
    n_users=data.n_users, 
    n_items=data.n_items,
    model_type="NeuMF",
    n_factors=4,
    layer_sizes=[16,8,4],
    n_epochs=EPOCHS,
    batch_size=BATCH_SIZE,
    learning_rate=1e-3,
    verbose=10,
    seed=SEED
)

In [96]:
train

Unnamed: 0,userID,itemID,rating,timestamp
0,2,52,0.526316,1238536800000
180,2,96,0.131579,1238536800000
270,2,3894,0.263158,1238536800000
315,2,6160,0.131579,1238536800000
360,2,6177,0.526316,1238536800000
...,...,...,...,...
1930935,99,229,0.000000,1280613600000
1930974,99,419,0.000000,1280613600000
1931325,99,2601,0.000000,1280613600000
1930857,99,190,0.000000,1280613600000


In [107]:
start_time = time.time()

model.fit(data)

train_time = time.time() - start_time

print("Took {} seconds for training.".format(train_time))

Took 315.5201280117035 seconds for training.


### Prediction
Now that the model is fitted, we can call `predict` to score user-item pairs. Here we score every item seen in training for every training user, collect the scores in a dataframe, and drop the pairs that already appear in the training set:

In [108]:
start_time = time.time()

# Score every item seen in training for every training user.
users, items, preds = [], [], []
item_list = list(train.itemID.unique())
for user_id in train.userID.unique():
    user_list = [user_id] * len(item_list)
    users.extend(user_list)
    items.extend(item_list)
    preds.extend(list(model.predict(user_list, item_list, is_list=True)))

all_predictions = pd.DataFrame(data={"userID": users, "itemID": items, "prediction": preds})

# Drop pairs already present in the training set so that only unseen items are ranked.
merged = pd.merge(train, all_predictions, on=["userID", "itemID"], how="outer")
all_predictions = merged[merged.rating.isnull()].drop('rating', axis=1)

test_time = time.time() - start_time
print("Took {} seconds for prediction.".format(test_time))

Took 0.694303035736084 seconds for prediction.
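
Turning a table of scores into a recommendation list amounts to keeping the highest-scoring rows per user. A minimal pandas sketch on made-up predictions follows. The helper `top_k_per_user` is hypothetical and is shown only to make the ranking step concrete.

```python
import pandas as pd

def top_k_per_user(predictions, k, col_user="userID", col_score="prediction"):
    """Keep the k highest-scoring rows for each user, best first."""
    ranked = predictions.sort_values([col_user, col_score], ascending=[True, False])
    return ranked.groupby(col_user).head(k).reset_index(drop=True)

# Made-up scores with the same column names as all_predictions.
toy_predictions = pd.DataFrame({
    "userID":     [1, 1, 1, 2, 2, 2],
    "itemID":     [10, 11, 12, 10, 11, 12],
    "prediction": [0.2, 0.9, 0.5, 0.7, 0.1, 0.3],
})
top2 = top_k_per_user(toy_predictions, k=2)
print(top2)
```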


## Evaluate

In [109]:
eval_map = map_at_k(test, all_predictions, col_prediction='prediction', k=TOP_K)
eval_ndcg = ndcg_at_k(test, all_predictions, col_prediction='prediction', k=TOP_K)
eval_precision = precision_at_k(test, all_predictions, col_prediction='prediction', k=TOP_K)
eval_recall = recall_at_k(test, all_predictions, col_prediction='prediction', k=TOP_K)

print("MAP:\t%f" % eval_map,
      "NDCG:\t%f" % eval_ndcg,
      "Precision@K:\t%f" % eval_precision,
      "Recall@K:\t%f" % eval_recall, sep='\n')

MAP:	0.000101
NDCG:	0.000789
Precision@K:	0.001136
Recall@K:	0.000812
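
To make the last two numbers concrete, precision and recall at k for a single user can be computed by hand as below. The function and the toy lists are assumptions for illustration, and the library functions may aggregate over users differently.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision@k and recall@k for a single user."""
    top_k = list(recommended)[:k]
    hits = len(set(top_k) & set(relevant))
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

recommended = [11, 12, 10, 13, 14]   # ranked recommendations (made up)
relevant = {12, 99}                  # items in the user's test set (made up)
p, r = precision_recall_at_k(recommended, relevant, k=5)
print(p, r)
```

One hit among five recommendations gives a precision of 0.2, and one hit out of two relevant items gives a recall of 0.5.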


## Conclusions and Reconsideration

We believe the NCF model itself can work well. The low precision here is likely because each user concentrates on a limited number of artists, that is, particular users listen only to their own particular artists. That makes this data set a poor fit for training a recommender system. Still, we learned a lot!