## Neural Collaborative Filtering
Neural Collaborative Filtering (NCF) is a neural-network approach to Collaborative Filtering, which recommends items based on similarities in user preferences, without relying on additional user or item features.

NCF can be viewed as a generalization of Matrix Factorization that captures non-linear relationships as well as linear ones. It combines a Matrix Factorization component with a Neural Network for collaborative filtering.

### Architecture
![image](./images/ncf.webp)

Neural Collaborative Filtering (NCF) utilizes a neural network with two branches that eventually converge to produce a single output. The NCF architecture consists of several key components:

### 1. Input and Output

NCF processes one User-Item pair at a time. The inputs are:

- **One-Hot Encoded (OHE) representations** of the User and Item, provided separately.

The output can be:

- **Expected rating** the user will give to the item, if explicit feedback is used during training.
- **Probability** that the user will interact with the item, if implicit feedback is used.

### 2. Embeddings Layer

This layer generates dense embeddings for both users and items. Each branch of the NCF model has its own embedding layers, and **no advanced techniques** are applied here: they are plain embedding lookups.
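To make the link between one-hot inputs and embeddings concrete, here is a minimal NumPy sketch. The sizes and the `user_embeddings` matrix are made up for illustration. It shows that multiplying a one-hot vector by an embedding matrix selects a single row, which is why embedding layers are usually implemented as table lookups.

```python
import numpy as np

rng = np.random.default_rng(0)

n_users, embed_size = 5, 3               # illustrative sizes, not from the dataset
user_embeddings = rng.normal(size=(n_users, embed_size))

user_id = 2
one_hot = np.zeros(n_users)
one_hot[user_id] = 1.0                   # one-hot encoding of user 2

via_matmul = one_hot @ user_embeddings   # linear layer applied to the one-hot input
via_lookup = user_embeddings[user_id]    # equivalent row lookup

print(np.allclose(via_matmul, via_lookup))  # True
```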

### 3. Architecture Breakdown

The NCF architecture is divided into two main sections:

- **Generalized Matrix Factorization (GMF)**:
  - This component takes the element-wise product of the user and item embedding vectors produced by the embeddings layer.
  - It captures **linear relationships** and is akin to traditional Matrix Factorization.

- **Multi-Layer Perceptron (MLP)**:
  - This is a standard Deep Neural Network (DNN) that processes the concatenated user and item embeddings through multiple hidden layers with ReLU activation functions.
  - It captures **non-linear patterns** in the data.
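The two branches can be sketched in a few lines of NumPy. This is only an illustration under assumed sizes: the embeddings are random, the layer widths (`8` and `4`) are arbitrary, and `gmf_branch` and `mlp_branch` are hypothetical helper names rather than part of any library.

```python
import numpy as np

rng = np.random.default_rng(0)
embed_size = 4

# Separate (random, hypothetical) user/item embeddings for each branch
p_gmf, q_gmf = rng.normal(size=embed_size), rng.normal(size=embed_size)
p_mlp, q_mlp = rng.normal(size=embed_size), rng.normal(size=embed_size)

def gmf_branch(p, q):
    """Element-wise product of the user and item embeddings."""
    return p * q

def mlp_branch(p, q, weights, biases):
    """Concatenate the embeddings and pass them through ReLU hidden layers."""
    h = np.concatenate([p, q])
    for W, b in zip(weights, biases):
        h = np.maximum(0.0, h @ W + b)   # ReLU activation
    return h

layer_sizes = [2 * embed_size, 8, 4]     # arbitrary widths for the sketch
weights = [
    rng.normal(scale=0.1, size=(m, n))
    for m, n in zip(layer_sizes[:-1], layer_sizes[1:])
]
biases = [np.zeros(n) for n in layer_sizes[1:]]

gmf_out = gmf_branch(p_gmf, q_gmf)
mlp_out = mlp_branch(p_mlp, q_mlp, weights, biases)
print(gmf_out.shape, mlp_out.shape)      # (4,) (4,)
```

Summing the GMF vector with equal weights gives the plain dot product of the two embeddings, which is the prediction rule of classic Matrix Factorization.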

### 4. Combining Outputs

The outputs from GMF and MLP are concatenated, and a hidden layer is applied to merge these outputs. This combined layer is known as **NeuMF**. A sigmoid activation function follows to produce the final output.
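The fusion step can be sketched as follows. The branch outputs are placeholder vectors, a single linear layer stands in for the merging layer, and `neumf_output` is a hypothetical name used only for this illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neumf_output(gmf_out, mlp_out, w, b=0.0):
    """Concatenate both branch outputs, apply one linear layer, then a sigmoid."""
    fused = np.concatenate([gmf_out, mlp_out])
    return sigmoid(fused @ w + b)

gmf_out = np.array([0.2, -0.1, 0.4])       # placeholder GMF branch output
mlp_out = np.array([0.0, 0.3])             # placeholder MLP branch output
w = np.zeros(gmf_out.size + mlp_out.size)  # all-zero weights give a logit of 0

print(neumf_output(gmf_out, mlp_out, w))   # 0.5, because sigmoid(0) = 0.5
```

With trained weights the output moves away from 0.5 toward 0 or 1, which is what allows it to be read as an interaction probability.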

### 5. Summary

Thus, NCF integrates:

- **Embeddings Layer**
- **GMF**
- **MLP**
- **NeuMF**

**Note:** GMF and MLP do not share embeddings. Each branch learns its own user embedding and its own item embedding, so the architecture contains four separate embeddings in total. The loss function is **log loss** (binary cross-entropy), the same loss used in Logistic Regression, which suits implicit feedback.
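As a worked example of the loss, the sketch below computes log loss by hand in NumPy. The labels and probabilities are toy values, and `log_loss` is written out here for illustration rather than taken from a library.

```python
import numpy as np

def log_loss(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)  # avoid log(0)
    losses = y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob)
    return float(-np.mean(losses))

labels = [1, 0, 1, 0]                  # 1 = interacted, 0 = did not interact
confident = [0.9, 0.1, 0.8, 0.2]       # predictions that lean the right way
uninformed = [0.5, 0.5, 0.5, 0.5]      # predictions that carry no information

print(log_loss(labels, confident))
print(log_loss(labels, uninformed))    # ln(2), roughly 0.6931
```

Predictions that lean toward the correct label score lower than the uninformed baseline of ln(2).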

## Implementation

### 1. Imports

In [1]:
!pip install LibRecommender

Collecting LibRecommender
  Downloading LibRecommender-1.5.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.1 MB)
Installing collected packages: LibRecommender
Successfully installed LibRecommender-1.5.1


In [2]:
import numpy as np
import pandas as pd


from libreco.data import random_split, DatasetPure
from libreco.algorithms import NCF
from libreco.evaluation import evaluate



### 2. Dataset
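We use the MovieLens 1M dataset (`ml-1m`), which contains about one million ratings. The helper below downloads it and merges the ratings, users, and movies files into a single DataFrame.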

In [3]:
import tensorflow as tf
from pathlib import Path

def load_ml_1m():
    # download and extract zip file
    tf.keras.utils.get_file(
        "ml-1m.zip",
        "http://files.grouplens.org/datasets/movielens/ml-1m.zip",
        cache_dir=".",
        cache_subdir=".",
        extract=True,
    )
    # read and merge data into same table
    cur_path = Path(".").absolute()
    # "::" is a multi-character separator, so the Python parsing engine is
    # requested explicitly; otherwise pandas emits a ParserWarning
    ratings = pd.read_csv(
        cur_path / "ml-1m" / "ratings.dat",
        sep="::",
        engine="python",
        usecols=[0, 1, 2, 3],
        names=["user", "item", "rating", "time"],
    )
    users = pd.read_csv(
        cur_path / "ml-1m" / "users.dat",
        sep="::",
        engine="python",
        usecols=[0, 1, 2, 3],
        names=["user", "sex", "age", "occupation"],
    )
    items = pd.read_csv(
        cur_path / "ml-1m" / "movies.dat",
        sep="::",
        engine="python",
        usecols=[0, 2],
        names=["item", "genre"],
        encoding="iso-8859-1",
    )
    items[["genre1", "genre2", "genre3"]] = (
        items["genre"].str.split(r"|", expand=True).fillna("missing").iloc[:, :3]
    )
    items.drop("genre", axis=1, inplace=True)
    data = ratings.merge(users, on="user").merge(items, on="item")
    data.rename(columns={"rating": "label"}, inplace=True)
    # random shuffle data
    data = data.sample(frac=1, random_state=42).reset_index(drop=True)
    return data

In [4]:
data = load_ml_1m()

data.shape

Downloading data from http://files.grouplens.org/datasets/movielens/ml-1m.zip




(1000209, 10)

In [5]:
data.head()

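|   | user | item | label | time | sex | age | occupation | genre1 | genre2 | genre3 |
|---|------|------|-------|------|-----|-----|------------|--------|--------|--------|
| 0 | 5755 | 184 | 3 | 958280246 | F | 35 | 2 | Drama | missing | missing |
| 1 | 4585 | 519 | 3 | 964321944 | M | 35 | 7 | Sci-Fi | Thriller | missing |
| 2 | 1503 | 3114 | 4 | 974762175 | M | 25 | 12 | Animation | Children's | Comedy |
| 3 | 2166 | 648 | 4 | 974614593 | M | 1 | 10 | Action | Adventure | Mystery |
| 4 | 3201 | 2178 | 5 | 968626301 | M | 45 | 7 | Thriller | missing | missing |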


In [6]:
data.columns

Index(['user', 'item', 'label', 'time', 'sex', 'age', 'occupation', 'genre1',
       'genre2', 'genre3'],
      dtype='object')

In [7]:
data = data[['user', 'item', 'label', 'time']]

In [8]:
data.head()

|   | user | item | label | time |
|---|------|------|-------|------|
| 0 | 5755 | 184 | 3 | 958280246 |
| 1 | 4585 | 519 | 3 | 964321944 |
| 2 | 1503 | 3114 | 4 | 974762175 |
| 3 | 2166 | 648 | 4 | 974614593 |
| 4 | 3201 | 2178 | 5 | 968626301 |


In [9]:
train_data, eval_data, test_data = random_split(data, multi_ratios=[0.8, 0.1, 0.1])
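To picture what an 80/10/10 random split does, here is a plain NumPy sketch over row indices. It is not LibRecommender's implementation of `random_split`; `split_indices` is a hypothetical helper, and the row count of 1000 is chosen only to keep the numbers readable.

```python
import numpy as np

def split_indices(n_rows, ratios=(0.8, 0.1, 0.1), seed=42):
    """Shuffle row indices and cut them into consecutive chunks by ratio."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_rows)
    # Cut points after the train chunk and after the eval chunk
    cuts = np.cumsum([int(r * n_rows) for r in ratios[:-1]])
    return np.split(order, cuts)

train_idx, eval_idx, test_idx = split_indices(1000)
print(len(train_idx), len(eval_idx), len(test_idx))  # 800 100 100
```

Every row lands in exactly one of the three subsets, so the evaluation and test sets stay disjoint from the training data.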

In [10]:
# Convert the pandas DataFrames into the dataset types LibRecommender expects
train_data, data_info = DatasetPure.build_trainset(train_data)
eval_data = DatasetPure.build_evalset(eval_data)
test_data = DatasetPure.build_testset(test_data)

- `DatasetPure`: indicates that we use no features other than the user-item interactions themselves, so the datasets are built from a pure Collaborative Filtering perspective.
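One thing any pure collaborative filtering pipeline has to do is map raw user and item IDs to contiguous indices, so that each ID can address a row of an embedding table. The sketch below shows the idea with `pandas.factorize` on toy rows. It is illustrative only and does not reproduce the internals of `DatasetPure`.

```python
import pandas as pd

# Toy interactions; the last row is invented to show a repeated user and item
interactions = pd.DataFrame(
    {
        "user": [5755, 4585, 1503, 5755],
        "item": [184, 519, 3114, 519],
        "label": [3, 3, 4, 5],
    }
)

# factorize returns integer codes plus the unique raw IDs in order of appearance
user_idx, user_ids = pd.factorize(interactions["user"])
item_idx, item_ids = pd.factorize(interactions["item"])

print(user_idx.tolist())             # [0, 1, 2, 0]
print(item_idx.tolist())             # [0, 1, 2, 1]
print(len(user_ids), len(item_ids))  # 3 3
```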

### 3. NCF

See the LibRecommender model training guide: https://librecommender.readthedocs.io/en/latest/user_guide/model_train.html

In [11]:
ncf = NCF(
    task="rating",
    data_info=data_info,
    loss_type="cross_entropy",
    embed_size=64,
    n_epochs=50,
    lr=1e-3,
    batch_size=2048,
    num_neg=1,
)

### Hyper-parameters
- `task`: either `rating` or `ranking`

- `rating` is usually used when the dataset contains explicit feedback (direct ratings, such as stars given by customers)

- `ranking` is used when the dataset contains implicit feedback (a customer clicks, opens a webpage, etc.)

- `data_info`: holds meta-information about the training dataset

- `embed_size`: the size of the user and item embeddings
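With implicit feedback only positive interactions are observed, so a ranking model needs sampled negatives to learn from; `num_neg` sets how many are drawn per positive. The sketch below illustrates the idea in NumPy. It is not LibRecommender's sampler: `sample_negatives` is a hypothetical helper, and it assumes every user has at least one item they have not interacted with.

```python
import numpy as np

def sample_negatives(positives, n_items, num_neg=1, seed=0):
    """For each (user, item) positive, draw `num_neg` items the user has not seen."""
    rng = np.random.default_rng(seed)
    seen = {}
    for user, item in positives:
        seen.setdefault(user, set()).add(item)

    samples = []
    for user, item in positives:
        samples.append((user, item, 1))            # observed interaction
        for _ in range(num_neg):
            candidate = int(rng.integers(n_items))
            while candidate in seen[user]:         # resample until unseen
                candidate = int(rng.integers(n_items))
            samples.append((user, candidate, 0))   # sampled negative
    return samples

positives = [(0, 1), (0, 2), (1, 0)]
for row in sample_negatives(positives, n_items=5, num_neg=1):
    print(row)
```

Each positive row is followed by one sampled row with label 0 for an item that user has not interacted with.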


In [12]:
# monitor metrics on eval data during training
ncf.fit(
    train_data,
    neg_sampling=False,  # False for the rating task; set to True for ranking
    verbose=2,
    eval_data=eval_data,
    metrics=["loss"],
)

Training start time: 2024-07-22 19:08:20
| Epoch | Elapsed (s) | train_loss | eval rmse |
|-------|-------------|------------|-----------|
| 1 | 6.212 | 1.8648 | 0.9381 |
| 2 | 4.488 | 0.8125 | 0.9262 |
| 3 | 3.188 | 0.7625 | 0.9226 |
| 4 | 3.293 | 0.7184 | 0.9199 |
| 5 | 3.574 | 0.6773 | 0.9235 |
| 6 | 3.970 | 0.6372 | 0.9272 |
| 7 | 3.202 | 0.6006 | 0.9406 |
| 8 | 3.192 | 0.567 | 0.9434 |
| 9 | 4.007 | 0.5352 | 0.9472 |
| 10 | 3.511 | 0.5095 | 0.9536 |
| 11 | 3.334 | 0.4852 | 0.9630 |
| 12 | 3.252 | 0.4645 | 0.9663 |
| 13 | 4.468 | 0.4472 | 0.9760 |
| 14 | 3.301 | 0.4308 | 0.9770 |
| 15 | 3.261 | 0.4167 | 0.9836 |
| 16 | 3.481 | 0.4049 | 0.9856 |
| 17 | 4.027 | 0.3932 | 0.9901 |
| 18 | 3.192 | 0.3832 | 0.9929 |
| 19 | 3.214 | 0.3744 | 0.9956 |
| 20 | 3.915 | 0.3664 | 1.0022 |
| 21 | 3.404 | 0.3578 | 1.0053 |
| 22 | 3.196 | 0.3517 | 1.0068 |
| 23 | 3.169 | 0.3445 | 1.0094 |
| 24 | 4.294 | 0.3388 | 1.0079 |
| 25 | 3.258 | 0.3332 | 1.0106 |
| 26 | 3.178 | 0.3279 | 1.0148 |
| 27 | 3.616 | 0.3229 | 1.0169 |
| 28 | 4.218 | 0.3184 | 1.0204 |
| 29 | 3.142 | 0.3142 | 1.0189 |
| 30 | 3.151 | 0.3098 | 1.0260 |
| 31 | 3.888 | 0.3059 | 1.0236 |
| 32 | 3.572 | 0.3022 | 1.0249 |
| 33 | 3.190 | 0.2989 | 1.0257 |
| 34 | 3.470 | 0.2958 | 1.0287 |
| 35 | 5.083 | 0.2927 | 1.0304 |
| 36 | 3.218 | 0.2904 | 1.0312 |
| 37 | 3.161 | 0.2875 | 1.0346 |
| 38 | 3.224 | 0.2842 | 1.0365 |
| 39 | 4.255 | 0.2812 | 1.0358 |
| 40 | 3.199 | 0.2788 | 1.0351 |
| 41 | 3.200 | 0.2763 | 1.0402 |
| 42 | 3.636 | 0.2746 | 1.0380 |
| 43 | 3.661 | 0.272 | 1.0397 |
| 44 | 3.298 | 0.2691 | 1.0407 |
| 45 | 3.180 | 0.2682 | 1.0425 |
| 46 | 4.223 | 0.2657 | 1.0438 |
| 47 | 3.201 | 0.2637 | 1.0440 |
| 48 | 3.194 | 0.2627 | 1.0448 |
| 49 | 3.242 | 0.26 | 1.0466 |
| 50 | 4.187 | 0.259 | 1.0455 |
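In the training log, the eval RMSE reaches its lowest value (0.9199) at epoch 4 and then rises while the training loss keeps falling, which is the usual sign of overfitting. The sketch below shows how the best epoch could be picked from such a log and where a simple patience rule would stop. The `early_stop_epoch` helper and the patience of 3 are assumptions for illustration, not something the notebook or LibRecommender does here.

```python
# Eval RMSE for epochs 1-8, copied from the training log
eval_rmse = [0.9381, 0.9262, 0.9226, 0.9199, 0.9235, 0.9272, 0.9406, 0.9434]

best_epoch = min(range(len(eval_rmse)), key=eval_rmse.__getitem__) + 1
print(best_epoch, eval_rmse[best_epoch - 1])  # 4 0.9199

def early_stop_epoch(scores, patience=3):
    """Return (stop_epoch, best_epoch), both 1-based, for a lower-is-better metric."""
    best, best_at = float("inf"), 0
    for epoch, score in enumerate(scores, start=1):
        if score < best:
            best, best_at = score, epoch
        elif epoch - best_at >= patience:
            return epoch, best_at          # no improvement for `patience` epochs
    return len(scores), best_at

print(early_stop_epoch(eval_rmse, patience=3))  # (7, 4)
```

Under this rule, training would have stopped after epoch 7 and kept the epoch 4 model, instead of running all 50 epochs.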





### 4. Evaluation

In [13]:
evaluate(
    model=ncf,
    data=test_data,
    neg_sampling=False,
    metrics=["loss"],
)
# For implicit feedback, metrics such as precision@k, recall@k, and NDCG can be used

eval_pointwise: 100%|██████████| 13/13 [00:00<00:00, 346.22it/s]


{'loss': 1.0433564}
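During training the evaluation metric was printed as `eval rmse`, so the `loss` value reported for the rating task appears to be an RMSE. The sketch below computes RMSE by hand, along with precision@k and recall@k for the implicit-feedback case mentioned in the cell comment. The function names and toy inputs are hypothetical and this is not LibRecommender's evaluation code.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted ratings."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def precision_recall_at_k(recommended, relevant, k):
    """Precision@k and recall@k for one user's ranked recommendation list."""
    top_k = list(recommended)[:k]
    hits = len(set(top_k) & set(relevant))
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

print(rmse([4, 3, 5], [3, 3, 4]))  # sqrt(2/3), roughly 0.8165
print(precision_recall_at_k([10, 20, 30, 40], relevant={20, 40, 50}, k=2))
# (0.5, 0.333...): one of the top 2 is relevant, out of 3 relevant items
```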

### 5. Prediction Time

In [14]:
# predict the preference of user 5755 for item 110
ncf.predict(user=5755, item=110)

array([3.6730342], dtype=float32)

In [15]:
# recommend 10 items for user 5755
ncf.recommend_user(user=5755, n_rec=10)

{5755: array([3746, 1235,  599, 3718, 1922, 2858,  608,  922, 3338, 1203])}
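Conceptually, producing a top-N list means scoring every candidate item for the user and keeping the highest-scoring ones, often after excluding items the user has already consumed. The NumPy sketch below illustrates that idea with made-up scores. `top_n_unseen` is a hypothetical helper and does not describe how `recommend_user` is implemented.

```python
import numpy as np

def top_n_unseen(scores, seen_items, n_rec):
    """Return indices of the n_rec highest-scoring items not already consumed."""
    scores = np.array(scores, dtype=float)        # copy so the input is untouched
    scores[list(seen_items)] = -np.inf            # mask consumed items
    top = np.argpartition(-scores, n_rec - 1)[:n_rec]  # unordered top n_rec
    return top[np.argsort(-scores[top])]          # order the winners by score

scores = [3.1, 4.7, 2.0, 4.9, 4.2, 1.5]  # hypothetical predicted ratings for items 0..5
print(top_n_unseen(scores, seen_items={3}, n_rec=3))  # [1 4 0]
```

Item 3 has the highest score but is skipped because the user has already consumed it.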