
# Recommendation Systems using Neural Collaborative Filtering (NCF) explained with codes
## Understanding the maths behind NCF

Mehul Gupta - Medium Course
Jul 28, 2023

### 1. Import libraries

In [1]:
!pip install LibRecommender



`LibRecommender` is a Python library for building and evaluating recommender systems. It offers two kinds of models:

- collaborative-filtering models that learn only from user-item interactions (its "pure" models, such as the NCF used here);
- models that also use user and item features.

The algorithms range from neighborhood-based methods and matrix factorization to deep learning models. The library lets researchers and developers experiment with recommendation techniques without implementing each algorithm from scratch.

In [2]:
!pip install tensorflow==2.13.0



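LibRecommender does not work in a Keras 3 environment. For that reason, TensorFlow is pinned to 2.13.0, a release that still ships Keras 2; TensorFlow made Keras 3 the default starting with 2.16.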
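Because LibRecommender needs Keras 2, it can be worth confirming which TensorFlow is actually installed before importing it. In Colab, a runtime restart may be needed after the `pip install` before the pinned version loads. The sketch below reads the installed version with `importlib.metadata` and applies the 2.16 cutoff. `bundles_keras2` is a hypothetical helper, not part of either library.

```python
from importlib.metadata import PackageNotFoundError, version


def bundles_keras2(tf_version: str) -> bool:
    """Hypothetical helper: True if this TensorFlow release still defaults to Keras 2.

    TensorFlow 2.16 switched the default tf.keras to Keras 3, so any release
    below 2.16 (such as the pinned 2.13.0) still uses Keras 2.
    """
    major, minor = (int(part) for part in tf_version.split(".")[:2])
    return major < 2 or (major == 2 and minor < 16)


try:
    installed = version("tensorflow")  # reads package metadata, does not import TF
    print(installed, "-> Keras 2 by default:", bundles_keras2(installed))
except PackageNotFoundError:
    print("TensorFlow is not installed")
```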

In [3]:
import numpy as np
import pandas as pd
from libreco.data import random_split, DatasetPure
from libreco.algorithms import NCF  # a "pure" model: uses only user-item interactions
from libreco.evaluation import evaluate
import tensorflow as tf
tf.compat.v1.reset_default_graph()  # Resets the graph to prevent conflicts

Instructions for updating:
non-resource variables are not supported in the long term


### 2. MovieLens Dataset

In [4]:
data = pd.read_csv('ratings.csv')
data.columns = ["user", "item", "label", "time"]

data.head(10)

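|   | user | item | label | time |
|---|------|------|-------|------|
| 0 | 1 | 1 | 4.0 | 964982703 |
| 1 | 1 | 3 | 4.0 | 964981247 |
| 2 | 1 | 6 | 4.0 | 964982224 |
| 3 | 1 | 47 | 5.0 | 964983815 |
| 4 | 1 | 50 | 5.0 | 964982931 |
| 5 | 1 | 70 | 3.0 | 964982400 |
| 6 | 1 | 101 | 5.0 | 964980868 |
| 7 | 1 | 110 | 4.0 | 964982176 |
| 8 | 1 | 151 | 5.0 | 964984041 |
| 9 | 1 | 157 | 5.0 | 964984100 |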


In [5]:
train_data, eval_data, test_data = random_split(data, multi_ratios=[0.8, 0.1, 0.1])
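`random_split` with `multi_ratios=[0.8, 0.1, 0.1]` shuffles the ratings and cuts them into 80% train, 10% eval and 10% test. The core idea can be sketched with NumPy as below.

- `random_split_indices` is a hypothetical helper that returns row positions rather than DataFrames.
- LibRecommender's version also has a `filter_unknown` option that drops eval/test rows whose user or item never appears in training. This sketch leaves that step out.

```python
import numpy as np


def random_split_indices(n_rows, ratios=(0.8, 0.1, 0.1), seed=42):
    """Hypothetical helper: shuffle row positions and cut them into chunks by ratio."""
    if not np.isclose(sum(ratios), 1.0):
        raise ValueError("ratios must sum to 1")
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_rows)
    # cut points at the cumulative ratios, e.g. [0.8, 0.9] * n_rows for 80/10/10
    cuts = np.round(np.cumsum(ratios)[:-1] * n_rows).astype(int)
    return np.split(order, cuts)


train_idx, eval_idx, test_idx = random_split_indices(100)
print(len(train_idx), len(eval_idx), len(test_idx))  # 80 10 10
```

With the ratings DataFrame from above, `data.iloc[train_idx]` would then give the training rows.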

In [6]:
train_data, data_info = DatasetPure.build_trainset(train_data)
eval_data = DatasetPure.build_evalset(eval_data)
test_data = DatasetPure.build_testset(test_data)
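`build_trainset` turns raw MovieLens IDs into the contiguous indices used to look up rows of the embedding tables. `build_evalset` and `build_testset` then reuse that mapping.

The following is a simplified sketch of such re-indexing. `build_id_maps` and `encode` are illustrative names, not LibRecommender functions. The library presumably keeps its own equivalent mappings in `data_info`.

```python
def build_id_maps(train_pairs):
    """Assign contiguous 0-based indices to every user and item seen in training."""
    user2idx, item2idx = {}, {}
    for user, item in train_pairs:
        user2idx.setdefault(user, len(user2idx))
        item2idx.setdefault(item, len(item2idx))
    return user2idx, item2idx


def encode(pairs, user2idx, item2idx):
    """Map raw (user, item) IDs to inner indices, skipping unseen users or items."""
    return [
        (user2idx[u], item2idx[i])
        for u, i in pairs
        if u in user2idx and i in item2idx
    ]


train = [(1, 1), (1, 3), (5, 47), (5, 3)]
user2idx, item2idx = build_id_maps(train)
print(user2idx)  # {1: 0, 5: 1}
print(item2idx)  # {1: 0, 3: 1, 47: 2}
print(encode([(5, 1), (9, 3)], user2idx, item2idx))  # [(1, 0)]  (user 9 is unknown)
```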

In [7]:
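ncf = NCF(
    task="rating",              # predict explicit ratings (a regression target)
    data_info=data_info,
    loss_type="cross_entropy",  # only relevant when task="ranking"
    embed_size=16,              # length of each user/item embedding vector
    n_epochs=10,
    lr=1e-3,
    batch_size=2048,
    num_neg=1,                  # negatives per positive; only relevant for ranking
)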
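NCF replaces the plain dot product of matrix factorization with a learned function of the user and item embeddings. The sketch below shows the forward pass of NeuMF, the GMF-plus-MLP variant from the original NCF paper. It combines two branches:

- a GMF branch: the element-wise product of user and item embeddings;
- an MLP branch: the concatenated user and item embeddings passed through ReLU layers.

Both branch outputs are concatenated and projected to one score by a vector `h`.

This sketch rests on several assumptions. It is not checked against LibRecommender's source that its `NCF` follows this exact design. Layer widths, table sizes and initialization are hypothetical, and the weights are random rather than trained.

For `task="rating"` such a raw score would presumably be fit to the ratings with a squared-error loss. For `task="ranking"` it would pass through a sigmoid and be trained with cross-entropy on positives and sampled negatives, which is where `loss_type` and `num_neg` come in.

```python
import numpy as np

# Hypothetical sizes for illustration; a real model takes them from data_info
n_users, n_items, embed_size = 100, 500, 16
hidden = [32, 16]  # hypothetical MLP layer widths

rng = np.random.default_rng(0)
# NeuMF keeps separate embedding tables for the GMF and MLP branches
P_gmf = rng.normal(scale=0.01, size=(n_users, embed_size))
Q_gmf = rng.normal(scale=0.01, size=(n_items, embed_size))
P_mlp = rng.normal(scale=0.01, size=(n_users, embed_size))
Q_mlp = rng.normal(scale=0.01, size=(n_items, embed_size))

# MLP weights: the first layer sees the concatenated user and item vectors
layer_sizes = [2 * embed_size] + hidden
weights = [rng.normal(scale=0.1, size=(fan_in, fan_out))
           for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(width) for width in hidden]
# Output vector h maps [GMF vector, last MLP layer] to a single score
h = rng.normal(scale=0.1, size=embed_size + hidden[-1])


def neumf_score(u, i):
    """Untrained NeuMF-style score for inner user index u and inner item index i."""
    gmf = P_gmf[u] * Q_gmf[i]                 # GMF: element-wise product
    x = np.concatenate([P_mlp[u], Q_mlp[i]])  # MLP input: concatenation
    for W, b in zip(weights, biases):
        x = np.maximum(x @ W + b, 0.0)        # dense layer followed by ReLU
    return float(np.concatenate([gmf, x]) @ h)


print(neumf_score(0, 5))  # a small number, since nothing has been trained
```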

In [8]:
# monitor metrics on eval data during training
ncf.fit(
    train_data,
    neg_sampling=False,  # explicit ratings need no negatives; use True for implicit-feedback ranking
    verbose=2,
    eval_data=eval_data,
    metrics=["loss"],  # for this rating task the log reports it as "eval rmse"
)

# do final evaluation on test data
evaluate(
    model=ncf,
    data=test_data,
    neg_sampling=False,
    metrics=["loss"],
)
# for implicit feedback (task="ranking"), ranking metrics such as precision@k, recall@k and NDCG would be used instead

  net = tf.layers.batch_normalization(net, training=is_training)
Instructions for updating:
Colocations handled automatically by placer.


Training start time: 2024-11-20 17:24:06


  net = tf.layers.batch_normalization(net, training=is_training)
train: 100%|██████████| 79/79 [00:01<00:00, 66.38it/s]


Epoch 1 elapsed: 1.196s
	 train_loss: 8.3543


eval_pointwise: 100%|██████████| 2/2 [00:00<00:00, 26.78it/s]


	 eval rmse: 1.0240


train: 100%|██████████| 79/79 [00:00<00:00, 93.60it/s]


Epoch 2 elapsed: 0.852s
	 train_loss: 0.8824


eval_pointwise: 100%|██████████| 2/2 [00:00<00:00, 76.23it/s]


	 eval rmse: 1.0827


train: 100%|██████████| 79/79 [00:00<00:00, 94.19it/s]


Epoch 3 elapsed: 0.846s
	 train_loss: 0.5957


eval_pointwise: 100%|██████████| 2/2 [00:00<00:00, 83.61it/s]


	 eval rmse: 1.0810


train: 100%|██████████| 79/79 [00:00<00:00, 98.57it/s]


Epoch 4 elapsed: 0.808s
	 train_loss: 0.4892


eval_pointwise: 100%|██████████| 2/2 [00:00<00:00, 87.83it/s]


	 eval rmse: 1.0507


train: 100%|██████████| 79/79 [00:01<00:00, 66.47it/s]


Epoch 5 elapsed: 1.204s
	 train_loss: 0.4402


eval_pointwise: 100%|██████████| 2/2 [00:00<00:00, 46.75it/s]


	 eval rmse: 1.0421


train: 100%|██████████| 79/79 [00:01<00:00, 60.53it/s]


Epoch 6 elapsed: 1.313s
	 train_loss: 0.406


eval_pointwise: 100%|██████████| 2/2 [00:00<00:00, 49.97it/s]


	 eval rmse: 1.0114


train: 100%|██████████| 79/79 [00:01<00:00, 58.95it/s]


Epoch 7 elapsed: 1.352s
	 train_loss: 0.3808


eval_pointwise: 100%|██████████| 2/2 [00:00<00:00, 48.13it/s]


	 eval rmse: 0.9763


train: 100%|██████████| 79/79 [00:00<00:00, 90.23it/s]


Epoch 8 elapsed: 0.884s
	 train_loss: 0.3599


eval_pointwise: 100%|██████████| 2/2 [00:00<00:00, 82.16it/s]


	 eval rmse: 0.9645


train: 100%|██████████| 79/79 [00:00<00:00, 97.57it/s]


Epoch 9 elapsed: 0.816s
	 train_loss: 0.3439


eval_pointwise: 100%|██████████| 2/2 [00:00<00:00, 91.57it/s]


	 eval rmse: 0.9700


train: 100%|██████████| 79/79 [00:00<00:00, 94.72it/s]


Epoch 10 elapsed: 0.842s
	 train_loss: 0.3279


eval_pointwise: 100%|██████████| 2/2 [00:00<00:00, 70.55it/s]


	 eval rmse: 0.9897


eval_pointwise: 100%|██████████| 2/2 [00:00<00:00, 78.13it/s]


{'loss': 1.0042298}

This output comes from training the Neural Collaborative Filtering (NCF) model. After every epoch the model is evaluated on `eval_data`, and after the last epoch it is evaluated once on `test_data`.

### Key Sections of the Output:

1. **Training Loop**:
   The lines like:

   ```
   train: 100%|██████████| 79/79 [00:01<00:00, 66.38it/s]
   Epoch 1 elapsed: 1.196s
       train_loss: 8.3543
   ```

   - **Training Progress**: The model processed all 79 training batches, i.e. one full pass over the training set, which is one epoch. The bar reaches 100%, and the next line gives the time that epoch took in seconds.
   - **Training Loss**: The `train_loss` printed after each epoch is the loss the model incurs on the training set during that epoch (typically an average over its batches). Lower values mean the model fits the training data more closely. On its own, this value says nothing about performance on unseen data.

2. **Evaluation**:
   The lines like:

   ```
   eval_pointwise: 100%|██████████| 2/2 [00:00<00:00, 26.78it/s]
       eval rmse: 1.0240
   ```

   - **Evaluation Progress**: After each epoch the model is evaluated on the evaluation set, which is kept separate from the training set. That set is small (10% of the ratings), so it fits in 2 batches.
   - **Evaluation RMSE**: `eval rmse` is the root mean squared error between predicted and true ratings. It is a standard regression metric, expressed in the same units as the ratings, and lower is better. In this run:
     - After epoch 1, `eval rmse` is 1.0240.
     - It rises to 1.0827 after epoch 2, falls to its lowest value of 0.9645 after epoch 8, and ends at 0.9897 after epoch 10. Overall the model improves on the evaluation data, but not monotonically.

3. **Epoch Summary**:
   The lines:

   ```
   Epoch 2 elapsed: 0.852s
       train_loss: 0.8824
   ```

   - Each epoch is printed with the time it took (`elapsed: 0.852s`) and the `train_loss` for that epoch.
   - The `train_loss` value falls every epoch: from 8.3543 in epoch 1, to 0.8824 in epoch 2, down to 0.3279 in epoch 10. This shows the model fitting the training data increasingly well.

4. **Final Evaluation**:
   After completing all epochs, the final evaluation result is printed:

   ```
   eval_pointwise: 100%|██████████| 2/2 [00:00<00:00, 78.13it/s]

   {'loss': 1.0042298}
   ```

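   - This is the model's loss on the held-out test set (`test_data`), which was used neither for training nor for the per-epoch evaluation. The task is rating prediction, and the per-epoch log reports this same `loss` metric as RMSE. So `1.0042298` is most likely the test RMSE, a typical prediction error of about one rating point.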
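The training loss and the evaluation metric are easier to compare once the relationship between MSE and RMSE is explicit. This sketch assumes, based on the log rather than on the source, that the rating-task `train_loss` is a mean squared error while `eval rmse` and the final `loss` are its square root.

Under that assumption, the final train loss of 0.3279 corresponds to a training RMSE of about 0.57. Against an eval RMSE of 0.9897, that is a sizeable train/eval gap.

```python
import math

import numpy as np


def mse(y_true, y_pred):
    """Mean squared error (assumed here to be the rating-task training loss)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


def rmse(y_true, y_pred):
    """Root mean squared error: same units as the ratings themselves."""
    return math.sqrt(mse(y_true, y_pred))


ratings = [4.0, 3.0, 5.0, 2.0]
preds = [3.5, 3.0, 4.0, 3.0]
print(mse(ratings, preds))   # 0.5625
print(rmse(ratings, preds))  # 0.75
```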

### Summary of the Process:

- **Training Process**: During each epoch, the model is trained on the training dataset, with the loss indicating how well the model is fitting the data.
- **Evaluation**: After each epoch, the model's performance is evaluated on a separate evaluation dataset (not used during training). The `rmse` metric measures how well the model's predictions match the actual values in the evaluation dataset.
- **Improvement**: `train_loss` decreases every epoch, so the model keeps fitting the training data better. `eval rmse` fluctuates: it ends lower than it started (0.9897 vs. 1.0240), but it bottoms out at 0.9645 in epoch 8 and rises in the last two epochs. This pattern suggests the model generalizes moderately well but may be starting to overfit, so fewer epochs or early stopping could help.
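To pin down where the evaluation curve bottoms out, the per-epoch `eval rmse` values from the log can be checked directly. This plain-Python sketch uses `best_epoch`, a hypothetical helper, to find the lowest value. It lands on epoch 8, two epochs before training stopped, so `n_epochs=8` or early stopping on the eval set may be worth trying.

```python
# eval RMSE after each epoch, copied from the training log above
eval_rmse = [1.0240, 1.0827, 1.0810, 1.0507, 1.0421,
             1.0114, 0.9763, 0.9645, 0.9700, 0.9897]


def best_epoch(history):
    """Hypothetical helper: (1-based epoch, value) of the lowest metric in history."""
    idx = min(range(len(history)), key=history.__getitem__)
    return idx + 1, history[idx]


epoch, value = best_epoch(eval_rmse)
print(f"best epoch: {epoch}, eval rmse: {value}")  # best epoch: 8, eval rmse: 0.9645
print("epochs after best:", len(eval_rmse) - epoch)  # epochs after best: 2
```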

### Key Takeaways:

- **Training Loss**: Measures the error on the training data; it’s expected to decrease over time as the model learns.
- **RMSE**: Measures the model's performance on the evaluation data. Lower RMSE indicates better performance.
- **Final Loss**: After all epochs, the model's loss on the held-out test set is `1.0042298`, the final error after training.

If `train_loss` decreases steadily and `eval rmse` decreases or stabilizes over time, training is usually going well. If `eval rmse` starts rising while `train_loss` keeps falling, as in the last two epochs here, the model is likely beginning to overfit.
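The `fit` call used `neg_sampling=False` because explicit ratings already contain both high and low scores. With implicit feedback (`task="ranking"`), only positive interactions are observed, so negatives have to be sampled. That is what `neg_sampling=True` and `num_neg` control.

Below is a generic sketch of uniform negative sampling. `sample_negatives` is a hypothetical helper, and LibRecommender's own sampler may differ in its details.

```python
import random


def sample_negatives(positives, n_items, num_neg=1, seed=42):
    """Hypothetical helper: (user, item, label) triples, each positive plus num_neg negatives.

    positives: list of (user, item) pairs with inner indices
    n_items:   total number of items, indexed 0..n_items-1
    Negatives are drawn uniformly from items the user has not interacted with.
    """
    rng = random.Random(seed)
    seen = {}
    for u, i in positives:
        seen.setdefault(u, set()).add(i)

    samples = []
    for u, i in positives:
        samples.append((u, i, 1))  # observed interaction -> label 1
        if len(seen[u]) >= n_items:
            raise ValueError(f"user {u} has interacted with every item")
        for _ in range(num_neg):
            j = rng.randrange(n_items)
            while j in seen[u]:  # resample until the item is new to this user
                j = rng.randrange(n_items)
            samples.append((u, j, 0))  # sampled negative -> label 0
    return samples


for row in sample_negatives([(0, 1), (0, 2), (1, 0)], n_items=4, num_neg=1):
    print(row)
```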

In [11]:
# predict the rating user 555 would give item 110
ncf.predict(user=555, item=110)

array([5.], dtype=float32)

In [12]:
# recommend 10 items for user 555
ncf.recommend_user(user=555, n_rec=10)

{555: array([108932,   5747, 168492,   2467,  56921,  60737,   1479, 119155,
         86898,  54190])}
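Conceptually, `recommend_user` has to score every item for the user and return the best ones. The sketch below makes three assumptions:

- the scores are already computed, e.g. predicted ratings for every item;
- items the user already rated in training are skipped, mirroring the idea behind LibRecommender's `filter_consumed` option;
- `recommend_top_n` is a hypothetical helper, not the library's implementation.

```python
import numpy as np


def recommend_top_n(scores, consumed, n_rec=10):
    """Hypothetical helper: inner indices of the n_rec best items, skipping consumed ones.

    scores:   1-D sequence of predicted scores, one per item (inner index)
    consumed: iterable of inner item indices the user already interacted with
    """
    scores = np.asarray(scores, dtype=float)
    consumed = set(consumed)
    ranked = np.argsort(-scores, kind="stable")  # highest score first
    return [int(i) for i in ranked if int(i) not in consumed][:n_rec]


print(recommend_top_n([0.1, 0.9, 0.4, 0.8, 0.3], consumed={1}, n_rec=2))  # [3, 2]
```

Sorting all items is fine for a few thousand movies. For very large catalogs, a partial sort such as `np.argpartition` would avoid ranking every item.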