## DeepFM
DeepFM, as the name suggests, is a deep learning-based extension of Factorization Machines that adds a neural network to capture non-linear, higher-order feature interactions.

### Architecture
![image](./images/deepfm.webp)

DeepFM uses a neural network with two branches that share the same embeddings and eventually converge to produce a single output. The DeepFM architecture consists of several key components:

### 1. Input and Output

As mentioned, DeepFM is an extension of Factorization Machines, so the input is the same: one-hot encoded (OHE) features. The output is a value between 0 and 1 giving the probability that the user will click, since a sigmoid activation is used in the last layer. For rating prediction (explicit feedback), this can be replaced with a ReLU or linear activation.

### 2. Embeddings Layer

This layer learns embeddings that play the same role as the latent vectors in the Factorization Machine. For each OHE feature, it learns an embedding of the same size.
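
To make the link between one-hot features and embeddings concrete, here is a minimal NumPy sketch. The sizes and the active feature indices are made up for illustration; it only shows that multiplying a one-hot style vector by an embedding table is the same as looking up (and summing) the rows of the active features.

```python
import numpy as np

rng = np.random.default_rng(0)

n_features = 6   # hypothetical size of the one-hot encoded feature space
embed_size = 4   # hypothetical embedding size

# One embedding (latent vector) per one-hot feature.
embedding_table = rng.normal(size=(n_features, embed_size))

# A toy sample in which features 1 and 4 are active.
active = [1, 4]
x = np.zeros(n_features)
x[active] = 1.0

# Multiplying by a 0/1 vector selects rows, so a direct lookup is equivalent.
via_matmul = x @ embedding_table
via_lookup = embedding_table[active].sum(axis=0)

print(np.allclose(via_matmul, via_lookup))
```

In practice, libraries use the lookup form because it avoids materializing the sparse one-hot vector.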

From here on, these embeddings will be fed to two different branches

#### a. Factorization Machines
This branch is a replica of the Factorization Machine: using the latent vector embeddings generated in the previous step, it computes the equation below.

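y = w₀ + ∑ᵢ(wᵢ * xᵢ) + ∑ᵢ(∑ⱼ(<vᵢ . vⱼ> * xᵢ * xⱼ)), with j > i so that each pair of features is counted once

Where

    y = Predicted label

    w₀ = Bias

    wᵢ = Weight of the i-th feature

    xᵢ = i-th feature from the One-Hot encoded feature-set

    <vᵢ . vⱼ> = Dot product between the latent vectors of features i and j.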

Note: See this blog for a detailed explanation of the equation.
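
As a rough illustration of the equation, the sketch below computes the FM score in NumPy in two ways: with the explicit double sum over pairs i < j, and with the well-known algebraic shortcut that avoids the double loop. The function names and toy inputs are illustrative and not part of any library.

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Second-order Factorization Machine score for one sample.

    x: (n,) features, w0: scalar bias, w: (n,) linear weights,
    V: (n, k) latent vectors, one row per feature.
    """
    linear = w0 + w @ x
    # Identity: sum_{i<j} <v_i, v_j> x_i x_j
    #   = 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2]
    xv = x @ V                      # shape (k,)
    x2v2 = (x ** 2) @ (V ** 2)      # shape (k,)
    pairwise = 0.5 * np.sum(xv ** 2 - x2v2)
    return linear + pairwise

def fm_predict_naive(x, w0, w, V):
    """Same score with the explicit double sum over pairs i < j."""
    n = len(x)
    pairwise = sum(
        (V[i] @ V[j]) * x[i] * x[j]
        for i in range(n)
        for j in range(i + 1, n)
    )
    return w0 + w @ x + pairwise

rng = np.random.default_rng(42)
n, k = 6, 3
x = np.array([0.0, 1.0, 0.0, 0.0, 1.0, 1.0])   # toy one-hot style input
w0, w, V = 0.1, rng.normal(size=n), rng.normal(size=(n, k))

print(np.isclose(fm_predict(x, w0, w, V), fm_predict_naive(x, w0, w, V)))
```

The shortcut reduces the pairwise term from O(n²k) to O(nk), which matters when the one-hot feature space is large.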

The other branch where the embeddings are fed is
#### b. DNN

This is a general Neural Network (similar to NCF’s MLP) where all these latent vector embeddings are fed to a few hidden layers and forwarded to a final output layer.
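
A minimal sketch of this branch, assuming ReLU hidden layers and made-up layer sizes: the field embeddings are flattened into one vector and pushed through a small MLP that ends in a single raw score. Real implementations typically add batch normalization or dropout, which are omitted here.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def dnn_forward(embeddings, weights, biases):
    """Feed the flattened field embeddings through ReLU hidden layers.

    embeddings: (n_fields, embed_size); weights/biases: one pair per layer,
    where the last pair maps to a single output unit.
    """
    h = embeddings.reshape(-1)                      # concatenate all embeddings
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(h @ W + b)
    return (h @ weights[-1] + biases[-1]).item()    # raw score, no activation yet

rng = np.random.default_rng(0)
n_fields, embed_size = 5, 4
layer_sizes = [n_fields * embed_size, 8, 4, 1]      # hypothetical hidden sizes
weights = [
    rng.normal(scale=0.1, size=(a, b))
    for a, b in zip(layer_sizes[:-1], layer_sizes[1:])
]
biases = [np.zeros(b) for b in layer_sizes[1:]]

y_dnn = dnn_forward(rng.normal(size=(n_fields, embed_size)), weights, biases)
print(y_dnn)
```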

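### 3. Final Output

The outputs from the Factorization Machine and DNN branches are combined (concatenated, or simply summed as in the original DeepFM paper) and the sigmoid function is applied to produce the final output. So, DeepFM can be summarized as a Factorization Machine and a DNN that share the same embeddings and are trained jointly.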
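
The combination step can be sketched as follows. This version sums the two branch scores before the sigmoid, which is the formulation in the original DeepFM paper; an implementation that concatenates the scores and applies one dense unit generalizes this with learned weights. The input scores are arbitrary toy values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def deepfm_output(y_fm, y_dnn):
    """Sum the two branch scores and squash the result to a click probability."""
    return sigmoid(y_fm + y_dnn)

# Toy scores that cancel out, so the model is undecided.
p = deepfm_output(y_fm=0.3, y_dnn=-0.3)
print(p)
```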

## Implementation

### 1. Imports

In [1]:
!pip install LibRecommender



In [2]:
import numpy as np
import pandas as pd


from libreco.data import split_by_ratio_chrono, DatasetFeat
from libreco.algorithms import DeepFM



### 2. Dataset
We use a sample of the MovieLens data, merged with user and item features.

In [3]:
data = pd.read_csv("./sample_movielens_merged.csv", sep=",", header=0)
# fill missing feature values with neutral defaults
data.fillna(
    value={'age': 0, 'genre1': '', 'genre2': '', 'genre3': '', 'occupation': '', 'sex': ''},
    inplace=True,
)

In [4]:
data.shape

(100021, 10)

In [5]:
data.head()

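|   | user | item | label | time | sex | age | occupation | genre1 | genre2 | genre3 |
|---|------|------|-------|------|-----|-----|------------|--------|--------|--------|
| 0 | 4617 | 296 | 2 | 964138229 | F | 25 | 6 | crime | drama | missing |
| 1 | 1298 | 208 | 4 | 974849526 | M | 35 | 6 | action | adventure | missing |
| 2 | 4585 | 1769 | 4 | 964322774 | M | 35 | 7 | action | thriller | missing |
| 3 | 3706 | 1136 | 5 | 966376465 | M | 25 | 12 | comedy | missing | missing |
| 4 | 2137 | 1215 | 3 | 974640099 | F | 1 | 10 | action | adventure | comedy |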


In [6]:
data.columns

Index(['user', 'item', 'label', 'time', 'sex', 'age', 'occupation', 'genre1',
       'genre2', 'genre3'],
      dtype='object')

In [7]:
# split into train and test data based on time
train_data, test_data = split_by_ratio_chrono(data, test_size=0.2)
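
To illustrate the idea behind a chronological split, here is a rough pandas sketch that holds out the most recent share of each user's interactions. It is an approximation written for this tutorial, not LibRecommender's actual implementation, which may differ in rounding and tie handling; the function name and toy data are hypothetical.

```python
import pandas as pd

def chrono_split_per_user(df, test_size=0.2, user_col="user", time_col="time"):
    """Hold out the most recent `test_size` share of each user's interactions."""
    df = df.sort_values([user_col, time_col], kind="stable")
    # Position of each row in its user's timeline, and the timeline's length.
    rank = df.groupby(user_col).cumcount()
    n_per_user = df.groupby(user_col)[user_col].transform("count")
    is_test = rank >= (n_per_user * (1 - test_size)).round()
    return df[~is_test], df[is_test]

toy = pd.DataFrame({
    "user": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "item": [10, 11, 12, 13, 14, 20, 21, 22, 23, 24],
    "time": [5, 1, 4, 2, 3, 9, 8, 7, 6, 10],
})
train_toy, test_toy = chrono_split_per_user(toy, test_size=0.2)
print(test_toy)
```

Splitting by time rather than at random keeps future interactions out of the training set, which gives a more realistic estimate of how the model behaves after deployment.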

- Sparse columns: categorical feature columns

- Dense columns: numerical feature columns

- User & Item columns: which of those features describe the user and which describe the item

In [8]:
# specify complete columns information
sparse_col = ["sex", "occupation", "genre1", "genre2", "genre3"]
dense_col = ["age"]
user_col = ["sex", "age", "occupation"]
item_col = ["genre1", "genre2", "genre3"]

train_data, data_info = DatasetFeat.build_trainset(train_data, user_col, item_col, sparse_col, dense_col)
test_data = DatasetFeat.build_testset(test_data)
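
Before building the dataset, a quick sanity check on the column specification can catch typos. The sketch below is an optional check written for this tutorial (it is not a LibRecommender function): it verifies that every feature is typed as sparse or dense exactly once and is assigned to either the user or the item.

```python
sparse_col = ["sex", "occupation", "genre1", "genre2", "genre3"]
dense_col = ["age"]
user_col = ["sex", "age", "occupation"]
item_col = ["genre1", "genre2", "genre3"]

typed = set(sparse_col) | set(dense_col)      # features with a declared type
owned = set(user_col) | set(item_col)         # features with a declared owner

unassigned = typed - owned                    # typed but not tied to user/item
untyped = owned - typed                       # tied to user/item but not typed
both_types = set(sparse_col) & set(dense_col) # declared as sparse and dense

print(unassigned, untyped, both_types)
```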

### 3. DeepFM

See the model training guide: https://librecommender.readthedocs.io/en/latest/user_guide/model_train.html

In [9]:
model = DeepFM(
    task="ranking",
    data_info=data_info,
    embed_size=64,
    n_epochs=10,
    lr=1e-4,
    batch_size=512,
    use_bn=True,
    hidden_units=(128, 64, 32),
)

### Hyper-parameters
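- `task`: either `rating` or `ranking`

- `rating` is usually used when the dataset contains explicit feedback (direct ratings, stars given by customers)

- `ranking` is used when the dataset contains implicit feedback (customer clicks, page views, etc.)

- `data_info` holds meta-information about the training dataset

- `embed_size`: size of the learned embedding vectors

- `n_epochs`, `lr`, `batch_size`: number of training epochs, learning rate, and mini-batch size

- `use_bn`: whether to apply batch normalization in the hidden layers

- `hidden_units`: sizes of the DNN hidden layers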


In [10]:
model.fit(
    train_data,
    neg_sampling=True,
    verbose=2,
    shuffle=True,
    eval_data=test_data,
    metrics=["loss"],
)

Training start time: 2024-07-22 19:37:10

total params: 678,440 | embedding params: 601,151 | network params: 77,289

Epoch 1 elapsed: 8.162s
	 train_loss: 0.9579
	 eval log_loss: 0.7250

Epoch 2 elapsed: 3.125s
	 train_loss: 0.5694
	 eval log_loss: 0.6160

Epoch 3 elapsed: 1.972s
	 train_loss: 0.5337
	 eval log_loss: 0.6128

Epoch 4 elapsed: 1.962s
	 train_loss: 0.5175
	 eval log_loss: 0.6174

Epoch 5 elapsed: 1.978s
	 train_loss: 0.5045
	 eval log_loss: 0.6220

Epoch 6 elapsed: 1.953s
	 train_loss: 0.4894
	 eval log_loss: 0.6219

Epoch 7 elapsed: 2.595s
	 train_loss: 0.4716
	 eval log_loss: 0.6362

Epoch 8 elapsed: 2.493s
	 train_loss: 0.4581
	 eval log_loss: 0.6517

Epoch 9 elapsed: 1.942s
	 train_loss: 0.4428
	 eval log_loss: 0.6611

Epoch 10 elapsed: 1.935s
	 train_loss: 0.4281
	 eval log_loss: 0.6837
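
The logged losses are worth a quick look before moving on. The sketch below copies the values from the training log and finds the epoch with the lowest evaluation loss. As a reference point it also computes ln 2, the log loss of always predicting 0.5, which is a meaningful baseline only under the assumption that positives and negatives are roughly balanced.

```python
import math

# Values copied from the training log above.
train_loss = [0.9579, 0.5694, 0.5337, 0.5175, 0.5045,
              0.4894, 0.4716, 0.4581, 0.4428, 0.4281]
eval_loss = [0.7250, 0.6160, 0.6128, 0.6174, 0.6220,
             0.6219, 0.6362, 0.6517, 0.6611, 0.6837]

# Epoch (1-based) with the lowest evaluation loss.
best_epoch = min(range(len(eval_loss)), key=eval_loss.__getitem__) + 1

# Gap between evaluation and training loss per epoch.
gap = [round(e - t, 4) for t, e in zip(train_loss, eval_loss)]

# Log loss of a constant 0.5 prediction (assumes balanced labels).
baseline = math.log(2)

print(best_epoch, eval_loss[best_epoch - 1], gap[-1], round(baseline, 4))
```

The evaluation loss bottoms out at epoch 3 while the training loss keeps falling, which suggests overfitting in later epochs; fewer epochs, early stopping, or stronger regularization would be reasonable things to try.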


### 4. Prediction Time

In [11]:
# predict preference of user 2211 to item 110
model.predict(user=2211, item=110)

0.9818667

In [12]:
# recommend 7 items for user 2211
model.recommend_user(user=2211, n_rec=7)

{2211: array([2858, 1270,  110, 1265, 2791, 3072, 3448])}

In [13]:
# cold-start prediction
model.predict(user="ccc", item="not item", cold_start="average")
# cold-start recommendation
model.recommend_user(user="are we good?", n_rec=7, cold_start="popular")

Detect 1 unknown interaction(s), position: [0]
Detect unknown user: are we good?


{'are we good?': array([ 480, 1573,   50, 1193, 1193, 1393,  480])}
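
The idea behind a popularity-based cold-start fallback can be sketched in a few lines of pandas. This is an illustration of the general technique, not LibRecommender's internal logic; `recommend_with_fallback` and the `personalized` callable are hypothetical names introduced here.

```python
import pandas as pd

def recommend_with_fallback(user, interactions, personalized, n_rec=7):
    """Return personalized items for known users, most popular items otherwise.

    `personalized` is a hypothetical callable (user, n_rec) -> list of item ids.
    """
    if user in set(interactions["user"]):
        return personalized(user, n_rec)
    # Unknown user: rank items by how often they were interacted with.
    return interactions["item"].value_counts().head(n_rec).index.tolist()

toy = pd.DataFrame({
    "user": [1, 1, 2, 2, 2, 3],
    "item": [50, 480, 50, 480, 1193, 50],
})

known = recommend_with_fallback(1, toy, lambda u, n: [480, 50][:n], n_rec=2)
unknown = recommend_with_fallback("are we good?", toy, lambda u, n: [], n_rec=2)
print(known, unknown)
```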