<i>Copyright (c) Microsoft Corporation. All rights reserved.</i>

<i>Licensed under the MIT License.</i>

# LightGCN - simplified GCN model for recommendation

This notebook serves as an introduction to LightGCN, a simple, linear, and neat GCN model for recommendation.

## 0 Global Settings and Imports

In [1]:
import sys
sys.path.append("../../")
import os
import papermill as pm
import pandas as pd
import numpy as np
import tensorflow as tf
from reco_utils.common.timer import Timer
from reco_utils.recommender.deeprec.graphrec.lightgcn import LightGCN
from reco_utils.recommender.deeprec.graphrec.dataset import Dataset as LightGCNDataset
from reco_utils.dataset import movielens
from reco_utils.dataset.python_splitters import python_chrono_split
from reco_utils.evaluation.python_evaluation import map_at_k, ndcg_at_k, precision_at_k, recall_at_k
from reco_utils.common.constants import SEED as DEFAULT_SEED


print("System version: {}".format(sys.version))
print("Pandas version: {}".format(pd.__version__))
print("Tensorflow version: {}".format(tf.__version__))

System version: 3.6.10 |Anaconda, Inc.| (default, May  8 2020, 02:54:21) 
[GCC 7.3.0]
Pandas version: 0.25.3
Tensorflow version: 1.15.2


In [2]:
# top k items to recommend
TOP_K = 10

# Select MovieLens data size: 100k, 1m, 10m, or 20m
MOVIELENS_DATA_SIZE = '100k'

# Model parameters
EPOCHS = 100
BATCH_SIZE = 1024

SEED = DEFAULT_SEED  # Set None for non-deterministic results

## 1 LightGCN model architecture

LightGCN is a simplified design of GCN to make it more concise and appropriate for recommendation. The model architecture is illustrated below.

<img src="https://recodatasets.blob.core.windows.net/images/lightGCN-model.jpg">

In Light Graph Convolution, only the normalized sum of neighbor embeddings is propagated to the next layer; other operations such as self-connection, feature transformation, and nonlinear activation are all removed, which greatly simplifies GCNs. In Layer Combination, we sum over the embeddings at each layer to obtain the final representations.

### 1.1 Light Graph Convolution (LGC)

In LightGCN, we adopt the simple weighted sum aggregator and abandon the use of feature transformation and nonlinear activation. The graph convolution operation in LightGCN is defined as:

$$
\begin{array}{l}
\mathbf{e}_{u}^{(k+1)}=\sum_{i \in \mathcal{N}_{u}} \frac{1}{\sqrt{\left|\mathcal{N}_{u}\right|} \sqrt{\left|\mathcal{N}_{i}\right|}} \mathbf{e}_{i}^{(k)} \\
\mathbf{e}_{i}^{(k+1)}=\sum_{u \in \mathcal{N}_{i}} \frac{1}{\sqrt{\left|\mathcal{N}_{i}\right|} \sqrt{\left|\mathcal{N}_{u}\right|}} \mathbf{e}_{u}^{(k)}
\end{array}
$$

The symmetric normalization term $\frac{1}{\sqrt{\left|\mathcal{N}_{u}\right|} \sqrt{\left|\mathcal{N}_{i}\right|}}$ follows the design of standard GCN, which can avoid the scale of embeddings increasing with graph convolution operations.
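
As a minimal sketch of the two update rules above, the following NumPy code applies one LGC step on a toy graph. The graph, the embedding size, and the names `neighbors_u`, `neighbors_i`, and `lgc_step` are illustrative assumptions, not part of the `reco_utils` implementation.

```python
import numpy as np

# Toy bipartite graph (hypothetical): 2 users, 3 items.
# neighbors_u[u] lists the items that user u interacted with.
neighbors_u = {0: [0, 1], 1: [1, 2]}

# Invert the mapping to get the users that interacted with each item.
neighbors_i = {}
for u, items in neighbors_u.items():
    for i in items:
        neighbors_i.setdefault(i, []).append(u)

rng = np.random.default_rng(0)
e_user = rng.normal(size=(2, 4))  # layer-k user embeddings
e_item = rng.normal(size=(3, 4))  # layer-k item embeddings


def lgc_step(e_user, e_item):
    """One light graph convolution step: a normalized neighbor sum only."""
    next_user = np.zeros_like(e_user)
    next_item = np.zeros_like(e_item)
    for u, items in neighbors_u.items():
        for i in items:
            # Symmetric normalization 1 / (sqrt(|N_u|) * sqrt(|N_i|)).
            norm = 1.0 / np.sqrt(len(neighbors_u[u]) * len(neighbors_i[i]))
            next_user[u] += norm * e_item[i]
            next_item[i] += norm * e_user[u]
    return next_user, next_item


next_user, next_item = lgc_step(e_user, e_item)
```

Note that the step has no weight matrix, no activation, and no self-connection: the new embedding of a node depends only on the previous-layer embeddings of its neighbors.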


### 1.2 Layer Combination and Model Prediction

In LightGCN, the only trainable model parameters are the embeddings at the 0-th layer, i.e., $\mathbf{e}_{u}^{(0)}$ for all users and $\mathbf{e}_{i}^{(0)}$ for all items. Once they are given, the embeddings at higher layers can be computed via LGC. After $K$ layers of LGC, we combine the embeddings obtained at each layer to form the final representation of a user (or an item):

$$
\mathbf{e}_{u}=\sum_{k=0}^{K} \alpha_{k} \mathbf{e}_{u}^{(k)} ; \quad \mathbf{e}_{i}=\sum_{k=0}^{K} \alpha_{k} \mathbf{e}_{i}^{(k)}
$$

where $\alpha_{k} \geq 0$ denotes the importance of the $k$-th layer embedding in constituting the final embedding. In our experiments, we set $\alpha_{k}$ uniformly as $1 / (K+1)$.

The model prediction is defined as the inner product of user and item final representations:

$$
\hat{y}_{u i}=\mathbf{e}_{u}^{T} \mathbf{e}_{i}
$$

which is used as the ranking score for recommendation generation.
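
The layer combination and the inner-product prediction can be sketched in NumPy as follows. The per-layer embeddings here are random placeholders (in the model they would come from LGC), and the array names are assumptions made for illustration.

```python
import numpy as np

K = 3  # number of LGC layers
rng = np.random.default_rng(0)

# Hypothetical per-layer embeddings with shape (K + 1, n_nodes, T).
user_layers = rng.normal(size=(K + 1, 2, 4))
item_layers = rng.normal(size=(K + 1, 3, 4))

# Uniform layer weights alpha_k = 1 / (K + 1).
alpha = np.full(K + 1, 1.0 / (K + 1))

# Weighted sum over the layer axis gives the final representations.
e_user = np.tensordot(alpha, user_layers, axes=1)  # shape (2, 4)
e_item = np.tensordot(alpha, item_layers, axes=1)  # shape (3, 4)

# Ranking scores: inner product of every user with every item.
scores = e_user @ e_item.T  # shape (2, 3)
```

With uniform weights the combination reduces to a plain average over layers; non-uniform $\alpha_k$ would only change the `alpha` vector.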


### 1.3 Matrix Form

Let the user-item interaction matrix be $\mathbf{R} \in \mathbb{R}^{M \times N}$, where $M$ and $N$ denote the number of users and items, respectively, and each entry $R_{ui}$ is 1 if user $u$ has interacted with item $i$ and 0 otherwise. We then obtain the adjacency matrix of the user-item graph as

$$
\mathbf{A}=\left(\begin{array}{cc}
\mathbf{0} & \mathbf{R} \\
\mathbf{R}^{T} & \mathbf{0}
\end{array}\right)
$$

Let the 0-th layer embedding matrix be $\mathbf{E}^{(0)} \in \mathbb{R}^{(M+N) \times T}$, where $T$ is the embedding size. Then we can obtain the matrix equivalent form of LGC as:

$$
\mathbf{E}^{(k+1)}=\left(\mathbf{D}^{-\frac{1}{2}} \mathbf{A} \mathbf{D}^{-\frac{1}{2}}\right) \mathbf{E}^{(k)}
$$

where $\mathbf{D}$ is an $(M+N) \times(M+N)$ diagonal matrix (the degree matrix), in which each entry $D_{ii}$ denotes the number of nonzero entries in the $i$-th row of the adjacency matrix $\mathbf{A}$. Lastly, we get the final embedding matrix used for model prediction as:

$$
\begin{aligned}
\mathbf{E} &=\alpha_{0} \mathbf{E}^{(0)}+\alpha_{1} \mathbf{E}^{(1)}+\alpha_{2} \mathbf{E}^{(2)}+\ldots+\alpha_{K} \mathbf{E}^{(K)} \\
&=\alpha_{0} \mathbf{E}^{(0)}+\alpha_{1} \tilde{\mathbf{A}} \mathbf{E}^{(0)}+\alpha_{2} \tilde{\mathbf{A}}^{2} \mathbf{E}^{(0)}+\ldots+\alpha_{K} \tilde{\mathbf{A}}^{K} \mathbf{E}^{(0)}
\end{aligned}
$$

where $\tilde{\mathbf{A}}=\mathbf{D}^{-\frac{1}{2}} \mathbf{A} \mathbf{D}^{-\frac{1}{2}}$ is the symmetrically normalized matrix.
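
A minimal sketch of the matrix form using `scipy.sparse` is shown below. The toy interaction matrix and all variable names are assumptions, and the code is not the `create_norm_adj_mat` implementation from `reco_utils`. Zero-degree nodes are given a zero normalization factor so that the inverse square root is never taken of 0.

```python
import numpy as np
import scipy.sparse as sp

# Toy interaction matrix R (hypothetical): 2 users x 3 items.
R = sp.csr_matrix(np.array([[1, 1, 0],
                            [0, 1, 1]], dtype=np.float64))
M, N = R.shape

# Adjacency matrix A = [[0, R], [R^T, 0]] of the user-item graph.
A = sp.bmat([[None, R], [R.T, None]], format="csr")

# Degree of each node, then D^{-1/2} with a guard for isolated nodes.
deg = np.asarray(A.sum(axis=1)).ravel()
d_inv_sqrt = np.zeros_like(deg)
d_inv_sqrt[deg > 0] = deg[deg > 0] ** -0.5
D_inv_sqrt = sp.diags(d_inv_sqrt)

# Symmetrically normalized adjacency matrix.
A_tilde = (D_inv_sqrt @ A @ D_inv_sqrt).tocsr()

# Propagate K layers and combine them with uniform weights 1 / (K + 1).
K, T = 3, 4
rng = np.random.default_rng(0)
E0 = rng.normal(size=(M + N, T))
layers = [E0]
for _ in range(K):
    layers.append(A_tilde @ layers[-1])
E = sum(layers) / (K + 1)
```

The first `M` rows of `E` are the final user embeddings and the remaining `N` rows are the final item embeddings.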

### 1.4 Model Training

We employ the Bayesian Personalized Ranking (BPR) loss, a pairwise loss that encourages the prediction for an observed entry to be higher than the predictions for its unobserved counterparts:

$$
L_{B P R}=-\sum_{u=1}^{M} \sum_{i \in \mathcal{N}_{u}} \sum_{j \notin \mathcal{N}_{u}} \ln \sigma\left(\hat{y}_{u i}-\hat{y}_{u j}\right)+\lambda\left\|\mathbf{E}^{(0)}\right\|^{2}
$$

where $\lambda$ controls the $L_2$ regularization strength. We employ the Adam optimizer and use it in a mini-batch manner.
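
For a mini-batch of (user, positive item, negative item) triples, the loss can be sketched in NumPy as follows. The function name `bpr_loss`, its argument names, and the choice to average over the batch (rather than sum, as in the formula) are assumptions made for this sketch.

```python
import numpy as np


def bpr_loss(u, pos, neg, u0, pos0, neg0, lam=1e-4):
    """Return (ranking loss, regularization term) for one mini-batch.

    u, pos, neg    : final embeddings of users, positive and negative items
    u0, pos0, neg0 : the corresponding 0-th layer (trainable) embeddings
    lam            : L2 regularization strength (lambda in the formula)
    """
    pos_scores = np.sum(u * pos, axis=1)
    neg_scores = np.sum(u * neg, axis=1)
    # -ln(sigmoid(x)) = ln(1 + exp(-x)), computed stably with logaddexp.
    rank_loss = np.mean(np.logaddexp(0.0, -(pos_scores - neg_scores)))
    # Only the 0-th layer embeddings are regularized.
    reg = lam * (np.sum(u0 ** 2) + np.sum(pos0 ** 2) + np.sum(neg0 ** 2)) / len(u)
    return rank_loss, reg


# With all-zero embeddings every score difference is 0, so the
# ranking loss equals ln(2) and the regularization term is 0.
zeros = np.zeros((3, 4))
rank_loss, reg = bpr_loss(zeros, zeros, zeros, zeros, zeros, zeros)
```

The two returned terms play the same roles as the `(mf)` and `(embed)` components reported in the training log later in this notebook, although the exact scaling used there may differ.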


## 2 TensorFlow implementation of LightGCN

We will use the MovieLens dataset, which is composed of integer ratings from 1 to 5.

We convert MovieLens into implicit feedback for model training and evaluation.

You can check the details of implementation in `reco_utils/recommender/deeprec/graphrec/`


## 3 TensorFlow LightGCN movie recommender

### 3.1 Load and split data

For each user, we hold out the most recent 25% of interactions as the test set and use the remaining 75% for training. We use `python_chrono_split` with a ratio of 0.75 to achieve this.

In [3]:
df = movielens.load_pandas_df(size=MOVIELENS_DATA_SIZE)

df.head()

100%|██████████| 4.81k/4.81k [00:02<00:00, 1.96kKB/s]


| | userID | itemID | rating | timestamp |
|---|---|---|---|---|
| 0 | 196 | 242 | 3.0 | 881250949 |
| 1 | 186 | 302 | 3.0 | 891717742 |
| 2 | 22 | 377 | 1.0 | 878887116 |
| 3 | 244 | 51 | 2.0 | 880606923 |
| 4 | 166 | 346 | 1.0 | 886397596 |


In [4]:
train, test = python_chrono_split(df, 0.75)
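
To illustrate what a per-user chronological split does, here is a minimal pandas sketch. The helper `chrono_split_sketch` and the toy data are hypothetical, and the rounding rule is an assumption; the actual `python_chrono_split` may handle edge cases differently.

```python
import pandas as pd


def chrono_split_sketch(df, ratio=0.75, col_user="userID", col_time="timestamp"):
    """Keep each user's earliest `ratio` share of interactions for training."""
    df = df.sort_values([col_user, col_time])
    # 0-based position of each interaction within its user's timeline.
    rank = df.groupby(col_user).cumcount()
    # Number of interactions of the user owning each row.
    size = df.groupby(col_user)[col_user].transform("size")
    is_train = rank < (size * ratio).round()
    return df[is_train], df[~is_train]


# Toy data (hypothetical): user 1 has 4 interactions, user 2 has 8.
toy = pd.DataFrame({
    "userID": [1] * 4 + [2] * 8,
    "itemID": list(range(12)),
    "timestamp": [40, 10, 30, 20] + [8, 7, 6, 5, 4, 3, 2, 1],
})
toy_train, toy_test = chrono_split_sketch(toy, ratio=0.75)
```

For every user, all training interactions precede all test interactions in time, which avoids training on events that happen after the ones being predicted.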

### 3.2 Functions of LightGCN Dataset 

The `Dataset` class for LightGCN (imported here as `LightGCNDataset`) provides the following important functions:

`get_norm_adj_mat`: loads the normalized adjacency matrix $\tilde{\mathbf{A}}$ of the user-item graph if it already exists; otherwise creates it.

`create_norm_adj_mat`: creates the normalized adjacency matrix $\tilde{\mathbf{A}}$ of the user-item graph.

`train_loader`: generates a batch of training data by sampling a batch of users and then sampling one positive item and one negative item for each user.
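
The sampling scheme described for `train_loader` can be sketched with the standard library as follows. The function `sample_bpr_batch` and its arguments are hypothetical and do not mirror the actual `train_loader` signature.

```python
import random


def sample_bpr_batch(interactions, n_items, batch_size, rng):
    """Sample (user, positive item, negative item) triples.

    interactions : dict mapping each user to the set of items they interacted with
    n_items      : total number of items, with IDs 0 .. n_items - 1
    Assumes that no user has interacted with every item.
    """
    users = list(interactions)
    if batch_size <= len(users):
        batch_users = rng.sample(users, batch_size)
    else:
        batch_users = [rng.choice(users) for _ in range(batch_size)]

    pos_items, neg_items = [], []
    for u in batch_users:
        seen = interactions[u]
        # One observed (positive) item for this user.
        pos_items.append(rng.choice(sorted(seen)))
        # Rejection-sample one unobserved (negative) item.
        while True:
            j = rng.randrange(n_items)
            if j not in seen:
                break
        neg_items.append(j)
    return batch_users, pos_items, neg_items


toy_interactions = {0: {0, 1}, 1: {2}, 2: {0, 3, 4}}
batch_users, pos_items, neg_items = sample_bpr_batch(
    toy_interactions, n_items=6, batch_size=3, rng=random.Random(42)
)
```

Each triple feeds one term of the BPR loss: the positive item should be scored above the negative item for that user.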


In [5]:
data = LightGCNDataset(train=train, test=test, seed=SEED)

### 3.3 Train LightGCN based on TensorFlow

The LightGCN model has many parameters. The most important ones are:

`data`, initialized LightGCNDataset object.

`epoch`, number of epochs for training.

`n_layers`, number of layers of the model.

`eval_epoch`, if not `None`, evaluation metrics are calculated on the test set every `eval_epoch` epochs, which lets us monitor model quality during training.

`top_k`, the number of items to be recommended for each user when calculating ranking metrics.

To train the model, we simply need to call the `fit()` method.

In [6]:
model = LightGCN(
    data,
    seed=SEED,
    epoch=EPOCHS,
    learning_rate=0.001,
    embed_size=64,
    batch_size=BATCH_SIZE,
    n_layers=3,
    decay=1e-4,
    eval_epoch=5,
    top_k=TOP_K
)


Already create adjacency matrix.
Already normalize adjacency matrix.

Using xavier initialization.









In [7]:
with Timer() as train_time:
    model.fit()

print("Took {} seconds for training.".format(train_time.interval))

Epoch 1 (train)4.1s: train loss = 0.67001 = (mf)0.66999 + (embed)0.00002
Epoch 2 (train)2.0s: train loss = 0.47925 = (mf)0.47911 + (embed)0.00013
Epoch 3 (train)2.0s: train loss = 0.33086 = (mf)0.33056 + (embed)0.00030
Epoch 4 (train)2.0s: train loss = 0.29205 = (mf)0.29162 + (embed)0.00042
Epoch 5 (train)1.9s + (eval)1.0s: train loss = 0.27894 = (mf)0.27845 + (embed)0.00049, recall = 0.06678, ndcg = 0.11996, precision = 0.11241, map = 0.02757
Epoch 6 (train)2.0s: train loss = 0.26818 = (mf)0.26765 + (embed)0.00054
Epoch 7 (train)1.9s: train loss = 0.26423 = (mf)0.26366 + (embed)0.00057
Epoch 8 (train)1.9s: train loss = 0.25734 = (mf)0.25674 + (embed)0.00060
Epoch 9 (train)1.9s: train loss = 0.24631 = (mf)0.24568 + (embed)0.00063
Epoch 10 (train)1.9s + (eval)0.3s: train loss = 0.24007 = (mf)0.23941 + (embed)0.00066, recall = 0.07510, ndcg = 0.14055, precision = 0.12715, map = 0.03356
Epoch 11 (train)1.9s: train loss = 0.23424 = (mf)0.23355 + (embed)0.00069
Epoch 12 (train)1.9s: train l

Epoch 92 (train)1.6s: train loss = 0.12294 = (mf)0.12023 + (embed)0.00271
Epoch 93 (train)1.6s: train loss = 0.12378 = (mf)0.12106 + (embed)0.00272
Epoch 94 (train)1.6s: train loss = 0.12361 = (mf)0.12086 + (embed)0.00275
Epoch 95 (train)1.6s + (eval)0.2s: train loss = 0.12396 = (mf)0.12119 + (embed)0.00276, recall = 0.09690, ndcg = 0.18621, precision = 0.16850, map = 0.04711
Epoch 96 (train)1.6s: train loss = 0.12161 = (mf)0.11883 + (embed)0.00278
Epoch 97 (train)1.6s: train loss = 0.12354 = (mf)0.12075 + (embed)0.00280
Epoch 98 (train)1.6s: train loss = 0.12106 = (mf)0.11824 + (embed)0.00282
Epoch 99 (train)1.6s: train loss = 0.12263 = (mf)0.11980 + (embed)0.00284
Epoch 100 (train)1.6s + (eval)0.2s: train loss = 0.12106 = (mf)0.11820 + (embed)0.00286, recall = 0.09700, ndcg = 0.18557, precision = 0.16766, map = 0.04714
Took 182.74709977395833 seconds for training.


### 3.4 Recommendation and Evaluation

Recommendation and evaluation have been performed on the specified test set during training. After training, we can also use the model to perform recommendation and evaluation on other data. Here we still use `test` as the test data, but `test` can be replaced by other data with the same structure.

#### 3.4.1 Recommendation

We can call `recommend_k_items` to recommend k items for each user passed to this function. We set `remove_seen=True` to remove the items already seen by the user. The function returns a dataframe containing each user, the top k items recommended to them, and the corresponding ranking scores.

In [8]:
topk_scores = model.recommend_k_items(test, top_k=TOP_K, remove_seen=True)

topk_scores.head()

| | userID | itemID | prediction |
|---|---|---|---|
| 0 | 1 | 100 | 6.981168 |
| 1 | 1 | 222 | 5.855599 |
| 2 | 1 | 210 | 5.580697 |
| 3 | 1 | 405 | 5.470736 |
| 4 | 1 | 12 | 5.36365 |
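
The idea behind top-k recommendation with seen items removed can be sketched in NumPy as follows. The helper `top_k_unseen` and the toy score matrix are assumptions for illustration and do not reflect how `recommend_k_items` is implemented internally.

```python
import numpy as np


def top_k_unseen(scores, seen_mask, k):
    """Return the indices and scores of the k best unseen items per user."""
    # Items seen in training get a score of -inf so they are never selected.
    masked = np.where(seen_mask, -np.inf, scores)
    top_items = np.argsort(-masked, axis=1)[:, :k]
    top_scores = np.take_along_axis(masked, top_items, axis=1)
    return top_items, top_scores


# Toy example (hypothetical): 2 users x 4 items.
toy_scores = np.array([[0.9, 0.8, 0.1, 0.5],
                       [0.2, 0.7, 0.6, 0.4]])
toy_seen = np.array([[True, False, False, False],
                     [False, True, False, False]])
top_items, top_scores = top_k_unseen(toy_scores, toy_seen, k=2)
```

In this toy example the highest-scoring item of each user has already been seen, so the recommendations start from the second-best item.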


#### 3.4.2 Evaluation

With `topk_scores` predicted by the model, we can evaluate how LightGCN performs on this test set.

In [9]:
eval_map = map_at_k(test, topk_scores, k=TOP_K)
eval_ndcg = ndcg_at_k(test, topk_scores, k=TOP_K)
eval_precision = precision_at_k(test, topk_scores, k=TOP_K)
eval_recall = recall_at_k(test, topk_scores, k=TOP_K)

print("MAP:\t%f" % eval_map,
      "NDCG:\t%f" % eval_ndcg,
      "Precision@K:\t%f" % eval_precision,
      "Recall@K:\t%f" % eval_recall, sep='\n')

MAP:	0.047139
NDCG:	0.185569
Precision@K:	0.167656
Recall@K:	0.097002
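
To make two of these metrics concrete, the sketch below computes precision@k and recall@k for a single user. The helper `precision_recall_at_k` is hypothetical; the `reco_utils` functions operate on dataframes and aggregate over all users, so their details may differ.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision@k and recall@k for one user.

    recommended : ranked list of recommended item IDs (best first)
    relevant    : collection of item IDs the user interacted with in the test set
    """
    relevant = set(relevant)
    hits = len(set(recommended[:k]) & relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Two of the four recommended items are relevant; three items are relevant in total.
precision, recall = precision_recall_at_k([1, 2, 3, 4], {2, 4, 9}, k=4)
```

Precision is normalized by the length of the recommendation list, while recall is normalized by the number of relevant items, so recall is bounded by how many test items a user has relative to k.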


In [10]:
# Record results with papermill for tests
pm.record("map", eval_map)
pm.record("ndcg", eval_ndcg)
pm.record("precision", eval_precision)
pm.record("recall", eval_recall)

  




### References
1. Xiangnan He, Kuan Deng, Xiang Wang, Yan Li, Yongdong Zhang & Meng Wang, LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation, 2020, https://arxiv.org/abs/2002.02126

2. LightGCN implementation [TensorFlow]: https://github.com/kuandeng/lightgcn