# LightGCN (Light Graph Convolution Network)

* [Reference implementation](https://github.com/microsoft/recommenders/blob/main/examples/02_model_collaborative_filtering/lightgcn_deep_dive.ipynb)

## Install

In [1]:
# recommenders
! git clone https://github.com/microsoft/recommenders

Cloning into 'recommenders'...
remote: Enumerating objects: 35770, done.
remote: Counting objects: 100% (993/993), done.
remote: Compressing objects: 100% (413/413), done.
remote: Total 35770 (delta 653), reused 855 (delta 562), pack-reused 34777
Receiving objects: 100% (35770/35770), 202.34 MiB | 21.37 MiB/s, done.
Resolving deltas: 100% (24120/24120), done.


In [2]:
%%capture
!pip install papermill
!pip install scrapbook
!pip install recommenders[examples]

## Import

In [3]:
import sys
import os
import papermill as pm
import scrapbook as sb
import pandas as pd
import numpy as np
import tensorflow as tf
tf.get_logger().setLevel('ERROR') # only show error messages

from recommenders.utils.timer import Timer
from recommenders.models.deeprec.models.graphrec.lightgcn import LightGCN
from recommenders.models.deeprec.DataModel.ImplicitCF import ImplicitCF
from recommenders.datasets import movielens
from recommenders.datasets.python_splitters import python_stratified_split
from recommenders.evaluation.python_evaluation import map_at_k, ndcg_at_k, precision_at_k, recall_at_k
from recommenders.utils.constants import SEED as DEFAULT_SEED
from recommenders.models.deeprec.deeprec_utils import prepare_hparams

print("System version: {}".format(sys.version))
print("Pandas version: {}".format(pd.__version__))
print("Tensorflow version: {}".format(tf.__version__))

System version: 3.7.13 (default, Apr 24 2022, 01:04:09) 
[GCC 7.5.0]
Pandas version: 1.3.5
Tensorflow version: 2.8.2


In [4]:
# Number of items to recommend
TOP_K = 10

# Select MovieLens data size: 100k, 1m, 10m, or 20m
MOVIELENS_DATA_SIZE = '100k'

# Hyperparameters
EPOCHS = 50
BATCH_SIZE = 1024

SEED = DEFAULT_SEED  # Set None for non-deterministic results


# Path settings
yaml_file = "/content/recommenders/recommenders/models/deeprec/config/lightgcn.yaml"
user_file = "/content/data/user_embeddings.csv"
item_file = "/content/data/item_embeddings.csv"

In [5]:
# Load the data
df = movielens.load_pandas_df(size=MOVIELENS_DATA_SIZE)

df.head()

100%|██████████| 4.81k/4.81k [00:00<00:00, 10.5kKB/s]


Unnamed: 0,userID,itemID,rating,timestamp
0,196,242,3.0,881250949
1,186,302,3.0,891717742
2,22,377,1.0,878887116
3,244,51,2.0,880606923
4,166,346,1.0,886397596


## Model

In [6]:
# Split into training and test sets
train, test = python_stratified_split(df, ratio=0.75)

# Prepare the data for LightGCN
data = ImplicitCF(train=train, test=test, seed=SEED)

# Hyperparameters for LightGCN
hparams = prepare_hparams(yaml_file,
                          n_layers=3,
                          batch_size=BATCH_SIZE,
                          epochs=EPOCHS,
                          learning_rate=0.005,
                          eval_epoch=5,
                          top_k=TOP_K,
                         )


model = LightGCN(hparams, data, seed=SEED)

Already create adjacency matrix.
Already normalize adjacency matrix.
Using xavier initialization.
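
The log lines above mention building and normalizing an adjacency matrix. As a rough illustration of what that step could look like, the NumPy sketch below builds the bipartite user-item adjacency matrix, applies the symmetric normalization described in the LightGCN paper, and averages the embeddings over the propagation layers. The function names `normalized_adjacency` and `light_graph_convolution` and the toy interaction matrix are hypothetical; this is not the `recommenders` implementation, which works on sparse matrices in TensorFlow and may differ in detail.

```python
import numpy as np


def normalized_adjacency(R):
    """Return D^{-1/2} A D^{-1/2} for the bipartite graph defined by R (users x items)."""
    n_users, n_items = R.shape
    n_nodes = n_users + n_items
    A = np.zeros((n_nodes, n_nodes))
    A[:n_users, n_users:] = R      # user -> item edges
    A[n_users:, :n_users] = R.T    # item -> user edges
    deg = A.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0              # isolated nodes keep a zero row/column
    d_inv_sqrt[nonzero] = deg[nonzero] ** -0.5
    return d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]


def light_graph_convolution(A_hat, E0, n_layers=3):
    """Propagate embeddings with E^(k+1) = A_hat @ E^(k) and average all layers."""
    layers = [E0]
    for _ in range(n_layers):
        layers.append(A_hat @ layers[-1])
    return np.mean(layers, axis=0)


# Toy data (hypothetical): 2 users, 3 items, 4-dimensional embeddings
R = np.array([[1, 1, 0],
              [0, 1, 1]], dtype=float)
rng = np.random.default_rng(0)
E0 = rng.normal(scale=0.1, size=(R.shape[0] + R.shape[1], 4))

A_hat = normalized_adjacency(R)
E = light_graph_convolution(A_hat, E0, n_layers=3)
user_emb, item_emb = E[:R.shape[0]], E[R.shape[0]:]
scores = user_emb @ item_emb.T     # ranking score for every user-item pair
print(scores.shape)
```

Here `n_layers=3` mirrors the value passed to `prepare_hparams` above. The layer has no weight matrices or nonlinearities, which is what makes the convolution "light".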


## Train

In [7]:
with Timer() as train_time:
    model.fit()

print(f"Training time: {train_time.interval} seconds")

Epoch 1 (train)3.3s: train loss = 0.46985 = (mf)0.46960 + (embed)0.00025
Epoch 2 (train)3.0s: train loss = 0.28470 = (mf)0.28405 + (embed)0.00066
Epoch 3 (train)2.9s: train loss = 0.25343 = (mf)0.25260 + (embed)0.00082
Epoch 4 (train)3.0s: train loss = 0.23669 = (mf)0.23570 + (embed)0.00099
Epoch 5 (train)3.0s + (eval)0.5s: train loss = 0.23210 = (mf)0.23100 + (embed)0.00111, recall = 0.15584, ndcg = 0.34174, precision = 0.29703, map = 0.08969
Epoch 6 (train)2.9s: train loss = 0.22394 = (mf)0.22274 + (embed)0.00120
Epoch 7 (train)3.0s: train loss = 0.21258 = (mf)0.21126 + (embed)0.00132
Epoch 8 (train)3.1s: train loss = 0.20166 = (mf)0.20020 + (embed)0.00146
Epoch 9 (train)3.0s: train loss = 0.18874 = (mf)0.18712 + (embed)0.00161
Epoch 10 (train)2.8s + (eval)0.3s: train loss = 0.18451 = (mf)0.18273 + (embed)0.00178, recall = 0.17787, ndcg = 0.38410, precision = 0.33521, map = 0.10577
Epoch 11 (train)3.0s: train loss = 0.17410 = (mf)0.17217 + (embed)0.00193
Epoch 12 (train)3.0s: train l
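
Each log line splits the training loss into an `(mf)` term and an `(embed)` term. A plausible reading is that `(mf)` is the BPR pairwise ranking loss and `(embed)` is an L2 penalty on the embeddings, as in the LightGCN paper. The sketch below computes both terms in NumPy under that assumption; the function name `bpr_loss`, the `decay` value, and the exact scaling of the penalty are illustrative choices, not taken from the library.

```python
import numpy as np


def bpr_loss(u, pos, neg, decay=1e-4):
    """Return (total, mf_loss, emb_loss) for a batch of (user, positive, negative) embeddings."""
    pos_scores = np.sum(u * pos, axis=1)
    neg_scores = np.sum(u * neg, axis=1)
    # -log(sigmoid(x)) == log(1 + exp(-x)), computed stably with logaddexp
    mf_loss = np.mean(np.logaddexp(0.0, -(pos_scores - neg_scores)))
    # L2 penalty on the embeddings used in this batch (assumed scaling)
    l2 = 0.5 * (np.sum(u ** 2) + np.sum(pos ** 2) + np.sum(neg ** 2)) / len(u)
    emb_loss = decay * l2
    return mf_loss + emb_loss, mf_loss, emb_loss


# Toy batch (hypothetical): 8 triples of 4-dimensional embeddings
rng = np.random.default_rng(0)
u, pos, neg = (rng.normal(scale=0.1, size=(8, 4)) for _ in range(3))
total, mf_loss, emb_loss = bpr_loss(u, pos, neg)
print(f"train loss = {total:.5f} = (mf){mf_loss:.5f} + (embed){emb_loss:.5f}")
```

Under this reading, the `(mf)` term falls as positive items are scored above sampled negatives, while the `(embed)` term grows slowly as the embedding norms increase, which matches the trend in the log.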

## Infer
* Inference can also be run on other data that has the same structure

In [8]:
# recommend_k_items recommends k items for each user.
# Setting remove_seen=True excludes items the user has already seen.
# Besides the items, it also returns the computed ranking scores.
topk_scores = model.recommend_k_items(test, top_k=TOP_K, remove_seen=True)

topk_scores.head()

Unnamed: 0,userID,itemID,prediction
0,1,7,5.792505
1,1,475,5.48312
2,1,919,5.35205
3,1,89,5.296584
4,1,1,5.276995
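
To make the effect of `remove_seen` concrete, here is a minimal sketch of top-k selection with seen items masked out. The helper `recommend_top_k` and the toy score matrix are hypothetical and use 0-based user and item indices; the library's own `recommend_k_items` may be implemented differently.

```python
import numpy as np
import pandas as pd


def recommend_top_k(scores, seen, top_k):
    """Pick the top_k highest-scoring unseen items per user.

    scores: (n_users, n_items) array of ranking scores
    seen:   boolean array of the same shape, True where the user already interacted
    """
    masked = np.where(seen, -np.inf, scores)          # seen items can never rank first
    top_items = np.argsort(-masked, axis=1)[:, :top_k]
    rows = [
        (user, int(item), float(masked[user, item]))
        for user, items in enumerate(top_items)
        for item in items
    ]
    return pd.DataFrame(rows, columns=["userID", "itemID", "prediction"])


# Toy data (hypothetical): 2 users, 4 items
scores = np.array([[0.9, 0.1, 0.5, 0.3],
                   [0.2, 0.8, 0.4, 0.6]])
seen = np.array([[True, False, False, False],
                 [False, True, False, False]])
recs = recommend_top_k(scores, seen, top_k=2)
print(recs)
```

In this toy case each user's highest-scoring item is one they have already seen, so it is dropped and the next two items are returned. Note that this sketch would return masked entries if `top_k` exceeded the number of unseen items.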


In [9]:
# Evaluate with each metric
eval_map = map_at_k(test, topk_scores, k=TOP_K)
eval_ndcg = ndcg_at_k(test, topk_scores, k=TOP_K)
eval_precision = precision_at_k(test, topk_scores, k=TOP_K)
eval_recall = recall_at_k(test, topk_scores, k=TOP_K)

print("MAP:\t%f" % eval_map,
      "NDCG:\t%f" % eval_ndcg,
      "Precision@K:\t%f" % eval_precision,
      "Recall@K:\t%f" % eval_recall, sep='\n')

MAP:	0.135738
NDCG:	0.455456
Precision@K:	0.400424
Recall@K:	0.213484
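
For intuition about what Precision@K and Recall@K measure, the sketch below computes both by hand on a tiny example with pandas. It assumes the usual definitions (hits divided by k, and hits divided by the number of relevant test items, each averaged over users in the test set); `precision_recall_at_k` is a hypothetical helper, and the library functions may treat edge cases differently.

```python
import pandas as pd


def precision_recall_at_k(truth, preds, k):
    """Average precision@k and recall@k over the users that appear in truth."""
    top = (
        preds.sort_values(["userID", "prediction"], ascending=[True, False])
        .groupby("userID")
        .head(k)
    )
    # A hit is a recommended (user, item) pair that also appears in the test set
    hits = top.merge(truth[["userID", "itemID"]], on=["userID", "itemID"])
    actual_counts = truth.groupby("userID").size()
    hit_counts = hits.groupby("userID").size().reindex(actual_counts.index, fill_value=0)
    precision = (hit_counts / k).mean()
    recall = (hit_counts / actual_counts).mean()
    return precision, recall


# Toy data (hypothetical): user 1 has 3 relevant items, user 2 has 1
truth = pd.DataFrame({"userID": [1, 1, 1, 2], "itemID": [10, 20, 30, 40]})
preds = pd.DataFrame({
    "userID": [1, 1, 2, 2],
    "itemID": [10, 50, 60, 40],
    "prediction": [0.9, 0.8, 0.7, 0.6],
})
precision, recall = precision_recall_at_k(truth, preds, k=2)
print(f"Precision@2: {precision:.4f}, Recall@2: {recall:.4f}")
```

Each user gets one hit out of two recommendations, so precision is 0.5 for both, while recall is 1/3 for user 1 and 1 for user 2.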


In [10]:
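# Record the results
# (not well understood yet; sb.glue appears to save each value in the notebook for later retrieval)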
sb.glue("map", eval_map)
sb.glue("ndcg", eval_ndcg)
sb.glue("precision", eval_precision)
sb.glue("recall", eval_recall)

In [11]:
# Write the embeddings of the users and items in the training set to CSV files
model.infer_embedding(user_file, item_file)