<a href="https://colab.research.google.com/github/fuyu-quant/Data_Science/blob/main/Recommendation/LightGCN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# LightGCN
* Last updated: May 30, 2022
* [Reference implementation](https://github.com/microsoft/recommenders/blob/main/examples/02_model_collaborative_filtering/lightgcn_deep_dive.ipynb)

## Install

In [11]:
# recommenders
! git clone https://github.com/microsoft/recommenders

Cloning into 'recommenders'...
remote: Enumerating objects: 34404, done.
remote: Counting objects: 100% (5786/5786), done.
remote: Compressing objects: 100% (1976/1976), done.
remote: Total 34404 (delta 4000), reused 5406 (delta 3724), pack-reused 28618
Receiving objects: 100% (34404/34404), 202.12 MiB | 28.28 MiB/s, done.
Resolving deltas: 100% (23031/23031), done.


In [2]:
! pip install papermill

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting papermill
  Downloading papermill-2.3.4-py3-none-any.whl (37 kB)
Collecting ansiwrap
  Downloading ansiwrap-0.8.4-py2.py3-none-any.whl (8.5 kB)
Collecting jupyter-client>=6.1.5
  Downloading jupyter_client-7.3.1-py3-none-any.whl (130 kB)
Collecting tornado>=6.0
  Downloading tornado-6.1-cp37-cp37m-manylinux2010_x86_64.whl (428 kB)
Collecting textwrap3>=0.9.2
  Downloading textwrap3-0.9.2-py2.py3-none-any.whl (12 kB)
Installing collected packages: tornado, textwrap3, jupyter-client, ansiwrap, papermill
  Attempting uninstall: tornado
    Found existing installation: tornado 5.1.1
    Uninstalling tornado-5.1.1:
      Successfully uninstalled tornado-5.1.1
  Attempting uninstall: jupyter-client
    Found existing installation: jupyter-client 5.3.5
    Uninstal

In [4]:
!pip install scrapbook

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting scrapbook
  Downloading scrapbook-0.5.0-py3-none-any.whl (34 kB)
Installing collected packages: scrapbook
Successfully installed scrapbook-0.5.0


In [8]:
!pip install recommenders[examples]

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting recommenders[examples]
  Downloading recommenders-1.1.0-py3-none-manylinux1_x86_64.whl (335 kB)
Collecting lightfm<2,>=1.15
  Downloading lightfm-1.16.tar.gz (310 kB)
Collecting nltk<4,>=3.4
  Downloading nltk-3.7-py3-none-any.whl (1.5 MB)
Collecting retrying>=1.3.3
  Downloading retrying-1.3.3.tar.gz (10 kB)
Collecting pandera[strategies]>=0.6.5
  Downloading pandera-0.9.0-py3-none-any.whl (197 kB)
Collecting pyyaml<6,>=5.4.1
  Downloading PyYAML-5.4.1-cp37-cp37m-manylinux1_x86_64.whl (636 kB)
Collecting category-encoders<2,>=1.3.0
  Downloading category_encod

## Import

In [9]:
import sys
import os
import papermill as pm
import scrapbook as sb
import pandas as pd
import numpy as np
import tensorflow as tf
tf.get_logger().setLevel('ERROR') # only show error messages

from recommenders.utils.timer import Timer
from recommenders.models.deeprec.models.graphrec.lightgcn import LightGCN
from recommenders.models.deeprec.DataModel.ImplicitCF import ImplicitCF
from recommenders.datasets import movielens
from recommenders.datasets.python_splitters import python_stratified_split
from recommenders.evaluation.python_evaluation import map_at_k, ndcg_at_k, precision_at_k, recall_at_k
from recommenders.utils.constants import SEED as DEFAULT_SEED
from recommenders.models.deeprec.deeprec_utils import prepare_hparams

print("System version: {}".format(sys.version))
print("Pandas version: {}".format(pd.__version__))
print("Tensorflow version: {}".format(tf.__version__))

System version: 3.7.13 (default, Apr 24 2022, 01:04:09) 
[GCC 7.5.0]
Pandas version: 1.3.5
Tensorflow version: 2.8.0


In [29]:
# Number of items to recommend per user
TOP_K = 10

# Select MovieLens data size: 100k, 1m, 10m, or 20m
MOVIELENS_DATA_SIZE = '100k'

# Hyperparameters
EPOCHS = 50
BATCH_SIZE = 1024

SEED = DEFAULT_SEED  # Set None for non-deterministic results


# Paths: model config file and output files for the embeddings
yaml_file = "/content/recommenders/recommenders/models/deeprec/config/lightgcn.yaml"
user_file = "/content/data/user_embeddings.csv"
item_file = "/content/data/item_embeddings.csv"

In [14]:
# Load the data
df = movielens.load_pandas_df(size=MOVIELENS_DATA_SIZE)

df.head()

100%|██████████| 4.81k/4.81k [00:00<00:00, 15.4kKB/s]


Unnamed: 0,userID,itemID,rating,timestamp
0,196,242,3.0,881250949
1,186,302,3.0,891717742
2,22,377,1.0,878887116
3,244,51,2.0,880606923
4,166,346,1.0,886397596
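
The Model section below splits this frame with `python_stratified_split(df, ratio=0.75)`, which by default stratifies by user so that each user contributes roughly 75% of their interactions to training. As a rough illustration of that idea (not the library's actual code), the sketch below does a per-user random split in plain pandas. The helper `per_user_split` and the toy frame are hypothetical names introduced only for this example.

```python
import pandas as pd

def per_user_split(df, ratio=0.75, seed=42, col_user="userID"):
    """Randomly assign about `ratio` of each user's rows to train, the rest to test."""
    # Shuffle all rows once; groupby keeps this shuffled order within each user.
    shuffled = df.sample(frac=1.0, random_state=seed)
    # Position of each row within its user's group: 0, 1, 2, ...
    rank = shuffled.groupby(col_user).cumcount()
    # Number of rows the user has, aligned to every row of that user.
    size = shuffled.groupby(col_user)[col_user].transform("count")
    in_train = rank < (size * ratio).round()
    return shuffled[in_train], shuffled[~in_train]

# Toy data with the same column names as the MovieLens frame above.
toy = pd.DataFrame({
    "userID": [1, 1, 1, 1, 2, 2, 2, 2],
    "itemID": [10, 11, 12, 13, 10, 11, 14, 15],
    "rating": [3.0, 4.0, 5.0, 2.0, 1.0, 4.0, 3.0, 5.0],
})
toy_train, toy_test = per_user_split(toy)
print(toy_train.groupby("userID").size().to_dict())
print(toy_test.groupby("userID").size().to_dict())
```

Splitting per user rather than over the whole frame keeps every user present in both sets, which matters because the model can only score users it saw during training.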


## Model

In [20]:
# Split into training and test data
train, test = python_stratified_split(df, ratio=0.75)

# Prepare the data for LightGCN
data = ImplicitCF(train=train, test=test, seed=SEED)

# Hyperparameters for LightGCN
hparams = prepare_hparams(yaml_file,
                          n_layers=3,
                          batch_size=BATCH_SIZE,
                          epochs=EPOCHS,
                          learning_rate=0.005,
                          eval_epoch=5,
                          top_k=TOP_K,
                         )


model = LightGCN(hparams, data, seed=SEED)

Already create adjacency matrix.
Already normalize adjacency matrix.
Using xavier initialization.
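
The log lines above mention building and normalizing an adjacency matrix. For intuition, here is a minimal NumPy sketch of the propagation rule described in the LightGCN paper: embeddings are multiplied repeatedly by the symmetrically normalized user-item adjacency matrix, with no weight matrices or nonlinearities, and the layer outputs are averaged. The function name and the dense-matrix representation are assumptions made for readability; the library works with sparse TensorFlow tensors and its internals may differ in detail.

```python
import numpy as np

def lightgcn_embeddings(interactions, emb0, n_layers=3):
    """interactions: (n_users, n_items) 0/1 matrix; emb0: (n_users + n_items, dim)."""
    n_users, n_items = interactions.shape
    n = n_users + n_items
    # Bipartite adjacency: users connect only to items and vice versa.
    adj = np.zeros((n, n))
    adj[:n_users, n_users:] = interactions
    adj[n_users:, :n_users] = interactions.T
    # Symmetric normalization D^{-1/2} A D^{-1/2}; isolated nodes get weight 0.
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    d_inv_sqrt[deg > 0] = deg[deg > 0] ** -0.5
    norm_adj = d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    # Propagate n_layers times, then average layer 0 .. n_layers.
    layers = [emb0]
    for _ in range(n_layers):
        layers.append(norm_adj @ layers[-1])
    final = np.mean(layers, axis=0)
    return final[:n_users], final[n_users:]

# Toy example: 2 users, 3 items, 4-dimensional embeddings.
rng = np.random.default_rng(0)
R = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
user_emb, item_emb = lightgcn_embeddings(R, rng.normal(size=(5, 4)), n_layers=3)
scores = user_emb @ item_emb.T  # predicted preference = inner product
print(scores.shape)
```

In this sketch, `n_layers` plays the role of the `n_layers=3` hyperparameter passed to `prepare_hparams`: it controls how many hops of neighbors contribute to each final embedding.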


## Train

In [21]:
with Timer() as train_time:
    model.fit()

print(f"Training time: {train_time.interval} seconds")

Epoch 1 (train)2.8s: train loss = 0.46985 = (mf)0.46960 + (embed)0.00025
Epoch 2 (train)2.5s: train loss = 0.28470 = (mf)0.28405 + (embed)0.00066
Epoch 3 (train)2.5s: train loss = 0.25343 = (mf)0.25260 + (embed)0.00082
Epoch 4 (train)2.5s: train loss = 0.23669 = (mf)0.23570 + (embed)0.00099
Epoch 5 (train)2.5s + (eval)0.4s: train loss = 0.23210 = (mf)0.23100 + (embed)0.00111, recall = 0.15584, ndcg = 0.34174, precision = 0.29703, map = 0.08969
Epoch 6 (train)2.5s: train loss = 0.22394 = (mf)0.22274 + (embed)0.00120
Epoch 7 (train)2.5s: train loss = 0.21258 = (mf)0.21126 + (embed)0.00132
Epoch 8 (train)2.6s: train loss = 0.20166 = (mf)0.20020 + (embed)0.00146
Epoch 9 (train)2.5s: train loss = 0.18874 = (mf)0.18712 + (embed)0.00161
Epoch 10 (train)2.5s + (eval)0.2s: train loss = 0.18451 = (mf)0.18273 + (embed)0.00178, recall = 0.17787, ndcg = 0.38410, precision = 0.33521, map = 0.10577
Epoch 11 (train)2.5s: train loss = 0.17410 = (mf)0.17217 + (embed)0.00193
Epoch 12 (train)2.5s: train l
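
Each log line reports the training loss as the sum of an `(mf)` term and an `(embed)` term. As far as I understand, the first is a BPR (Bayesian Personalized Ranking) loss over sampled (user, positive item, negative item) triples and the second is an L2 penalty on the embeddings. The sketch below shows one way to compute such a loss in NumPy. The function name and the `decay` value are assumptions for illustration, and the library's exact scaling may differ.

```python
import numpy as np

def bpr_loss(user_emb, pos_emb, neg_emb, decay=1e-4):
    """Each argument is a (batch, dim) array of embeddings for one batch of triples."""
    pos_scores = np.sum(user_emb * pos_emb, axis=1)
    neg_scores = np.sum(user_emb * neg_emb, axis=1)
    # softplus(-(pos - neg)) equals -log(sigmoid(pos - neg)); logaddexp keeps it stable.
    mf_loss = np.mean(np.logaddexp(0.0, -(pos_scores - neg_scores)))
    # L2 penalty on the embeddings in this batch, averaged over the batch.
    l2 = 0.5 * (np.sum(user_emb**2) + np.sum(pos_emb**2) + np.sum(neg_emb**2))
    emb_loss = decay * l2 / len(user_emb)
    return mf_loss + emb_loss, mf_loss, emb_loss

rng = np.random.default_rng(0)
u, p, n = (rng.normal(size=(8, 4)) for _ in range(3))
total, mf, emb = bpr_loss(u, p, n)
print(f"train loss = {total:.5f} = (mf){mf:.5f} + (embed){emb:.5f}")
```

The ranking term shrinks as positive items are scored above negatives, while the penalty grows with the embedding norms, which matches the opposite trends of the two terms in the log above.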

## Infer
* Inference can also be run on other data with a similar structure

In [25]:
# recommend_k_items recommends k items for each user.
# Setting remove_seen=True excludes items the user has already seen.
# Besides the items, it also returns the computed ranking scores.
topk_scores = model.recommend_k_items(test, top_k=TOP_K, remove_seen=True)

topk_scores.head()

Unnamed: 0,userID,itemID,prediction
0,1,7,5.792505
1,1,475,5.48312
2,1,919,5.35205
3,1,89,5.296584
4,1,1,5.276995
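
To make the effect of `top_k` and `remove_seen=True` concrete, the following sketch ranks items from a dense score matrix after masking out items already seen in training. It is only an illustration of the idea, not the library's code: `top_k_items` is a hypothetical helper, and the IDs it returns are plain row and column indices rather than the original MovieLens IDs.

```python
import numpy as np
import pandas as pd

def top_k_items(scores, seen, k):
    """scores, seen: (n_users, n_items); seen is a 0/1 mask of training interactions."""
    # Equivalent of remove_seen=True: seen items can never be ranked.
    masked = np.where(seen > 0, -np.inf, scores)
    # Indices of the k highest-scoring items per user, best first.
    top_idx = np.argsort(-masked, axis=1)[:, :k]
    rows = np.repeat(np.arange(scores.shape[0]), k)
    cols = top_idx.ravel()
    return pd.DataFrame({"userID": rows, "itemID": cols,
                         "prediction": masked[rows, cols]})

toy_scores = np.array([[0.9, 0.1, 0.5, 0.7],
                       [0.2, 0.8, 0.6, 0.4]])
toy_seen = np.array([[1, 0, 0, 0],
                     [0, 1, 0, 0]])
recs = top_k_items(toy_scores, toy_seen, k=2)
print(recs)
```

Each user's highest raw score belongs to an item they have already seen, so it is dropped and the next two items are returned instead.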


In [26]:
# Evaluate with each metric
eval_map = map_at_k(test, topk_scores, k=TOP_K)
eval_ndcg = ndcg_at_k(test, topk_scores, k=TOP_K)
eval_precision = precision_at_k(test, topk_scores, k=TOP_K)
eval_recall = recall_at_k(test, topk_scores, k=TOP_K)

print("MAP:\t%f" % eval_map,
      "NDCG:\t%f" % eval_ndcg,
      "Precision@K:\t%f" % eval_precision,
      "Recall@K:\t%f" % eval_recall, sep='\n')

MAP:	0.135738
NDCG:	0.455456
Precision@K:	0.400424
Recall@K:	0.213484
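
For readers unfamiliar with these metrics, here is a small pandas sketch of precision@k and recall@k on toy data. It assumes both frames use `userID` and `itemID` columns as above and averages over the users in the test set. The helper name is hypothetical, and the library's functions may handle edge cases (for example, users missing from the predictions) differently.

```python
import pandas as pd

def precision_recall_at_k(test, topk, k):
    # A "hit" is a recommended item that also appears in the user's test set.
    hits = pd.merge(topk, test, on=["userID", "itemID"])
    n_true = test.groupby("userID").size()
    n_hits = hits.groupby("userID").size().reindex(n_true.index, fill_value=0)
    precision = (n_hits / k).mean()     # share of the k recommendations that hit
    recall = (n_hits / n_true).mean()   # share of the test items that were found
    return precision, recall

toy_test = pd.DataFrame({"userID": [1, 1, 1, 2],
                         "itemID": [1, 2, 3, 4],
                         "rating": [5.0, 4.0, 3.0, 5.0]})
toy_topk = pd.DataFrame({"userID": [1, 1, 2, 2],
                         "itemID": [1, 9, 8, 7],
                         "prediction": [0.9, 0.8, 0.7, 0.6]})
p_at_k, r_at_k = precision_recall_at_k(toy_test, toy_topk, k=2)
print(p_at_k, r_at_k)
```

Here user 1 gets one hit out of two recommendations and three test items, and user 2 gets none, so precision@2 is 0.25 and recall@2 is 1/6.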


In [27]:
# Record the results
# (I do not fully understand this part yet)
sb.glue("map", eval_map)
sb.glue("ndcg", eval_ndcg)
sb.glue("precision", eval_precision)
sb.glue("recall", eval_recall)

In [30]:
# Export the embeddings of the users and items in the training set to CSV files
model.infer_embedding(user_file, item_file)