# **LightGCN - simplified GCN model for recommendation**
## LightGCN Exploration and Critical Analysis

### Tutored Project: Study and Research Work

**Submitted By:**  
Mohamed Fares Mekaoussi  
Faycal Bendakir  
Redouane Arab  
Rim Bozari

---




# LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation

This notebook reproduces the **state-of-the-art** [LightGCN](https://arxiv.org/abs/2002.02126) model on three datasets: `ml-100k`, `Amazon_Books_sampled`, and `gowalla_sampled`. NGCF and MultiVAE are trained with the same pipeline for comparison.

---

## RecBole Library

We use the [RecBole library](https://recbole.io/docs/) to simplify the code. RecBole is a **unified**, **comprehensive**, and **efficient** framework for reproducing and developing recommendation algorithms, built on Python and PyTorch. Its source code is available on [GitHub](https://github.com/RUCAIBox/RecBole).

---

## Configuration Files

The configuration files (`lightgcn_config.yaml`, `multivae_config.yaml`, `ngcf_config.yaml`) and the dataset directory must be in the **same directory** as this notebook. These files contain the hyperparameters and settings for the models.
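Before launching a long training run, it can help to confirm that these inputs really sit next to the notebook. The following is a minimal sketch, not part of the original pipeline: the file and directory names come from the text above, while `REQUIRED_INPUTS` and `missing_inputs` are hypothetical helper names introduced here for illustration.

```python
from pathlib import Path

# Names taken from this notebook; the helper itself is a hypothetical convenience.
REQUIRED_INPUTS = (
    "lightgcn_config.yaml",
    "multivae_config.yaml",
    "ngcf_config.yaml",
    "dataset",
)


def missing_inputs(base_dir="."):
    """Return the required files/directories that are absent from base_dir."""
    base = Path(base_dir)
    return [name for name in REQUIRED_INPUTS if not (base / name).exists()]


# Check the current working directory (the notebook's directory after %cd).
missing = missing_inputs()
if missing:
    print("Missing next to the notebook:", ", ".join(missing))
else:
    print("All configuration files and the dataset directory were found.")
```

An empty list means every expected input is present; anything else names what still has to be uploaded.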

---

## Training and Results

After the models are trained, a directory named `log_tensorboard` is created. It holds the training logs that TensorBoard reads. To visualize these results:

1. Enter the command `tensorboard --logdir=log_tensorboard` in the terminal.
2. Navigate to http://localhost:6006/ in your web browser.


In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
%cd /content/drive/MyDrive/LightGCN_expr/src

/content/drive/MyDrive/LightGCN_expr/src


In [None]:
!ls

dataset  lightgcn_config.yaml  multivae_config.yaml  ngcf_config.yaml  train_main.py


# Installing the necessary libraries:

---

In [None]:
!pip install recbole
!pip install ray
!pip install kmeans_pytorch
!pip install jax

Collecting recbole
  Downloading recbole-1.2.0-py3-none-any.whl (2.1 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m2.1/2.1 MB[0m [31m12.9 MB/s[0m eta [36m0:00:00[0m
Collecting colorlog==4.7.2 (from recbole)
  Downloading colorlog-4.7.2-py2.py3-none-any.whl (10 kB)
Collecting colorama==0.4.4 (from recbole)
  Downloading colorama-0.4.4-py2.py3-none-any.whl (16 kB)
Collecting thop>=0.1.1.post2207130030 (from recbole)
  Downloading thop-0.1.1.post2209072238-py3-none-any.whl (15 kB)
Collecting texttable>=0.9.0 (from recbole)
  Downloading texttable-1.7.0-py2.py3-none-any.whl (10 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch>=1.10.0->recbole)
  Using cached nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB)
Collecting nvidia-cuda-runtime-cu12==12.1.105 (from torch>=1.10.0->recbole)
  Using cached nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (823 kB)
Collecting nvidia-cuda-cupti-cu12==12.1.105 (from torch>=1.10

In [None]:
import time
from contextlib import contextmanager
from recbole.quick_start import run_recbole

### This context manager measures and prints the training time

In [None]:
@contextmanager
def timer(label: str):
    start = time.time()
    try:
        yield
    finally:
        end = time.time()
        print(f"{label} took {(end - start) / 60:.2f} mins")
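The context manager above can be tried without training anything. The sketch below is a small variant rather than the notebook's exact code: it swaps `time.time()` for `time.perf_counter()`, a monotonic clock that is usually preferred for measuring intervals, and the `"sleep-demo"` label is an arbitrary example.

```python
import time
from contextlib import contextmanager


@contextmanager
def timer(label: str):
    # perf_counter() is monotonic, so the interval is unaffected by clock changes.
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the timed block raises, so a failed run is still reported.
        elapsed = time.perf_counter() - start
        print(f"{label} took {elapsed / 60:.2f} mins")


# Time a short placeholder workload instead of a real training run.
with timer("sleep-demo"):
    time.sleep(0.1)
```

Because the report is printed in the `finally` clause, an experiment that crashes halfway still shows how long it ran before the exception propagates.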

The `run_experiments` function automates the process of running a series of experiments on different datasets with a specified model and configuration.

1. **Prepare Output File**: An output file named `results_{model_name}.txt` (with the model name in lowercase) is created to store the results.

2. **Loop Over Datasets**: For each dataset in the provided list, the function performs the following steps:

   - **Start Timer**: A timer is started to measure the time taken for the experiment.
   
   - **Run Experiment**: The `run_recbole` function is called with the current model, dataset, and configuration file. The result is stored.
   
   - **Print and Write Results**: The result is printed to the console and written to the output file.
   


In [None]:
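def run_experiments(datasets, model, config_file):
    """Train and evaluate `model` on each dataset and log the results to a text file."""
    # One results file per model, e.g. results_lightgcn.txt (overwritten on each call).
    output_file = f"results_{model.lower()}.txt"
    with open(output_file, 'w') as f:
        for dataset in datasets:
            print(f"Running experiment for {model} on {dataset}...")
            # The timed block covers everything run_recbole does for this dataset.
            with timer(f"{dataset}-{model}"):
                result = run_recbole(model=model, dataset=dataset, config_file_list=[config_file])
                print(result)
                f.write(f"{'='*20} {dataset}-{model} {'='*20}\n")
                f.write(str(result) + '\n\n')
            print("-" * 50)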
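The loop and the output-file layout can be checked in seconds by replacing the training call with a stand-in. The sketch below is an assumption-laden dry run: `fake_runner` and `dry_run` are hypothetical names, and the stand-in only mimics the shape of the dictionary that `run_recbole` prints in this notebook (all scores are placeholder zeros, not results).

```python
from collections import OrderedDict
from pathlib import Path


def fake_runner(model, dataset, config_file_list):
    """Hypothetical stand-in for run_recbole; returns placeholder values only."""
    return {
        "best_valid_score": 0.0,
        "valid_score_bigger": True,
        "best_valid_result": OrderedDict([("recall@10", 0.0)]),
        "test_result": OrderedDict([("recall@10", 0.0)]),
    }


def dry_run(datasets, model, config_file, runner=fake_runner, out_dir="."):
    """Mirror the experiment loop, writing one section per dataset."""
    output_file = Path(out_dir) / f"results_{model.lower()}.txt"
    with open(output_file, "w") as f:
        for dataset in datasets:
            result = runner(model=model, dataset=dataset, config_file_list=[config_file])
            f.write(f"{'=' * 20} {dataset}-{model} {'=' * 20}\n")
            f.write(str(result) + "\n\n")
    return output_file


path = dry_run(["ml-100k", "gowalla_sampled"], "LightGCN", "lightgcn_config.yaml")
print(path.read_text())
```

Passing the runner as a parameter keeps the loop itself unchanged, so the same function could later be pointed at the real training call.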

The list of datasets:

In [None]:
datasets = ['ml-100k', 'Amazon_Books_sampled', 'gowalla_sampled']

### **Training LightGCN**

---

In [None]:
config_file = 'lightgcn_config.yaml'
run_experiments(datasets, 'LightGCN', config_file)



Running experiment for LightGCN on ml-100k...


  SparseL = torch.sparse.FloatTensor(i, data, torch.Size(L.shape))
Train     0: 100%|█████████████████████████| 20/20 [00:05<00:00,  3.92it/s, GPU RAM: 0.03 G/14.75 G]
Train     1: 100%|█████████████████████████| 20/20 [00:02<00:00,  9.09it/s, GPU RAM: 0.03 G/14.75 G]
Train     2: 100%|█████████████████████████| 20/20 [00:02<00:00,  7.98it/s, GPU RAM: 0.03 G/14.75 G]
Train     3: 100%|█████████████████████████| 20/20 [00:02<00:00,  8.92it/s, GPU RAM: 0.03 G/14.75 G]
Train     4: 100%|█████████████████████████| 20/20 [00:02<00:00,  8.18it/s, GPU RAM: 0.03 G/14.75 G]
Train     5: 100%|█████████████████████████| 20/20 [00:01<00:00, 19.24it/s, GPU RAM: 0.03 G/14.75 G]
Train     6: 100%|█████████████████████████| 20/20 [00:00<00:00, 21.89it/s, GPU RAM: 0.03 G/14.75 G]
Train     7: 100%|█████████████████████████| 20/20 [00:00<00:00, 24.29it/s, GPU RAM: 0.03 G/14.75 G]
Train     8: 100%|█████████████████████████| 20/20 [00:00<00:00, 23.81it/s, GPU RAM: 0.03 G/14.75 G]
Train     9: 100%|██████

{'best_valid_score': 0.7487, 'valid_score_bigger': True, 'best_valid_result': OrderedDict([('recall@10', 0.2125), ('mrr@10', 0.3896), ('ndcg@10', 0.229), ('hit@10', 0.7487), ('precision@10', 0.1584)]), 'test_result': OrderedDict([('recall@10', 0.2518), ('mrr@10', 0.4807), ('ndcg@10', 0.2915), ('hit@10', 0.7964), ('precision@10', 0.1988)])}
ml-100k-LightGCN took 8.71 mins
--------------------------------------------------
Running experiment for LightGCN on Amazon_Books_sampled...


Train     0: 100%|█████████████████████████| 21/21 [00:01<00:00, 20.73it/s, GPU RAM: 0.07 G/14.75 G]
Train     1: 100%|█████████████████████████| 21/21 [00:00<00:00, 21.44it/s, GPU RAM: 0.07 G/14.75 G]
Train     2: 100%|█████████████████████████| 21/21 [00:01<00:00, 15.36it/s, GPU RAM: 0.07 G/14.75 G]
Train     3: 100%|█████████████████████████| 21/21 [00:01<00:00, 17.83it/s, GPU RAM: 0.07 G/14.75 G]
Train     4: 100%|█████████████████████████| 21/21 [00:01<00:00, 16.47it/s, GPU RAM: 0.07 G/14.75 G]
Train     5: 100%|█████████████████████████| 21/21 [00:01<00:00, 20.37it/s, GPU RAM: 0.07 G/14.75 G]
Train     6: 100%|█████████████████████████| 21/21 [00:01<00:00, 20.40it/s, GPU RAM: 0.07 G/14.75 G]
Train     7: 100%|█████████████████████████| 21/21 [00:01<00:00, 17.67it/s, GPU RAM: 0.07 G/14.75 G]
Train     8: 100%|█████████████████████████| 21/21 [00:01<00:00, 16.61it/s, GPU RAM: 0.07 G/14.75 G]
Train     9: 100%|█████████████████████████| 21/21 [00:01<00:00, 18.59it/s, GPU RAM: 0.07 G

{'best_valid_score': 0.1845, 'valid_score_bigger': True, 'best_valid_result': OrderedDict([('recall@10', 0.0608), ('mrr@10', 0.0624), ('ndcg@10', 0.0418), ('hit@10', 0.1845), ('precision@10', 0.02)]), 'test_result': OrderedDict([('recall@10', 0.0536), ('mrr@10', 0.0599), ('ndcg@10', 0.0384), ('hit@10', 0.1662), ('precision@10', 0.0184)])}
Amazon_Books_sampled-LightGCN took 27.31 mins
--------------------------------------------------
Running experiment for LightGCN on gowalla_sampled...


Train     0: 100%|█████████████████████████| 14/14 [00:00<00:00, 17.36it/s, GPU RAM: 0.07 G/14.75 G]
Train     1: 100%|█████████████████████████| 14/14 [00:00<00:00, 22.93it/s, GPU RAM: 0.07 G/14.75 G]
Train     2: 100%|█████████████████████████| 14/14 [00:00<00:00, 22.85it/s, GPU RAM: 0.07 G/14.75 G]
Train     3: 100%|█████████████████████████| 14/14 [00:00<00:00, 23.76it/s, GPU RAM: 0.07 G/14.75 G]
Train     4: 100%|█████████████████████████| 14/14 [00:00<00:00, 21.74it/s, GPU RAM: 0.07 G/14.75 G]
Train     5: 100%|█████████████████████████| 14/14 [00:00<00:00, 21.69it/s, GPU RAM: 0.07 G/14.75 G]
Train     6: 100%|█████████████████████████| 14/14 [00:00<00:00, 22.01it/s, GPU RAM: 0.07 G/14.75 G]
Train     7: 100%|█████████████████████████| 14/14 [00:00<00:00, 19.50it/s, GPU RAM: 0.07 G/14.75 G]
Train     8: 100%|█████████████████████████| 14/14 [00:00<00:00, 20.00it/s, GPU RAM: 0.07 G/14.75 G]
Train     9: 100%|█████████████████████████| 14/14 [00:00<00:00, 21.80it/s, GPU RAM: 0.07 G

{'best_valid_score': 0.5805, 'valid_score_bigger': True, 'best_valid_result': OrderedDict([('recall@10', 0.2229), ('mrr@10', 0.2946), ('ndcg@10', 0.1905), ('hit@10', 0.5805), ('precision@10', 0.0949)]), 'test_result': OrderedDict([('recall@10', 0.236), ('mrr@10', 0.3193), ('ndcg@10', 0.2027), ('hit@10', 0.5749), ('precision@10', 0.0973)])}
gowalla_sampled-LightGCN took 13.19 mins
--------------------------------------------------


### **Training NGCF**

---

In [None]:
config_file = 'ngcf_config.yaml'  # Make sure you've uploaded this file
run_experiments(datasets, 'NGCF', config_file)

Running experiment for NGCF on ml-100k...


  return torch.sparse.FloatTensor(i, val)
Train     0: 100%|█████████████████████████| 20/20 [00:01<00:00, 11.43it/s, GPU RAM: 0.08 G/14.75 G]
Train     1: 100%|█████████████████████████| 20/20 [00:01<00:00, 13.17it/s, GPU RAM: 0.08 G/14.75 G]
Train     2: 100%|█████████████████████████| 20/20 [00:01<00:00, 12.63it/s, GPU RAM: 0.08 G/14.75 G]
Train     3: 100%|█████████████████████████| 20/20 [00:01<00:00, 15.33it/s, GPU RAM: 0.08 G/14.75 G]
Train     4: 100%|█████████████████████████| 20/20 [00:01<00:00, 13.84it/s, GPU RAM: 0.08 G/14.75 G]
Train     5: 100%|█████████████████████████| 20/20 [00:01<00:00, 14.34it/s, GPU RAM: 0.08 G/14.75 G]
Train     6: 100%|█████████████████████████| 20/20 [00:01<00:00, 12.86it/s, GPU RAM: 0.08 G/14.75 G]
Train     7: 100%|█████████████████████████| 20/20 [00:01<00:00, 15.76it/s, GPU RAM: 0.08 G/14.75 G]
Train     8: 100%|█████████████████████████| 20/20 [00:01<00:00, 16.65it/s, GPU RAM: 0.08 G/14.75 G]
Train     9: 100%|█████████████████████████| 20/2

{'best_valid_score': 0.755, 'valid_score_bigger': True, 'best_valid_result': OrderedDict([('recall@10', 0.2231), ('mrr@10', 0.398), ('ndcg@10', 0.2395), ('hit@10', 0.755), ('precision@10', 0.1663)]), 'test_result': OrderedDict([('recall@10', 0.2605), ('mrr@10', 0.4873), ('ndcg@10', 0.2999), ('hit@10', 0.8134), ('precision@10', 0.2069)])}
ml-100k-NGCF took 13.15 mins
--------------------------------------------------
Running experiment for NGCF on Amazon_Books_sampled...


Train     0: 100%|█████████████████████████| 21/21 [00:01<00:00, 14.64it/s, GPU RAM: 0.14 G/14.75 G]
Train     1: 100%|█████████████████████████| 21/21 [00:01<00:00, 12.14it/s, GPU RAM: 0.14 G/14.75 G]
Train     2: 100%|█████████████████████████| 21/21 [00:01<00:00, 14.56it/s, GPU RAM: 0.14 G/14.75 G]
Train     3: 100%|█████████████████████████| 21/21 [00:01<00:00, 12.41it/s, GPU RAM: 0.14 G/14.75 G]
Train     4: 100%|█████████████████████████| 21/21 [00:01<00:00, 13.10it/s, GPU RAM: 0.14 G/14.75 G]
Train     5: 100%|█████████████████████████| 21/21 [00:01<00:00, 11.99it/s, GPU RAM: 0.14 G/14.75 G]
Train     6: 100%|█████████████████████████| 21/21 [00:01<00:00, 13.33it/s, GPU RAM: 0.14 G/14.75 G]
Train     7: 100%|█████████████████████████| 21/21 [00:01<00:00, 14.32it/s, GPU RAM: 0.14 G/14.75 G]
Train     8: 100%|█████████████████████████| 21/21 [00:01<00:00, 13.18it/s, GPU RAM: 0.14 G/14.75 G]
Train     9: 100%|█████████████████████████| 21/21 [00:01<00:00, 15.77it/s, GPU RAM: 0.14 G

{'best_valid_score': 0.1834, 'valid_score_bigger': True, 'best_valid_result': OrderedDict([('recall@10', 0.0611), ('mrr@10', 0.0581), ('ndcg@10', 0.0405), ('hit@10', 0.1834), ('precision@10', 0.02)]), 'test_result': OrderedDict([('recall@10', 0.0564), ('mrr@10', 0.0568), ('ndcg@10', 0.038), ('hit@10', 0.174), ('precision@10', 0.0188)])}
Amazon_Books_sampled-NGCF took 43.49 mins
--------------------------------------------------
Running experiment for NGCF on gowalla_sampled...


Train     0: 100%|█████████████████████████| 14/14 [00:00<00:00, 14.09it/s, GPU RAM: 0.16 G/14.75 G]
Train     1: 100%|█████████████████████████| 14/14 [00:00<00:00, 15.14it/s, GPU RAM: 0.16 G/14.75 G]
Train     2: 100%|█████████████████████████| 14/14 [00:01<00:00, 11.34it/s, GPU RAM: 0.16 G/14.75 G]
Train     3: 100%|█████████████████████████| 14/14 [00:01<00:00, 13.30it/s, GPU RAM: 0.16 G/14.75 G]
Train     4: 100%|█████████████████████████| 14/14 [00:00<00:00, 16.46it/s, GPU RAM: 0.16 G/14.75 G]
Train     5: 100%|█████████████████████████| 14/14 [00:00<00:00, 17.24it/s, GPU RAM: 0.16 G/14.75 G]
Train     6: 100%|█████████████████████████| 14/14 [00:00<00:00, 15.62it/s, GPU RAM: 0.16 G/14.75 G]
Train     7: 100%|█████████████████████████| 14/14 [00:01<00:00, 10.28it/s, GPU RAM: 0.16 G/14.75 G]
Train     8: 100%|█████████████████████████| 14/14 [00:01<00:00, 11.68it/s, GPU RAM: 0.16 G/14.75 G]
Train     9: 100%|█████████████████████████| 14/14 [00:01<00:00, 11.78it/s, GPU RAM: 0.16 G

{'best_valid_score': 0.5991, 'valid_score_bigger': True, 'best_valid_result': OrderedDict([('recall@10', 0.2365), ('mrr@10', 0.2873), ('ndcg@10', 0.1899), ('hit@10', 0.5991), ('precision@10', 0.0939)]), 'test_result': OrderedDict([('recall@10', 0.2466), ('mrr@10', 0.3261), ('ndcg@10', 0.2132), ('hit@10', 0.5886), ('precision@10', 0.0968)])}
gowalla_sampled-NGCF took 18.48 mins
--------------------------------------------------
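With both graph models trained, their printed results can be placed side by side. The sketch below copies the test-set `recall@10`, `ndcg@10`, and wall-clock minutes from the outputs above into a dictionary and computes LightGCN minus NGCF per dataset. The `RESULTS` layout and the `compare` helper are illustrative assumptions, and the timings are single runs on a shared Colab GPU, so they are only a rough indication.

```python
# Test-set metrics and wall-clock times copied from the run outputs above.
RESULTS = {
    "LightGCN": {
        "ml-100k": {"recall@10": 0.2518, "ndcg@10": 0.2915, "minutes": 8.71},
        "Amazon_Books_sampled": {"recall@10": 0.0536, "ndcg@10": 0.0384, "minutes": 27.31},
        "gowalla_sampled": {"recall@10": 0.236, "ndcg@10": 0.2027, "minutes": 13.19},
    },
    "NGCF": {
        "ml-100k": {"recall@10": 0.2605, "ndcg@10": 0.2999, "minutes": 13.15},
        "Amazon_Books_sampled": {"recall@10": 0.0564, "ndcg@10": 0.038, "minutes": 43.49},
        "gowalla_sampled": {"recall@10": 0.2466, "ndcg@10": 0.2132, "minutes": 18.48},
    },
}


def compare(results, key, baseline="NGCF", candidate="LightGCN"):
    """Return {dataset: candidate - baseline} for one metric or for 'minutes'."""
    return {
        dataset: round(results[candidate][dataset][key] - results[baseline][dataset][key], 4)
        for dataset in results[candidate]
    }


for key in ("recall@10", "ndcg@10", "minutes"):
    print(key, compare(RESULTS, key))
```

Negative values mean LightGCN scored lower (for metrics) or finished sooner (for minutes) than NGCF in these particular runs.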


### **Training MultiVAE**

---

In [None]:
config_file = 'multivae_config.yaml'  # Make sure you've uploaded this file
run_experiments(datasets, 'MultiVAE', config_file)

Running experiment for MultiVAE on ml-100k...


Train     0: 100%|███████████████████████████| 1/1 [00:00<00:00,  1.74it/s, GPU RAM: 0.16 G/14.75 G]
Train     1: 100%|███████████████████████████| 1/1 [00:00<00:00, 16.77it/s, GPU RAM: 0.16 G/14.75 G]
Train     2: 100%|███████████████████████████| 1/1 [00:00<00:00,  2.47it/s, GPU RAM: 0.16 G/14.75 G]
Train     3: 100%|███████████████████████████| 1/1 [00:00<00:00,  4.54it/s, GPU RAM: 0.16 G/14.75 G]
Train     4: 100%|███████████████████████████| 1/1 [00:00<00:00, 14.33it/s, GPU RAM: 0.16 G/14.75 G]
Train     5: 100%|███████████████████████████| 1/1 [00:00<00:00, 33.57it/s, GPU RAM: 0.16 G/14.75 G]
Train     6: 100%|███████████████████████████| 1/1 [00:00<00:00, 32.34it/s, GPU RAM: 0.16 G/14.75 G]
Train     7: 100%|███████████████████████████| 1/1 [00:00<00:00, 15.38it/s, GPU RAM: 0.16 G/14.75 G]
Train     8: 100%|███████████████████████████| 1/1 [00:00<00:00, 14.70it/s, GPU RAM: 0.16 G/14.75 G]
Train     9: 100%|███████████████████████████| 1/1 [00:00<00:00, 18.78it/s, GPU RAM: 0.16 G

{'best_valid_score': 0.7688, 'valid_score_bigger': True, 'best_valid_result': OrderedDict([('recall@10', 0.2258), ('mrr@10', 0.385), ('ndcg@10', 0.2348), ('hit@10', 0.7688), ('precision@10', 0.1599)]), 'test_result': OrderedDict([('recall@10', 0.2715), ('mrr@10', 0.475), ('ndcg@10', 0.2974), ('hit@10', 0.8229), ('precision@10', 0.2007)])}
ml-100k-MultiVAE took 7.70 mins
--------------------------------------------------
Running experiment for MultiVAE on Amazon_Books_sampled...


Train     0: 100%|███████████████████████████| 1/1 [00:00<00:00,  4.20it/s, GPU RAM: 0.47 G/14.75 G]
Train     1: 100%|███████████████████████████| 1/1 [00:00<00:00,  7.00it/s, GPU RAM: 0.59 G/14.75 G]
Train     2: 100%|███████████████████████████| 1/1 [00:00<00:00,  5.16it/s, GPU RAM: 0.59 G/14.75 G]
Train     3: 100%|███████████████████████████| 1/1 [00:00<00:00,  5.50it/s, GPU RAM: 0.59 G/14.75 G]
Train     4: 100%|███████████████████████████| 1/1 [00:00<00:00,  7.71it/s, GPU RAM: 0.59 G/14.75 G]
Train     5: 100%|███████████████████████████| 1/1 [00:00<00:00,  3.28it/s, GPU RAM: 0.59 G/14.75 G]
Train     6: 100%|███████████████████████████| 1/1 [00:00<00:00,  3.01it/s, GPU RAM: 0.59 G/14.75 G]
Train     7: 100%|███████████████████████████| 1/1 [00:00<00:00,  6.06it/s, GPU RAM: 0.59 G/14.75 G]
Train     8: 100%|███████████████████████████| 1/1 [00:00<00:00,  3.02it/s, GPU RAM: 0.59 G/14.75 G]
Train     9: 100%|███████████████████████████| 1/1 [00:00<00:00,  3.18it/s, GPU RAM: 0.59 G