# Baseline Sequential Recommendation Models — Paper Experiments

**Dataset:** MovieLens-1M &nbsp;|&nbsp; **Evaluation:** HR@K · NDCG@K · MRR@K &nbsp;|&nbsp; K ∈ {5, 10, 20}

---

## Models & Paper-Accurate Hyperparameters

| Model | Paper | d | heads | layers | max_len | dropout | Notes |
|-------|-------|---|-------|--------|---------|---------|-------|
| **GRU4Rec** | Hidasi et al., ICLR 2016 | 64 | — | 1 GRU | 50 | 0.1 | — |
| **SASRec** | Kang & McAuley, ICDM 2018 | 50 | 1 | 2 blocks | 200 | 0.2 | d_ff=200 |
| **BERT4Rec** | Sun et al., CIKM 2019 | 64 | 2 | 2 blocks | 200 | 0.2 | d_ff=256 |
| **LightGCN** | He et al., SIGIR 2020 | 64 | — | 3 GCN | 50 | 0.0 | wd=1e-4 |
| **Caser** | Tang & Wang, WSDM 2018 | 50 | — | — | 50 | 0.5 | L=5, nh=16, nv=4 |
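
As a convenience, the table can be kept in one place and turned into the command lines used in Steps 3–7. The sketch below is illustrative only: the `BASELINES` dict and the `build_command` helper are hypothetical names, not part of the project, and the flag names are simply the ones that appear in the training cells of this notebook.

```python
from typing import List

# Hyperparameters copied from the table above. The dict layout and the
# helper below are illustrative assumptions, not project code.
BASELINES = {
    "gru4rec":  {"d_model": 64, "n_layers": 1, "max_len": 50, "dropout": 0.1},
    "sasrec":   {"d_model": 50, "n_heads": 1, "n_blocks": 2, "d_ff": 200,
                 "max_len": 200, "dropout": 0.2},
    "bert4rec": {"d_model": 64, "n_heads": 2, "n_blocks": 2, "d_ff": 256,
                 "max_len": 200, "dropout": 0.2},
    "lightgcn": {"d_model": 64, "gnn_layers": 3, "max_len": 50, "dropout": 0.0,
                 "weight_decay": 1e-4},
    "caser":    {"d_model": 50, "L_caser": 5, "nh": 16, "nv": 4,
                 "max_len": 50, "dropout": 0.5},
}

def build_command(model: str) -> List[str]:
    """Turn one row of the table into an argument list for run_experiment."""
    args = ["python", "-m", "experiments.run_experiment", "--model", model]
    for flag, value in BASELINES[model].items():
        args += [f"--{flag}", str(value)]
    return args

print(" ".join(build_command("sasrec")))
```

The returned list can be passed to `subprocess.run` if the models are launched from a script rather than from notebook cells.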

---

## Quick Start
1. Run **Step 1** to set up the environment
2. Run **Step 2** to verify / prepare data
3. Run **Steps 3–7** to train each model *(~0.5–1.5 h per model on GPU; see the estimate in each step)*
4. Run **Steps 8–10** to collect results, plot charts, and export LaTeX


## Step 1: Environment Setup

In [None]:
import os, sys

# Make sure we are at the project root
PROJECT_ROOT = os.path.abspath('..')
os.chdir(PROJECT_ROOT)
sys.path.insert(0, PROJECT_ROOT)

import torch

DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'

print('=' * 60)
print('ENVIRONMENT')
print('=' * 60)
print(f'  Working dir  : {os.getcwd()}')
print(f'  Device       : {DEVICE}')
if DEVICE == 'cuda':
    print(f'  GPU          : {torch.cuda.get_device_name(0)}')
    mem_gb = torch.cuda.get_device_properties(0).total_memory / 1e9
    print(f'  GPU Memory   : {mem_gb:.1f} GB')
print(f'  PyTorch      : {torch.__version__}')
print('=' * 60)
print()
print('Training settings (paper-level):')
print('  Max epochs : 200  (with early stopping patience=20)')
print('  Batch size : 256  (128 for SASRec)')
print('  LR         : 0.001')
print('  Seed       : 42')
print()
print('Models to train:')
print('  [1] GRU4Rec  -- Hidasi et al., ICLR 2016')
print('  [2] SASRec   -- Kang & McAuley, ICDM 2018')
print('  [3] BERT4Rec -- Sun et al., CIKM 2019')
print('  [4] LightGCN -- He et al., SIGIR 2020')
print('  [5] Caser    -- Tang & Wang, WSDM 2018')
print('=' * 60)

## Step 2: Prepare Data

Checks for the preprocessed MovieLens-1M files. If any are missing, the raw data is downloaded (when needed) and the preprocessing scripts are run.
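
The statistics printed by this step describe the preprocessing as a `rating >= 4` filter, a minimum sequence length of 5, and a leave-one-out split. The sketch below shows one way such a pipeline could look. It is an assumption for illustration: `build_sequences` and `leave_one_out` are hypothetical names, and the actual logic in `src.data.preprocess` may differ.

```python
from collections import defaultdict

def build_sequences(ratings, min_rating=4, min_seq_len=5):
    """ratings: iterable of (user, item, rating, timestamp) tuples.

    Keeps ratings >= min_rating as implicit positives, orders each user's
    items chronologically, and drops users with too few items.
    """
    by_user = defaultdict(list)
    for user, item, rating, ts in ratings:
        if rating >= min_rating:
            by_user[user].append((ts, item))
    sequences = {}
    for user, events in by_user.items():
        events.sort()  # chronological order (ties broken by item id)
        items = [item for _, item in events]
        if len(items) >= min_seq_len:
            sequences[user] = items
    return sequences

def leave_one_out(sequences):
    """Last item -> test target, second-last -> validation target."""
    train, val, test = {}, {}, {}
    for user, items in sequences.items():
        train[user] = items[:-2]
        val[user] = (items[:-2], items[-2])   # (history, target)
        test[user] = (items[:-1], items[-1])  # history includes the val item
    return train, val, test
```

Whether the validation item is fed back into the history at test time is a design choice; the sketch includes it, which is common practice but not confirmed for this project.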

In [None]:
import os, pickle, numpy as np

data_file  = 'data/ml-1m/processed/sequences.pkl'
graph_file = 'data/graphs/cooccurrence_graph.pkl'
raw_file   = 'data/ml-1m/raw/ml-1m/ratings.dat'

print('=' * 70)
print('Checking Data Files')
print('=' * 70)

files_ok = True
for path, label in [(data_file, 'Sequences (pkl)'), (graph_file, 'Graph (pkl)'), (raw_file, 'Raw ratings')]:
    if os.path.exists(path):
        mb = os.path.getsize(path) / 1024 / 1024
        print(f'  OK  {label:20s}: {path}  ({mb:.2f} MB)')
    else:
        print(f'  MISSING  {label:20s}: {path}')
        files_ok = False

if not files_ok:
    print()
    print('Running preprocessing...')
    if not os.path.exists(raw_file):
        print('Downloading MovieLens-1M...')
        !mkdir -p data/ml-1m/raw
        !wget -q http://files.grouplens.org/datasets/movielens/ml-1m.zip
        !unzip -q ml-1m.zip -d data/ml-1m/raw/
        !rm -f ml-1m.zip
    !python -m src.data.preprocess
    !python -m src.data.graph_builder
    print('Preprocessing complete!')
else:
    print()
    print('All data files ready!')
print('=' * 70)

# Print dataset statistics for paper
with open(data_file, 'rb') as f:
    data = pickle.load(f)

cfg         = data['config']
seq_lengths = [len(s) for s in data['train_sequences'].values()]

# Bucket users by training-sequence length
short_u = sum(1 for n in seq_lengths if n <= 10)
long_u  = sum(1 for n in seq_lengths if n > 50)
mid_u   = cfg['num_users'] - short_u - long_u

print()
print('=== DATASET STATISTICS (for paper) ===')
print(f'  Dataset       : MovieLens-1M')
print(f'  Filtering     : rating >= 4 (implicit positive), min_seq_len = 5')
print(f'  Split         : leave-one-out (val=second-last, test=last)')
print(f'  Users         : {cfg["num_users"]:,}')
print(f'  Items         : {cfg["num_items"]:,}')
print(f'  Train seqs    : {len(data["train_sequences"]):,}')
print(f'  Val instances : {len(data["val_data"]):,}')
print(f'  Test instances: {len(data["test_data"]):,}')
print(f'  Seq len avg   : {np.mean(seq_lengths):.1f}')
print(f'  Seq len median: {np.median(seq_lengths):.1f}')
print(f'  Seq len min   : {min(seq_lengths)}')
print(f'  Seq len max   : {max(seq_lengths)}')
print()
print('  User groups (based on training sequence length):')
print(f'    Short  (len <= 10)      : {short_u:,}  ({100*short_u/cfg["num_users"]:.1f}%)')
print(f'    Medium (10 < len <= 50) : {mid_u:,}  ({100*mid_u/cfg["num_users"]:.1f}%)')
print(f'    Long   (len > 50)       : {long_u:,}  ({100*long_u/cfg["num_users"]:.1f}%)')

## Step 3: Train GRU4Rec

> Hidasi, B., Karatzoglou, A., Baltrunas, L., & Tikk, D. (2016). *Session-based Recommendations with Recurrent Neural Networks.* ICLR.

**Paper config (ML-1M):** d=64, n_layers=1, dropout=0.1, batch=256, lr=0.001, epochs=200 (early stop patience=20)

Expected time: ~30–60 min (GPU) | ~4–6 h (CPU)
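
All training runs use `--epochs 200` with `--patience 20`. A minimal sketch of what patience-based early stopping typically does is shown below, assuming the monitored quantity is a validation metric where higher is better (the results table reports a best validation NDCG@10). The `EarlyStopping` class is a hypothetical illustration, not the project's trainer.

```python
class EarlyStopping:
    """Stop when the validation metric has not improved for `patience` epochs."""

    def __init__(self, patience=20):
        self.patience = patience
        self.best = float("-inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, epoch, metric):
        """Record one epoch's metric; return True when training should stop."""
        if metric > self.best:
            self.best, self.best_epoch, self.bad_epochs = metric, epoch, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Toy run with a short patience so the stop is visible.
stopper = EarlyStopping(patience=2)
for epoch, val_ndcg in enumerate([0.10, 0.20, 0.15, 0.18, 0.30]):
    if stopper.step(epoch, val_ndcg):
        print(f"stopping at epoch {epoch}, best epoch was {stopper.best_epoch}")
        break
```

With a patience of 20 and a 200-epoch cap, a run ends at most 20 epochs after its best validation score.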

In [None]:
!python -m experiments.run_experiment \
    --model gru4rec \
    --d_model 64 \
    --n_layers 1 \
    --dropout 0.1 \
    --max_len 50 \
    --epochs 200 \
    --patience 20 \
    --batch_size 256 \
    --lr 0.001 \
    --weight_decay 0.0 \
    --num_workers 0 \
    --seed 42

print()
print('GRU4Rec training complete!')

## Step 4: Train SASRec

> Kang, W., & McAuley, J. (2018). *Self-Attentive Sequential Recommendation.* ICDM.

**Paper config (ML-1M):** d=50, n_heads=1, n_blocks=2, d_ff=200, max_len=200, dropout=0.2, batch=128, lr=0.001

Expected time: ~60–90 min (GPU) | ~5–8 h (CPU)
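
The `max_len=200` setting means each user history is cut or padded to a fixed length of 200 before it reaches the model. The sketch below assumes left padding with id 0 and keeping the most recent items, which is the convention in the original SASRec code; the helper name `pad_or_truncate` is hypothetical and this project's data loader may do it differently.

```python
def pad_or_truncate(items, max_len, pad_id=0):
    """Keep the most recent `max_len` items and left-pad shorter histories.

    Assumes max_len > 0 and that real item ids never equal pad_id.
    """
    recent = items[-max_len:]
    return [pad_id] * (max_len - len(recent)) + recent

print(pad_or_truncate([1, 2, 3], max_len=5))        # short history, padded
print(pad_or_truncate(list(range(1, 8)), max_len=5))  # long history, truncated
```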

In [None]:
!python -m experiments.run_experiment \
    --model sasrec \
    --d_model 50 \
    --n_heads 1 \
    --n_blocks 2 \
    --d_ff 200 \
    --max_len 200 \
    --dropout 0.2 \
    --epochs 200 \
    --patience 20 \
    --batch_size 128 \
    --lr 0.001 \
    --weight_decay 0.0 \
    --num_workers 0 \
    --seed 42

print()
print('SASRec training complete!')

## Step 5: Train BERT4Rec

> Sun, F., Liu, J., Wu, J., Pei, C., Lin, X., Ou, W., & Jiang, P. (2019). *BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer.* CIKM.

**Paper config (ML-1M):** d=64, n_heads=2, n_blocks=2, d_ff=256, max_len=200, dropout=0.2, batch=256, lr=0.001

Expected time: ~60–90 min (GPU) | ~5–8 h (CPU)
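
BERT4Rec is trained with a cloze objective: some items in the sequence are replaced by a mask token and the model predicts them from both sides. The sketch below illustrates the idea only. The function names, the `mask_prob` default, and the use of label 0 as "ignore" (which assumes item ids start at 1) are assumptions, not values confirmed by this project.

```python
import random

def mask_sequence(items, mask_id, mask_prob=0.2, rng=None):
    """Training-time cloze masking.

    Returns (inputs, labels): masked positions carry mask_id in `inputs`
    and the original item in `labels`; all other labels are 0 (ignored).
    """
    rng = rng or random.Random()
    inputs, labels = [], []
    for item in items:
        if rng.random() < mask_prob:
            inputs.append(mask_id)
            labels.append(item)
        else:
            inputs.append(item)
            labels.append(0)
    return inputs, labels

def mask_last(items, mask_id):
    """Inference-time input: append a mask token to predict the next item."""
    return items + [mask_id]

history = [11, 12, 13, 14, 15]
print(mask_sequence(history, mask_id=9999, rng=random.Random(42)))
print(mask_last(history, mask_id=9999))
```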

In [None]:
!python -m experiments.run_experiment \
    --model bert4rec \
    --d_model 64 \
    --n_heads 2 \
    --n_blocks 2 \
    --d_ff 256 \
    --max_len 200 \
    --dropout 0.2 \
    --epochs 200 \
    --patience 20 \
    --batch_size 256 \
    --lr 0.001 \
    --weight_decay 0.0 \
    --num_workers 0 \
    --seed 42

print()
print('BERT4Rec training complete!')

## Step 6: Train LightGCN

> He, X., Deng, K., Wang, X., Li, Y., Zhang, Y., & Wang, M. (2020). *LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation.* SIGIR.

**Paper config (ML-1M):** d=64, gnn_layers=3, no dropout, weight_decay=1e-4, batch=256, lr=0.001

Expected time: ~30–60 min (GPU) | ~3–5 h (CPU)
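
For orientation, `gnn_layers=3` refers to the number of propagation steps. In the LightGCN paper each step replaces an embedding with the symmetrically normalised sum of its neighbours' embeddings, and the final embedding is the mean over layers 0 to K. The pure-Python sketch below follows that user-item formulation under stated assumptions: the function names are hypothetical, interactions are assumed to be unique pairs, and this project builds its graph in `src.data.graph_builder`, which may be structured differently.

```python
import math

def _aggregate(targets, target_nbrs, source_emb, source_nbrs, dim):
    """One propagation step: normalised sum of neighbour embeddings."""
    out = {}
    for t in targets:
        vec = [0.0] * dim
        for s in target_nbrs[t]:
            w = 1.0 / math.sqrt(len(target_nbrs[t]) * len(source_nbrs[s]))
            for d in range(dim):
                vec[d] += w * source_emb[s][d]
        out[t] = vec
    return out

def lightgcn_propagate(user_emb, item_emb, interactions, n_layers=3):
    """user_emb / item_emb: dict id -> list of floats; interactions: (user, item) pairs."""
    dim = len(next(iter(user_emb.values())))
    user_nbrs = {u: [] for u in user_emb}
    item_nbrs = {i: [] for i in item_emb}
    for u, i in interactions:
        user_nbrs[u].append(i)
        item_nbrs[i].append(u)

    u_layers, i_layers = [user_emb], [item_emb]
    for _ in range(n_layers):
        u_prev, i_prev = u_layers[-1], i_layers[-1]
        u_layers.append(_aggregate(user_emb, user_nbrs, i_prev, item_nbrs, dim))
        i_layers.append(_aggregate(item_emb, item_nbrs, u_prev, user_nbrs, dim))

    def layer_mean(layers, key):
        # Uniform layer weights 1 / (K + 1)
        return [sum(layer[key][d] for layer in layers) / len(layers) for d in range(dim)]

    return ({u: layer_mean(u_layers, u) for u in user_emb},
            {i: layer_mean(i_layers, i) for i in item_emb})
```

There are no weight matrices or nonlinearities in the propagation, which is consistent with the `dropout=0.0` setting and the reliance on weight decay for regularisation.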

In [None]:
!python -m experiments.run_experiment \
    --model lightgcn \
    --d_model 64 \
    --gnn_layers 3 \
    --dropout 0.0 \
    --max_len 50 \
    --epochs 200 \
    --patience 20 \
    --batch_size 256 \
    --lr 0.001 \
    --weight_decay 1e-4 \
    --num_workers 0 \
    --seed 42

print()
print('LightGCN training complete!')

## Step 7: Train Caser

> Tang, J., & Wang, K. (2018). *Personalized Top-N Sequential Recommendation via Convolutional Sequence Embedding.* WSDM.

**Paper config (ML-1M):** d=50, L=5 (window), nh=16 (horizontal filters), nv=4 (vertical filters), dropout=0.5, weight_decay=1e-4, batch=256, lr=0.001

Expected time: ~30–60 min (GPU) | ~3–5 h (CPU)
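
A quick way to sanity-check the `L`, `nh`, and `nv` settings is to compute the size of the concatenated convolutional output. The arithmetic below assumes the convention of the reference PyTorch implementation of Caser, where `nh` horizontal filters are applied at each height 1..L and max-pooled over time, and each of the `nv` vertical filters yields a `d`-dimensional vector. The helper name is hypothetical, and this project's model may count filters differently.

```python
def caser_conv_output_dim(d, L, nh, nv):
    """Size of the concatenated horizontal + vertical convolution output."""
    horizontal = nh * L  # nh filters per height 1..L, one max-pooled value each
    vertical = nv * d    # each vertical filter spans all L rows, giving d values
    return horizontal + vertical

# Settings from the Caser row of the hyperparameter table
print(caser_conv_output_dim(d=50, L=5, nh=16, nv=4))  # 16*5 + 4*50 = 280
```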

In [None]:
!python -m experiments.run_experiment \
    --model caser \
    --d_model 50 \
    --L_caser 5 \
    --nh 16 \
    --nv 4 \
    --dropout 0.5 \
    --max_len 50 \
    --epochs 200 \
    --patience 20 \
    --batch_size 256 \
    --lr 0.001 \
    --weight_decay 1e-4 \
    --num_workers 0 \
    --seed 42

print()
print('Caser training complete!')

## Step 8: Collect & Compare Results

Reads all `results/*/results.json` files produced by Steps 3–7 and builds a comparison table.
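
Under leave-one-out evaluation each test instance has exactly one ground-truth item, so HR@K, NDCG@K, and MRR@K all reduce to simple functions of that item's rank. The sketch below shows these definitions for reference, assuming a 1-based rank. The function names are hypothetical and the project's evaluator may compute the metrics differently (for example, over sampled negatives).

```python
import math

def metrics_at_k(rank, k):
    """rank: 1-based position of the single ground-truth item in the ranking."""
    hit = rank <= k
    return {
        f"HR@{k}": 1.0 if hit else 0.0,
        f"NDCG@{k}": 1.0 / math.log2(rank + 1) if hit else 0.0,
        f"MRR@{k}": 1.0 / rank if hit else 0.0,
    }

def average_metrics(ranks, k_list=(5, 10, 20)):
    """Average the per-instance metrics over all test instances."""
    totals = {}
    for rank in ranks:
        for k in k_list:
            for name, value in metrics_at_k(rank, k).items():
                totals[name] = totals.get(name, 0.0) + value
    return {name: total / len(ranks) for name, total in totals.items()}

# Three test users whose target items were ranked 1st, 3rd, and 7th
print(average_metrics([1, 3, 7]))
```

Because the ideal DCG is 1 when there is a single relevant item, NDCG@K here is just the discounted gain at the target's rank.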

In [None]:
import glob, json, os
import pandas as pd
import numpy as np

RESULTS_DIR  = 'results'
MODEL_MAP    = {'gru4rec': 'GRU4Rec', 'sasrec': 'SASRec',
                'bert4rec': 'BERT4Rec', 'lightgcn': 'LightGCN', 'caser': 'Caser'}
PAPER_ORDER  = ['GRU4Rec', 'SASRec', 'BERT4Rec', 'LightGCN', 'Caser']
K_LIST       = [5, 10, 20]

# Collect latest run per baseline model
latest = {}  # model_key -> (exp_dir, mtime)
for exp_dir in glob.glob(os.path.join(RESULTS_DIR, '*')):
    cfg_path = os.path.join(exp_dir, 'config.json')
    res_path = os.path.join(exp_dir, 'results.json')
    if not (os.path.exists(cfg_path) and os.path.exists(res_path)):
        continue
    with open(cfg_path) as f:
        cfg = json.load(f)
    model_key = cfg.get('model', '')
    if model_key not in MODEL_MAP:
        continue
    mtime = os.path.getmtime(res_path)
    if model_key not in latest or mtime > latest[model_key][1]:
        latest[model_key] = (exp_dir, mtime)

if not latest:
    print('No baseline results found. Run Steps 3-7 first.')
else:
    rows = []
    for model_key, (exp_dir, _) in latest.items():
        with open(os.path.join(exp_dir, 'config.json'))  as f: cfg = json.load(f)
        with open(os.path.join(exp_dir, 'results.json')) as f: res = json.load(f)
        m    = res['test_metrics']
        name = MODEL_MAP[model_key]
        row  = {'Model': name, 'Best Epoch': res.get('best_epoch', '-'),
                'Best Val NDCG@10': round(res.get('best_val_metric', 0), 4)}
        for k in K_LIST:
            row[f'HR@{k}']   = round(m.get(f'HR@{k}',   0), 4)
            row[f'NDCG@{k}'] = round(m.get(f'NDCG@{k}', 0), 4)
            row[f'MRR@{k}']  = round(m.get(f'MRR@{k}',  0), 4)
        rows.append(row)

    df = (pd.DataFrame(rows)
            .set_index('Model')
            .reindex([m for m in PAPER_ORDER if m in [r['Model'] for r in rows]]))

    print('=' * 80)
    print('OVERALL TEST RESULTS  (MovieLens-1M, leave-one-out evaluation)')
    print('=' * 80)
    print(df.to_string())
    print()
    print(f'Best HR@10  : {df["HR@10"].idxmax()}  ({df["HR@10"].max():.4f})')
    print(f'Best NDCG@10: {df["NDCG@10"].idxmax()}  ({df["NDCG@10"].max():.4f})')
    print(f'Best MRR@10 : {df["MRR@10"].idxmax()}  ({df["MRR@10"].max():.4f})')
    df.to_csv('baseline_overall_results.csv')
    print()
    print('Saved: baseline_overall_results.csv')

In [None]:
# Results by user group (short / medium / long)
if latest:
    group_rows = []
    for model_key, (exp_dir, _) in latest.items():
        with open(os.path.join(exp_dir, 'results.json')) as f:
            res = json.load(f)
        name = MODEL_MAP[model_key]
        for group, gm in res.get('grouped_metrics', {}).items():
            r = {'Model': name, 'Group': group}
            for k in K_LIST:
                r[f'HR@{k}']   = round(gm.get(f'HR@{k}',   0), 4)
                r[f'NDCG@{k}'] = round(gm.get(f'NDCG@{k}', 0), 4)
                r[f'MRR@{k}']  = round(gm.get(f'MRR@{k}',  0), 4)
            r['Count'] = gm.get('count', '-')
            group_rows.append(r)

    if group_rows:
        df_grp = pd.DataFrame(group_rows)
        pivot = (df_grp.pivot_table(index='Model', columns='Group', values='NDCG@10')
                       .reindex([m for m in PAPER_ORDER if m in df_grp['Model'].values]))
        # Reorder columns
        col_order = [c for c in ['short','medium','long','overall'] if c in pivot.columns]
        pivot = pivot[col_order]
        print('=' * 60)
        print('NDCG@10 BY USER GROUP')
        print('=' * 60)
        print(pivot.to_string())
        df_grp.to_csv('baseline_grouped_results.csv', index=False)
        print()
        print('Saved: baseline_grouped_results.csv')
    else:
        print('No grouped metrics found in results.')

## Step 9: Visualizations

In [None]:
import matplotlib.pyplot as plt
import numpy as np

if latest:
    model_names  = [m for m in PAPER_ORDER if m in df.index]
    metrics_sets = [
        ('Hit Rate @ K',  'HR@5',   'HR@10',   'HR@20'),
        ('NDCG @ K',      'NDCG@5', 'NDCG@10', 'NDCG@20'),
        ('MRR @ K',       'MRR@5',  'MRR@10',  'MRR@20'),
    ]
    colors = ['#2ecc71', '#3498db', '#e74c3c']
    x      = np.arange(len(model_names))
    width  = 0.22

    fig, axes = plt.subplots(1, 3, figsize=(18, 5))
    for ax, (title, m5, m10, m20) in zip(axes, metrics_sets):
        for i, (col, lbl, clr) in enumerate([(m5,'@5',colors[0]),(m10,'@10',colors[1]),(m20,'@20',colors[2])]):
            vals = [df.loc[m, col] if m in df.index else 0 for m in model_names]
            ax.bar(x + (i-1)*width, vals, width, label=lbl, color=clr, edgecolor='white')
        ax.set_xticks(x)
        ax.set_xticklabels(model_names, rotation=15, ha='right')
        ax.set_ylabel('Score')
        ax.set_title(title, fontweight='bold')
        ax.legend()
        ax.set_ylim(bottom=0)

    plt.suptitle('Baseline Model Comparison on MovieLens-1M', fontsize=14, fontweight='bold', y=1.01)
    plt.tight_layout()
    plt.savefig('baseline_metric_comparison.png', dpi=150, bbox_inches='tight')
    plt.show()
    print('Saved: baseline_metric_comparison.png')

In [None]:
import seaborn as sns

# Heatmap: NDCG@10 by user group
if latest and group_rows:
    col_order = [c for c in ['short','medium','long','overall'] if c in pivot.columns]
    heat = pivot[col_order].copy()
    heat.columns = [c.capitalize() for c in heat.columns]

    fig, ax = plt.subplots(figsize=(8, 4))
    sns.heatmap(heat, annot=True, fmt='.4f', cmap='YlOrRd',
                linewidths=0.5, cbar_kws={'label': 'NDCG@10'}, ax=ax)
    ax.set_title('NDCG@10 by User Group — MovieLens-1M', fontweight='bold')
    ax.set_xlabel('')
    plt.tight_layout()
    plt.savefig('baseline_group_heatmap.png', dpi=150, bbox_inches='tight')
    plt.show()
    print('Saved: baseline_group_heatmap.png')

## Step 10: Paper-Ready LaTeX Table

Copy the output directly into your paper. The table uses `\toprule`, `\midrule`, and `\bottomrule`, so the `booktabs` package must be loaded in the preamble.

In [None]:
if latest:
    key_cols = ['HR@5','NDCG@5','MRR@5','HR@10','NDCG@10','MRR@10','HR@20','NDCG@20','MRR@20']
    df_paper = df[key_cols]

    col_fmt = 'l' + 'r' * len(key_cols)
    header  = 'Model & ' + ' & '.join(key_cols) + ' \\\\'

    lines = [
        '\\begin{table}[ht]',
        '  \\centering',
        '  \\caption{Performance comparison of baseline sequential recommendation models on MovieLens-1M.}',
        '  \\label{tab:baselines}',
        f'  \\begin{{tabular}}{{{col_fmt}}}',
        '    \\toprule',
        f'    {header}',
        '    \\midrule',
    ]

    for idx, row in df_paper.iterrows():
        cells = []
        for col, v in row.items():
            # Bold the best value in each column
            if v == df_paper[col].max():
                cells.append(f'\\textbf{{{v:.4f}}}')
            else:
                cells.append(f'{v:.4f}')
        lines.append(f'    {idx} & ' + ' & '.join(cells) + ' \\\\')

    lines += ['    \\bottomrule', '  \\end{tabular}', '\\end{table}']
    latex = '\n'.join(lines)

    print(latex)
    with open('baseline_latex_table.tex', 'w') as f:
        f.write(latex)
    print()
    print('Saved: baseline_latex_table.tex')

## Summary

All generated files:

| File | Description |
|------|-------------|
| `baseline_overall_results.csv` | Full metrics table (HR / NDCG / MRR @ 5, 10, 20) |
| `baseline_grouped_results.csv` | Metrics per user group (short / medium / long / overall) |
| `baseline_latex_table.tex` | LaTeX `\begin{table}` ready to paste in paper |
| `baseline_metric_comparison.png` | Bar chart HR/NDCG/MRR @ K for all models |
| `baseline_group_heatmap.png` | NDCG@10 heatmap by user group |
| `results/<model>_<timestamp>/` | Per-model checkpoint, config, history & results JSON |
