Gated Bidirectional Linear Attention for Generative Retrieval

Research fork of the official GRID codebase from snap-research/GRID, used for the TIGER encoder comparison in the SIGIR 2026 GBLA paper.

Scope of This Fork

This fork does not attempt to document the full GRID project; it documents only the part added for the GBLA paper: replacing selected self-attention layers in the TIGER encoder with GBLA while leaving the rest of the training pipeline unchanged.

The comparison is implemented with two experiment configs:

  • tiger_train_flat: baseline TIGER encoder
  • tiger_train_flat_linear_attn: TIGER encoder with GBLA

At config level, the difference is explicit:

  • baseline: linear_attention_encoder_layers: null
  • GBLA: linear_attention_encoder_layers: [1, 2]

Layer indices are 0-based, so only the middle two encoder layers are changed. The resulting encoder layouts are:

  • baseline encoder: SA, SA, SA, SA
  • GBLA encoder: SA, GBLA, GBLA, SA
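The mapping from the config's 0-based layer indices to the layouts above can be sketched as follows (illustrative helper only; `build_encoder_layout` and its signature are not part of the fork's API):

```python
# Illustrative sketch of how linear_attention_encoder_layers selects which
# encoder layers become GBLA blocks. Indices are 0-based, matching the
# config convention; None means the pure self-attention baseline.
def build_encoder_layout(num_layers, linear_attention_encoder_layers):
    """Return a per-layer label: "SA" for self-attention, "GBLA" otherwise."""
    gbla = set(linear_attention_encoder_layers or [])
    return ["GBLA" if i in gbla else "SA" for i in range(num_layers)]

print(build_encoder_layout(4, None))    # baseline: ['SA', 'SA', 'SA', 'SA']
print(build_encoder_layout(4, [1, 2]))  # GBLA:     ['SA', 'GBLA', 'GBLA', 'SA']
```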

The GBLA config uses:

  • use_gate: true
  • use_conv1d: true
  • normalization: rmsnorm_gated
  • conv_kernel_size: 4

So, relative to the baseline, the GBLA variant adds:

  • key gating
  • Conv1D mixing
  • gated RMSNorm
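Of these, the gated RMSNorm is easy to sketch in isolation: normalize by the root-mean-square, scale by a learned weight, then modulate elementwise with a SiLU-activated gate. This is a minimal scalar-list sketch of the common `rmsnorm_gated` formulation; the fork's actual implementation (tensor shapes, where the gate comes from) may differ:

```python
import math

def silu(x):
    # SiLU (a.k.a. swish): x * sigmoid(x)
    return x / (1.0 + math.exp(-x))

def gated_rmsnorm(x, gate, weight, eps=1e-6):
    """Gated RMSNorm sketch over one feature vector (plain Python lists):
    divide by the vector's RMS, scale by a learned weight, and gate with
    SiLU(gate) elementwise."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * (v / rms) * silu(g) for v, g, w in zip(x, gate, weight)]
```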

Installation

```bash
git clone https://github.com/matfu-pixel/GRID.git
cd GRID
pip install -r requirements.txt
```

Reproducing the Amazon Comparison

Run one shared data-preparation pipeline and one shared semantic-ID pipeline, then train the two model variants on identical inputs.

1. Prepare data

The training configs expect:

```
data/
├── training/
├── evaluation/
├── testing/
└── items/
```
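A quick sanity check before launching the pipeline (illustrative helper, not part of the repo) that all four split directories exist under `data_dir`:

```python
from pathlib import Path

def missing_splits(data_dir):
    """Return the required split directories (from the layout above)
    that are missing under data_dir; an empty list means all present."""
    required = ("training", "evaluation", "testing", "items")
    return [d for d in required if not (Path(data_dir) / d).is_dir()]
```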

2. Generate semantic embeddings

```bash
python -m src.inference experiment=sem_embeds_inference_flat \
    data_dir=data/amazon_data/beauty
```

3. Train semantic IDs

```bash
python -m src.train experiment=rkmeans_train_flat \
    data_dir=data/amazon_data/beauty \
    embedding_path=<embedding_tensor.pt> \
    embedding_dim=2048 \
    num_hierarchies=3 \
    codebook_width=256
```
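This command trains residual k-means codebooks. The core assignment idea, building a `num_hierarchies`-digit ID by repeatedly picking the nearest codeword and quantizing the residual, can be sketched as follows (toy fixed codebooks for illustration; real training learns them, with `codebook_width=256` entries per level):

```python
# Illustrative residual quantization, the idea behind RK-means semantic IDs:
# at each hierarchy, pick the nearest codeword, then recurse on the residual.
def assign_semantic_id(vec, codebooks):
    """Return one ID per hierarchy for a single embedding vector."""
    ids, residual = [], list(vec)
    for codebook in codebooks:  # one codebook per hierarchy
        best = min(
            range(len(codebook)),
            key=lambda i: sum((r - c) ** 2 for r, c in zip(residual, codebook[i])),
        )
        ids.append(best)
        residual = [r - c for r, c in zip(residual, codebook[best])]
    return ids
```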

4. Generate semantic IDs

```bash
python -m src.inference experiment=rkmeans_inference_flat \
    data_dir=data/amazon_data/beauty \
    embedding_path=<embedding_tensor.pt> \
    embedding_dim=2048 \
    num_hierarchies=3 \
    codebook_width=256 \
    ckpt_path=<rkmeans_checkpoint>
```

5. Train the baseline

```bash
python src/train.py experiment=tiger_train_flat \
    data_dir=data/amazon_data/beauty \
    semantic_id_path=<semantic_ids.pt> \
    num_hierarchies=4
```

Note that `num_hierarchies` is 4 here, one more than the 3 RK-means levels; the extra position leaves room for the additional digit appended to disambiguate items whose semantic IDs collide, as in TIGER.

6. Train the GBLA variant

```bash
python src/train.py experiment=tiger_train_flat_linear_attn \
    data_dir=data/amazon_data/beauty \
    semantic_id_path=<semantic_ids.pt> \
    num_hierarchies=4
```

7. Compare the two runs

Compare the final validation or test metrics reported by the two training runs. The training pipeline automatically tests the best checkpoint after training.

For a fair comparison, keep all of the following fixed between runs:

  • dataset split
  • semantic IDs
  • GPU setup
  • batch size
  • sequence length
  • random seed or set of seeds

Compare:

  • Recall@5
  • Recall@10
  • NDCG@5
  • NDCG@10
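For reference, a minimal single-user, binary-relevance sketch of these metrics (GRID's own metric code may differ in details such as batching and tie handling):

```python
import math

def recall_at_k(ranked, relevant, k):
    """Fraction of relevant items appearing in the top-k of a ranked list."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)

def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG@k: DCG over the top-k ranking,
    normalized by the ideal DCG (all relevant items ranked first)."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, item in enumerate(ranked[:k]) if item in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal
```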

In this repo:

  • Without GBLA = tiger_train_flat
  • With GBLA = tiger_train_flat_linear_attn

Metrics From the Paper

Public Amazon benchmark results from the paper. These are averages over 5 runs and should not be treated as exact single-run targets.

| Dataset | Model      | Recall@5 | Recall@10 | NDCG@5 | NDCG@10 |
|---------|------------|----------|-----------|--------|---------|
| Beauty  | TIGER      | 0.0439   | 0.0641    | 0.0289 | 0.0355  |
| Beauty  | TIGER+GBLA | 0.0410   | 0.0611    | 0.0273 | 0.0338  |
| Toys    | TIGER      | 0.0402   | 0.0584    | 0.0274 | 0.0333  |
| Toys    | TIGER+GBLA | 0.0384   | 0.0579    | 0.0249 | 0.0311  |
| Sports  | TIGER      | 0.0229   | 0.0345    | 0.0150 | 0.0188  |
| Sports  | TIGER+GBLA | 0.0218   | 0.0329    | 0.0144 | 0.0180  |

Citation

If you use this fork, please cite the GBLA paper:

```bibtex
@inproceedings{matveev2026gbla,
  title     = {Gated Bidirectional Linear Attention for Generative Retrieval},
  author    = {Matveev, Artem and Tytskiy, Vladislav and Makeev, Sergei and Liamaev, Sergei},
  booktitle = {Proceedings of the 49th International ACM SIGIR Conference on Research and Development in Information Retrieval},
  year      = {2026},
  doi       = {10.1145/3805712.3808495}
}
```

If you use the GRID framework or protocol, also cite the original GRID paper:

```bibtex
@inproceedings{grid,
  title     = {Generative Recommendation with Semantic IDs: A Practitioner's Handbook},
  author    = {Ju, Clark Mingxuan and Collins, Liam and Neves, Leonardo and Kumar, Bhuvesh and Wang, Louis Yufeng and Zhao, Tong and Shah, Neil},
  booktitle = {Proceedings of the 34th ACM International Conference on Information and Knowledge Management},
  year      = {2025}
}
```

Acknowledgments

  • This repository is based on the original GRID implementation from snap-research/GRID.
  • The public Amazon evaluation setup in this fork follows the GRID experimental protocol.
  • Part of the original repository is built on top of ashleve/lightning-hydra-template.
