Research fork of the official GRID codebase from snap-research/GRID, used for the TIGER encoder comparison in the SIGIR 2026 GBLA paper.
- Original repository: https://github.com/snap-research/GRID
This fork does not try to document the full GRID project. It documents the part added for the GBLA paper: replacing selected TIGER encoder self-attention layers with GBLA while keeping the rest of the training pipeline unchanged.
The comparison is implemented with two experiment configs:
- `tiger_train_flat`: baseline TIGER encoder
- `tiger_train_flat_linear_attn`: TIGER encoder with GBLA
At config level, the difference is explicit:
- baseline: `linear_attention_encoder_layers: null`
- GBLA: `linear_attention_encoder_layers: [1, 2]`
Layer indices are 0-based, so only the middle two encoder layers are changed. The resulting encoder layouts are:
- baseline encoder: `SA, SA, SA, SA`
- GBLA encoder: `SA, GBLA, GBLA, SA`
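The mapping from `linear_attention_encoder_layers` to the layouts above can be sketched as follows. This is a minimal illustration; `build_encoder_layout` is a hypothetical helper, not a function in this repo:

```python
# Hypothetical helper illustrating how the 0-based indices in
# linear_attention_encoder_layers select which encoder layers become GBLA.
def build_encoder_layout(num_layers, linear_attention_encoder_layers):
    gbla = set(linear_attention_encoder_layers or [])
    return ["GBLA" if i in gbla else "SA" for i in range(num_layers)]

print(build_encoder_layout(4, None))    # ['SA', 'SA', 'SA', 'SA']
print(build_encoder_layout(4, [1, 2]))  # ['SA', 'GBLA', 'GBLA', 'SA']
```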
The GBLA config uses:
- `use_gate: true`
- `use_conv1d: true`
- `normalization: rmsnorm_gated`
- `conv_kernel_size: 4`
So, relative to the baseline, the GBLA variant adds:
- key gating
- Conv1D mixing
- gated RMSNorm
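As an illustration of the `rmsnorm_gated` option, here is a minimal pure-Python sketch of one common gated-RMSNorm formulation. This is an assumption about the design, not the repo's implementation:

```python
import math

# Sketch of gated RMSNorm: RMS-normalize x, scale by a learned per-channel
# weight, then modulate elementwise with a sigmoid gate. This is one common
# formulation, shown only to illustrate the "rmsnorm_gated" option.
def gated_rmsnorm(x, gate, weight, eps=1e-6):
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    sigmoid = lambda g: 1.0 / (1.0 + math.exp(-g))
    return [(v / rms) * w * sigmoid(g) for v, w, g in zip(x, weight, gate)]

# With a zero gate, sigmoid(0) = 0.5 halves the normalized activations.
out = gated_rmsnorm([1.0, 1.0, 1.0, 1.0], [0.0] * 4, [1.0] * 4)
```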
```shell
git clone https://github.com/matfu-pixel/GRID.git
cd GRID
pip install -r requirements.txt
```

Use one shared data pipeline and one shared semantic-ID pipeline, then train the two model variants.
The training configs expect:
```
data/
├── training/
├── evaluation/
├── testing/
└── items/
```
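If you are assembling the dataset manually, the expected layout can be created with:

```shell
# Create the directory layout the training configs expect.
mkdir -p data/training data/evaluation data/testing data/items
```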
Generate item embeddings:

```shell
python -m src.inference experiment=sem_embeds_inference_flat \
  data_dir=data/amazon_data/beauty
```

Train the RK-means model:

```shell
python -m src.train experiment=rkmeans_train_flat \
  data_dir=data/amazon_data/beauty \
  embedding_path=<embedding_tensor.pt> \
  embedding_dim=2048 \
  num_hierarchies=3 \
  codebook_width=256
```

Generate semantic IDs with the trained RK-means checkpoint:

```shell
python -m src.inference experiment=rkmeans_inference_flat \
  data_dir=data/amazon_data/beauty \
  embedding_path=<embedding_tensor.pt> \
  embedding_dim=2048 \
  num_hierarchies=3 \
  codebook_width=256 \
  ckpt_path=<rkmeans_checkpoint>
```

Train the baseline TIGER model:

```shell
python src/train.py experiment=tiger_train_flat \
  data_dir=data/amazon_data/beauty \
  semantic_id_path=<semantic_ids.pt> \
  num_hierarchies=4
```

Train the TIGER+GBLA model:

```shell
python src/train.py experiment=tiger_train_flat_linear_attn \
  data_dir=data/amazon_data/beauty \
  semantic_id_path=<semantic_ids.pt> \
  num_hierarchies=4
```

Compare the final validation or test metrics reported by the two training runs. The training pipeline automatically tests the best checkpoint after training.
For a fair comparison, keep all of the following fixed between runs:
- dataset split
- semantic IDs
- GPU setup
- batch size
- sequence length
- random seed or set of seeds
Compare:
- Recall@5
- Recall@10
- NDCG@5
- NDCG@10
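For reference, with a single held-out target item per sequence these metrics reduce to simple rank checks. A minimal sketch (not the repo's evaluation code):

```python
import math

# Recall@k and NDCG@k for one user with a single relevant item, as in the
# usual next-item evaluation setup (one held-out target per sequence).
def recall_at_k(ranked_ids, target_id, k):
    return 1.0 if target_id in ranked_ids[:k] else 0.0

def ndcg_at_k(ranked_ids, target_id, k):
    # With one relevant item the ideal DCG is 1, so NDCG reduces to
    # 1 / log2(rank + 1) when the target appears in the top k.
    if target_id in ranked_ids[:k]:
        rank = ranked_ids.index(target_id) + 1
        return 1.0 / math.log2(rank + 1)
    return 0.0

ranked = [42, 7, 13, 99, 5]
print(recall_at_k(ranked, 13, 5))  # 1.0
print(ndcg_at_k(ranked, 13, 5))    # target at rank 3 -> 1/log2(4) = 0.5
```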
In this repo:
- without GBLA = `tiger_train_flat`
- with GBLA = `tiger_train_flat_linear_attn`
Public Amazon benchmark results from the paper. These are averages over 5 runs and should not be treated as exact single-run targets.
| Dataset | Model | Recall@5 | Recall@10 | NDCG@5 | NDCG@10 |
|---|---|---|---|---|---|
| Beauty | TIGER | 0.0439 | 0.0641 | 0.0289 | 0.0355 |
| Beauty | TIGER+GBLA | 0.0410 | 0.0611 | 0.0273 | 0.0338 |
| Toys | TIGER | 0.0402 | 0.0584 | 0.0274 | 0.0333 |
| Toys | TIGER+GBLA | 0.0384 | 0.0579 | 0.0249 | 0.0311 |
| Sports | TIGER | 0.0229 | 0.0345 | 0.0150 | 0.0188 |
| Sports | TIGER+GBLA | 0.0218 | 0.0329 | 0.0144 | 0.0180 |
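To produce numbers in this style, average each metric over the per-seed runs. A minimal sketch with made-up values (not taken from the paper):

```python
import statistics

# Hypothetical Recall@10 values from five seeded runs; the numbers here
# are illustrative only.
runs = [0.0632, 0.0641, 0.0638, 0.0645, 0.0649]
mean, std = statistics.mean(runs), statistics.stdev(runs)
print(f"Recall@10: {mean:.4f} +/- {std:.4f}")
```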
If you use this fork, please cite the GBLA paper:
```bibtex
@inproceedings{matveev2026gbla,
  title     = {Gated Bidirectional Linear Attention for Generative Retrieval},
  author    = {Matveev, Artem and Tytskiy, Vladislav and Makeev, Sergei and Liamaev, Sergei},
  booktitle = {Proceedings of the 49th International ACM SIGIR Conference on Research and Development in Information Retrieval},
  year      = {2026},
  doi       = {10.1145/3805712.3808495}
}
```

If you use the GRID framework or protocol, also cite the original GRID paper:
```bibtex
@inproceedings{grid,
  title     = {Generative Recommendation with Semantic IDs: A Practitioner's Handbook},
  author    = {Ju, Clark Mingxuan and Collins, Liam and Neves, Leonardo and Kumar, Bhuvesh and Wang, Louis Yufeng and Zhao, Tong and Shah, Neil},
  booktitle = {Proceedings of the 34th ACM International Conference on Information and Knowledge Management},
  year      = {2025}
}
```

- This repository is based on the original GRID implementation from snap-research/GRID.
- The public Amazon evaluation setup in this fork follows the GRID experimental protocol.
- Part of the original repository is built on top of ashleve/lightning-hydra-template.