This work has been accepted as a full paper at SIGIR 2026 (the ACM SIGIR Conference on Research and Development in Information Retrieval).
This repository contains the core implementation of our recommendation model with two training strategies:
- Frequency-based Uncertainty Decay: Dynamically switches between Gumbel sampling and deterministic indexing based on code usage frequency
- Standard Deviation Uncertainty Decay: Uses learnable uncertainty to balance task loss
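The frequency-based switching described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the repo's API: `select_code`, `gumbel_argmax`, and the `threshold` parameter are hypothetical names, and the actual model operates on tensors rather than Python lists.

```python
import math
import random

def gumbel_argmax(logits, rng=random):
    """Sample an index via the Gumbel-max trick, i.e. draw from
    softmax(logits) by adding Gumbel(0, 1) noise and taking argmax."""
    noisy = [x - math.log(-math.log(rng.random() or 1e-12)) for x in logits]
    return max(range(len(noisy)), key=noisy.__getitem__)

def select_code(logits, usage_counts, threshold=50, rng=random):
    """Frequency-based switching (illustrative): popular codes
    (usage count >= threshold) keep Gumbel sampling, while rare
    codes are indexed deterministically via argmax."""
    det_idx = max(range(len(logits)), key=logits.__getitem__)
    if usage_counts[det_idx] >= threshold:
        idx = gumbel_argmax(logits, rng=rng)  # popular code: Gumbel sampling
    else:
        idx = det_idx                         # rare code: deterministic indexing
    usage_counts[idx] += 1                    # update running frequency
    return idx
```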
Current status: Due to time constraints, this is an illustrative reference implementation: it sketches the model and training flow, but it is not yet the full end-to-end release we intend to ship.
Target: Before SIGIR 2026 (conference begins July 20, 2026), we plan to publish:
- Runnable code — including configs and pre-trained checkpoints needed to reproduce the paper
- Dataset — processed data and embeddings, with setup instructions in the README
```
DIGER/
├── main.py                   # Main training entry point
├── vq.py                     # Vector Quantization (RQ-VAE) implementation
├── trainer.py                # Training loop and loss computation
├── model.py                  # Recommender model architecture
├── data.py                   # Data loading utilities
├── utils.py                  # Helper functions
├── metrics.py                # Evaluation metrics
├── layers.py                 # Neural network layers
├── config/
│   └── beauty_jo.yaml        # Configuration file for the Beauty dataset
├── accelerate_config.yaml    # Accelerate configuration
├── run_FrqUD.sh              # Training script: Frequency-based Uncertainty Decay
└── run_SDUD.sh               # Training script: Standard Deviation Uncertainty Decay
```
```bash
pip install torch transformers accelerate pyyaml numpy faiss-cpu scikit-learn colorama tqdm
```

- Python 3.12.11
- PyTorch 2.5.1
Organize your dataset in the following structure:
```
dataset/
└── beauty/
    ├── beauty.train.inter
    ├── beauty.valid.inter
    ├── beauty.test.inter
    └── Beauty.emb-llama.npy   # Semantic embeddings
```
- Interaction files (`.inter`): tab-separated values with columns `user_id:token`, `item_id:token`, `timestamp:float`
- Semantic embeddings (`.npy`): NumPy array of shape `[num_items, embedding_dim]`
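For reference, the interaction files can be parsed with a few lines of Python. This is a minimal sketch assuming the header row lists the `name:type` columns described above; `load_inter` is a hypothetical helper, not part of this repo.

```python
def load_inter(path):
    """Parse a .inter file: a tab-separated header of `name:type`
    columns followed by one interaction per line."""
    with open(path) as f:
        header = f.readline().strip().split("\t")
        cols = [h.split(":")[0] for h in header]  # e.g. user_id, item_id, timestamp
        rows = [dict(zip(cols, line.rstrip("\n").split("\t")))
                for line in f if line.strip()]
    return rows

# Semantic embeddings are a plain NumPy array, one row per item:
# emb = numpy.load("dataset/beauty/Beauty.emb-llama.npy")  # [num_items, embedding_dim]
```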
You need a pre-trained RQ-VAE checkpoint. The checkpoint should contain:
- Encoder weights
- Residual Quantization (RQ) codebooks
- Decoder weights (optional, can be frozen)
Before running, update the placeholder paths in the following files:
- Shell scripts (`run_FrqUD.sh`, `run_SDUD.sh`):

```bash
RQVAE_INIT="<PATH_TO_RQVAE_CHECKPOINT>"  # Update this
```

- Config file (`config/beauty_jo.yaml`):

```yaml
semantic_emb_path: <PATH_TO_DATASET>/beauty/Beauty.emb-llama.npy  # Update this
rqvae_path: <PATH_TO_RQVAE_CHECKPOINT>  # Update this
data_path: ./dataset  # Update if needed
```
This script uses adaptive selection to dynamically choose between Gumbel sampling (for popular codes) and deterministic indexing (for rare codes).
```bash
bash run_FrqUD.sh
```

This script uses a learnable uncertainty parameter to automatically balance the task loss.

```bash
bash run_SDUD.sh
```

Loss formula:
```
L = L_task / (2*(σ+λ)²) + log(σ+λ)
```

At equilibrium: `σ = sqrt(L_task) - λ`
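The equilibrium follows from setting dL/dσ = -L_task/(σ+λ)³ + 1/(σ+λ) = 0, which gives (σ+λ)² = L_task. A standalone numeric check of the formula above (illustrative names: `sdud_loss`, `equilibrium_sigma`; `lam` stands for λ):

```python
import math

def sdud_loss(task_loss, sigma, lam=0.1):
    """SDUD objective from the README:
    L = L_task / (2*(sigma+lam)**2) + log(sigma+lam).
    The offset lam keeps the log term finite as sigma -> 0."""
    s = sigma + lam
    return task_loss / (2 * s * s) + math.log(s)

def equilibrium_sigma(task_loss, lam=0.1):
    """Stationary point of sdud_loss in sigma: sigma = sqrt(L_task) - lam."""
    return math.sqrt(task_loss) - lam
```

So when the task loss is large, σ grows and down-weights it; as training converges, σ shrinks and the task loss regains influence.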
Training logs are saved to `./logs/<dataset>/` with timestamps.
Model checkpoints are saved to `./myckpt/<dataset>/`, including:
- `best_model.pth`: best model based on the validation metric
- Training statistics and metrics
The model is evaluated on:
- Recall@5, Recall@10
- NDCG@5, NDCG@10
Validation metric: NDCG@10