Code for "Expressiveness Limits of Autoregressive Semantic ID Generation in Generative Recommendation."
Generative recommendation (GR) models generate items by autoregressively producing a sequence of discrete tokens that jointly index the target item. However, this autoregressive generation process induces a structured decoding space whose impact on model expressiveness remains underexplored. We show theoretically that GR models struggle to represent even simple patterns that conventional collaborative filtering models capture easily. To mitigate this issue, we propose Latte, a simple modification that injects a latent token before each semantic ID, improving expressiveness and yielding an average relative improvement of 3.45% in NDCG@10.
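As a quick illustration of the interleaving idea (a minimal sketch, not the repository's implementation; the latent-token names are made up):

```python
# Illustrative sketch only: Latte places a latent token before each
# semantic-ID token. How latent tokens are chosen/learned is in the code.
def interleave(latent_tokens, semantic_ids):
    seq = []
    for lat, sid in zip(latent_tokens, semantic_ids):
        seq += [lat, sid]  # latent token first, then the semantic ID it precedes
    return seq

print(interleave(["<z1>", "<z2>", "<z3>"], [7, 42, 128]))
# ['<z1>', 7, '<z2>', 42, '<z3>', 128]
```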
This project requires Python 3.12+. We recommend using uv for fast and reliable dependency management.
```bash
# Install dependencies (automatically creates .venv)
uv sync
```

Train a Latte model with default settings:

```bash
uv run bash train_latte.sh
```

Or with custom arguments:

```bash
# bash train_latte.sh [category] [gpu_id] [n_latent_tokens] [vq_method] [aggregation_method] [lr]
uv run bash train_latte.sh Industrial_and_Scientific 0 8 rqkmeans agg_max 3e-3
```

This script will:
- Train a Latte model with the specified hyperparameters
- Save the best checkpoint to `saved/`
- Run inference with `num_beams=500` for final evaluation
```bash
uv run python main.py --model=<MODEL> --category=<CATEGORY> [OPTIONS]
```

- `Latte`: Our proposed method, which adds latent tokens before semantic IDs
- `PSID`: Baseline model with purely semantic IDs (conflict-free)
We use Amazon Reviews 2023 for evaluation. The following categories are available: Musical_Instruments, Industrial_and_Scientific, and Video_Games.
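For example, to run Latte on every available category, you could loop over them with a small wrapper; the script below is hypothetical (not part of the repository) and only invokes the documented `main.py` command:

```python
# Hypothetical sweep over the three available Amazon Reviews 2023 categories.
import subprocess

for cat in ["Musical_Instruments", "Industrial_and_Scientific", "Video_Games"]:
    subprocess.run(
        ["uv", "run", "python", "main.py", "--model=Latte", f"--category={cat}"],
        check=True,  # stop on the first failing run
    )
```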
Three vector quantization methods are supported for generating semantic IDs:
- `rqkmeans`: Residual Quantization with K-means (Faiss) - supported by both PSID and Latte
- `opq`: Optimized Product Quantization (Faiss) - PSID only
- `rqvae`: Residual Quantization with a VAE (neural network) - PSID only
Note: To use `opq` or `rqvae` with Latte, you must first train a PSID model with the same VQ method. Once PSID training has started (and cached the semantic IDs), Latte can load them automatically.
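For intuition, here is a minimal sketch of residual quantization with K-means (the idea behind `rqkmeans`), written directly against Faiss; it is not the repository's implementation, and the function name is illustrative (its defaults mirror `--vq_n_codebooks=3` and `--vq_codebook_size=256`):

```python
# Sketch of residual K-means quantization: at each level, cluster the current
# residuals, record the nearest centroid's index, and carry the residual on.
import numpy as np
import faiss

def rq_kmeans(x, n_codebooks=3, codebook_size=256, niter=25, seed=0):
    """Return one code per quantization level for each row of x."""
    residual = np.ascontiguousarray(x, dtype=np.float32).copy()
    codes = []
    for _ in range(n_codebooks):
        km = faiss.Kmeans(residual.shape[1], codebook_size, niter=niter, seed=seed)
        km.train(residual)
        _, ids = km.index.search(residual, 1)    # nearest centroid per item
        ids = ids.ravel()
        codes.append(ids)
        residual = residual - km.centroids[ids]  # pass the residual to the next level
    return np.stack(codes, axis=1)               # shape: (n_items, n_codebooks)

ids = rq_kmeans(np.random.randn(10_000, 64))
print(ids.shape)  # (10000, 3); each row is an item's semantic ID
```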
```bash
# Run Latte with default settings (rqkmeans)
uv run python main.py --model=Latte --category=Industrial_and_Scientific

# Run Latte with a different VQ method (requires PSID to cache the semantic IDs first)
uv run python main.py --model=PSID --category=Industrial_and_Scientific --vq_method=opq   # cache semantic IDs
uv run python main.py --model=Latte --category=Industrial_and_Scientific --vq_method=opq  # then run Latte

# Run the PSID baseline
uv run python main.py --model=PSID --category=Industrial_and_Scientific

# Run with a custom configuration
uv run python main.py --model=Latte --category=Industrial_and_Scientific \
    --vq_method=rqkmeans \
    --aggregation_method=agg_max \
    --n_latent_tokens=8 \
    --lr=0.003

# Resume from a checkpoint
uv run python main.py --model=Latte --category=Industrial_and_Scientific \
    --checkpoint=ckpt/your_checkpoint.pt
```

Configuration files are loaded hierarchically, in the following order:
- `genrec/default.yaml` - Default settings
- `genrec/datasets/<DATASET>/config.yaml` - Dataset-specific settings
- `genrec/models/<MODEL>/config.yaml` - Model-specific settings
- Command-line arguments (override all of the above)
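A minimal sketch of what this override order implies; the loader below is an assumption for illustration, not the project's actual code:

```python
# Hypothetical config loader: later sources overwrite earlier ones,
# and command-line arguments win last.
import yaml

def deep_update(base, override):
    for k, v in override.items():
        if isinstance(v, dict) and isinstance(base.get(k), dict):
            deep_update(base[k], v)
        else:
            base[k] = v
    return base

def load_config(dataset, model, cli_args):
    cfg = {}
    for path in (
        "genrec/default.yaml",
        f"genrec/datasets/{dataset}/config.yaml",
        f"genrec/models/{model}/config.yaml",
    ):
        with open(path) as f:
            deep_update(cfg, yaml.safe_load(f) or {})
    return deep_update(cfg, cli_args)  # CLI flags override everything
```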
| Option | Description | Default |
|---|---|---|
| `--model` | Model name (`Latte`, `PSID`) | `Latte` |
| `--dataset` | Dataset name | `AmazonReviews2023` |
| `--category` | Dataset category | `Industrial_and_Scientific` |
| `--vq_method` | VQ method (`opq`, `rqkmeans`, `rqvae`) | `rqkmeans` |
| `--vq_n_codebooks` | Number of codebooks | `3` |
| `--vq_codebook_size` | Size of each codebook | `256` |
| `--n_latent_tokens` | Number of latent tokens (Latte only) | `8` |
| `--lr` | Learning rate | `0.003` |
| `--num_beams` | Number of beams for beam search | `50` |
