A Ranking Model Benchmark for Unified Sequential Modeling and Feature Interaction
UniRank is an open PyTorch benchmark for large-scale recommendation ranking models. It focuses on a practical setting that is increasingly common in industrial recommender systems: ranking models must jointly learn from heterogeneous non-sequential features, target item features, and long user behavior sequences under multi-feedback objectives such as click, follow, like, share, comment, long-view, and conversion.
The project is built to make modern unified ranking architectures easier to compare, reproduce, and extend. It provides standardized dataset configurations, model implementations, distributed training utilities, mixed precision support, blocked data loading for large datasets, and sparse attention acceleration for long-sequence models.
Modern ranking research is moving from isolated feature interaction or sequence pooling modules toward unified architectures that model feature fields and user behavior tokens together. However, many strong ranking models are released from industrial systems where data, implementations, and infrastructure are not fully available. This makes it difficult to answer basic research questions:
- Which architecture works best under the same data split, sequence length, and metric protocol?
- How should feature interaction and sequential modeling be combined?
- How do models behave across different feedback tasks rather than only CTR?
- What engineering support is needed to train ranking models on industrial-scale data?
UniRank addresses these gaps by collecting representative ranking models, unified data processing logic, and reproducible experiment settings in one benchmark.
UniRank follows a unified ranking pipeline. Raw user, item, context, and action features are embedded, converted into model-specific tokens, passed through feature interaction or sequence interaction layers, and finally predicted by task-specific towers.
Figure 1. Traditional New Impression Only Paradigm. Most conventional ranking systems train on the latest impressed target item only. Historical positive feedback is used as auxiliary behavior context, usually through target attention, pooling, or aggregation, before being combined with the target item, user profile, and context features in a feature interaction layer. This paradigm is efficient, but it treats each target impression as an independent sample and does not fully exploit the step-by-step evolution of user behavior.
Figure 2. UniRank Auto-Regressive Paradigm. UniRank reorganizes user histories as sequential training samples. Each behavior step can be represented with action-aware sequential tokens, target item, and non-sequential feature tokens. Instead of only predicting the latest impression, the model learns from the chronological behavior sequence and supports multi-task prediction at different positions. This design better matches long user histories and enables unified sequence-feature interaction.
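The reorganization described in Figure 2 can be sketched in plain Python. This is a minimal illustration, not the logic in UniRank_Dataloader.py: the `Event` class, the `build_autoregressive_samples` helper, and the output field names are all hypothetical and chosen only to show one sample being produced per behavior step.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Event:
    """One behavior step (hypothetical schema): an item plus its feedback labels."""
    item_id: int
    labels: Dict[str, int]  # e.g. {"click": 1, "like": 0}


def build_autoregressive_samples(events: List[Event], max_len: int) -> List[dict]:
    """Turn one user's chronological events into one training sample per step.

    At step t the target is events[t]; the history is the (at most max_len)
    events that precede it, each kept together with its action labels.
    """
    samples = []
    for t, target in enumerate(events):
        history = events[max(0, t - max_len):t]
        samples.append({
            "history_items": [e.item_id for e in history],
            "history_actions": [e.labels for e in history],
            "target_item": target.item_id,
            "labels": target.labels,  # multi-task labels at this position
        })
    return samples


events = [
    Event(11, {"click": 1, "like": 0}),
    Event(12, {"click": 0, "like": 0}),
    Event(13, {"click": 1, "like": 1}),
]
samples = build_autoregressive_samples(events, max_len=2)
for s in samples:
    print(s["history_items"], "->", s["target_item"], s["labels"])
```

In contrast, the new-impression-only paradigm of Figure 1 would keep only the last of these samples for the user.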
Following the paper, UniRank organizes representative unified ranking models into two architectural paradigms:
| Paradigm | Description | Representative Models |
|---|---|---|
| Unified Interaction after Sequence Pooling and Non-sequence Tokenization | Behavior sequences are first pooled or aggregated into compact sequential representations. These representations are then tokenized together with non-sequential features into a unified token space for subsequent interaction modeling. | HiFormer, RankMixer, Zenith, UniMixer, HeMix |
| Layer-wise Unified Interaction | Sequence tokens and non-sequence tokens are both kept inside the interaction layers, so behavior tokens, field tokens, and target tokens exchange information throughout the unified interaction network. | OneTrans, HyFormer, MixFormer, INFNet, EST, SORT, TokenFormer, LONGER, UltraHSTU |
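The difference between the two paradigms is mostly about which tokens reach the interaction layers. The sketch below shows only that token layout, using plain Python lists as stand-in embeddings. The function names are hypothetical, and mean pooling is an assumption used for brevity; the listed models use their own pooling, aggregation, or attention modules.

```python
from typing import List

Vector = List[float]


def mean_pool(tokens: List[Vector]) -> Vector:
    """Average equal-length vectors element-wise (stand-in for learned pooling)."""
    n = len(tokens)
    return [sum(col) / n for col in zip(*tokens)]


def pooled_then_unified(seq: List[Vector], fields: List[Vector], target: Vector) -> List[Vector]:
    """Paradigm 1: compress the behavior sequence first, then build the token list."""
    return [mean_pool(seq)] + fields + [target]


def layerwise_unified(seq: List[Vector], fields: List[Vector], target: Vector) -> List[Vector]:
    """Paradigm 2: every behavior token stays in the unified token list."""
    return seq + fields + [target]


seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]  # 4 behavior tokens
fields = [[0.5, 0.5], [0.2, 0.8]]                        # 2 non-sequential field tokens
target = [0.9, 0.1]                                      # 1 target item token

print(len(pooled_then_unified(seq, fields, target)))  # 4 tokens: 1 pooled + 2 fields + 1 target
print(len(layerwise_unified(seq, fields, target)))    # 7 tokens: 4 behaviors + 2 fields + 1 target
```

With a sequence length of 100, the first layout still feeds a handful of tokens to the interaction layers, while the second feeds more than 100, which is one reason sparse attention paths matter for the layer-wise models.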
Design choices in this repository are intentionally practical:
- Multi-feedback ranking: each dataset can define multiple binary feedback tasks and evaluate AUC/gAUC per task.
- Auto-regressive / user-centric training support: long behavior histories can be represented as structured action sequences rather than only a latest-impression sample.
- Distributed training: torchrun + DDP are supported through run_expid.py.
- Large data loading: blocked parquet loading is supported for large datasets such as TencentGR-10M.
- Mixed precision and operator acceleration: bf16 training and sparse/flex attention paths are available for compatible models.
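For readers unfamiliar with the per-task AUC/gAUC metrics mentioned in the list above, here is a small reference sketch in pure Python. It assumes a common definition of gAUC (per-user AUC averaged with impression-count weights, skipping users whose labels are all 0 or all 1); the metric code under fuxictr/ may use a different weighting or a faster algorithm. The pairwise loop is quadratic and meant only for illustration.

```python
from collections import defaultdict
from typing import Hashable, List, Optional, Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> Optional[float]:
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a win. Returns None when only one class is present.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def gauc(user_ids: Sequence[Hashable], labels: Sequence[int],
         scores: Sequence[float]) -> Optional[float]:
    """Impression-weighted mean of per-user AUC (assumed definition)."""
    groups = defaultdict(lambda: ([], []))
    for u, y, s in zip(user_ids, labels, scores):
        groups[u][0].append(y)
        groups[u][1].append(s)
    weighted_sum, total = 0.0, 0
    for ys, ss in groups.values():
        user_auc = auc(ys, ss)
        if user_auc is not None:  # skip single-class users
            weighted_sum += user_auc * len(ys)
            total += len(ys)
    return weighted_sum / total if total else None


# Toy data for a single feedback task, e.g. "click".
users = ["a", "a", "a", "b", "b", "c"]
labels = [1, 0, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.1, 0.3, 0.6, 0.5]
print("AUC :", auc(labels, scores))
print("gAUC:", gauc(users, labels, scores))
```

In a multi-feedback setting, the same two functions would be called once per task with that task's labels and predicted scores.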
UniRank/
+-- config/
| +-- dataset_config.yaml # Dataset paths, feature schemas, labels, and blocked-loading options
| +-- model_config.yaml # Experiment ids and hyperparameters
+-- data/
| +-- QK_Video_Action/
| +-- KuaiRand_Video_Action/
| +-- TencentGR_10M_Action_Blocked/
+-- fuxictr/ # Training, feature, metric, and layer utilities based on FuxiCTR
+-- model_zoo/ # Ranking model implementations
+-- checkpoints/ # Saved models and experiment logs
+-- test/ # Metric and utility tests
+-- UniRank_Dataloader.py # UniRank-specific sequence/action dataloader
+-- run_expid.py # Run one experiment
+-- run_all.sh # Run a list of experiments
+-- run_param_tuner.py # Hyperparameter tuning entry
+-- autotuner.py # Tuning utilities
+-- requirements.txt
+-- README.md
Place the downloaded preprocessed datasets under ./data/ using the same directory names as the dataset ids in config/dataset_config.yaml.
Additional experimental or auxiliary implementations may also appear in model_zoo/.
The table below reports the preliminary benchmarking results under a fixed sequence length of 100. For a fair comparison, all models are configured with three layers. The token dimension is set to 128 for QK-Video and 256 for KuaiRand and TAAC-25.
Figure 3. Preliminary Benchmark Results. The benchmark evaluates 15 ranking models on QK-Video, KuaiRand, and TAAC-25 under AUC and gAUC. Results are reported for multiple feedback tasks, including click, follow, like, share, comment, long view, and conversion. Bold values indicate top-performing results for each task-metric pair.
conda create -n UniRank python=3.9
conda activate UniRank
pip install torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
Download the preprocessed datasets from Hugging Face and place them under ./data/:
data/
+-- QK_Video_Action/
+-- KuaiRand_Video_Action/
+-- TencentGR_10M_Action_Blocked/
Check config/dataset_config.yaml if you want to change paths, feature schemas, labels, or blocked-loading settings.
Single GPU:
python run_expid.py --config ./config --expid DIN_KuaiRand_Video_Action --gpu 0
Multi-GPU DDP:
torchrun --standalone --nproc_per_node=2 run_expid.py \
--config ./config \
--expid DIN_KuaiRand_Video_Action \
--gpu 0,1
Experiment ids are defined in config/model_config.yaml and usually follow:
<Model>_<Dataset>
Examples:
UltraHSTU_QK_Video_Action
TokenFormer_KuaiRand_Video_Action
LONGER_TencentGR_10M_Action
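When scripting over many experiments, it can help to split an id back into its two parts. The helper below is a hypothetical convenience, not part of the repository: it assumes the model name contains no underscore, which holds for the examples above but is not guaranteed, since ids only "usually" follow this pattern. run_expid.py itself resolves ids by looking them up in config/model_config.yaml.

```python
from typing import Tuple


def split_expid(expid: str) -> Tuple[str, str]:
    """Split '<Model>_<Dataset>' at the first underscore (assumed convention)."""
    model, sep, dataset = expid.partition("_")
    if not sep or not model or not dataset:
        raise ValueError(f"expid does not look like <Model>_<Dataset>: {expid!r}")
    return model, dataset


for expid in ["UltraHSTU_QK_Video_Action", "TokenFormer_KuaiRand_Video_Action"]:
    print(split_expid(expid))
```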
Edit run_all.sh to uncomment the experiments you want, then run:
chmod +x run_all.sh
./run_all.sh
Logs and checkpoints are written to ./checkpoints/ and ./logs/ when enabled by the running script/configuration.
- Add the model implementation to model_zoo/YourModel.py.
- Export it in model_zoo/__init__.py.
- Add an experiment block to config/model_config.yaml.
- Reuse UniRank_Dataloader.py unless the model needs a custom input format.
- Run python run_expid.py --config ./config --expid YourModel_Dataset --gpu 0.
- dataset_config.yaml defines feature columns, label columns, parquet paths, sequence length metadata, and blocked data loading.
- model_config.yaml defines model hyperparameters, batch size, optimizer, task list, metrics, monitor rule, and sequence length.
- run_expid.py initializes feature encoders, builds dataloaders, sets up DDP, constructs the model from model_zoo, trains, validates, and optionally evaluates on the test split.
- UniRank_Dataloader.py handles action-aware sequence construction and large blocked parquet loading.
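The idea behind blocked loading can be illustrated without any parquet dependency. The sketch below is not the code in UniRank_Dataloader.py: `iter_batches` and `read_block` are hypothetical names, and in-memory lists stand in for parquet block files. It shows only the general pattern of reading one block lazily, slicing it into batches, and moving on to the next block.

```python
from typing import Callable, Dict, Iterator, List, Sequence

Row = Dict[str, int]


def iter_batches(block_ids: Sequence[str],
                 read_block: Callable[[str], List[Row]],
                 batch_size: int) -> Iterator[List[Row]]:
    """Yield batches block by block.

    Each block is read only after the previous one has been fully consumed,
    so the whole dataset never has to be loaded at once. In this sketch,
    batches do not cross block boundaries.
    """
    for block_id in block_ids:
        rows = read_block(block_id)  # e.g. one parquet file in a real loader
        for start in range(0, len(rows), batch_size):
            yield rows[start:start + batch_size]


# Stand-in for files on disk: two "blocks" of different sizes.
fake_blocks = {
    "block_0": [{"uid": i} for i in range(5)],
    "block_1": [{"uid": i} for i in range(5, 8)],
}
batches = list(iter_batches(["block_0", "block_1"], fake_blocks.__getitem__, batch_size=2))
print([len(b) for b in batches])
```

A real loader would typically also shuffle the block order per epoch and shard blocks across DDP ranks; both are omitted here.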
UniRank is built on top of, and deeply inspired by, the excellent FuxiCTR project. We sincerely thank the FuxiCTR authors and contributors for their open-source work on reproducible CTR and ranking model research.


