[CVPR 2026] From Detection to Association: Learning Discriminative Object Embeddings for Multi-Object Tracking
TL;DR. We reveal that DETR-based end-to-end MOT suffers from overly similar object embeddings. FDTA explicitly enhances discriminativeness in this paradigm.
- [2026] Our paper has been accepted by CVPR 2026! The code is now released. Please star this repo for updates!
conda create -n FDTA python=3.12
conda activate FDTA
pip install -r requirements.txt
# Compile the Deformable Attention operator:
cd models/ops/
sh make.sh

Please organize your datasets under ./datasets/ in the following structure. Note that depth maps are required alongside the RGB images:
datasets/
├── DanceTrack/
│ ├── train/
│ │ └── <sequence>/
│ │ ├── img1/
│ │ ├── gt/
│ │ └── depth/
│ ├── val/
│ └── test/
├── SportsMOT/
│ ├── train/
│ │ └── <sequence>/
│ │ ├── img1/
│ │ ├── gt/
│ │ └── depth/
│ ├── val/
│ └── test/
└── BFT/
├── train/
│ └── <sequence>/
│ ├── img1/
│ ├── gt/
│ └── depth/
└── test/
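Before training, the layout above can be sanity-checked with a short script. This is a minimal sketch, not part of the repo: `check_sequence_dirs` is a hypothetical helper, and some splits (e.g. test sets without public annotations) may legitimately lack `gt/`, so apply it only where all three subdirectories are expected.

```python
import os

# Per-sequence subdirectories expected by the layout above
# (depth/ is required in addition to the standard MOT layout).
REQUIRED_SUBDIRS = ("img1", "gt", "depth")

def check_sequence_dirs(split_dir: str) -> list[str]:
    """Return a list of problems found under one dataset split directory."""
    problems = []
    if not os.path.isdir(split_dir):
        return [f"missing split directory: {split_dir}"]
    for seq in sorted(os.listdir(split_dir)):
        seq_dir = os.path.join(split_dir, seq)
        if not os.path.isdir(seq_dir):
            continue  # ignore stray files such as seqmaps
        for sub in REQUIRED_SUBDIRS:
            if not os.path.isdir(os.path.join(seq_dir, sub)):
                problems.append(f"{seq}: missing {sub}/")
    return problems
```

An empty return value means every sequence under the split has `img1/`, `gt/`, and `depth/`.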
Dataset sources:
- DanceTrack: Download from the official repository.
- SportsMOT: Download from the official repository or HuggingFace.
- BFT: Download from Google Drive or Baidu Pan (NetTrack).
- Depth maps: Generated using Video-Depth-Anything (see
tools/gen_depthmaps.py). Other depth estimators with the same directory structure are also supported.
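For a custom depth estimator, the only hard requirement is the directory structure: one depth map per frame in `img1/`, written under the sequence's `depth/` folder. The sketch below illustrates that contract only; the function name, the `.npy` output format, and the `estimate_depth` callback are illustrative assumptions, and the repo's actual format is defined by `tools/gen_depthmaps.py`.

```python
import os
import numpy as np

def write_depth_maps(seq_dir: str, estimate_depth) -> int:
    """Run `estimate_depth` (any per-frame depth estimator returning an
    HxW array for an image path) on every frame in <seq_dir>/img1/ and
    store the results under <seq_dir>/depth/ with matching frame names.
    Returns the number of depth maps written."""
    img_dir = os.path.join(seq_dir, "img1")
    depth_dir = os.path.join(seq_dir, "depth")
    os.makedirs(depth_dir, exist_ok=True)
    n = 0
    for name in sorted(os.listdir(img_dir)):
        stem, _ = os.path.splitext(name)
        depth = np.asarray(estimate_depth(os.path.join(img_dir, name)))
        np.save(os.path.join(depth_dir, stem + ".npy"), depth)
        n += 1
    return n
```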
We use COCO pre-trained Deformable DETR weights for initialization, sourced from MOTIP. Download the weights and place them under ./pretrains/:
| File | Description | Download |
|---|---|---|
| r50_deformable_detr_coco.pth | COCO pre-trained (base) | link |
| r50_deformable_detr_coco_dancetrack.pth | Fine-tuned on DanceTrack | link |
| r50_deformable_detr_coco_sportsmot.pth | Fine-tuned on SportsMOT | link |
| r50_deformable_detr_coco_bft.pth | Fine-tuned on BFT | link |
Our trained FDTA model weights are available on HuggingFace. Download and place them under ./checkpoints/ for inference.
All training configurations are stored in the ./configs/ folder. For example, to train on DanceTrack:
accelerate launch --num_processes=4 train.py \
--data-root /path/to/your/datasets/ \
--exp-name fdta_dancetrack \
--config-path ./configs/dancetrack.yaml \
    --detr-pretrain ./pretrains/r50_deformable_detr_coco_dancetrack.pth

Replace `dancetrack` with `sportsmot` or `bft` to train on other datasets.
Note: If your GPU memory is less than 24 GB, you can set `--detr-num-checkpoint-frames 2` (< 16 GB) or `--detr-num-checkpoint-frames 1` (< 12 GB) to reduce memory usage.
We support two inference modes:
- `submit`: Generate tracker files for submission.
- `evaluate`: Generate tracker files and compute evaluation metrics.
accelerate launch --num_processes=4 submit_and_evaluate.py \
--data-root /path/to/your/datasets/ \
--inference-mode evaluate \
--config-path ./configs/dancetrack.yaml \
--inference-model ./checkpoints/dancetrack.pth \
--outputs-dir ./outputs/ \
--inference-dataset DanceTrack \
--inference-split val
accelerate launch --num_processes=4 submit_and_evaluate.py \
--data-root /path/to/your/datasets/ \
--inference-mode submit \
--config-path ./configs/dancetrack.yaml \
--inference-model ./checkpoints/dancetrack.pth \
--outputs-dir ./outputs/ \
--inference-dataset DanceTrack \
    --inference-split test

You can add `--inference-dtype FP16` for faster inference with minimal performance loss.
DanceTrack:

| Training Data | HOTA | IDF1 | AssA | MOTA | DetA |
|---|---|---|---|---|---|
| train | 71.7 | 77.2 | 63.5 | 91.3 | 81.0 |
| train+val | 74.4 | 80.0 | 67.0 | 92.2 | 82.7 |
SportsMOT:

| Training Data | HOTA | IDF1 | AssA | MOTA | DetA |
|---|---|---|---|---|---|
| train | 74.2 | 78.5 | 65.5 | 93.0 | 84.1 |
BFT:

| Training Data | HOTA | IDF1 | AssA | MOTA | DetA |
|---|---|---|---|---|---|
| train | 72.2 | 84.2 | 74.5 | 78.2 | 70.1 |
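As a sanity check on the tables above: HOTA decomposes into detection accuracy (DetA) and association accuracy (AssA), and at the aggregate level the reported scores should approximately satisfy HOTA ≈ sqrt(DetA × AssA). The true HOTA averages this geometric mean over localization thresholds, so only rough agreement is expected:

```python
import math

# (HOTA, AssA, DetA) triples copied from the result tables above.
rows = {
    "DanceTrack (train)":     (71.7, 63.5, 81.0),
    "DanceTrack (train+val)": (74.4, 67.0, 82.7),
    "SportsMOT":              (74.2, 65.5, 84.1),
    "BFT":                    (72.2, 74.5, 70.1),
}

for name, (hota, assa, deta) in rows.items():
    approx = math.sqrt(assa * deta)
    # Agreement within a few tenths of a point, not exact equality.
    print(f"{name}: reported HOTA {hota}, sqrt(AssA*DetA) = {approx:.1f}")
```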
The code is built on top of these awesome repositories. We thank the authors for open-sourcing their code.
If you find our work useful for your research, please consider citing:
@article{shao2025fdta,
title={From Detection to Association: Learning Discriminative Object Embeddings for Multi-Object Tracking},
author={Shao, Yuqing and Yang, Yuchen and Yu, Rui and Li, Weilong and Guo, Xu and Yan, Huaicheng and Wang, Wei and Sun, Xiao},
journal={arXiv preprint arXiv:2512.02392},
year={2025}
}