Self-Supervised Pre-Training for Transformer-Based Person Re-Identification [pdf]
The official repository for Self-Supervised Pre-Training for Transformer-Based Person Re-Identification.
pip install -r requirements.txt
We recommend torch=1.7.1, torchvision=0.8.2, timm=0.3.4, cuda>10.1, and faiss-gpu=1.7.1, with a 16G or 32G V100 for training and evaluation. If any packages are missing, please install them manually. Refer to DINO, TransReID, and cluster-contrast-reid to set up the environments for pre-training, supervised ReID, and unsupervised ReID, respectively.
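As a quick sanity check after installation, the installed package versions can be compared with the recommended ones. The following is a minimal sketch, not part of this repository: the `check_versions` helper and the `RECOMMENDED` table are hypothetical names introduced here, and the script assumes Python 3.8 or newer for `importlib.metadata`.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions recommended in this README. CUDA is not a pip package,
# so it is not checked here.
RECOMMENDED = {
    "torch": "1.7.1",
    "torchvision": "0.8.2",
    "timm": "0.3.4",
    "faiss-gpu": "1.7.1",
}


def check_versions(recommended):
    """Return {package: (installed_version_or_None, recommended_version)}."""
    report = {}
    for name, wanted in recommended.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (installed, wanted)
    return report


if __name__ == "__main__":
    for name, (installed, wanted) in check_versions(RECOMMENDED).items():
        # Local build tags such as "1.7.1+cu110" still count as a match.
        ok = installed is not None and installed.split("+")[0] == wanted
        status = "OK" if ok else "CHECK"
        print(f"{name:12s} installed={installed} recommended={wanted} {status}")
```

A `CHECK` line only flags a difference from the recommendation; other versions may still work.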
mkdir data
Download the datasets:
- Market-1501
- MSMT17
- LUPerson. We do not hold the copyright to the LUPerson dataset; please contact the LUPerson authors to obtain it.
- You can download the LUPerson file list ordered by CFS score. [CFS_list.pkl]
Then unzip the datasets and arrange them under the data directory as follows:
data
├── market1501
│   ├── bounding_box_train
│   ├── bounding_box_test
│   └── ..
├── MSMT17
│   ├── train
│   ├── test
│   └── ..
└── LUP
    ├── images
    └── CFS_list.pkl
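Before launching training, it may help to confirm that the layout above is in place. The sketch below is an illustration only, not a script shipped with this repository: `missing_paths` and `EXPECTED` are hypothetical names, and the check covers only the entries listed in the tree, not the elided `..` ones.

```python
from pathlib import Path

# Relative paths taken from the directory tree in this README.
EXPECTED = [
    "market1501/bounding_box_train",
    "market1501/bounding_box_test",
    "MSMT17/train",
    "MSMT17/test",
    "LUP/images",
    "LUP/CFS_list.pkl",
]


def missing_paths(root="data", expected=EXPECTED):
    """Return the expected entries that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths("data")
    if missing:
        print("Missing entries under data/:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All listed dataset paths are present.")
```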
Model | Download |
---|---|
ViT-S/16 | link |
ViT-S/16+ICS | link |
ViT-B/16+ICS | link |
Please download the pre-trained models and put them in a path of your choice.
We re-ran the experiments to verify reproducibility. The reproduced results may differ from the numbers in the paper by about 0.1~0.2 percentage points.
Model | Image Size | Paper | Reproduce | Download |
---|---|---|---|---|
ViT-S/16 | 256*128 | 91.0/96.0 | 91.2/95.8 | model / log |
ViT-S/16+ICS | 256*128 | 91.3/96.2 | 91.4/96.2 | model / log |
ViT-B/16+ICS | 384*128 | 93.2/96.7 | 93.1/96.6 | model / log |
Model | Image Size | Paper | Reproduce | Download |
---|---|---|---|---|
ViT-S/16 | 256*128 | 66.1/84.6 | 66.3/84.8 | model / log |
ViT-S/16+ICS | 256*128 | 68.1/86.1 | 68.3/86.2 | model / log |
ViT-B/16+ICS | 384*128 | 75.0/89.6 | 75.1/89.6 | model / log |
Model | Image Size | Paper | Reproduce | Download |
---|---|---|---|---|
ViT-S/16 | 256*128 | 88.2/94.2 | 88.4/94.6 | model / log |
ViT-S/16+ICS | 256*128 | 89.6/95.3 | 89.5/95.3 | model / log |
Model | Image Size | Paper | Reproduce | Download |
---|---|---|---|---|
ViT-S/16 | 256*128 | 40.9/66.4 | 40.9/66.4 | model / log |
ViT-S/16+ICS | 256*128 | 50.6/75.0 | 50.6/75.0 | model / log |
Model | Image Size | Paper | Reproduce | Download |
---|---|---|---|---|
ViT-S/16 | 256*128 | 89.4/95.4 | 89.2/95.3 | model / log |
ViT-S/16+ICS | 256*128 | 89.9/95.5 | 89.9/95.4 | model / log |
Model | Image Size | Paper | Reproduce | Download |
---|---|---|---|---|
ViT-S/16 | 256*128 | 47.4/70.8 | 47.7/71.2 | model / log |
ViT-S/16+ICS | 256*128 | 57.8/79.5 | 57.8/79.4 | model / log |
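The difference between the "Paper" and "Reproduce" columns can be computed directly from the tables above. This is a small arithmetic sketch: the `RESULTS` list and `gaps` helper are names introduced here for illustration, and each tuple simply holds the two slash-separated values of a cell in the order they are printed.

```python
# (paper, reproduced) pairs copied row by row from the six tables above.
RESULTS = [
    ((91.0, 96.0), (91.2, 95.8)),
    ((91.3, 96.2), (91.4, 96.2)),
    ((93.2, 96.7), (93.1, 96.6)),
    ((66.1, 84.6), (66.3, 84.8)),
    ((68.1, 86.1), (68.3, 86.2)),
    ((75.0, 89.6), (75.1, 89.6)),
    ((88.2, 94.2), (88.4, 94.6)),
    ((89.6, 95.3), (89.5, 95.3)),
    ((40.9, 66.4), (40.9, 66.4)),
    ((50.6, 75.0), (50.6, 75.0)),
    ((89.4, 95.4), (89.2, 95.3)),
    ((89.9, 95.5), (89.9, 95.4)),
    ((47.4, 70.8), (47.7, 71.2)),
    ((57.8, 79.5), (57.8, 79.4)),
]


def gaps(results):
    """Absolute paper-vs-reproduced differences, rounded to one decimal."""
    return [
        round(abs(p - r), 1)
        for paper, repro in results
        for p, r in zip(paper, repro)
    ]


if __name__ == "__main__":
    diffs = gaps(RESULTS)
    print("largest gap:", max(diffs))
    print("mean gap:", round(sum(diffs) / len(diffs), 2))
```

For the numbers listed here, most differences are 0.2 or below, and the largest is 0.4.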
Our implementation is mainly based on the following codebases. We sincerely thank the authors for their wonderful work.
LUPerson, DINO, TransReID, cluster-contrast-reid.
If you find this code useful for your research, please cite our paper:
@article{luo2021self,
title={Self-Supervised Pre-Training for Transformer-Based Person Re-Identification},
author={Luo, Hao and Wang, Pichao and Xu, Yi and Ding, Feng and Zhou, Yanxin and Wang, Fan and Li, Hao and Jin, Rong},
journal={arXiv preprint arXiv:2111.12084},
year={2021}
}
If you have any questions, please feel free to contact us. E-mail: michuan.lh@alibaba-inc.com or haoluocsc@zju.edu.cn