Distributed Arcface Training in Pytorch

This is a deep learning library that makes face recognition training efficient and effective; it can train on tens of millions of identities on a single server.

Requirements

To take advantage of new PyTorch features, we have upgraded to PyTorch 1.9.0.
Versions of PyTorch earlier than 1.9.0 may not work in the future.

Install PyTorch (torch>=1.9.0); see install.md.
(Optional) Install DALI; see install_dali.md.
Run pip install -r requirement.txt.

How to Train

To train a model, run train.py with the path to the configs.
The example commands below show how to run distributed training.

1. To run on a machine with 8 GPUs:

python -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --node_rank=0 --master_addr="127.0.0.1" --master_port=12581 train.py configs/ms1mv3_r50_lr02

2. To run on 2 machines with 8 GPUs each:

Node 0:

python -m torch.distributed.launch --nproc_per_node=8 --nnodes=2 --node_rank=0 --master_addr="ip1" --master_port=12581 train.py configs/webface42m_r100_lr01_pfc02_bs4k_16gpus

Node 1:

python -m torch.distributed.launch --nproc_per_node=8 --nnodes=2 --node_rank=1 --master_addr="ip1" --master_port=12581 train.py configs/webface42m_r100_lr01_pfc02_bs4k_16gpus
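
As a rough illustration of how the launcher flags in the commands above combine, the sketch below computes the total number of training processes and the global rank of each process. The helper names world_size and global_rank are hypothetical and not part of this repository; the formula assumes the usual torch.distributed.launch convention of one process per GPU, with global rank = node_rank * nproc_per_node + local rank.

```python
def world_size(nnodes: int, nproc_per_node: int) -> int:
    """Total number of training processes (one per GPU) across all machines."""
    return nnodes * nproc_per_node


def global_rank(node_rank: int, nproc_per_node: int, local_rank: int) -> int:
    """Global rank of the process driving GPU `local_rank` on machine `node_rank`."""
    if not 0 <= local_rank < nproc_per_node:
        raise ValueError("local_rank must be in [0, nproc_per_node)")
    return node_rank * nproc_per_node + local_rank


# Two machines with 8 GPUs each, as in example 2.
print(world_size(nnodes=2, nproc_per_node=8))        # 16
print([global_rank(1, 8, r) for r in range(8)])      # ranks hosted on node 1
```

Under this convention node 0 hosts ranks 0 to 7 and node 1 hosts ranks 8 to 15, which is why both nodes must be given the same --master_addr and --master_port but different --node_rank values.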

3. To run ViT-B on a machine with a 24k batch size:

python -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --node_rank=0 --master_addr="127.0.0.1" --master_port=12345 train_v2.py configs/wf42m_pfc03_40epoch_8gpu_vit_b.py

Download Datasets or Prepare Datasets

MS1MV2 (87k IDs, 5.8M images)
MS1MV3 (93k IDs, 5.2M images)
Glint360K (360k IDs, 17.1M images)
WebFace42M (2M IDs, 42.5M images)

Model Zoo

The models are available for non-commercial research purposes only.
All models can be found here:
Baidu Yun Pan (extraction code: e8pw)
OneDrive

Performance on IJB-C and ICCV2021-MFR

The ICCV2021-MFR test set consists of non-celebrities, so it has very little overlap with publicly available face recognition training sets such as MS1M and CASIA, which were mostly collected from online celebrities. As a result, we can evaluate the performance of different algorithms fairly.

For the ICCV2021-MFR-ALL set, TAR is measured on the all-to-all 1:1 protocol, with FAR less than 0.000001 (1e-6). The globalised multi-racial test set contains 242,143 identities and 1,624,305 images.
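
To make the TAR-at-FAR metric concrete, here is a minimal sketch of how such a number could be computed from similarity scores. It is not the evaluation code used for ICCV2021-MFR; the function name tar_at_far, the toy scores, and the convention that a pair is accepted when its score is strictly above the threshold are all assumptions made for illustration.

```python
def tar_at_far(genuine_scores, impostor_scores, far):
    """Return (TAR, threshold) for a threshold whose FAR is at most `far`.

    A pair is accepted when its similarity score is strictly above the threshold.
    """
    if not genuine_scores or not impostor_scores:
        raise ValueError("both score lists must be non-empty")
    impostors = sorted(impostor_scores, reverse=True)
    # Number of impostor pairs we may accept without exceeding the target FAR.
    allowed = int(far * len(impostors))
    if allowed >= len(impostors):
        threshold = float("-inf")  # every pair may be accepted
    else:
        # At most `allowed` impostor scores are strictly above this value.
        threshold = impostors[allowed]
    accepted = sum(1 for s in genuine_scores if s > threshold)
    return accepted / len(genuine_scores), threshold


# Toy example: 4 same-identity pairs, 8 different-identity pairs, target FAR 0.25.
genuine = [0.9, 0.8, 0.7, 0.3]
impostor = [0.6, 0.5, 0.4, 0.2, 0.1, 0.1, 0.05, 0.0]
tar, thr = tar_at_far(genuine, impostor, far=0.25)
print(tar, thr)
```

At a FAR of 1e-6 the threshold is set by roughly the top one-in-a-million impostor scores, so a meaningful measurement needs far more than a million impostor pairs, which an all-to-all protocol over 1,624,305 images provides.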

1. Training on Single-Host GPU

Datasets	Backbone	MFR-ALL	IJB-C(1E-4)	IJB-C(1E-5)	log
MS1MV2	mobilefacenet-0.45G	62.07	93.61	90.28	click me
MS1MV2	r50	75.13	95.97	94.07	click me
MS1MV2	r100	78.12	96.37	94.27	click me
MS1MV3	mobilefacenet-0.45G	63.78	94.23	91.33	click me
MS1MV3	r50	79.14	96.37	94.47	click me
MS1MV3	r100	81.97	96.85	95.02	click me
Glint360K	mobilefacenet-0.45G	70.18	95.04	92.62	click me
Glint360K	r50	86.34	97.16	95.81	click me
Glint360K	r100	89.52	97.55	96.38	click me
WF4M	r100	89.87	97.19	95.48	click me
WF12M-PFC-0.2	r100	94.75	97.60	95.90	click me
WF12M-PFC-0.3	r100	94.71	97.64	96.01	click me
WF12M	r100	94.69	97.59	95.97	click me
WF42M-PFC-0.2	r100	96.27	97.70	96.31	click me
WF42M-PFC-0.2	ViT-T-1.5G	92.04	97.27	95.68	click me
WF42M-PFC-0.3	ViT-B-11G	97.16	97.91	97.05	click me

2. Training on Multi-Host GPU

Datasets	Backbone(bs*gpus)	MFR-ALL	IJB-C(1E-4)	IJB-C(1E-5)	Throughput	log
WF42M-PFC-0.2	r50(512*8)	93.83	97.53	96.16	~5900	click me
WF42M-PFC-0.2	r50(512*16)	93.96	97.46	96.12	~11000	click me
WF42M-PFC-0.2	r50(128*32)	94.04	97.48	95.94	~17000	click me
WF42M-PFC-0.2	r100(128*16)	96.28	97.80	96.57	~5200	click me
WF42M-PFC-0.2	r100(256*16)	96.69	97.85	96.63	~5200	click me
WF42M-PFC-0.0018	r100(512*32)	93.08	97.51	95.88	~10000	click me
WF42M-PFC-0.2	r100(128*32)	96.57	97.83	96.50	~9800	click me

r100(128*32) means the backbone is r100, the batch size per GPU is 128, and the number of GPUs is 32.
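
The notation can be unpacked mechanically. The sketch below parses a Backbone(bs*gpus) entry and derives the global batch size as per-GPU batch size times number of GPUs; the helper name parse_backbone_spec is hypothetical and exists only for this illustration.

```python
import re


def parse_backbone_spec(spec: str):
    """Split e.g. 'r100(128*32)' into (backbone, batch per GPU, GPUs, global batch)."""
    match = re.fullmatch(r"([\w-]+)\((\d+)\*(\d+)\)", spec)
    if match is None:
        raise ValueError(f"unrecognised spec: {spec!r}")
    backbone = match.group(1)
    batch_per_gpu = int(match.group(2))
    num_gpus = int(match.group(3))
    return backbone, batch_per_gpu, num_gpus, batch_per_gpu * num_gpus


for spec in ["r50(512*8)", "r50(128*32)", "r100(128*32)"]:
    print(spec, "->", parse_backbone_spec(spec))
```

Read this way, r50(512*8) and r50(128*32) both use a global batch size of 4096, so those two rows differ in how the batch is spread across GPUs rather than in its total size.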

3. ViT For Face Recognition

Datasets	Backbone(bs)	FLOPs	MFR-ALL	IJB-C(1E-4)	IJB-C(1E-5)	Throughput	log
WF42M-PFC-0.3	r18(128*32)	2.6	79.13	95.77	93.36	-	click me
WF42M-PFC-0.3	r50(128*32)	6.3	94.03	97.48	95.94	-	click me
WF42M-PFC-0.3	r100(128*32)	12.1	96.69	97.82	96.45	-	click me
WF42M-PFC-0.3	r200(128*32)	23.5	97.70	97.97	96.93	-	click me
WF42M-PFC-0.3	VIT-T(384*64)	1.5	92.24	97.31	95.97	~35000	click me
WF42M-PFC-0.3	VIT-S(384*64)	5.7	95.87	97.73	96.57	~25000	click me
WF42M-PFC-0.3	VIT-B(384*64)	11.4	97.42	97.90	97.04	~13800	click me
WF42M-PFC-0.3	VIT-L(384*64)	25.3	97.85	98.00	97.23	~9406	click me

WF42M means WebFace42M; PFC-0.3 means the sample rate of negative class centers is 0.3.

4. Noisy Datasets

Datasets	Backbone	MFR-ALL	IJB-C(1E-4)	IJB-C(1E-5)	log
WF12M-Flip(40%)	r50	43.87	88.35	80.78	click me
WF12M-Flip(40%)-PFC-0.1*	r50	80.20	96.11	93.79	click me
WF12M-Conflict	r50	79.93	95.30	91.56	click me
WF12M-Conflict-PFC-0.3*	r50	91.68	97.28	95.75	click me

WF12M means WebFace12M; the * suffix (as in PFC-0.1*) denotes additional abnormal inter-class filtering.

Speed Benchmark

Arcface-Torch trains efficiently on large-scale face recognition datasets. When the number of classes in the training set exceeds 1 million, the Partial FC sampling strategy reaches the same accuracy while training several times faster and using less GPU memory. Partial FC is a sparse variant of the model-parallel architecture for large-scale face recognition. It uses a sparse softmax in which each batch dynamically samples a subset of class centers for training. In each iteration only a sparse part of the parameters is updated, which greatly reduces GPU memory use and computation. With Partial FC, we can scale the training set to 29 million identities, the largest to date. Partial FC also supports multi-machine distributed training and mixed-precision training.

For more details, see speed_benchmark.md in docs.
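
The sampling idea can be sketched in a few lines of plain Python. This is a simplified, single-process illustration and not the code in partial_fc.py: the function names sample_class_centers and sparse_softmax_loss are hypothetical, and the sketch assumes that centers of classes present in the batch (positives) are always kept while the rest of the budget is filled with randomly chosen negative centers.

```python
import math
import random


def sample_class_centers(labels, num_classes, sample_rate, rng):
    """Choose the class centers used for one batch.

    Returns the sorted sampled class ids and the labels remapped to
    positions inside that sampled list.
    """
    if not 0.0 < sample_rate <= 1.0:
        raise ValueError("sample_rate must be in (0, 1]")
    positives = set(labels)
    # Keep every positive center even if that exceeds the nominal budget.
    budget = max(int(sample_rate * num_classes), len(positives))
    negatives = [c for c in range(num_classes) if c not in positives]
    extra = rng.sample(negatives, budget - len(positives))
    sampled = sorted(positives | set(extra))
    index_of = {c: i for i, c in enumerate(sampled)}
    return sampled, [index_of[y] for y in labels]


def sparse_softmax_loss(logits_row, target_index):
    """Cross-entropy for one sample, computed over the sampled centers only."""
    m = max(logits_row)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits_row))
    return log_sum - logits_row[target_index]


# Toy batch: 3 samples, 20 classes, sample rate 0.25 -> 5 centers per batch.
rng = random.Random(0)
sampled, remapped = sample_class_centers([3, 7, 3], 20, 0.25, rng)
print(sampled, remapped)
```

Only the logits for the sampled centers are computed, so only those rows of the classification layer receive gradients in that iteration; this is the sparse update the paragraph above refers to.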

Training speed of different parallel methods (samples / second), Tesla V100 32GB * 8. (Larger is better)

"-" means training failed because of GPU memory limitations.

Number of Identities in Dataset	Data Parallel	Model Parallel	Partial FC 0.1
125000	4681	4824	5004
1400000	1672	3043	4738
5500000	-	1389	3975
8000000	-	-	3565
16000000	-	-	2679
29000000	-	-	1855

GPU memory cost of different parallel methods (MB per GPU), Tesla V100 32GB * 8. (Smaller is better)

Number of Identities in Dataset	Data Parallel	Model Parallel	Partial FC 0.1
125000	7358	5306	4868
1400000	32252	11178	6056
5500000	-	32188	9854
8000000	-	-	12310
16000000	-	-	19950
29000000	-	-	32324
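
To put the two tables above side by side, the following sketch computes how much faster and how much leaner Partial FC 0.1 is than Model Parallel at the dataset sizes where both ran. The figures are copied from the tables; the dictionary and function names are made up for this illustration.

```python
# identities -> (Data Parallel, Model Parallel, Partial FC 0.1); None = out of memory
SPEED = {  # samples / second
    125_000: (4681, 4824, 5004),
    1_400_000: (1672, 3043, 4738),
    5_500_000: (None, 1389, 3975),
}
MEMORY = {  # MB per GPU
    125_000: (7358, 5306, 4868),
    1_400_000: (32252, 11178, 6056),
    5_500_000: (None, 32188, 9854),
}


def pfc_vs_model_parallel(table):
    """Ratio of the Partial FC column to the Model Parallel column, per row."""
    return {n: row[2] / row[1] for n, row in table.items() if row[1] is not None}


speedup = pfc_vs_model_parallel(SPEED)
memory_ratio = pfc_vs_model_parallel(MEMORY)
for n in SPEED:
    print(f"{n:>9}: {speedup[n]:.2f}x speed, {memory_ratio[n]:.2f}x memory")
```

The gap widens as the number of identities grows: at 5.5 million identities Partial FC 0.1 is close to three times as fast as Model Parallel while using under a third of the GPU memory.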

Citations

@inproceedings{deng2019arcface,
  title={Arcface: Additive angular margin loss for deep face recognition},
  author={Deng, Jiankang and Guo, Jia and Xue, Niannan and Zafeiriou, Stefanos},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={4690--4699},
  year={2019}
}
@inproceedings{An_2022_CVPR,
    author={An, Xiang and Deng, Jiankang and Guo, Jia and Feng, Ziyong and Zhu, XuHan and Yang, Jing and Liu, Tongliang},
    title={Killing Two Birds With One Stone: Efficient and Robust Training of Face Recognition CNNs by Partial FC},
    booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month={June},
    year={2022},
    pages={4042-4051}
}
@inproceedings{zhu2021webface260m,
  title={Webface260m: A benchmark unveiling the power of million-scale deep face recognition},
  author={Zhu, Zheng and Huang, Guan and Deng, Jiankang and Ye, Yun and Huang, Junjie and Chen, Xinze and Zhu, Jiagang and Yang, Tian and Lu, Jiwen and Du, Dalong and Zhou, Jie},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={10492--10502},
  year={2021}
}
