Implementation of CMT: Convolutional Neural Networks Meet Vision Transformers
| Model | Top-1 Acc. (ImageNet-1K) | Log | Ckpt |
| --- | --- | --- | --- |
| CMT-Ti | 79.0% | github | github |
| CMT-XS | 81.8% | github | github |
| CMT-Small | 83.5% | github | github |
| CMT-Base | 84.5% | github | github |
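For reference, a released checkpoint can typically be loaded as sketched below. This is a hedged example, not the repo's evaluation script: it assumes DeiT-style checkpoints (weights stored under a `model` key) and that importing the repo's model definitions (the module name `cmt` is a placeholder) registers the `cmt_*` architectures with timm; the checkpoint file name is also a placeholder.

```python
# Minimal sketch of loading a released checkpoint (not the repo's evaluation script).
# Assumptions: DeiT-style checkpoints with weights under a "model" key, and that importing
# the repo's model definitions (module name "cmt" is a placeholder) registers cmt_* with timm.
import torch
import timm

import cmt  # placeholder: importing the model file registers the cmt_* architectures

model = timm.create_model("cmt_ti", num_classes=1000)
ckpt = torch.load("cmt_ti.pth", map_location="cpu")  # placeholder checkpoint file name
state_dict = ckpt.get("model", ckpt)  # accept either a wrapped or a raw state dict
model.load_state_dict(state_dict)
model.eval()
```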
Requirements:
- python==3.6
- cuda==10.0
```bash
# other pytorch/timm versions can also work
pip install torch==1.7.0 torchvision==0.8.1
pip install timm==0.3.2
pip install torchprofile

# build apex
cd /your_path_to/apex-master/
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
```
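After installation, a quick sanity check of the environment can be done as below. This is a minimal sketch that only verifies the packages installed above import correctly and that apex's `amp` is available for the `--apex-amp` training flag.

```python
# Quick environment sanity check for the packages installed above (a minimal sketch).
import torch
import timm
import torchprofile  # noqa: F401  # used later for MACs counting

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("timm:", timm.__version__)

try:
    from apex import amp  # noqa: F401
    print("apex amp available (needed for the --apex-amp flag)")
except ImportError:
    print("apex not found; build it as shown above before using --apex-amp")
```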
Download and extract ImageNet train and val images from http://image-net.org/. The directory structure is:
```
│path/to/imagenet/
├──train/
│ ├── n01440764
│ │ ├── n01440764_10026.JPEG
│ │ ├── n01440764_10027.JPEG
│ │ ├── ......
│ ├── ......
├──val/
│ ├── n01440764
│ │ ├── ILSVRC2012_val_00000293.JPEG
│ │ ├── ILSVRC2012_val_00002138.JPEG
│ │ ├── ......
│ ├── ......
```
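This layout is the standard `ImageFolder` format, so it can be read with torchvision as in the sketch below. The transforms here are only illustrative; since the repo is based on DeiT, train.py applies its own DeiT-style augmentation rather than this plain validation pipeline.

```python
# Minimal sketch: the layout above is the standard ImageFolder format, so it can be read
# with torchvision. The transforms here are only illustrative; train.py uses its own
# DeiT-style augmentation, not this plain validation pipeline.
from torchvision import datasets, transforms

val_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

val_dataset = datasets.ImageFolder("/your_path_to/imagenet/val", transform=val_transform)
print(len(val_dataset), "validation images across", len(val_dataset.classes), "classes")
```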
To train CMT-Tiny on ImageNet-1K on a single node with 8 GPUs:
```bash
python -m torch.distributed.launch --nproc_per_node=8 train.py --data-path /your_path_to/imagenet/ --output_dir /your_path_to/output/ --model cmt_ti --batch-size 256 --apex-amp --input-size 160 --weight-decay 0.05 --drop-path 0.1 --epochs 800 --test_freq 100 --test_epoch 760 --warmup-lr 1e-7 --warmup-epochs 20 --lr 8e-4 --min-lr 1e-5 --no-model-ema
```
To train CMT-XS on ImageNet-1K on a single node with 8 GPUs:
```bash
python -m torch.distributed.launch --nproc_per_node=8 train.py --data-path /your_path_to/imagenet/ --output_dir /your_path_to/output/ --model cmt_xs --batch-size 256 --apex-amp --input-size 192 --weight-decay 0.04 --drop-path 0.08 --epochs 400 --test_freq 100 --test_epoch 360 --warmup-lr 1e-6 --warmup-epochs 20 --lr 7e-4 --min-lr 2e-5 --model-ema-decay 0.9998
```
To train CMT-Small on ImageNet-1K on a single node with 8 GPUs:
```bash
python -m torch.distributed.launch --nproc_per_node=8 train.py --data-path /your_path_to/imagenet/ --output_dir /your_path_to/output/ --model cmt_s --batch-size 128 --apex-amp --input-size 224 --weight-decay 0.05 --drop-path 0.1 --epochs 300 --test_freq 100 --test_epoch 260 --warmup-lr 1e-7 --warmup-epochs 20
```
To train CMT-Base on ImageNet-1K on a single node with 8 GPUs:
```bash
python -m torch.distributed.launch --nproc_per_node=8 train.py --data-path /your_path_to/imagenet/ --output_dir /your_path_to/output/ --model cmt_b --batch-size 64 --apex-amp --input-size 256 --weight-decay 0.05 --drop-path 0.25 --epochs 300 --test_freq 100 --test_epoch 260 --warmup-lr 1e-6 --min-lr 2e-5 --warmup-epochs 20
```
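Since `torchprofile` is installed above, the compute cost of a model at its training resolution can be sanity-checked as sketched below. This is a hedged example: it assumes the `cmt_*` architectures are registered with timm under the same names used by the `--model` flags above, and `cmt` is a placeholder name for the repo's model file.

```python
# Minimal sketch: counting parameters and MACs with torchprofile at the training resolution.
# Assumptions: the cmt_* architectures are registered with timm (matching the --model flags
# above), and "cmt" is a placeholder name for the repo's model file.
import torch
import timm
from torchprofile import profile_macs

import cmt  # placeholder import that registers the cmt_* models

model = timm.create_model("cmt_s", num_classes=1000)
model.eval()

dummy = torch.randn(1, 3, 224, 224)  # CMT-Small is trained at --input-size 224
with torch.no_grad():
    macs = profile_macs(model, dummy)

params = sum(p.numel() for p in model.parameters())
print(f"params: {params / 1e6:.1f}M, MACs: {macs / 1e9:.2f}G")
```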
This repository is based on DeiT and pytorch-image-models (timm).
If you find this project useful in your research, please consider citing:
```bibtex
@article{guo2021cmt,
  title={{CMT}: Convolutional neural networks meet vision transformers},
  author={Guo, Jianyuan and Han, Kai and Wu, Han and Xu, Chang and Tang, Yehui and Xu, Chunjing and Wang, Yunhe},
  journal={arXiv preprint arXiv:2107.06263},
  year={2021}
}
```