Top-1 accuracy is evaluated on ImageNet-1K, and latency is measured on an iPhone 12 running iOS 16 across 20 experimental sets.
RepNeXt: A Fast Multi-Scale CNN using Structural Reparameterization.
Mingshu Zhao, Yi Luo, and Yong Ouyang
[arXiv]
Abstract
We introduce RepNeXt, a novel model series that integrates multi-scale feature representations and incorporates both serial and parallel structural reparameterization (SRP) to enhance network depth and width without compromising inference speed. Extensive experiments demonstrate RepNeXt's superiority over current leading lightweight CNNs and ViTs, with favorable latency across various vision benchmarks. RepNeXt-M4 matches RepViT-M1.5's 82.3% accuracy on ImageNet within 1.5ms on an iPhone 12, outperforms it by 1.1 AP$^{box}$ on MS-COCO, and uses 0.7M fewer parameters.

Model | Top-1 (%, 300 epochs) | #params | MACs | Latency | Ckpt | Core ML | Log |
---|---|---|---|---|---|---|---|
M1 | 78.8 | 4.8M | 0.8G | 0.86ms | fused 300e / 300e | 300e | distill 300e / 300e |
M2 | 80.1 | 6.5M | 1.1G | 1.00ms | fused 300e / 300e | 300e | distill 300e / 300e |
M3 | 80.7 | 7.8M | 1.3G | 1.11ms | fused 300e / 300e | 300e | distill 300e / 300e |
M4 | 82.3 | 13.3M | 2.3G | 1.48ms | fused 300e / 300e | 300e | distill 300e / 300e |
M5 | 83.3 | 21.7M | 4.5G | 2.20ms | fused 300e / 300e | 300e | distill 300e / 300e |
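Parallel SRP relies on the linearity of convolution: branches that share stride and padding can be summed into a single kernel once training is done. The NumPy sketch below illustrates this on a hypothetical depthwise block with a 3×3 branch, a 1×1 branch, and an identity branch (BatchNorm omitted for brevity). The branch layout and the helper names are assumptions for illustration, not the actual RepNeXt modules.

```python
import numpy as np

def depthwise_conv2d(x, w):
    """Depthwise 2-D cross-correlation, stride 1, zero 'same' padding.

    x: (C, H, W) input; w: (C, k, k) per-channel kernels with odd k.
    """
    _, h, wd = x.shape
    k = w.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros_like(x)
    for i in range(k):
        for j in range(k):
            # add the contribution of kernel tap (i, j) for every channel at once
            out += w[:, i, j][:, None, None] * xp[:, i:i + h, j:j + wd]
    return out

def pad_kernel(w, k):
    """Zero-pad (C, k0, k0) kernels to (C, k, k), keeping them centered."""
    p = (k - w.shape[-1]) // 2
    return np.pad(w, ((0, 0), (p, p), (p, p)))

def merge_parallel_branches(w3, w1):
    """Hypothetical helper: fold 3x3 + 1x1 + identity branches into one 3x3 kernel."""
    identity = np.ones((w3.shape[0], 1, 1))  # identity == depthwise 1x1 kernel of ones
    return w3 + pad_kernel(w1, 3) + pad_kernel(identity, 3)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))
w3 = rng.standard_normal((4, 3, 3))
w1 = rng.standard_normal((4, 1, 1))

multi_branch = depthwise_conv2d(x, w3) + depthwise_conv2d(x, w1) + x  # training-time form
fused = depthwise_conv2d(x, merge_parallel_branches(w3, w1))          # inference-time form
print(np.allclose(multi_branch, fused))  # True
```

Because the fused block runs a single convolution, the extra training-time branches add no inference cost.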
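Tips: convert a training-time RepNeXt into the inference-time structure:
```python
from timm.models import create_model
import utils  # utils.py from this repository

model = create_model('repnext_m1')  # assumes the RepNeXt models are registered with timm
# Fuse the training-time branches and BatchNorm layers for inference
utils.replace_batchnorm(model)
```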
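Part of such a conversion is folding each inference-mode BatchNorm into the convolution that precedes it, using $W' = W\gamma/\sqrt{\sigma^2+\epsilon}$ and $b' = (b-\mu)\gamma/\sqrt{\sigma^2+\epsilon} + \beta$ per output channel. The NumPy sketch below checks this on a pointwise (1×1) convolution. The helper names are hypothetical, and we assume, without having verified it here, that `utils.replace_batchnorm` applies an equivalent fusion inside the RepNeXt blocks.

```python
import numpy as np

def conv1x1(x, w, b):
    """Pointwise convolution: x (C_in, H, W), w (C_out, C_in), b (C_out,)."""
    return np.einsum('oi,ihw->ohw', w, x) + b[:, None, None]

def batchnorm_eval(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode BatchNorm using stored running statistics."""
    scale = gamma / np.sqrt(var + eps)
    return (y - mean[:, None, None]) * scale[:, None, None] + beta[:, None, None]

def fuse_conv_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Hypothetical helper: fold BatchNorm into the preceding conv's weight and bias."""
    scale = gamma / np.sqrt(var + eps)  # one factor per output channel
    w_fused = w * scale[:, None]        # rescale each output channel's filter
    b_fused = (b - mean) * scale + beta
    return w_fused, b_fused

rng = np.random.default_rng(0)
c_in, c_out = 3, 5
x = rng.standard_normal((c_in, 6, 6))
w = rng.standard_normal((c_out, c_in))
b = rng.standard_normal(c_out)
gamma, beta = rng.standard_normal(c_out), rng.standard_normal(c_out)
mean, var = rng.standard_normal(c_out), rng.uniform(0.5, 2.0, c_out)

reference = batchnorm_eval(conv1x1(x, w, b), gamma, beta, mean, var)  # conv -> BN
w_f, b_f = fuse_conv_bn(w, b, gamma, beta, mean, var)
print(np.allclose(reference, conv1x1(x, w_f, b_f)))  # True
```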
The iPhone 12 (iOS 16) latency reported for RepNeXt is measured with the benchmark tool in Xcode 14.
Tips: export the model to a Core ML model:
```bash
python export_coreml.py --model repnext_m1 --ckpt pretrain/repnext_m1_distill_300e.pth
```
Tips: measure the throughput on a GPU:
```bash
python speed_gpu.py --model repnext_m1
```
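Throughput is typically reported as images processed per second, timed after a few warm-up iterations. The sketch below shows that calculation in a framework-agnostic way; `measure_throughput` is a hypothetical helper, and its batch size, warm-up, and iteration counts are illustrative assumptions rather than the settings used by `speed_gpu.py`.

```python
import time

import numpy as np

def measure_throughput(step, batch_size, warmup=5, iters=20, sync=lambda: None):
    """Hypothetical helper: images per second for a callable that runs one batch.

    step: zero-argument callable performing one forward pass on a batch
    sync: called before reading the clock (e.g. torch.cuda.synchronize on a GPU,
          because CUDA kernels are launched asynchronously)
    """
    for _ in range(warmup):  # warm-up iterations are not timed
        step()
    sync()
    start = time.perf_counter()
    for _ in range(iters):
        step()
    sync()
    elapsed = time.perf_counter() - start
    return batch_size * iters / elapsed

# Stand-in workload: a matrix product in place of a model forward pass.
a = np.random.default_rng(0).standard_normal((256, 256))
ips = measure_throughput(lambda: a @ a, batch_size=64)
print(f"{ips:.1f} images/s")
```

Synchronizing before both clock reads matters on a GPU: without it, the timer may stop while queued kernels are still running, which inflates the measured throughput.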
A conda virtual environment is recommended.
```bash
conda create -n repnext python=3.8
conda activate repnext
pip install -r requirements.txt
```
Download and extract the ImageNet train and val images from http://image-net.org/. The training and validation data are expected to be in the `train` and `val` folders, respectively:
```
# script to extract ImageNet dataset: https://github.com/pytorch/examples/blob/main/imagenet/extract_ILSVRC.sh
# ILSVRC2012_img_train.tar (about 138 GB)
# ILSVRC2012_img_val.tar (about 6.3 GB)
# organize the ImageNet dataset as follows:
imagenet
├── train
│   ├── n01440764
│   │   ├── n01440764_10026.JPEG
│   │   ├── n01440764_10027.JPEG
│   │   ├── ......
│   ├── ......
├── val
│   ├── n01440764
│   │   ├── ILSVRC2012_val_00000293.JPEG
│   │   ├── ILSVRC2012_val_00002138.JPEG
│   │   ├── ......
│   ├── ......
```
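Before training, it can help to confirm that the extracted folders match the layout above. The sketch below uses only the standard library; `check_imagenet_layout` is a hypothetical helper, and it assumes images keep the `.JPEG` extension shown in the tree.

```python
from pathlib import Path

def check_imagenet_layout(root):
    """Hypothetical helper: verify <root>/train and <root>/val share class folders.

    Returns (num_classes, num_train_images, num_val_images).
    """
    root = Path(root).expanduser()
    classes = {}
    for split in ("train", "val"):
        split_dir = root / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing folder: {split_dir}")
        classes[split] = sorted(d.name for d in split_dir.iterdir() if d.is_dir())
    if classes["train"] != classes["val"]:
        raise ValueError("train and val contain different class folders")
    # count images one level below each split, e.g. train/n01440764/*.JPEG
    counts = [sum(1 for _ in (root / s).glob("*/*.JPEG")) for s in ("train", "val")]
    return len(classes["train"]), counts[0], counts[1]
```

For a complete ImageNet-1K copy, `check_imagenet_layout("~/imagenet")` should return `(1000, 1281167, 50000)`: 1000 classes, 1,281,167 training images, and 50,000 validation images.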
To train RepNeXt-M1 on an 8-GPU machine:
```bash
python -m torch.distributed.launch --nproc_per_node=8 --master_port 12346 --use_env main.py --model repnext_m1 --data-path ~/imagenet --dist-eval
```
Tips: specify your data path and model name!
For example, to evaluate RepNeXt-M1:
```bash
python main.py --eval --model repnext_m1 --resume pretrain/repnext_m1_distill_300e.pth --data-path ~/imagenet
```
For example, to evaluate RepNeXt-M1 with the fused model:
```bash
python fuse_eval.py --model repnext_m1 --resume pretrain/repnext_m1_distill_300e_fused.pt --data-path ~/imagenet
```
Object Detection and Instance Segmentation
Model | AP$^{box}$ | AP$^{box}_{50}$ | AP$^{box}_{75}$ | AP$^{mask}$ | AP$^{mask}_{50}$ | AP$^{mask}_{75}$ | Latency | Ckpt | Log |
---|---|---|---|---|---|---|---|---|---|
RepNeXt-M3 | 40.8 | 62.4 | 44.7 | 37.8 | 59.5 | 40.6 | 5.1ms | M3 | M3 |
RepNeXt-M4 | 42.9 | 64.4 | 47.2 | 39.1 | 61.7 | 41.7 | 6.6ms | M4 | M4 |
RepNeXt-M5 | 44.7 | 66.0 | 49.2 | 40.7 | 63.5 | 43.6 | 10.4ms | M5 | M5 |
Semantic Segmentation
Model | mIoU | Latency | Ckpt | Log |
---|---|---|---|---|
RepNeXt-M3 | 40.6 | 5.1ms | M3 | M3 |
RepNeXt-M4 | 43.3 | 6.6ms | M4 | M4 |
RepNeXt-M5 | 45.0 | 10.4ms | M5 | M5 |
Run the feature map visualization demo:
Feature map visualization (panels: Original Image, Identity, RepDWConvS, RepDWConvM, DWConvL).
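The demo's exact procedure is not described here. One common way to render a branch output, such as those of the Identity, RepDWConvS, RepDWConvM, and DWConvL branches, is to average absolute activations over channels and rescale them to 8-bit. `feature_map_to_image` is a hypothetical helper for illustration, not the repository's visualization code.

```python
import numpy as np

def feature_map_to_image(fmap):
    """Hypothetical helper: collapse a (C, H, W) feature map into an (H, W) uint8 image.

    Absolute activations are averaged over channels, then min-max scaled to 0..255.
    """
    heat = np.abs(fmap).mean(axis=0)
    lo, hi = heat.min(), heat.max()
    if hi - lo < 1e-12:  # constant map: avoid dividing by zero
        return np.zeros(heat.shape, dtype=np.uint8)
    return np.round(255 * (heat - lo) / (hi - lo)).astype(np.uint8)

# Stand-in for one branch's output (e.g. a 16-channel, 7x7 feature map).
fmap = np.random.default_rng(0).standard_normal((16, 7, 7))
img = feature_map_to_image(fmap)
print(img.shape, img.dtype)
```

Averaging absolute values keeps strongly negative responses visible; taking a single channel or the per-pixel channel maximum are equally reasonable alternatives.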
The classification (ImageNet) code base is partly built on LeViT, PoolFormer, EfficientFormer, and RepViT.
The detection and segmentation pipeline is from MMCV (MMDetection and MMSegmentation).
Thanks for the great implementations!
If our code or models help your work, please cite our paper:
```bibtex
@misc{zhao2024repnext,
  title={RepNeXt: A Fast Multi-Scale CNN using Structural Reparameterization},
  author={Mingshu Zhao and Yi Luo and Yong Ouyang},
  year={2024},
  eprint={2406.16004},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```