
RepNeXt: A Fast Multi-Scale CNN using Structural Reparameterization

Top-1 accuracy is measured on ImageNet-1K; latency is measured on an iPhone 12 (iOS 16), averaged over 20 experimental runs.

RepNeXt: A Fast Multi-Scale CNN using Structural Reparameterization.
Mingshu Zhao, Yi Luo, and Yong Ouyang [arXiv]

*Architecture overview figure.*

**Abstract** We introduce RepNeXt, a novel model series that integrates multi-scale feature representations and incorporates both serial and parallel structural reparameterization (SRP) to increase network depth and width without compromising inference speed. Extensive experiments demonstrate RepNeXt's superiority over current leading lightweight CNNs and ViTs, delivering favorable latency across various vision benchmarks. RepNeXt-M4 matches RepViT-M1.5's 82.3% accuracy on ImageNet within 1.5 ms on an iPhone 12, outperforms its AP$^{box}$ by 1.1 on MS-COCO, and uses 0.7 M fewer parameters.
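Parallel SRP works because convolution is linear: summing the outputs of parallel branches equals a single convolution whose kernel is the sum of the (zero-padded) branch kernels. Below is a minimal single-channel NumPy sketch of that identity; it is an illustration of the principle, not the paper's actual merge code.

```python
import numpy as np

def conv2d(x, k):
    """Valid 2D cross-correlation of a single-channel image x with kernel k."""
    kh, kw = k.shape
    H, W = x.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def merge_parallel_branches(k3, k1):
    """Fold a parallel 1x1 branch into a 3x3 kernel: zero-pad to 3x3, then add."""
    k1_padded = np.zeros_like(k3)
    k1_padded[1, 1] = k1[0, 0]
    return k3 + k1_padded

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
k3 = rng.standard_normal((3, 3))
k1 = rng.standard_normal((1, 1))

# Training-time structure: two parallel branches whose outputs are summed.
# The 1x1 branch runs on the center crop so both outputs share the same
# spatial extent as the 3x3 'valid' convolution.
y_train = conv2d(x, k3) + conv2d(x[1:-1, 1:-1], k1)

# Inference-time structure: one merged 3x3 convolution, same result.
y_infer = conv2d(x, merge_parallel_branches(k3, k1))
assert np.allclose(y_train, y_infer)
```

The same linearity argument extends per input/output channel to the multi-branch depthwise convolutions used in the paper.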

Classification on ImageNet-1K

Models

| Model | Top-1 (300e) | #Params | MACs | Latency | Ckpt | Core ML | Log |
|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
| M1 | 78.8 | 4.8M | 0.8G | 0.86ms | fused 300e / 300e | 300e distill | 300e / 300e |
| M2 | 80.1 | 6.5M | 1.1G | 1.00ms | fused 300e / 300e | 300e distill | 300e / 300e |
| M3 | 80.7 | 7.8M | 1.3G | 1.11ms | fused 300e / 300e | 300e distill | 300e / 300e |
| M4 | 82.3 | 13.3M | 2.3G | 1.48ms | fused 300e / 300e | 300e distill | 300e / 300e |
| M5 | 83.3 | 21.7M | 4.5G | 2.20ms | fused 300e / 300e | 300e distill | 300e / 300e |

Tip: convert a training-time RepNeXt into the inference-time structure:

```python
from timm.models import create_model

import utils

model = create_model('repnext_m1')
# Fuse the training-time BatchNorm layers into the preceding convolutions,
# yielding the plain inference-time structure.
utils.replace_batchnorm(model)
```
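`utils.replace_batchnorm` folds each BatchNorm into the convolution that precedes it. The arithmetic behind such a fusion can be sketched with NumPy; this is an illustrative re-derivation, not the repo's implementation, using a 1x1 convolution, which acts as a per-pixel matrix multiply.

```python
import numpy as np

def fuse_conv_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm statistics into the preceding conv's weight and bias.

    y = gamma * (conv(x) - mean) / sqrt(var + eps) + beta
      = conv'(x)  with  w' = w * s,  b' = (b - mean) * s + beta,
      where s = gamma / sqrt(var + eps) is a per-output-channel scale.
    """
    s = gamma / np.sqrt(var + eps)
    return w * s[:, None], (b - mean) * s + beta

rng = np.random.default_rng(1)
cin, cout = 4, 3
w = rng.standard_normal((cout, cin))   # a 1x1 conv is a matmul per pixel
b = rng.standard_normal(cout)
gamma, beta = rng.standard_normal(cout), rng.standard_normal(cout)
mean, var = rng.standard_normal(cout), rng.random(cout) + 0.1

x = rng.standard_normal(cin)
# Training-time path: conv followed by BatchNorm (inference statistics).
y_bn = gamma * ((w @ x + b) - mean) / np.sqrt(var + 1e-5) + beta
# Inference-time path: single fused conv.
wf, bf = fuse_conv_bn(w, b, gamma, beta, mean, var)
y_fused = wf @ x + bf
assert np.allclose(y_bn, y_fused)
```

The same scale-and-shift folding applies unchanged to k×k and depthwise convolutions, with `s` broadcast over the kernel's spatial dimensions.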

Latency Measurement

The latency reported for RepNeXt on an iPhone 12 (iOS 16) is measured with the benchmark tool in Xcode 14.

- RepNeXt-M1
- RepNeXt-M2
- RepNeXt-M3
- RepNeXt-M4
- RepNeXt-M5

Tip: export the model to Core ML format:

```shell
python export_coreml.py --model repnext_m1 --ckpt pretrain/repnext_m1_distill_300e.pth
```

Tip: measure throughput on a GPU:

```shell
python speed_gpu.py --model repnext_m1
```
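Whatever `speed_gpu.py` does internally, throughput measurement generally boils down to warmup iterations followed by timed ones. A framework-agnostic sketch (the helper below is illustrative, not the script's actual code; the model is stood in for by any callable that processes one batch):

```python
import time

def measure_throughput(fn, batch_size, warmup=10, iters=50):
    """Return samples/second for a callable that processes one batch per call."""
    for _ in range(warmup):
        fn()  # warmup: exclude one-time costs (allocator, JIT, caches)
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    elapsed = time.perf_counter() - start
    return iters * batch_size / elapsed

# Example with a no-op stand-in for model inference.
tp = measure_throughput(lambda: None, batch_size=256)
```

With PyTorch on a GPU, the callable should run the forward pass under `torch.no_grad()` and end with `torch.cuda.synchronize()`, so that asynchronously launched kernels are fully counted before the clock is read.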

ImageNet

Prerequisites

A conda virtual environment is recommended:

```shell
conda create -n repnext python=3.8
conda activate repnext
pip install -r requirements.txt
```

Data preparation

Download and extract the ImageNet train and val images from http://image-net.org/. The training and validation data are expected to be in the `train` and `val` folders, respectively:

```
# Script to extract the ImageNet dataset: https://github.com/pytorch/examples/blob/main/imagenet/extract_ILSVRC.sh
# ILSVRC2012_img_train.tar (about 138 GB)
# ILSVRC2012_img_val.tar (about 6.3 GB)
# Organize the ImageNet dataset as follows:
imagenet
├── train
│   ├── n01440764
│   │   ├── n01440764_10026.JPEG
│   │   ├── n01440764_10027.JPEG
│   │   ├── ......
│   ├── ......
├── val
│   ├── n01440764
│   │   ├── ILSVRC2012_val_00000293.JPEG
│   │   ├── ILSVRC2012_val_00002138.JPEG
│   │   ├── ......
│   ├── ......
```
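The layout above can be sanity-checked with a few lines of stdlib Python before launching a long training run (`check_imagenet_layout` is an illustrative helper, not part of the repo):

```python
from pathlib import Path

def check_imagenet_layout(root):
    """Verify train/ and val/ exist and share the same synset (class) folders."""
    root = Path(root)
    train_dir, val_dir = root / "train", root / "val"
    if not (train_dir.is_dir() and val_dir.is_dir()):
        raise FileNotFoundError(f"expected {train_dir} and {val_dir}")
    train_classes = {p.name for p in train_dir.iterdir() if p.is_dir()}
    val_classes = {p.name for p in val_dir.iterdir() if p.is_dir()}
    if train_classes != val_classes:
        raise ValueError("train/ and val/ class folders differ")
    return len(train_classes)  # 1000 for the full ImageNet-1K

# Example: check_imagenet_layout("~/imagenet") should return 1000.
```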

Training

To train RepNeXt-M1 on an 8-GPU machine:

```shell
python -m torch.distributed.launch --nproc_per_node=8 --master_port 12346 --use_env main.py --model repnext_m1 --data-path ~/imagenet --dist-eval
```

Tip: remember to specify your own data path and model name.

Testing

For example, to test RepNeXt-M1:

```shell
python main.py --eval --model repnext_m1 --resume pretrain/repnext_m1_distill_300e.pth --data-path ~/imagenet
```

Fused model evaluation

For example, to evaluate RepNeXt-M1 with the fused model (a Colab notebook is available):

```shell
python fuse_eval.py --model repnext_m1 --resume pretrain/repnext_m1_distill_300e_fused.pt --data-path ~/imagenet
```

Downstream Tasks

Object Detection and Instance Segmentation

| Model | $AP^b$ | $AP_{50}^b$ | $AP_{75}^b$ | $AP^m$ | $AP_{50}^m$ | $AP_{75}^m$ | Latency | Ckpt | Log |
|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
| RepNeXt-M3 | 40.8 | 62.4 | 44.7 | 37.8 | 59.5 | 40.6 | 5.1ms | M3 | M3 |
| RepNeXt-M4 | 42.9 | 64.4 | 47.2 | 39.1 | 61.7 | 41.7 | 6.6ms | M4 | M4 |
| RepNeXt-M5 | 44.7 | 66.0 | 49.2 | 40.7 | 63.5 | 43.6 | 10.4ms | M5 | M5 |

Semantic Segmentation

| Model | mIoU | Latency | Ckpt | Log |
|:-:|:-:|:-:|:-:|:-:|
| RepNeXt-M3 | 40.6 | 5.1ms | M3 | M3 |
| RepNeXt-M4 | 43.3 | 6.6ms | M4 | M4 |
| RepNeXt-M5 | 45.0 | 10.4ms | M5 | M5 |

Feature Map Visualization

Run the feature map visualization demo (a Colab notebook is available):

*Feature maps (left to right): Original Image, Identity, RepDWConvS, RepDWConvM, DWConvL.*

Acknowledgement

The classification (ImageNet) codebase is partly built on LeViT, PoolFormer, EfficientFormer, and RepViT.

The detection and segmentation pipelines are built on MMDetection and MMSegmentation from the MMCV/OpenMMLab ecosystem.

Thanks for the great implementations!

Citation

If our code or models help your work, please cite our paper:

```bibtex
@misc{zhao2024repnext,
      title={RepNeXt: A Fast Multi-Scale CNN using Structural Reparameterization},
      author={Mingshu Zhao and Yi Luo and Yong Ouyang},
      year={2024},
      eprint={2406.16004},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```