
RepNeXt: A Fast Multi-Scale CNN using Structural Reparameterization

Top-1 accuracy is measured on ImageNet-1K; latency is measured on an iPhone 12 (iOS 16), averaged over 20 experimental runs.

RepNeXt: A Fast Multi-Scale CNN using Structural Reparameterization.
Mingshu Zhao, Yi Luo, and Yong Ouyang [arXiv]

*Architecture overview figure.*

**Abstract** We introduce RepNeXt, a novel model series that integrates multi-scale feature representations and incorporates both serial and parallel structural reparameterization (SRP) to increase network depth and width without compromising inference speed. Extensive experiments demonstrate RepNeXt's superiority over current leading lightweight CNNs and ViTs, delivering favorable latency across various vision benchmarks. RepNeXt-M4 matches RepViT-M1.5's 82.3% accuracy on ImageNet within 1.5 ms on an iPhone 12, outperforms its AP$^{box}$ by 1.1 on MS-COCO, and uses 0.7 M fewer parameters.
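Parallel SRP works because convolution is linear: summing the outputs of parallel branches equals a single convolution whose kernel is the sum of the (zero-padded) branch kernels. Below is a minimal single-channel NumPy sketch of that identity; it is an illustration of the principle, not the paper's actual merge code.

```python
import numpy as np

def conv2d(x, k):
    """Valid 2D cross-correlation of a single-channel image x with kernel k."""
    kh, kw = k.shape
    H, W = x.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def merge_parallel_branches(k3, k1):
    """Fold a parallel 1x1 branch into a 3x3 kernel: zero-pad to 3x3, then add."""
    k1_padded = np.zeros_like(k3)
    k1_padded[1, 1] = k1[0, 0]
    return k3 + k1_padded

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
k3 = rng.standard_normal((3, 3))
k1 = rng.standard_normal((1, 1))

# Training-time structure: two parallel branches whose outputs are summed.
# The 1x1 branch runs on the center crop so both outputs share the same
# spatial extent as the 3x3 'valid' convolution.
y_train = conv2d(x, k3) + conv2d(x[1:-1, 1:-1], k1)

# Inference-time structure: one merged 3x3 convolution, same result.
y_infer = conv2d(x, merge_parallel_branches(k3, k1))
assert np.allclose(y_train, y_infer)
```

The same linearity argument extends per input/output channel to the multi-branch depthwise convolutions used in the paper.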

Classification on ImageNet-1K

Models

| Model | Top-1 (300e) | #Params | MACs | Latency | Ckpt | Core ML | Log |
|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
| M1 | 78.8 | 4.8M | 0.8G | 0.86ms | fused 300e / 300e | 300e distill | 300e / 300e |
| M2 | 80.1 | 6.5M | 1.1G | 1.00ms | fused 300e / 300e | 300e distill | 300e / 300e |
| M3 | 80.7 | 7.8M | 1.3G | 1.11ms | fused 300e / 300e | 300e distill | 300e / 300e |
| M4 | 82.3 | 13.3M | 2.3G | 1.48ms | fused 300e / 300e | 300e distill | 300e / 300e |
| M5 | 83.3 | 21.7M | 4.5G | 2.20ms | fused 300e / 300e | 300e distill | 300e / 300e |

Tip: convert a training-time RepNeXt into the inference-time structure:

```python
from timm.models import create_model

import utils

model = create_model('repnext_m1')
# Fuse the training-time BatchNorm layers into the preceding convolutions,
# yielding the plain inference-time structure.
utils.replace_batchnorm(model)
```
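`utils.replace_batchnorm` folds each BatchNorm into the convolution that precedes it. The arithmetic behind such a fusion can be sketched with NumPy; this is an illustrative re-derivation, not the repo's implementation, using a 1x1 convolution, which acts as a per-pixel matrix multiply.

```python
import numpy as np

def fuse_conv_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm statistics into the preceding conv's weight and bias.

    y = gamma * (conv(x) - mean) / sqrt(var + eps) + beta
      = conv'(x)  with  w' = w * s,  b' = (b - mean) * s + beta,
      where s = gamma / sqrt(var + eps) is a per-output-channel scale.
    """
    s = gamma / np.sqrt(var + eps)
    return w * s[:, None], (b - mean) * s + beta

rng = np.random.default_rng(1)
cin, cout = 4, 3
w = rng.standard_normal((cout, cin))   # a 1x1 conv is a matmul per pixel
b = rng.standard_normal(cout)
gamma, beta = rng.standard_normal(cout), rng.standard_normal(cout)
mean, var = rng.standard_normal(cout), rng.random(cout) + 0.1

x = rng.standard_normal(cin)
# Training-time path: conv followed by BatchNorm (inference statistics).
y_bn = gamma * ((w @ x + b) - mean) / np.sqrt(var + 1e-5) + beta
# Inference-time path: single fused conv.
wf, bf = fuse_conv_bn(w, b, gamma, beta, mean, var)
y_fused = wf @ x + bf
assert np.allclose(y_bn, y_fused)
```

The same scale-and-shift folding applies unchanged to k×k and depthwise convolutions, with `s` broadcast over the kernel's spatial dimensions.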

Latency Measurement

The latency reported for RepNeXt on an iPhone 12 (iOS 16) is measured with the benchmark tool in Xcode 14.

- RepNeXt-M1
- RepNeXt-M2
- RepNeXt-M3
- RepNeXt-M4
- RepNeXt-M5

Tip: export the model to Core ML format:

```shell
python export_coreml.py --model repnext_m1 --ckpt pretrain/repnext_m1_distill_300e.pth
```

Tip: measure throughput on a GPU:

```shell
python speed_gpu.py --model repnext_m1
```
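Whatever `speed_gpu.py` does internally, throughput measurement generally boils down to warmup iterations followed by timed ones. A framework-agnostic sketch (the helper below is illustrative, not the script's actual code; the model is stood in for by any callable that processes one batch):

```python
import time

def measure_throughput(fn, batch_size, warmup=10, iters=50):
    """Return samples/second for a callable that processes one batch per call."""
    for _ in range(warmup):
        fn()  # warmup: exclude one-time costs (allocator, JIT, caches)
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    elapsed = time.perf_counter() - start
    return iters * batch_size / elapsed

# Example with a no-op stand-in for model inference.
tp = measure_throughput(lambda: None, batch_size=256)
```

With PyTorch on a GPU, the callable should run the forward pass under `torch.no_grad()` and end with `torch.cuda.synchronize()`, so that asynchronously launched kernels are fully counted before the clock is read.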

ImageNet

Prerequisites

A conda virtual environment is recommended:

```shell
conda create -n repnext python=3.8
conda activate repnext
pip install -r requirements.txt
```

Data preparation

Download and extract the ImageNet train and val images from http://image-net.org/. The training and validation data are expected to be in the `train` and `val` folders, respectively:

```
# Script to extract the ImageNet dataset: https://github.com/pytorch/examples/blob/main/imagenet/extract_ILSVRC.sh
# ILSVRC2012_img_train.tar (about 138 GB)
# ILSVRC2012_img_val.tar (about 6.3 GB)
# Organize the ImageNet dataset as follows:
imagenet
├── train
│   ├── n01440764
│   │   ├── n01440764_10026.JPEG
│   │   ├── n01440764_10027.JPEG
│   │   ├── ......
│   ├── ......
├── val
│   ├── n01440764
│   │   ├── ILSVRC2012_val_00000293.JPEG
│   │   ├── ILSVRC2012_val_00002138.JPEG
│   │   ├── ......
│   ├── ......
```
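The layout above can be sanity-checked with a few lines of stdlib Python before launching a long training run (`check_imagenet_layout` is an illustrative helper, not part of the repo):

```python
from pathlib import Path

def check_imagenet_layout(root):
    """Verify train/ and val/ exist and share the same synset (class) folders."""
    root = Path(root)
    train_dir, val_dir = root / "train", root / "val"
    if not (train_dir.is_dir() and val_dir.is_dir()):
        raise FileNotFoundError(f"expected {train_dir} and {val_dir}")
    train_classes = {p.name for p in train_dir.iterdir() if p.is_dir()}
    val_classes = {p.name for p in val_dir.iterdir() if p.is_dir()}
    if train_classes != val_classes:
        raise ValueError("train/ and val/ class folders differ")
    return len(train_classes)  # 1000 for the full ImageNet-1K

# Example: check_imagenet_layout("~/imagenet") should return 1000.
```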

Training

To train RepNeXt-M1 on an 8-GPU machine:

```shell
python -m torch.distributed.launch --nproc_per_node=8 --master_port 12346 --use_env main.py --model repnext_m1 --data-path ~/imagenet --dist-eval
```

Tip: remember to specify your own data path and model name.

Testing

For example, to test RepNeXt-M1:

```shell
python main.py --eval --model repnext_m1 --resume pretrain/repnext_m1_distill_300e.pth --data-path ~/imagenet
```

Fused model evaluation

For example, to evaluate RepNeXt-M1 with the fused model (a Colab notebook is available):

```shell
python fuse_eval.py --model repnext_m1 --resume pretrain/repnext_m1_distill_300e_fused.pt --data-path ~/imagenet
```

Downstream Tasks

Object Detection and Instance Segmentation

| Model | $AP^b$ | $AP_{50}^b$ | $AP_{75}^b$ | $AP^m$ | $AP_{50}^m$ | $AP_{75}^m$ | Latency | Ckpt | Log |
|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
| RepNeXt-M3 | 40.8 | 62.4 | 44.7 | 37.8 | 59.5 | 40.6 | 5.1ms | M3 | M3 |
| RepNeXt-M4 | 42.9 | 64.4 | 47.2 | 39.1 | 61.7 | 41.7 | 6.6ms | M4 | M4 |
| RepNeXt-M5 | 44.7 | 66.0 | 49.2 | 40.7 | 63.5 | 43.6 | 10.4ms | M5 | M5 |

Semantic Segmentation

| Model | mIoU | Latency | Ckpt | Log |
|:-:|:-:|:-:|:-:|:-:|
| RepNeXt-M3 | 40.6 | 5.1ms | M3 | M3 |
| RepNeXt-M4 | 43.3 | 6.6ms | M4 | M4 |
| RepNeXt-M5 | 45.0 | 10.4ms | M5 | M5 |

Feature Map Visualization

Run the feature map visualization demo (a Colab notebook is available):

*Feature maps (left to right): Original Image, Identity, RepDWConvS, RepDWConvM, DWConvL.*

Acknowledgement

The classification (ImageNet) codebase is partly built on LeViT, PoolFormer, EfficientFormer, and RepViT.

The detection and segmentation pipelines are built on MMDetection and MMSegmentation from the MMCV/OpenMMLab ecosystem.

Thanks for the great implementations!

Citation

If our code or models help your work, please cite our paper:

```bibtex
@misc{zhao2024repnext,
      title={RepNeXt: A Fast Multi-Scale CNN using Structural Reparameterization},
      author={Mingshu Zhao and Yi Luo and Yong Ouyang},
      year={2024},
      eprint={2406.16004},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```