This repository contains an op-for-op PyTorch reimplementation of A ConvNet for the 2020s.
Supported datasets include MNIST, CIFAR10/CIFAR100, TinyImageNet_200, MiniImageNet_1K, ImageNet_1K, Caltech101/Caltech256, and more.
Please refer to the README.md in the `data` directory for how to prepare a dataset.
Both training and testing only require modifying the `config.py` file.
To test, modify `config.py` as follows, then run `test.py`:

- line 29: change `model_arch_name` to `convnext_tiny`.
- line 31: change `model_mean_parameters` to `[0.485, 0.456, 0.406]`.
- line 32: change `model_std_parameters` to `[0.229, 0.224, 0.225]`.
- line 34: change `model_num_classes` to `1000`.
- line 36: change `mode` to `test`.
- line 91: change `model_weights_path` to `./results/pretrained_models/ConvNext_tiny-ImageNet_1K-b03a77c2.pth.tar`.

```bash
python3 test.py
```
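Taken together, the edits above amount to a `config.py` excerpt like the following. This is a sketch, not the repository's actual file: the variable names and line numbers come from the steps above, and everything else in the real `config.py` is left untouched.

```python
# config.py (excerpt) - values set for testing ConvNeXt-Tiny on ImageNet_1K.
# Line numbers in the comments refer to the steps listed above.
model_arch_name = "convnext_tiny"                       # line 29
model_mean_parameters = [0.485, 0.456, 0.406]           # line 31: ImageNet channel means
model_std_parameters = [0.229, 0.224, 0.225]            # line 32: ImageNet channel stds
model_num_classes = 1000                                # line 34: ImageNet_1K classes
mode = "test"                                           # line 36
model_weights_path = "./results/pretrained_models/ConvNext_tiny-ImageNet_1K-b03a77c2.pth.tar"  # line 91
```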
To train, modify `config.py` as follows, then run `train.py`:

- line 29: change `model_arch_name` to `convnext_tiny`.
- line 31: change `model_mean_parameters` to `[0.485, 0.456, 0.406]`.
- line 32: change `model_std_parameters` to `[0.229, 0.224, 0.225]`.
- line 34: change `model_num_classes` to `1000`.
- line 36: change `mode` to `train`.
- line 51: change `pretrained_model_weights_path` to `./results/pretrained_models/ConvNext_tiny-ImageNet_1K-b03a77c2.pth.tar`.

```bash
python3 train.py
```
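Conceptually, the `pretrained_model_weights_path` setting makes training start from saved weights rather than a random initialization. A minimal sketch of that step is below; the `"state_dict"` checkpoint key is an assumption about the `.pth.tar` layout, and the repository's own `train.py` defines the actual loading logic.

```python
import torch

def load_pretrained(model: torch.nn.Module, weights_path: str) -> torch.nn.Module:
    """Load pretrained weights into `model` before training.

    Assumes the checkpoint is either a raw state dict or a dict with a
    "state_dict" entry (a common .pth.tar convention, not verified here).
    """
    checkpoint = torch.load(weights_path, map_location="cpu")
    state_dict = checkpoint.get("state_dict", checkpoint)
    # strict=False tolerates head-size mismatches when fine-tuning.
    model.load_state_dict(state_dict, strict=False)
    return model
```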
To resume training, modify `config.py` as follows, then run `train.py`:

- line 29: change `model_arch_name` to `convnext_tiny`.
- line 31: change `model_mean_parameters` to `[0.485, 0.456, 0.406]`.
- line 32: change `model_std_parameters` to `[0.229, 0.224, 0.225]`.
- line 34: change `model_num_classes` to `1000`.
- line 36: change `mode` to `train`.
- line 54: change `resume` to `./samples/convnext_tiny-ImageNet_1K/epoch_xxx.pth.tar`.

```bash
python3 train.py
```
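Unlike loading pretrained weights, resuming restores the full training state (model, optimizer, and epoch counter) so training continues where it stopped. A hedged sketch of the idea, with assumed checkpoint key names (the repository's `train.py` defines the real ones):

```python
import torch

def save_checkpoint(model, optimizer, epoch, path):
    # Persist everything needed to continue training later.
    torch.save({"epoch": epoch,
                "state_dict": model.state_dict(),
                "optimizer": optimizer.state_dict()}, path)

def resume_checkpoint(model, optimizer, path):
    # Restore model and optimizer state; return the epoch to continue from.
    checkpoint = torch.load(path, map_location="cpu")
    model.load_state_dict(checkpoint["state_dict"])
    optimizer.load_state_dict(checkpoint["optimizer"])
    return checkpoint["epoch"] + 1
```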
Source of original paper results: https://arxiv.org/pdf/2201.03545v2.pdf

In the following table, the top-x error value in `()` indicates the result of this project, and `-` indicates the metric was not tested.
| Model | Dataset | Top-1 error (val) | Top-5 error (val) |
|---|---|---|---|
| convnext_tiny | ImageNet_1K | 17.9%(17.5%) | -(3.9%) |
| convnext_small | ImageNet_1K | 16.9%(16.4%) | -(3.4%) |
| convnext_base | ImageNet_1K | 15.9%(15.9%) | -(3.1%) |
| convnext_large | ImageNet_1K | 14.5%(15.6%) | -(3.0%) |
```bash
# Download `ConvNext_tiny-ImageNet_1K-b03a77c2.pth.tar` weights to `./results/pretrained_models`
# More detail see `README.md<Download weights>`
python3 ./inference.py
```
Input:
Output:

```text
Build `convnext_tiny` model successfully.
Load `convnext_tiny` model weights `/ConvNext-PyTorch/results/pretrained_models/ConvNext_tiny-ImageNet_1K-b03a77c2.pth.tar` successfully.
tench, Tinca tinca (38.61%)
barracouta, snoek (2.95%)
gar, garfish, garpike, billfish, Lepisosteus osseus (0.53%)
reel (0.52%)
croquet ball (0.36%)
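The inference output above boils down to three steps: normalize the image with the mean/std values from `config.py`, run the model, and report the top-5 classes with softmax probabilities. The sketch below illustrates those steps under stated assumptions; it is not the repository's `inference.py`, and the `model` and class-name list are supplied by the caller.

```python
import torch

# ImageNet channel statistics, as set in config.py.
MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

def predict_top5(model, image, class_names):
    """Classify one image and return the five most probable classes.

    `image` is assumed to be a (3, 224, 224) float tensor scaled to [0, 1]
    (decoding/resizing of the input file is out of scope for this sketch).
    """
    batch = ((image - MEAN) / STD).unsqueeze(0)  # normalize, add batch dim
    model.eval()
    with torch.no_grad():
        probs = torch.softmax(model(batch), dim=1)[0]
    values, indices = torch.topk(probs, k=5)
    return [(class_names[int(i)], float(p)) for p, i in zip(values, indices)]
```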
If you find a bug, create a GitHub issue, or even better, submit a pull request. Similarly, if you have questions, simply post them as GitHub issues.
I look forward to seeing what the community does with these models!
Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie
The "Roaring 20s" of visual recognition began with the introduction of Vision Transformers (ViTs), which quickly superseded ConvNets as the state-of-the-art image classification model. A vanilla ViT, on the other hand, faces difficulties when applied to general computer vision tasks such as object detection and semantic segmentation. It is the hierarchical Transformers (e.g., Swin Transformers) that reintroduced several ConvNet priors, making Transformers practically viable as a generic vision backbone and demonstrating remarkable performance on a wide variety of vision tasks. However, the effectiveness of such hybrid approaches is still largely credited to the intrinsic superiority of Transformers, rather than the inherent inductive biases of convolutions. In this work, we reexamine the design spaces and test the limits of what a pure ConvNet can achieve. We gradually "modernize" a standard ResNet toward the design of a vision Transformer, and discover several key components that contribute to the performance difference along the way. The outcome of this exploration is a family of pure ConvNet models dubbed ConvNeXt. Constructed entirely from standard ConvNet modules, ConvNeXts compete favorably with Transformers in terms of accuracy and scalability, achieving 87.8% ImageNet top-1 accuracy and outperforming Swin Transformers on COCO detection and ADE20K segmentation, while maintaining the simplicity and efficiency of standard ConvNets.
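The "modernized" block the abstract alludes to can be sketched in a few lines of PyTorch. This is a simplified illustration of the ConvNeXt block design (depthwise 7x7 convolution, channels-last LayerNorm, inverted-bottleneck MLP with GELU, residual connection); it omits details of the paper's full implementation such as layer scale and stochastic depth.

```python
import torch
import torch.nn as nn

class ConvNeXtBlock(nn.Module):
    """Simplified ConvNeXt block: dwconv 7x7 -> LayerNorm -> 1x1 expand (4x)
    -> GELU -> 1x1 project, with a residual connection."""

    def __init__(self, dim: int):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)
        self.norm = nn.LayerNorm(dim)          # applied in channels-last layout
        self.pwconv1 = nn.Linear(dim, 4 * dim)  # pointwise conv as Linear
        self.act = nn.GELU()
        self.pwconv2 = nn.Linear(4 * dim, dim)

    def forward(self, x):
        shortcut = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)              # (N, C, H, W) -> (N, H, W, C)
        x = self.pwconv2(self.act(self.pwconv1(self.norm(x))))
        x = x.permute(0, 3, 1, 2)              # back to (N, C, H, W)
        return shortcut + x
```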
```bibtex
@inproceedings{liu2022convnet,
  title={A convnet for the 2020s},
  author={Liu, Zhuang and Mao, Hanzi and Wu, Chao-Yuan and Feichtenhofer, Christoph and Darrell, Trevor and Xie, Saining},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={11976--11986},
  year={2022}
}
```