This repository contains an op-for-op PyTorch reimplementation of A ConvNet for the 2020s.
Supported datasets include MNIST, CIFAR10/CIFAR100, TinyImageNet_200, MiniImageNet_1K, ImageNet_1K, Caltech101/Caltech256, and more.
Please refer to the README.md in the `data` directory for how to prepare a dataset.
Both training and testing only require modifying the `config.py` file.
To test, modify `config.py` as follows, then run `test.py`:

- line 29: change `model_arch_name` to `convnext_tiny`.
- line 31: change `model_mean_parameters` to `[0.485, 0.456, 0.406]`.
- line 32: change `model_std_parameters` to `[0.229, 0.224, 0.225]`.
- line 34: change `model_num_classes` to `1000`.
- line 36: change `mode` to `test`.
- line 91: change `model_weights_path` to `./results/pretrained_models/ConvNext_tiny-ImageNet_1K-b03a77c2.pth.tar`.

```bash
python3 test.py
```
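Taken together, the edits above amount to a `config.py` excerpt like the following. This is a sketch, not the repository's actual file: the variable names and line numbers come from the steps above, and everything else in the real `config.py` is left untouched.

```python
# config.py (excerpt) - values set for testing ConvNeXt-Tiny on ImageNet_1K.
# Line numbers in the comments refer to the steps listed above.
model_arch_name = "convnext_tiny"                       # line 29
model_mean_parameters = [0.485, 0.456, 0.406]           # line 31: ImageNet channel means
model_std_parameters = [0.229, 0.224, 0.225]            # line 32: ImageNet channel stds
model_num_classes = 1000                                # line 34: ImageNet_1K classes
mode = "test"                                           # line 36
model_weights_path = "./results/pretrained_models/ConvNext_tiny-ImageNet_1K-b03a77c2.pth.tar"  # line 91
```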
To train, modify `config.py` as follows, then run `train.py`:

- line 29: change `model_arch_name` to `convnext_tiny`.
- line 31: change `model_mean_parameters` to `[0.485, 0.456, 0.406]`.
- line 32: change `model_std_parameters` to `[0.229, 0.224, 0.225]`.
- line 34: change `model_num_classes` to `1000`.
- line 36: change `mode` to `train`.
- line 51: change `pretrained_model_weights_path` to `./results/pretrained_models/ConvNext_tiny-ImageNet_1K-b03a77c2.pth.tar`.

```bash
python3 train.py
```
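Conceptually, the `pretrained_model_weights_path` setting makes training start from saved weights rather than a random initialization. A minimal sketch of that step is below; the `"state_dict"` checkpoint key is an assumption about the `.pth.tar` layout, and the repository's own `train.py` defines the actual loading logic.

```python
import torch

def load_pretrained(model: torch.nn.Module, weights_path: str) -> torch.nn.Module:
    """Load pretrained weights into `model` before training.

    Assumes the checkpoint is either a raw state dict or a dict with a
    "state_dict" entry (a common .pth.tar convention, not verified here).
    """
    checkpoint = torch.load(weights_path, map_location="cpu")
    state_dict = checkpoint.get("state_dict", checkpoint)
    # strict=False tolerates head-size mismatches when fine-tuning.
    model.load_state_dict(state_dict, strict=False)
    return model
```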
To resume training, modify `config.py` as follows, then run `train.py`:

- line 29: change `model_arch_name` to `convnext_tiny`.
- line 31: change `model_mean_parameters` to `[0.485, 0.456, 0.406]`.
- line 32: change `model_std_parameters` to `[0.229, 0.224, 0.225]`.
- line 34: change `model_num_classes` to `1000`.
- line 36: change `mode` to `train`.
- line 54: change `resume` to `./samples/convnext_tiny-ImageNet_1K/epoch_xxx.pth.tar`.

```bash
python3 train.py
```
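Unlike loading pretrained weights, resuming restores the full training state (model, optimizer, and epoch counter) so training continues where it stopped. A hedged sketch of the idea, with assumed checkpoint key names (the repository's `train.py` defines the real ones):

```python
import torch

def save_checkpoint(model, optimizer, epoch, path):
    # Persist everything needed to continue training later.
    torch.save({"epoch": epoch,
                "state_dict": model.state_dict(),
                "optimizer": optimizer.state_dict()}, path)

def resume_checkpoint(model, optimizer, path):
    # Restore model and optimizer state; return the epoch to continue from.
    checkpoint = torch.load(path, map_location="cpu")
    model.load_state_dict(checkpoint["state_dict"])
    optimizer.load_state_dict(checkpoint["optimizer"])
    return checkpoint["epoch"] + 1
```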
Source of original paper results: https://arxiv.org/pdf/2201.03545v2.pdf

In the following table, the top-x error value in `()` indicates the result of this project, and `-` indicates the metric was not tested.
| Model | Dataset | Top-1 error (val) | Top-5 error (val) |
|---|---|---|---|
| convnext_tiny | ImageNet_1K | 17.9%(17.5%) | -(3.9%) |
| convnext_small | ImageNet_1K | 16.9%(16.4%) | -(3.4%) |
| convnext_base | ImageNet_1K | 15.9%(15.9%) | -(3.1%) |
| convnext_large | ImageNet_1K | 14.5%(15.6%) | -(3.0%) |
```bash
# Download `ConvNext_tiny-ImageNet_1K-b03a77c2.pth.tar` weights to `./results/pretrained_models`
# More detail see `README.md<Download weights>`
python3 ./inference.py
```
Input:
Output:

```text
Build `convnext_tiny` model successfully.
Load `convnext_tiny` model weights `/ConvNext-PyTorch/results/pretrained_models/ConvNext_tiny-ImageNet_1K-b03a77c2.pth.tar` successfully.
tench, Tinca tinca (38.61%)
barracouta, snoek (2.95%)
gar, garfish, garpike, billfish, Lepisosteus osseus (0.53%)
reel (0.52%)
croquet ball (0.36%)
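The inference output above boils down to three steps: normalize the image with the mean/std values from `config.py`, run the model, and report the top-5 classes with softmax probabilities. The sketch below illustrates those steps under stated assumptions; it is not the repository's `inference.py`, and the `model` and class-name list are supplied by the caller.

```python
import torch

# ImageNet channel statistics, as set in config.py.
MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

def predict_top5(model, image, class_names):
    """Classify one image and return the five most probable classes.

    `image` is assumed to be a (3, 224, 224) float tensor scaled to [0, 1]
    (decoding/resizing of the input file is out of scope for this sketch).
    """
    batch = ((image - MEAN) / STD).unsqueeze(0)  # normalize, add batch dim
    model.eval()
    with torch.no_grad():
        probs = torch.softmax(model(batch), dim=1)[0]
    values, indices = torch.topk(probs, k=5)
    return [(class_names[int(i)], float(p)) for p, i in zip(values, indices)]
```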
If you find a bug, create a GitHub issue, or even better, submit a pull request. Similarly, if you have questions, simply post them as GitHub issues.
I look forward to seeing what the community does with these models!
Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie
The "Roaring 20s" of visual recognition began with the introduction of Vision Transformers (ViTs), which quickly superseded ConvNets as the state-of-the-art image classification model. A vanilla ViT, on the other hand, faces difficulties when applied to general computer vision tasks such as object detection and semantic segmentation. It is the hierarchical Transformers (e.g., Swin Transformers) that reintroduced several ConvNet priors, making Transformers practically viable as a generic vision backbone and demonstrating remarkable performance on a wide variety of vision tasks. However, the effectiveness of such hybrid approaches is still largely credited to the intrinsic superiority of Transformers, rather than the inherent inductive biases of convolutions. In this work, we reexamine the design spaces and test the limits of what a pure ConvNet can achieve. We gradually "modernize" a standard ResNet toward the design of a vision Transformer, and discover several key components that contribute to the performance difference along the way. The outcome of this exploration is a family of pure ConvNet models dubbed ConvNeXt. Constructed entirely from standard ConvNet modules, ConvNeXts compete favorably with Transformers in terms of accuracy and scalability, achieving 87.8% ImageNet top-1 accuracy and outperforming Swin Transformers on COCO detection and ADE20K segmentation, while maintaining the simplicity and efficiency of standard ConvNets.
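The "modernized" block the abstract alludes to can be sketched in a few lines of PyTorch. This is a simplified illustration of the ConvNeXt block design (depthwise 7x7 convolution, channels-last LayerNorm, inverted-bottleneck MLP with GELU, residual connection); it omits details of the paper's full implementation such as layer scale and stochastic depth.

```python
import torch
import torch.nn as nn

class ConvNeXtBlock(nn.Module):
    """Simplified ConvNeXt block: dwconv 7x7 -> LayerNorm -> 1x1 expand (4x)
    -> GELU -> 1x1 project, with a residual connection."""

    def __init__(self, dim: int):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)
        self.norm = nn.LayerNorm(dim)          # applied in channels-last layout
        self.pwconv1 = nn.Linear(dim, 4 * dim)  # pointwise conv as Linear
        self.act = nn.GELU()
        self.pwconv2 = nn.Linear(4 * dim, dim)

    def forward(self, x):
        shortcut = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)              # (N, C, H, W) -> (N, H, W, C)
        x = self.pwconv2(self.act(self.pwconv1(self.norm(x))))
        x = x.permute(0, 3, 1, 2)              # back to (N, C, H, W)
        return shortcut + x
```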
```bibtex
@inproceedings{liu2022convnet,
  title={A convnet for the 2020s},
  author={Liu, Zhuang and Mao, Hanzi and Wu, Chao-Yuan and Feichtenhofer, Christoph and Darrell, Trevor and Xie, Saining},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={11976--11986},
  year={2022}
}
```