This repository contains SWAG models from the paper Revisiting Weakly Supervised Pre-Training of Visual Perception Models.
This code has been tested to work with Python 3.8, PyTorch 1.10.1 and torchvision 0.11.2.
Note that CUDA support is not required for the tutorials.
To set up PyTorch and torchvision, please follow PyTorch's getting started instructions. If you are using conda on a Linux machine, you can use the following setup instructions -
conda create --name swag python=3.8
conda activate swag
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
We share checkpoints for all the pretrained models in the paper, and their ImageNet-1k finetuned counterparts. The models are available via torch.hub, and we also share URLs to all the checkpoints.
The details of the models, their torch.hub names / checkpoint links, and their performance on ImageNet-1k (IN-1K) are listed below.
Model | Pretrain Resolution | Pretrained Model | Finetune Resolution | IN-1K Finetuned Model | IN-1K Top-1 | IN-1K Top-5 |
---|---|---|---|---|---|---|
RegNetY 16GF | 224 x 224 | regnety_16gf | 384 x 384 | regnety_16gf_in1k | 86.02% | 98.05% |
RegNetY 32GF | 224 x 224 | regnety_32gf | 384 x 384 | regnety_32gf_in1k | 86.83% | 98.36% |
RegNetY 128GF | 224 x 224 | regnety_128gf | 384 x 384 | regnety_128gf_in1k | 88.23% | 98.69% |
ViT B/16 | 224 x 224 | vit_b16 | 384 x 384 | vit_b16_in1k | 85.29% | 97.65% |
ViT L/16 | 224 x 224 | vit_l16 | 512 x 512 | vit_l16_in1k | 88.07% | 98.51% |
ViT H/14 | 224 x 224 | vit_h14 | 518 x 518 | vit_h14_in1k | 88.55% | 98.69% |
The models can be loaded via torch hub using the following command -
import torch
model = torch.hub.load("facebookresearch/swag", model="vit_b16_in1k")
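Since each fine-tuned model expects inputs at its own fine-tuning resolution, it can help to keep the hub name and the resolution together. The following is a minimal sketch, not part of this repository: `FINETUNE_RESOLUTION` and `load_in1k_model` are hypothetical names, and the resolutions are copied from the table above. The `torch` import is deferred so the lookup table can be used without triggering a checkpoint download.

```python
# Fine-tuning resolutions (square side length in pixels), copied from the model table.
FINETUNE_RESOLUTION = {
    "regnety_16gf_in1k": 384,
    "regnety_32gf_in1k": 384,
    "regnety_128gf_in1k": 384,
    "vit_b16_in1k": 384,
    "vit_l16_in1k": 512,
    "vit_h14_in1k": 518,
}


def load_in1k_model(name):
    """Hypothetical helper: return (model in eval mode, input resolution)."""
    if name not in FINETUNE_RESOLUTION:
        raise KeyError(f"unknown IN-1K fine-tuned model: {name!r}")
    # Imported lazily; torch.hub.load fetches the repository and checkpoint,
    # so this step needs network access on first use.
    import torch

    model = torch.hub.load("facebookresearch/swag", model=name)
    model.eval()  # switch to inference mode
    return model, FINETUNE_RESOLUTION[name]
```

For example, `model, resolution = load_in1k_model("vit_b16_in1k")` would give the ViT B/16 model together with the side length (384) to resize input images to.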
For a tutorial with step-by-step instructions to perform inference, follow our inference tutorial and run it locally.
The SWAG web demo and Docker image are on Replicate, where you can try out the demo with all the checkpoints.
SWAG has been integrated into Hugging Face Spaces 🤗 using Gradio, where you can try out the web demo.
Credits: AK391
We also provide a script to evaluate the accuracy of our models on ImageNet 1K, imagenet_1k_eval.py. This script is a slightly modified version of the PyTorch ImageNet example which supports our models.
To evaluate the RegNetY 16GF IN1K model on a single node (one or more GPUs), one can simply run the following command -
python imagenet_1k_eval.py -m regnety_16gf_in1k -r 384 -b 400 /path/to/imagenet_1k/root/
Note that we specify a 384 x 384 resolution since that was the model's training resolution, and also specify a mini-batch size of 400, which is distributed over all the GPUs in the node. For larger models or with fewer GPUs, the batch size will need to be reduced. See the PyTorch ImageNet example README for more details.
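To get a rough sense of how much work each GPU receives for a given `-b` value, the arithmetic can be sketched as below. This assumes the mini-batch is split evenly across GPUs, which is an approximation and not taken from the evaluation script itself; `per_gpu_batch_size` is a hypothetical helper name.

```python
def per_gpu_batch_size(total_batch_size, num_gpus):
    """Approximate number of images each GPU processes per step,
    assuming an even split of the mini-batch across GPUs."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return total_batch_size // num_gpus


# A mini-batch of 400 on an 8-GPU node versus a 2-GPU node.
print(per_gpu_batch_size(400, 8))  # 50
print(per_gpu_batch_size(400, 2))  # 200
```

With fewer GPUs the per-GPU share grows for the same `-b` value, which is why the batch size may need to be lowered to fit in memory.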
If you use the SWAG models or if the work is useful in your research, please give us a star and cite:
@inproceedings{singh2022revisiting,
title={{Revisiting Weakly Supervised Pre-Training of Visual Perception Models}},
author={Singh, Mannat and Gustafson, Laura and Adcock, Aaron and Reis, Vinicius de Freitas and Gedik, Bugra and Kosaraju, Raj Prateek and Mahajan, Dhruv and Girshick, Ross and Doll{\'a}r, Piotr and van der Maaten, Laurens},
booktitle={CVPR},
year={2022}
}
SWAG models are released under the CC-BY-NC 4.0 license. See LICENSE for additional details.