
Adding pretrained ViT weights #5085

Merged — 10 commits merged into pytorch:main on Jan 5, 2022

Conversation

@yiwen-song (Contributor) commented Dec 10, 2021

In #4594, we added the ViT model architectures to the torchvision prototype.
In this PR, we add pretrained weights for the ViT models :D
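For context, these weights are exposed through the prototype multi-weight API referenced by the testing commands below (`--weights ViT_B_16_Weights.ImageNet1K_V1`): each model gets an enum whose members bundle a checkpoint URL and metadata. A minimal self-contained sketch of that pattern — the class names mirror this PR, but the URL, fields, and builder are illustrative stand-ins, not the actual torchvision definitions:

```python
from dataclasses import dataclass, field
from enum import Enum

@dataclass(frozen=True)
class Weights:
    url: str           # where the checkpoint lives
    meta: dict = field(default_factory=dict)  # recipe, accuracies, etc.

class ViT_B_16_Weights(Enum):
    # Metadata values mirror the numbers reported in this PR's table.
    ImageNet1K_V1 = Weights(
        url="https://example.invalid/vit_b_16.pth",  # illustrative placeholder URL
        meta={"recipe": "Close to DeiT", "acc@1": 81.072, "acc@5": 95.318},
    )

def vit_b_16(weights=None):
    """Stub builder: the real builder constructs the model and, if weights is
    not None, loads the state dict fetched from weights.value.url."""
    return {"arch": "vit_b_16", "weights": weights}

model = vit_b_16(weights=ViT_B_16_Weights.ImageNet1K_V1)
print(model["weights"].value.meta["acc@1"])  # -> 81.072
```

The point of the enum is that callers select a specific, versioned checkpoint (`ImageNet1K_V1`) rather than an opaque `pretrained=True` flag.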

| Model | Training Recipe | Training Job ID | Epochs | Nodes | GPUs per node | Batch size per GPU | Global batch size | Image size | Representation size | Original paper Acc@1 | Classy Vision Acc@1 | Acc@1 | Acc@5 | Testing Job ID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| vit_b_16 | Close to DeiT | 10266 | 268/300 | 8 | 8 | 64 | 4096 | 224 | None | 77.91 | 78.98 | 81.072 | 95.318 | 13799 |
| vit_b_32 | Close to DeiT | 10265 | 291/300 | 2 | 8 | 256 | 4096 | 224 | None | 73.38 | 73.3 | 75.912 | 92.466 | 13796 |
| vit_l_16 | TorchVision New Recipe | 12107/12349/12567 (resumed multiple times) | 378/600 | 2 | 8 | 16 | 1024 | 224 | None | 76.53 | 76.57 | 79.662 | 94.638 | 13804 |
| vit_l_32 | Close to DeiT | 10430 | 253/300 | 8 | 8 | 64 | 4096 | 224 | None | 71.16 | 73.49 | 76.972 | 93.07 | 13793 |

Training commands:

```shell
# vit_b_16 (Close to DeiT)
PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --timeout 3000 --ngpus 8 --nodes 8 \
    --partition train --model vit_b_16 --batch-size 64 --epochs 300 --opt adamw --lr 0.003 --wd 0.3 \
    --lr-scheduler cosineannealinglr --lr-warmup-method linear --lr-warmup-epochs 30 --lr-warmup-decay 0.033 \
    --amp --label-smoothing 0.11 --mixup-alpha 0.2 --auto-augment ra --clip-grad-norm 1 --ra-sampler \
    --cutmix-alpha 1.0 --model-ema

# vit_b_32 (Close to DeiT)
PYTHONPATH=$PYTHONPATH:`pwd` python -u ~/workspace/scripts/run_with_submitit.py --timeout 3000 --ngpus 8 --nodes 2 \
    --partition train --model vit_b_32 --batch-size 256 --epochs 300 --opt adamw --lr 0.003 --wd 0.3 \
    --lr-scheduler cosineannealinglr --lr-warmup-method linear --lr-warmup-epochs 30 --lr-warmup-decay 0.033 \
    --amp --label-smoothing 0.1 --mixup-alpha 0.2 --auto-augment imagenet --clip-grad-norm 1 --ra-sampler \
    --cutmix-alpha 1.0 --model-ema

# vit_l_16 (TorchVision New Recipe)
PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --timeout 3000 --ngpus 8 --nodes 2 \
    --model vit_l_16 --batch-size 64 --lr 0.5 --lr-scheduler cosineannealinglr --lr-warmup-epochs 5 \
    --lr-warmup-method linear --auto-augment ta_wide --epochs 600 --random-erase 0.1 --label-smoothing 0.1 \
    --mixup-alpha 0.2 --cutmix-alpha 1.0 --weight-decay 0.00002 --norm-weight-decay 0.0 --model-ema \
    --val-resize-size 232 --clip-grad-norm 1 --ra-sampler

# vit_l_32 (Close to DeiT)
PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --timeout 3000 --ngpus 8 --nodes 8 \
    --partition train --model vit_l_32 --batch-size 64 --epochs 300 --opt adamw --lr 0.003 --wd 0.3 \
    --lr-scheduler cosineannealinglr --lr-warmup-method linear --lr-warmup-epochs 30 --lr-warmup-decay 0.033 \
    --amp --label-smoothing 0.11 --mixup-alpha 0.2 --auto-augment ra --clip-grad-norm 1 --ra-sampler \
    --cutmix-alpha 1.0 --model-ema
```

The testing command is identical for all four models except for the `--model` and `--weights` flags (shown here for vit_b_16):

```shell
PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --timeout 3000 --ngpus 1 --nodes 1 \
    --partition train --model vit_b_16 --batch-size 1 --test-only --weights ViT_B_16_Weights.ImageNet1K_V1
```
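
As a quick sanity check on the table above, the global batch size is just nodes × GPUs-per-node × per-GPU batch size:

```python
def global_batch_size(nodes: int, gpus_per_node: int, batch_per_gpu: int) -> int:
    # Each of nodes * gpus_per_node workers processes batch_per_gpu samples per step,
    # so one optimizer step sees the product of the three.
    return nodes * gpus_per_node * batch_per_gpu

print(global_batch_size(8, 8, 64))   # vit_b_16 / vit_l_32 rows -> 4096
print(global_batch_size(2, 8, 256))  # vit_b_32 row -> 4096
```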

cc @datumbox

@facebook-github-bot commented Dec 10, 2021

💊 CI failures summary and remediations

As of commit 3aee34e (more details on the Dr. CI page):

- 1 failure not recognized by patterns: CircleCI job `binary_libtorchvision_ops_ios_12.0.0_arm64`, Build step.
- 🚧 3 ongoing upstream failures, probably caused by upstream breakages that are not fixed yet.

This comment was automatically generated by Dr. CI.

@yiwen-song yiwen-song linked an issue Dec 10, 2021 that may be closed by this pull request
@datumbox (Contributor) left a comment:
Thanks @sallysyw. From #5086 I understand that some of the optimizations applied to all other models were not used here. Can you confirm that you tried them and found they were not beneficial? If you haven't tried them, I would strongly recommend doing a few more runs prior to merging this, to confirm we pushed the accuracy as high as we can. These tricks helped a multitude of models, including ResNets, RegNets (unpublished, coming soon), MobileNetV3, EfficientNet and ResNeXt.

@yiwen-song (Contributor, Author) commented Dec 23, 2021

OK. Let me upload the checkpoint based on the epoch/job ID with the best EMA weights and re-run the testing using the EMA model.
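
For context, the EMA weights discussed here come from the `--model-ema` flag in the training commands: the trainer keeps a shadow copy of the parameters updated as `ema = decay * ema + (1 - decay) * param` after each step. A minimal plain-Python sketch of the idea — not the actual torchvision `ExponentialMovingAverage` implementation, and the decay here is artificially small to keep the arithmetic visible:

```python
def ema_update(ema_params, params, decay=0.999):
    """Return the exponentially-moving-averaged copy of the parameters."""
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_params, params)]

# Toy run: live parameters sit at 1.0 for three steps; the EMA copy
# starts at 0.0 and decays toward them geometrically.
ema = [0.0]
for step_params in ([1.0], [1.0], [1.0]):
    ema = ema_update(ema, step_params, decay=0.5)
print(ema)  # -> [0.875]
```

Evaluating the EMA copy instead of the raw weights is what "re-run the testing using the EMA model" refers to.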

@yiwen-song (Contributor, Author) commented Dec 31, 2021

Here is the updated version of our best results (all tested with batch-size=1):

| Model | Training Job ID | Epochs | Nodes | GPUs per node | Batch size per GPU | Global batch size | Image size | Representation size | Original Acc@1 | ClassyVision Acc@1 | Acc@1 | Acc@5 | Testing Job ID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| vit_b_16 | 10266 | 268/300 | 8 | 8 | 64 | 4096 | 224 | None | 77.91 | 78.98 | 81.072 | 95.318 | 13799 |
| vit_b_32 | 10265 | 291/300 | 2 | 8 | 256 | 4096 | 224 | None | 73.38 | 73.3 | 75.912 | 92.466 | 13796 |
| vit_l_16 | 12349 | 378/600 | 2 | 8 | 16 | 1024 | 224 | None | 76.53 | 76.57 | 79.662 | 94.638 | 13804 |
| vit_l_32 | 10430 | 253/300 | 8 | 8 | 64 | 4096 | 224 | None | 71.16 | 73.49 | 76.972 | 93.07 | 13793 |

The testing command is identical for every model except for the `--model` and `--weights` flags (shown here for vit_b_16):

```shell
PYTHONPATH=$PYTHONPATH:`pwd` python -u ~/workspace/scripts/run_with_submitit.py --timeout 3000 --ngpus 1 --nodes 1 \
    --partition train --model vit_b_16 --batch-size 1 \
    --data-path /datasets01_ontap/imagenet_full_size/061417/ --test-only --weights ViT_B_16_Weights.ImageNet1K_V1
```

@datumbox (Contributor) commented Jan 2, 2022

@sallysyw Thanks for the clarifications and for making the necessary changes.

If I understand correctly, vit_l_16 was produced using the new TorchVision recipe, while vit_b_16, vit_b_32 and vit_l_32 use one that is closer to DeiT. If that's correct, we need to update the README in the references, and your table, to reflect that.

For vit_l_16 (job IDs 12107/12349/12567) the exact config I used was:

```shell
--ngpus 8 --nodes 2 --model vit_l_16 --batch-size 64 --lr 0.5 --lr-scheduler cosineannealinglr \
    --lr-warmup-epochs 5 --lr-warmup-method linear --auto-augment ta_wide --epochs 600 --random-erase 0.1 \
    --label-smoothing 0.1 --mixup-alpha 0.2 --cutmix-alpha 1.0 --weight-decay 0.00002 --norm-weight-decay 0.0 \
    --model-ema --val-resize-size 232 --clip-grad-norm 1 --ra-sampler
```
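
The `--lr-scheduler cosineannealinglr` plus `--lr-warmup-method linear --lr-warmup-epochs 5` combination in this config produces a schedule along these lines — a hand-rolled sketch of the shape, not the reference-script implementation, and the warmup start value is simplified:

```python
import math

def lr_at_epoch(epoch: int, base_lr: float = 0.5,
                warmup_epochs: int = 5, total_epochs: int = 600) -> float:
    if epoch < warmup_epochs:
        # Linear warmup: ramp from base_lr/warmup_epochs up to base_lr.
        return base_lr * (epoch + 1) / warmup_epochs
    # Cosine annealing from base_lr down to 0 over the remaining epochs.
    t = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * t))

print(round(lr_at_epoch(0), 3))    # start of warmup -> 0.1
print(round(lr_at_epoch(5), 3))    # warmup finished -> 0.5
print(round(lr_at_epoch(599), 4))  # near the end -> ~0.0
```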

Moreover, there are a few more changes required in the code now that you are introducing weights. You need to replace `@handle_legacy_interface(weights=("pretrained", None))` with the default value of each model. The value `None` indicated that there was no default pre-trained value; instead, we should now pass the appropriate value for each model, something like `ViT_XYZ_Weights.ImageNet1K_V1`.
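
The requested change can be sketched as follows. This is a simplified stand-in for torchvision's `handle_legacy_interface` decorator (the real one does more validation); the enum value and builder are illustrative:

```python
import functools
from enum import Enum

class ViT_B_16_Weights(Enum):
    ImageNet1K_V1 = "vit_b_16-imagenet1k-v1"  # placeholder checkpoint id

def handle_legacy_interface(**legacy_map):
    # legacy_map: new_kwarg -> (old_kwarg, weights_to_use_when_old_flag_is_True)
    def decorator(builder):
        @functools.wraps(builder)
        def wrapper(**kwargs):
            for new_name, (old_name, default) in legacy_map.items():
                if kwargs.pop(old_name, False):  # e.g. pretrained=True
                    kwargs[new_name] = default
            return builder(**kwargs)
        return wrapper
    return decorator

# Before: weights=("pretrained", None) -> pretrained=True had no weights to map to.
# After this PR: each model passes its own default enum value instead of None.
@handle_legacy_interface(weights=("pretrained", ViT_B_16_Weights.ImageNet1K_V1))
def vit_b_16(weights=None):
    return weights  # stub: the real builder constructs the model and loads weights

print(vit_b_16(pretrained=True))  # -> ViT_B_16_Weights.ImageNet1K_V1
print(vit_b_16())                 # -> None
```

With this in place, legacy callers using `pretrained=True` transparently get the new default checkpoint.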

With the above changes, we are getting close to being able to merge the PR. Please note that before doing so, we will need to finish some pending steps, such as deploying the models on Manifold and adding them to the torchvision/models folders on our AWS infra. Let's finish these prior to merging. I'll send you the guide that describes the model release process offline.

@datumbox (Contributor) left a comment:

LGTM, thanks a lot @sallysyw.

@datumbox datumbox merged commit df628c4 into pytorch:main Jan 5, 2022
facebook-github-bot pushed a commit that referenced this pull request Jan 5, 2022
Summary:
* Adding pretrained ViT weights

* Adding recipe as part of meta

* update checkpoints using best ema results

* Fix handle_legacy_interface and update recipe url

* Update README

Reviewed By: datumbox

Differential Revision: D33426965

fbshipit-source-id: 753ce1d1318df3d47da181db06b35b770de26ffc
@yiwen-song yiwen-song deleted the weights branch January 6, 2022 18:40
facebook-github-bot pushed a commit that referenced this pull request Jan 6, 2022
Differential Revision:
D33426965

Original commit changeset: 753ce1d1318d

Original Phabricator Diff: D33426965

fbshipit-source-id: db9a9f51c5365b2dd9c002aa681da0be33b3cb7d
facebook-github-bot pushed a commit that referenced this pull request Jan 8, 2022
Summary:
* Adding pretrained ViT weights

* Adding recipe as part of meta

* update checkpoints using best ema results

* Fix handle_legacy_interface and update recipe url

* Update README

Reviewed By: sallysyw

Differential Revision: D33479262

fbshipit-source-id: 20d344db0961ed8ae12104c509ebddd17179d286
@xiaohu2015 (Contributor) commented:

@sallysyw Is there a plan to add vit-tiny and vit-small (DeiT-Ti, DeiT-Small)?

Successfully merging this pull request may close these issues.

Adding Vision Transformer to torchvision/models