Thank you for your excellent work and for sharing the code! I learned a lot from what you have described.
Recently, I have been trying to use DeiT to train a plain ViT-Base model. I was able to follow the documentation and reproduce the ViT-Tiny and ViT-Small results, but the same training procedure on ViT-Base reaches only 78.9% accuracy on ImageNet-1K, which is even worse than ViT-Small.
So I am wondering whether there are hidden tricks for training a good ViT-Base. Could you please share some hints? Thank you so much for the help!
@Alihjt No luck. One thing I did find was that the per-GPU batch size seemed to influence numerical stability: accuracy improved with a smaller per-GPU batch size. Although I haven't had the chance to verify or explain why, a run with 16 GPUs x 64 images per GPU gave better performance than my previous run with 8 GPUs x 128 images per GPU.
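To make the comparison concrete, here is a minimal sketch (not from the DeiT repo itself, though the `lr * global_batch / 512` linear scaling rule follows the one applied in DeiT's `main.py`) confirming that both launch configurations share the same global batch size and therefore the same scaled learning rate; only the per-GPU micro-batch differs:

```python
def scaled_lr(base_lr: float, per_gpu_batch: int, num_gpus: int) -> float:
    """DeiT-style linear LR scaling: base_lr * (per-GPU batch * world size) / 512."""
    return base_lr * per_gpu_batch * num_gpus / 512.0

# The two runs discussed above (base_lr=5e-4 is DeiT's default):
for num_gpus, per_gpu in [(8, 128), (16, 64)]:
    global_batch = num_gpus * per_gpu
    print(f"{num_gpus} GPUs x {per_gpu}/GPU -> global batch {global_batch}, "
          f"lr {scaled_lr(5e-4, per_gpu, num_gpus):.2e}")
# Both configurations print: global batch 1024, lr 1.00e-03.
```

Since the effective batch size and scaled learning rate come out identical, the accuracy gap would have to stem from per-GPU effects (for example, mixed-precision numerics at larger micro-batches) rather than from LR scaling.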