
ViT-B Training for DeiT #233

Closed
ziqipang opened this issue Aug 21, 2023 · 2 comments

Comments

ziqipang commented Aug 21, 2023

Thank you for your excellent work and for sharing the code! I learned a lot from what you have described.

Recently, I have been trying to use DeiT to train a plain ViT-Base model. I could follow the documentation to reproduce the ViT-Tiny and ViT-Small performance, but the same training procedure on ViT-Base reaches only 78.9% top-1 accuracy on ImageNet-1K, which is even worse than ViT-Small.

Therefore, I am wondering what hidden tricks might be needed to train a good ViT-Base. Could you please share some hints? Thank you so much for your help!


Alihjt commented Feb 26, 2024

Hey.
Did you find any configs?


ziqipang commented Mar 5, 2024

@Alihjt No luck. One thing I found was that the per-GPU batch size seemed to influence numerical stability (accuracy improved with a smaller per-GPU batch size). Although I didn't have the chance to verify or explain why, using 16 GPUs x 64 images per GPU gave better performance than my previous run (8 GPUs x 128 images per GPU).
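
For anyone comparing these two setups: a quick sanity check (a minimal sketch, assuming the linear LR scaling DeiT applies in main.py, i.e. lr = base_lr * batch_size * world_size / 512, and the default base LR of 5e-4) shows the effective hyperparameters are identical, so the gap should come from per-GPU batch effects rather than a learning-rate change:

```python
# Minimal sketch: both runs use the same global batch size (1024), so DeiT's
# linear LR scaling rule (base_lr * global_batch / 512, as in main.py) yields
# an identical learning rate. The only difference is the per-GPU micro-batch.

BASE_LR = 5e-4  # DeiT's default --lr


def scaled_lr(per_gpu_batch: int, num_gpus: int, base_lr: float = BASE_LR) -> float:
    """Reproduce DeiT's linear LR scaling: lr = base_lr * global_batch / 512."""
    global_batch = per_gpu_batch * num_gpus
    return base_lr * global_batch / 512.0


for per_gpu, gpus in [(128, 8), (64, 16)]:
    print(f"{gpus} GPUs x {per_gpu}/GPU -> global batch {per_gpu * gpus}, "
          f"lr {scaled_lr(per_gpu, gpus):.6f}")
# Both configurations print lr 0.001000, so any accuracy difference is down to
# per-GPU batch behavior (e.g. mixed-precision numerics), not the LR schedule.
```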

ziqipang closed this as completed Mar 5, 2024