Thank you for your excellent work and for sharing the code! I learned a lot from what you have described.
Recently, I have been trying to use DeiT to train a plain ViT-Base model. I was able to follow the documentation and reproduce the ViT-Tiny and ViT-Small results, but the same training procedure on ViT-Base reaches only 78.9% accuracy on ImageNet-1K, which is even worse than ViT-Small.
So I am wondering whether there are hidden tricks for training a good ViT-Base. Could you please share some hints? Thank you so much for the help!
@Alihjt No luck. One thing I did find was that the per-GPU batch size seemed to influence numerical stability: accuracy improved with a smaller per-GPU batch size. Although I haven't had the chance to verify or explain why, a run with 16 GPUs x 64 images per GPU gave better performance than my previous run with 8 GPUs x 128 images per GPU.
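To make the comparison concrete, here is a minimal sketch (not from the DeiT repo itself, though the `lr * global_batch / 512` linear scaling rule follows the one applied in DeiT's `main.py`) confirming that both launch configurations share the same global batch size and therefore the same scaled learning rate; only the per-GPU micro-batch differs:

```python
def scaled_lr(base_lr: float, per_gpu_batch: int, num_gpus: int) -> float:
    """DeiT-style linear LR scaling: base_lr * (per-GPU batch * world size) / 512."""
    return base_lr * per_gpu_batch * num_gpus / 512.0

# The two runs discussed above (base_lr=5e-4 is DeiT's default):
for num_gpus, per_gpu in [(8, 128), (16, 64)]:
    global_batch = num_gpus * per_gpu
    print(f"{num_gpus} GPUs x {per_gpu}/GPU -> global batch {global_batch}, "
          f"lr {scaled_lr(5e-4, per_gpu, num_gpus):.2e}")
# Both configurations print: global batch 1024, lr 1.00e-03.
```

Since the effective batch size and scaled learning rate come out identical, the accuracy gap would have to stem from per-GPU effects (for example, mixed-precision numerics at larger micro-batches) rather than from LR scaling.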