train ViT-b16 from scratch on Imagenet #153
@andsteing Thank you for your answer.
I'll be back in 3 days, after training... :)
Hi @andsteing. I've discovered a few things while doing the experiment, and I'd like to hear your opinions.
Hi @andsteing, I'm trying to train ViT-B/16 from scratch on ImageNet using the SAM optimizer. Could you share your training details?
@xiangning-chen, who added the SAM checkpoints in #119
Hi @yzlnew, how many machines are you using to train the model? This essentially determines how distributed SAM is, which corresponds to the m-sharpness discussed in Section 4.1 here. For my experiments, I used 64 TPU chips. If you are using fewer machines, my experience is that you should increase rho.
@xiangning-chen Thanks for the clarification! I have tried training on 4/8/32 A100 cards with rho=0.2, and I also noticed that a larger rho can improve performance in other experiments.
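As a rough illustration of the point above (not the repo's actual implementation), a single SAM step in JAX might look like the sketch below. `loss_fn`, `params`, and `batch` are placeholders, and `rho` is the perturbation radius being discussed. The m-sharpness connection: if the ascent gradient is computed inside `jax.pmap` without cross-device averaging, each device perturbs using only its own shard of the batch, so m equals the per-device batch size; fewer machines means a larger m, which is why a larger rho can compensate.

```python
import jax
import jax.numpy as jnp

def sam_update(params, batch, loss_fn, rho=0.05, lr=1e-3):
    """One SAM step: ascend to a nearby worst-case point, then descend."""
    # Ascent gradient. If this runs inside jax.pmap *without* lax.pmean,
    # each device perturbs with its own batch shard, i.e. m-sharpness
    # with m = per-device batch size.
    grads = jax.grad(loss_fn)(params, batch)
    leaves = jax.tree_util.tree_leaves(grads)
    grad_norm = jnp.sqrt(sum(jnp.vdot(g, g) for g in leaves))
    # Perturb parameters by rho along the normalized gradient direction.
    perturbed = jax.tree_util.tree_map(
        lambda p, g: p + rho * g / (grad_norm + 1e-12), params, grads)
    # Descent gradient, evaluated at the perturbed parameters. In the
    # distributed setting this one *is* averaged across devices
    # (jax.lax.pmean) before the optimizer update.
    sam_grads = jax.grad(loss_fn)(perturbed, batch)
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, sam_grads)
```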
Thanks for your work and the detailed answers in these issues.
I am reproducing ViT-B/16 in TensorFlow based on your paper and answers (in this issue I only deal with the original ViT paper).
But I have only reached about 47% on the ImageNet-1k validation set (upstream).
I want to know whether my experimental conditions are incorrect, and I hope this issue helps others reproduce ViT from scratch.
Here are my train loss, val loss, train acc, val acc, and LR curves (plots omitted).
I don't know whether you have time to look at it, but here is my code.
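For comparison while debugging, here is a hedged sketch of the warmup-plus-linear-decay training setup described in the ViT paper for from-scratch ImageNet-1k runs, written with optax (the repo's stack; the knobs translate directly to TensorFlow). The specific values (base LR 3e-3, 10k warmup steps, weight decay 0.3, global-norm clipping at 1.0, ~300 epochs at batch 4096) are my reading of the paper's Table 3 and should be double-checked against the paper and the repo's configs.

```python
import optax

# Assumed values from my reading of the ViT paper for ViT-B/16 trained
# from scratch on ImageNet-1k; please verify before relying on them.
base_lr = 3e-3
warmup_steps = 10_000
total_steps = 93_835   # ~300 epochs * 1,281,167 images / batch 4096

# Linear warmup to base_lr, then linear decay to zero.
schedule = optax.join_schedules(
    schedules=[
        optax.linear_schedule(0.0, base_lr, warmup_steps),
        optax.linear_schedule(base_lr, 0.0, total_steps - warmup_steps),
    ],
    boundaries=[warmup_steps],
)

# Adam with decoupled weight decay and gradient clipping at global norm 1;
# dropout 0.1 would be set in the model config, not the optimizer.
optimizer = optax.chain(
    optax.clip_by_global_norm(1.0),
    optax.adamw(schedule, b1=0.9, b2=0.999, weight_decay=0.3),
)
```

A plateau around 47% often points to the schedule or regularization rather than the architecture itself, so comparing these knobs against your run may be a useful first check.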