Pre-trained results of MAE #1
Hi @lucasliunju. Yes, I did run MAE pretraining + linear-probe experiments on the Base and Large architectures, although without gradient accumulation (I still have to run those experiments, but haven't had a chance yet). ViT-B/16 reached 63% and ViT-L/16 reached 69% linear-probe accuracy, compared to the official result in the paper for ViT-L/16 of 73.5%. I do have the pretrained weights and intend to release them publicly; I just haven't had time to do so yet. I believe running with gradient accumulation will close this gap, but I'm not certain when, or if, I'll have the capacity to do those experiments.
Dear SarthakYadav, Thanks for your reply. Maybe I can help you test it. May I ask what you mean by gradient accumulation? I noticed the current batch size is 128*8. Best,
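For context, gradient accumulation usually means summing or averaging gradients over several small micro-batches before applying a single optimizer update, so a device-limited batch of 128 per step can emulate a much larger effective batch (e.g. 128*8). A minimal JAX/Optax sketch of the idea, assuming Optax's `MultiSteps` wrapper; the loss, parameters, and optimizer settings below are illustrative placeholders, not taken from this repo:

```python
import jax
import jax.numpy as jnp
import optax

accum_steps = 8                              # micro-batches per optimizer update
base_opt = optax.adamw(learning_rate=1.5e-4) # placeholder optimizer settings
opt = optax.MultiSteps(base_opt, every_k_schedule=accum_steps)

def loss_fn(params, batch):
    # Placeholder loss: mean squared error of a dummy linear model.
    preds = batch["x"] @ params["w"]
    return jnp.mean((preds - batch["y"]) ** 2)

params = {"w": jnp.zeros((16, 1))}
opt_state = opt.init(params)

@jax.jit
def train_step(params, opt_state, batch):
    loss, grads = jax.value_and_grad(loss_fn)(params, batch)
    # MultiSteps accumulates grads internally and only applies a real update
    # every `accum_steps` calls; in between, the returned updates are zeros.
    updates, opt_state = opt.update(grads, opt_state, params)
    params = optax.apply_updates(params, updates)
    return params, opt_state, loss

# Each call to train_step consumes one micro-batch; eight calls emulate an
# effective batch of 128*8 without the memory cost of a single large batch.
```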
Thank you for the awesome work! Thank you!
Thank you very much for your contribution. I think it will help the whole JAX community with MAE training.
May I ask whether this repo can reproduce the results from the MAE paper, e.g. a comparison between this repo and the official results?
Thanks for your contribution again!
Best,
Lucas