
Pre-trained results of MAE #1

Open
lucasliunju opened this issue May 27, 2022 · 3 comments
@lucasliunju

Thank you very much for your contribution. I think this will be very helpful to the whole JAX community for MAE training.

May I ask whether this repo can reproduce the results from the MAE paper, e.g. a comparison between this repo and the official results?

Thanks again for your contribution!

Best,
Lucas

@SarthakYadav
Owner

Hi @lucasliunju

Yes, I did run MAE pretraining + linear probe experiments on the Base and Large architectures, although without gradient accumulation (I still have to run those experiments, but haven't had a chance yet).

Base reached 63% and ViT-L/16 reached 69% accuracy in the linear probe experiments, compared to the official paper's linear probe accuracy of 73.5% for ViT-L/16. I do have the pretrained weights, which I intend to release publicly; I just haven't had time to do so yet.

I believe running gradient accumulation will close this gap, but I'm not certain when or whether I'll have the capacity to run those experiments.
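For readers unfamiliar with the term: gradient accumulation means computing gradients over several micro-batches and averaging them before a single optimizer step, emulating a larger effective batch size than fits in memory at once. A minimal JAX sketch (the `loss_fn` below is a hypothetical toy loss for illustration, not this repo's actual training step):

```python
import jax
import jax.numpy as jnp

def loss_fn(params, batch):
    # Toy quadratic loss for illustration only.
    x, y = batch
    pred = x @ params["w"]
    return jnp.mean((pred - y) ** 2)

def accumulated_grads(params, batches):
    # Sum gradients over several equally sized micro-batches, then average,
    # before taking one optimizer step -- emulating a larger batch size.
    grad_fn = jax.grad(loss_fn)
    grads = jax.tree_util.tree_map(jnp.zeros_like, params)
    for batch in batches:
        g = grad_fn(params, batch)
        grads = jax.tree_util.tree_map(lambda a, b: a + b, grads, g)
    return jax.tree_util.tree_map(lambda g: g / len(batches), grads)
```

With equally sized micro-batches, the averaged gradient matches the gradient of one large batch, so the update is numerically equivalent (ignoring batch-dependent ops like batch norm). In practice one would typically use something like `optax.MultiSteps` rather than a hand-rolled loop.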

@lucasliunju
Author

Dear SarthakYadav,

Thanks for your reply. Maybe I can help you test it. May I ask what you mean by gradient accumulation? I noticed the current batch size is 128*8.

Best,
Yong

@snoop2head

@SarthakYadav

Thank you for the awesome work!
May I ask what the training loss or validation loss values were when training on ImageNet-1K?

Thank you!
