
How to set an appropriate learning rate? #16

Closed
sleeplessai opened this issue Oct 20, 2020 · 7 comments

Comments

@sleeplessai

```python
vit = ViT(
    image_size = 448,
    patch_size = 32,
    num_classes = 180,
    dim = 1024,
    depth = 8,
    heads = 8,
    mlp_dim = 2048,
    dropout = 0.5,
    emb_dropout = 0.5
).cuda()

optimizer = torch.optim.Adam(vit.parameters(), lr = 5e-3, weight_decay = 0.1)
```

I tried to train ViT on a 180-class dataset with the config shown above, but the loss doesn't decrease during training.
Any suggestions?

@lessw2020

Without seeing your dataset, I would guess your lr of 5e-3 is too high. Try 4e-5 or even 3e-5 to start.

@lucidrains
Owner

@sleeplessai I believe your dropout is way too high; I would set it to 0.1 at most.

@sleeplessai
Author

sleeplessai commented Oct 20, 2020

> Without seeing your dataset, I would guess your lr of 5e-3 is too high. Try 4e-5 or even 3e-5 to start.

> @sleeplessai I believe your dropout is way too high; I would set it to 0.1 at most.

@lucidrains @lessw2020
Trying these settings.
The loss still does not decrease normally when I adjust the LR to values like 1e-1, 3e-2, and 2e-4, with the dropout rate fixed at 0.1 or 0.2.

@lucidrains
Owner

lucidrains commented Oct 20, 2020

@sleeplessai Hmm, I would try 0.1 dropout, 3e-4 learning rate, patch size of 16
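For what it's worth, ViT-style models are also commonly trained with learning-rate warmup followed by cosine decay rather than a flat LR. Below is a minimal sketch of such a schedule as a plain multiplier function; `warmup_steps` and `total_steps` are placeholder values you would tune for your own run.

```python
import math

def warmup_cosine(step, warmup_steps=500, total_steps=10000):
    """Return an LR multiplier in [0, 1] for the given optimizer step."""
    if step < warmup_steps:
        # linear warmup from 0 up to the base learning rate
        return step / max(1, warmup_steps)
    # cosine decay from the base learning rate down to 0
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))
```

This plugs directly into PyTorch, e.g. `torch.optim.lr_scheduler.LambdaLR(optimizer, warmup_cosine)` with a base LR of 3e-4 as suggested above.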

How big is your training set?

@sleeplessai
Author

@lucidrains I am running ViT training on two datasets: one contains 6k training images and the other around 5k. They are labelled with 150 and 180 categories respectively.
I have observed that Transformers are hard to train on vision tasks while experimenting with transferring from a CNN to a Transformer. In your opinion, how can this gap between vision and language tasks be closed by designing a suitable Transformer?

@lucidrains
Owner

lucidrains commented Oct 21, 2020

@sleeplessai so you are definitely not working with a lot of data

Fortunately, self-supervised learning is taking off, so you could follow the instructions at https://github.com/lucidrains/vit-pytorch#self-supervised-training, pretrain on a bunch of images scraped from the internet, and then train on your labelled dataset at the end
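The fine-tuning step after pretraining can be sketched with plain PyTorch: keep the pretrained trunk and attach a fresh classification head for the labelled classes. `TinyTrunk` below is a hypothetical stand-in for the pretrained ViT trunk, just to show the pattern; the commented-out lines show where the real checkpoint load would go.

```python
import torch
import torch.nn as nn

class TinyTrunk(nn.Module):
    """Hypothetical stand-in for a pretrained ViT trunk emitting 1024-d features."""
    def __init__(self, dim=1024):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        return self.proj(x)

trunk = TinyTrunk()
# In the real setting, load the self-supervised checkpoint here, e.g.:
# trunk.load_state_dict(torch.load('pretrained.pt'), strict=False)

# Attach a new head for the 180-class labelled dataset
model = nn.Sequential(trunk, nn.Linear(1024, 180))

logits = model(torch.randn(2, 1024))  # batch of 2 feature vectors
assert logits.shape == (2, 180)
```

Freezing the trunk for the first few epochs (or using a lower LR for it) is a common variant when the labelled set is this small.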

@lucidrains
Owner

@sleeplessai You could also just wait until the paper has been reviewed and fine-tune from the pretrained model once it is released, probably by Google.
