
How to set an appropriate learning rate? #16

Closed
sleeplessai opened this issue Oct 20, 2020 · 7 comments

Comments

@sleeplessai

```python
vit = ViT(
    image_size = 448,
    patch_size = 32,
    num_classes = 180,
    dim = 1024,
    depth = 8,
    heads = 8,
    mlp_dim = 2048,
    dropout = 0.5,
    emb_dropout = 0.5
).cuda()

optimizer = torch.optim.Adam(vit.parameters(), lr = 5e-3, weight_decay = 0.1)
```

I tried to train ViT on a 180-class dataset with the config shown above, but the loss doesn't decrease during training.
Any suggestions?

@lessw2020

Without seeing your dataset, I would guess your lr of 5e-3 is too high. Try 4e-5 or even 3e-5 to start.

@lucidrains
Owner

@sleeplessai I believe your dropout is way too high; I would set it to 0.1 at most.

@sleeplessai
Author

sleeplessai commented Oct 20, 2020

> Without seeing your dataset, I would guess your lr of 5e-3 is too high. Try 4e-5 or even 3e-5 to start.

> @sleeplessai I believe your dropout is way too high; I would set it to 0.1 at most.

@lucidrains @lessw2020
Trying these settings.
The loss still does not decrease normally when I adjust the LR to values like 1e-1, 3e-2, and 2e-4, with the dropout rate fixed at 0.1 or 0.2.

@lucidrains
Owner

lucidrains commented Oct 20, 2020

@sleeplessai Hmm, I would try 0.1 dropout, 3e-4 learning rate, patch size of 16
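For what it's worth, ViT-style models are also commonly trained with learning-rate warmup followed by cosine decay rather than a flat LR. Below is a minimal sketch of such a schedule as a plain multiplier function; `warmup_steps` and `total_steps` are placeholder values you would tune for your own run.

```python
import math

def warmup_cosine(step, warmup_steps=500, total_steps=10000):
    """Return an LR multiplier in [0, 1] for the given optimizer step."""
    if step < warmup_steps:
        # linear warmup from 0 up to the base learning rate
        return step / max(1, warmup_steps)
    # cosine decay from the base learning rate down to 0
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))
```

This plugs directly into PyTorch, e.g. `torch.optim.lr_scheduler.LambdaLR(optimizer, warmup_cosine)` with a base LR of 3e-4 as suggested above.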

How big is your training set?

@sleeplessai
Author

@lucidrains I am running ViT training on two datasets: one contains 6k training images and the other around 5k. They are labelled with 150 and 180 categories respectively.
I have observed that Transformers are hard to train on vision tasks while experimenting with transferring from a CNN to a Transformer. In your opinion, how can this gap between vision and language tasks be closed by designing a suitable Transformer?

@lucidrains
Owner

lucidrains commented Oct 21, 2020

@sleeplessai so you are definitely not working with a lot of data

Fortunately, self-supervised learning is taking off, so you could follow the instructions at https://github.com/lucidrains/vit-pytorch#self-supervised-training, pretrain on a bunch of images scraped from the internet, and then train on your labelled dataset at the end
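The fine-tuning step after pretraining can be sketched with plain PyTorch: keep the pretrained trunk and attach a fresh classification head for the labelled classes. `TinyTrunk` below is a hypothetical stand-in for the pretrained ViT trunk, just to show the pattern; the commented-out lines show where the real checkpoint load would go.

```python
import torch
import torch.nn as nn

class TinyTrunk(nn.Module):
    """Hypothetical stand-in for a pretrained ViT trunk emitting 1024-d features."""
    def __init__(self, dim=1024):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        return self.proj(x)

trunk = TinyTrunk()
# In the real setting, load the self-supervised checkpoint here, e.g.:
# trunk.load_state_dict(torch.load('pretrained.pt'), strict=False)

# Attach a new head for the 180-class labelled dataset
model = nn.Sequential(trunk, nn.Linear(1024, 180))

logits = model(torch.randn(2, 1024))  # batch of 2 feature vectors
assert logits.shape == (2, 180)
```

Freezing the trunk for the first few epochs (or using a lower LR for it) is a common variant when the labelled set is this small.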

@lucidrains
Owner

@sleeplessai You could also just wait until the paper has been reviewed and fine-tune from the pretrained model once it is released, probably by Google.
