How to set an appropriate learning rate? #16
Comments
Without seeing your dataset, I would guess your lr of 5e-3 is too high. Try 4e-5 or even 3e-5 to start.
@sleeplessai I believe your dropout is way too high; I would set it much lower.
@lucidrains @lessw2020
@sleeplessai Hmm, I would try 0.1 dropout, a 3e-4 learning rate, and a patch size of 16. How big is your training set?
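For reference, a sketch of what that suggestion might look like in code. Only the dropout, learning rate, and patch size come from this thread; every other value is carried over from the configuration in the original post (shown further down) and is not part of the suggestion:

    import torch
    from vit_pytorch import ViT

    # suggested settings from the thread: dropout 0.1, lr 3e-4, patch size 16
    vit = ViT(
        image_size=448,
        patch_size=16,      # smaller patches, as suggested
        num_classes=180,
        dim=1024,
        depth=8,
        heads=8,
        mlp_dim=2048,
        dropout=0.1,        # much lower dropout, as suggested
        emb_dropout=0.1
    ).cuda()

    optimizer = torch.optim.Adam(vit.parameters(), lr=3e-4, weight_decay=0.1)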
@lucidrains I run ViT training on two datasets: one contains 6k training images and the other has around 5k. Both are labeled, with 150 and 180 categories respectively.
@sleeplessai So you are definitely not working with a lot of data. Fortunately, self-supervised learning is taking off, so you could follow the instructions at https://github.com/lucidrains/vit-pytorch#self-supervised-training, train on a bunch of images you scrape from the internet, and then train on your labelled dataset at the end.
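A rough sketch of that kind of self-supervised pre-training, assuming the BYOL wrapper from the byol-pytorch package (which the linked README section points at) and a hypothetical unlabeled_loader over scraped images:

    import torch
    from vit_pytorch import ViT
    from byol_pytorch import BYOL  # assumption: BYOL used as the self-supervised wrapper

    model = ViT(
        image_size=448,
        patch_size=16,
        num_classes=180,
        dim=1024,
        depth=8,
        heads=8,
        mlp_dim=2048
    )

    learner = BYOL(
        model,
        image_size=448,
        hidden_layer='to_latent'   # the ViT submodule whose output BYOL projects from
    )

    opt = torch.optim.Adam(learner.parameters(), lr=3e-4)

    # unlabeled_loader is a hypothetical DataLoader yielding batches of unlabeled images
    for images in unlabeled_loader:
        loss = learner(images)
        opt.zero_grad()
        loss.backward()
        opt.step()
        learner.update_moving_average()  # update BYOL's target network

    # afterwards, fine-tune `model` on the small labelled dataset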
@sleeplessai You could also just wait until the paper has been reviewed and fine-tune from the pre-trained model once that is released, probably by Google.
    import torch
    from vit_pytorch import ViT

    # original configuration from the issue
    vit = ViT(
        image_size=448,
        patch_size=32,
        num_classes=180,
        dim=1024,
        depth=8,
        heads=8,
        mlp_dim=2048,
        dropout=0.5,
        emb_dropout=0.5
    ).cuda()

    optimizer = torch.optim.Adam(vit.parameters(), lr=5e-3, weight_decay=0.1)
I tried to train ViT on a 180-class dataset with the config shown above, but the loss doesn't descend during training.
Any suggestions?
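In case it helps to see the pieces together, a minimal supervised loop around the vit and optimizer defined above might look like the following. train_loader and num_epochs are hypothetical and not from the issue; nothing here is specific to this repo beyond the ViT forward pass:

    import torch
    import torch.nn.functional as F

    num_epochs = 10  # hypothetical

    # train_loader is a hypothetical DataLoader yielding (images, labels) batches
    for epoch in range(num_epochs):
        vit.train()
        for images, labels in train_loader:
            images, labels = images.cuda(), labels.cuda()
            logits = vit(images)                    # (batch, 180) class logits
            loss = F.cross_entropy(logits, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()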