LR too high for gradient accumulation #3040

Merged
jph00 merged 1 commit into fastai:master on Nov 26, 2020

Conversation

marii-moe
Collaborator

We were not dividing by the number of batches to accumulate, so this was effectively increasing the learning rate. Added a test to make sure this stays fixed. I think this thread was lost when fastai/fastai2 got moved to fastai/fastai: fastai/fastai2#194

fixes #3023
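
A minimal sketch of what the fix amounts to, written as a plain-PyTorch training loop rather than the actual fastai callback (`model`, `opt`, `loss_fn`, and `batches` are placeholder names): if the loss (or the accumulated gradient) is not divided by the number of accumulated batches, each optimizer step sees a gradient that is `n_acc` times larger, which is equivalent to multiplying the learning rate by `n_acc`.

```python
# Illustrative gradient accumulation with the loss scaled down by the number
# of accumulated mini-batches, so the summed gradient (and therefore the
# effective learning rate) matches a single large batch.
# NOTE: `model`, `opt`, `loss_fn`, and `batches` are assumed to exist;
# this is a sketch of the idea, not fastai's GradientAccumulation callback.
n_acc = 4  # mini-batches to accumulate before each optimizer step

opt.zero_grad()
for i, (xb, yb) in enumerate(batches):
    # Without this division the gradient per step is n_acc times too large,
    # i.e. the effective LR is n_acc times higher than intended.
    loss = loss_fn(model(xb), yb) / n_acc
    loss.backward()                    # gradients accumulate in .grad across mini-batches
    if (i + 1) % n_acc == 0:
        opt.step()                     # one weight update per n_acc mini-batches
        opt.zero_grad()
```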

@jph00
Member

jph00 commented Nov 26, 2020

Nice one!

@jph00 jph00 merged commit 35e5303 into fastai:master Nov 26, 2020
@jph00 jph00 added the bug label Nov 26, 2020
@jph00 jph00 changed the title from "Fixed lr too high for gradient accumulation fixed #3023" to "LR too high for gradient accumulation" Nov 26, 2020
@muellerzr
Contributor

muellerzr commented Nov 26, 2020

I know this is fixed and merged now, but I'm hesitant to use it. I was seeing better results with the old implementation's effectively higher LR and got worse results here. Do you have any advice @marii-moe as to how I should adapt my old LR to work with this new adjustment?

@marii-moe
Collaborator Author

> I know this is fixed and merged now, but I'm hesitant to use it. I was seeing better results with the old implementation's effectively higher LR and got worse results here. Do you have any advice @marii-moe as to how I should adapt my old LR to work with this new adjustment?

I am not familiar with your particular example, but you should get approximately the same results if you set your learning rate like so:
new_lr = old_lr*n_acc/bs
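
A quick worked example of that conversion with made-up numbers, assuming `n_acc` counts accumulated items (fastai's GradientAccumulation convention) and `bs` is the batch size, so `n_acc/bs` is the number of accumulated batches the old code effectively scaled the LR by:

```python
# Hypothetical values for illustration only.
old_lr = 1e-3        # LR tuned against the old (pre-fix) behaviour
n_acc, bs = 128, 32  # accumulate 128 items with batch size 32 -> 4 batches per step
new_lr = old_lr * n_acc / bs   # 1e-3 * 128 / 32 = 4e-3
```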

Successfully merging this pull request may close these issues.

Gradient Accumulation causes lower lr to be same as non-gradient accumulation