
Do we need some kind of Learning rate decay with Ranger? #12

Closed
avostryakov opened this issue Sep 13, 2019 · 3 comments

@avostryakov

For AdamW, people usually add some sort of learning rate decay: linear, cosine, triangular, etc. Warm-up steps are also popular.

Do we need any of these with Ranger, or can we just use a fixed learning rate?

@avostryakov
Author

Sorry, I didn't notice it: the flat + cosine anneal training curve. So does that mean keeping the LR unchanged for ~72% of the steps and then annealing it with a cosine function, right?

@lessw2020
Owner

Hi @avostryakov,
Correct, we found that a cosine anneal after 72% or so of training works best with Ranger.
There is no need for warmup with Ranger - it uses the RAdam rectifier to manage that variance automatically.
Note, as a bit of a preview: I'm currently testing a new calibrated adaptive LR for Ranger, so there may be an updated version in a few days.
Hope this info helps!
Less
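
For reference, a minimal sketch of the flat-then-cosine schedule described above, using PyTorch's `LambdaLR`. The 72% flat fraction matches the comment; the dummy parameter and the stand-in `AdamW` optimizer are illustrative assumptions only - in practice you would pass the `Ranger` optimizer from this repo's `ranger.py`.

```python
import math
import torch
from torch.optim.lr_scheduler import LambdaLR

def flat_then_cosine(total_steps, flat_frac=0.72):
    """LambdaLR multiplier: hold the base LR for the first `flat_frac`
    of training, then cosine-anneal it toward zero."""
    flat_steps = int(total_steps * flat_frac)

    def lr_lambda(step):
        if step < flat_steps:
            return 1.0  # flat phase: keep the base LR unchanged
        progress = (step - flat_steps) / max(1, total_steps - flat_steps)
        return 0.5 * (1.0 + math.cos(math.pi * progress))  # anneal toward 0

    return lr_lambda

# Stand-in parameter and optimizer so the example runs on its own; with
# Ranger you would instead do: optimizer = Ranger(model.parameters(), lr=1e-3)
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.AdamW(params, lr=1e-3)

total_epochs = 100  # scheduler stepped once per epoch in this sketch
scheduler = LambdaLR(optimizer, lr_lambda=flat_then_cosine(total_epochs))

for epoch in range(total_epochs):
    # ... forward/backward pass for one epoch goes here ...
    optimizer.step()
    scheduler.step()
```

If you step the scheduler per batch rather than per epoch, pass the total number of batches as `total_steps` instead.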

@avostryakov
Author

Thanks a lot!
