Sorry, I didn't notice it: the flat + cosine anneal training curve. So does that mean keep the LR unchanged for ~72% of the steps, and after that anneal the LR with a cosine function, right?
Hi @avostryakov,
Correct, we found a cosine anneal after 72% or so works best with Ranger.
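For reference, a minimal sketch of such a "flat then cosine anneal" schedule (the function name, the 0.72 default flat fraction, and annealing all the way to zero are my assumptions for illustration, not an official part of the Ranger repo):

```python
import math

def flat_cosine_lr(step, total_steps, base_lr, flat_frac=0.72):
    """Hold base_lr for flat_frac of training, then cosine-anneal to 0."""
    flat_steps = int(total_steps * flat_frac)
    if step < flat_steps:
        return base_lr
    # fraction of the annealing phase completed, in [0, 1]
    progress = (step - flat_steps) / max(1, total_steps - flat_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

In PyTorch you could wire a function like this into training via `torch.optim.lr_scheduler.LambdaLR` (passing a multiplier of `base_lr`), or just set `param_group["lr"]` manually each step.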
There is no need for warmup with Ranger - it uses the RAdam rectifier to manage the variance of the adaptive learning rate automatically.
Note, as a bit of a preview, I'm currently testing a new calibrated adaptive LR for Ranger, so there may be an updated version in a few days.
Hope this info helps!
For AdamW, people usually add some sort of learning rate decay: linear, cosine, triangular, etc. Warmup steps are also popular.
Do we need any of these with Ranger, or can we just use a fixed learning rate?