Latest update - RangerQH (rangerqh.py): uses Quasi-Hyperbolic Momentum from https://arxiv.org/abs/1810.06801v4. Defaults of nus = (0.7, 1.0) work well, but testing and tuning are still ongoing. It is not quite as fast as the original Ranger, but the training curves are smoother and appear better suited to extended training runs. It has already produced a new high score on one 80-epoch ImageWoof run, with further training/tuning in progress.
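A minimal usage sketch for the QH variant. This assumes rangerqh.py exposes a class named RangerQH with the standard PyTorch optimizer constructor and a nus keyword for the quasi-hyperbolic coefficients; check the source file for the actual names and defaults.

```python
import torch
from torch import nn
from rangerqh import RangerQH  # assumption: the class in rangerqh.py is named RangerQH

model = nn.Linear(10, 2)  # stand-in model

# nus = (0.7, 1.0) are the defaults mentioned above; still being tuned
optimizer = RangerQH(model.parameters(), lr=1e-3, nus=(0.7, 1.0))

# one toy step to show the usual optimizer loop
inputs, targets = torch.randn(4, 10), torch.randint(0, 2, (4,))
optimizer.zero_grad()
loss = nn.functional.cross_entropy(model(inputs), targets)
loss.backward()
optimizer.step()
```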
Files in this repository:

- ranger.py - current Ranger (version 9.3.19)
- ranger913A.py - beta, uses a calibrated anisotropic adaptive learning rate
- rangerqh.py - beta, Quasi-Hyperbolic momentum (see above)
- README.md, LICENSE


Ranger-Deep-Learning-Optimizer

Ranger - a synergistic optimizer combining RAdam (Rectified Adam) and LookAhead in one codebase.

Latest version 9.3.19 - a full refactoring of the slow weights and one-pass handling (vs. the two passes used before). The refactor should eliminate the intermittent save/load issues regarding memory.
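For context, the LookAhead "slow weights" mentioned above work roughly as follows: every k fast (RAdam) steps, the slow copy of each parameter is pulled part of the way toward the fast copy, and the fast weights restart from there. This is an illustrative sketch of the LookAhead update rule, not a copy of ranger.py's implementation.

```python
import torch

def lookahead_sync(slow_params, fast_params, alpha=0.5):
    """LookAhead slow-weight update, applied every k optimizer steps:
    slow <- slow + alpha * (fast - slow), then fast restarts from slow."""
    for slow, fast in zip(slow_params, fast_params):
        slow.add_(fast - slow, alpha=alpha)  # interpolate toward the fast weights
        fast.copy_(slow)                     # reset the fast weights to the slow ones

# Toy example: one tensor standing in for a parameter before/after k fast steps
slow = [torch.zeros(3)]
fast = [torch.ones(3)]
lookahead_sync(slow, fast, alpha=0.5)
print(slow[0])  # tensor([0.5, 0.5, 0.5]) - halfway between slow and fast
```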

///////////////////////

Beta Version - Ranger913A.py:

For anyone who wants to try this out early, this version replaces RAdam with a calibrated anisotropic adaptive learning rate, per this paper:

https://arxiv.org/abs/1908.00700v2

"Empirical studies support our observation of the anisotropic A-LR and show that the proposed methods outperform existing AGMs and generalize even better than S-Momentum in multiple deep learning tasks."

Initial testing looks very good for training stabilization. Any feedback in comparison with the current Ranger (9.3.19) is welcome!

/////////////////////

Medium article with more info:
https://medium.com/@lessw/new-deep-learning-optimizer-ranger-synergistic-combination-of-radam-lookahead-for-the-best-of-2dc83f79a48d

Multiple updates: 1 - Ranger is the optimizer we used to beat the high scores in 12 different categories on the FastAI leaderboards! (The previous records were all held with the AdamW optimizer.)

2 - Highly recommend combining Ranger with the Mish activation function and a flat + cosine annealing training schedule (a sketch of this setup follows these notes).

3 - Based on that testing, 0.95 also works better than 0.90 for the beta1 (momentum) parameter (i.e., betas=(0.95, 0.999)).

Fixes: 1 - Differential group learning rates are now supported. This was fixed in RAdam and ported here thanks to @sholderbach. 2 - Saving and then loading could leave the first run's weights stranded in memory, slowing down future runs; this is now fixed.
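Here is a minimal sketch of that recommended setup. It assumes ranger.py exposes a Ranger class that accepts a betas argument (the betas value comes from note 3 above); the learning rate, step counts, and the 72% flat/anneal split are illustrative placeholders - tune them for your task.

```python
import math
import torch
from torch import nn
from torch.nn import functional as F
from torch.optim.lr_scheduler import LambdaLR
from ranger import Ranger  # assumption: ranger.py exposes a class named Ranger

def mish(x):
    # Mish activation: x * tanh(softplus(x)); use this in your network's forward pass
    return x * torch.tanh(F.softplus(x))

model = nn.Linear(10, 2)  # stand-in model

# beta1 = 0.95 per note 3 above; lr is a placeholder
optimizer = Ranger(model.parameters(), lr=1e-3, betas=(0.95, 0.999))

# "Flat + cosine anneal": hold the LR flat for most of training, then
# cosine-decay it to zero. The 72% split point is an assumption - adjust it.
total_steps = 1000
flat_steps = int(0.72 * total_steps)

def flat_then_cosine(step):
    if step < flat_steps:
        return 1.0
    progress = (step - flat_steps) / max(1, total_steps - flat_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = LambdaLR(optimizer, lr_lambda=flat_then_cosine)

# Training loop: call scheduler.step() once per optimizer step
for step in range(total_steps):
    inputs, targets = torch.randn(4, 10), torch.randint(0, 2, (4,))
    optimizer.zero_grad()
    F.cross_entropy(model(inputs), targets).backward()
    optimizer.step()
    scheduler.step()
```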

Usage and notebook to test are available here: https://github.com/lessw2020/Ranger-Mish-ImageWoof-5
