D-Adaptation and Prodigy contrib implementations #651
Conversation
Thanks @adefazio for the contribution!
Thanks for the contribution!
Minor comment: if s and d had more explicit names, that could help a newcomer investigating the algorithm, but plenty of algorithms have terse parameter names like b1 or beta, so it's fine as is too.
Looks perfect, thank you! Final request: could you squash your commits into one?
I've squashed the commits, thanks for reviewing so quickly!
Implementations of D-Adaptation AdamW and the related method Prodigy, based on the official PyTorch implementations. I have verified that they give the same outputs as the PyTorch versions on an example problem. The unit tests are similar to those used for the other optimizers in contrib.
https://github.com/facebookresearch/dadaptation
https://github.com/konstmish/prodigy
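For illustration, here is a minimal sketch of a learning-rate-free training loop, assuming the optimizer is exposed as `optax.contrib.dadapt_adamw` (with `optax.contrib.prodigy` used analogously); the toy least-squares problem and the hyperparameter choices are illustrative assumptions, not taken from this PR.

```python
import jax
import jax.numpy as jnp
import optax

def loss_fn(params, x, y):
    # Simple least-squares loss for illustration.
    pred = x @ params
    return jnp.mean((pred - y) ** 2)

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (32, 4))
y = jnp.sum(x, axis=1)
params = jnp.zeros(4)

# For D-Adaptation-style methods, learning_rate acts as a multiplier on
# the internally estimated step size, so 1.0 is the natural default.
optimizer = optax.contrib.dadapt_adamw(learning_rate=1.0)  # assumed name
opt_state = optimizer.init(params)

for _ in range(100):
    grads = jax.grad(loss_fn)(params, x, y)
    updates, opt_state = optimizer.update(grads, opt_state, params)
    params = optax.apply_updates(params, updates)
```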
These two new optimizers perform learning-rate adaptation, similar to Mechanic and COCOB, two optimizers already included in contrib, but via a different mechanism, so I think they are relevant to Optax and interesting to the community. D-Adaptation won an ICML outstanding paper award and is already gaining significant traction in the ML community, particularly for fine-tuning diffusion models with the Prodigy variant.