AdaBelief - Adapting Stepsizes by the Belief in Observed Gradients #233
This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! 👍
@@ -1,3 +1,61 @@
## AdaBelief

*An optimizer for [differentiable separable functions](#differentiable-separable-functions).*
I don't think this anchor works as expected. Or is this eventually built in a way that works?
It does; it's built into a larger page: http://ensmallen.org/docs.html#differentiable-separable-functions :)
I'm surprised the reviewers didn't have a problem with this sentence:
> Therefore, AdaBelief considers curvature information and performs better than Adam.
Anyway, seems like a nice technique to add regardless. :) Sorry it took so long to review this!
@mlpack-jenkins test this please
Co-authored-by: Ryan Curtin <ryan@ratml.org>
Looks good to me! Sorry for the slow review.
Second approval provided automatically after 24 hours. 👍
Let's not merge this yet; I worked locally on adapting the update step to follow the paper more closely.
Everything looks good to me here too---maybe we should merge this and the others (once you're ready) and then release a new minor version of ensmallen?
Sounds good. Let me resolve the merge conflict, and I'll merge it once the build comes back green.
Implementation of AdaBelief - "AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients", J. Zhuang et al.
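
For readers following the discussion about matching the update step to the paper, here is a minimal standalone sketch of one AdaBelief step as I understand it from the paper. It is not the code in this PR and does not use ensmallen or Armadillo; `AdaBeliefState` and `AdaBeliefStep` are hypothetical names, and the default hyperparameters are illustrative assumptions. The only difference from Adam is the second-moment term, which tracks `(g - m)^2` instead of `g^2`.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of ensmallen): per-run optimizer state.
struct AdaBeliefState
{
  std::vector<double> m;  // Exponential moving average of the gradient.
  std::vector<double> s;  // Exponential moving average of (g - m)^2.
  std::size_t t = 0;      // Number of steps taken so far.
};

// Apply one AdaBelief step to `theta` in place, given the gradient `grad`.
// Default values are assumptions chosen for illustration only.
void AdaBeliefStep(std::vector<double>& theta,
                   const std::vector<double>& grad,
                   AdaBeliefState& state,
                   const double stepSize = 0.001,
                   const double beta1 = 0.9,
                   const double beta2 = 0.999,
                   const double epsilon = 1e-8)
{
  // Lazily size the moment estimates on the first call.
  if (state.m.empty())
  {
    state.m.assign(theta.size(), 0.0);
    state.s.assign(theta.size(), 0.0);
  }

  ++state.t;
  const double t = static_cast<double>(state.t);
  const double biasCorrection1 = 1.0 - std::pow(beta1, t);
  const double biasCorrection2 = 1.0 - std::pow(beta2, t);

  for (std::size_t i = 0; i < theta.size(); ++i)
  {
    // First moment: same as Adam.
    state.m[i] = beta1 * state.m[i] + (1.0 - beta1) * grad[i];

    // Second moment: squared deviation of the gradient from its running
    // mean (the "belief"), where Adam would use grad[i] * grad[i].
    const double diff = grad[i] - state.m[i];
    state.s[i] = beta2 * state.s[i] + (1.0 - beta2) * diff * diff + epsilon;

    // Bias-corrected estimates, then the parameter update.
    const double mHat = state.m[i] / biasCorrection1;
    const double sHat = state.s[i] / biasCorrection2;
    theta[i] -= stepSize * mHat / (std::sqrt(sHat) + epsilon);
  }
}
```

When the observed gradient stays close to its running mean, `s` is small and the effective step is large; when it deviates strongly, `s` grows and the step shrinks.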