Additional optimizers for differentiable functions #68
Are there any optimizers that anyone would like to see implemented? In particular, optimizers for differentiable functions.
@zoq, any ideas or requests? @rcurtin has suggested Nesterov's Accelerated Gradient Descent method as a possibility.

Comments
Nesterov's Accelerated Gradient Descent sounds like a good idea to me; ND-Adam (Normalized Direction-preserving Adam) might be an alternative.
To clarify: for Nesterov's Accelerated Gradient Descent, are you referring to the algorithm described at https://blogs.princeton.edu/imabandit/2013/04/01/acceleratedgradientdescent/, as opposed to the algorithm described at http://ruder.io/optimizing-gradient-descent/index.html#nesterovacceleratedgradient, which looks like it's already implemented in sgd/update_policies/nesterov_momentum_update.hpp?
I was thinking about http://ruder.io/optimizing-gradient-descent/index.html#nesterovacceleratedgradient, which, as you pointed out, is already implemented.
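(For reference, here is a minimal sketch of the Nesterov momentum update being discussed, in the formulation from Ruder's post. The function and variable names are illustrative only and are not ensmallen's actual API.)

```c++
// Sketch of one Nesterov momentum update: the gradient is evaluated at the
// "look-ahead" point theta - gamma * v, then v and theta are updated.
#include <armadillo>
#include <functional>

using GradientFn = std::function<arma::vec(const arma::vec&)>;

void NesterovStep(const GradientFn& gradient,
                  arma::vec& theta,      // current parameters
                  arma::vec& velocity,   // accumulated momentum term
                  const double stepSize, // eta
                  const double momentum) // gamma
{
  // Gradient at the look-ahead point theta - gamma * v.
  const arma::vec lookAheadGrad = gradient(theta - momentum * velocity);

  // v <- gamma * v + eta * grad(theta - gamma * v);  theta <- theta - v.
  velocity = momentum * velocity + stepSize * lookAheadGrad;
  theta -= velocity;
}
```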
Ah, I suppose that we do have http://ensmallen.org/docs.html#nesterov-momentum-sgd, but that doesn't work for functions that are just differentiable; it only works on differentiable separable functions. We could add a variant for plain differentiable functions. Otherwise, I might also suggest ND-Adam, or maybe https://arxiv.org/pdf/1711.05101.pdf (AdamW), or, if you want a distributed challenge (I guess you could do it with OpenMP), there is also https://papers.nips.cc/paper/5761-deep-learning-with-elastic-averaging-sgd.pdf, for instance. All of those are differentiable separable optimizers, though; maybe you can find some other type of optimizer to implement, if you like?
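(For readers unfamiliar with the distinction: a differentiable function exposes only an objective and its gradient over the full parameter set, while a differentiable separable function is a sum of individual terms that SGD-style optimizers evaluate in batches. The sketch below shows roughly the shape of the two function types; see the ensmallen documentation for the exact required signatures.)

```c++
// Rough sketch of the two function types being contrasted; check the
// ensmallen function-type documentation for the precise requirements.
#include <armadillo>
#include <cstddef>

// Differentiable function: the whole objective at once.
class DifferentiableFunctionType
{
 public:
  double Evaluate(const arma::mat& x);                    // f(x)
  void Gradient(const arma::mat& x, arma::mat& gradient); // df/dx
};

// Differentiable separable function: f(x) = sum_i f_i(x), evaluated in
// batches, which is what SGD-style optimizers (including Nesterov momentum
// SGD) expect.
class DifferentiableSeparableFunctionType
{
 public:
  std::size_t NumFunctions(); // number of terms f_i
  void Shuffle();             // reorder the f_i (e.g. at the start of an epoch)
  double Evaluate(const arma::mat& x, const std::size_t begin,
                  const std::size_t batchSize);
  void Gradient(const arma::mat& x, const std::size_t begin,
                arma::mat& gradient, const std::size_t batchSize);
};
```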
@lukasmack0 I was porting mlpack issues to ensmallen today and opened #73, which I think might be interesting to you. 👍
How about Non-linear Conjugate Gradient, BFGS, and perhaps even Newton-Raphson (if the Hessian is available)?
Absolutely. There is already L-BFGS, so personally I don't see BFGS as a high priority, but I agree each one would be a nice addition.
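(For concreteness, a single Newton-Raphson step is sketched below. As noted later in the thread, ensmallen has no second-order function type, so the Gradient() and Hessian() callbacks here are hypothetical placeholders.)

```c++
// Sketch of one Newton-Raphson step: x <- x - H(x)^{-1} g(x).
// The gradient and hessian callbacks are hypothetical; ensmallen currently
// has no function type that provides a Hessian.
#include <armadillo>
#include <functional>

void NewtonRaphsonStep(
    const std::function<void(const arma::vec&, arma::vec&)>& gradient,
    const std::function<void(const arma::vec&, arma::mat&)>& hessian,
    arma::vec& x)
{
  arma::vec g;
  arma::mat h;
  gradient(x, g);
  hessian(x, h);

  // Solve H * step = g rather than forming the inverse explicitly.
  const arma::vec step = arma::solve(h, g);
  x -= step;
}
```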
@rcurtin, can I do AdamWR / SGDWR? I read the paper and I think I can do it, and I didn't see an implementation in the library.
@niteya-shah: sure, I think it could be nice to add those also.
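(The "WR" part of AdamWR / SGDWR is cosine annealing with warm restarts from the Loshchilov & Hutter paper linked above; the other half is decoupled weight decay, which subtracts lambda * w directly from the weights after the gradient step. Below is a sketch of the restart schedule; the class and member names are illustrative, not an existing ensmallen API.)

```c++
// Sketch of the cosine-annealing-with-warm-restarts step size schedule:
// eta_t = etaMin + 0.5 * (etaMax - etaMin) * (1 + cos(pi * T_cur / T_i)),
// with the cycle length T_i multiplied by periodMult after each restart.
#include <cmath>
#include <cstddef>

class CosineWarmRestarts
{
 public:
  CosineWarmRestarts(const double etaMin, const double etaMax,
                     const std::size_t restartPeriod, const double periodMult) :
      etaMin(etaMin), etaMax(etaMax), period(restartPeriod),
      periodMult(periodMult), epochInPeriod(0) { }

  // Step size for the current epoch; call once per epoch.
  double NextStepSize()
  {
    const double pi = std::acos(-1.0);
    const double eta = etaMin + 0.5 * (etaMax - etaMin) *
        (1.0 + std::cos(pi * static_cast<double>(epochInPeriod) /
                             static_cast<double>(period)));

    // After the current cycle ends, restart and lengthen the next cycle.
    if (++epochInPeriod >= period)
    {
      epochInPeriod = 0;
      period = static_cast<std::size_t>(period * periodMult);
    }

    return eta;
  }

 private:
  double etaMin, etaMax;
  std::size_t period;
  double periodMult;
  std::size_t epochInPeriod;
};
```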
This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! 👍
Sad :(
We can reopen it if you like. It might be better to open issues written for first-time contributors for BFGS, nonlinear CG, and N-R (although we don't have any abstractions for second-order differentiable functions at the moment). "Written for first-time contributors" basically just means that they have enough detail to get started even if they are not familiar with the internals of ensmallen. If you'd like to do that, I could mark those issues accordingly.
I'd be interested in having it re-opened. I should have a lot more time to look at it in the next month or so.
Personally, I don't see the need to open an issue that asks for new optimisers in a general sense; contributions in that or any other direction are always welcome. Also, if somebody would like to see a specific method implemented, opening a new issue for it is just fine and much appreciated.