Additional optimizers of differentiable functions #68

Closed
lukasmack0 opened this issue Jan 10, 2019 · 14 comments


@lukasmack0 (Contributor)

Are there any optimizers that anyone would like to see implemented? In particular, optimizers for differentiable functions.
@zoq, any ideas or requests? @rcurtin has suggested Nesterov's Accelerated Gradient Descent as a potential candidate.

@zoq (Member) commented Jan 10, 2019

Nesterov's Accelerated Gradient Descent sounds like a good idea to me; ND-Adam (Normalized Direction-preserving Adam) might be an alternative.

@lukasmack0 (Contributor, Author)

To clarify: for Nesterov's Accelerated Gradient Descent, are you referring to the algorithm described at https://blogs.princeton.edu/imabandit/2013/04/01/acceleratedgradientdescent/, as opposed to the algorithm described at http://ruder.io/optimizing-gradient-descent/index.html#nesterovacceleratedgradient, which looks like it's already implemented in sgd/update_policies/nesterov_momentum_update.hpp?

@zoq (Member) commented Jan 14, 2019

I was thinking about http://ruder.io/optimizing-gradient-descent/index.html#nesterovacceleratedgradient, which, as you pointed out, is already implemented.
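
For reference, the variant in nesterov_momentum_update.hpp corresponds roughly to the look-ahead form of the update below. This is only an illustrative sketch, not the exact code in the repository:

```c++
#include <armadillo>

// Nesterov momentum in its common "look-ahead" reformulation:
//   v_{t+1} = mu * v_t - alpha * g_t
//   x_{t+1} = x_t + mu * v_{t+1} - alpha * g_t
// where g_t is the gradient evaluated at the current iterate x_t.
void NesterovMomentumStep(arma::mat& iterate,
                          arma::mat& velocity,
                          const arma::mat& gradient,
                          const double stepSize,
                          const double momentum)
{
  velocity = momentum * velocity - stepSize * gradient;
  iterate += momentum * velocity - stepSize * gradient;
}
```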

@rcurtin (Member) commented Jan 14, 2019

Ah, I suppose that we do have http://ensmallen.org/docs.html#nesterov-momentum-sgd, but that doesn't work for functions that are just differentiable; it only works on differentiable separable functions. We could add a NesterovGradientDescent method that would work for differentiable functions, or we could do the harder thing, which is to use the template metaprogramming infrastructure in ensmallen_bits/include/function/ to create an Evaluate(const arma::mat& coordinates) function when we have an Evaluate(const arma::mat& coordinates, const size_t i, const size_t batchSize) function. (And similarly for Gradient().) That would be tricky if you are not comfortable with template metaprogramming, though. :)

Otherwise I might also suggest ND-Adam, or maybe https://arxiv.org/pdf/1711.05101.pdf; or, if you want a distributed challenge (I guess you could do it with OpenMP), there is also https://papers.nips.cc/paper/5761-deep-learning-with-elastic-averaging-sgd.pdf, for instance. All of those are differentiable separable optimizers, though; maybe you can find some other type of optimizer to implement, if you like?
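
To make the metaprogramming route above a bit more concrete, the idea is roughly the following. This is a hand-written sketch; the real infrastructure in ensmallen_bits/include/function/ synthesizes the missing overload automatically rather than through an explicit wrapper class:

```c++
#include <cstddef>
#include <armadillo>

// Illustrative wrapper: given a differentiable separable function that
// provides Evaluate(coordinates, i, batchSize) and NumFunctions(), build the
// plain Evaluate(coordinates) that a full-function optimizer expects by
// summing the objective over all individual functions.  The same pattern
// applies to Gradient().
template<typename SeparableFunctionType>
class FullObjectiveWrapper
{
 public:
  FullObjectiveWrapper(SeparableFunctionType& function) : function(function) { }

  double Evaluate(const arma::mat& coordinates)
  {
    double objective = 0.0;
    for (size_t i = 0; i < function.NumFunctions(); ++i)
      objective += function.Evaluate(coordinates, i, 1 /* batchSize */);
    return objective;
  }

 private:
  SeparableFunctionType& function;
};
```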

@rcurtin (Member) commented Jan 19, 2019

@lukasmack0 I was porting mlpack issues to ensmallen today and opened #73, which I think might be interesting to you. 👍

@originalsouth

> Are there any optimizers that anyone would like to see implemented? In particular, optimizers for differentiable functions.

How about Non-linear Conjugate Gradient, BFGS, and perhaps even Newton-Raphson (if the Hessian is available)?
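
For context, a single Newton-Raphson step would look roughly like the sketch below; Hessian() here is a hypothetical method, since (as noted later in the thread) ensmallen has no second-order function abstraction at this point:

```c++
#include <armadillo>

// One Newton-Raphson step: x <- x - H^{-1} g.  Gradient() follows the usual
// differentiable-function API; Hessian() is assumed, not part of ensmallen.
template<typename FunctionType>
void NewtonRaphsonStep(FunctionType& function, arma::mat& coordinates)
{
  arma::mat gradient;
  function.Gradient(coordinates, gradient);

  arma::mat hessian;
  function.Hessian(coordinates, hessian);

  // Solve H * step = g rather than forming an explicit inverse.
  const arma::vec step = arma::solve(hessian, arma::vectorise(gradient));
  coordinates -= arma::reshape(step, coordinates.n_rows, coordinates.n_cols);
}
```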

@zoq (Member) commented Jan 31, 2019

Absolutely. There is already L-BFGS, so personally I don't see BFGS as a high priority, but I agree each one would be a nice addition.

@niteya-shah (Contributor)

@rcurtin can I do AdamWR / SGDWR? I read the paper and I think I can do it, and I didn't see an implementation in the library.

@rcurtin (Member) commented Feb 6, 2019

@niteya-shah: sure, I think it could be nice to add those also.
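
As a rough sketch of the core idea in that paper, the decoupled weight decay step (the "W" part) can be written as an SGD-style update like the one below. The class and parameter names are illustrative, not the final API, and warm restarts (the "R" part) would come from a cosine-annealed step size schedule on top of it:

```c++
#include <armadillo>

// Illustrative decoupled-weight-decay (SGDW-style) step: the decay is applied
// directly to the iterate instead of being folded into the gradient as an L2
// penalty on the loss.
class DecoupledWeightDecayUpdate
{
 public:
  DecoupledWeightDecayUpdate(const double weightDecay = 0.0005) :
      weightDecay(weightDecay) { }

  void Update(arma::mat& iterate,
              const double stepSize,
              const arma::mat& gradient)
  {
    iterate -= stepSize * gradient + weightDecay * iterate;
  }

 private:
  double weightDecay;
};
```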

@mlpack-bot (bot) commented Apr 9, 2019

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! 👍

@mlpack-bot added the "s: stale" label on Apr 9, 2019
@mlpack-bot closed this as completed on Apr 16, 2019
@originalsouth

Sad :(

@rcurtin (Member) commented Apr 17, 2019

We can reopen it if you like. It might be better to open issues written for first-time contributors for BFGS, nonlinear CG, and N-R (although we don't have any abstractions for second-order differentiable functions at the moment). "Written for first-time contributors" basically just means that they have enough detail to get started even if they are not familiar with the internals of ensmallen. If you'd like to do that, I could mark those keep-open. This repository doesn't currently have a problem with way too many issues being open, so it's not a problem for me to add a few feature requests there. (Although I reserve the right to close the issue a year or two from now if nobody jumps on it.)

@lukasmack0 (Contributor, Author)

I'd be interested in having it re-opened. I should have a lot more time to look at it in the next month or so.

@zoq (Member) commented Apr 17, 2019

Personally, I don't see the need to open an issue that asks for new optimizers in a general sense. Contributions in that or any other direction are always welcome. Also, if somebody would like to see a particular method implemented, opening a new issue is just fine and much appreciated.
