ENH: optimize: Class based Optimizers #8414
Conversation
It's possible to push things even further into userland by not passing in a callable and instead just requesting that the user provide a function value (/gradient/Hessian) at every step. This might help keep the API uncluttered for something like reusing gradient computations in the Hessian. But maybe folks will feel that it's a step too far.
You can supply a Function class to Optimizer, which doesn't need to be supplied with any callables. But it does require that you override the func, jac, and hess methods. With that kind of design you should be able to achieve that.
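To illustrate the kind of reuse being discussed, here is a hypothetical sketch, not the PR's actual `Function` API; the class name `QuadraticFunction` and the caching scheme are illustrative, with only the `func`/`jac`/`hess` method names taken from the comment above. The residual is cached so that the value and gradient share one computation:

```python
import numpy as np

class QuadraticFunction:
    """Hypothetical Function-style objective for ||A @ x - y||^2.

    The residual A @ x - y is cached and reused, so calling func(x)
    followed by jac(x) at the same x computes A @ x only once.
    """

    def __init__(self, A, y):
        self.A = A
        self.y = y
        self._x = None
        self._residual = None

    def _resid(self, x):
        # Recompute the residual only when x changes.
        if self._x is None or not np.array_equal(x, self._x):
            self._x = np.asarray(x).copy()
            self._residual = self.A @ x - self.y
        return self._residual

    def func(self, x):
        r = self._resid(x)
        return r @ r

    def jac(self, x):
        return 2 * self.A.T @ self._resid(x)

    def hess(self, x):
        # Constant for a quadratic objective; no residual needed.
        return 2 * self.A.T @ self.A
```

With this shape, an optimizer that calls `func` and then `jac` at the same point pays for only one matrix-vector product, which is the "reusing gradient computations" point made above.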
@andyfaff This seems like an interesting approach for some of the solvers in scipy.optimize. As it is a Request for Comments, and potentially a fairly large change to how one interacts with scipy.optimize, presenting this as a PR sort of builds in assumptions, as well as an investment of vision and work. As such, you also saying that you may not have time to respond to all comments might be read as meaning that you expect this vision to be universally shared, and that the questions to be discussed are about implementation, not overall vision. I raise this concern partly because this Optimizer class approach appears to me (happy to be proven wrong) to assume that solvers are iterating in Python. That is not the case for all the routines in scipy.optimize. I also raise this concern partly because of my experience with the changes to
To be clear: I fully expect everyone to have their own vision, as well as implementation. I wanted to hear both. However, I was expecting a lot of comments, and if I missed replying to any, it was because I accidentally overlooked some for lack of time.
OK. Again, it would be helpful for you to describe your vision in more detail, including what this approach allows that cannot be done now, example usage, and a discussion of how it will impact existing code inside and outside of scipy. From a cursory look, the changes you are proposing here could impact many users and have real consequences for people not paying attention to PRs on the development branch. With scipy now at version 1.0, one way to think about this might be that any API change requires an extraordinary burden of proof. That's not to say I'm opposed to the proposed change, of course. Just trying to encourage caution about changes that might have unintended consequences.
I'm glad to see this PR. I study optimization, and this enables the interface I want to see. I've briefly developed a similar interface (objopt) before moving to PyTorch (which is better suited for my domain, deep learning). Most other deep-learning libraries have similar interfaces to the one proposed (i.e., torch.optim.Optimizer and torch.autograd.Function, though there are some differences).
This PR enables interacting with the
This PR makes this simple. If this PR is merged, I'd be inclined to develop a library of different machine learning losses and optimizers specific to my research. Here's a simple example using this PR:

```python
import numpy as np
from numpy import linalg as LA
from scipy.optimize import Function, Optimizer


class L2Loss(Function):
    def __init__(self, A, y):
        self.A = A
        self.y = y

    def func(self, x):
        return LA.norm(self.A @ x - self.y) ** 2

    def grad(self, x):
        return 2 * self.A.T @ (self.A @ x - self.y)


class GradientDescent(Optimizer):
    def __init__(self, *args, step_size=1e-3, **kwargs):
        self.step_size = step_size
        super().__init__(*args, **kwargs)

    def __next__(self):
        self.x -= self.step_size * self.grad(self.x)


if __name__ == "__main__":
    n, d = 100, 10
    A = np.random.randn(n, d)
    x_star = np.random.randn(d)
    y = np.sign(A @ x_star)

    loss = L2Loss(A, y)
    opt = GradientDescent(loss)
    for k, _ in enumerate(opt):
        if k % 100 == 0:
            compute_stats(opt, loss)  # user-defined logging helper
    compute_final_stats(opt, loss)    # user-defined logging helper
```

Some details in this example might be missing or inaccurate. We can support the old API, though this PR doesn't support it (and I'm working on it right now). We can rewrite
Similar to @newville: for statsmodels we essentially need good and fast optimizers that work out of the box and don't require user intervention. I don't want to interact with the optimizer in almost all cases, except for maybe an early-stopping signal in the callback. Besides backwards compatibility, my main worry is about the Python overhead when we have many simple problems, or need Nelder-Mead with many thousands of iterations.
Note that Nelder-Mead is written in Python: https://github.com/scipy/scipy/blob/master/scipy/optimize/optimize.py#L560 Even solvers written in low-level languages sometimes loop in Python because they are written in Fortran and there are no function pointers; e.g. SLSQP: https://github.com/scipy/scipy/blob/master/scipy/optimize/slsqp.py#L373 Even if you are using C under the hood, really running the full optimization in C code would require using a

Not to say that this isn't an issue on its own, but I'd claim it's agnostic on the question of the old interface versus the one proposed here.
@stsievert gave a small demo, but I'll write out a "scipy-pep"-ish document which will outline in text what code inspection alone cannot.
@andyfaff I'd like to help write the sci-pep. I think we also need to consider
@andyfaff @stsievert Thanks -- I definitely see potential benefits of a class-based Optimizer. I'm just saying the user base for
The minpack solvers were hand-written in C long ago using the Python API (not LowLevelCallable, or Cython, or ctypes) to call the user's Python objective function. Please do not change these to loop in Python.
I'm not necessarily anti-class, but I'm not sure that classes are an improvement here. As I noted on scipy-dev, I would like to see some consistency between
The problem is that all of these arguments can be applied to the
Note that we also have gh-8431. It would, in my opinion, help if the differences between the Cython and Python interfaces were confined to typing rather than larger structural changes.
Yes, we're just fleshing out a case for consideration at the moment, a scipy PEP if you will. That'll give a more complete picture. You can head on over to my fork to have a look. I think it's andyfaff/scipy/6, but bear in mind that we're still fleshing out the topics we want to cover. We're going to make a big effort on this, because we realise it's a big development, and want to make sure the pros and cons are covered.

BTW, check out test_functions.py in the benchmarks. Those already exist in class form, because function/grad/Hess naturally belong together. :-)
For sure. I'm not the only voice here; I'm just raising what I consider the potential issues to be.
Thanks for your comments; I wouldn't be surprised if they wind up in our proposal. We're focusing on the
To reply to Matt: no intention of breaking into stuff that loops in C, like minpack. Although if that were to be tried by someone (e.g. in Cython), it would have to be shown by comprehensive testing and benchmarking that it was not a retrograde step. I fully expect all changes proposed here to be held to the same standard: comprehensive tests and benchmarks. The proposal will leave the outward behaviour of fmin the same; there would just be a NelderMead class. The changes required to use the class are minor, but you can still use fmin if you want. We'll cover example usages in the PEP.

There will be overhead in having a class, but I think this will be outweighed by the actual function calculation. I'm investigating benchmarking at the moment. I think there will be less maintenance overhead in the long run, and it's easier to write comprehensive tests.
I have similar concerns as @josef-pkt, maybe? I'm not opposed to class-based optimizers, but they won't help my workflow, where I just need fast routine optimization methods for thousands, even millions of similar objectives. Maybe the class-based optimizers can be a parallel effort, or maybe they belong in a separate sub-package? Basically I see possibly two different needs for optimizers.

Please accept my apologies if my assumptions are wrong or misguided, as I'm not an expert in mathematics, optimization, or statistics.
One of the biggest advantages is that the maintenance burden will be much reduced. Once you fix things in a base class, all inheriting classes get fixed too; if we want to introduce a new feature, all inheriting classes get that too.

Going to a class-based framework does mean overhead, but it depends on the thing you're trying to optimize. If your objective function takes more than, say, 10-20 us to evaluate, then you won't see a difference. If your objective is super quick, then you might see a slight slowdown, of around 15%.

For guidance, the objective functions I deal with normally take around 500 us to 5 ms per evaluation, a few orders of magnitude larger. I've come across many others on the 1 s time scale and longer. Those objective functions totally outweigh small amounts of minimizer overhead. But I understand there will be problems at the other end of the timescale. It's instructive to consider something like L-BFGS or BFGS: those optimizers estimate the inverse Hessian at the end, and that estimation takes a significant fraction of the entire optimization time if your objective function is very quick. Those who want blazing-fast speed probably don't want to be calculating that. However, speed isn't everything, and I point back to my first statement: smaller maintenance burden and simpler development. If anyone has problems they think would be good benchmarks for such changes, please send them to me.
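A rough way to gauge the per-call overhead being discussed is a small timeit comparison. This is a sketch under the assumption that the class-based design adds roughly one attribute lookup and bound-method call per objective evaluation; the names below are illustrative and not from this PR:

```python
import timeit

def rosen_free(x):
    # Plain function: the calling style the current scipy.optimize API uses.
    return sum(100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (1 - x[i]) ** 2
               for i in range(len(x) - 1))

class RosenObjective:
    # Class-based style: one extra attribute lookup and bound-method call.
    def func(self, x):
        return sum(100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (1 - x[i]) ** 2
                   for i in range(len(x) - 1))

x = [1.2, 1.0, 0.8]
obj = RosenObjective()
t_free = timeit.timeit(lambda: rosen_free(x), number=100_000)
t_method = timeit.timeit(lambda: obj.func(x), number=100_000)
print(f"free function: {t_free:.3f}s, bound method: {t_method:.3f}s")
```

For an objective this cheap, the method-call overhead is measurable; for objectives in the 500 us and up regime described above, it would disappear into the noise.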
Probably better that the approach is not in the public API for a release or two.
Hi all, is there any news on this? It seems quite good; having more information available in the callback in particular would indeed be gorgeous!
This PR is very much stalled. I think the best way forward with it would be to have a proof of concept operating in a separate package first, go through a few design cycles, and learn from mistakes, before seeking to integrate into scipy. That last item isn't guaranteed, which is why I haven't gone any further with it.
I see, it's a pity; the current API is close to unusable if used seriously, with too many documentation problems and internal inconsistencies. An overhaul would be great, and an OOP-based approach seems the way to go. What about scikit-optimize? Too far?
@mayou36 Depending on your needs and which inconsistencies give you the most trouble, you might find lmfit (https://lmfit.github.io/lmfit-py/) interesting. It is kind of like a scikit for optimization and curve-fitting, wrapping many scipy methods. It does try to iron out some of the inconsistencies, though I'm sure it does not remove them all. To be clear, lmfit does not approach the ideas here. It does not have an Optimizer class for which a subclass (or user?) is expected to override methods to do the iteration, decide updated parameter values, determine the stopping condition, etc. I think we would not be opposed to trying that within lmfit, but I would also not promise immediate enthusiasm from current devels (including myself) to focus on this. We have delved a little into other algorithms, but most of the work in lmfit has used
Ah, lmfit, that's interesting to hear. Well, that's the exact same corner we're coming from: zfit. This is meant to be a model-fitting library like lmfit but more extensive: support for numerical integration, sampling, composed parameters, and high performance for large datasets on the scale of High Energy Physics. And all with a clean interface - at least, that is the goal. We're currently expanding the minimizer classes more seriously and taking a more complete look at SciPy, and yep, stumbling upon these issues. In zfit we also have an OOP approach to the minimizer: instantiate it arbitrarily (whatever arguments; they can be minimizer-specific) and use one specific method to minimize. Interesting! Given that, it may be worth exchanging some experience.
PRs are always welcome to improve difficult points. I would disagree that I had envisaged that any Solver classes would reuse the code within the existing minimizers that does each step. A lot of these are compiled codes, so the classes would have to use a lot of private scipy functions. I had a vision that any development would be towards the classes being eventually absorbed into scipy (not guaranteed), rather than being a standalone package. A lot of the features I would expect to be in the class are in
Of course, in general it is a great collection. What I meant is that with the new interface and the documentation, basically every feature has to be tested by hand and the docs have to be rewritten completely from scratch. With a dedicated function, at least it was clear what went into it.

Yes, being absorbed is of course a good idea, but if not...

I see, that's interesting. It's a stateful approach; we had quite a few discussions on this. I guess while it is nice to be able to massage the function, it somehow mixes the configuration, the minimization, and the result of the fit into one class. But then it's also useful to have a state, as the minimization is at a state. What is the status of this then; what exactly would be required to move forward?
This PR introduces an `Optimizer` class for `scipy.optimize`. The `Optimizer` class can be advanced like an iterator, or be advanced to convergence with the `solve` method.

The PR is intended to provide a cleaner interface to the minimisation process, with the user being able to interact with the solver during iteration. This allows the variation of solver hyper-parameters, or other user-definable interactions. As such, `callback`s become redundant, as you can access all the optimizer information required. Other historical parameters, such as `disp`, may also become redundant in these objects. This is also a chance for a ground-zero approach to how people interact with optimizers.

This PR is intended as a Request For Comments to gain feedback on the architecture of the solvers. Given that it's a key part of the infrastructure, it may generate a large amount of feedback; if I don't reply to it all, bear in mind that I don't have unlimited time.

In its current state I've implemented `Optimizer`, `Function`, `LBFGS` and `NelderMead` classes. `LBFGS` and `NelderMead` are being called by the corresponding functions already in existence. The current test suite passes, but no unit tests have been written for the new functionality.

These optimisers were straightforward to implement. However, I don't know how this approach would translate to other minimisers, or to things like `least_squares`.
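To make the iterate-or-solve usage pattern concrete, here is a self-contained sketch. `GradientDescentSketch`, its constructor arguments, and its `solve` method are illustrative stand-ins, not the actual `Optimizer`/`Function` classes in this PR:

```python
import numpy as np

class GradientDescentSketch:
    """Minimal illustration of an optimizer that can be advanced like an
    iterator, or run to convergence with solve(). Names are hypothetical."""

    def __init__(self, grad, x0, step_size=0.1, maxiter=1000, tol=1e-8):
        self.grad = grad
        self.x = np.asarray(x0, dtype=float)
        self.step_size = step_size
        self.maxiter = maxiter
        self.tol = tol
        self.nit = 0

    def __iter__(self):
        return self

    def __next__(self):
        g = self.grad(self.x)
        if self.nit >= self.maxiter or np.linalg.norm(g) < self.tol:
            raise StopIteration
        self.x = self.x - self.step_size * g
        self.nit += 1
        return self.x

    def solve(self):
        # Advance to convergence in one call.
        for _ in self:
            pass
        return self.x

# Minimise (x - 3)^2, whose gradient is 2 * (x - 3).
grad = lambda x: 2 * (x - 3.0)

# Advance step by step, interacting with the solver mid-run
# (the kind of thing callbacks cannot easily do):
opt = GradientDescentSketch(grad, x0=np.array([0.0]))
for x in opt:
    if opt.nit == 5:
        opt.step_size *= 0.5  # tweak a hyper-parameter during iteration

# ...or simply run to convergence:
x_final = GradientDescentSketch(grad, x0=np.array([0.0])).solve()
```

The point of the sketch is the shape of the control flow: iteration state (`x`, `nit`, `step_size`) lives on the object, so the user can inspect or modify it between steps instead of threading information through a `callback`.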