
ENH: optimize: Class based Optimizers #8414

Draft · wants to merge 6 commits into main

Conversation

andyfaff
Contributor

This PR introduces an Optimizer class for scipy.optimize. The Optimizer class can be advanced like an iterator, or run to convergence with the solve method.
The PR is intended to provide a cleaner interface to the minimisation process, with the user able to interact with the solver during iteration. This allows variation of solver hyper-parameters, or other user-definable interactions. As such, callbacks become redundant, since you can access all the optimizer information you require directly.
Other historical parameters, such as disp, may become redundant in these objects. This is also a chance for a ground-zero approach to how people interact with optimizers.

This PR is intended as a Request For Comments to gain feedback on the architecture of the solvers. Given that it's a key piece of infrastructure, this may attract a large amount of feedback; if I don't reply to it all, bear in mind that I don't have unlimited time.
In its current state I've implemented Optimizer, Function, LBFGS, and NelderMead classes. LBFGS and NelderMead are called by the corresponding functions already in existence. The current test suite passes, but no unit tests have been written for the new functionality.

These optimisers were straightforward to implement. However, I don't know how this approach would translate to other minimisers, or things like least_squares.

@person142
Member

It's possible to push things even further into userland by not passing in a callable and instead just requesting that the user provide a function value(/gradient/Hessian) at every step. This might help keep the API uncluttered for something like reusing gradient computations in the Hessian. But maybe folks will feel that it's a step too far.

@andyfaff
Contributor Author

You can supply a Function class to Optimizer, which doesn't need to be supplied with any callables. It does, however, require that you override the func, jac, and hess methods. With that kind of design you should be able to achieve that.
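Under that design, a callable-free problem definition might look something like the sketch below. The Function base class here is a minimal stand-in for the one proposed in this PR; the method names func/jac/hess come from the comment above, but the signatures and behaviour are assumptions for illustration only.

```python
import numpy as np

# Minimal stand-in for the proposed Function base class.
class Function:
    def func(self, x):
        raise NotImplementedError

    def jac(self, x):
        raise NotImplementedError

    def hess(self, x):
        raise NotImplementedError

# A least-squares objective where value, gradient, and Hessian share
# the residual computation, with no separate callables passed anywhere.
class Quadratic(Function):
    def __init__(self, A, b):
        self.A = np.asarray(A, dtype=float)
        self.b = np.asarray(b, dtype=float)

    def func(self, x):
        r = self.A @ x - self.b
        return 0.5 * (r @ r)

    def jac(self, x):
        return self.A.T @ (self.A @ x - self.b)

    def hess(self, x):
        return self.A.T @ self.A

f = Quadratic(np.eye(2), np.ones(2))
print(f.func(np.zeros(2)))  # 1.0
print(f.jac(np.ones(2)))    # [0. 0.]
```

Because the object owns its own data, intermediate quantities (here the residual) can be cached or reused across func/jac/hess calls, which is awkward to express with separate callables.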

@newville
Contributor

@andyfaff This seems like an interesting approach for some of the solvers in scipy.optimize. As it is a Request for Comments, and potentially a fairly large change to how one interacts with scipy.optimize, I would suggest a longer description of what you are aiming to do, including example user code, and a discussion of how the APIs would change and what implications there might be for existing code, new algorithms, new use cases, and so on.

Presenting this as a PR sort of builds in assumptions, as well as an investment of vision and work. As such, your saying that you do not have time to respond to all comments might be read as meaning that you expect this vision to be universally shared and that the questions to be discussed are about implementation, not overall vision.

I raise this concern partly because this Optimize class approach appears to me (happy to be proven wrong) to assume that solvers are iterating in Python. That is not the case for all the routines in scipy.optimize. Is the idea to change the compiled solvers too, or only those that "iterate in Python"? If so, that might imply a performance hit. If not, it might imply a bifurcation of the way the solvers are used, making it more difficult to change the solver for a particular use. To be clear, I don't know this will be the case, but I would request that you should probably describe this in detail before writing code.

I also raise this concern partly because my experience with the changes to scipy.optimize over the past few years has not always made my life as maintainer of lmfit easy. In particular, the least_squares routine that you mention has some very nice features as a solver and some very irritating features as an API. This makes me particularly aware of the importance of designing the API. Having a rational discussion of features and implications of potentially large changes such as this would be most appreciated.

@andyfaff
Contributor Author

"your saying that you do not have time to respond to all comments might be read as meaning that you expect this vision to be universally shared and that the questions to be discussed are about implementation, not overall vision."

To be clear: I fully expect everyone to have their own vision, as well as implementation, and I wanted to hear both. However, I was expecting a lot of comments, and if I missed replying to any it was because I accidentally overlooked them for lack of time.

@newville
Contributor

@andyfaff

To be clear: I fully expect everyone to have their own vision, as well as implementation, and I wanted to hear both. However, I was expecting a lot of comments, and if I missed replying to any it was because I accidentally overlooked them for lack of time.

OK. Again, it would be helpful for you to describe your vision in more detail, including what this approach allows that cannot be done now, example usage, and discussion of how it will impact existing code inside and outside of scipy. From a cursory look, the changes you are proposing here could impact many users and have real consequences for people not paying attention to PRs on the development branch.

With scipy now at version 1.0, one way to think about this might be that any API change requires an extraordinary burden of proof. Saying "hey, I think optimize should be class-based with an iterator model" could mean a very significant change to the API that breaks things that work in scipy 1.0. I would strongly encourage you to do such design work, and to try to demonstrate community buy-in for any such changes, before writing more code.

That's not to say I'm opposed to the proposed change, of course. Just trying to encourage caution about changes that might have unintended consequences.

@stsievert
Contributor

stsievert commented Feb 14, 2018

I'm glad to see this PR. I study optimization, and this enables the interface I want to see. I've briefly developed a similar interface (objopt) before moving to PyTorch (which is better suited for my domain, deep learning). Most other deep learning libraries have interfaces similar to the one proposed (e.g., torch.optim.Optimizer and torch.autograd.Function, though there are some differences).

I would suggest a longer description of what you are aiming to do, including example user code, and a discussion of how the APIs would change and what implications there might be for existing code, new algorithms, new use cases, and so on.

This PR enables interacting with the Optimizer as an object. For example, things I often want to do are

  • compute stats after a step (or several steps) to see how the optimization is progressing
  • change internal variables (e.g., coding the gradients or changing the step size on each iteration)
  • access internal variables (e.g., gradient norm, function value, etc.)
  • get simple conveniences from classes (e.g., keeping a partial solution when a keyboard interrupt is sent)

This PR makes this simple. If this PR is merged, I'd be inclined to develop a library of different machine learning losses and optimizers specific to my research.

Here's a simple example of this PR:

import numpy as np
from numpy import linalg as LA

from scipy.optimize import Function, Optimizer

class L2Loss(Function):
    def __init__(self, A, y):
        self.A = A
        self.y = y

    def func(self, x):
        return LA.norm(self.A @ x - self.y)**2

    def grad(self, x):
        return 2 * self.A.T @ (self.A @ x - self.y)

class GradientDescent(Optimizer):
    def __init__(self, *args, step_size=1e-3, **kwargs):
        self.step_size = step_size
        super().__init__(*args, **kwargs)

    def __next__(self):
        self.x -= self.step_size * self.grad(self.x)

if __name__ == "__main__":
    n, d = 100, 10
    A = np.random.randn(n, d)
    x_star = np.random.randn(d)
    y = np.sign(A @ x_star)

    loss = L2Loss(A, y)
    opt = GradientDescent(loss)

    for k, _ in enumerate(opt):
        if k % 100 == 0:
            compute_stats(opt, loss)
    compute_final_stats(opt, loss)

Some details in this example might be missing or inaccurate.

The old API can still be supported, though this PR doesn't yet do so (I'm working on it right now). We can rewrite minimize to build a Function class from the passed-in arguments and return Optimizer.solve(). See andyfaff#5 (comment) for a prototype.
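A backward-compatibility shim along those lines could look roughly like the following. Function, GradientDescent, and minimize_compat here are hypothetical stand-ins written for this discussion, not the PR's actual code; the point is only that the old functional entry point can be a thin wrapper over class-based machinery.

```python
import numpy as np

class Function:
    """Stand-in Function class, wrapping plain callables (old-style input)."""

    def __init__(self, fun, jac=None, args=()):
        self._fun, self._jac, self._args = fun, jac, args

    def func(self, x):
        return self._fun(x, *self._args)

    def grad(self, x):
        return self._jac(x, *self._args)

class GradientDescent:
    """Stand-in Optimizer subclass exposing a solve() method."""

    def __init__(self, function, x0, step_size=0.1, maxiter=500):
        self.function = function
        self.x = np.asarray(x0, dtype=float)
        self.step_size = step_size
        self.maxiter = maxiter

    def solve(self):
        # Advance to the iteration limit; a real Optimizer would also
        # check tolerances and report convergence status.
        for _ in range(self.maxiter):
            self.x = self.x - self.step_size * self.function.grad(self.x)
        return self.x

def minimize_compat(fun, x0, jac=None, args=()):
    """Old-style functional entry point built on the class-based machinery."""
    loss = Function(fun, jac=jac, args=args)
    return GradientDescent(loss, x0).solve()

# Minimize (x - 3)^2 starting from 0; the minimizer is at x = 3.
x = minimize_compat(lambda x: (x - 3) ** 2, 0.0, jac=lambda x: 2 * (x - 3))
```

Existing callers would keep passing callables as before, while new code could subclass Function and Optimizer directly.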

@josef-pkt
Member

Similar to @newville: for statsmodels we essentially need good, fast optimizers that work out of the box and don't require user intervention. In almost all cases I don't want to interact with the optimizer, except maybe for an early stopping signal in the callback.

Besides backwards compatibility, my main worry is about the Python overhead when we have many simple problems, or need Nelder-Mead with many thousands of iterations.

@person142
Member

person142 commented Feb 14, 2018

my main worry is about the Python overhead when we have many simple problems, or need Nelder-Mead with many thousands of iterations.

Note that Nelder-Mead is written in Python:

https://github.com/scipy/scipy/blob/master/scipy/optimize/optimize.py#L560

Even some solvers written in low-level languages loop in Python, because they are written in Fortran and there are no function pointers; e.g., SLSQP:

https://github.com/scipy/scipy/blob/master/scipy/optimize/slsqp.py#L373

Even if you are using C under the hood, really running the full optimization in C code would require using a LowLevelCallable, which isn't supported by any optimizers as far as I know.

Not to say that this isn't an issue on its own, but I'd claim it's agnostic on the question of old interface versus the one proposed here.

@andyfaff
Contributor Author

@stsievert gave a small demo, but I'll write out a "scipy-pep"-ish document which will outline in text what code inspection will fail to convey.

  • It's vital that backward compatibility is not lost.
  • If it's slow then it won't have good uptake.

@stsievert
Contributor

@andyfaff I'd like to help write the scipy-pep. I think we also need to consider

  • enhancements during moderate and expert use (simple use should not change)
  • what incorporating this into SciPy will enable
  • how it will work with the existing minimize code base

@newville
Contributor

@andyfaff @stsievert Thanks -- I definitely see potential benefits of a class-based Optimizer. I'm just saying that the user base for scipy.optimize is large and diverse, and that the API should move slowly, possibly "very slowly". Having some class-based optimizers and some in C/Fortran might be OK, but it might add incompatibilities. For example, how easy will it be for a user to compare leastsq and Nelder-Mead on the same problem?

Even if you are using C under the hood, really running the full optimization in C code would require using a LowLevelCallable, which isn't supported by any optimizers as far as I know.

The minpack solvers were (hand) written in C long ago using the Python API (not LowLevelCallable, or Cython, or ctypes) to call the user's Python objective function. Please do not change these to loop in Python.

@ghost

ghost commented Feb 16, 2018

I'm not necessarily anti-class, but I'm not sure that classes are an improvement here. As I noted on scipy-dev, I would like to see some consistency between solve_ivp (which I personally think is good) and minimize. I see that on @andyfaff's repository, there are these types of arguments:

  • Current API needs improvement
  • minimize is trying to be a class

The problem is that all of these arguments can also be applied to the solve_ivp interface. Both minimize and solve_ivp call a function that can potentially provide gradient information. Both potentially calculate the gradient themselves (and could potentially re-use computation by returning the results). There is essentially no functional difference between the functions being called here (other than the fact that the ODE is vectorized).

@ghost

ghost commented Feb 16, 2018

Note that we also have gh-8431. It would in my opinion help if the differences between the Cython and Python interface were confined to typing rather than larger structural changes.

@andyfaff
Contributor Author

andyfaff commented Feb 16, 2018 via email

@ghost

ghost commented Feb 16, 2018

For sure. I'm not the only voice here; I'm just raising what I consider the potential issues to be.

@stsievert
Contributor

For sure. I'm not the only voice here; I'm just raising what I consider the potential issues to be.

Thanks for your comments; I wouldn't be surprised if they wind up in our proposal.

We're focusing on the minimize function because its interface has been an issue for us (or at least for me).

@andyfaff
Contributor Author

andyfaff commented Feb 16, 2018 via email

@mikofski
Contributor

I have similar concerns as @josef-pkt, maybe? I'm not opposed to class-based optimizers, but they won't help my workflow, where I just need fast, routine optimization methods for thousands, even millions, of similar objectives.

Maybe the class based optimizers can be a parallel effort, or maybe they belong in a separate sub-package? Basically I see possibly two different needs for optimizers.

  1. Interactive optimizers for machine learning where fine control of unsupervised learning is more important.

  2. Fast routine optimizers for known objective functions.

Please accept my apologies if my assumptions are wrong or misguided, as I'm not an expert in mathematics, optimization, or statistics.

@andyfaff
Contributor Author

andyfaff commented Feb 24, 2018 via email

@jonas-eschle
Contributor

Hi all, is there any news on this? It seems quite good; having more information available in the callback, in particular, would indeed be gorgeous!
Many thanks for the work on this!

@andyfaff
Contributor Author

andyfaff commented Feb 8, 2021

This PR is very much stalled. I think the best way forward would be to have a proof of concept operating in a separate package first, going through a few design cycles and learning from mistakes, before seeking to integrate it into scipy.

That last step isn't guaranteed, which is why I haven't gone any further with it.

@jonas-eschle
Contributor

I see, it's a pity; the current API is near to unusable if used seriously: too many documentation problems and internal inconsistencies. An overhaul would be great, and an OOP-based approach seems the way to go. What about scikit-optimize? Too far?

@newville
Contributor

newville commented Feb 9, 2021

@mayou36 Depending on your needs and what inconsistencies give you the most trouble, you might find lmfit (https://lmfit.github.io/lmfit-py/) interesting. It is kind of like a scikit for optimization and curve-fitting, wrapping many scipy methods. It does try to iron out some of the inconsistencies, though I'm sure it does not remove them all.

To be clear, lmfit does not approach the ideas here. It does not have an Optimizer class for which a subclass (or user?) is expected to override methods to do the iteration, decide updated parameter values, determine the stopping condition, etc.

I think we would not be opposed to trying that within lmfit, but I would also not promise immediate enthusiasm from the current devs (including myself) to focus on this. We have delved a little into other algorithms, but most of the work in lmfit has used scipy.optimize methods as starting points and has emphasized "downstream" and "meta" needs rather than tackling the mechanics of the fitting methods themselves. That said, I do sort of agree that developments of the kind in this PR are sort of scary for scipy.optimize, and moving such work to a scikit model might make more sense.

@jonas-eschle
Contributor

Ah, lmfit, that's interesting to hear. Well, that's the exact same corner we're coming from: zfit. This is meant to be a model-fitting library like lmfit but more extensive: support for numerical integration, sampling, composed parameters, and high performance for large datasets at the scale of High Energy Physics. And all with a clean interface; at least that is the goal.

We're currently expanding the minimizer classes more seriously and taking a more complete look at SciPy, and yep, stumbling upon these issues.

In zfit, we also have an OOP approach to the minimizer: instantiate it arbitrarily (whatever arguments; they can be minimizer-specific) and use one specific method to minimize.

Interesting! Given that, it may be worth exchanging some experience.

@andyfaff
Contributor Author

andyfaff commented Feb 9, 2021

I see, it's a pity; the current API is near to unusable if used seriously: too many documentation problems and internal inconsistencies

PRs are always welcome to improve difficult points. I would disagree that minimize is not suitable for serious use, it just may not be suitable in the ways that people want to use it. However, there are undoubtedly pain points.

I had envisaged that any Solver classes would reuse the code within the existing minimizers that performs each step. A lot of these are compiled codes, so the classes would have to use a lot of private scipy functions. I had a vision that any development would be towards the classes eventually being absorbed into scipy (not guaranteed), rather than being a standalone package.

A lot of the features I would expect to be in the class are in DifferentialEvolutionSolver, e.g. the solver is iterable, etc.
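For illustration, the iterable-solver pattern that DifferentialEvolutionSolver follows can be sketched with a toy solver. This is a stand-in written for this discussion, not scipy code: each __next__ performs one iteration and returns the current best point, so user code can inspect progress or stop the loop without a callback.

```python
import numpy as np

class IterableSolver:
    """Toy pattern-search solver illustrating the iterator protocol."""

    def __init__(self, func, x0, step=0.5, maxiter=200):
        self.func = func
        self.x = np.asarray(x0, dtype=float)
        self.step = step
        self.maxiter = maxiter
        self._it = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._it >= self.maxiter:
            raise StopIteration
        self._it += 1
        # Greedy axis-aligned search: accept any strictly improving move.
        for i in range(self.x.size):
            for delta in (self.step, -self.step):
                trial = self.x.copy()
                trial[i] += delta
                if self.func(trial) < self.func(self.x):
                    self.x = trial
        self.step *= 0.9  # shrink the pattern each iteration
        return self.x, self.func(self.x)

    def solve(self):
        """Advance to the iteration limit, like the proposed solve method."""
        result = self.x, self.func(self.x)
        for result in self:
            pass
        return result

solver = IterableSolver(lambda x: np.sum((x - 1.0) ** 2), np.zeros(2))
for k, (x, fx) in enumerate(solver):
    if fx < 1e-6:
        break  # user-controlled early stopping, no callback needed
```

Routine users would call solve() and never see the iterator, which is how both use cases discussed above could coexist in one class.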

@jonas-eschle
Contributor

PRs are always welcome to improve difficult points. I would disagree that minimize is not suitable for serious use, it just may not be suitable in the ways that people want to use it. However, there are undoubtedly pain points.

Of course, in general it is a great collection. What I meant is that with the new interface and the documentation, basically every feature has to be tested by hand and the docs have to be rewritten completely from scratch. With a dedicated function, at least it was clear what went into it.

I had envisaged that any Solver classes would reuse the code within the existing minimizers that does each step. A lot of these are compiled codes, so the classes would have to use a lot of private scipy functions. I had a vision that any development would be towards the classes being eventually absorbed into scipy (not guaranteed), rather that being a standalone package.

Yes, being absorbed is of course a good idea, but if not...

A lot of the features I would expect to be in the class are in DifferentialEvolutionSolver, e.g. the solver is iterable, etc.

I see, that's interesting. It's a stateful approach; we had quite a few discussions on this. I guess that while it is nice to be able to massage the function, it somehow mixes the configuration, the minimization, and the result of the fit into one class. But then it's also useful to have state, as the minimization itself is at a state.

What is the status of this then? What exactly would be required to move forward?

Labels: enhancement (a new feature or improvement), scipy.optimize