ENH: optimize: Class based Optimizers #8414
Conversation
It's possible to push things even further into userland by not passing in a callable and instead just requesting that the user provide a function value (/gradient/Hessian) at every step. This might help keep the API uncluttered for something like reusing gradient computations in the Hessian. But maybe folks will feel that it's a step too far.
You can supply a Function class to Optimizer, which doesn't need to be supplied with any callables. But it does require that you override the func, jac, and hess methods. With that kind of design you should be able to achieve that.
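To illustrate the kind of reuse being discussed, here is a hypothetical sketch, not the PR's actual `Function` API; the class name `QuadraticFunction` and the caching scheme are illustrative, with only the `func`/`jac`/`hess` method names taken from the comment above. The residual is cached so that the value and gradient share one computation:

```python
import numpy as np

class QuadraticFunction:
    """Hypothetical Function-style objective for ||A @ x - y||^2.

    The residual A @ x - y is cached and reused, so calling func(x)
    followed by jac(x) at the same x computes A @ x only once.
    """

    def __init__(self, A, y):
        self.A = A
        self.y = y
        self._x = None
        self._residual = None

    def _resid(self, x):
        # Recompute the residual only when x changes.
        if self._x is None or not np.array_equal(x, self._x):
            self._x = np.asarray(x).copy()
            self._residual = self.A @ x - self.y
        return self._residual

    def func(self, x):
        r = self._resid(x)
        return r @ r

    def jac(self, x):
        return 2 * self.A.T @ self._resid(x)

    def hess(self, x):
        # Constant for a quadratic objective; no residual needed.
        return 2 * self.A.T @ self.A
```

With this shape, an optimizer that calls `func` and then `jac` at the same point pays for only one matrix-vector product, which is the "reusing gradient computations" point made above.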
@andyfaff This seems like an interesting approach for some of the solvers in scipy.optimize. As it is a Request for Comments, and potentially a fairly large change to how one interacts with scipy.optimize, presenting this as a PR sort of builds in assumptions, as well as an investment of vision and work. As such, you also saying that you may not have time to respond to all comments might be read as meaning that you expect this vision to be universally shared, and that the questions to be discussed are about implementation, not overall vision. I raise this concern partly because this Optimizer class approach appears to me (happy to be proven wrong) to assume that solvers are iterating in Python. That is not the case for all the routines in scipy.optimize. I also raise this concern partly because of my experience with the changes to
To be clear: I fully expect everyone to have their own vision, as well as implementation. I wanted to hear both. However, I was expecting a lot of comments, and if I missed replying to any, it was because I accidentally overlooked some for lack of time.
OK. Again, it would be helpful for you to describe your vision in more detail, including what this approach allows that cannot be done now, example usage, and a discussion of how it will impact existing code inside and outside of scipy. From a cursory look, the changes you are proposing here could impact many users and have real consequences for people not paying attention to PRs on the development branch. With scipy now at version 1.0, one way to think about this might be that any API change requires an extraordinary burden of proof. That's not to say I'm opposed to the proposed change, of course. Just trying to encourage caution about changes that might have unintended consequences.
I'm glad to see this PR. I study optimization, and this enables the interface I want to see. I've briefly developed a similar interface (objopt) before moving to PyTorch (which is better suited for my domain, deep learning). Most other deep-learning libraries have similar interfaces to the one proposed (i.e., torch.optim.Optimizer and torch.autograd.Function, though there are some differences).
This PR enables interacting with the
This PR makes this simple. If this PR is merged, I'd be inclined to develop a library of different machine learning losses and optimizers specific to my research. Here's a simple example using this PR:

```python
import numpy as np
from numpy import linalg as LA
from scipy.optimize import Function, Optimizer


class L2Loss(Function):
    def __init__(self, A, y):
        self.A = A
        self.y = y

    def func(self, x):
        return LA.norm(self.A @ x - self.y) ** 2

    def grad(self, x):
        return 2 * self.A.T @ (self.A @ x - self.y)


class GradientDescent(Optimizer):
    def __init__(self, *args, step_size=1e-3, **kwargs):
        self.step_size = step_size
        super().__init__(*args, **kwargs)

    def __next__(self):
        self.x -= self.step_size * self.grad(self.x)


if __name__ == "__main__":
    n, d = 100, 10
    A = np.random.randn(n, d)
    x_star = np.random.randn(d)
    y = np.sign(A @ x_star)

    loss = L2Loss(A, y)
    opt = GradientDescent(loss)
    for k, _ in enumerate(opt):
        if k % 100 == 0:
            compute_stats(opt, loss)  # user-defined logging helper
    compute_final_stats(opt, loss)    # user-defined logging helper
```

Some details in this example might be missing or inaccurate. We can support the old API, though this PR doesn't support it (and I'm working on it right now). We can rewrite
Similar to @newville: for statsmodels we essentially need good and fast optimizers that work out of the box and don't require user intervention. I don't want to interact with the optimizer in almost all cases, except for maybe an early-stopping signal in the callback. Besides backwards compatibility, my main worry is about the Python overhead when we have many simple problems, or need Nelder-Mead with many thousands of iterations.
Note that Nelder-Mead is written in Python: https://github.com/scipy/scipy/blob/master/scipy/optimize/optimize.py#L560 Even solvers written in low-level languages sometimes loop in Python because they are written in Fortran and there are no function pointers; e.g. SLSQP: https://github.com/scipy/scipy/blob/master/scipy/optimize/slsqp.py#L373 Even if you are using C under the hood, really running the full optimization in C code would require using a

Not to say that this isn't an issue on its own, but I'd claim it's agnostic on the question of the old interface versus the one proposed here.
@stsievert gave a small demo, but I'll write out a "scipy-pep"-ish document which will outline in text what code inspection alone cannot.
@andyfaff I'd like to help write the sci-pep. I think we also need to consider
@andyfaff @stsievert Thanks -- I definitely see potential benefits of a class-based Optimizer. I'm just saying the user base for
The minpack solvers were hand-written in C long ago using the Python API (not LowLevelCallable, or Cython, or ctypes) to call the user's Python objective function. Please do not change these to loop in Python.
I'm not necessarily anti-class, but I'm not sure that classes are an improvement here. As I noted on scipy-dev, I would like to see some consistency between
The problem is that all of these arguments can be applied to the
Note that we also have gh-8431. It would, in my opinion, help if the differences between the Cython and Python interfaces were confined to typing rather than larger structural changes.
Yes, we're just fleshing out a case for consideration at the moment, a scipy PEP if you will. That'll give a more complete picture. You can head on over to my fork to have a look. I think it's andyfaff/scipy/6, but bear in mind that we're still fleshing out the topics we want to cover. We're going to make a big effort on this, because we realise it's a big development, and want to make sure the pros and cons are covered.

BTW, check out test_functions.py in the benchmarks. Those already exist in class form, because function/grad/Hess naturally belong together. :-)
For sure. I'm not the only voice here; I'm just raising what I consider the potential issues to be.
Thanks for your comments; I wouldn't be surprised if they wind up in our proposal. We're focusing on the
To reply to Matt: no intention of breaking into stuff that loops in C, like minpack. Although if that were to be tried by someone (e.g. in Cython), it would have to be shown by comprehensive testing and benchmarking that it was not a retrograde step. I fully expect all changes proposed here to be held to the same standard: comprehensive tests and benchmarks. The proposal will leave the outward behaviour of fmin the same; there would just be a NelderMead class. The changes required to use the class are minor, but you can still use fmin if you want. We'll cover example usages in the PEP.

There will be overhead in having a class, but I think this will be outweighed by the actual function calculation. I'm investigating benchmarking at the moment. I think there will be less maintenance overhead in the long run, and it's easier to write comprehensive tests.
I have similar concerns as @josef-pkt, maybe? I'm not opposed to class-based optimizers, but they won't help my workflow, where I just need fast routine optimization methods for thousands, even millions of similar objectives. Maybe the class-based optimizers can be a parallel effort, or maybe they belong in a separate sub-package? Basically I see possibly two different needs for optimizers.

Please accept my apologies if my assumptions are wrong or misguided, as I'm not an expert in mathematics, optimization, or statistics.
One of the biggest advantages is that the maintenance burden will be much reduced. Once you fix things in a base class, all inheriting classes get fixed too; if we want to introduce a new feature, all inheriting classes get that too.

Going to a class-based framework does mean overhead, but it depends on the thing you're trying to optimize. If your objective function takes more than, say, 10-20 us to evaluate, then you won't see a difference. If your objective is super quick, then you might see a slight slowdown, of around 15%.

For guidance, the objective functions I deal with normally take around 500 us to 5 ms per evaluation, a few orders of magnitude larger. I've come across many others on the 1 s time scale and longer. Those objective functions totally outweigh small amounts of minimizer overhead. But I understand there will be problems at the other end of the timescale. It's instructive to consider something like L-BFGS or BFGS: those optimizers estimate the inverse Hessian at the end, and that estimation takes a significant fraction of the entire optimization time if your objective function is very quick. Those who want blazing-fast speed probably don't want to be calculating that. However, speed isn't everything, and I point back to my first statement: smaller maintenance burden and simpler development. If anyone has problems they think would be good benchmarks for such changes, please send them to me.
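A rough way to gauge the per-call overhead being discussed is a small timeit comparison. This is a sketch under the assumption that the class-based design adds roughly one attribute lookup and bound-method call per objective evaluation; the names below are illustrative and not from this PR:

```python
import timeit

def rosen_free(x):
    # Plain function: the calling style the current scipy.optimize API uses.
    return sum(100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (1 - x[i]) ** 2
               for i in range(len(x) - 1))

class RosenObjective:
    # Class-based style: one extra attribute lookup and bound-method call.
    def func(self, x):
        return sum(100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (1 - x[i]) ** 2
                   for i in range(len(x) - 1))

x = [1.2, 1.0, 0.8]
obj = RosenObjective()
t_free = timeit.timeit(lambda: rosen_free(x), number=100_000)
t_method = timeit.timeit(lambda: obj.func(x), number=100_000)
print(f"free function: {t_free:.3f}s, bound method: {t_method:.3f}s")
```

For an objective this cheap, the method-call overhead is measurable; for objectives in the 500 us and up regime described above, it would disappear into the noise.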
Probably better that the approach is not in the public API for a release or two.
Hi all, is there any news on this? It seems quite good; having more information available in the callback in particular would indeed be gorgeous!
This PR is very much stalled. I think the best way forward with it would be to have a proof of concept operating in a separate package first, go through a few design cycles, and learn from mistakes, before seeking to integrate into scipy. That last item isn't guaranteed, which is why I haven't gone any further with it.
I see, it's a pity; the current API is close to unusable if used seriously, with too many documentation problems and internal inconsistencies. An overhaul would be great, and an OOP-based approach seems the way to go. What about scikit-optimize? Too far?
@mayou36 Depending on your needs and which inconsistencies give you the most trouble, you might find lmfit (https://lmfit.github.io/lmfit-py/) interesting. It is kind of like a scikit for optimization and curve-fitting, wrapping many scipy methods. It does try to iron out some of the inconsistencies, though I'm sure it does not remove them all. To be clear, lmfit does not approach the ideas here. It does not have an Optimizer class for which a subclass (or user?) is expected to override methods to do the iteration, decide updated parameter values, determine the stopping condition, etc. I think we would not be opposed to trying that within lmfit, but I would also not promise immediate enthusiasm from current devels (including myself) to focus on this. We have delved a little into other algorithms, but most of the work in lmfit has used
Ah, lmfit, that's interesting to hear. Well, that's the exact same corner we're coming from: zfit. This is meant to be a model-fitting library like lmfit but more extensive: support for numerical integration, sampling, composed parameters, and high performance for large datasets on the scale of High Energy Physics. And all with a clean interface - at least, that is the goal. We're currently expanding the minimizer classes more seriously and taking a more complete look at SciPy, and yep, stumbling upon these issues. In zfit we also have an OOP approach to the minimizer: instantiate it arbitrarily (whatever arguments; they can be minimizer-specific) and use one specific method to minimize. Interesting! Given that, it may be worth exchanging some experience.
PRs are always welcome to improve difficult points. I would disagree that I had envisaged that any Solver classes would reuse the code within the existing minimizers that does each step. A lot of these are compiled codes, so the classes would have to use a lot of private scipy functions. I had a vision that any development would be towards the classes being eventually absorbed into scipy (not guaranteed), rather than being a standalone package. A lot of the features I would expect to be in the class are in
Of course, in general it is a great collection. What I meant is that with the new interface and the documentation, basically every feature has to be tested by hand and the docs have to be rewritten completely from scratch. With a dedicated function, at least it was clear what went into it.

Yes, being absorbed is of course a good idea, but if not...

I see, that's interesting. It's a stateful approach; we had quite a few discussions on this. I guess while it is nice to be able to massage the function, it somehow mixes the configuration, the minimization, and the result of the fit into one class. But then it's also useful to have a state, as the minimization is at a state. What is the status of this then; what exactly would be required to move forward?
This PR introduces an `Optimizer` class for `scipy.optimize`. The `Optimizer` class can be advanced like an iterator, or be advanced to convergence with the `solve` method.

The PR is intended to provide a cleaner interface to the minimisation process, with the user being able to interact with the solver during iteration. This allows the variation of solver hyper-parameters, or other user-definable interactions. As such, `callback`s become redundant, as you can access all the optimizer information required. Other historical parameters, such as `disp`, may also become redundant in these objects. This is also a chance for a ground-zero approach to how people interact with optimizers.

This PR is intended as a Request For Comments to gain feedback on the architecture of the solvers. Given that it's a key part of the infrastructure, it may generate a large amount of feedback; if I don't reply to it all, bear in mind that I don't have unlimited time.

In its current state I've implemented `Optimizer`, `Function`, `LBFGS` and `NelderMead` classes. `LBFGS` and `NelderMead` are being called by the corresponding functions already in existence. The current test suite passes, but no unit tests have been written for the new functionality.

These optimisers were straightforward to implement. However, I don't know how this approach would translate to other minimisers, or to things like `least_squares`.
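To make the iterate-or-solve usage pattern concrete, here is a self-contained sketch. `GradientDescentSketch`, its constructor arguments, and its `solve` method are illustrative stand-ins, not the actual `Optimizer`/`Function` classes in this PR:

```python
import numpy as np

class GradientDescentSketch:
    """Minimal illustration of an optimizer that can be advanced like an
    iterator, or run to convergence with solve(). Names are hypothetical."""

    def __init__(self, grad, x0, step_size=0.1, maxiter=1000, tol=1e-8):
        self.grad = grad
        self.x = np.asarray(x0, dtype=float)
        self.step_size = step_size
        self.maxiter = maxiter
        self.tol = tol
        self.nit = 0

    def __iter__(self):
        return self

    def __next__(self):
        g = self.grad(self.x)
        if self.nit >= self.maxiter or np.linalg.norm(g) < self.tol:
            raise StopIteration
        self.x = self.x - self.step_size * g
        self.nit += 1
        return self.x

    def solve(self):
        # Advance to convergence in one call.
        for _ in self:
            pass
        return self.x

# Minimise (x - 3)^2, whose gradient is 2 * (x - 3).
grad = lambda x: 2 * (x - 3.0)

# Advance step by step, interacting with the solver mid-run
# (the kind of thing callbacks cannot easily do):
opt = GradientDescentSketch(grad, x0=np.array([0.0]))
for x in opt:
    if opt.nit == 5:
        opt.step_size *= 0.5  # tweak a hyper-parameter during iteration

# ...or simply run to convergence:
x_final = GradientDescentSketch(grad, x0=np.array([0.0])).solve()
```

The point of the sketch is the shape of the control flow: iteration state (`x`, `nit`, `step_size`) lives on the object, so the user can inspect or modify it between steps instead of threading information through a `callback`.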