deprecation of `polish` in `optimize._differentialevolution.py` #4880
Comments
Quick reply: a polishing step is applied in
What is meant by these terms? With respect to the "overfeatured" statement, I would offer a counterargument. Wouldn't the addition of an "aggressive" keyword be an overfeature for
It can be applied externally by a user, but not at the same cost. In fact, it's more expensive. Imagine 100 different users using
The code is clean at the moment. L-BFGS-B is already in scipy, so it's not an external dependency. L-BFGS-B is well tested and is not at risk of removal. If anything untoward happened to L-BFGS-B, then
Consider the Griewank function (http://en.wikipedia.org/wiki/Griewank_function), which is a smooth, non-linear, continuous, differentiable function. DE is able to find global minima in situations like this, in contrast to gradient-based techniques, which fail miserably. However, once it's in the close vicinity of the global minimum it is not efficient at reaching the absolute minimum that is possible to achieve. Here a gradient-based technique would excel. The idea is to use DE to find and get as close as possible to the global minimum and subsequently apply L-BFGS-B to 'polish' to the absolute lowest minimum that's achievable. If the polishing step finds a lower energy, then great. If it doesn't find a lower energy (possibly because of the reasons you outlined above), then you haven't lost anything, because you still have the value from the DE step. It may cost an extra 50 function evaluations, but when this is compared against 20000 function evaluations it's a very low overhead. Personally, I use DE for curve fitting multidimensional systems. These are normally continuous and differentiable. Polishing is vital for this.
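The two-stage behaviour described in this comment can be sketched with scipy's own `differential_evolution` on the 2-D Griewank function. This is a minimal illustration, not code from the thread; the bounds, seed, and dimensionality are arbitrary choices:

```python
import numpy as np
from scipy.optimize import differential_evolution

def griewank(x):
    # Griewank function: many regularly spaced local minima,
    # global minimum of 0 at the origin.
    x = np.asarray(x)
    i = np.arange(1, x.size + 1)
    return 1.0 + np.sum(x ** 2) / 4000.0 - np.prod(np.cos(x / np.sqrt(i)))

bounds = [(-50, 50)] * 2

# Same seed, so the DE phase is identical; only the final polish differs.
# DE locates the global basin; L-BFGS-B then descends to the bottom of it.
unpolished = differential_evolution(griewank, bounds, seed=1, polish=False)
polished = differential_evolution(griewank, bounds, seed=1, polish=True)

print(unpolished.fun, polished.fun)
```

Because the polished result is only kept when it is at least as good as the DE result, polishing can never make the reported minimum worse on a smooth function like this.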
For continuous differentiable functions polishing is way more efficient once in the global minimum. This is demonstrated by examining the
So do I; the sole purpose of
Polishing is not the normally expected behaviour of DE. It should normally be off (if not removed completely from the function). For example: I capture the execution of the objective function at every iteration to plot the convergence of the objective function and of the parameters. I expect that the total number of executions of the objective function, according to my knowledge of differential evolution theory, would be the number of generations multiplied by the generation (or population) size. Right? Polishing is well suited for your application and for your particular purposes; that is good. It works well and gives the best results. But there is a possibility that for other purposes polishing will not work well (as mentioned: noisy functions, non-differentiable functions). It will be a redundant feature which just increases the count of objective function evaluations. If your objective function is fast, then no problem; you lose just a fraction of a second. But if your objective function is very slow, then an additional 70-80 evaluations of a 5-minute objective function will increase your DE execution time by 6 hours without any considerable improvement of the result.
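The evaluation-count concern raised here can be checked directly by wrapping the objective in a counter. This is a hedged sketch, not code from the thread; the `sphere` objective, bounds, and seed are invented for illustration:

```python
import numpy as np
from scipy.optimize import differential_evolution

def make_counted(f):
    # Wrap an objective so every call is counted -- the relevant metric
    # when a single evaluation is expensive.
    count = {"n": 0}
    def wrapped(x):
        count["n"] += 1
        return f(x)
    return wrapped, count

def sphere(x):
    return float(np.sum(np.asarray(x) ** 2))

# Identical seeds, so the DE phase costs the same number of calls in
# both runs; any difference in the counters is the cost of the polish.
f1, c1 = make_counted(sphere)
differential_evolution(f1, [(-5, 5)] * 2, seed=3, polish=False)
f2, c2 = make_counted(sphere)
differential_evolution(f2, [(-5, 5)] * 2, seed=3, polish=True)

print(c1["n"], c2["n"])
```

For a cheap objective the extra calls are negligible; for a 5-minute objective, the difference printed above is exactly the overhead being debated.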
Sorry, I respectfully disagree that the polishing should be switched off by default.
The maximum number of function evaluations after the differential evolution section is actually
It clearly states in the documentation that a final polishing step is carried out; it's not hidden. The documentation for the underlying solver class also states that the L-BFGS-B polishing step requires a few more function evaluations (the same statement could be added to the `differential_evolution` function). I did the following test on the Hartmann 6 function:

```python
from __future__ import division, print_function

import numpy as np
from scipy.optimize import differential_evolution as de


class Hartmann6(object):
    def __init__(self, dimensions=6):
        self.a = np.asarray([[10., 3., 17., 3.5, 1.7, 8.],
                             [0.05, 10., 17., 0.1, 8., 14.],
                             [3., 3.5, 1.7, 10., 17., 8.],
                             [17., 8., 0.05, 10., 0.1, 14.]])
        self.p = np.asarray([[0.1312, 0.1696, 0.5569, 0.0124, 0.8283, 0.5886],
                             [0.2329, 0.4135, 0.8307, 0.3736, 0.1004, 0.9991],
                             [0.2348, 0.1451, 0.3522, 0.2883, 0.3047, 0.665],
                             [0.4047, 0.8828, 0.8732, 0.5743, 0.1091, 0.0381]])
        self.c = np.asarray([1.0, 1.2, 3.0, 3.2])

    def fun(self, x, *args):
        XX = np.atleast_2d(x)
        d = np.sum(self.a * (XX - self.p) ** 2, axis=1)
        return -np.sum(self.c * np.exp(-d))


h = Hartmann6()
res = de(h.fun, [(0, 1)] * 6, seed=1)
print(res.nfev)
res = de(h.fun, [(0, 1)] * 6, seed=1, polish=False)
print(res.nfev)
```

The number of function evaluations with polishing is 1870, and the lowest energy is -3.32236801139. The global benchmarking suite (#4191) takes on average 4830 iterations (https://gist.github.com/andyfaff/24c96a3d5dbc7b0272b2) across its 194 global minimisation problems (with polishing). Assuming that 70
I use this for non-linear least squares, which is a fairly typical use of a minimizer; the polishing is worth it for 1.5% extra
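The non-linear least-squares use case mentioned here can be sketched by having DE minimise a sum-of-squares cost directly. This is an illustrative example only; the exponential-decay model, synthetic data, bounds, and seeds are invented:

```python
import numpy as np
from scipy.optimize import differential_evolution

# Synthetic data from a hypothetical decay model y = a * exp(-k * x),
# with true parameters a = 2.5, k = 0.7 and a little noise.
rng = np.random.RandomState(0)
x = np.linspace(0, 10, 50)
y = 2.5 * np.exp(-0.7 * x) + rng.normal(scale=0.01, size=x.size)

def sum_sq(params):
    # Sum-of-squares cost: smooth and differentiable in the parameters,
    # so the final L-BFGS-B polish sharpens the fit cheaply.
    a, k = params
    return float(np.sum((a * np.exp(-k * x) - y) ** 2))

res = differential_evolution(sum_sq, [(0, 10), (0, 5)], seed=2)  # polish=True by default
print(res.x)
```

For this kind of smooth cost surface the polish contributes a small fraction of the total evaluations but tightens the final parameter estimates.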
I agree that for some solutions polishing may not improve the solution. In your case this polishing step may have cost you an extra 6 hours, but
Let's keep in mind that setting `polish=False` is already possible. What constitutes "simple, universal and not specialised and overfeatured" is a matter of taste. There are lots of places in scipy, and indeed most libraries, where functionality is provided that can be built out of smaller pieces that are also provided. An obvious example from scipy is providing

It is very important that, as a project, we don't break people's existing code. This means that removing or changing this option has to have a deprecation cycle associated with it, where first a warning would be raised and then later we would actually make the real change. Doing this has a real cost associated with it: everyone using this function would have to change their code, and if the option is removed entirely, write more code to restore the lost polishing step. This isn't even counting the developer effort to make these changes to the scipy code base. Given that, I'm -1 here, I think.
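The deprecation cycle described in this comment typically starts with a release that only warns. The following is a hypothetical sketch of that first stage, not actual scipy code; the wrapper name is invented:

```python
import warnings

def de_with_deprecated_polish(func, bounds, polish=True, **kwds):
    # Hypothetical first stage of a deprecation cycle: warn when the
    # soon-to-be-removed keyword takes effect, but keep the old behaviour.
    if polish:
        warnings.warn("the 'polish' keyword is deprecated and will be "
                      "removed in a future release; apply a local "
                      "minimizer to the result yourself instead",
                      DeprecationWarning, stacklevel=2)
    # ... delegate to the real solver here ...
    return None

# Demonstrate that the warning is raised exactly once per call.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    de_with_deprecated_polish(lambda x: x ** 2, [(-1, 1)])

print(len(caught))
```

Only in a later release, after users have had time to migrate, would the keyword actually be removed; that two-step cost is what the comment is weighing against the benefit.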
I agree with @ewmoore.
@pavelponomarev from your last comment, should I understand that this issue may be closed? If so, please do.
The discussion started in #4864. I propose that the usage of `polish` in `_differentialevolution.py` must be deprecated. The polish functionality simply repeats the functionality of the other `scipy.optimize` modules. The library functions must be simple, universal, and not specialised and overfeatured. The polish can be applied to the result externally by a user using `optimize.minimize` with the same cost, but `_differentialevolution.py` will be more universal, will have one less dependency, and will have cleaner code.

Additional reasoning: The function is called `differential_evolution` and not `differential_evolution_with_polishing`. I would not expect this function to do something other than DE. OK, if there are statistics showing that this polishing really helps in the majority of real-world cases, then let's leave it. But, at least, the default behaviour should be changed to `polish=False`. (Are there some benchmarks comparing what is more efficient: allowing a couple of additional iterations of DE, or applying polishing? What about the type of the test function?)

Additionally, this polisher, I hypothesize, works OK only with differentiable functions. But there are a lot of real engineering problems (mine among them) where objective functions are not only nonlinear, but also discontinuous, non-differentiable, and somewhat noisy, and where the global minimum can be just near the discontinuity. I still prefer modularity, and I believe that all library functions should be modular and perform just one task well.

So, this additional feature could be removed, with a mention in the documentation, or in the examples, that `L-BFGS-B` polishing is usually best suited to, and gives good results for, differentiable smooth objective functions. Or, at least, the default behaviour should be changed to `polish=False`.
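For reference, the externally applied polish that this proposal describes would look something like the following sketch. The Rastrigin objective, bounds, and seed are illustrative choices, not from the issue:

```python
import numpy as np
from scipy.optimize import differential_evolution, minimize

def rastrigin(x):
    # Rastrigin function: highly multimodal, global minimum of 0 at the origin.
    x = np.asarray(x)
    return 10 * x.size + float(np.sum(x ** 2 - 10 * np.cos(2 * np.pi * x)))

bounds = [(-5.12, 5.12)] * 2

# Step 1: global search with DE, polishing disabled.
de_res = differential_evolution(rastrigin, bounds, seed=7, polish=False)

# Step 2: user-driven local polish, mirroring what polish=True does internally.
local = minimize(rastrigin, de_res.x, method='L-BFGS-B', bounds=bounds)

# Keep the better of the two results.
best = local if local.fun < de_res.fun else de_res
print(best.fun)
```

This recovers the polished behaviour with `polish=False`, at the price of every user writing (and testing) these extra lines themselves.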