
deprecation of polish in optimize._differentialevolution.py #4880

Closed
pavelponomarev opened this issue May 15, 2015 · 7 comments

@pavelponomarev

The discussion started in #4864. I propose that the usage of polish in _differentialevolution.py be deprecated. The polish functionality simply repeats the functionality of another scipy.optimize routine. Library functions must be simple, universal, and not specialised or overfeatured. The polish can be applied to the result externally by a user using optimize.minimize at the same cost, while _differentialevolution.py will be more universal and will have one less dependency and cleaner code.

Additional reasoning:
The function is called differential_evolution, not differential_evolution_with_polishing. I would not expect this function to do anything other than DE. OK, if there are statistics showing that this polishing really helps in the majority of real-world cases, then let's leave it. But at least the default behaviour should be changed to polish = False. (Are there any benchmarks comparing what is more efficient -- running a couple of additional iterations of DE, or applying polishing? What about the type of the test function?)
Additionally, this polisher, I hypothesize, works OK only with differentiable functions. But there are a lot of real engineering problems (mine among them) where the objective functions are not only nonlinear, but also discontinuous, non-differentiable and somewhat noisy, and where the global minimum can be just near the discontinuity. I still prefer modularity, and I believe that all library functions should be modular and perform just one task well.

So, this additional feature could be removed, with a mention in the documentation, or in the examples, that L-BFGS-B polishing is usually well suited and gives good results for differentiable, smooth objective functions. Or, at the very least, the default behaviour should be changed to polish = False.
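
For illustration, a minimal sketch of the workflow I have in mind, with scipy's rosen standing in for a real objective (bounds and settings are just examples):

from scipy.optimize import differential_evolution, minimize, rosen

bounds = [(-5, 5)] * 2

# Plain differential evolution, with the built-in polishing switched off.
result_de = differential_evolution(rosen, bounds, polish=False, seed=1)

# The polish applied explicitly, by the user, with L-BFGS-B.
result_polished = minimize(rosen, result_de.x, method='L-BFGS-B', bounds=bounds)

print(result_de.fun, result_polished.fun)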

@andyfaff
Contributor

Quick reply: a polishing step is applied in optimize.brute, via the finish keyword. Why is differential_evolution any different?

@andyfaff
Contributor

> Library functions must be simple, universal, and not specialised or overfeatured.

What is meant by these terms? differential_evolution is used for global minimisation with bounds. The user of the code doesn't actually have to do anything special to allow the polishing step to work (simple), and it works in all cases (universal). The only drawback is that it requires a few extra function evaluations, which are of negligible cost compared to the rest of the minimisation. But the benefit is that it can result in a lower global energy.

With respect to the overfeatured statement, I would offer a counterargument: wouldn't the addition of an "aggressive" keyword be an overfeature for differential_evolution?

> The polish can be applied to the result externally by a user using optimize.minimize at the same cost

It can be applied externally by a user, but not at the same cost; in fact it's more expensive. Imagine 100 different users using differential_evolution: if they all polish the final result, that's at least 100 extra sections of code where they all do the same thing. That code duplication is eliminated by having it in differential_evolution.

> will have one less dependency and cleaner code.

The code is clean at the moment. L-BFGS-B is already in scipy, so it's not an external dependency. L-BFGS-B is well tested and is not at risk of removal. If anything untoward happened to L-BFGS-B, then differential_evolution would be the least of scipy's worries.

> this polisher, I hypothesize, works OK only with differentiable functions. But there are a lot of real engineering problems (mine among them) where the objective functions are not only nonlinear, but also discontinuous, non-differentiable and somewhat noisy, and where the global minimum can be just near the discontinuity.

Consider the Griewank function (http://en.wikipedia.org/wiki/Griewank_function), which is a smooth, non-linear, continuous, differentiable function. DE is able to find the global minimum in situations like this, in contrast to gradient-based techniques, which fail miserably. However, once DE is in the close vicinity of the global minimum it is not efficient at reaching the absolute minimum that is possible to achieve; here a gradient-based technique excels. The idea is to use DE to get as close as possible to the global minimum and subsequently apply L-BFGS-B to 'polish' down to the absolute lowest minimum that's achievable. If the polishing step finds a lower energy, great. If it doesn't (possibly because of the reasons you outlined above), then you haven't lost anything, because you still have the value from the DE step. It may cost an extra 50 function evaluations, but compared against 20000 function evaluations that's a very low overhead.
I've tested the default differential_evolution function against a huge range of problems (#4191). It works well over a wide range of them, including non-linear, discontinuous, noisy and non-differentiable problems. There are difficult problems, such as Damavandi or Bukin6, but any minimizer would have a hard time with those.
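
As a rough sketch of this two-stage idea (the 2-D Griewank implementation below is my own transcription of the formula from that page; the bounds and seed are arbitrary):

import numpy as np
from scipy.optimize import differential_evolution

def griewank(x):
    # Smooth, differentiable, with many regularly spaced local minima; global minimum 0 at the origin.
    x = np.asarray(x)
    i = np.arange(1, x.size + 1)
    return 1 + np.sum(x ** 2) / 4000. - np.prod(np.cos(x / np.sqrt(i)))

bounds = [(-600, 600)] * 2

# Default behaviour: DE followed by the L-BFGS-B polish.
polished = differential_evolution(griewank, bounds, seed=1)

# Same seed, DE only.
unpolished = differential_evolution(griewank, bounds, seed=1, polish=False)

print(polished.fun, polished.nfev)
print(unpolished.fun, unpolished.nfev)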

Personally I use DE for curve fitting multidimensional systems. These are normally continuous and differentiable. Polishing is vital for this.

> Are there any benchmarks comparing what is more efficient -- running a couple of additional iterations of DE, or applying polishing?

For continuous, differentiable functions polishing is far more efficient once in the vicinity of the global minimum. This is demonstrated by examining the rosen example from the lecture link you provided: L-BFGS-B would reach that minimum with at least one order of magnitude fewer calculations than DE. When polishing doesn't work, the overhead is minuscule compared to the calculation time of the DE.
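
For example (an illustrative comparison on rosen, not a rigorous benchmark):

from scipy.optimize import differential_evolution, minimize, rosen

# Gradient-based refinement from a point already near the minimum of rosen.
res_lbfgs = minimize(rosen, x0=[0.9, 0.9], method='L-BFGS-B')

# DE on the same problem, without polishing, for comparison.
res_de = differential_evolution(rosen, [(-2, 2)] * 2, seed=1, polish=False)

print(res_lbfgs.nfev, res_de.nfev)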

> I believe that all library functions should be modular and perform just one task well.

So do I. The sole purpose of differential_evolution is to get to the lowest possible global minimum; there is no other task it performs.

@pavelponomarev
Author

Polishing is not a normal expected behaviour of DE.

It should normally be off (if not removed completely from the function). For example: I capture every evaluation of the objective function in order to plot the convergence of the objective and of the parameters. I expect that the total number of evaluations of the objective function, according to my knowledge of differential evolution theory, would be the number of generations multiplied by the generation (or population) size. Right?
Then I find that, after applying scipy.optimize.DE with default parameters, the number of evaluations of the objective function is significantly higher. Is my understanding of differential evolution wrong? Do I have a mistake in my code? Actually NO. As it turns out, scipy.optimize.DE performs not differential evolution by default, but differential evolution + something else. When I take aspirin I expect it to remedy my pain; I do not expect that the pharmacist has added something else to it.
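
For example, a simple counting wrapper along these lines (purely illustrative; scipy's rosen stands in for a real objective) shows the extra evaluations:

from scipy.optimize import differential_evolution, rosen

class CountingObjective(object):
    def __init__(self, func):
        self.func = func
        self.ncalls = 0

    def __call__(self, x):
        self.ncalls += 1
        return self.func(x)

f_default = CountingObjective(rosen)
differential_evolution(f_default, [(-5, 5)] * 2, seed=1)

f_plain = CountingObjective(rosen)
differential_evolution(f_plain, [(-5, 5)] * 2, seed=1, polish=False)

# The default run makes more calls than plain DE, because of the polishing step.
print(f_default.ncalls, f_plain.ncalls)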

Polishing is well suited to your application and your particular purposes; that is good. It works well and gives the best results there. But there is a possibility that for other purposes polishing will not work well (for example the noisy, non-differentiable functions mentioned). It then becomes a redundant feature which just increases the count of objective function evaluations. If your objective function is fast, then no problem, you lose just a fraction of a second. But if your objective function is very slow, then an additional 70-80 evaluations of a 5-minute objective function will increase your DE execution time by roughly 6 hours without any considerable improvement of the result.

@andyfaff
Contributor

Sorry, I respectfully disagree that the polishing should be switched off by default.

> I expect that the total number of evaluations of the objective function, according to my knowledge of differential evolution theory, would be the number of generations multiplied by the generation size. Right?

The maximum number of function evaluations from the differential evolution section is actually min((maxiter + 1) * popsize * Y, maxfun), where Y is the number of parameters. The docstrings are slightly incorrect on this point.
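
For example, assuming the current defaults maxiter=1000 and popsize=15, and a 6-parameter problem with no maxfun limit, the bound works out as follows (illustrative numbers only):

import numpy as np

maxiter, popsize, nparams = 1000, 15, 6
maxfun = np.inf

max_de_nfev = min((maxiter + 1) * popsize * nparams, maxfun)
print(max_de_nfev)  # 90090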

> Then I find that, after applying scipy.optimize.DE with default parameters, the number of evaluations of the objective function is significantly higher. Is my understanding of differential evolution wrong? Do I have a mistake in my code? Actually NO. As it turns out, scipy.optimize.DE performs not differential evolution by default, but differential evolution + something else.

It clearly states in the documentation that a final polishing step is carried out; it's not hidden. The documentation for the underlying solver class also states that the L-BFGS-B polishing step requires a few more function evaluations (the same statement could be added to the differential_evolution function). The optimize.brute function uses fmin as its default polishing step.

I did the following test on the Hartmann 6 function:

from __future__ import division, print_function
import numpy as np
from scipy.optimize import differential_evolution as de

# Hartmann 6-dimensional test function (global minimum approximately -3.32237).
class Hartmann6(object):
    def __init__(self, dimensions=6):
        self.a = np.asarray([[10., 3., 17., 3.5, 1.7, 8.],
                             [0.05, 10., 17., 0.1, 8., 14.],
                             [3., 3.5, 1.7, 10., 17., 8.],
                             [17., 8., 0.05, 10., 0.1, 14.]])
        self.p = np.asarray([[0.1312, 0.1696, 0.5569, 0.0124, 0.8283, 0.5886],
                             [0.2329, 0.4135, 0.8307, 0.3736, 0.1004, 0.9991],
                             [0.2348, 0.1451, 0.3522, 0.2883, 0.3047, 0.665],
                             [0.4047, 0.8828, 0.8732, 0.5743, 0.1091, 0.0381]])
        self.c = np.asarray([1.0, 1.2, 3.0, 3.2])

    def fun(self, x, *args):
        XX = np.atleast_2d(x)
        d = np.sum(self.a * (XX - self.p) ** 2, axis=1)
        return -np.sum(self.c * np.exp(-d))

h = Hartmann6()
# Default run, which includes the L-BFGS-B polishing step.
res = de(h.fun, [(0, 1)] * 6, seed=1)
print(res.nfev)
# Same seed, with polishing disabled.
res = de(h.fun, [(0, 1)] * 6, seed=1, polish=False)
print(res.nfev)

The number of function evaluations with polishing is 1870, and the lowest energy is -3.32236801139.
The number of function evaluations without polishing is 1800, and the lowest energy is -3.31953623611.
This is a 3.9% increase in the number of function evaluations, but a lower energy is achieved.

The global benchmarking suite (#4191) takes on average 4830 function evaluations (https://gist.github.com/andyfaff/24c96a3d5dbc7b0272b2) across its 194 global minimisation problems (with polishing). Assuming that 70 nfev is typical for L-BFGS-B, the overhead of polishing is 70 / (4830 - 70) = 0.015, or 1.5%. These problems are a reflection of the kinds of problems that differential_evolution will be used for, including many non-continuous, non-differentiable functions; this is why it's a benchmark suite. If one was optimizing code one wouldn't really bother with a 1.5% speedup.

> Polishing is well suited to your application and your particular purposes; that is good.

I use this for non-linear least squares, which is a fairly typical use of a minimizer; the polishing is worth it for 1.5% extra nfev. For the majority of problems it's worth having the polishing step; for those that don't benefit, the overhead isn't high; and for a situation where it's really not wanted you can set polish=False.

> But there is a possibility that for other purposes polishing will not work well (for example the noisy, non-differentiable functions mentioned).

I agree that for some problems polishing may not improve the solution. In your case this polishing step may have cost you an extra 6 hours, but the nfev during the DE section would have taken 4830 * 5 / 60 = 402 hours (assuming an average number of evaluations). You can set polish=False, and then the polishing won't be done.

@ewmoore
Member

ewmoore commented May 16, 2015

Let's keep in mind that setting polish=False removes polishing from the function. This works today.

What constitutes "simple, universal and not specialised and overfeatured" is a matter of taste. There are lots of places in scipy, and indeed most libraries, where functionality is provided that can be built out of smaller pieces that are also provided. An obvious example from scipy is integrate.dblquad, which is just a wrapper that calls integrate.quad to build up a 2-D integral. Some of this is important because it allows the library to centralize tricky or error-prone code in a single place and helps remove duplication of effort. Principles like DRY and TOOWTDI come in here. I would argue that allowing an optional polishing step is fine; the goal of the function is to find the global minimum of a multivariate function. In fact, this is the first line of the docstring.
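
To illustrate the dblquad point, a small sketch (the integrand is arbitrary):

from scipy import integrate

# The provided 2-D wrapper: integrate x*y over 0 <= x <= 1, 0 <= y <= 2.
val, err = integrate.dblquad(lambda y, x: x * y, 0, 1, lambda x: 0, lambda x: 2)

# The same integral assembled by hand from the 1-D routine.
inner = lambda x: integrate.quad(lambda y: x * y, 0, 2)[0]
val_manual, err_manual = integrate.quad(inner, 0, 1)

print(val, val_manual)  # both approximately 1.0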

It is very important that, as a project, we don't break people's existing code. This means that removing or changing this option has to have a deprecation cycle associated with it, where first a warning would be raised and only later would we actually make the real change. Doing this has a real cost associated with it: everyone using this function would have to change their code, and, if the option is removed entirely, write more code to restore the lost polishing step. That isn't even counting the developer effort to make these changes to the scipy code base. Given all that, I think I'm -1 here.

@pavelponomarev
Author

I agree with @ewmoore .
This discussion was started in another thread, #4864, which might involve a substantial redesign of the internals of _differentialevolution.py, changes to the API, or the introduction of a new function to make use of the possibility of parallelization in DE. In the perspective of such significant changes, I thought it would be nice to straighten this out as well. As a standalone fix, this proposal is not feasible, keeping in mind the changes it would require of users. But it might become feasible if the function is redesigned significantly, or if a new function for parallel differential evolution is added to scipy.optimize.

@dlax
Member

dlax commented May 17, 2015

@pavelponomarev from your last comment, should I understand that this issue may be closed? If so, please do.
