New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
enh: modified _differentialevolution.py to enable parallel evaluation of the objective function for the whole population #4864
Comments
I can't speak for better, but the package emcee addresses this by taking a parameter "map"; it defaults to the standard python map function, but you can supply the one from multiprocessing or they provide a version of map that works with MPI. It keeps the implementation of the objective function separate from the parallelization issues. If you're going to keep the boolean argument, it would be preferable to call it "parallel" - less confusing, barely longer. |
Thank you. I will change that flag to "parallel" and prepare a pull request. I am not sure if this flag is OK. But the possibility for parallelisation is really must have here. |
Using the emcee API (just add a |
pv, is this API enables parallelization inside that package? How about to give to user the freedom of selection of parallelization method. E.g. joblib.Parallel instead of multiprocessing.Pool, or even some MPI. I think the keyword "pool" might be confusing. The description of the keyword could look like
|
Doing it like that requires modifications in the objective function, and slightly more hassle with setting up the parallelization as potentially persistent pools have to be passed also to the objective function. Separating the parallelization from the objective function makes it slightly easier to switch between serial and parallel implementations. |
Yes, and user takes care of the dispatch of separate jobs for the evaluation by the objective function. This helps if your objective function is very heavy and calculates during like several minutes, which naturally requires multiprocessing environment or even cluster. That's why there should be a freedom of selection how jobs are dispatched - within just one PC with multiprocessing.Pool functionality, or among several PCs using MPI. If your objective function is fast, just do not use this parallel option, but with very heavy fitness functions this slight hassle makes sense. |
Yes, but note that selection of the parallelization method is also possible via the pool/map API, cf. how it is done in emcee. Functionality-wise, both approaches can achieve what is desired. The question is then just what is the most convenient way to do it, and suitable for most purposes. |
Potentially interesting - I've been wondering how to parallelise differential evolution for a while. Unfortunately, I don't think that this parallelisation method is going to be better than the existing serial version. There are different parallelisation approaches that would be more worthwhile. I'll expand on that in a later comment, when I have more time. |
Regarding API for parallelization, note that in |
I do not like the implementation of parallelization inside the module. This adds a dependency and somewhat limits the user in selection of parallelization technique. E.g. if threading is utilized, this practically limits the parallelization to number of cores within one PC. I want to have possibility for my own implementation of parallelization using MPI, which enables parallelization on a cluster with thousands cores.
The proposed fix to I have got linear speed-up of computations. My objective function is very heavy, single run performs a FEM computation during 4 minutes. The proposed fix already works for me. I do not need any complication with implementation of parallelization within |
Here is the simplest code snippet which shows the advantage from scipy.optimize import rosen, differential_evolution
from joblib import Parallel, delayed, cpu_count
import time
bounds = [(0,2), (0, 2), (0, 2), (0, 2), (0, 2)]
def objfunc(params):
for it in xrange(1000000):
it**2
return rosen(params)
def pareval(listcoords):
listresults = Parallel(n_jobs=cpu_count())(delayed(objfunc)(i) for i in listcoords)
return listresults
def parallel_run():
result = differential_evolution(pareval, bounds, parallel=True, maxiter=10, polish=False)
print result.x, result.fun
def serial_run():
result = differential_evolution(objfunc, bounds, maxiter=10, polish=False)
print result.x, result.fun
start_time = time.time()
serial_run()
print("Serial run took %s seconds using 1 core " % (time.time() - start_time))
start_time = time.time()
parallel_run()
print "Parallel run took %s seconds using %s cores" % ((time.time() - start_time), cpu_count()) and result is:
Should I continue to prepare the pull request? |
Additionally, I think, the usage of |
Consider an optimisation where there are Before the optimisation takes place the initial energy of each member of the population is calculated. Currently these energies are calculated by serial iteration through each member of the population. Once all the energies are calculated the lowest energy solution is placed in position 0 of the population. The population is then evolved. In the traditional implementation of Differential Evolution (which we have in It is this last sentence that is the vital consideration here. In a given generation it is possible for each of the M trial vectors to change the best solution, Any changes to the algorithm that reduces the update frequency of the best solution vector, or does not allow other trial vectors to benefit from improvements to other population members (within a given generation), will drastically reduce the ability and efficiency of the entire DE process in finding a global minimum. Unfortunately the current parallelisation proposal does just that. For a given generation there are M trial vectors created. The energies of these trial vectors are calculated simultaneously. Once these have been calculated the trial members can replace the corresponding original population members if the trial energy is lower. A different way of using parallelisation has been proposed in the past. Rather than evaluate a single population in parallel, why not generate several populations, |
With regards to polishing the final result, this is frequently done by users of this code; so frequently, that it's worth taking on the work of polishing it within |
I think we can find a more suitable test than |
@andyfaff the kind of parallelization you're talking about exists in the form of pygmo: http://esa.github.io/pygmo/ It runs multiple optimizers in separate "islands" and has a "migration operator" to exchange promising solutions between islands. Makes it very flexible and parallelizable, but also maybe a little heavyweight for scipy. |
@andyfaff with regards to "polish" keyword I still disagree. The function is called So, this additional feature could be removed with mentioning in documentation, or in the examples, that |
Regarding the 'polish' interface, differential_evolution(func, ...) is not the same as def polished_func(x):
return minimize(func, x, ...).fun
differential_evolution(polished_func, polish=False, ...) see https://github.com/scipy/scipy/blob/maintenance/0.16.x/scipy/optimize/_differentialevolution.py#L542 |
Now that I've read the polishing code more closely I see that it is applied only to the best population member at the very end of the optimization. It's not a local optimization step taken at each function evaluation, and it's not applied at the end of each generation. I kind of agree with @pavelponomarev that including it in |
With regards to parallelization... The realized DE strategy in I like mentioned here approach by @andyfaff about braking the population to several smaller populations which allows to achieve quasi-aggressive performance. Here, the proposed by @rgommers keyword I propose to have default behavior of DE to be aggressive serial and without polishing (this is generally good for light objective functions). But the non-aggressive pure DE also should be implemented with possibility for parallel evaluation of the objective function for the whole population (this is good for very heavy objective functions). And maybe let us leave thoughts about best implementation of quasi-aggressive strategy for parallel DE for future commits (as this heavyweight behaviour is implemented in PyGMO as mentioned by @aarchiba ). |
Can we split this discussion/issue into two parts? The first part On 16 May 2015 at 01:35, Pavel Ponomarev notifications@github.com wrote:
Dr. Andrew Nelson |
Agree). Here it is #4880 . |
I read the lecture slide link that was mentioned. The tl;dr message is that the 'aggressive' approach visits each member of the population serially, if the trial vector is successful it is updated immediately. Subsequent trial vectors are able to benefit from earlier improvements made to the population within the same generation (best solution is updated several times per generation). Currently In the non-aggressive approach the trial vectors are all independent of each other. This makes it possible to calculate the objective function for them all in a parallel manner. Parallelisability is coincidental here, use of a Lets suppose an The key thing is to discuss what should happen in situations where parallelisation of objective functions could occur (@dlax). The behaviour for all existing |
If there is going to be a general option to allow parallelization of optimization algorithms (which could include some of the local ones - gradient evaluation can be parallelized, for example) we should think about how to do that efficiently. Creating a new objective function may not be the best way to do it. First: do we need more generality? This is sort of a SIMD model: the optimizer comes up with a list of values, sends them off to be evaluated, and waits for them all to finish. For some of the genetic algorithms, or the brute-force optimization, you could perhaps resume the computation before all the values had returned. Second, if the generality is appropriate, is this the best way to get parallelization? As I pointed out earlier, emcee solves the same problem - allowing multiple simultaneous objective evaluations - by allowing the user to provide a suitable implementation of map, perhaps the one from multiprocessing or perhaps the MPI-based one they provide. This particular implementation doesn't allow one to take advantage of numpy-style vectorization, if that's a concern, but it doesn't require modifying the objective function to change the parallelism. On the other hand, if vectorized functions are the way to go, a version of np.vectorize that used parallelization under the hood would make it easy to parallelize with this implementation. But there might be other ways of letting the user flexibly add parallelism to algorithms that permit it. |
I like that emcee way. There should be a choice for the user to take care of the parallelization by themselves, or use default built-in parallelization (e.g. threading). I think it is not necessary to make a decorator for objective functions. |
Maybe there's a way to avoid a second keyword, either by overloading |
I finally visited the emcee code mentioned by @aarchiba and @pv and I like the way it is done there. Usage of the keyword 'pool' is much more convenient and concise and they provide also a pool compatible with MPI. I should have done it in May. |
Yes scipy is MIT-licensed, and source files from legitimately MIT-licensed projects can be included into scipy. It looks like emcee used to be GPL but they switched at some point. |
Hello, Here are the working files for emcee-style parallelization https://github.com/scipy/scipy/compare/master...pavelponomarev:DE_with_parallel_pools?expand=1
|
The spelling in joblib (and cKDTree) is |
Do we want a dependency on MPI and joblib? Presumably we can do without any of those here. |
@sturlamolden |
I'd really like to look over this code in detail but I'm on holiday until
mid July. A few things are worth mentioning:
1. Backwards compatibility is very important. I don't think we can change
the callback function at this point. Please also see point 3.
2. People use docstrings to work out how to use code. I really think the
solver object should retain its docstring.
3. The interfaces to the scipy.optimize minimizer functions were all made
very similar, using the same input keywords and same output keywords. I'd
be hesitant about removing maxfun and maxiter. The same argument applies to
changing the signature of the callback function.
4. Is there a need to include joblib or mpilib? Could we just say that the
pool keyword should be a pool object that has a map method? This way you
can use whatever pool you like. A default internal pool could be from
multiprocessing. This way there are no added dependencies that need to be
updated.
|
@andyfaff I am going to vacation as well until the end of July, so perhaps no hurry here)
|
@pavelponomarev I was just commenting on the spelling ( Whether we should use joblib in SciPy is a bigger discussion, but now would be a good time to take it. MPI is a totally another matter, there are many implementations of the MPI runtime and there are several Python bindings – including a modified Python interpreters. I think it is better avoided for now. MPI is great, but we need to make some choises on how to support MPI in SciPy. |
My biggest complaint against joblib in its current form is the memory mapping of temporary files for sharing NumPy arrays between processes. This means it uses shared memory on Linux (due to tmpfs), but physical files on Mac and Windows. But standardizing on joblib is in my opinion better than having a number of ad hoc solutions for parallelization. I am slightly in favor of using joblib in SciPy, but only if it is chosen as the standard way of parallel computing in SciPy. |
@pavelponomarev, it'd probably help if you could split your changes into several atomic commits. A "636 additions, 189 deletions" diffstat does not ease review. |
@dlax , here it is: |
The current code of only a Whether or not we add |
@rgommers , I agree. |
@pavelponomarev we shouldn't merge anything until it's clear what we want to do. Some options:
I think I prefer (1), followed by (2) and then (3). Two keywords really is one too many for an API that could be used throughout Scipy. |
I also like (1)! |
We used (3) in |
Just Wondering if this paralell version will be included in the next version of scipy ? |
This PR is currently stalled and won't be in SciPy 0.18. |
So I don't see it listed for 0.19 either. Is this likely to be ready for the next release? I'm sure ya'll are swamped; this would just be really very useful for my research. |
I couldn't agree more with @rocapp. A parallel version of DE would be truly welcomed! |
@rgommers : would it make sense to also provide the same parallelization option for 'scipy.optimize.brute' as well. Since scipy.optimize.brute is guaranteed to work for any kind of optimization problem (agreed that it is inefficient), it would be really helpful to have the same switch for this method as well. |
@hnazkani please don't ask the same question in two places:) I'll answer your other one. |
Hi,
https://gist.github.com/pavelponomarev/20ea37324b219c7ed461/revisions
This file is modified to enable utilization of multiprocessing.Pool.map(func, iterable[, chunksize]) for parallel evaluation of the objective function.
Please review the code. This is my first submission to github projects.
Is introduction of a new optional parameter "parall" appropriate? Maybe some other better strategy can be used to enable vectorization of the objective function calculation?
BR, Pavel
The text was updated successfully, but these errors were encountered: