
enh: modified _differentialevolution.py to enable parallel evaluation of the objective function for the whole population #4864

Closed
pavelponomarev opened this issue May 12, 2015 · 57 comments · Fixed by #8259

Comments

@pavelponomarev

Hi,

https://gist.github.com/pavelponomarev/20ea37324b219c7ed461/revisions
The file is modified to allow multiprocessing.Pool.map(func, iterable[, chunksize]) to be used for parallel evaluation of the objective function.

Please review the code; this is my first submission to a GitHub project.
Is introducing a new optional parameter "parall" appropriate? Or is there a better strategy to enable vectorized evaluation of the objective function?

BR, Pavel

@aarchiba
Contributor

I can't speak for better, but the package emcee addresses this by taking a parameter "map"; it defaults to the standard python map function, but you can supply the one from multiprocessing or they provide a version of map that works with MPI. It keeps the implementation of the objective function separate from the parallelization issues. If you're going to keep the boolean argument, it would be preferable to call it "parallel" - less confusing, barely longer.
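
A minimal sketch of that separation of concerns, assuming a made-up optimizer name (toy_optimizer) and map_func parameter purely for illustration -- the objective stays a plain f(x) -> float and the parallelism is swapped in from outside:

from multiprocessing import Pool

def objective(x):
    return sum(xi ** 2 for xi in x)

def toy_optimizer(func, candidates, map_func=map):
    # The optimizer only ever calls map_func(func, list_of_vectors), so the
    # builtin map, Pool.map or an MPI-based map are interchangeable.
    return list(map_func(func, candidates))

if __name__ == "__main__":
    candidates = [[1.0, 2.0], [0.5, 0.5], [3.0, -1.0]]
    print(toy_optimizer(objective, candidates))                 # serial: builtin map
    with Pool(2) as pool:
        print(toy_optimizer(objective, candidates, pool.map))   # parallel: multiprocessing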

@pavelponomarev
Author

Thank you. I will change that flag to "parallel" and prepare a pull request. I am not sure whether this flag is the right approach, but the possibility of parallelisation is really a must-have here.

@pv
Member

pv commented May 13, 2015

Using the emcee API (just add a pool= keyword argument) sounds useful to me, in particular in the sense that it is not tied to multiprocessing. There are also some other routines in Scipy that could in principle be parallelized to some degree in a similar way.

@pavelponomarev
Author

pv, does this API enable parallelization inside that package? How about giving the user the freedom to choose the parallelization method, e.g. joblib.Parallel instead of multiprocessing.Pool, or even some MPI? I think the keyword "pool" might be confusing. The description of the keyword could look like:

parallel : bool, optional
Changes the objective function f([x], *args) to accept vectorized
arguments, where [x] is the list of parameter vectors of the whole population.
In practice this means that the current generation can be evaluated
externally using parallel computing packages and functions such as
multiprocessing.Pool.map or joblib.Parallel, e.g.:
Parallel(n_jobs=8)(delayed(func)(i) for i in [x])

@pv
Member

pv commented May 13, 2015

Doing it like that requires modifications to the objective function, and slightly more hassle when setting up the parallelization, since potentially persistent pools also have to be passed to the objective function. Separating the parallelization from the objective function makes it slightly easier to switch between serial and parallel implementations.

@pavelponomarev
Author

Yes, and the user takes care of dispatching the separate evaluation jobs to the objective function. This helps if your objective function is very heavy and takes several minutes per evaluation, which naturally requires a multiprocessing environment or even a cluster. That's why there should be freedom in how jobs are dispatched: within just one PC using multiprocessing.Pool, or across several PCs using MPI. If your objective function is fast, just do not use this parallel option, but with very heavy fitness functions this slight hassle makes sense.

@pv
Member

pv commented May 13, 2015

Yes, but note that selection of the parallelization method is also possible via the pool/map API, cf. how it is done in emcee.

Functionality-wise, both approaches can achieve what is desired. The question is then just what is the most convenient way to do it, and suitable for most purposes.

@andyfaff
Contributor

Potentially interesting - I've been wondering how to parallelise differential evolution for a while. Unfortunately, I don't think that this parallelisation method is going to be better than the existing serial version. There are different parallelisation approaches that would be more worthwhile. I'll expand on that in a later comment, when I have more time.
However, the proof is in the benchmarking: I think we would need to use #4191 (which provides multiple global optimisation functions) to see if an improvement is possible with this parallelisation approach. One would want to know (a) if a lower energy is obtained and (b) if a lower number of nfev is required to obtain that energy.

@rgommers
Member

Regarding the API for parallelization, note that in cKDTree we just shipped an n_jobs keyword. Maybe that should have gotten a bit more discussion. It would be really good to pick one way to do it. n_jobs is nice, because it's already used in scikit-learn and statsmodels.

@pavelponomarev
Author

I do not like the implementation of parallelization inside the module. This adds a dependency and somewhat limits the user's choice of parallelization technique. E.g. if threading is used, the parallelization is practically limited to the number of cores in one PC. I want the possibility of my own parallelization implementation using MPI, which enables parallelization on a cluster with thousands of cores.

n_jobs is a good keyword, but the sensible number of possible parallel jobs in DE is determined by the population size, which in turn is determined by other DE keywords.

The proposed fix to _differentialevolution.py changes only how the objective function is evaluated for the population -- from a calculation for each separate member to the whole population at once. That does not add any dependencies and leaves the implementation of parallelization to the user. Nothing complicated is added, and it already works.

I get a linear speed-up of the computations. My objective function is very heavy; a single run performs a FEM computation that takes 4 minutes. The proposed fix already works for me, and I do not need the added complication of implementing parallelization inside _differentialevolution.py. I want to keep things simple: leave the implementation of the algorithm to scipy.optimize, and the implementation of parallelization to joblib or my own MPI.

@pavelponomarev
Author

Here is a simple code snippet which shows the advantage:

from scipy.optimize import rosen, differential_evolution
from joblib import Parallel, delayed, cpu_count
import time

bounds = [(0, 2), (0, 2), (0, 2), (0, 2), (0, 2)]

def objfunc(params):
    # artificial busy-loop to make the objective function expensive
    for it in range(1000000):
        it ** 2
    return rosen(params)

def pareval(listcoords):
    # evaluate the whole population in parallel with joblib
    listresults = Parallel(n_jobs=cpu_count())(delayed(objfunc)(i) for i in listcoords)
    return listresults

def parallel_run():
    # parallel=True is the keyword proposed in this issue (not released scipy API)
    result = differential_evolution(pareval, bounds, parallel=True, maxiter=10, polish=False)
    print(result.x, result.fun)

def serial_run():
    result = differential_evolution(objfunc, bounds, maxiter=10, polish=False)
    print(result.x, result.fun)

start_time = time.time()
serial_run()
print("Serial run took %s seconds using 1 core" % (time.time() - start_time))

start_time = time.time()
parallel_run()
print("Parallel run took %s seconds using %s cores" % ((time.time() - start_time), cpu_count()))

and the result is:

[ 0.54651845  0.35382038  0.1226159   0.03008397  0.03142196] 2.75419151958
Serial run took 42.4755949974 seconds using 1 core 
[ 0.87704828  0.87434803  0.83732696  0.68769892  0.49790808] 1.87122034154
Parallel run took 12.6878271103 seconds using 4 cores

Should I continue to prepare the pull request?

@pavelponomarev
Author

Additionally, I think the polish option in _differentialevolution.py should be deprecated. The polishing simply duplicates functionality that already exists elsewhere in scipy.optimize. Library functions should be simple and universal, not specialised and over-featured. Polishing can be applied to the result externally by the user via optimize.minimize at the same cost, while _differentialevolution.py would be more universal, have one less dependency and cleaner code.

@andyfaff
Contributor

Consider an optimisation where there are M population members and there will be up to N generations (N < maxiter) evolving the population. Each population member represents a solution vector.

Before the optimisation takes place the initial energy of each member of the population is calculated. Currently these energies are calculated by serial iteration through each member of the population. Once all the energies are calculated the lowest energy solution is placed in position 0 of the population; population[0] always represents the best solution so far. There is scope for this initialisation calculation to be done in a parallel manner because there is no evolution of the population at this stage. However, bear in mind that this initialisation is typically only <1% of the total calculation.

The population is then evolved. In the traditional implementation of Differential Evolution (which we have in scipy.optimize) each population member is visited in a serial manner. A trial vector is created which randomly takes values from the population member under examination, or from a 'blended' vector (this choice is controlled by the recombination constant; there is at least one entry from the blended vector in the trial vector). The 'blended' vector is synthesized from population[0] plus a scaled difference of two randomly chosen population vectors (the scaling is controlled by the mutation constant). The energy of the trial vector is calculated by the objective function. If the energy of the trial vector is lower than the energy of the existing population member, the trial vector replaces the population member. If the energy is also lower than the best solution so far, the trial vector replaces population[0] as well.
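
For readers skimming the thread, a heavily simplified Python sketch of one such serial 'aggressive' generation (not scipy's actual implementation; strategy details and edge cases are condensed):

import numpy as np

def aggressive_generation(func, pop, energies, mutation=0.5, recombination=0.7,
                          rng=np.random):
    # One generation of the serial "aggressive" scheme described above.
    # pop[0] holds the best solution so far; improvements become visible to
    # later trial vectors within the same generation.
    M, Y = pop.shape
    for i in range(M):
        r1, r2 = rng.choice([j for j in range(M) if j != i], 2, replace=False)
        blended = pop[0] + mutation * (pop[r1] - pop[r2])   # mutant/"blended" vector
        take = rng.rand(Y) < recombination
        take[rng.randint(Y)] = True                          # at least one entry from blended
        trial = np.where(take, blended, pop[i])
        e = func(trial)
        if e < energies[i]:                                   # trial replaces this member...
            pop[i], energies[i] = trial, e
            if e < energies[0]:                               # ...and the best-so-far as well
                pop[0], energies[0] = trial, e
    return pop, energies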

It is this last sentence that is the vital consideration here. In a given generation it is possible for each of the M trial vectors to change the best solution, population[0]. In addition, it is possible for each of the M population members to be improved. Thus, within a given generation the best solution will probably change several times, and several population members will lower their energies as well. Remember that the blended vector outlined above is synthesized using values from the best solution and other randomly chosen population members. Thus, even within a single generation one is able to take advantage of improvements in the best solution and in other population members that have only occurred within that generation. In other words, trial vectors show continuous improvement within a generation (e.g. trial[90] could be synthesised from population[0], population[30] and population[57], all of which could have changed for the better).

Any change to the algorithm that reduces the update frequency of the best solution vector, or does not allow trial vectors to benefit from improvements to other population members within a given generation, will drastically reduce the ability and efficiency of the entire DE process in finding a global minimum.

Unfortunately the current parallelisation proposal does just that. For a given generation there are M trial vectors created. The energies of these trial vectors are calculated simultaneously. Once these have been calculated the trial members can replace the corresponding original population members if the trial energy is lower. population[0] is then replaced by the population member with the lowest energy.
This reduces the strength of the algorithm in two ways. Firstly, the best solution vector can only change once per generation. Secondly, all the trial vectors are created and calculated independently. Both of these remove continuous improvement of trial vectors within a generation. For a given problem I would hypothesize this proposal would converge around N times more slowly.
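
For contrast, a sketch of the generation-synchronous ('non-aggressive') variant being proposed, where all M trial vectors are built up front so the single energy-evaluation step becomes a natural fit for map, multiprocessing.Pool.map or an MPI map (again only a simplification, not scipy code):

import numpy as np

def synchronous_generation(func, pop, energies, mutation=0.5, recombination=0.7,
                           map_func=map, rng=np.random):
    # Every trial vector is built from the population as it stood at the start
    # of the generation, all energies are evaluated in one batch (the
    # parallelisable step), and the best member is updated once at the end.
    M, Y = pop.shape
    trials = np.empty_like(pop)
    for i in range(M):
        r1, r2 = rng.choice([j for j in range(M) if j != i], 2, replace=False)
        blended = pop[0] + mutation * (pop[r1] - pop[r2])
        take = rng.rand(Y) < recombination
        take[rng.randint(Y)] = True
        trials[i] = np.where(take, blended, pop[i])

    trial_energies = np.asarray(list(map_func(func, trials)))   # e.g. pool.map

    improved = trial_energies < energies
    pop[improved] = trials[improved]
    energies[improved] = trial_energies[improved]
    best = np.argmin(energies)                                   # best updated once per generation
    pop[[0, best]] = pop[[best, 0]]
    energies[[0, best]] = energies[[best, 0]]
    return pop, energies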

A different way of using parallelisation has been proposed in the past. Rather than evaluating a single population in parallel, why not generate several populations, P? One would iterate serially through each of those P populations, as is done at the moment. At the end of a generation (a synchronisation point) you compare population[0] for each of the P populations. The overall best solution is then copied to the other populations.

@andyfaff
Contributor

With regards to polishing the final result, this is frequently done by users of this code; so frequently that it's worth taking on the work of polishing within differential_evolution. Doing it internally reduces the chances of a user making a mistake. Because DE mandates the use of bounds, L-BFGS-B is the only algorithm that can be used here.
I would argue that the polishing step in differential_evolution is simple, is universal, and does not make it over-featured. I would rather external codebases didn't have to worry about the extra polishing step themselves; the probability of them introducing a bug is larger.
Besides, deprecating/removing a keyword should happen only in rare circumstances.

@andyfaff
Contributor

I think we can find a more suitable test than rosen, something which has a lot more features, e.g. Griewank or BiggsExp04 (#4191). The best test approach would be to go through the global benchmark suite 100 times for each approach and examine the total number of function evaluations and the success rate.
Please note that the rosen function has a global minimum of 0.0 and a best solution of [1, 1, 1...]. In the test you ran above neither the serial nor the parallel version reached a satisfactory minimum; more iterations are required for this. All that happened is that the parallel version finished in a smaller amount of time than the serial version, which is to be expected. The key question is which approach converges in a smaller number of function evaluations, and also reaches the global minimum?

@aarchiba
Contributor

@andyfaff the kind of parallelization you're talking about exists in the form of pygmo: http://esa.github.io/pygmo/ It runs multiple optimizers in separate "islands" and has a "migration operator" to exchange promising solutions between islands. Makes it very flexible and parallelizable, but also maybe a little heavyweight for scipy.

@pavelponomarev
Author

@andyfaff, with regards to the "polish" keyword I still disagree. The function is called differential_evolution and not differential_evolution_with_polishing; I would not expect this function to do anything other than DE. OK, if there are statistics showing that polishing really helps in the majority of real-world cases, then let's leave it. But at least the default behavior should be changed to polish=False. (Are there benchmarks comparing what is more efficient -- allowing a couple of additional DE iterations, or applying polishing? What about the type of the test function?)
Additionally, I hypothesize that this polisher works well only with differentiable functions. But there are a lot of real engineering problems (mine among them) where the objective function is not only nonlinear, but also discontinuous, non-differentiable and somewhat noisy, and where the global minimum can lie right next to a discontinuity. I still prefer modularity, and I believe that all library functions should be modular and perform just one task well.

So this additional feature could be removed, with a mention in the documentation or in the examples that L-BFGS-B polishing is usually well suited to, and gives good results for, smooth differentiable objective functions. Or, at least, the default behaviour should be changed to polish=False.

@argriffing
Contributor

Regarding the 'polish' interface, differential_evolution appears to use polishing in a somewhat sophisticated way, so

differential_evolution(func, ...)

is not the same as

def polished_func(x):
    return minimize(func, x, ...).fun
differential_evolution(polished_func, polish=False, ...)

see https://github.com/scipy/scipy/blob/maintenance/0.16.x/scipy/optimize/_differentialevolution.py#L542

@argriffing
Contributor

Now that I've read the polishing code more closely, I see that it is applied only to the best population member at the very end of the optimization. It's not a local optimization step taken at each function evaluation, and it's not applied at the end of each generation. I kind of agree with @pavelponomarev that including it in differential_evolution is unnecessarily monolithic, especially when it is True by default.
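
For reference, doing that final step by hand with polish=False looks roughly like this (L-BFGS-B with the same bounds, as discussed above); a minimal sketch rather than a recommendation:

from scipy.optimize import differential_evolution, minimize, rosen

bounds = [(0, 2)] * 5

# run DE without the internal polishing step ...
de_result = differential_evolution(rosen, bounds, polish=False, seed=1)

# ... then polish only the best member externally, reusing the same bounds
polished = minimize(rosen, de_result.x, method='L-BFGS-B', bounds=bounds)
print(de_result.fun, polished.fun)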

@pavelponomarev
Author

With regards to parallelization: the DE strategy implemented in optimize._differentialevolution is called "aggressive" (at least in the lecture slides of prof. Alotto, http://www.dii.unipd.it/~alotto/didattica/corsi/Elettrotecnica%20computazionale/DE.pdf) because the best members from the current generation are used along with the previous generation for mutation (as nicely described by @andyfaff). This is one of the advanced strategies and is definitely superior to pure DE. Pure DE does not mix generations during the mutation process; hence, in pure DE, the parallel evaluation of the objective function (the computation of energies) can be applied to the whole generation simultaneously.

I like the approach mentioned here by @andyfaff of breaking the population into several smaller populations, which allows quasi-aggressive performance to be achieved. Here, the n_jobs keyword proposed by @rgommers could be used nicely.

I propose that the default behavior of DE be aggressive, serial and without polishing (this is generally good for light objective functions). But non-aggressive pure DE should also be implemented, with the possibility of parallel evaluation of the objective function for the whole population (this is good for very heavy objective functions). And maybe let us leave thoughts about the best implementation of a quasi-aggressive strategy for parallel DE to future commits (this heavyweight behaviour is implemented in PyGMO, as mentioned by @aarchiba).

@andyfaff
Contributor

Can we split this discussion/issue into two parts? The first part regarding the polishing, the second part regarding the 'aggressive' (parallel) strategy. They are separate/orthogonal to each other. @pavelponomarev can you start a separate thread for the polishing?


@pavelponomarev
Author

Agreed. Here it is: #4880.

@andyfaff
Contributor

I read the lecture slides that were linked. The tl;dr message is that the 'aggressive' approach visits each member of the population serially; if the trial vector is successful the population member is updated immediately. Subsequent trial vectors are able to benefit from earlier improvements made to the population within the same generation (the best solution is updated several times per generation).
In contrast, the 'non-aggressive' approach generates trial vectors for the entire population in parallel (all trial vectors are independent of each other). Once all energies have been calculated the population is updated (the best solution is updated once per generation).

Currently differential_evolution uses the aggressive approach. @pavelponomarev is proposing to add the non-aggressive approach. A more appropriate keyword to use, if this is to be added, would be aggressive=True.

In the non-aggressive approach the trial vectors are all independent of each other. This makes it possible to calculate the objective function for them all in a parallel manner. Parallelisability is coincidental here; using a parallel keyword would obscure the fact that the minimisation algorithm has changed. The non-aggressive approach could also use a non-parallel calculation of the objective function.

Let's suppose an aggressive keyword is added. If aggressive is False then the non-aggressive approach is used; the default (True) would be to follow the existing approach. The terms aggressive/non-aggressive should not be interpreted as good/bad; they're just different.

The key thing to discuss is what should happen in situations where parallelisation of objective functions could occur (@dlax). The behaviour for all existing scipy.optimize objective functions is to receive an x parameter array (x.shape = (Y,)) and return a single energy value. We could specify that if aggressive is False then the objective function should expect to receive an (M, Y) shaped array and should return (M,) energies. However, this wouldn't be consistent with the aggressive=True case, nor would it be consistent with the rest of scipy.optimize. Besides, someone might want to use the non-aggressive approach with a normal objective function.
There are several places in scipy.optimize where parallelisation is possible; the non-aggressive approach in this proposal and optimize.brute immediately spring to mind. In optimize.brute the calculation is vectorized, but not parallelized, using np.vectorize. The non-aggressive proposal is not too dissimilar; perhaps this approach can be used to start with.
However, I think scipy.optimize needs a common approach to enabling this kind of parallelisation. We don't want case-by-case approaches. Could we create a scipy.optimize.parallelized decorator that would indicate that an objective function is reentrant? The decorator could handle the parallelisation, with the objective function remaining consistent/unchanged. (It could also indicate that it accepts an (M, Y) array.)
The complicating factor, as people have pointed out, is that people would like to change how the calculation is parallelised.
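
To make the decorator idea concrete, here is a rough, hypothetical sketch -- no such decorator exists in scipy.optimize, and the name parallelized is just the one suggested above:

import numpy as np

def parallelized(map_func=map):
    # Hypothetical decorator: lets a plain scalar objective also accept an
    # (M, Y) array of parameter vectors, returning (M,) energies computed
    # through a user-selectable map (builtin map, Pool.map, an MPI map, ...),
    # while single-vector calls pass through unchanged.
    def decorator(func):
        def wrapper(x, *args):
            x = np.asarray(x)
            if x.ndim == 2:                                   # whole population
                return np.asarray(list(map_func(func, x)))
            return func(x, *args)                             # single vector
        return wrapper
    return decorator

@parallelized()                    # swap in e.g. Pool().map for a parallel run
def objective(x):
    return float(np.sum(x ** 2))

print(objective(np.array([1.0, 2.0])))                   # scalar result
print(objective(np.array([[1.0, 2.0], [0.5, 0.5]])))     # batch of two results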

@aarchiba
Contributor

If there is going to be a general option to allow parallelization of optimization algorithms (which could include some of the local ones - gradient evaluation can be parallelized, for example) we should think about how to do that efficiently. Creating a new objective function may not be the best way to do it.

First: do we need more generality? This is sort of a SIMD model: the optimizer comes up with a list of values, sends them off to be evaluated, and waits for them all to finish. For some of the genetic algorithms, or the brute-force optimization, you could perhaps resume the computation before all the values had returned.

Second, if the generality is appropriate, is this the best way to get parallelization? As I pointed out earlier, emcee solves the same problem - allowing multiple simultaneous objective evaluations - by allowing the user to provide a suitable implementation of map, perhaps the one from multiprocessing or perhaps the MPI-based one they provide. This particular implementation doesn't allow one to take advantage of numpy-style vectorization, if that's a concern, but it doesn't require modifying the objective function to change the parallelism. On the other hand, if vectorized functions are the way to go, a version of np.vectorize that used parallelization under the hood would make it easy to parallelize with this implementation. But there might be other ways of letting the user flexibly add parallelism to algorithms that permit it.

@pavelponomarev
Author

I like the emcee way. The user should have the choice of taking care of the parallelization themselves or using a default built-in parallelization (e.g. threading). I don't think a decorator for objective functions is necessary.

@rgommers
Member

emcee doesn't just use a pool; its sampler signature is ..., threads=1, pool=None, ... - so two keywords, with the default being user-friendly and very similar to scikit-learn, not requiring the creation of a custom pool. So my conclusion is that n_jobs=1 is a sane and well-tested choice, and adding an option for a custom pool may be useful to power users but is of secondary importance.

Maybe there's a way to avoid a second keyword, either by overloading n_jobs (ugly) or using some kind of decorator that adds the option to set a property to choose a pool or threading backend.

@pavelponomarev
Author

I finally looked at the emcee code mentioned by @aarchiba and @pv, and I like the way it is done there. Using the 'pool' keyword is much more convenient and concise, and they also provide a pool compatible with MPI. I should have done this in May.
emcee is licensed under MIT. Is that compatible with scipy? Can we directly borrow pieces of code from there?

@argriffing
Contributor

emcee is licensed under MIT. Is that compatible with scipy? Can we directly borrow pieces of code from there?

Yes, scipy's license is compatible with MIT, and source files from legitimately MIT-licensed projects can be included in scipy. It looks like emcee used to be GPL but they switched at some point.

@pavelponomarev
Author

Hello,

Here are the working files for emcee-style parallelization https://github.com/scipy/scipy/compare/master...pavelponomarev:DE_with_parallel_pools?expand=1
Please check and comment.
Two pools are added -- MPI and joblib. Both work well. The MPIpool was copied from emcee and augmented with a function MPIpool.poolsize() to automatically enable quasi-aggressive mutation in DE.
The changes in DE are as follows:

  1. Added an optional pool variable (default False) to enable parallelization using the provided pools. This enables parallelization without altering the objective function.
  2. Added an aggressive flag (default True) to enable the aggressive mutation strategy as discussed in "enh: aggressive DE strategies" (#4853).
  3. Added an optional njobs variable which defines the size of the subpopulations for quasi-aggressive mutation (#4864) in the parallel case.
  4. callback is modified to expose the whole population, not just the best individual. This is required to set up custom stopping criteria, which can be useful in real-life engineering problems with computationally expensive objective functions. This way the search process can be stopped with an arbitrary criterion and computation time is not wasted; see e.g. https://scholar.google.com/scholar?cluster=15734086883203880529 . Additionally, it can be used to monitor the convergence of parameters, and it helps to qualitatively assess the objective function (e.g. when it has several equal global minima) by observing the clustering of individuals near those minima.
  5. The maxfun option and the warning check for maxfev are removed as unnecessary. The author of the original code was also skeptical about the usefulness of this at lines 286, 287. Additionally, it cannot be used in the parallel case.
  6. Documentation strings in the DifferentialEvolutionSolver class were removed as they are identical to the module documentation. I see no reason why the same documentation should be maintained in two different places.
  7. The documentation and examples are updated according to the introduced changes.
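
A rough usage sketch of the proposed keywords (branch-only; never merged into scipy in this form): a user-supplied pool with a map method drives the evaluation of the whole population, while the objective function stays an ordinary f(x) -> float.

from multiprocessing import Pool
from scipy.optimize import rosen, differential_evolution

bounds = [(0, 2)] * 5

if __name__ == "__main__":
    with Pool() as pool:
        # pool= and aggressive= are the keywords proposed above, not released scipy API
        result = differential_evolution(rosen, bounds, pool=pool, aggressive=False)
    print(result.x, result.fun)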

@sturlamolden
Contributor

The spelling in joblib (and cKDTree) is n_jobs with an underscore.

@sturlamolden
Contributor

Do we want dependencies on MPI and joblib? Presumably we can do without either of them here.

@pavelponomarev
Author

@sturlamolden njobs was deliberately distinguished from the number of workers, as it determines the sub-population size for aggressive mutation. njobs can be bigger than the number of parallel workers. Perhaps a name like subpopsize would be better.

@andyfaff
Contributor

andyfaff commented Jun 26, 2015 via email

@pavelponomarev
Author

@andyfaff I am going on vacation as well until the end of July, so perhaps there is no hurry here.

  1. I would disagree: for all population-based algorithms (unlike single-valued optimizations) the information on the population members is really essential. It shows the convergence (for single-valued minimizations it is enough to expose only the best solution at each iteration). The callback function is the best way to provide this information; alternatively, we should design another way of exposing the information on all individuals outside the DE module.
    It matters when, e.g., the objective function has several equal global minima. The current callback will not provide any information on that. However, if the whole population is exposed, it is possible to qualitatively assess the clustering of individuals near those minima and conclude that the problem has several optima.
  2. The exposed public interface, as I see it, is the differential_evolution function and not DifferentialEvolutionSolver. People would look for the former in the docstring (I have actually never used docstrings, so maybe I am wrong here). It is REALLY wrong to keep the same info in two different places. I would like to keep the docs on the module-level function and, for those who use docstrings, leave a link in the Solver class.
  3. maxfun is really useless when parallelization is implemented; moreover, I do not see any application for it. If there is any need for this, it is fairly easy to count the executions of the objective function within the objective function itself.
    Regarding the callback: I think DE is the first population-based method in scipy. When something fundamentally new comes up it is natural to change the standard interfaces. Since there was previously no use of population-based algorithms in scipy at all, maybe we should design a standard interface for population-based algorithms here, and this new design should allow for the inclusion of new parameters that might be needed in the future.
  4. In emcee that pool is provided for convenience, and I want the same convenience here as well. The dependencies are actually optional. Providing our own pools helps deal with the pools' default parameters in a good way. I agree that our own multiprocessing pool could be a good option, but considering ease of use and end-user usability I want to use joblib, as it provides great debugging possibilities and responds well to CTRL-C. With the pool provided by multiprocessing, debugging is a total headache. A custom MPI pool is also necessary for HPC applications.

@sturlamolden
Contributor

@pavelponomarev I was just commenting on the spelling (n_jobs vs. njobs).

Whether we should use joblib in SciPy is a bigger discussion, but now would be a good time to have it.

MPI is another matter entirely: there are many implementations of the MPI runtime and several Python bindings, including modified Python interpreters. I think it is better avoided for now. MPI is great, but we need to make some choices about how to support MPI in SciPy.

@sturlamolden
Contributor

My biggest complaint against joblib in its current form is the memory mapping of temporary files for sharing NumPy arrays between processes. This means it uses shared memory on Linux (due to tmpfs), but physical files on Mac and Windows.

But standardizing on joblib is in my opinion better than having a number of ad hoc solutions for parallelization. I am slightly in favor of using joblib in SciPy, but only if it is chosen as the standard way of parallel computing in SciPy.

@dlax
Member

dlax commented Jun 26, 2015

@pavelponomarev, it'd probably help if you could split your changes into several atomic commits. A "636 additions, 189 deletions" diffstat does not ease review.

@pavelponomarev
Author

@dlax, here it is:
https://github.com/scipy/scipy/compare/master...pavelponomarev:parallel_DE?expand=1
Here only the core parallel functionality is changed, in comparison with the existing serial function.

@rgommers
Member

The current code with only a pool= keyword isn't very good imho. Several people who prefer it point to emcee, but as I pointed out in #4864 (comment), emcee does not use pool in isolation.

Whether or not we add pool for power users (I don't care too much about that), we need something that takes an integer. It could be called n_jobs, threads or workers; that doesn't really matter, I think.

@pavelponomarev
Author

@rgommers, I agree.
The pools are needed for power users who want an MPI or joblib pool. An additional integer keyword should be added for normal parallel use with the standard multiprocessing module. Let us try to push the power-user functionality first (e.g. as done in #5141), and then introduce the integer parameter in a separate PR.

@rgommers
Member

@pavelponomarev we shouldn't merge anything until it's clear what we want to do. Some options:

  1. Add a workers=1 keyword that takes integers as well as any kind of object with a map method. Addresses both use-cases.
  2. Add n_jobs=1 (joblib-style/based) and pool=None. If pool is supplied by the user, n_jobs is ignored.
  3. Add only n_jobs=1. This covers most of the use-cases and is consistent with scikit-learn, statsmodels, etc.

I think I prefer (1), followed by (2) and then (3). Two keywords is really one too many for an API that could be used throughout Scipy.
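
A hypothetical sketch of how option (1) could be normalised internally (not actual scipy code; _workers_to_map is a made-up helper name): an int selects a multiprocessing pool of that size (-1 meaning all cores), while anything exposing a map method is used directly.

import multiprocessing

def _workers_to_map(workers=1):
    # Normalise `workers` to a map-like callable plus the pool we created (if any).
    if workers == 1:
        return map, None                        # serial: builtin map, nothing to clean up
    if isinstance(workers, int):
        n = multiprocessing.cpu_count() if workers == -1 else workers
        pool = multiprocessing.Pool(n)
        return pool.map, pool                   # caller is responsible for pool.terminate()
    return workers.map, None                    # user-supplied pool-like object

# Inside the solver one would then do something like:
#     map_func, owned_pool = _workers_to_map(workers)
#     energies = list(map_func(func, population))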

@pavelponomarev
Author

I also like (1)!
Then I, or someone else, should add a commit to #5141 replacing the pool keyword once all the required tests are fixed there. I will return to this in a few days, once we have gathered more comments.

@sturlamolden
Contributor

We used (3) in cKDTree in SciPy 0.16. But I don't really care what it's called :-)

@pavelponomarev
Author

pavelponomarev@a6ce5be

@deeplearning-ai-research

Just wondering if this parallel version will be included in the next version of scipy?
Thanks

@andyfaff
Contributor

This PR is currently stalled and won't be in SciPy 0.18.

@rocapp

rocapp commented Mar 14, 2017

I don't see it listed for 0.19 either. Is this likely to be ready for the next release? I'm sure y'all are swamped; this would just be really useful for my research.

@ktamiola

ktamiola commented Apr 2, 2018

I couldn't agree more with @rocapp. A parallel version of DE would be truly welcomed!

@astrophysaxist

Just joining the chorus of @rocapp and @ktamiola in support of a parallel version of DE.

@hnazkani

hnazkani commented Feb 3, 2019

@rgommers: would it make sense to provide the same parallelization option for scipy.optimize.brute as well? Since scipy.optimize.brute is guaranteed to work for any kind of optimization problem (agreed, it is inefficient), it would be really helpful to have the same switch for that method too.

@rgommers
Member

rgommers commented Feb 3, 2019

@hnazkani please don't ask the same question in two places:) I'll answer your other one.
