
What does it take to parallelize the search? #26

Closed
ClimbsRocks opened this issue Jun 16, 2017 · 5 comments

Comments

@ClimbsRocks
Contributor

Great tool! It lets me drastically expand the search space compared with GridSearchCV. Really promising for deep learning as well as standard scikit-learn-interfaced ML models.

Because I'm searching over a large space, this obviously involves training a lot of models and doing a lot of computation. scikit-learn parallelizes model training to ease the pain somewhat.

I tried using the toolbox.register('map', pool.map) approach as described by deap, but didn't see any parallelization.

Is there a different approach I should take instead? Or is this a feature that hasn't been built yet? If so, what are the steps needed to get parallelization working?

@rsteca
Owner

rsteca commented Jun 16, 2017

In the EvolutionaryAlgorithmSearchCV class you can use the n_jobs parameter to specify how many processes you want to use for parallel computation.

@ClimbsRocks
Contributor Author

Thanks for the quick response! I'd been passing in n_jobs=-1, having just copied the code over from scikit-learn's GridSearchCV. Using n_jobs=8 does indeed make it run in parallel! Now I'm just dealing with function-pickling issues.

Have you looked at using pathos for multiprocessing instead of the native python multiprocessing? https://github.com/uqfoundation/pathos

If not, I'll look into making that swap myself, to see if it can be fixed in sklearn-deap, or if it has to be done in deap.

@ClimbsRocks
Contributor Author

ClimbsRocks commented Jun 16, 2017

Oh cool, that was a much easier fix than I'd feared.

For anyone else running into a similar issue, I simply added this to the top of the file where I run EvolutionaryAlgorithmSearchCV, and it worked:

# Python 2 workaround: register a reducer so that bound/unbound methods
# can be pickled and shipped to multiprocessing worker processes.
import copy_reg
import types

def _pickle_method(m):
    if m.im_self is None:
        # unbound method: reconstruct it from the class by name
        return getattr, (m.im_class, m.im_func.func_name)
    else:
        # bound method: reconstruct it from the instance by name
        return getattr, (m.im_self, m.im_func.func_name)

copy_reg.pickle(types.MethodType, _pickle_method)

@ClimbsRocks
Contributor Author

With that out of the way, there's a very good chance this will be baked into auto_ml soon! I've been needing a better way to optimize the hyperparameter search, and this seems like the best option yet.

@rsteca
Owner

rsteca commented Jun 16, 2017

Cool! I think I should fix the n_jobs=-1 problem so it works the same way as in scikit-learn.
