
What does it take to parallelize the search? #26

Closed
ClimbsRocks opened this issue Jun 16, 2017 · 5 comments

Comments

@ClimbsRocks
Contributor

Great tool! It lets me drastically expand the search space compared with GridSearchCV. Really promising for deep learning as well as standard scikit-learn-interfaced ML models.

Because I'm searching over a large space, this obviously involves training a lot of models and doing a lot of computation. scikit-learn parallelizes model training to ease the pain somewhat.

I tried using the toolbox.register('map', pool.map) approach as described by deap, but didn't see any parallelization.

Is there a different approach I should take instead? Or is this a feature that hasn't been built yet? If so, what are the steps needed to get parallelization working?

@rsteca
Owner

rsteca commented Jun 16, 2017

In the EvolutionaryAlgorithmSearchCV class you can use the n_jobs parameter to specify how many processes you want to use for parallel computation.

@ClimbsRocks
Contributor Author

Thanks for the quick response! I'd been passing in n_jobs=-1, having just copied the code over from scikit-learn's GridSearchCV. Using n_jobs=8 does indeed make it run in parallel! Now I'm just dealing with function-pickling issues.

Have you looked at using pathos for multiprocessing instead of the native python multiprocessing? https://github.com/uqfoundation/pathos

If not, I'll look into making that swap myself, to see if it can be fixed in sklearn-deap, or if it has to be done in deap.

@ClimbsRocks
Contributor Author

ClimbsRocks commented Jun 16, 2017

Oh cool, that was a much easier fix than I'd feared.

For anyone else running into a similar issue, I simply added this to the top of the file where I run EvolutionaryAlgorithmSearchCV, and it worked:

# Python 2 workaround: register a reducer so that bound/unbound methods
# can be pickled and shipped to multiprocessing worker processes.
import copy_reg
import types

def _pickle_method(m):
    if m.im_self is None:
        # unbound method: reconstruct it from the class by name
        return getattr, (m.im_class, m.im_func.func_name)
    else:
        # bound method: reconstruct it from the instance by name
        return getattr, (m.im_self, m.im_func.func_name)

copy_reg.pickle(types.MethodType, _pickle_method)

@ClimbsRocks
Contributor Author

With that out of the way, there's a very good chance this will be baked into auto_ml soon! I've been needing a better way to optimize the hyperparameter search, and this seems like the best option yet.

@rsteca
Owner

rsteca commented Jun 16, 2017

Cool! I think I should fix the n_jobs=-1 problem so it works the same way as in scikit-learn.
