Thoughts on Parallelism #47

Closed
koaning opened this issue Jan 2, 2018 · 11 comments

@koaning
Contributor

koaning commented Jan 2, 2018

We already apply some performance tricks with the .evaluate() mechanic, but we may be able to add some form of parallelism/queuing to make things even more performant.

In terms of easy wins: it seems that .map (and thus mutate) can generally be run in parallel. The same would hold for .evaluate() in the BasePopulation.

Do we want to explore this?

@rogiervandergeer
Collaborator

I think we should. If there is a way to easily make evol more than twice as performant, then we have to implement that.

But we should be careful. It is not difficult to come up with mutate functions which will break any parallelism (e.g. any lambda or a function which tracks state). Therefore we have to make the parallelism optional.

I think the biggest win can be achieved when working with islands. Then you can make basically everything parallel. So although it certainly doesn't hurt to work on this before we've implemented the islands, we need to make sure the two can work together.

@jasondemorrow
Contributor

FWIW, as a user of this library (great work BTW, thank you), I'd be fine with a set of "lower level" concurrency modules, provided with the caveat that mutate and breed should be implemented with care.

@koaning
Contributor Author

koaning commented Dec 27, 2018 via email

@jasondemorrow
Contributor

jasondemorrow commented Dec 28, 2018 via email

@koaning
Contributor Author

koaning commented Dec 28, 2018

Note that the fitness function is something that we currently evaluate lazily. Suppose that we do two mutate steps and then a survive step: we only need to evaluate an individual at the survive step, not at either mutation step. The evaluation can be expensive, which is why the main tactic we deploy is to delay it.
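To illustrate the lazy evaluation described here, a minimal sketch follows. The `Individual` class, `expensive_fitness`, and `nudge` are hypothetical names made up for this example and are not evol's actual implementation; the only idea taken from the comment is that a mutation invalidates the cached fitness and the fitness function runs only when a later step asks for it.

```python
class Individual:
    """Hypothetical container, used only to illustrate lazy evaluation."""

    def __init__(self, chromosome):
        self.chromosome = chromosome
        self.fitness = None  # None means "not evaluated yet"

    def mutate(self, mutate_function):
        # Mutating changes the chromosome, so any cached fitness is stale.
        self.chromosome = mutate_function(self.chromosome)
        self.fitness = None

    def evaluate(self, eval_function):
        # Only call the (possibly expensive) function when really needed.
        if self.fitness is None:
            self.fitness = eval_function(self.chromosome)
        return self.fitness


evaluated = []  # records every call to the fitness function


def expensive_fitness(x):
    evaluated.append(x)
    return -abs(x - 10)


def nudge(x):
    return x + 1


population = [Individual(x) for x in range(5)]

# Two mutate steps: no fitness evaluation happens here.
for _ in range(2):
    for individual in population:
        individual.mutate(nudge)
calls_before_survive = len(evaluated)

# Survive step: this is the first point where fitness is needed.
ranked = sorted(
    population,
    key=lambda ind: ind.evaluate(expensive_fitness),
    reverse=True,
)
survivors = ranked[:2]
```

In this sketch the fitness function is called once per individual at the survive step, rather than once after every mutation.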

Assuming that the functions you supply to mutate aren't lambda functions, it shouldn't be too difficult to use Python's multiprocessing module to ensure that certain steps are able to run in parallel. This would initially be implemented in a ParallelPopulation. Would this work for your use-case? I think certain steps can be done in parallel (anything that is like a map) but other steps cannot easily work that way (anything that is like a reduce).

Note that a ParallelPopulation on a single machine is something we could start implementing in the short term, but a multi-machine approach would take a bit more experience/investigation. I also think we'll limit the ParallelPopulation to CPU for the short term.
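As a rough sketch of the map-like case on a single machine, the snippet below farms out fitness evaluation with the standard library's `multiprocessing.Pool`. The helper `evaluate_all`, its `map_fn` parameter, and the toy `fitness` function are assumptions made for this example; this is not the ParallelPopulation discussed above. Passing the mapping function in keeps the parallelism optional, with the builtin `map` as the serial default.

```python
from multiprocessing import Pool


def fitness(chromosome):
    # Module-level function, so the standard pickle module can send it
    # to worker processes by reference.
    return sum(chromosome)


def evaluate_all(chromosomes, eval_function, map_fn=map):
    """Apply eval_function to every chromosome using the given map.

    map_fn defaults to the builtin (serial) map; pass pool.map to run
    the same step across worker processes.
    """
    return list(map_fn(eval_function, chromosomes))


if __name__ == "__main__":
    chromosomes = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

    serial = evaluate_all(chromosomes, fitness)

    # The worker count of 3 is an arbitrary choice for this sketch.
    with Pool(processes=3) as pool:
        parallel = evaluate_all(chromosomes, fitness, map_fn=pool.map)

    # Both paths should agree, since fitness has no side effects.
    assert serial == parallel
```

A reduce-like step (for example, picking survivors from the whole population) needs all results in one place, so in this sketch it would run in the parent process after `evaluate_all` returns.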

@jasondemorrow
Contributor

Thanks for the link. Coming from mostly a C++/Java background, I was interested to learn that Python implements threading much differently than I'd expect. But yes, the solution you suggest sounds perfect for my use case. I'll be glad to help in whatever way I can.

@jasondemorrow
Contributor

I've just submitted a PR with a very simple, arg-driven impl. using multiprocess (the pathos fork of multiprocessing that uses dill in place of pickle). At one point I updated the population unit test to compare execution times. On my machine, evaluating a population with 3 concurrent workers was 3 times faster, as expected.

@koaning
Contributor Author

koaning commented Jan 1, 2019

Interesting. I'll have a look; I've never had any experience with pathos. Is there a good reason to favour it over multiprocessing? At the moment our only dependency is pytest for testing, and if possible we'd love to keep this package as light as possible.

@rogiervandergeer opinions?

@jasondemorrow
Contributor

The main reason is that, unlike pickle, dill is capable of serializing instance methods and lambdas so they can be piped to the new process. It's possible to drop that dependency, but (if I understand correctly) it would mean detaching all functions needed from their instances and making them module-scoped.
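The lambda limitation can be seen with the standard library alone. The sketch below, with the hypothetical helpers `add_one` and `can_pickle`, checks whether `pickle` accepts a module-level function versus a lambda. It deliberately does not import dill, so it only demonstrates the restriction that motivates the extra dependency.

```python
import pickle


def add_one(x):
    # A module-level function: pickle stores it as a reference
    # (module name plus function name), not as code.
    return x + 1


def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


named_ok = can_pickle(add_one)
# A lambda has no importable name, so the by-reference lookup fails.
lambda_ok = can_pickle(lambda x: x + 1)

print("module-level function picklable:", named_ok)
print("lambda picklable:", lambda_ok)
```

Since `multiprocessing` relies on `pickle` to ship the function to its workers, a lambda passed to `Pool.map` fails in the same way, which is the gap a dill-based fork is meant to close.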

@koaning
Contributor Author

koaning commented Jan 1, 2019

There's great value in being able to support lambdas.

@koaning
Contributor Author

koaning commented Jan 14, 2019

@rogiervandergeer close this?

koaning closed this as completed Jan 15, 2019