add binary-tree engine interconnect example #1295
Conversation
@@ -0,0 +1,71 @@
We could make this one executable and give it the usual shebang #!/usr/bin/env python, which will make it even more obvious (on *nix) to people which one they're meant to run.
done
Thanks for the example! It's great as-is, but I'm wondering if we shouldn't add a section to the docs pointing to the examples. If nothing else, it will make it more likely that people will find them... Thoughts?
Sure, I can make a prominent note and link somewhere in the parallel docs.
I really want to rewrite the parallel docs as a whole, but I don't know if I'm ever going to have the time to do that. The pages right now are much too long, and not particularly helpful or application-oriented.
Oh, I wasn't thinking of a wholesale rewrite; just of mentioning it, and perhaps of making a few pages that at least put the examples (even if just in a big code block) in the html docs, so people find them more easily when they read the docs. But we can leave that for later; at some point we're going to have to do a bit of focused work on our docs. So do you want to add a note about this in the docs before merge?
Yes, I didn't mean to imply that rewriting the docs was a prerequisite to adding the link; I was just making a note of my general considerations on the docs. I will add a note to the top of the intro page in this PR shortly. I've been thinking a fair amount about the parallel docs, and I think a from-scratch, tutorial-style rewrite is necessary, leaving a detailed in-depth exploration of how things work separate. Right now, these two goals of our documentation are much too entangled, detracting from the success of communicating at either level. It's also true that the organization of the docs is principally unchanged from when it was originally written for the IPython.kernel code, which is becoming less and less appropriate as time goes on.
Note/link added.
Great, let's merge this puppy then. BTW, I totally agree with your assessment of the status of the parallel docs, and I think similar issues also apply to the interactive part. I'd like to organize the whole thing a bit more like Justin has done with the StarCluster docs, which are very pleasant to navigate. Oh well, time...
# set root to False everywhere
view['root_id'] = root_id
# set it to True on the root node
root['root'] = False
Mmh, did you mean 'True' here instead?
That would be wrong if it was actually used for anything, but it isn't. Removed.
Thanks very much for this example. I will do some experiments with it this weekend, but don't wait for my feedback to merge this. One remark though: don't you think this example's primitives (the functions for setting up the tree and performing the reduce) could be made part of IPython.parallel itself?
def publish(self, value):
    assert self.root
    self.pub.send_multipart(self.serialize(value))
If I am not mistaken, this is doing a broadcast to all nodes at once rather than following the tree links down. Is this more efficient than recursively sending the results along the parent-to-children links?
The answer to that question probably depends on various facts about your cluster and data, but this is the simplest approach and thus the most appropriate for the example.
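To make the trade-off concrete, here is a small illustrative sketch (pure Python, no zeromq; the helper names `broadcast_root_sends` and `tree_rounds` are invented for this example, not part of the PR) comparing the message load of a direct root broadcast against store-and-forward along heap-indexed tree links:

```python
def broadcast_root_sends(n):
    """Direct broadcast: the root pushes one payload per other node."""
    return n - 1

def tree_rounds(n):
    """Tree forwarding: number of rounds until all n heap-indexed nodes
    have the payload, each node forwarding to at most two children."""
    rounds = 0
    frontier = [0]        # nodes that received the payload last round
    have = {0}            # nodes that have the payload so far
    while len(have) < n:
        nxt = []
        for i in frontier:
            # children of node i under implicit binary-heap indexing
            for c in (2 * i + 1, 2 * i + 2):
                if c < n:
                    nxt.append(c)
                    have.add(c)
        frontier = nxt
        rounds += 1
    return rounds
```

With 7 nodes, a direct broadcast has the root emit 6 payloads in one step, while tree forwarding completes in 2 rounds with no node sending more than two; which is actually faster in practice still depends on the cluster and payload sizes, as noted above.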
The principal reason these examples are not part of IPython.parallel is that they were both tossed together in an afternoon, to show that perfectly arbitrary engine interconnects are quite simple to implement as appropriate to your particular problem. A bundled version would require more thought, and would have to take better care to get addresses right in a variety of situations, handle data serialization efficiently, etc.

It's also much harder to write a generic zeromq interconnect to be used as a library, because the socket types really determine what you can do. The 'right way' to do this with zeromq is to create the network topology that is appropriate for your problem, which is not something that is easy to define in a library. For instance, the bintree here uses PUSH/PULL connections to establish the tree, and a single PUB on the root with SUB on everything else, whereas the previous all:all interconnect uses ROUTER sockets for direct addressing and PUB/SUB everywhere so that any engine can initiate a broadcast. Another example is the wave2d one, where nearest-neighbor connections are made.

A great deal more thought would have to go into building base classes that are actually useful as something more than a starting point for copy/paste-and-adapt, which is exactly what these are meant to be. The PyCon sprints will be a good time to visit this.
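As a rough sketch of how such a tree topology can be described (this is not the PR's code; it assumes implicit binary-heap indexing over engine ids, and the helper names `parent`, `children`, and `tree_links` are made up for illustration):

```python
def parent(i):
    """Parent of node i in an implicit binary heap; the root has none."""
    return (i - 1) // 2 if i > 0 else None

def children(i, n):
    """Children of node i among n nodes; each node has at most two."""
    return [c for c in (2 * i + 1, 2 * i + 2) if c < n]

def tree_links(n):
    """Map each node id to (parent, [children]) for an n-node binary tree.

    In a zeromq realization, each (parent, child) pair would get its own
    PUSH/PULL connection, with addresses distributed out-of-band.
    """
    return {i: (parent(i), children(i, n)) for i in range(n)}
```

With 7 engines, node 0 is the root with children 1 and 2, and the leaves 3-6 have no children; the actual example must additionally exchange socket addresses and wire up the connections.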
Comments should be addressed; ready for another test and merge if you want. I'm perfectly happy letting this one sit for a while, as it is just an example and not at risk of any conflict with progress. I think some interconnect primitives will be useful to have in IPython.parallel, but we do need to figure out the right level of generality that would actually be more useful than copy/paste examples, which is far from clear to me at this point.
Thanks very much for the feedback. I agree that making it generic at the right level is a hard problem that needs practical experimentation with realistic use cases, and the sprint will be a great opportunity to collect some preliminary experience with this. Before merging, I think it would be good to give some motivation for the spanning-tree construct in the top-level docstring (i.e. making commutative reduce operations scalable to hundreds or thousands of nodes by limiting network contention: the binary-tree construct ensures that no single node will receive more than 2 payloads on its incoming sockets at once).
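The contention argument can be illustrated with a small pure-Python simulation (the helper `tree_reduce` is hypothetical, not part of the example; it assumes binary-heap indexing and a commutative, associative op) that performs a reduce up the tree and counts payloads received per node:

```python
def tree_reduce(values, op):
    """Reduce `values` up a heap-indexed binary tree.

    Returns (result, max_incoming), where max_incoming is the largest
    number of payloads any single node had to receive.
    """
    n = len(values)
    acc = list(values)      # acc[i] holds node i's partial result
    incoming = [0] * n      # payloads received by each node
    # process nodes bottom-up, so each child's subtree is fully
    # accumulated before it sends its partial result to its parent
    for i in range(n - 1, 0, -1):
        p = (i - 1) // 2
        acc[p] = op(acc[p], acc[i])   # child i sends its partial to parent p
        incoming[p] += 1
    return acc[0], max(incoming)
```

Every interior node receives at most two payloads regardless of cluster size, while the tree depth (and hence the number of sequential steps) grows only as log2(n).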
@ogrisel, thanks for joining in! I agree with you that a motivating discussion in the entry-point docstring would be great to have. @minrk, you could even grab some text from the thread we had with Olivier on the mailing list where this came up. As you point out, this one is low-risk for conflicts, so we can let it mature here until it's ready.
We did not use this during the sprint but went directly for the MPI Allreduce implementation. However, I still think this is interesting, since it makes it possible to implement allreduce on a pure IPython cluster (without MPI) and to dynamically change the size of the cluster, which MPI does not allow (as far as I know).
Those are sensible points, and in light of some recent successes we've had with the notebook and parallel machinery on EC2, all the more reason to show users how they can do these kinds of things in non-MPI environments. Thanks a lot, Olivier, for the feedback!
That's off-topic w.r.t. the current pull request, but I started experimenting with libcloud as a minimalistic alternative to StarCluster for deploying cloud-based IPython clusters not restricted to Amazon EC2 (I haven't started on the IPython config yet). It's here: https://github.com/pydata/pyrallel/blob/master/pyrallel/cloud.py It's a work in progress (it does nothing useful so far), but I thought you might be interested in it anyway. I won't have much time to work on it in the coming weeks, so don't expect it to turn into a useful product for a couple of weeks / months.
Any particular reason not to use StarCluster? So far I'm really happy with it, and it seems to offer a bunch of things that are actually super useful in practice... Just curious.
Only for the ability to run on non-EC2 infrastructure. It's an ideological statement: one of the main reasons I use and write open source software is to avoid vendor lock-in. Sounds more sustainable to me.
Certainly! Though StarCluster itself is fully open source, and I think it would be very cool for it to grow support for other cloud backends. Right now it's fairly tied to EC2, but I don't think there's anything fundamental (other than the manpower to do it) preventing it from abstracting out the backend.
I might at some point, but StarCluster does so much more than just starting nodes with IPython engines, which is my only use case right now. Also, I don't want to have to maintain a cloud-provider-specific image; I'd rather have the dependency setup scripted and put under version control in git.
And we'd certainly love to have a lightweight alternative for the IPython case, obviously :) So a big +1 to your libcloud work, even if it takes a bit of time to mature...
from discussion on ipython-user
I finally added the paragraphs of discussion from the list, so I think this can be merged.
Looks good, merging now. Thanks everyone for the work and patience.
Add binary-tree engine interconnect example. This implements a parallel [all]reduce as used in traditional MapReduce scenarios; this is a useful example showing how the IPython.parallel tools can be configured with a different interconnect topology in addition to the default view of N engines connected to 1 controller in a simple star topology.
Thanks!
implements parallel [all]reduce
As discussed on-list, this is a useful example showing a different interconnect topology