
add binary-tree engine interconnect example #1295

Merged: 6 commits into ipython:master on Apr 27, 2012

Conversation

@minrk (Member) commented Jan 20, 2012

implements parallel [all]reduce

As discussed on-list, this is a useful example showing a different interconnect topology
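The reduction pattern the example implements can be sketched in plain Python (a hypothetical helper for illustration, not the actual example code): values are combined pairwise up a binary tree, so the reduction finishes in O(log n) rounds and no single process has to receive all n payloads.

```python
def btree_reduce(values, op):
    """Combine values pairwise up a binary tree.

    Each round halves the number of active nodes, so a cluster of n
    engines needs only O(log n) communication rounds, and each parent
    receives at most two payloads per round.
    """
    level = list(values)
    while len(level) > 1:
        nxt = [op(level[i], level[i + 1])             # each parent combines
               for i in range(0, len(level) - 1, 2)]  # one pair of children
        if len(level) % 2:                            # odd node passes through
            nxt.append(level[-1])
        level = nxt
    return level[0]

# An allreduce is this reduce followed by broadcasting the result back out.
total = btree_reduce(range(1, 9), lambda a, b: a + b)
print(total)  # 36
```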

@@ -0,0 +1,71 @@
Member:

We could make this one executable and give it the usual shebang, #!/usr/bin/env python, which will make it even more obvious to people (on *nix) which one they're meant to run.

Member Author:

done

@fperez (Member) commented Jan 20, 2012

Thanks for the example! It's great as-is, but I'm wondering if we shouldn't add a section to the docs pointing to the examples. If nothing else, it will make it more likely that people will find them... Thoughts?

@minrk (Member Author) commented Jan 20, 2012

Sure, I can make a prominent note and link somewhere in the parallel docs.

@minrk (Member Author) commented Jan 20, 2012

I really want to rewrite the parallel docs as a whole, but I don't know if I'm ever going to have the time to do that. The pages right now are much too long, and not particularly helpful or application-oriented.

@fperez (Member) commented Jan 20, 2012

Oh, I wasn't thinking of a wholesale rewrite; just of mentioning it, and perhaps of making a few pages that at least put the examples (even if just in a big code block) in the html docs, so people find them more easily when they read the docs.

But we can leave that for later, at some point we're going to have to do a bit of focused work on our docs. So do you want to add a note about this in the docs before merge?

@minrk (Member Author) commented Jan 20, 2012

Yes, I didn't mean to imply that rewriting the docs was a prerequisite to adding the link, just making a note of my general considerations on the docs. I will add a note to the top of the intro page in this PR shortly.

I've been thinking a fair amount about the parallel docs, and I think a from-scratch tutorial-style rewrite is necessary, leaving a detailed in-depth exploration of how things work separate. Right now, these two goals of our documentation are much too entangled, detracting from the success of communicating at either level. It's also true that the organization of the docs is principally unchanged from when it was originally written for the IPython.kernel code, which is becoming less and less appropriate as time goes on.

@minrk (Member Author) commented Jan 20, 2012

note/link added.

@fperez (Member) commented Jan 20, 2012

Great, let's merge this puppy then. BTW, I totally agree with your assessment of the status of the parallel docs, and I think similar issues also apply to the interactive part. I'd like to organize the whole thing a bit more like Justin has done with the StarCluster ones, which is very pleasant to navigate.

Oh well, time...

# set root to False everywhere
view['root_id'] = root_id
# set it to True on the root node
root['root'] = False
Member:

Mmh, did you mean 'True' here instead?

Member Author:

That would be wrong if it was actually used for anything, but it isn't. Removed.

@ogrisel (Contributor) commented Jan 20, 2012

Thanks very much for this example. I will do some experiments with it this weekend, but don't wait for my feedback to merge this. One remark, though: don't you think this example's primitives (the functions that set up the tree and perform the reduce and allreduce calls themselves) should be part of the IPython.parallel library itself, rather than an external example that users will have to copy and paste into their own code base?


def publish(self, value):
assert self.root
self.pub.send_multipart(self.serialize(value))
Contributor:

If I am not mistaken, this is doing a broadcast to all nodes at once rather than following the tree links down. Is this more efficient than recursively sending the results along the parent-to-children links?

Member Author:

The answer to that question is probably dependent on various facts about your cluster and data, but this is the simplest and thus most appropriate to the example.

@minrk (Member Author) commented Jan 20, 2012

The principal reason the examples we have are not part of IPython.parallel is that they were both tossed together in an afternoon, to show that perfectly arbitrary engine interconnects are quite simple to implement as appropriate to your particular problem. A bundled version would require more thought: it would have to take better care to get addresses right in a variety of situations, handle data serialization efficiently, etc.

It's also much harder to write a generic zeromq interconnect to be used as a library, because socket types really determine what is available to do. The 'right way' to do this with zeromq is to create the network topology that is appropriate for you, which is not something that is easy to define in a library.

For instance, the bintree here uses push/pull connections to establish the tree, and a single PUB on root with SUB on everything else, whereas the previous all:all interconnect uses ROUTER sockets for direct addressing and PUB/SUB everywhere so that all engines can initiate a broadcast. Another example is the wave2d one, where nearest-neighbor connections are made.
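As a rough sketch of the kind of tree addressing this involves (heap-style indexing is assumed here purely for illustration; the actual bintree example may compute its links differently):

```python
def btree_links(node_id, n):
    """Heap-style binary-tree addressing for n nodes numbered 0..n-1.

    Node k's parent is (k - 1) // 2 and its children are 2k + 1 and
    2k + 2 (when they exist); PUSH/PULL pipes along these links would
    carry partial results up toward the root.
    """
    parent = (node_id - 1) // 2 if node_id > 0 else None  # root has no parent
    children = [c for c in (2 * node_id + 1, 2 * node_id + 2) if c < n]
    return parent, children

print(btree_links(0, 7))  # (None, [1, 2])
print(btree_links(2, 7))  # (0, [5, 6])
```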

A great deal more thought would have to go into building base classes that are actually useful as something more than a starting point for copy/paste and adapt, which is exactly what these are meant to be. The PyCon sprints will be a good time to visit this.

@minrk (Member Author) commented Jan 20, 2012

comments should be addressed, ready for another test and merge if you want. I'm perfectly happy letting this one sit for a while, as it is just an example and not at risk of any conflict with progress.

I think some interconnect primitives will be useful to have in IPython.parallel, but we do need to figure out the right level of generality that would actually be more useful than copy/paste examples, which is far from clear to me at this point.

@ogrisel (Contributor) commented Jan 20, 2012

Thanks very much for the feedback. I agree that making it generic at the right level is a hard problem that needs practical experimentation with realistic use cases, and the sprint will be a great opportunity to collect some preliminary experience with this. Before merging, I think it would be good to give some motivation for the spanning-tree construct in the top-level docstring (i.e. making commutative reduce operations scalable to hundreds or thousands of nodes by limiting network contention: the binary-tree construct ensures that no single node will receive more than 2 payloads on its incoming sockets at once).
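That fan-in bound is easy to verify with a quick sketch (again assuming heap-style numbering, purely for illustration):

```python
def max_fan_in(n):
    """Count incoming payloads per node when every non-root node k
    sends its partial result to parent (k - 1) // 2.

    The maximum in-degree is 2 for any n > 2, however large the
    cluster grows, which is what keeps contention bounded.
    """
    incoming = [0] * n
    for k in range(1, n):
        incoming[(k - 1) // 2] += 1
    return max(incoming, default=0)

print(max_fan_in(1000))  # 2
```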

@fperez (Member) commented Jan 20, 2012

@ogrisel, thanks for joining in! I agree with you that a motivating discussion in the entry point docstring would be great to have. @minrk, you could even grab some text from the thread we had with Olivier on the mailing list where this came up. As you point out, this one is low risk for conflicts, so we can let it mature here until it's ready.

@fperez (Member) commented Apr 14, 2012

@minrk, what's your take on this? Do you want to finish it up, or did anything come up during your work with @ogrisel at the sprint to change your mind on it?

@ogrisel (Contributor) commented Apr 15, 2012

We did not use this during the sprint but directly went for the MPI AllReduce implementation. However I still think this is interesting since it makes it possible to implement AllReduce on a pure IPython cluster (without MPI) and makes it possible to dynamically change the size of the cluster which MPI does not allow (as far as I know).

@fperez (Member) commented Apr 15, 2012

Those are sensible points, and in light of some recent successes we've had with the notebook and parallel machinery on EC2, all the more reason to show users how they can do these kinds of things in non-MPI environments. Thanks a lot for the feedback, Olivier!

@ogrisel (Contributor) commented Apr 15, 2012

That's off-topic w.r.t. the current pull request, but I started experimenting with libcloud as a minimalistic alternative to StarCluster for deploying cloud-based IPython clusters not restricted to Amazon EC2 (I haven't started on the IPython config yet). It's here:

https://github.com/pydata/pyrallel/blob/master/pyrallel/cloud.py
https://github.com/pydata/pyrallel/blob/master/examples/cloudcluster.py

It's a work in progress (it does nothing useful so far), but I thought you might be interested in it anyway. I won't have much time to work on it in the coming weeks, so don't expect it to turn into a useful product for a couple of weeks / months.

@fperez (Member) commented Apr 15, 2012

Any particular reason not to use starcluster? So far I'm really happy with it, and it seems to offer a bunch of things that are actually super useful in practice... Just curious.

@ogrisel (Contributor) commented Apr 15, 2012

Only for the ability to run on non-EC2 infrastructure. It's an ontological statement: one of the main reasons I use and write open-source software is to avoid vendor lock-in. It sounds more sustainable to me.

@fperez (Member) commented Apr 16, 2012

Certainly! Though StarCluster itself is fully open source, and I think it would be very cool for it to grow support for other cloud backends. Right now it's fairly tied to EC2, but I don't think there's anything fundamental (other than the manpower to do it) preventing it from abstracting out the backend.

@ogrisel (Contributor) commented Apr 16, 2012

I might at some point, but StarCluster does so much more than just starting nodes with IPython engines, which is my only use case right now. Also, I don't want to have to maintain a cloud-provider-specific image; I'd rather have the dependency setup scripted and put under version control in git.

@fperez (Member) commented Apr 16, 2012

And we'd certainly love to have a lightweight alternative for the ipython case, obviously :) So big +1 to your libcloud work, even if it takes a bit of time to mature...

from discussion on ipython-user
@minrk (Member Author) commented Apr 27, 2012

I finally added the paragraphs of discussion from the list, so I think this can be merged.

@fperez (Member) commented Apr 27, 2012

Looks good, merging now. Thanks everyone for the work and patience.

fperez added a commit that referenced this pull request Apr 27, 2012
Add binary-tree engine interconnect example.

This implements a parallel [all]reduce as used in traditional MapReduce scenarios; this is a useful example showing how the IPython.parallel tools can be configured with a different interconnect topology in addition to the default view of N engines connected to 1 controller in a simple star topology.
@fperez fperez merged commit eada829 into ipython:master Apr 27, 2012
@ogrisel (Contributor) commented Apr 28, 2012

Thanks!

@minrk deleted the btree branch March 31, 2014
mattvonrocketstein pushed a commit to mattvonrocketstein/ipython that referenced this pull request Nov 3, 2014
Add binary-tree engine interconnect example.
