add binary-tree engine interconnect example #1295
Conversation
@@ -0,0 +1,71 @@
We could make this one executable and give it the usual shebang #!/usr/bin/env python, which will make it even more obvious (on *nix) to people which one they're meant to run.
done
Thanks for the example! It's great as-is, but I'm wondering if we shouldn't add a section to the docs pointing to the examples. If nothing else, it will make it more likely that people will find them... Thoughts?
Sure, I can make a prominent note and link somewhere in the parallel docs.
I really want to rewrite the parallel docs as a whole, but I don't know if I'm ever going to have the time to do that. The pages right now are much too long, and not particularly helpful or application-oriented.
Oh, I wasn't thinking of a wholesale rewrite; just of mentioning it, and perhaps of making a few pages that at least put the examples (even if just in a big code block) in the html docs, so people find them more easily when they read the docs. But we can leave that for later; at some point we're going to have to do a bit of focused work on our docs. So do you want to add a note about this in the docs before merge?
Yes, I didn't mean to imply that rewriting the docs was a prerequisite to adding the link; I was just making a note of my general considerations on the docs. I will add a note to the top of the intro page in this PR shortly. I've been thinking a fair amount about the parallel docs, and I think a from-scratch, tutorial-style rewrite is necessary, leaving a detailed in-depth exploration of how things work separate. Right now, these two goals of our documentation are much too entangled, detracting from the success of communicating at either level. It's also true that the organization of the docs is principally unchanged from when it was originally written for the IPython.kernel code, which is becoming less and less appropriate as time goes on.
Note/link added.
Great, let's merge this puppy then. BTW, I totally agree with your assessment of the status of the parallel docs, and I think similar issues also apply to the interactive part. I'd like to organize the whole thing a bit more like Justin has done with the StarCluster docs, which are very pleasant to navigate. Oh well, time...
# set root to False everywhere
view['root_id'] = root_id
# set it to True on the root node
root['root'] = False
Mmh, did you mean 'True' here instead?
That would be wrong if it was actually used for anything, but it isn't. Removed.
Thanks very much for this example. I will do some experiments with it this weekend, but don't wait for my feedback to merge this. One remark though: don't you think this example's primitives (the functions for setting up the tree and performing the reduce) could be made part of IPython.parallel itself?
def publish(self, value):
    assert self.root
    self.pub.send_multipart(self.serialize(value))
If I am not mistaken, this is doing a broadcast to all nodes at once rather than following the tree links down. Is this more efficient than recursively sending the results along the parent-to-children links?
The answer to that question probably depends on various facts about your cluster and data, but this is the simplest approach and thus the most appropriate for the example.
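To make the trade-off concrete, here is a small illustrative sketch (pure Python, no zeromq; the helper names `broadcast_root_sends` and `tree_rounds` are invented for this example, not part of the PR) comparing the message load of a direct root broadcast against store-and-forward along heap-indexed tree links:

```python
def broadcast_root_sends(n):
    """Direct broadcast: the root pushes one payload per other node."""
    return n - 1

def tree_rounds(n):
    """Tree forwarding: number of rounds until all n heap-indexed nodes
    have the payload, each node forwarding to at most two children."""
    rounds = 0
    frontier = [0]        # nodes that received the payload last round
    have = {0}            # nodes that have the payload so far
    while len(have) < n:
        nxt = []
        for i in frontier:
            # children of node i under implicit binary-heap indexing
            for c in (2 * i + 1, 2 * i + 2):
                if c < n:
                    nxt.append(c)
                    have.add(c)
        frontier = nxt
        rounds += 1
    return rounds
```

With 7 nodes, a direct broadcast has the root emit 6 payloads in one step, while tree forwarding completes in 2 rounds with no node sending more than two; which is actually faster in practice still depends on the cluster and payload sizes, as noted above.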
The principal reason these examples are not part of IPython.parallel is that they were both tossed together in an afternoon, to show that perfectly arbitrary engine interconnects are quite simple to implement as appropriate to your particular problem. A bundled version would require more thought, and would have to take better care to get addresses right in a variety of situations, handle data serialization efficiently, etc.

It's also much harder to write a generic zeromq interconnect to be used as a library, because the socket types really determine what you can do. The 'right way' to do this with zeromq is to create the network topology that is appropriate for your problem, which is not something that is easy to define in a library. For instance, the bintree here uses PUSH/PULL connections to establish the tree, and a single PUB on the root with SUB on everything else, whereas the previous all:all interconnect uses ROUTER sockets for direct addressing and PUB/SUB everywhere so that any engine can initiate a broadcast. Another example is the wave2d one, where nearest-neighbor connections are made.

A great deal more thought would have to go into building base classes that are actually useful as something more than a starting point for copy/paste-and-adapt, which is exactly what these are meant to be. The PyCon sprints will be a good time to visit this.
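As a rough sketch of how such a tree topology can be described (this is not the PR's code; it assumes implicit binary-heap indexing over engine ids, and the helper names `parent`, `children`, and `tree_links` are made up for illustration):

```python
def parent(i):
    """Parent of node i in an implicit binary heap; the root has none."""
    return (i - 1) // 2 if i > 0 else None

def children(i, n):
    """Children of node i among n nodes; each node has at most two."""
    return [c for c in (2 * i + 1, 2 * i + 2) if c < n]

def tree_links(n):
    """Map each node id to (parent, [children]) for an n-node binary tree.

    In a zeromq realization, each (parent, child) pair would get its own
    PUSH/PULL connection, with addresses distributed out-of-band.
    """
    return {i: (parent(i), children(i, n)) for i in range(n)}
```

With 7 engines, node 0 is the root with children 1 and 2, and the leaves 3-6 have no children; the actual example must additionally exchange socket addresses and wire up the connections.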
Comments should be addressed; ready for another test and merge if you want. I'm perfectly happy letting this one sit for a while, as it is just an example and not at risk of any conflict with progress. I think some interconnect primitives will be useful to have in IPython.parallel, but we do need to figure out the right level of generality that would actually be more useful than copy/paste examples, which is far from clear to me at this point.
Thanks very much for the feedback. I agree that making it generic at the right level is a hard problem that needs practical experimentation with realistic use cases, and the sprint will be a great opportunity to collect some preliminary experience with this. Before merging, I think it would be good to give some motivation for the spanning-tree construct in the top-level docstring (i.e. making commutative reduce operations scalable to hundreds or thousands of nodes by limiting network contention: the binary-tree construct ensures that no single node will receive more than 2 payloads on its incoming sockets at once).
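The contention argument can be illustrated with a small pure-Python simulation (the helper `tree_reduce` is hypothetical, not part of the example; it assumes binary-heap indexing and a commutative, associative op) that performs a reduce up the tree and counts payloads received per node:

```python
def tree_reduce(values, op):
    """Reduce `values` up a heap-indexed binary tree.

    Returns (result, max_incoming), where max_incoming is the largest
    number of payloads any single node had to receive.
    """
    n = len(values)
    acc = list(values)      # acc[i] holds node i's partial result
    incoming = [0] * n      # payloads received by each node
    # process nodes bottom-up, so each child's subtree is fully
    # accumulated before it sends its partial result to its parent
    for i in range(n - 1, 0, -1):
        p = (i - 1) // 2
        acc[p] = op(acc[p], acc[i])   # child i sends its partial to parent p
        incoming[p] += 1
    return acc[0], max(incoming)
```

Every interior node receives at most two payloads regardless of cluster size, while the tree depth (and hence the number of sequential steps) grows only as log2(n).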
@ogrisel, thanks for joining in! I agree with you that a motivating discussion in the entry-point docstring would be great to have. @minrk, you could even grab some text from the thread we had with Olivier on the mailing list where this came up. As you point out, this one is low-risk for conflicts, so we can let it mature here until it's ready.
We did not use this during the sprint but went directly for the MPI Allreduce implementation. However, I still think this is interesting, since it makes it possible to implement allreduce on a pure IPython cluster (without MPI) and to dynamically change the size of the cluster, which MPI does not allow (as far as I know).
Those are sensible points, and in light of some recent successes we've had with the notebook and parallel machinery on EC2, all the more reason to show users how they can do these kinds of things in non-MPI environments. Thanks a lot, Olivier, for the feedback!
That's off-topic w.r.t. the current pull request, but I started experimenting with libcloud as a minimalistic alternative to StarCluster for deploying cloud-based IPython clusters not restricted to Amazon EC2 (I haven't started on the IPython config yet). It's here: https://github.com/pydata/pyrallel/blob/master/pyrallel/cloud.py It's a work in progress (it does nothing useful so far), but I thought you might be interested in it anyway. I won't have much time to work on it in the coming weeks, so don't expect it to turn into a useful product for a couple of weeks / months.
Any particular reason not to use StarCluster? So far I'm really happy with it, and it seems to offer a bunch of things that are actually super useful in practice... Just curious.
Only for the ability to run on non-EC2 infrastructure. It's an ideological statement: one of the main reasons I use and write open source software is to avoid vendor lock-in. Sounds more sustainable to me.
Certainly! Though StarCluster itself is fully open source, and I think it would be very cool for it to grow support for other cloud backends. Right now it's fairly tied to EC2, but I don't think there's anything fundamental (other than the manpower to do it) preventing it from abstracting out the backend.
I might at some point, but StarCluster does so much more than just starting nodes with IPython engines, which is my only use case right now. Also, I don't want to have to maintain a cloud-provider-specific image; I'd rather have the dependency setup scripted and put under version control in git.
And we'd certainly love to have a lightweight alternative for the IPython case, obviously :) So a big +1 to your libcloud work, even if it takes a bit of time to mature...
from discussion on ipython-user
I finally added the paragraphs of discussion from the list, so I think this can be merged.
Looks good, merging now. Thanks everyone for the work and patience.
Add binary-tree engine interconnect example. This implements a parallel [all]reduce as used in traditional MapReduce scenarios; this is a useful example showing how the IPython.parallel tools can be configured with a different interconnect topology in addition to the default view of N engines connected to 1 controller in a simple star topology.
Thanks!
implements parallel [all]reduce
As discussed on-list, this is a useful example showing a different interconnect topology