
add binary-tree engine interconnect example #1295

Merged
merged 6 commits into ipython:master from minrk:btree on Apr 27, 2012

Conversation

Owner

minrk commented Jan 20, 2012

implements parallel [all]reduce

As discussed on-list, this is a useful example showing a different interconnect topology

@fperez fperez commented on the diff Jan 20, 2012

docs/examples/parallel/interengine/bintree_script.py
@@ -0,0 +1,71 @@
@fperez

fperez Jan 20, 2012

Owner

We could make this one executable and give it the usual shebang `#!/usr/bin/env python`, which will make it even more obvious to people (on *nix) which one they're meant to run.
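For readers unfamiliar with the convention, a minimal sketch of what that would look like at the top of the script (the body here is hypothetical, not the actual script):

```python
#!/usr/bin/env python
"""bintree_script.py -- the script users are meant to run directly."""

def main():
    # the real script would connect a Client and run the reduction here
    return "bintree example ran"

if __name__ == '__main__':
    main()
```

With the shebang and an executable bit (`chmod +x bintree_script.py`), the script can be launched as `./bintree_script.py` on *nix.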

@fperez fperez commented on the diff Jan 20, 2012

docs/examples/parallel/interengine/bintree_script.py
+
+from IPython.parallel import Client, Reference
+
+
+# connect client and create views
+rc = Client()
+rc.block=True
+ids = rc.ids
+
+root_id = ids[0]
+root = rc[root_id]
+
+view = rc[:]
+
+# run bintree.py script defining bintree functions, etc.
+execfile('bintree.py')
@fperez

fperez Jan 20, 2012

Owner

Unfortunately, `execfile` was removed in Python 3.x (I think that was a really bad idea). To avoid having to run 2to3 on the examples, I think we should refrain from using it; the easier the examples are to run straight up, the more value they have for users.
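For reference, a portable stand-in (a sketch, not code from this PR) that works on both Python 2 and 3 would be:

```python
def execfile_compat(path, namespace):
    """Rough replacement for the Python 2 execfile builtin, removed in
    Python 3: read, compile, and exec the file's contents in the given
    namespace."""
    with open(path) as f:
        code = compile(f.read(), path, 'exec')
    exec(code, namespace)
```

e.g. `execfile_compat('bintree.py', globals())` in place of `execfile('bintree.py')`.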

@minrk

minrk Jan 20, 2012

Owner

I also use regular print statements, which won't run until they get a 2to3 pass. Code that works in both 2.6 and 3.x is a good deal uglier than both 2.6-specific code and the 2to3 output of the same, so I would actually prefer deferring to 2to3.

I expect a large majority of our examples require a 2to3 pass right now, as well.

@fperez

fperez Jan 20, 2012

Owner

No prob.

Owner

fperez commented Jan 20, 2012

Thanks for the example! It's great as-is, but I'm wondering if we shouldn't add a section to the docs pointing to the examples. If nothing else, it will make it more likely that people will find them... Thoughts?

Owner

minrk commented Jan 20, 2012

Sure, I can make a prominent note and link somewhere in the parallel docs.

Owner

minrk commented Jan 20, 2012

I really want to rewrite the parallel docs as a whole, but I don't know if I'm ever going to have the time to do that. The pages right now are much too long, and not particularly helpful or application-oriented.

Owner

fperez commented Jan 20, 2012

Oh, I wasn't thinking of a wholesale rewrite; just of mentioning it, and perhaps of making a few pages that at least put the examples (even if just in a big code block) in the html docs, so people find them more easily when they read the docs.

But we can leave that for later, at some point we're going to have to do a bit of focused work on our docs. So do you want to add a note about this in the docs before merge?

Owner

minrk commented Jan 20, 2012

Yes, I didn't mean to imply that rewriting the docs was a prerequisite to adding the link, just making a note of my general considerations on the docs. I will add a note to the top of the intro page in this PR shortly.

I've been thinking a fair amount about the parallel docs, and I think a from-scratch tutorial-style rewrite is necessary, leaving a detailed in-depth exploration of how things work separate. Right now, these two goals of our documentation are much too entangled, detracting from the success of communicating at either level. It's also true that the organization of the docs is principally unchanged from when it was originally written for the IPython.kernel code, which is becoming less and less appropriate as time goes on.

Owner

minrk commented Jan 20, 2012

note/link added.

Owner

fperez commented Jan 20, 2012

Great, let's merge this puppy then. BTW, I totally agree with your assessment of the status of the parallel docs, and I think similar issues also apply to the interactive part. I'd like to organize the whole thing a bit more like Justin has done with the StarCluster ones, which is very pleasant to navigate.

Oh well, time...

@fperez fperez and 1 other commented on an outdated diff Jan 20, 2012

docs/examples/parallel/interengine/bintree_script.py
+root = rc[root_id]
+
+view = rc[:]
+
+# run bintree.py script defining bintree functions, etc.
+execfile('bintree.py')
+
+# generate binary tree of parents
+btree = bintree(ids)
+
+view.run('bintree.py')
+view.scatter('id', ids, flatten=True)
+# set root to False everywhere
+view['root_id'] = root_id
+# set it to True on the root node
+root['root'] = False
@fperez

fperez Jan 20, 2012

Owner

Mmh, did you mean 'True' here instead?

@minrk

minrk Jan 20, 2012

Owner

That would be wrong if it was actually used for anything, but it isn't. Removed.

@fperez fperez commented on the diff Jan 20, 2012

docs/examples/parallel/interengine/bintree_script.py
+"""
+Script for setting up and using [all]reduce with a binary-tree engine interconnect.
+
+usage: `python bintree_script.py`
+
+"""
+
+from IPython.parallel import Client, Reference
+
+
+# connect client and create views
+rc = Client()
+rc.block=True
+ids = rc.ids
+
+root_id = ids[0]
@fperez

fperez Jan 20, 2012

Owner

Do we still need these two variables, root_id and root? They don't seem to be used anywhere below... Might as well remove them completely if not needed, otherwise users will think they are relevant to the example.

@minrk

minrk Jan 20, 2012

Owner

I think it might still be useful, since the tree does indeed have a root node, and getting the final result from a reduce (as opposed to all reduce) requires direct communication with the root node. I did use them in the previous incarnations of the script, but those statements were cleaned up since then, I guess.

What I will do instead of removing them is add to the reduction an explicit fetch from the root node.



@fperez

fperez Jan 20, 2012

Owner

OK, sounds good. Let me know when you think it's good to go, and I'll give it a final round of testing. I already ran it here and everything seems to work correctly; great job.

@ogrisel ogrisel and 1 other commented on an outdated diff Jan 20, 2012

docs/examples/parallel/interengine/bintree.py
+import cPickle as pickle
+import re
+import socket
+import uuid
+
+import zmq
+
+from IPython.parallel.util import disambiguate_url
+
+
+#----------------------------------------------------------------------------
+# bintree-related construction/printing helpers
+#----------------------------------------------------------------------------
+
+def bintree(ids, parent=None):
+ """construct {child:parent} dict representation of a binary tree"""
@ogrisel

ogrisel Jan 20, 2012

Contributor

I think this kind of function would greatly benefit from an inline doctest in the docstring to understand what it actually does in practice on a small example. Same remark applies for reverse_bintree.
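For illustration, here is what such a doctest might demonstrate. The heap-style layout below is an assumption — a sketch of the {child: parent} representation, not the function actually committed in this PR:

```python
def bintree_sketch(ids, parent=None):
    """Construct a {child: parent} dict for a binary tree over ids.

    >>> bintree_sketch([0, 1, 2, 3])
    {0: None, 1: 0, 2: 0, 3: 1}
    """
    tree = {}
    for i, eid in enumerate(ids):
        # heap-style layout: node i's parent sits at index (i - 1) // 2
        tree[eid] = parent if i == 0 else ids[(i - 1) // 2]
    return tree
```

The doctest makes the {child: parent} shape immediately visible: engine 0 is the root, engines 1 and 2 hang off it, and engine 3 hangs off engine 1.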

@minrk

minrk Jan 20, 2012

Owner

doctests added

@ogrisel

ogrisel Jan 20, 2012

Contributor

thanks.

Contributor

ogrisel commented Jan 20, 2012

Thanks very much for this example. I will do some experiments with it this weekend, but don't wait for my feedback to merge this. One remark though: don't you think this example's primitives (the functions to set up the tree and perform the reduce and allreduce calls themselves) should be part of the IPython.parallel library itself, rather than an external example that users have to copy and paste into their own code base?

@ogrisel ogrisel commented on the diff Jan 20, 2012

docs/examples/parallel/interengine/bintree.py
+ def serialize(self, obj):
+ """serialize objects.
+
+ Must return list of sendable buffers.
+
+ Can be extended for more efficient/noncopying serialization of numpy arrays, etc.
+ """
+ return [pickle.dumps(obj)]
+
+ def unserialize(self, msg):
+ """inverse of serialize"""
+ return pickle.loads(msg[0])
+
+ def publish(self, value):
+ assert self.root
+ self.pub.send_multipart(self.serialize(value))
@ogrisel

ogrisel Jan 20, 2012

Contributor

If I am not mistaken, this is doing a broadcast to all nodes at once rather than following the tree links down. Is this more efficient than recursively sending the results along the parent-to-children links?

@minrk

minrk Jan 20, 2012

Owner

The answer to that question is probably dependent on various facts about your cluster and data, but this is the simplest and thus most appropriate to the example.

Owner

minrk commented Jan 20, 2012

The principal reason these examples are not part of IPython.parallel is that they were both tossed together in an afternoon, to show that perfectly arbitrary engine interconnects are quite simple to implement as appropriate to your particular problem. A bundled version would require more thought: it would have to take better care to get addresses right in a variety of situations, handle data serialization efficiently, etc.

It's also much harder to write a generic zeromq interconnect to be used as a library, because socket types really determine what is available to do. The 'right way' to do this with zeromq is to create the network topology that is appropriate for you, which is not something that is easy to define in a library.

For instance, the bintree here uses push/pull connections to establish the tree, and a single PUB on root with SUB on everything else, whereas the previous all:all interconnect uses ROUTER sockets for direct addressing and PUB/SUB everywhere so that all engines can initiate a broadcast. Another example is the wave2d one, where nearest-neighbor connections are made.

A great deal more thought would have to go into building base classes that are actually useful as something more than a starting point for copy/paste and adapt, which is exactly what these are meant to be. The PyCon sprints will be a good time to visit this.
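To make the idea concrete without any ZeroMQ plumbing, here is a hedged in-process sketch of how a commutative reduction propagates up a {child: parent} tree; the socket-based version in the PR performs the same walk over the network rather than via function calls:

```python
from collections import defaultdict

def tree_reduce(f, values, tree):
    """Reduce `values` (a {node: value} dict) up a {child: parent} tree,
    returning the accumulated result held by the root node."""
    children = defaultdict(list)
    root = None
    for child, parent in tree.items():
        if parent is None:
            root = child  # the single parentless node is the root
        else:
            children[parent].append(child)

    def reduce_up(node):
        # each node combines its own value with its children's subtotals,
        # so no node ever merges more than its fan-in (2 for a binary tree)
        acc = values[node]
        for c in children[node]:
            acc = f(acc, reduce_up(c))
        return acc

    return reduce_up(root)
```

For an allreduce, the root would then publish the final value back down (the PUB/SUB link in the example); for a plain reduce, only the root holds the result.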

Owner

minrk commented Jan 20, 2012

comments should be addressed, ready for another test and merge if you want. I'm perfectly happy letting this one sit for a while, as it is just an example and not at risk of any conflict with progress.

I think some interconnect primitives will be useful to have in IPython.parallel, but we do need to figure out the right level of generality that would actually be more useful than copy/paste examples, which is far from clear to me at this point.

Contributor

ogrisel commented Jan 20, 2012

Thanks very much for the feedback. I agree that making it generic at the right level is a hard problem that needs practical experimentation with realistic use cases, and the sprint will be a great opportunity to collect some preliminary experience with this. Before merging, I think it would be good to give some motivation for the spanning-tree construct in the top-level docstring (i.e. making commutative reduce operations scale to hundreds or thousands of nodes by limiting network contention: the binary tree ensures that no single node receives more than 2 payloads on its incoming sockets at once).
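As a back-of-the-envelope illustration of that scalability argument (a sketch, not code from the PR): the number of sequential hops in the tree grows only logarithmically with the number of engines.

```python
import math

def tree_depth(n):
    """Number of levels in a complete binary tree with n nodes."""
    return int(math.log2(n)) + 1 if n else 0

# With 1000 engines, a star topology funnels 999 results into a single
# receiver, while the binary tree needs only tree_depth(1000) levels,
# with at most 2 payloads arriving at any node per step.
```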

Owner

fperez commented Jan 20, 2012

@ogrisel, thanks for joining in! I agree with you that a motivating discussion in the entry point docstring would be great to have. @minrk, you could even grab some text from the thread we had with Olivier on the mailing list where this came up. As you point out, this one is low risk for conflicts, so we can let it mature here until it's ready.

Owner

fperez commented Apr 14, 2012

@minrk, what's your take on this? Do you want to finish it up, or did anything come up during your work with @ogrisel at the sprint to change your mind on it?

Contributor

ogrisel commented Apr 15, 2012

We did not use this during the sprint but went directly for the MPI AllReduce implementation. However, I still think this is interesting, since it makes it possible to implement AllReduce on a pure IPython cluster (without MPI) and to dynamically change the size of the cluster, which MPI does not allow (as far as I know).

Owner

fperez commented Apr 15, 2012

Those are sensible points, and in light of some recent successes we've had with the notebook and parallel machinery on EC2, all the more reason to show users how they can do these kinds of things in non-MPI environments. Thanks a lot for the feedback, Olivier!

Contributor

ogrisel commented Apr 15, 2012

That's off-topic w.r.t. the current pull request, but I started experimenting with libcloud as a minimalistic alternative to StarCluster for deploying cloud-based IPython clusters not restricted to Amazon EC2 (I haven't started on the IPython config yet). It's here:

https://github.com/pydata/pyrallel/blob/master/pyrallel/cloud.py
https://github.com/pydata/pyrallel/blob/master/examples/cloudcluster.py

It's a work in progress (it does nothing useful so far), but I thought you might be interested in it anyway. I won't have much time to work on it in the coming weeks, so don't expect it to turn into a useful product for another couple of weeks/months.

Owner

fperez commented Apr 15, 2012

Any particular reason not to use starcluster? So far I'm really happy with it, and it seems to offer a bunch of things that are actually super useful in practice... Just curious.

Contributor

ogrisel commented Apr 15, 2012

Only for the ability to run on non-EC2 infrastructure. It's an ideological statement: one of the main reasons I use and write open source software is to avoid vendor lock-in. That sounds more sustainable to me.

Owner

fperez commented Apr 16, 2012

Certainly! Though StarCluster itself is fully open source, and I think it would be very cool for it to grow support for other cloud backends. Right now it's fairly tied to EC2, but I don't think there's anything fundamental (other than the manpower to do it) preventing it from abstracting out the backend.

Contributor

ogrisel commented Apr 16, 2012

I might at some point, but StarCluster does much more than just starting nodes with IPython engines, which is my only use case right now. Also, I don't want to have to maintain a cloud-provider-specific image; I'd rather have the dependency setup scripted and kept under version control in git.

Owner

fperez commented Apr 16, 2012

And we'd certainly love to have a lightweight alternative for the ipython case, obviously :) So big +1 to your libcloud work, even if it takes a bit of time to mature...

@minrk minrk add bintree paragraphs by @ogrisel
from discussion on ipython-user
8cd7132
Owner

minrk commented Apr 27, 2012

I finally added the paragraphs of discussion from the list, so I think this can be merged.

Owner

fperez commented Apr 27, 2012

Looks good, merging now. Thanks everyone for the work and patience.

@fperez fperez added a commit that referenced this pull request Apr 27, 2012

@fperez fperez Merge pull request #1295 from minrk/btree
Add binary-tree engine interconnect example.

This implements a parallel [all]reduce as used in traditional MapReduce scenarios; this is a useful example showing how the IPython.parallel tools can be configured with a different interconnect topology in addition to the default view of N engines connected to 1 controller in a simple star topology.
eada829

@fperez fperez merged commit eada829 into ipython:master Apr 27, 2012

Contributor

ogrisel commented Apr 28, 2012

Thanks!

@minrk minrk deleted the minrk:btree branch Mar 31, 2014

@mattvonrocketstein mattvonrocketstein pushed a commit to mattvonrocketstein/ipython that referenced this pull request Nov 3, 2014

@fperez fperez Merge pull request #1295 from minrk/btree
Add binary-tree engine interconnect example.

8bc4bb5