Add cluster-id for multiple cluster instances per profile #795

Closed
wants to merge 5 commits into
from

Projects

None yet

2 participants

@minrk
Member
minrk commented Sep 15, 2011

Default behavior is unchanged, but specifying a '--cluster-id' arg will allow starting additional clusters within a single profile.

I also cleaned up the code in parallel.apps.launchers a fair amount, by adding a few Mixin classes to consolidate some of the extremely repetitive code.

This will need some testing in various environments, to catch possible typos from the changes, but it's a major improvement overall.

The consolidation also fixes a few inconsistencies, such as the MPI launchers using program/program_args, while the local launchers used controller_cmd/controller_args. Now the names are more sensible and consistent across launcher types.

When checking the generated config files, I also fixed a small aesthetic issue when printing the class list from which a Configurable will inherit configuration. It now excludes any base class that doesn't have anything to inherit.

Should close #794

@fperez fperez and 1 other commented on an outdated diff Sep 16, 2011
IPython/parallel/apps/baseapp.py
@@ -116,6 +117,18 @@ class BaseParallelApplication(BaseIPythonApplication):
log_url = Unicode('', config=True,
help="The ZMQ URL of the iplogger to aggregate logging.")
+ cluster_id = Unicode('', config=True,
+ help="""String id to add to runtime files, to prevent name collisions when
+ using multiple clusters with a single profile.
@fperez
fperez Sep 16, 2011 IPython member

Perhaps add a hint here to users on what they can/should use as id's? I'm sure that strings with spaces in them are likely to cause grief somewhere down the road, but any alphanumeric id is probably OK. So just a quick hint would help new users know what kind of value is sensible here.

@minrk
minrk Sep 16, 2011 IPython member

Pretty much any text should be fine. I tried it with --cluster-id "hello world", and there's no problem. Obviously, it has all the usual inconveniences of filenames with spaces, but it does work, at least in the quick few cases I've tried.

I'll add a bit more detail to the message, though.

@fperez fperez commented on the diff Sep 16, 2011
IPython/parallel/apps/baseapp.py
@@ -116,6 +117,18 @@ class BaseParallelApplication(BaseIPythonApplication):
log_url = Unicode('', config=True,
help="The ZMQ URL of the iplogger to aggregate logging.")
+ cluster_id = Unicode('', config=True,
+ help="""String id to add to runtime files, to prevent name collisions when
+ using multiple clusters with a single profile.
+
+ When set, files will be named like: 'ipcontroller-<cluster_id>-engine.json'
+ """
+ )
+ def _cluster_id_changed(self, name, old, new):
+ self.name = self.__class__.name
+ if new:
+ self.name += '-%s'%new
@fperez
fperez Sep 16, 2011 IPython member

Type this as

'-%s' % new

(note spaces) for readability. I'm not super picky with things like 2 + 3 * 4, which I actually find hinder readability, but string interpolation is one case where I find that sticking to the 'whitespace around operators' guideline does significantly help, since otherwise the format specification blends visually a lot with the rhs.

@fperez fperez commented on an outdated diff Sep 16, 2011
IPython/parallel/apps/ipengineapp.py
@@ -158,7 +158,15 @@ class IPEngineApp(BaseParallelApplication):
controller and engine are started at the same time and it
may take a moment for the controller to write the connector files.""")
- url_file_name = Unicode(u'ipcontroller-engine.json')
+ url_file_name = Unicode(u'ipcontroller-engine.json', config=True)
+
+ def _cluster_id_changed(self, name, old, new):
+ if new:
+ base = 'ipcontroller-%s'%new
@fperez
fperez Sep 16, 2011 IPython member

As above for string interpolation, same below... Won't repeat further to keep noise down.

@fperez fperez commented on the diff Sep 16, 2011
IPython/parallel/apps/launcher.py
@@ -213,6 +215,33 @@ class BaseLauncher(LoggingConfigurable):
"""
raise NotImplementedError('signal must be implemented in a subclass')
+class ClusterAppMixin(HasTraits):
@fperez
fperez Sep 16, 2011 IPython member

Good idea to have these mixins, I'm glad to see less of that code repetition.

@fperez fperez and 1 other commented on an outdated diff Sep 16, 2011
IPython/parallel/apps/launcher.py
-class PBSEngineSetLauncher(PBSLauncher):
+class PBSEngineSetLauncher(BatchClusterAppMixin, PBSLauncher):
@fperez
fperez Sep 16, 2011 IPython member

Any particular reason the mixin class is first on this one? Typically (and in the ones above) mixins go last, so I'm curious...

@minrk
minrk Sep 16, 2011 IPython member

it was to get the changed default value for the context to inherit properly. I don't know if it's a traitlets bug or what, but see this example:

class Base(HasTraits):
    a = Int(5)

class Mixin(HasTraits):
    def _a_default(self):
        return 10

class A(Base, Mixin):
    pass

class B(Mixin, Base):
    pass

a = A()
print a.a # 5
b = B()
print b.a # 10

So declaring _a_default() in Mixin does not result in the same behavior as declaring it in A or Base.

Now, an alternative would be to just use the b._trait_values dict, instead of a special context dict, and give the template access to all of the launcher's traits. That would mean I don't have to do these initialization shenanigans.

@fperez fperez commented on an outdated diff Sep 16, 2011
IPython/parallel/apps/launcher.py
@@ -979,7 +998,7 @@ class SGELauncher(PBSLauncher):
queue_regexp = Unicode('#\$\W+-q\W+\$?\w+')
queue_template = Unicode('#$ -q {queue}')
-class SGEControllerLauncher(SGELauncher):
+class SGEControllerLauncher(BatchClusterAppMixin, SGELauncher):
@fperez
fperez Sep 16, 2011 IPython member

Same question about mixin order, and same below; won't repeat everywhere to reduce noise.

@fperez
Member
fperez commented Sep 16, 2011

Code-wise, this looks like a no-brainer to me, modulo minor comments above. Testing-wise though, what testing scenarios do you have in mind before feeling comfortable merging it? Can I help with some of that, or should we ping the list to try to recruit a few testers?

minrk added some commits Sep 14, 2011
@minrk minrk add cluster_id to parallel apps
This allows multiple controller instances per profile.  Default behavior remains unchanged.
4079cf8
@minrk minrk add cluster_id support to ipcluster/launchers
This includes a moderate reorganization of the common launcher args.

start() no longer takes profile_dir, which is now a trait,
as is cluster_id.   This is implemented via small Mixin classes,
consolidating many duplicated controller_cmd/args / engine_cmd/args lines.
8f563c9
@minrk minrk cleanup inheritance line in auto-config files
rather than explicitly excluding particular base classes,
exclude any base classes that have no config traits to inherit.
7a02b90
@minrk
Member
minrk commented Sep 16, 2011

I've made the changes mentioned. Since there aren't any tests for launchers, and they are not used very often, errors can go unnoticed. So I just want to try a few simple runs with at least SGE/SSH/MPI (the systems I have available to me), to make sure I haven't messed things up.

I would like real tests for the launchers, but since launchers can't run without their various environments (PBS, SGE, etc.) available, unit tests don't have much value.

@minrk minrk parallel.apps cleanup per review
STY: added spaces in string formatting calls
STY: expanded helpstring for cluster-id

ENH: allow BatchClusterAppMixin to be the second inherited class
d24b9a6
@fperez
Member
fperez commented Sep 16, 2011

OK, as far as I'm concerned, this is good to go in, once you feel comfortable with whatever testing you can do on your own. Because of this very issue, this is the kind of code that should go in as early as possible, so that we get the kind of 'in the wild' testing that is hard to automate. The longer it is in the master branch, the higher the chances of someone running it in an environment we don't have access to and pointing out potential problems.

So go from my side, merge when you're comfortable. Thanks!

@minrk minrk allow launcher specification by prefix alone
e.g. ipcluster start --engines=SGE
9937d3c
@minrk minrk added a commit that referenced this pull request Sep 20, 2011
@minrk minrk Merge PR #795 (cluster-id and launcher cleanup)
closes gh-795
3994edb
@minrk minrk added a commit that closed this pull request Sep 20, 2011
@minrk minrk Merge PR #795 (cluster-id and launcher cleanup)
closes gh-795
3994edb
@minrk minrk closed this in 3994edb Sep 20, 2011
@mattvonrocketstein mattvonrocketstein pushed a commit to mattvonrocketstein/ipython that referenced this pull request Nov 3, 2014
@minrk minrk Merge PR #795 (cluster-id and launcher cleanup)
closes gh-795
e8422d6
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment