Add cluster-id for multiple cluster instances per profile #795

Closed
wants to merge 5 commits into from

2 participants

@minrk
IPython member

Default behavior is unchanged, but specifying a '--cluster-id' arg will allow starting additional clusters within a single profile.

I also cleaned up the code in parallel.apps.launchers a fair amount, by adding a few Mixin classes to consolidate some of the extremely repetitive code.

This will need some testing in various environments, to catch possible typos from the changes, but it's a major improvement overall.

The consolidation also fixes a few inconsistencies, such as the MPI launchers using program/program_args, while the local launchers used controller_cmd/controller_args. Now the names are more sensible and consistent across launcher types.

When checking the generated config files, I also fixed a small aesthetic issue when printing the class list from which a Configurable will inherit configuration. It now excludes any base class that doesn't have anything to inherit.

Should close #794

@fperez fperez and 1 other commented on an outdated diff Sep 16, 2011
IPython/parallel/apps/baseapp.py
@@ -116,6 +117,18 @@ class BaseParallelApplication(BaseIPythonApplication):
log_url = Unicode('', config=True,
help="The ZMQ URL of the iplogger to aggregate logging.")
+ cluster_id = Unicode('', config=True,
+ help="""String id to add to runtime files, to prevent name collisions when
+ using multiple clusters with a single profile.
@fperez
IPython member
fperez added a note Sep 16, 2011

Perhaps add a hint here to users on what they can/should use as id's? I'm sure that strings with spaces in them are likely to cause grief somewhere down the road, but any alphanumeric id is probably OK. So just a quick hint would help new users know what kind of value is sensible here.

@minrk
IPython member
minrk added a note Sep 16, 2011

Pretty much any text should be fine. I tried it with --cluster-id "hello world", and there's no problem. Obviously, it has all the usual inconveniences of filenames with spaces, but it does work, at least in the quick few cases I've tried.

I'll add a bit more detail to the message, though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@fperez fperez commented on the diff Sep 16, 2011
IPython/parallel/apps/baseapp.py
@@ -116,6 +117,18 @@ class BaseParallelApplication(BaseIPythonApplication):
log_url = Unicode('', config=True,
help="The ZMQ URL of the iplogger to aggregate logging.")
+ cluster_id = Unicode('', config=True,
+ help="""String id to add to runtime files, to prevent name collisions when
+ using multiple clusters with a single profile.
+
+ When set, files will be named like: 'ipcontroller-<cluster_id>-engine.json'
+ """
+ )
+ def _cluster_id_changed(self, name, old, new):
+ self.name = self.__class__.name
+ if new:
+ self.name += '-%s'%new
@fperez
IPython member
fperez added a note Sep 16, 2011

Type this as

'-%s' % new

(note spaces) for readability. I'm not super picky with things like 2 + 3 * 4, which I actually find hinder readability, but string interpolation is one case where I find that sticking to the 'whitespace around operators' guideline does significantly help, since otherwise the format specification blends visually a lot with the rhs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@fperez fperez commented on an outdated diff Sep 16, 2011
IPython/parallel/apps/ipengineapp.py
@@ -158,7 +158,15 @@ class IPEngineApp(BaseParallelApplication):
controller and engine are started at the same time and it
may take a moment for the controller to write the connector files.""")
- url_file_name = Unicode(u'ipcontroller-engine.json')
+ url_file_name = Unicode(u'ipcontroller-engine.json', config=True)
+
+ def _cluster_id_changed(self, name, old, new):
+ if new:
+ base = 'ipcontroller-%s'%new
@fperez
IPython member
fperez added a note Sep 16, 2011

As above for string interpolation, same below... Won't repeat further to keep noise down.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@fperez fperez commented on the diff Sep 16, 2011
IPython/parallel/apps/launcher.py
@@ -213,6 +215,33 @@ class BaseLauncher(LoggingConfigurable):
"""
raise NotImplementedError('signal must be implemented in a subclass')
+class ClusterAppMixin(HasTraits):
@fperez
IPython member
fperez added a note Sep 16, 2011

Good idea to have these mixins, I'm glad to see less of that code repetition.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@fperez fperez and 1 other commented on an outdated diff Sep 16, 2011
IPython/parallel/apps/launcher.py
-class PBSEngineSetLauncher(PBSLauncher):
+class PBSEngineSetLauncher(BatchClusterAppMixin, PBSLauncher):
@fperez
IPython member
fperez added a note Sep 16, 2011

Any particular reason the mixin class is first on this one? Typically (and in the ones above) mixins go last, so I'm curious...

@minrk
IPython member
minrk added a note Sep 16, 2011

it was to get the changed default value for the context to inherit properly. I don't know if it's a traitlets bug or what, but see this example:

class Base(HasTraits):
    a = Int(5)

class Mixin(HasTraits):
    def _a_default(self):
        return 10

class A(Base, Mixin):
    pass

class B(Mixin, Base):
    pass

a = A()
print a.a # 5
b = B()
print b.a # 10

So declaring _a_default() in Mixin does not result in the same behavior as declaring it in A or Base.

Now, an alternative would be to just use the b._trait_values dict, instead of a special context dict, and give the template access to all of the launcher's traits. That would mean I don't have to do these initialization shenanigans.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@fperez fperez commented on an outdated diff Sep 16, 2011
IPython/parallel/apps/launcher.py
@@ -979,7 +998,7 @@ class SGELauncher(PBSLauncher):
queue_regexp = Unicode('#\$\W+-q\W+\$?\w+')
queue_template = Unicode('#$ -q {queue}')
-class SGEControllerLauncher(SGELauncher):
+class SGEControllerLauncher(BatchClusterAppMixin, SGELauncher):
@fperez
IPython member
fperez added a note Sep 16, 2011

Same question about mixin order, and same below; won't repeat everywhere to reduce noise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@fperez
IPython member

Code-wise, this looks like a no-brainer to me, modulo minor comments above. Testing-wise though, what testing scenarios do you have in mind before feeling comfortable merging it? Can I help with some of that, or should we ping the list to try to recruit a few testers?

minrk added some commits Sep 14, 2011
@minrk minrk add cluster_id to parallel apps
This allows multiple controller instances per profile.  Default behavior remains unchanged.
4079cf8
@minrk minrk add cluster_id support to ipcluster/launchers
This includes a moderate reorganization of the common launcher args.

start() no longer takes profile_dir, which is now a trait,
as is cluster_id.   This is implemented via small Mixin classes,
consolidating many duplicated controller_cmd/args / engine_cmd/args lines.
8f563c9
@minrk minrk cleanup inheritance line in auto-config files
rather than explicitly excluding particular base classes,
exclude any base classes that have no config traits to inherit.
7a02b90
@minrk
IPython member

I've made the changes mentioned. Since there aren't any tests for launchers, and they are not used very often, errors can go unnoticed. So I just want to try a few simple runs with at least SGE/SSH/MPI (the systems I have available to me), to make sure I haven't messed things up.

I would like real tests for the launchers, but since launchers can't run without their various environments (PBS, SGE, etc.) available, unit tests don't have much value.

@minrk minrk parallel.apps cleanup per review
STY: added spaces in string formatting calls
STY: expanded helpstring for cluster-id

ENH: allow BatchClusterAppMixin to be the second inherited class
d24b9a6
@fperez
IPython member

OK, as far as I'm concerned, this is good to go in, once you feel comfortable with whatever testing you can do on your own. Because of this very issue, this is the kind of code that should go in as early as possible, so that we get the kind of 'in the wild' testing that is hard to automate. The longer it is in the master branch, the higher the chances of someone running it in an environment we don't have access to and pointing out potential problems.

So go from my side, merge when you're comfortable. Thanks!

@minrk minrk allow launcher specification by prefix alone
e.g. ipcluster start --engines=SGE
9937d3c
@minrk minrk added a commit that referenced this pull request Sep 20, 2011
@minrk minrk Merge PR #795 (cluster-id and launcher cleanup)
closes gh-795
3994edb
@minrk minrk added a commit that closed this pull request Sep 20, 2011
@minrk minrk Merge PR #795 (cluster-id and launcher cleanup)
closes gh-795
3994edb
@minrk minrk closed this in 3994edb Sep 20, 2011
@mattvonrocketstein mattvonrocketstein pushed a commit to mattvonrocketstein/ipython that referenced this pull request Nov 3, 2014
@minrk minrk Merge PR #795 (cluster-id and launcher cleanup)
closes gh-795
e8422d6
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment