Add HTCondor bindings to IPython.parallel #3465
Previously I used condor_submit -verbose so that I could pull the job id from the first line. However, when submitting many jobs it's silly to have to search through so many lines of output. Now we have a regexp for Condor that matches on the non-verbose output. Here the job_id is at the end of the output, so grouping in our job_id_regexp becomes very useful. To account for this, I have added a job_id_regexp_group property to the BatchLauncher and its subclasses. The default of 0 means the whole regexp match is used; an integer can now be passed here to select a subgroup of the expression instead (see CondorLauncher and the mechanism will be clear).
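A minimal sketch of how a group index like job_id_regexp_group can pick the job id out of submit output, using Python's re module. The function name parse_job_id, the sample output strings and the regexp r"(\d+)\.$" are illustrative assumptions. They are not the code in CondorLauncher.

```python
import re

# Illustrative outputs only (assumptions); real scheduler output may differ.
CONDOR_OUTPUT = "Submitting job(s).\n1 job(s) submitted to cluster 1234.\n"
PBS_OUTPUT = "1234.headnode\n"

# Assumed Condor regexp: capture the digits just before the trailing '.'.
CONDOR_JOB_ID_REGEXP = r"(\d+)\.$"


def parse_job_id(output, job_id_regexp, job_id_regexp_group=0):
    """Return the part of the match selected by job_id_regexp_group, or None.

    Group 0 is the whole match, which keeps existing launchers unchanged;
    a positive integer selects a parenthesised subgroup instead.
    """
    match = re.search(job_id_regexp, output)
    if match is None:
        return None
    return match.group(job_id_regexp_group)
```

With group 1, parse_job_id(CONDOR_OUTPUT, CONDOR_JOB_ID_REGEXP, 1) returns '1234'. With the default group 0 it would return '1234.', including the trailing period. A PBS-style pattern such as r"\d+" keeps working unchanged with group 0.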
def _insert_queue_in_script(self):
    """Inserts a queue if required into the batch script.
    """
    print self.queue_regexp.search(self.batch_template)
print statement
Is there a reason to have special …
They probably used to do things, but don't anymore. Go ahead and remove all the empty ones.
I think there is reason for … Btw, do you know why the approach in lines 80-84 is taken instead of the approach I'm using for Condor? Launching the …
That's not what those lines do. Those lines load the ipcontroller/engine startup scripts with the same Python executable as the launcher, specifically so that the PATH to ipcontroller/engine cannot cause confusion by being somewhere weird. It's quite reasonable for the controller/engine scripts to be in a directory other than the Python executable (in fact, I personally have zero systems where ipengine is in the same dir as python). For that reason, I try to use just the base …
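To illustrate reusing the launcher's own interpreter, here is a minimal sketch that builds a `python -c ...` command like the one quoted in the PR description. ENGINE_STARTUP, engine_argv_same_python and as_shell_line are hypothetical names. The import path inside ENGINE_STARTUP is an assumption, not the launcher's actual command string.

```python
import shlex
import sys

# Assumed startup snippet (hypothetical); the real launcher's string may differ.
ENGINE_STARTUP = (
    "from IPython.parallel.apps.ipengineapp import launch_new_instance; "
    "launch_new_instance()"
)


def engine_argv_same_python(profile_dir):
    """Build an engine command that reuses the launcher's own interpreter.

    argv[0] is sys.executable, so the engine runs under exactly the Python
    that runs the launcher, wherever the ipengine script happens to live.
    Arguments after the -c string become the child's sys.argv[1:].
    """
    return [sys.executable, "-c", ENGINE_STARTUP, "--profile-dir=" + profile_dir]


def as_shell_line(argv):
    """Quote each argument so the command can be written into a batch script."""
    return " ".join(shlex.quote(arg) for arg in argv)


if __name__ == "__main__":
    print(as_shell_line(engine_argv_same_python("/home/user/.ipython/profile_default")))
```

The design choice is that the command never consults PATH, because argv[0] is taken from the running interpreter.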
I see, my assumption was clearly wrong about the scripts being reliably located. Removed …
def _insert_queue_in_script(self):
    """AFAIK, Condor doesn't have a concept of multiple queues that can be
    specified in the script..
extra .
Looks good to me apart from the one tiny typo. Thanks for going over it with me!
There you go. No problem, thanks for looking it over so swiftly!
Oh, one more thing: can you add Condor to the tests in IPython.parallel.tests.test_launcher? It should be just two trivial entries below …
Just spotted two minor things:
I'll go ahead and implement these two changes now.
OK, that's all done now.
Great, thanks.
This PR adds a new set of subclasses in IPython/parallel/apps/launcher.py that add support for the HTCondor batch management system. I've tried to make a template that has sensible defaults that should work out of the box for most users.
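The sketch below gives a rough picture of what such a template involves. It fills a minimal HTCondor submit description with Python's string.Template. The commands universe, executable, arguments, getenv and queue are standard submit-description commands. The chosen values, ENGINE_SUBMIT_TEMPLATE and render_engine_submit are illustrative assumptions, not the defaults shipped in this PR.

```python
from string import Template

# Illustrative engine submit description (values are assumptions).
ENGINE_SUBMIT_TEMPLATE = Template(
    "universe   = vanilla\n"
    "executable = $executable\n"
    "arguments  = --profile-dir=$profile_dir\n"
    "getenv     = True\n"  # copy the submitting shell's environment into the job
    "queue $n\n"           # submit n identical jobs, one engine each
)


def render_engine_submit(executable, profile_dir, n):
    """Fill in the template for n engines."""
    return ENGINE_SUBMIT_TEMPLATE.substitute(
        executable=executable, profile_dir=profile_dir, n=n
    )


if __name__ == "__main__":
    print(render_engine_submit("/local/python/path/bin/ipengine",
                               "/home/user/.ipython/profile_default", 4))
```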
Help config setup
I think I have added config/help support where required, but I am still getting my head around the config/traitlets setup in IPython. Does everything look OK?
BatchLauncher changes
job_array … at the end)
job_id_regexp_group: allows subclasses to specify which group of the job_id_regexp needs to be matched. By default this is 0, which means all current implementations remain unchanged, whilst allowing Condor to have a simple regexp setup. I'm not the world's foremost regexp expert, so there might be a way around this without introducing groups.
Batch launch commands changes
Condor destroys sys.executable for the jobs it farms out, ruining Python's module path checking. This is pretty problematic for the current batch setup, which calls something like
    /local/python/path/bin/python -c 'commands required to start the engine or controller' --profile-dir=...
on the remote host. However, by instead calling
    /local/python/path/bin/ipengine --profile-dir=...
and
    /local/python/path/bin/ipcontroller --profile-dir=...
we can skirt around the issue, as on entry to these scripts the shebang causes /local/python/path/bin/python to be called, with sys.executable set correctly.
I can't think of any downside to this approach, other than that python must now be running from a folder that also contains ipengine and ipcontroller, which is of course the setup everyone has from install. My only concern is that I have no idea how these scripts (and Condor in general, for that matter) behave in the Windows world.
Documentation changes
Finally, once this is accepted, the docs will probably need updating to mention Condor support. Is that something I should do in this PR?
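The "Batch launch commands changes" approach amounts to deriving the script path from the interpreter's location. Here is a minimal sketch, assuming the bin/ layout that the description relies on. The review thread notes that this layout is not universal. script_beside_python and engine_argv_via_script are hypothetical helpers, not functions from launcher.py.

```python
import os
import sys


def script_beside_python(script_name, python=None):
    """Return the path of a console script in the interpreter's directory.

    Assumes ipengine/ipcontroller are installed in the same bin/ directory
    as python; this does not hold on every system.
    """
    if python is None:
        python = sys.executable
    return os.path.join(os.path.dirname(python), script_name)


def engine_argv_via_script(profile_dir, python=None):
    """Build a command that runs the ipengine script directly.

    Executing the script (rather than python -c ...) leaves the choice of
    interpreter to the script's shebang line on the execute node.
    """
    return [script_beside_python("ipengine", python), "--profile-dir=" + profile_dir]
```

On a POSIX system, script_beside_python("ipengine", "/local/python/path/bin/python") returns "/local/python/path/bin/ipengine", matching the command shown in the description.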