New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-27676: Add ability to remove submitted runs. #24
Conversation
(Also removed a couple of unnecessary lines from report.py found while looking to see if could reuse code.)
doc/lsst.ctrl.bps/quickstart.rst
Outdated
|
||
Sometimes there may be a small delay between executing cancel and jobs | ||
disappearing from the WMS queue. Under normal conditions this delay | ||
is less than a minute. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix indentation. LSST Dev Guide says:
In directives, align to the directive’s name
doc/lsst.ctrl.bps/quickstart.rst
Outdated
.. note:: | ||
|
||
Using the WMS commands directly under normal circumstances is not | ||
advised as bps may someday include additional code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix indentation.
python/lsst/ctrl/bps/cancel.py
Outdated
wms_id : `str` | ||
Cancel submitted jobs matching specified WMS id. | ||
pass_thru : `str` | ||
A string to pass directly to the WMS service class. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we list parameters in the description in the order matching their order in function's signature?
python/lsst/ctrl/bps/cancel.py
Outdated
wms_service : `str` or `~lsst.ctrl.bps.wms_service.WmsService` | ||
Name of the Workload Management System service class. | ||
user : `str` | ||
Cancel all submitted workflows for specified user. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a description of what the command will use this argument for rather than what the argument is. Can we change it to something more like "The user whose submitted workflows should be canceled"?
python/lsst/ctrl/bps/cancel.py
Outdated
require_bps : `bool` | ||
Require given run_id/user to be a bps submitted job. | ||
wms_id : `str` | ||
Cancel submitted jobs matching specified WMS id. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above.
def list_submitted_jobs(self, wms_id=None, user=None, require_bps=True, pass_thru=None): | ||
"""Query WMS for list of submitted WMS workflows/jobs. This should | ||
be a quick lookup function to create list of jobs ids for cancel | ||
and other functions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we "move out" the second sentence from the summary to the extended description?
Id that can be used by WMS service to look up workflow/job. | ||
user : `str`, optional | ||
Limit list to submissions by this particular user. | ||
require_bps : `bool` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the qualifier optional
is missing (judging from function's signature).
|
||
if cluster_id != 0: | ||
constraint = f"(DAGManJobId == {int(float(cluster_id))} || ClusterId == " \ | ||
f"{int(float(cluster_id))})" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're here cluster_id
is definitely an integer so the conversions in the f-string are redundant.
|
||
# prune child jobs where DAG job is in queue (i.e., aren't orphans) | ||
job_ids = [] | ||
for job_id, jinfo in jobs.items(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we change jinfo
to job_info
please?
_LOG.debug("Canceling cluster_id = %s", cluster_id) | ||
schedd = htcondor.Schedd() | ||
constraint = f"ClusterId == {cluster_id}" | ||
if "-forcex" in pass_thru: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If pass_thru
is set to its default value (None
), the code will raise an exception. You need either additional check or use a different default value.
# If wms_id represents path, get numeric id | ||
try: | ||
cluster_id = int(float(wms_id)) | ||
except ValueError: | ||
wms_path = Path(wms_id) | ||
if wms_path.exists(): | ||
try: | ||
cluster_id, _ = read_dag_log(wms_id) | ||
cluster_id = int(float(cluster_id)) | ||
except StopIteration: | ||
cluster_id = 0 | ||
else: | ||
cluster_id = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless I'm missing something, you could essentially replace these lines with a call of _wms_id_to_cluster()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops copied it to make the function for elsewhere and forgot to change it here. Thanks.
Added cancel subcommand and support in HTCondor and Pegasus plugins.
(Also removed a couple of unnecessary lines from report.py found while looking to see if could reuse code.)