Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-27676: Add ability to remove submitted runs. #24

Merged
merged 2 commits into from Apr 15, 2021

Conversation

MichelleGower
Copy link
Collaborator

Added cancel subcommand and support in HTCondor and Pegasus plugins.

(Also removed a couple of unnecessary lines from report.py found while looking to see if could reuse code.)

(Also removed a couple of unnecessary lines from report.py
found while looking to see if could reuse code.)

Sometimes there may be a small delay between executing cancel and jobs
disappearing from the WMS queue. Under normal conditions this delay
is less than a minute.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix indentation. LSST Dev Guide says:

In directives, align to the directive’s name

.. note::

Using the WMS commands directly under normal circumstances is not
advised as bps may someday include additional code.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix indentation.

wms_id : `str`
Cancel submitted jobs matching specified WMS id.
pass_thru : `str`
A string to pass directly to the WMS service class.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we list parameters in the description in the order matching their order in function's signature?

wms_service : `str` or `~lsst.ctrl.bps.wms_service.WmsService`
Name of the Workload Management System service class.
user : `str`
Cancel all submitted workflows for specified user.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a description of what the command will use this argument for rather than what the argument is. Can we change it to something more like "The user whose submitted workflows should be canceled"?

require_bps : `bool`
Require given run_id/user to be a bps submitted job.
wms_id : `str`
Cancel submitted jobs matching specified WMS id.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above.

def list_submitted_jobs(self, wms_id=None, user=None, require_bps=True, pass_thru=None):
"""Query WMS for list of submitted WMS workflows/jobs. This should
be a quick lookup function to create list of jobs ids for cancel
and other functions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we "move out" the second sentence from the summary to the extended description?

Id that can be used by WMS service to look up workflow/job.
user : `str`, optional
Limit list to submissions by this particular user.
require_bps : `bool`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the qualifier optional is missing (judging from function's signature).


if cluster_id != 0:
constraint = f"(DAGManJobId == {int(float(cluster_id))} || ClusterId == " \
f"{int(float(cluster_id))})"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're here cluster_id is definitely an integer so the conversions in the f-string are redundant.


# prune child jobs where DAG job is in queue (i.e., aren't orphans)
job_ids = []
for job_id, jinfo in jobs.items():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we change jinfo to job_info please?

_LOG.debug("Canceling cluster_id = %s", cluster_id)
schedd = htcondor.Schedd()
constraint = f"ClusterId == {cluster_id}"
if "-forcex" in pass_thru:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If pass_thru is set to its default value (None), the code will raise an exception. You need either additional check or use a different default value.

Comment on lines 115 to 127
# If wms_id represents path, get numeric id
try:
cluster_id = int(float(wms_id))
except ValueError:
wms_path = Path(wms_id)
if wms_path.exists():
try:
cluster_id, _ = read_dag_log(wms_id)
cluster_id = int(float(cluster_id))
except StopIteration:
cluster_id = 0
else:
cluster_id = 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I'm missing something, you could essentially replace these lines with a call of _wms_id_to_cluster().

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops copied it to make the function for elsewhere and forgot to change it here. Thanks.

@MichelleGower MichelleGower changed the title Add ability to remove submitted runs. DM-27676: Add ability to remove submitted runs. Apr 15, 2021
@MichelleGower MichelleGower merged commit 16743df into master Apr 15, 2021
@MichelleGower MichelleGower deleted the tickets/DM-27676 branch April 15, 2021 01:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants