-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Log reason when workflow scheduling of a step is delayed. #3849
Log reason when workflow scheduling of a step is delayed. #3849
Conversation
Currently the logs contain fairly uninformative step delayed messages that don't allow admins to understand why parts of the workflow are being delayed during scheduling. Here are some examples of before and after. When running the test: ``` ./run_tests.sh -api test/api/test_workflows.py:WorkflowsApiTestCase.test_workflow_pause ``` Before these lines would show up: ``` galaxy.workflow.run DEBUG 2017-03-30 09:23:47,327 Workflow step 2 of invocation 1 invoked (174.149 ms) galaxy.workflow.run DEBUG 2017-03-30 09:23:47,347 Workflow step 3 of invocation 1 invoked (19.355 ms) galaxy.workflow.run DEBUG 2017-03-30 09:23:47,364 Workflow step 4 of invocation 1 delayed (16.262 ms) ``` Now these same lines are as follows: ``` galaxy.workflow.run DEBUG 2017-03-30 09:19:28,601 Workflow step 2 of invocation 1 invoked (172.849 ms) galaxy.workflow.run DEBUG 2017-03-30 09:19:28,619 Marking step 3 outputs delayed (executing pause step) galaxy.workflow.run DEBUG 2017-03-30 09:19:28,620 Workflow step 3 of invocation 1 invoked (17.999 ms) galaxy.workflow.run DEBUG 2017-03-30 09:19:28,633 Workflow step 4 of invocation 1 delayed (dependent step [3] delayed, so this step must be delayed) (13.398 ms) ``` Also, when running the test: ``` ./run_tests.sh -api test/api/test_workflows.py:WorkflowsApiTestCase.test_workflow_run_dynamic_output_collections_3 ``` Before these lines would be printed: ``` galaxy.workflow.run DEBUG 2017-03-30 09:25:35,910 Workflow step 4 of invocation 1 invoked (281.479 ms) galaxy.workflow.run DEBUG 2017-03-30 09:25:35,937 Workflow step 5 of invocation 1 delayed (25.904 ms) galaxy.workflow.run DEBUG 2017-03-30 09:25:35,970 Workflow step 6 of invocation 1 delayed (32.597 ms) ``` Now these same lines are as follows: ``` galaxy.workflow.run DEBUG 2017-03-30 09:27:54,270 Workflow step 4 of invocation 1 invoked (295.826 ms) galaxy.workflow.run DEBUG 2017-03-30 09:27:54,315 Workflow step 5 of invocation 1 delayed (dependent collection [1] not yet populated with datasets) (44.581 ms) galaxy.workflow.run DEBUG 2017-03-30 09:27:54,326 Workflow step 6 of invocation 1 delayed (dependent step [5] delayed, so this step must be delayed) (10.804 ms) ``` When running workflows with steps that depend explicitly on other steps (instead of implicitly between dataset connections), lines such as: ``` galaxy.workflow.run DEBUG 2017-03-30 09:33:52,966 Marking step 3 outputs delayed (workflow paused at this step waiting for review) galaxy.workflow.run DEBUG 2017-03-30 09:33:53,001 Workflow step 4 of invocation 1 delayed (dependent step [3] delayed, so this step must be delayed) (0.157 ms) galaxy.workflow.run DEBUG 2017-03-30 09:33:53,001 Workflow step 5 of invocation 1 delayed (depends on step [4] but that step has not been invoked yet) (0.084 ms) ``` and ``` galaxy.workflow.run DEBUG 2017-03-30 09:33:57,090 Workflow step 5 of invocation 1 delayed (depends on step [4] but one or more jobs created from that step have not finished yet) (0.144 ms) ``` now may appear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a bunch for this @jmchilton!
lib/galaxy/workflow/run.py
Outdated
def mark_step_outputs_delayed(self, step): | ||
def mark_step_outputs_delayed(self, step, why=None): | ||
if why: | ||
message = "Marking step %s outputs delayed (%s)" % (step.id, why) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about:
message = "Marking step %s outputs of invocation %s delayed (%s)" % (step.id, self.workflow_invocation.id, why)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - this is a good suggestion - I'll walk through the others.
lib/galaxy/workflow/run.py
Outdated
step_delayed = delayed_steps = True | ||
more_info = "(%s)" % de.why | ||
self.progress.mark_step_outputs_delayed( step ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can be more consistent and also simplify lines 181-183 if:
- you modify line 350 as per my suggestion there
- drop
more_info
- pass
de.why
to self.progress.mark_step_outputs_delayed()` here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It isn't quite how I think about this - but you are right the code is more consistent and the logs look better - so it makes me think I'm thinking about this wrong. Give me a second and I'll rebase.
lib/galaxy/workflow/run.py
Outdated
@@ -177,7 +179,8 @@ def invoke( self ): | |||
raise | |||
|
|||
step_verb = "invoked" if not step_delayed else "delayed" | |||
log.debug("Workflow step %s of invocation %s %s %s" % (step.id, workflow_invocation.id, step_verb, step_timer)) | |||
status = step_verb + (" %s" % more_info if more_info else "") | |||
log.debug("Workflow step %s of invocation %s %s %s" % (step.id, workflow_invocation.id, status, step_timer)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lines 181-183 can be simplified to:
if not step_delayed:
log.debug("Workflow step %s of invocation %s invoked %s" % (step.id, workflow_invocation.id, step_timer))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do loose the timing information then 🙁.
@nsoranzo Pushed a commit with your recommendations - thanks for such a detailed review of this. |
+1, looks good to me, much more useful. @nsoranzo happy with the changes? |
Currently the logs contain fairly uninformative step delayed messages that don't allow admins to understand why parts of the workflow are being delayed during scheduling.
Here are some examples of before and after.
When running the test:
Before these lines would show up:
Now these same lines are as follows:
Also, when running the test:
Before these lines would be printed:
Now these same lines are as follows:
When running workflows with steps that depend explicitly on other steps (instead of implicitly between dataset connections), lines such as:
and
now may appear.