Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Log reason when workflow scheduling of a step is delayed. #3849

Merged
merged 2 commits into from
Mar 30, 2017

Conversation

jmchilton
Copy link
Member

Currently the logs contain fairly uninformative step delayed messages that don't allow admins to understand why parts of the workflow are being delayed during scheduling.

Here are some examples of before and after.

When running the test:

./run_tests.sh -api test/api/test_workflows.py:WorkflowsApiTestCase.test_workflow_pause

Before these lines would show up:

galaxy.workflow.run DEBUG 2017-03-30 09:23:47,327 Workflow step 2 of invocation 1 invoked (174.149 ms)
galaxy.workflow.run DEBUG 2017-03-30 09:23:47,347 Workflow step 3 of invocation 1 invoked (19.355 ms)
galaxy.workflow.run DEBUG 2017-03-30 09:23:47,364 Workflow step 4 of invocation 1 delayed (16.262 ms)

Now these same lines are as follows:

galaxy.workflow.run DEBUG 2017-03-30 09:19:28,601 Workflow step 2 of invocation 1 invoked (172.849 ms)
galaxy.workflow.run DEBUG 2017-03-30 09:19:28,619 Marking step 3 outputs delayed (executing pause step)
galaxy.workflow.run DEBUG 2017-03-30 09:19:28,620 Workflow step 3 of invocation 1 invoked (17.999 ms)
galaxy.workflow.run DEBUG 2017-03-30 09:19:28,633 Workflow step 4 of invocation 1 delayed (dependent step [3] delayed, so this step must be delayed) (13.398 ms)

Also, when running the test:

./run_tests.sh -api test/api/test_workflows.py:WorkflowsApiTestCase.test_workflow_run_dynamic_output_collections_3

Before these lines would be printed:

galaxy.workflow.run DEBUG 2017-03-30 09:25:35,910 Workflow step 4 of invocation 1 invoked (281.479 ms)
galaxy.workflow.run DEBUG 2017-03-30 09:25:35,937 Workflow step 5 of invocation 1 delayed (25.904 ms)
galaxy.workflow.run DEBUG 2017-03-30 09:25:35,970 Workflow step 6 of invocation 1 delayed (32.597 ms)

Now these same lines are as follows:

galaxy.workflow.run DEBUG 2017-03-30 09:27:54,270 Workflow step 4 of invocation 1 invoked (295.826 ms)
galaxy.workflow.run DEBUG 2017-03-30 09:27:54,315 Workflow step 5 of invocation 1 delayed (dependent collection [1] not yet populated with datasets) (44.581 ms)
galaxy.workflow.run DEBUG 2017-03-30 09:27:54,326 Workflow step 6 of invocation 1 delayed (dependent step [5] delayed, so this step must be delayed) (10.804 ms)

When running workflows with steps that depend explicitly on other steps (instead of implicitly between dataset connections), lines such as:

galaxy.workflow.run DEBUG 2017-03-30 09:33:52,966 Marking step 3 outputs delayed (workflow paused at this step waiting for review)
galaxy.workflow.run DEBUG 2017-03-30 09:33:53,001 Workflow step 4 of invocation 1 delayed (dependent step [3] delayed, so this step must be delayed) (0.157 ms)
galaxy.workflow.run DEBUG 2017-03-30 09:33:53,001 Workflow step 5 of invocation 1 delayed (depends on step [4] but that step has not been invoked yet) (0.084 ms)

and

galaxy.workflow.run DEBUG 2017-03-30 09:33:57,090 Workflow step 5 of invocation 1 delayed (depends on step [4] but one or more jobs created from that step have not finished yet) (0.144 ms)

now may appear.

Currently the logs contain fairly uninformative step delayed messages that don't allow admins to understand why parts of the workflow are being delayed during scheduling.

Here are some examples of before and after.

When running the test:

```
./run_tests.sh -api test/api/test_workflows.py:WorkflowsApiTestCase.test_workflow_pause
```

Before these lines would show up:

```
galaxy.workflow.run DEBUG 2017-03-30 09:23:47,327 Workflow step 2 of invocation 1 invoked (174.149 ms)
galaxy.workflow.run DEBUG 2017-03-30 09:23:47,347 Workflow step 3 of invocation 1 invoked (19.355 ms)
galaxy.workflow.run DEBUG 2017-03-30 09:23:47,364 Workflow step 4 of invocation 1 delayed (16.262 ms)
```

Now these same lines are as follows:

```
galaxy.workflow.run DEBUG 2017-03-30 09:19:28,601 Workflow step 2 of invocation 1 invoked (172.849 ms)
galaxy.workflow.run DEBUG 2017-03-30 09:19:28,619 Marking step 3 outputs delayed (executing pause step)
galaxy.workflow.run DEBUG 2017-03-30 09:19:28,620 Workflow step 3 of invocation 1 invoked (17.999 ms)
galaxy.workflow.run DEBUG 2017-03-30 09:19:28,633 Workflow step 4 of invocation 1 delayed (dependent step [3] delayed, so this step must be delayed) (13.398 ms)
```

Also, when running the test:

```
./run_tests.sh -api test/api/test_workflows.py:WorkflowsApiTestCase.test_workflow_run_dynamic_output_collections_3
```

Before these lines would be printed:

```
galaxy.workflow.run DEBUG 2017-03-30 09:25:35,910 Workflow step 4 of invocation 1 invoked (281.479 ms)
galaxy.workflow.run DEBUG 2017-03-30 09:25:35,937 Workflow step 5 of invocation 1 delayed (25.904 ms)
galaxy.workflow.run DEBUG 2017-03-30 09:25:35,970 Workflow step 6 of invocation 1 delayed (32.597 ms)
```

Now these same lines are as follows:

```
galaxy.workflow.run DEBUG 2017-03-30 09:27:54,270 Workflow step 4 of invocation 1 invoked (295.826 ms)
galaxy.workflow.run DEBUG 2017-03-30 09:27:54,315 Workflow step 5 of invocation 1 delayed (dependent collection [1] not yet populated with datasets) (44.581 ms)
galaxy.workflow.run DEBUG 2017-03-30 09:27:54,326 Workflow step 6 of invocation 1 delayed (dependent step [5] delayed, so this step must be delayed) (10.804 ms)
```

When running workflows with steps that depend explicitly on other steps (instead of implicitly between dataset connections), lines such as:

```
galaxy.workflow.run DEBUG 2017-03-30 09:33:52,966 Marking step 3 outputs delayed (workflow paused at this step waiting for review)
galaxy.workflow.run DEBUG 2017-03-30 09:33:53,001 Workflow step 4 of invocation 1 delayed (dependent step [3] delayed, so this step must be delayed) (0.157 ms)
galaxy.workflow.run DEBUG 2017-03-30 09:33:53,001 Workflow step 5 of invocation 1 delayed (depends on step [4] but that step has not been invoked yet) (0.084 ms)
```

and

```
galaxy.workflow.run DEBUG 2017-03-30 09:33:57,090 Workflow step 5 of invocation 1 delayed (depends on step [4] but one or more jobs created from that step have not finished yet) (0.144 ms)
```

now may appear.
Copy link
Member

@nsoranzo nsoranzo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a bunch for this @jmchilton!

def mark_step_outputs_delayed(self, step):
def mark_step_outputs_delayed(self, step, why=None):
if why:
message = "Marking step %s outputs delayed (%s)" % (step.id, why)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about:

            message = "Marking step %s outputs of invocation %s delayed (%s)" % (step.id, self.workflow_invocation.id, why)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - this is a good suggestion - I'll walk through the others.

step_delayed = delayed_steps = True
more_info = "(%s)" % de.why
self.progress.mark_step_outputs_delayed( step )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can be more consistent and also simplify lines 181-183 if:

  • you modify line 350 as per my suggestion there
  • drop more_info
  • pass de.why to self.progress.mark_step_outputs_delayed()` here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It isn't quite how I think about this - but you are right the code is more consistent and the logs look better - so it makes me think I'm thinking about this wrong. Give me a second and I'll rebase.

@@ -177,7 +179,8 @@ def invoke( self ):
raise

step_verb = "invoked" if not step_delayed else "delayed"
log.debug("Workflow step %s of invocation %s %s %s" % (step.id, workflow_invocation.id, step_verb, step_timer))
status = step_verb + (" %s" % more_info if more_info else "")
log.debug("Workflow step %s of invocation %s %s %s" % (step.id, workflow_invocation.id, status, step_timer))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lines 181-183 can be simplified to:

             if not step_delayed:
                 log.debug("Workflow step %s of invocation %s invoked %s" % (step.id, workflow_invocation.id, step_timer))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do loose the timing information then 🙁.

@jmchilton
Copy link
Member Author

@nsoranzo Pushed a commit with your recommendations - thanks for such a detailed review of this.

@dannon
Copy link
Member

dannon commented Mar 30, 2017

+1, looks good to me, much more useful. @nsoranzo happy with the changes?

@dannon dannon merged commit 200e475 into galaxyproject:dev Mar 30, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants