Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Log reason when workflow scheduling of a step is delayed. #3849

Merged
merged 2 commits into from Mar 30, 2017

Conversation

Projects
None yet
3 participants
@jmchilton
Copy link
Member

commented Mar 30, 2017

Currently the logs contain fairly uninformative step delayed messages that don't allow admins to understand why parts of the workflow are being delayed during scheduling.

Here are some examples of before and after.

When running the test:

./run_tests.sh -api test/api/test_workflows.py:WorkflowsApiTestCase.test_workflow_pause

Before these lines would show up:

galaxy.workflow.run DEBUG 2017-03-30 09:23:47,327 Workflow step 2 of invocation 1 invoked (174.149 ms)
galaxy.workflow.run DEBUG 2017-03-30 09:23:47,347 Workflow step 3 of invocation 1 invoked (19.355 ms)
galaxy.workflow.run DEBUG 2017-03-30 09:23:47,364 Workflow step 4 of invocation 1 delayed (16.262 ms)

Now these same lines are as follows:

galaxy.workflow.run DEBUG 2017-03-30 09:19:28,601 Workflow step 2 of invocation 1 invoked (172.849 ms)
galaxy.workflow.run DEBUG 2017-03-30 09:19:28,619 Marking step 3 outputs delayed (executing pause step)
galaxy.workflow.run DEBUG 2017-03-30 09:19:28,620 Workflow step 3 of invocation 1 invoked (17.999 ms)
galaxy.workflow.run DEBUG 2017-03-30 09:19:28,633 Workflow step 4 of invocation 1 delayed (dependent step [3] delayed, so this step must be delayed) (13.398 ms)

Also, when running the test:

./run_tests.sh -api test/api/test_workflows.py:WorkflowsApiTestCase.test_workflow_run_dynamic_output_collections_3

Before these lines would be printed:

galaxy.workflow.run DEBUG 2017-03-30 09:25:35,910 Workflow step 4 of invocation 1 invoked (281.479 ms)
galaxy.workflow.run DEBUG 2017-03-30 09:25:35,937 Workflow step 5 of invocation 1 delayed (25.904 ms)
galaxy.workflow.run DEBUG 2017-03-30 09:25:35,970 Workflow step 6 of invocation 1 delayed (32.597 ms)

Now these same lines are as follows:

galaxy.workflow.run DEBUG 2017-03-30 09:27:54,270 Workflow step 4 of invocation 1 invoked (295.826 ms)
galaxy.workflow.run DEBUG 2017-03-30 09:27:54,315 Workflow step 5 of invocation 1 delayed (dependent collection [1] not yet populated with datasets) (44.581 ms)
galaxy.workflow.run DEBUG 2017-03-30 09:27:54,326 Workflow step 6 of invocation 1 delayed (dependent step [5] delayed, so this step must be delayed) (10.804 ms)

When running workflows with steps that depend explicitly on other steps (instead of implicitly between dataset connections), lines such as:

galaxy.workflow.run DEBUG 2017-03-30 09:33:52,966 Marking step 3 outputs delayed (workflow paused at this step waiting for review)
galaxy.workflow.run DEBUG 2017-03-30 09:33:53,001 Workflow step 4 of invocation 1 delayed (dependent step [3] delayed, so this step must be delayed) (0.157 ms)
galaxy.workflow.run DEBUG 2017-03-30 09:33:53,001 Workflow step 5 of invocation 1 delayed (depends on step [4] but that step has not been invoked yet) (0.084 ms)

and

galaxy.workflow.run DEBUG 2017-03-30 09:33:57,090 Workflow step 5 of invocation 1 delayed (depends on step [4] but one or more jobs created from that step have not finished yet) (0.144 ms)

now may appear.

Log when workflow scheduling is delayed about why.
Currently the logs contain fairly uninformative step delayed messages that don't allow admins to understand why parts of the workflow are being delayed during scheduling.

Here are some examples of before and after.

When running the test:

```
./run_tests.sh -api test/api/test_workflows.py:WorkflowsApiTestCase.test_workflow_pause
```

Before these lines would show up:

```
galaxy.workflow.run DEBUG 2017-03-30 09:23:47,327 Workflow step 2 of invocation 1 invoked (174.149 ms)
galaxy.workflow.run DEBUG 2017-03-30 09:23:47,347 Workflow step 3 of invocation 1 invoked (19.355 ms)
galaxy.workflow.run DEBUG 2017-03-30 09:23:47,364 Workflow step 4 of invocation 1 delayed (16.262 ms)
```

Now these same lines are as follows:

```
galaxy.workflow.run DEBUG 2017-03-30 09:19:28,601 Workflow step 2 of invocation 1 invoked (172.849 ms)
galaxy.workflow.run DEBUG 2017-03-30 09:19:28,619 Marking step 3 outputs delayed (executing pause step)
galaxy.workflow.run DEBUG 2017-03-30 09:19:28,620 Workflow step 3 of invocation 1 invoked (17.999 ms)
galaxy.workflow.run DEBUG 2017-03-30 09:19:28,633 Workflow step 4 of invocation 1 delayed (dependent step [3] delayed, so this step must be delayed) (13.398 ms)
```

Also, when running the test:

```
./run_tests.sh -api test/api/test_workflows.py:WorkflowsApiTestCase.test_workflow_run_dynamic_output_collections_3
```

Before these lines would be printed:

```
galaxy.workflow.run DEBUG 2017-03-30 09:25:35,910 Workflow step 4 of invocation 1 invoked (281.479 ms)
galaxy.workflow.run DEBUG 2017-03-30 09:25:35,937 Workflow step 5 of invocation 1 delayed (25.904 ms)
galaxy.workflow.run DEBUG 2017-03-30 09:25:35,970 Workflow step 6 of invocation 1 delayed (32.597 ms)
```

Now these same lines are as follows:

```
galaxy.workflow.run DEBUG 2017-03-30 09:27:54,270 Workflow step 4 of invocation 1 invoked (295.826 ms)
galaxy.workflow.run DEBUG 2017-03-30 09:27:54,315 Workflow step 5 of invocation 1 delayed (dependent collection [1] not yet populated with datasets) (44.581 ms)
galaxy.workflow.run DEBUG 2017-03-30 09:27:54,326 Workflow step 6 of invocation 1 delayed (dependent step [5] delayed, so this step must be delayed) (10.804 ms)
```

When running workflows with steps that depend explicitly on other steps (instead of implicitly between dataset connections), lines such as:

```
galaxy.workflow.run DEBUG 2017-03-30 09:33:52,966 Marking step 3 outputs delayed (workflow paused at this step waiting for review)
galaxy.workflow.run DEBUG 2017-03-30 09:33:53,001 Workflow step 4 of invocation 1 delayed (dependent step [3] delayed, so this step must be delayed) (0.157 ms)
galaxy.workflow.run DEBUG 2017-03-30 09:33:53,001 Workflow step 5 of invocation 1 delayed (depends on step [4] but that step has not been invoked yet) (0.084 ms)
```

and

```
galaxy.workflow.run DEBUG 2017-03-30 09:33:57,090 Workflow step 5 of invocation 1 delayed (depends on step [4] but one or more jobs created from that step have not finished yet) (0.144 ms)
```

now may appear.
@nsoranzo
Copy link
Member

left a comment

Thanks a bunch for this @jmchilton!

def mark_step_outputs_delayed(self, step):
def mark_step_outputs_delayed(self, step, why=None):
if why:
message = "Marking step %s outputs delayed (%s)" % (step.id, why)

This comment has been minimized.

Copy link
@nsoranzo

nsoranzo Mar 30, 2017

Member

What about:

            message = "Marking step %s outputs of invocation %s delayed (%s)" % (step.id, self.workflow_invocation.id, why)

This comment has been minimized.

Copy link
@jmchilton

jmchilton Mar 30, 2017

Author Member

Yes - this is a good suggestion - I'll walk through the others.

step_delayed = delayed_steps = True
more_info = "(%s)" % de.why
self.progress.mark_step_outputs_delayed( step )

This comment has been minimized.

Copy link
@nsoranzo

nsoranzo Mar 30, 2017

Member

I think you can be more consistent and also simplify lines 181-183 if:

  • you modify line 350 as per my suggestion there
  • drop more_info
  • pass de.why to self.progress.mark_step_outputs_delayed()` here

This comment has been minimized.

Copy link
@jmchilton

jmchilton Mar 30, 2017

Author Member

It isn't quite how I think about this - but you are right the code is more consistent and the logs look better - so it makes me think I'm thinking about this wrong. Give me a second and I'll rebase.

@@ -177,7 +179,8 @@ def invoke( self ):
raise

step_verb = "invoked" if not step_delayed else "delayed"
log.debug("Workflow step %s of invocation %s %s %s" % (step.id, workflow_invocation.id, step_verb, step_timer))
status = step_verb + (" %s" % more_info if more_info else "")
log.debug("Workflow step %s of invocation %s %s %s" % (step.id, workflow_invocation.id, status, step_timer))

This comment has been minimized.

Copy link
@nsoranzo

nsoranzo Mar 30, 2017

Member

Lines 181-183 can be simplified to:

             if not step_delayed:
                 log.debug("Workflow step %s of invocation %s invoked %s" % (step.id, workflow_invocation.id, step_timer))

This comment has been minimized.

Copy link
@jmchilton

jmchilton Mar 30, 2017

Author Member

We do loose the timing information then 🙁.

@jmchilton

This comment has been minimized.

Copy link
Member Author

commented Mar 30, 2017

@nsoranzo Pushed a commit with your recommendations - thanks for such a detailed review of this.

@dannon

This comment has been minimized.

Copy link
Member

commented Mar 30, 2017

+1, looks good to me, much more useful. @nsoranzo happy with the changes?

@dannon dannon merged commit 200e475 into galaxyproject:dev Mar 30, 2017

4 of 5 checks passed

integration test Test scheduled.
Details
api test Build finished. 271 tests run, 0 skipped, 0 failed.
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
framework test Build finished. 140 tests run, 0 skipped, 0 failed.
Details
toolshed test Build finished. 579 tests run, 0 skipped, 0 failed.
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.