
[17.01] History-local serial workflow scheduling. #3520

Conversation

dannon
Member

@dannon dannon commented Jan 31, 2017

A less intrusive (and more easily removed, once we no longer need it) approach to resolving #3474.

@bgruening Want to test this with your use case?

@dannon dannon changed the title from "[17.01] [WIP] History-local serial workflow scheduling." to "[17.01] History-local serial workflow scheduling." Jan 31, 2017
@jmchilton
Member

@galaxybot test this

@jgoecks
Contributor

jgoecks commented Feb 1, 2017

Why make this an option? Seems this should be the one-and-only behavior.

@jmchilton
Member

@jgoecks This really breaks down for workflows that cannot be fully scheduled at one time - for instance, workflows that use collection operations or have steps that produce dynamically discovered collections. Multiple sites have been using workflows of this style for at least a year, possibly longer, and those workflows would not schedule properly with this in place.
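To make that failure mode concrete, here is a toy Python sketch - not Galaxy's actual scheduler code, all names invented - of what a strict serial-per-history policy does when an invocation contains a step that cannot be scheduled until runtime outputs exist:

```python
# Toy model, not Galaxy code: an invocation is a list of steps, and a
# "dynamic_collection" step cannot schedule until its outputs exist.
invocations = [
    {"id": 1, "steps": ["tool", "dynamic_collection", "tool"], "scheduled": 0},
    {"id": 2, "steps": ["tool", "tool"], "scheduled": 0},
]

def schedule_iteration(invocations):
    """One pass of a strict serial-per-history policy: only the oldest
    incompletely scheduled invocation in a history may make progress."""
    active = next((inv for inv in invocations
                   if inv["scheduled"] < len(inv["steps"])), None)
    if active is None:
        return  # everything in this history is fully scheduled
    step = active["steps"][active["scheduled"]]
    if step == "dynamic_collection":
        # Cannot schedule past this step until its jobs finish and the
        # collection is discovered -- potentially hours of queue time.
        print(f"invocation {active['id']}: blocked on {step}; "
              "everything behind it in this history waits too")
    else:
        active["scheduled"] += 1
        print(f"invocation {active['id']}: scheduled a {step} step")

for _ in range(4):
    schedule_iteration(invocations)
```

Running it schedules one step of invocation 1 and then prints the blocked message three times; invocation 2 never progresses, even though nothing it needs is missing.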

@dannon
Member Author

dannon commented Feb 1, 2017

@jgoecks I'd prefer it as the default configuration (so, enabled by default, for now), with an option to disable it, but I figured this way would have less pushback.

@dannon dannon added this to the 17.01 milestone Feb 1, 2017
@dannon
Member Author

dannon commented Feb 1, 2017

(Looks like the failure here is just Travis's OS X box not being healthy.)

@martenson
Member

@dannon Correct, Mac builds are hosed: https://www.traviscistatus.com/incidents/k79mjcv403c4

@guerler
Contributor

guerler commented Feb 1, 2017

This looks like a good fix to me. Thx @dannon.

@bgruening
Member

Works for me! Tested and I got the expected results! @guerler was this a +1?

@martenson martenson merged commit 409ca0a into galaxyproject:release_17.01 Feb 3, 2017
@martenson
Member

I did merge this, as the UX hurdle seems to be significant in some cases. However, I would also like to immediately deprecate this config option in the 17.01 release notes, because this solution is far from ideal.

@dannon
Member Author

dannon commented Feb 3, 2017

I think 'immediately deprecating' the option is a bit silly, and the case that this is the 'wrong' and 'broken' approach is pretty overstated.

There are improvements we can make down the road, but this makes things work the way people expect in almost all cases.

@martenson
Member

Afaik with this option we are downgrading our own capabilities, in addition to allowing use cases where invocations can go into limbo without any feedback to the user. I think that is the wrong approach to solving a UX hurdle. Given that you switched the default to False, I was thinking you saw it similarly.

@dannon
Member Author

dannon commented Feb 3, 2017

No, I just figured it'd be easier to get this in - and fix the problem for users having trouble - with less argument if I swapped the default :) I actually would like the default to be true, though, like I first wrote it.

I really do not see this as 'downgrading capabilities' for (to make up a statistic, which I think is reasonably accurate) 99.8% of people. The other 0.2% can execute workflows en masse targeting different histories if they would like them to run in parallel. A ton of workflows executing into the same history is a very bad idea right now anyway, given what we know about working with large histories.

TLDR: the people I think this will negatively affect 1) are very few in number, 2) have simple workarounds, and 3) might find those workarounds (multiple histories) a better organizational solution anyway.
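For reference, that workaround looks roughly like the following with BioBlend - a sketch only: the instance URL, API key, and IDs are placeholders, and parameter shapes may differ between versions.

```python
# Invoke each workflow into its own fresh history so the invocations are
# scheduled independently and each history holds one run's datasets.
from bioblend.galaxy import GalaxyInstance

gi = GalaxyInstance(url="https://galaxy.example.org", key="<api-key>")

workflow_id = "<workflow-id>"  # placeholder
input_datasets = ["<hda-id-1>", "<hda-id-2>", "<hda-id-3>"]  # placeholders

for n, hda_id in enumerate(input_datasets, start=1):
    gi.workflows.invoke_workflow(
        workflow_id,
        inputs={"0": {"src": "hda", "id": hda_id}},  # step 0 gets the input
        history_name=f"workflow-run-{n}",  # one history per invocation
    )
```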

@martenson
Member

@dannon Gotcha, thanks for writing down your thinking.

@jmchilton
Member

> I actually would like the default to be true, though, like I first wrote it.

I don't agree with the rest at all - but I truly appreciate the compromise on the default - thank you! This has been the behavior when targeting the API since 16.10 and only @bgruening has complained. If more users find it problematic over time - I'll open the PR to switch the default myself.

jmchilton added a commit to jmchilton/galaxy that referenced this pull request Mar 27, 2017
…dom handler.

Let's revisit the problem that background-scheduling workflows (the default UI behavior as of 16.10) makes it easier for histories to contain datasets interleaved from different workflow invocations under certain reasonable conditions (galaxyproject#3474).

Considering only a four-year-old workflow and tool feature set (no collection operations, no dynamic dataset discovery, only tool and input workflow modules), all workflows can and will fully schedule on the first scheduling iteration. Under those circumstances, this solution is functionally equivalent to history_local_serial_workflow_scheduling as introduced in galaxyproject#3520 - but it should be more performant, because all such workflows fully schedule in the first iteration and the double loop introduced here https://github.com/galaxyproject/galaxy/pull/3520/files#diff-d7e80a366f3965777de95cb0f5b13a4e is avoided for each workflow invocation on each iteration. This addresses both concerns I outlined [here](galaxyproject#3816 (comment)).

For workflows that use certain classes of newer tools or newer workflow features, I'd argue this approach does not degrade as harshly as enabling history_local_serial_workflow_scheduling.

For instance, imagine a workflow with a dynamic dataset collection output step (such as those used by the IUC tools DESeq2, Trinity, Stacks, and various Mothur tools) halfway through that takes 24 hours of queue time to reach. Now imagine a user running 5 such workflows at once.

- Without this and without history_local_serial_workflow_scheduling, the 5 workflows will each run as fast as possible and the UI will show as much of each workflow as can be scheduled, but the order of the datasets may be shuffled. The workflows will be complete for the user in 48 hours.
- With history_local_serial_workflow_scheduling enabled, only 1 workflow will be scheduled (and only halfway) for the first 24 hours, and the user will be given no visual indication of why the other workflows are not running for a day. The final workflow output will take nearly a week to complete (see the arithmetic sketch below).
- With this enabled - the new default in this commit - each workflow will be scheduled in two chunks, but the chunks will be contiguous and it should be fairly clear to the user which tool caused the discontinuity of the datasets in the history. So things are still mostly ordered, but the drawbacks of history_local_serial_workflow_scheduling are avoided entirely: the other four workflows aren't hidden from the user without a UI indication, and the workflows will still take only 48 hours to complete, with outputs ready for the user.

The only drawback of this new default behavior is that you forgo the potential performance improvement of scheduling multiple workflow invocations within one history in parallel - but that was never a design goal in my mind when implementing background scheduling, and under typical Galaxy use cases I don't think it would be worth the UI problems. So, the older behavior can be re-enabled by setting parallelize_workflow_scheduling_within_histories to True in galaxy.ini, but it won't be on by default, and it isn't really recommended if the Galaxy UI is being used.
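A back-of-the-envelope check of the timings in that scenario - a sketch that just restates the arithmetic above, with the galaxy.ini option names from this thread noted in the comments:

```python
CHUNK_HOURS = 24   # queue time for each half of the example workflow
N_WORKFLOWS = 5

# Parallel scheduling within a history (the older behavior, i.e.
# parallelize_workflow_scheduling_within_histories = True in galaxy.ini):
# all first halves overlap, then all second halves overlap.
parallel_total = 2 * CHUNK_HOURS  # 48 hours

# history_local_serial_workflow_scheduling = True: workflow k+1 cannot
# begin scheduling until workflow k is fully scheduled (24 hours in), so
# starts stagger by one chunk and the last run still needs two chunks.
serial_total = (N_WORKFLOWS - 1) * CHUNK_HOURS + 2 * CHUNK_HOURS  # 144 hours

print(parallel_total, serial_total)  # 48 144 -> two days vs. nearly a week
```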
jmchilton added a commit to jmchilton/galaxy that referenced this pull request Mar 30, 2017
…dom handler.
