[17.01] Restrict workflow scheduling within a history to a fixed, random handler. #3820
Let's revisit the problem that background-scheduling workflows (the default UI behavior as of 16.10) makes it easier for histories to contain datasets interleaved from different workflow invocations under certain reasonable conditions (#3474).
Considering only a four-year-old workflow and tool feature set (no collection operations, no dynamic dataset discovery, only tool and input workflow modules), all workflows can and will fully schedule on the first scheduling iteration. Under those circumstances, this solution is functionally equivalent to the history_local_serial_workflow_scheduling option introduced in #3520, but it should be more performant: because all such workflows fully schedule in the first iteration, the double loop introduced in https://github.com/galaxyproject/galaxy/pull/3520/files#diff-d7e80a366f3965777de95cb0f5b13a4e is avoided for each workflow invocation on each iteration. This addresses both concerns I outlined here.
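To make the idea concrete, here is a minimal sketch of pinning each history to a single, randomly chosen scheduling handler. This is not Galaxy's actual implementation; the handler names, the `handler_for_history` function, and the in-memory mapping are all hypothetical, but the sketch shows why all invocations from one history end up processed serially by the same worker.

```python
import random

# Hypothetical handler pool; real deployments configure handlers elsewhere.
HANDLERS = ["handler0", "handler1", "handler2"]

# history_id -> handler name, fixed after the first random pick.
_history_handler = {}

def handler_for_history(history_id):
    """Return the handler assigned to this history, choosing one at
    random on first use and reusing it for every later invocation."""
    if history_id not in _history_handler:
        _history_handler[history_id] = random.choice(HANDLERS)
    return _history_handler[history_id]
```

Because the choice is sticky per history, two invocations launched into the same history can never be scheduled concurrently by different handlers, which is what prevents their outputs from interleaving.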
For workflows that use certain classes of newer tools or newer workflow features, I'd argue this approach will not degrade as harshly as enabling history_local_serial_workflow_scheduling.
For instance, imagine a workflow with a dynamic dataset collection output step (such as those used by the IUC tools DESeq2, Trinity, Stacks, and various Mothur tools) halfway through that takes 24 hours of queue time to reach. Now imagine a user running 5 such workflows at once.
The only drawback of this new default behavior versus the previous one is that scheduling multiple workflow invocations within one history in parallel could offer some performance improvement. But that was never a design goal of background scheduling as I implemented it, and under typical Galaxy use cases I don't think it would be worth the UI problems. So the older behavior can be re-enabled by setting parallelize_workflow_scheduling_within_histories to True in galaxy.ini, but it won't be on by default and isn't really recommended if the Galaxy UI is being used.
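Re-enabling the old behavior would then be a one-line config change. The exact section name below is an assumption (Galaxy's main app settings conventionally live under `[app:main]` in `galaxy.ini`); only the option name itself comes from this PR:

```ini
[app:main]
# Re-enable the pre-17.01 behavior: schedule multiple workflow
# invocations within one history in parallel. Not recommended when
# the Galaxy UI is in use, since outputs from concurrent invocations
# can interleave in the history.
parallelize_workflow_scheduling_within_histories = True
```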
Since it leverages the testing enhancements therein, this is built on #3659.
This may (probably does) incidentally fix #3818.
@nsoranzo I rebased with your suggestions - thanks as always for the detailed review.
@bgruening Great - I'll pull it out of WIP when I get your +1 or when I've done more testing myself. I have a test case in there that verifies the handler setting is correct, and I've reviewed the code a few times; I don't think a single handler would ever process two workflows at once, so I think this should all work. But I haven't yet observed a concrete instance of the problem and verified that this fixes it.
Having run this for a day now, it doesn't break anything further, but it still doesn't solve the non-persistent ordering of inputs within a single history.
We have 8 BAM files in a history and one workflow with one tool (rmdup). We execute this workflow on all inputs and get 8 ordered outputs (run on datasets 1-8). So far so good. During the execution of these 8 rmdup runs, we start 16 new workflows (8 initial BAMs, 8 in yellow state, i.e. currently running). The result is
@bgruening I thought the problem was ordering within a workflow invocation, not ordering of the workflow invocations themselves. I can make sure the workflows get processed in the handlers by ascending ID if this is a problem, though I don't know whether they are sent to the invocation endpoint in a fixed, ordered way. I can add some test cases for this, though.
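The "process by ascending ID" idea above can be sketched in a few lines. This is purely illustrative, not Galaxy code: `invocations_in_order` and the dict-shaped invocation records are hypothetical, but the point is that sorting a handler's ready invocations by ID makes earlier-submitted workflows schedule first regardless of arrival order.

```python
def invocations_in_order(invocations):
    """Return invocation records (dicts with an 'id' key) sorted by
    ascending ID, so earlier-created invocations are processed first."""
    return sorted(invocations, key=lambda inv: inv["id"])
```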
There were two problems: one inside a workflow and the other with multiple workflows. We spent the day testing larger workflows, running them multiple times in one history, and it seems this is working fine :)