Resume should work with list-returning operators #1475
Comments
A minimal example to replicate the problem is:
process foo {
    input: val(x) from (1,2,3)
    output: val(x) into ch
    /echo $x/
}
process bar {
    echo true
    input: val(y) from ch.collate(3)
    /your_command --items $y/
}
I see your issue with the caching, however this is a subtle point because it really depends if the |
In my case the command does not care about the order of collated elements. I assume this is a common scenario, as collate depends on channel ordering, which is unpredictable/random as far as the user is concerned. When I checked, it seemed to me that the resumed process even collated elements in a different order, so I ended up with different 'collate' buckets. I will check my findings and either post an example here or retract the statement, hopefully later this week. |
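To make the distinction concrete, here is a minimal plain-Python sketch (not Nextflow code, and not a description of how Nextflow actually computes task hashes). It assumes, purely for illustration, that a cache key is derived from the input list. The helper names `ordered_key` and `unordered_key` are hypothetical.

```python
import hashlib

def ordered_key(items):
    """Key that changes whenever the element order changes."""
    joined = "\x00".join(str(i) for i in items)
    return hashlib.sha256(joined.encode()).hexdigest()

def unordered_key(items):
    """Key that ignores element order by sorting the string forms first."""
    return ordered_key(sorted(str(i) for i in items))

first_run = [1, 2, 3]
resumed_run = [3, 1, 2]  # same elements, different arrival order

# An order-sensitive key misses the cache; an order-insensitive key hits it.
same_ordered = ordered_key(first_run) == ordered_key(resumed_run)
same_unordered = unordered_key(first_run) == unordered_key(resumed_run)
print(same_ordered, same_unordered)
```

An order-insensitive key is only appropriate when the command itself does not care about element order, which is the case described above.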
A quick workaround should be:
process bar {
    echo true
    input: val(y) from ch.collate(3).toSortedList()
    /your_command --items $y/
}
|
With this code:
I get:
The issue that hampers me is that the collate acts on a different order entirely after the resumption.
So it changes the logic of the process. |
As a side note, I use collate and metafile approaches to reduce the load on the file system. I will soon have workflows with close to 100k samples; in many workflows it helps a lot if I do some mini-batching internally (outside NF knowledge) using collate/metafile-type approaches. Whilst not perhaps philosophically pure, it saves a large amount of headaches and time, and gets more out of the resources we have. |
Sorry,
Maybe the best choice is 3, showing a warning if the elements cannot be sorted and falling back on the old behaviour. |
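The "sort, and warn with a fallback if the elements cannot be sorted" idea can be sketched in plain Python. This illustrates the behaviour being proposed, not Nextflow's implementation; `sort_or_fallback` is a hypothetical name.

```python
import warnings

def sort_or_fallback(items):
    """Return items sorted; if they cannot be compared, warn and keep arrival order."""
    try:
        return sorted(items)
    except TypeError:
        # Mixed or non-comparable elements: fall back on the old (unsorted) behaviour.
        warnings.warn("elements are not sortable; keeping arrival order")
        return list(items)

print(sort_or_fallback([3, 1, 2]))
```

With sortable elements the result no longer depends on arrival order. With non-comparable elements the caller gets a warning and the previous behaviour.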
Can I verify one thing: There are two elements to this as far as I can see: the channel order and the chunk/bag sorting. You write about the sorting of the output, but I believe the bigger problem is the order of the channel input. The change of channel order leads to the issue that chunks become different. For example compare these outputs for two runs, the second with
You can see that collate chunks (viewed as bags) are defined by the channel order, and it is different between runs. The solutions above seem to focus on the ordering within each individual output bag, rather than on the fact that a change in channel ordering changes the bag definitions themselves. |
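The point about bag definitions can be reproduced with a small plain-Python simulation (not Nextflow code). Here `collate` is a hypothetical stand-in that chunks a list in arrival order, and the two arrival orders are invented for illustration.

```python
def collate(items, size):
    """Split items into consecutive chunks of `size`, in arrival order."""
    items = list(items)
    return [items[i:i + size] for i in range(0, len(items), size)]

def as_bags(chunks):
    """View each chunk as an unordered bag (assumes unique elements)."""
    return {frozenset(chunk) for chunk in chunks}

run_1 = [1, 2, 3, 4, 5, 6]  # arrival order in the first run
run_2 = [4, 1, 6, 2, 5, 3]  # same elements, different arrival order on resume

# Sorting inside each chunk cannot help: the chunk membership itself differs.
bags_1 = as_bags(collate(run_1, 3))
bags_2 = as_bags(collate(run_2, 3))
print(bags_1 == bags_2)

# Sorting the whole channel before chunking makes the bags reproducible,
# but it requires every element to be available before chunking starts.
stable_1 = as_bags(collate(sorted(run_1), 3))
stable_2 = as_bags(collate(sorted(run_2), 3))
print(stable_1 == stable_2)
```

The trade-off is that sorting the whole channel first blocks until the channel is complete, so it only suits cases where all inputs are ready before collating.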
Then any of the above will solve your use case; let's discuss a bit more on Gitter. |
Two comments, one question. (1) I've experienced/finally realised that (2) In my current pipeline the channel that needs collating has all its inputs ready. This allows
This approach seems to work on the small test case above. I assume there is no issue with that, not even when there are 100k elements in the channel ... I'll go and test 56k elements now. |
No, The problem of |
ok, I misunderstood. Yes I would want to avoid that, but I found that in one particular pipeline it is actually OK to use |
Documenting here that e.g.
bar will be re-run due to random ordering of |
I am having an issue with
Here is part of my workflow:
workflow {
    main:
    Channel.of('tumour', 'control') | merge(Channel.of(params.tumour_frac, params.control_frac)) \
        | merge(Channel.fromPath([params.tumour_bam, params.control_bam])) \
        | subsample \
        | sortBamByName \
        | bamToFastq \
        | set { subsampledFastqs }
    // CRASH HAPPENED IN mapFastq
    subsampledFastqs | mapFastq | set { rawMappedBams }
    [...]
The problem is that even though the two merges are entirely deterministic and all steps up to and including bamToFastq completed for the two files, nextflow keeps resuming from |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
Hi, Is there any update on what's the solution to getting collate to play nicely with resume? |
Hello, I am also facing an issue because of the channel order. Has there been any solution to handle this case? |
I can think of two possible solutions:
|
Bug report
Expected behavior and actual behavior
A pipeline may restart a process when resuming if the process uses collate() or another list-returning operator.
Steps to reproduce the problem
Use a process that uses collate() on a sufficiently large input channel. (Can construct example if needed, but we discussed this on gitter). Run the workflow, then run it again with -resume.