update subworkflows for differential and functional enrichment analysis#11024
update subworkflows for differential and functional enrichment analysis#11024suzannejin merged 9 commits intomasterfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Updates nf-core subworkflows used by differential abundance + functional enrichment pipelines to align emitted outputs with an upcoming output migration in nf-core/differentialabundance.
Changes:
- Refactors
differential_functional_enrichmentGSEA wiring (CHIP handling + input reshaping) and adds new artifact output channels for gprofiler2 and GSEA. - Adds new
plotsandotheroutput channels toabundance_differential_filterand updates workflow wiring accordingly. - Updates
nf-testsnapshots to reflect the new output structure.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| subworkflows/nf-core/differential_functional_enrichment/main.nf | Adjusts GSEA channel construction and adds aggregated artifact outputs for gprofiler2/GSEA. |
| subworkflows/nf-core/differential_functional_enrichment/meta.yml | Documents new outputs (gprofiler2_artifacts, gsea_artifacts). |
| subworkflows/nf-core/differential_functional_enrichment/tests/main.nf.test.snap | Snapshot updates for new outputs and metadata ordering. |
| subworkflows/nf-core/abundance_differential_filter/main.nf | Adds plots + other channels and minor closure style tweaks. |
| subworkflows/nf-core/abundance_differential_filter/meta.yml | Documents the new outputs (plots, other). |
| subworkflows/nf-core/abundance_differential_filter/tests/main.nf.test.snap | Snapshot updates for new outputs and metadata. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
subworkflows/nf-core/differential_functional_enrichment/meta.yml
Outdated
Show resolved
Hide resolved
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…ore/modules into update-for-workflow-outputs
pinin4fjords
left a comment
There was a problem hiding this comment.
I'm very unconvinced by the channel combinations, can we leave them separate please to allow flexible usage? All you should really need to do here is add any outputs that currently don't go to channels, to support the workflow output syntax in the pipeline.
| GSEA_GSEA.out.rpt | ||
| .mix(GSEA_GSEA.out.index_html) | ||
| .mix(GSEA_GSEA.out.heat_map_corr_plot) | ||
| .mix(GSEA_GSEA.out.report_tsvs_ref) |
There was a problem hiding this comment.
Did you intend to mix these here as well as above?
| decoupler_png = DECOUPLER_DECOUPLER.out.png | ||
|
|
||
| // main results | ||
| results = ch_results |
There was a problem hiding this comment.
I'm unconvinced by outputting everything in a single channel this way. You're mixing channel 'arity' and forcing anyone who wants to access specific channels to sort though a heterogenous set. Can we leave these separate please, and defer any combinations to the workflow?
There was a problem hiding this comment.
yes.. i was trying to see if i can simplify, but didnt like it either. reverting back to the previous commit now.
|
e.g. here's how I prepared subworkflows for workflow output syntax: #9765 Essentially just making sure that everything was available via channels. No need to combine everything artificially into a small number of outputs. |
55f1d16 to
2957563
Compare
|
hi @pinin4fjords , sorry for the confusion, i was trying to see if extra simplification works, but it did not. |
| ch_plots = DESEQ2_DIFFERENTIAL.out.dispersion_plot_png | ||
| .mix(LIMMA_DIFFERENTIAL.out.md_plot) | ||
|
|
||
| ch_other = DESEQ2_DIFFERENTIAL.out.rdata |
There was a problem hiding this comment.
Sorry to be a pain, thinking about this we should probably emit disparate things separately, the 'other' doesn't really work. If I want the session info in a consuming workflow I don't want to have to filter it out.
I'm inclined to say the same about the plots unless they're equivalent plots (which I don't think is the case here).
There was a problem hiding this comment.
I splitted the output for the ABUNDANCE_DIFFERENTIAL_FILTER subworkflow.
Now, for the GSEA artifacts, you want to emit a different output for each of the 21 artifacts?
That's a lot of duplicated verbosity to deal with at all subworkflow, workflow levels.
There was a problem hiding this comment.
It just doesn't make sense to combine disparate things into channels. I agree the current implementation makes things awkward- it's why I put this PR on hold for rnaseq: nf-core/rnaseq#1679
Nextflow will have some improvements, notably record types, that may help this soon. It's fine to put this on pause until that happens if you feel the same here as I did for rnaseq.
There was a problem hiding this comment.
The migration here is mainly needed to solve this issue affecting pipeline cache.
That's why I wouldn't pause it for this PR. I'd vote to have a less perfect subworkflow output structure for the moment (it can always be adapted later on).
There was a problem hiding this comment.
Could you explain why separating the outputs would block that work?
There was a problem hiding this comment.
What meant is that pausing the migration will block that work, not splitting the output itself.
However, the entire idea about pipeline hub is that alternative methods (in this case functional enrichment) can be exchanged easily. This means adding a new method will ideally only require changes in the subworkflow, not the pipeline itself. By splitting outputs, instead of classifying outputs into the same category, will make this even more difficult.
But yes, i agree that having them in artifacts/other is also not ideal. I will split the outputs as temporal solution. With syntax improvements in the future this may be better addressed.
There was a problem hiding this comment.
When different methods have clear analogs, putting them in the same channel makes sense (perhaps after some normalisation of the structure- for future). But combining e.g. size factors and session info will never make sense at the component level.
03506ff to
729a7c3
Compare
| gprofiler2_all_enrich = GPROFILER2_GOST.out.all_enrich | ||
| gprofiler2_sub_enrich = GPROFILER2_GOST.out.sub_enrich | ||
| gprofiler2_plot_html = GPROFILER2_GOST.out.plot_html | ||
| gprofiler2_artifacts = GPROFILER2_GOST.out.plot_png |
There was a problem hiding this comment.
Same deal with these things. Arguably makes sense to group e.g. plots and reports, but we shouldn't be combining all the disparate things
There was a problem hiding this comment.
To be fair, I know why you want to do this- I wanted something similar when I opened nextflow-io/nextflow#6756, i.e. to have channels mapping directly to publish locations.
But we can't make assumptions in components about how pipelines will want to publish.
There was a problem hiding this comment.
ok nice it looks like with records, things will become easier.
Update subworkflows for differential and functional enrichment analysis.
This update is needed for the workflow output migration here
PR checklist
Closes #XXX
topic: versions- See version_topicslabelnf-core modules test <MODULE> --profile dockernf-core modules test <MODULE> --profile singularitynf-core modules test <MODULE> --profile condanf-core subworkflows test <SUBWORKFLOW> --profile dockernf-core subworkflows test <SUBWORKFLOW> --profile singularitynf-core subworkflows test <SUBWORKFLOW> --profile conda