-
Notifications
You must be signed in to change notification settings - Fork 950
Update differential subworkflows to take list of contrasts as input #8346
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ning differential subworkflow
…re/modules into update_differential_method
|
@pinin4fjords hello!! Some minor changes here, just changed the subworkflows to take in list of contrasts as required by the new pipeline structure |
pinin4fjords
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Some points. I haven't replicated for the other subworkflow, but you'll see what I mean.
| def meta_input_new = meta_input + [ 'method_differential': analysis_method ] | ||
| def criteria = multiMapCriteria { meta, abundance, analysis_method, fc_threshold, stat_threshold, samplesheet, transcript_length, control_features, contrast, variable, reference, target, formula, comparison -> | ||
| def meta_with_method = meta + [ 'differential_method': analysis_method ] | ||
| def meta_for_diff = meta_with_method + contrast |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| def meta_for_diff = meta_with_method + contrast | |
| def meta_for_diff = meta_with_method + meta_contrast |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably keep the mergeMaps usage? Realise I didn't use it in #448, but that was probably a mistake. We still need to concatenate e.g. the study-wise ID with the contrast ID, rather than just overwriting the former with the latter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i actually don't like the fact it will merge for everything though.
If you want, I can modify the function so that it only merge id, and raise error if other elements are repeated.
I think it should be explicitly handled by the user instead of the automatic merging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think silent over-writing of common keys (as this version will do) is better? I do not. I'd rather they were concatentated, as at least information is not lost.
| } | ||
|
|
||
| // create joined channel with samplesheet and optional transcript lengths and control features | ||
| ch_samplesheet_with_control = ch_samplesheet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we just completely forget about handling these correctly before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems so...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, before we basically implemented the subworkflow assuming that the pipeline will always only provide one ch_samplesheet/ch_contrast etc. It only parallelizes sets of different methods.
Now, we wanted the pipeline to be able to paralellize any rows of full params, hence the subworkflow assume now any input channel can have various rows, and need to be join/combine relying on the meta.
| .combine( | ||
| // take the first contrast for each common meta | ||
| ch_contrasts.map { meta, contrast, variable, reference, target, formula, comparison -> | ||
| [meta, contrast[0], variable[0], reference[0], target[0], formula[0], comparison[0]] | ||
| } | ||
| , by:0 | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| .combine( | |
| // take the first contrast for each common meta | |
| ch_contrasts.map { meta, contrast, variable, reference, target, formula, comparison -> | |
| [meta, contrast[0], variable[0], reference[0], target[0], formula[0], comparison[0]] | |
| } | |
| , by:0 | |
| ) | |
| .join(ch_contrasts.transpose(), by:0) |
If we use a join here after the transpose, instead of a combine, that should give us the first matching contrast more simply, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the thing is that the subworkflow can potentially receive various ch_contrats with different metas (if rows in the toolsheet also allow changing input data etc).
So we do need to take the first for each common meta.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure you quite understood. Unless I'm having a senior moment (it happens), these two sets of logic are functionally identical, mine is just simpler.
You have a map which extracts the first element of the collapsed contrasts, prior to doing a combine 'with key' (the by) part. This is effectively an outer join, made to function like an inner join by moving all items on the right to a single contrast for a given key.
In my version I 'uncollapse' the contrasts using transpose(), and then do an inner join with join. Since there will be multiple keys on the right with the same key value, this will have the effect of taking the 'first' matching one, replicating your logic. It's the same deal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, did not see it before!
Thank you for explaining, i will change it
|
just applied everything you suggested :) |
pinin4fjords
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, looks good i think
…f-core#8346) * update abundance_differential_filter * round limma digits for consistency * update test configs and snapshots * update enrichment workflow. Still need to solve bug when multiply running differential subworkflow * fix bug (abundance_differential_filter): proper multiply combine norm channels * add comments * update functional subworkflow snapshot * update meta.yaml * update meta.yaml * apply suggestions * add back mergeMaps
PR checklist
Closes #XXX
versions.ymlfile.labelnf-core modules test <MODULE> --profile dockernf-core modules test <MODULE> --profile singularitynf-core modules test <MODULE> --profile condanf-core subworkflows test <SUBWORKFLOW> --profile dockernf-core subworkflows test <SUBWORKFLOW> --profile singularitynf-core subworkflows test <SUBWORKFLOW> --profile conda