Subworkflow for phylogenetic placement: fasta_newick_epang_gappa
#2856
Looks good to me! But this is the first subworkflow I have reviewed, so a second opinion would probably be good as well.
workflow FASTA_NEWICK_EPANG_GAPPA {

    take:
    ch_pp_data // channel: [ meta: val(meta), data: [ alignmethod: val(alignmethod), queryseqfile: file(queryseqfile), refseqfile: file(refseqfile), refphylogeny: file(refphylogeny), hmmfile: file(hmmfile), model: val(model) ] ]
It's preferred to split this channel into multiple input channels that get merged in the subworkflow. This reduces the complexity for the user too (so you don't have to use arrays in arrays etc.).
The problem is that that's difficult to synchronise if the pieces come from different sources, i.e. to make sure that they come in the same order. (At least I don't know how to do that.)
You can join the channels on meta. Just gotta make sure that all channels that have to be merged contain exactly the same meta. (You can use the .join() operator for this: https://www.nextflow.io/docs/latest/operator.html#join)
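For illustration, a minimal sketch of joining two channels on meta. The channel names, sample ids, and file names are hypothetical placeholders, not taken from this PR, and plain strings stand in for files so the snippet runs without any input data.

```nextflow
// Hypothetical example: names and values below are placeholders.
workflow {
    ch_queryseq = Channel.of(
        [ [ id: 'sample1' ], 'query1.faa' ],
        [ [ id: 'sample2' ], 'query2.faa' ]
    )

    // Emitted in a different order on purpose: join() matches on the key,
    // not on the position of an item in the channel.
    ch_refseq = Channel.of(
        [ [ id: 'sample2' ], 'ref2.alnfaa' ],
        [ [ id: 'sample1' ], 'ref1.alnfaa' ]
    )

    ch_queryseq
        .join(ch_refseq)   // by default the first element (here the meta map) is the key
        .view()            // each item has the form [ meta, query, ref ]
}
```

The match only works if the meta maps are identical in every channel. An item whose meta has an extra or differing key finds no partner and is dropped by default.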
Right. I'd like to give some more context. The workflow is called by my phyloplace pipeline (it's basically the whole pipeline). The sample sheet for the pipeline, parsed with splitCsv(), is turned into a single channel with this structure. If I were to give the subworkflow separate channels, I'd have to split the channel before calling the subworkflow, then join() them in the subworkflow.
I chose the map structure for the channel because I personally prefer to have named inputs when the number of inputs is large.
I plan to include the subworkflow also into ampliseq. I don't foresee any problems calling it with this structure. (Of course it should be documented though. That was an oversight of mine mostly because I'm not so familiar with the documentation format.)
Moreover, I was recently advised to merge channels into one when submitting a module. I was later grateful for that advice, because I found it more explicit to join() before calling the module rather than having the module do that for me.
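As a rough sketch of what that could look like on the pipeline side, assuming a CSV sample sheet whose columns match the keys of the data map, plus a hypothetical `sample` column and a hypothetical `params.input` parameter pointing at the sheet (neither is confirmed by this PR):

```nextflow
// Sketch only. Assumed sample sheet header:
// sample,alignmethod,queryseqfile,refseqfile,refphylogeny,hmmfile,model
workflow {
    Channel.fromPath(params.input)      // params.input is a hypothetical parameter
        .splitCsv(header: true)         // one map per row, keyed by column name
        .map { row ->
            [
                meta: [ id: row.sample ],
                data: [
                    alignmethod:  row.alignmethod,
                    queryseqfile: file(row.queryseqfile),
                    refseqfile:   file(row.refseqfile),
                    refphylogeny: file(row.refphylogeny),
                    // Assumption: the HMM column may be left empty in the sheet.
                    hmmfile:      row.hmmfile ? file(row.hmmfile) : [],
                    model:        row.model
                ]
            ]
        }
        .set { ch_pp_data }

    ch_pp_data.view()
}
```

Because every row becomes one item carrying all of its inputs by name, nothing has to be synchronised across channels afterwards.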
While I agree that the input channel here is a little more complex than for other workflows, I find it quite intuitive due to the naming. I am fine with it as is. @nvnieuwk is there a rule for input channels?
Aaah okay the context is very clear, thanks! You can leave it as it is :)
Co-authored-by: Daniel Straub <42973691+d4straub@users.noreply.github.com>
…modules into phyloplace-swf
Not sure my suggestions are perfect, but at least to me they make it much easier to understand.
Co-authored-by: Daniel Straub <42973691+d4straub@users.noreply.github.com>
Co-authored-by: Daniel Straub <42973691+d4straub@users.noreply.github.com>
Thank you, much better!
…f-core#2856)
* Generated subworkflow from template
* Started
* Continued work
* Rename, add tests
* Prettier, remove extra tag
* Remove trailing whitespace
* Remove quotes to hopefully fix prettier issue
* Remove checks for md5sums that frequently fail
* It whas the colon not the quotes of course
* Update subworkflows/nf-core/fasta_newick_epang_gappa/meta.yml
  Co-authored-by: Daniel Straub <42973691+d4straub@users.noreply.github.com>
* Improved documentation of input
* Better documentation of input channel
* Remoed two remaining md5sum tests for versions.yml
* Update subworkflows/nf-core/fasta_newick_epang_gappa/meta.yml
  Co-authored-by: Daniel Straub <42973691+d4straub@users.noreply.github.com>
* Update subworkflows/nf-core/fasta_newick_epang_gappa/meta.yml
  Co-authored-by: Daniel Straub <42973691+d4straub@users.noreply.github.com>
---------
Co-authored-by: Daniel Straub <42973691+d4straub@users.noreply.github.com>
PR checklist
This adds a subworkflow that runs phylogenetic placement with EPA-NG and Gappa.
Emit the versions.yml file.
Add a resource label
PROFILE=docker pytest --tag <MODULE> --symlink --keep-workflow-wd --git-aware
PROFILE=singularity pytest --tag <MODULE> --symlink --keep-workflow-wd --git-aware
PROFILE=conda pytest --tag <MODULE> --symlink --keep-workflow-wd --git-aware