Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Subworkflow for phylogenetic placement: fasta_newick_epang_gappa #2856

Merged
merged 22 commits into from
Feb 13, 2023

Conversation

erikrikarddaniel
Copy link
Member

PR checklist

This adds a subworkflow that runs phylogenetic placement with EPA-NG and Gappa.

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the module conventions in the contribution docs
  • If necessary, include test data in your PR.
  • Remove all TODO statements.
  • Emit the versions.yml file.
  • Follow the naming conventions.
  • Follow the parameters requirements.
  • Follow the input/output options guidelines.
  • Add a resource label
  • Use BioConda and BioContainers if possible to fulfil software requirements.
  • Ensure that the test works with either Docker / Singularity. Conda CI tests can be quite flaky:
    • PROFILE=docker pytest --tag <MODULE> --symlink --keep-workflow-wd --git-aware
    • PROFILE=singularity pytest --tag <MODULE> --symlink --keep-workflow-wd --git-aware
    • PROFILE=conda pytest --tag <MODULE> --symlink --keep-workflow-wd --git-aware

Copy link
Contributor

@d4straub d4straub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! But this is the first subworkflow that I review, probably a second opinion would be good as well.

workflow FASTA_NEWICK_EPANG_GAPPA {

take:
ch_pp_data // channel: [ meta: val(meta), data: [ alignmethod: val(alignmethod), queryseqfile: file(queryseqfile), refseqfile: file(refseqfile), refphylogeny: file(refphylogeny), hmmfile: file(hmmfile), model: val(model) ] ]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's preffered to split this channel into multiple input channels that will get merged in the subworkflow. This reduces the complexity for the user too (so you don't have to use arrays in arrays etc)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that that's difficult to synchronise if the pieces come from different sources, i.e. to make sure that they come in the same order. (At least I don't know how to do that.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can join the channels on meta. Just gotta make sure that all channels that have to be merged do contain exactly the same meta. (You can use the .join() operator for this: https://www.nextflow.io/docs/latest/operator.html#join)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. I'd like to give some more context. The workflow is called by my phyloplace pipeline (it's basically the whole pipeline). The sample sheet for the pipeline, parsed with splitCsv(), is turned into a single channel with this structure. If I were to give the subworkflow separate channels, I'd have to split the channel before calling the subworkflow, then join() them in the subworkflow.

I chose the map structure for the channel because I personally prefer to have named inputs when the number of inputs is large.

I plan to include the subworkflow also into ampliseq. I don't foresee any problems calling it with this structure. (Of course it should be documented though. That was an oversight of mine mostly because I'm not so familiar with the documentation format.)

Moreover, I was recently advised to merge channels into one when submitting a module. An advise that I later was grateful for, because I found it more explicit to join() before calling the module rather than the module doing that for me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I agree that the input channel here is a little more complex than for other workflows, I find it quite intuitive due to the naming. I am fine with it as is. @nvnieuwk is there a rule for input channels?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aaah okay the context is very clear, thanks! You can leave it as it is :)

Copy link
Contributor

@d4straub d4straub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure my suggestions are perfect, but at least to me it makes understanding much easier.

erikrikarddaniel and others added 2 commits February 10, 2023 12:16
Co-authored-by: Daniel Straub <42973691+d4straub@users.noreply.github.com>
Co-authored-by: Daniel Straub <42973691+d4straub@users.noreply.github.com>
@erikrikarddaniel
Copy link
Member Author

Not sure my suggestions are perfect, but at least to me it makes understanding much easier.

Thank you, much better!

@erikrikarddaniel
Copy link
Member Author

Thank you both @nvnieuwk and @d4straub !

@erikrikarddaniel erikrikarddaniel merged commit 6ad90f5 into nf-core:master Feb 13, 2023
@erikrikarddaniel erikrikarddaniel deleted the phyloplace-swf branch February 13, 2023 09:49
FloWuenne pushed a commit to FloWuenne/modules that referenced this pull request Feb 14, 2023
…f-core#2856)

* Generated subworkflow from template

* Started

* Continued work

* Rename, add tests

* Prettier, remove extra tag

* Remove trailing whitespace

* Remove quotes to hopefully fix prettier issue

* Remove checks for md5sums that frequently fail

* It whas the colon not the quotes of course

* Update subworkflows/nf-core/fasta_newick_epang_gappa/meta.yml

Co-authored-by: Daniel Straub <42973691+d4straub@users.noreply.github.com>

* Improved documentation of input

* Better documentation of input channel

* Remoed two remaining md5sum tests for versions.yml

* Update subworkflows/nf-core/fasta_newick_epang_gappa/meta.yml

Co-authored-by: Daniel Straub <42973691+d4straub@users.noreply.github.com>

* Update subworkflows/nf-core/fasta_newick_epang_gappa/meta.yml

Co-authored-by: Daniel Straub <42973691+d4straub@users.noreply.github.com>

---------

Co-authored-by: Daniel Straub <42973691+d4straub@users.noreply.github.com>
sam-white04 pushed a commit to fulcrumgenomics/nf-core-modules that referenced this pull request Feb 27, 2023
…f-core#2856)

* Generated subworkflow from template

* Started

* Continued work

* Rename, add tests

* Prettier, remove extra tag

* Remove trailing whitespace

* Remove quotes to hopefully fix prettier issue

* Remove checks for md5sums that frequently fail

* It whas the colon not the quotes of course

* Update subworkflows/nf-core/fasta_newick_epang_gappa/meta.yml

Co-authored-by: Daniel Straub <42973691+d4straub@users.noreply.github.com>

* Improved documentation of input

* Better documentation of input channel

* Remoed two remaining md5sum tests for versions.yml

* Update subworkflows/nf-core/fasta_newick_epang_gappa/meta.yml

Co-authored-by: Daniel Straub <42973691+d4straub@users.noreply.github.com>

* Update subworkflows/nf-core/fasta_newick_epang_gappa/meta.yml

Co-authored-by: Daniel Straub <42973691+d4straub@users.noreply.github.com>

---------

Co-authored-by: Daniel Straub <42973691+d4straub@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants