Feat dnascope #180

Merged
merged 30 commits into from
May 18, 2022

ramprasadn
Collaborator

PR checklist

This PR adds Sentieon's DNAModelApply and DNAscope modules, places them in a subworkflow, and makes related changes elsewhere in the workflow.

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
    • If you've added a new tool - have you followed the pipeline conventions in the contribution docs?
    • If necessary, also make a PR on the nf-core/raredisease branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR> -stub).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).


github-actions bot commented May 17, 2022

nf-core lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 1f91aaa

✅ 141 tests passed
❔ 6 tests were ignored
❗ 14 tests had warnings

❗ Test warnings:

  • readme - README did not have a Nextflow minimum version badge.
  • pipeline_todos - TODO string in README.md: Add full-sized test dataset and amend the paragraph below if applicable
  • pipeline_todos - TODO string in README.md: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file.
  • pipeline_todos - TODO string in README.md: Add bibliography of tools and data used in your pipeline
  • pipeline_todos - TODO string in base.config: Check the defaults for all processes
  • pipeline_todos - TODO string in base.config: Customise requirements for specific processes.
  • pipeline_todos - TODO string in test_full.config: Specify the paths to your full test data ( on nf-core/test-datasets or directly in repositories, e.g. SRA)
  • pipeline_todos - TODO string in test_full.config: Give any required params for the test so that command line flags are not needed
  • pipeline_todos - TODO string in test.config: Give any required params for the test so that command line flags are not needed
  • pipeline_todos - TODO string in usage.md: Add documentation about anything specific to running your pipeline. For general topics, please point to (and add to) the main nf-core website.
  • pipeline_todos - TODO string in output.md: Write this documentation describing your workflow's output
  • pipeline_todos - TODO string in WorkflowMain.groovy: Add Zenodo DOI for pipeline after first release
  • pipeline_todos - TODO string in awsfulltest.yml: You can customise AWS full pipeline tests as required
  • pipeline_todos - TODO string in ci.yml: You can customise CI pipeline run tests as required

❔ Tests ignored:

  • files_unchanged - File ignored due to lint config: .github/CONTRIBUTING.md
  • files_unchanged - File ignored due to lint config: .github/ISSUE_TEMPLATE/bug_report.yml
  • files_unchanged - File ignored due to lint config: .github/PULL_REQUEST_TEMPLATE.md
  • files_unchanged - File ignored due to lint config: .github/workflows/branch.yml
  • files_unchanged - File ignored due to lint config: .github/workflows/linting_comment.yml
  • files_unchanged - File ignored due to lint config: .github/workflows/linting.yml

✅ Tests passed:

Run details

  • nf-core/tools version 2.4.1
  • Run at 2022-05-18 14:10:16

ramprasadn marked this pull request as ready for review May 17, 2022 08:42
ramprasadn requested a review from jemten May 17, 2022 14:11
ramprasadn requested a review from sima-r May 17, 2022 14:11
ramprasadn mentioned this pull request May 17, 2022
10 tasks
Collaborator

jemten left a comment


Great work @ramprasadn and @sima-r. I have a few small questions, but otherwise I think it looks good. Feel free to choose whether to address them 😄

@@ -28,11 +30,13 @@ params {
'GRCh38' {
fasta = "${params.local_genomes}/grch38_homo_sapiens_-assembly-.fasta"
fai = "${params.local_genomes}/grch38_homo_sapiens_-assembly-.fasta.fai"
bwa = "${params.local_genomes}/bwa/"
Collaborator


It would be nice to use the same format for the bwa and bwamem2 indices.
Could you do "${params.local_genomes}/grch38_homo_sapiens_-assembly-.fasta.{amb,ann,bwt,pac,sa}"?
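Nextflow's file globbing supports {a,b,...} alternatives, so a brace pattern like the one suggested stands for one filename per index extension. As a rough illustration only, the Python sketch below enumerates the names such a pattern covers. Unlike Nextflow it never touches the filesystem, so it only shows which files a pattern would match; expand_braces is a hypothetical helper and /ref is a hypothetical stand-in for params.local_genomes.

```python
import re
from itertools import product


def expand_braces(pattern: str) -> list[str]:
    """List the names a glob-style {a,b,...} pattern stands for.

    Handles any number of non-nested brace groups. It does not check that
    the files exist; it only shows which names the pattern would match.
    """
    # Split on brace groups, keeping the groups themselves (capturing split).
    parts = re.split(r"(\{[^{}]*\})", pattern)
    # Literal text has one choice; a brace group has one choice per comma item.
    choices = [
        part[1:-1].split(",") if part.startswith("{") and part.endswith("}") else [part]
        for part in parts
    ]
    # The cartesian product of all choices gives every expanded name.
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    prefix = "/ref/grch38_homo_sapiens_-assembly-.fasta"  # hypothetical path
    for name in expand_braces(prefix + ".{amb,ann,bwt,pac,sa}"):
        print(name)
```

Applied to the bwamem2 pattern in the same config, the same expansion yields the .0123, .amb, .ann, .bwt.2bit.64 and .pac files, which is why using one brace style for both indices reads consistently.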

modules/local/sentieon/dnascope.nf
--algo DNAModelApply \\
--model $ml_model \\
-v $vcf \\
${prefix}_dnascope_ml.vcf
Collaborator

jemten May 17, 2022


Is it possible to have this produce bgzipped output?

Collaborator Author


It is possible: Sentieon uses a separate tool (util vcfconvert) for compression. However, that tool doesn't accept input from stdin, so piping is not an option and compression would have to be a separate step in the pipeline.

Also, I didn't focus much on compression here because the VCF files generated at this step have to be combined in the next step into a single multisample VCF for duos and trios. Since we will only publish that multisample VCF, it made sense to compress it directly rather than the individual sample VCFs. What do you think?

Collaborator Author


Just adding .gz to the file extension did the trick! 😄
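Tabix indexing needs BGZF (blocked gzip), not plain gzip, so after switching the output name to .vcf.gz it can be worth confirming what was actually written. A minimal Python sketch, assuming the header layout from the BGZF section of the SAM/BAM specification: a gzip header with the FEXTRA flag set and a "BC" extra subfield whose data length is 2. It also assumes the BC subfield is the first extra subfield, as htslib writes it. The helper names are hypothetical and not part of the pipeline.

```python
import struct

# A BGZF block begins with gzip magic (1f 8b), CM=8 (deflate), FLG=4 (FEXTRA),
# then MTIME(4) XFL(1) OS(1) XLEN(2), then the 'BC' subfield with SLEN=2.
_GZIP_FEXTRA_PREFIX = b"\x1f\x8b\x08\x04"


def is_bgzf_header(data: bytes) -> bool:
    """Return True if `data` starts with a BGZF block header.

    Assumes the 'BC' subfield is the first extra subfield (as htslib writes it).
    """
    if len(data) < 18 or not data.startswith(_GZIP_FEXTRA_PREFIX):
        return False
    (xlen,) = struct.unpack("<H", data[10:12])  # total length of the extra field
    (slen,) = struct.unpack("<H", data[14:16])  # length of the BC subfield data
    return xlen >= 6 and data[12:14] == b"BC" and slen == 2


def is_bgzf(path: str) -> bool:
    """Check the first 18 bytes of a file on disk."""
    with open(path, "rb") as handle:
        return is_bgzf_header(handle.read(18))
```

For example, is_bgzf on a hypothetical sample_dnascope.vcf.gz returns True for bgzipped output and False for a file written with plain gzip, whose header lacks the FEXTRA flag.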

$dbsnp \\
$args2 \\
$model \\
${prefix}_dnascope.vcf
Collaborator


Same question about bgzip.

Contributor

sima-r left a comment


I can add the tabix/bgzip steps to the call_snv_sentieon subworkflow.

bwamem2 = "${params.local_genomes}/grch38_homo_sapiens_-assembly-.fasta.{0123,amb,ann,bwt.2bit.64,pac}"
call_interval = ""
Contributor


You have a call_interval variable here; I guess it is meant to hold a BED file? Would it be OK to keep it here if one wants to specify only particular chromosomes?

Collaborator Author


The problem is that if I treat the input as a value and the user passes a path (file), the file won't get staged in the work directory. If I instead assume the input is a file, I can create a path-type channel, which stages the file in the work directory by default. One can always put the chromosomes of interest in a file, which is why I chose the latter.
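Putting the chromosomes of interest in a file can be done from the reference's .fai index, whose first two tab-separated columns are the sequence name and its length. A hypothetical Python helper, not part of the pipeline, that produces whole-chromosome BED intervals. BED coordinates are 0-based and half-open, so a full chromosome spans 0 to its length. That call_interval expects BED is an assumption taken from the reviewer's reading above, not something the module confirms.

```python
def bed_from_fai(fai_text: str, keep: set[str]) -> str:
    """Build BED lines covering the whole of each chromosome listed in `keep`.

    `fai_text` is the content of a samtools-style .fai index with tab-separated
    columns: name, length, offset, linebases, linewidth.
    The output keeps the order of the .fai file.
    """
    bed_lines = []
    for row in fai_text.splitlines():
        if not row.strip():
            continue  # skip blank lines
        name, length = row.split("\t")[:2]
        if name in keep:
            # BED is 0-based, half-open: [0, length) covers the whole sequence.
            bed_lines.append(f"{name}\t0\t{int(length)}")
    return "".join(line + "\n" for line in bed_lines)
```

Writing the returned string to a file and passing that path as call_interval keeps the input a real file, so the path-type channel described above can stage it.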


script:
def args = task.ext.args ?: ''
def args2 = task.ext.args2 ?: ''
Contributor


What happened to args3, which returned the call_intervals variable? Maybe you wanted to leave it for site-specific configs?

Contributor


Oh, I see what the change was; never mind this comment! :D

Collaborator Author


Actually, I realized that having call_intervals in modules.config will not work if the user gives a file as input, so I have moved it back into the module.

conf/genomes.config (outdated)
Co-authored-by: Anders Jemt <jemten@users.noreply.github.com>
ramprasadn merged commit b25228f into nf-core:dev May 18, 2022
ramprasadn deleted the feat-dnascope branch May 18, 2022 14:17
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants