-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat dnascope #180
Feat dnascope #180
Conversation
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work @ramprasadn and @sima-r. Some small questions but I think it looks good otherwise. Choose if you want to address them 😄
conf/genomes.config
Outdated
@@ -28,11 +30,13 @@ params { | |||
'GRCh38' { | |||
fasta = "${params.local_genomes}/grch38_homo_sapiens_-assembly-.fasta" | |||
fai = "${params.local_genomes}/grch38_homo_sapiens_-assembly-.fasta.fai" | |||
bwa = "${params.local_genomes}/bwa/" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice to have the same format on bwa and the bwamem2 index.
Could you do "${params.local_genomes}/grch38_homo_sapiens_-assembly-.fasta.{amb,ann,bwt,pac,sa}"
?
--algo DNAModelApply \\ | ||
--model $ml_model \\ | ||
-v $vcf \\ | ||
${prefix}_dnascope_ml.vcf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to have this produce bgzipped output?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is possible; sentieon uses a separate module (util vcfconvert) for compression. However, that module doesn't accept input from terminal so using pipe is not an option, and so it has to be a separate step in the pipeline.
Also, I didn't focus so much on the compression here because vcf files generated here will have to be combined in the next step to generate a single multisample vcf in case of duos and trios. Since we will only publish that multisample vcf, it made sense to compress that file directly and not these individual sample vcfs. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just adding .gz to file extension did the trick! 😄
modules/local/sentieon/dnascope.nf
Outdated
$dbsnp \\ | ||
$args2 \\ | ||
$model \\ | ||
${prefix}_dnascope.vcf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same question about bgzip
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can add the tabix/bgzip in the call_snv_sentieon subworkflow.
bwamem2 = "${params.local_genomes}/grch38_homo_sapiens_-assembly-.fasta.{0123,amb,ann,bwt.2bit.64,pac}" | ||
call_interval = "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have call_interval variable here assuming this holding a bed file I guess? Would it be ok to have it here if one wants to give only some particular chromosomes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem here is that if I assume they are going to send in a value and they use a path(file), then the file won't get staged in the work directory. But if I assume then the input is going to be a file, I can create a path-type channel, which will stage the file in the work directory by default. And one can always put the chromosomes they are interested in, in a file. That's why I chose to do the latter.
modules/local/sentieon/dnascope.nf
Outdated
|
||
script: | ||
def args = task.ext.args ?: '' | ||
def args2 = task.ext.args2 ?: '' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happened to args3 which returned call_inretvals var? Maybe you wanted to leave it for sites' config?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh I see what was the change, never mind this comment! :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I realized having call_intervals in modules.config will not work if the user gives a file as an input so I have moved it back into the module..
Co-authored-by: Anders Jemt <jemten@users.noreply.github.com>
PR checklist
This PR adds sentieon's dnamodelapply and dnascope modules, places them in a subworkflow, and makes relevant changes in other parts of the workflow.
nf-core lint
).nextflow run . -profile test,docker --outdir <OUTDIR> -stub
).docs/usage.md
is updated.docs/output.md
is updated.CHANGELOG.md
is updated.README.md
is updated (including new tool citations and authors/contributors).