Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable GATK filters when joint calling #1050

Merged
merged 2 commits into from Jun 1, 2023

Conversation

amizeranschi
Copy link
Contributor

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs- [ ] If necessary, also make a PR on the nf-core/sarek branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@asp8200
Copy link
Contributor

asp8200 commented May 30, 2023

With the change introduced by this PR, the pipeline will "silently" skip the haplotypecaller-filter on joint-germline runs, right? I'm just wondering whether it would be better to stop the pipeline (with meaningful err msg) if the user tries to run joint-germline with filtering or at least inform the user that the filtering is being skipped 🤔

Here are some somewhat related scenarios where the pipeline issues a warning:

[log.warn "If Haplotypecaller is specified, without --dbsnp or --known_indels no filtering will be done. For filtering, please provide at least one of --dbsnpor--known_indels`. ....](

sarek/workflows/sarek.nf

Lines 113 to 124 in c87f4eb

if (params.tools && params.tools.split(',').contains('haplotypecaller')){
log.warn "If Haplotypecaller is specified, without `--dbsnp` or `--known_indels no filtering will be done. For filtering, please provide at least one of `--dbsnp` or `--known_indels`.\nFor more information see FilterVariantTranches (single-sample, default): https://gatk.broadinstitute.org/hc/en-us/articles/5358928898971-FilterVariantTranches\nFor more information see VariantRecalibration (--joint_germline): https://gatk.broadinstitute.org/hc/en-us/articles/5358906115227-VariantRecalibrator\nFor more information on GATK Best practice germline variant calling: https://gatk.broadinstitute.org/hc/en-us/articles/360035535932-Germline-short-variant-discovery-SNPs-Indels-"
}
}
if (params.joint_germline && (!params.tools || !params.tools.split(',').contains('haplotypecaller'))){
log.error "The Haplotypecaller should be specified as one of the tools when doing joint germline variant calling. (The Haplotypecaller could be specified by adding `--tools haplotypecaller` to the nextflow command.) "
exit 1
}
if (params.joint_germline && (!params.dbsnp || !params.known_indels || !params.known_snps || params.no_intervals)){
log.warn "If Haplotypecaller is specified, without `--dbsnp`, `--known_snps`, `--known_indels` or the associated resource labels (ie `known_snps_vqsr`), no variant recalibration will be done. For recalibration you must provide all of these resources.\nFor more information see VariantRecalibration: https://gatk.broadinstitute.org/hc/en-us/articles/5358906115227-VariantRecalibrator \nJoint germline variant calling also requires intervals in order to genotype the samples. As a result, if `--no_intervals` is set to `true` the joint germline variant calling will not be performed."
}
)

@amizeranschi
Copy link
Contributor Author

amizeranschi commented May 30, 2023

I'm just wondering whether it would be better to stop the pipeline (with meaningful err msg) if the user tries to run joint-germline with filtering or at least inform the user that the filtering is being skipped

My worry here is that, in this scenario, the user doesn't really have a choice about turning VCF filtering on in the first place. It's just on by default, with an option for disabling it via --skip_tools haplotypecaller_filter. So we'd essentially stop the pipeline or warn the users about some pipeline logic that they didn't initiate, which might be confusing.

@asp8200
Copy link
Contributor

asp8200 commented May 30, 2023

I'm just wondering whether it would be better to stop the pipeline (with meaningful err msg) if the user tries to run joint-germline with filtering or at least inform the user that the filtering is being skipped

My worry here is that, in this scenario, the user doesn't really have a choice about turning VCF filtering on in the first place. It's just on by default, with an option for disabling it via --skip_tools haplotypecaller_filter. So we'd essentially stop the pipeline or warn the users about some pipeline logic that they didn't initiate, which might be confusing.

Also valid points. I don't have a strong opinion on this; just wanted to share my two cents. Perhaps somebody else will chip in as well.

Anyways, thanks for the PR.

@FriederikeHanssen
Copy link
Contributor

So this established back the old behavior. The pipeline was not suppoed to do filtering for joint germline, but single sample only, (where it sometimes can be skipped because it takes a long time): https://gatk.broadinstitute.org/hc/en-us/articles/360035535932-Germline-short-variant-discovery-SNPs-Indels-

I think we might need to refactor this subworkflow a bit to make it more intuitive. and put the whole FilterVarianttranches/CNNScore part into a singlesample haplotypecaller subworkflow, but I would do that in a separate PR

@maxulysse maxulysse merged commit 3b7258e into nf-core:dev Jun 1, 2023
31 checks passed
@amizeranschi amizeranschi deleted the amizeranschi-issue-1025 branch June 1, 2023 09:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants