Fix/sentieon gvcftyper intervals #11668
Merged
The test asserted variantsMD5 differs from the buggy md5, but the genome.bed used here covers all variants in test.genome.vcf.gz, so the BED is a no-op and the md5 is the same with or without --interval. Surfacing the bug would require a gVCF with variants outside the BED's coverage, which the test-datasets don't provide.
SPPearce approved these changes on May 18, 2026
manascripts pushed a commit to manascripts/modules that referenced this pull request on May 21, 2026
* fix: sentieon gvcftyper accepts intervals
* style: line breaks in sentieon gvcftyper driver call

  Per @SPPearce review on nf-core#11582: match the multi-line format used in sentieon/haplotyper.
* lint: rename meta1..meta4 to meta2..meta5 per nf-core convention
* test: drop tautological regression test

  The test asserted variantsMD5 differs from the buggy md5, but the genome.bed used here covers all variants in test.genome.vcf.gz, so the BED is a no-op and the md5 is the same with or without --interval. Surfacing the bug would require a gVCF with variants outside the BED's coverage, which the test-datasets don't provide.
Description
SENTIEON_GVCFTYPER declares path(intervals) as an input and stages the file into the work dir, but the rendered sentieon driver command never references it, so GVCFtyper has been running unconstrained even when callers pass per-interval BEDs.
The fix mirrors the pattern already used in sentieon/haplotyper (modules/nf-core/sentieon/haplotyper/main.nf:39).

Reason for PR
In a multi-sample joint-genotyping pipeline (e.g. nf-core/sarek's sentieon path), GVCFtyper is invoked per shard with a per-interval BED so each shard genotypes only its assigned region; the per-interval VCFs are then concatenated by GATK4 MergeVcfs (which assumes non-overlapping inputs). Without --interval, every shard processes the full gVCF, so neighbouring shards emit overlapping records at interval boundaries, and the concat then produces duplicate/overlapping variant calls. Single-sample runs are largely unaffected, which is why this slipped through.
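The duplication described above can be illustrated with a toy model. This is only a sketch: `genotype_shard` and `concat` are hypothetical stand-ins for a per-shard GVCFtyper call and the concatenation step, records are reduced to (chrom, position) pairs, and intervals are assumed to follow BED's 0-based half-open convention. It does not call Sentieon or GATK.

```python
from typing import Iterable, List, Optional, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open, as in BED
Record = Tuple[str, int]         # (chrom, 1-based position)


def genotype_shard(records: List[Record], interval: Optional[Interval]) -> List[Record]:
    """Hypothetical stand-in for one per-shard genotyping call.

    With interval=None the shard emits every record (the pre-fix behaviour);
    otherwise it emits only records that fall inside the interval.
    """
    if interval is None:
        return list(records)
    chrom, start, end = interval
    # A 1-based position p lies in a 0-based half-open [start, end) when start < p <= end.
    return [r for r in records if r[0] == chrom and start < r[1] <= end]


def concat(shards: Iterable[List[Record]]) -> List[Record]:
    """Hypothetical stand-in for concatenation: appends shard outputs, no de-duplication."""
    return [r for shard in shards for r in shard]


records = [("chr1", 100), ("chr1", 950), ("chr1", 1050), ("chr1", 1900)]
intervals = [("chr1", 0, 1000), ("chr1", 1000, 2000)]

# Pre-fix: the interval is staged but never passed, so every shard sees everything.
unconstrained = concat(genotype_shard(records, None) for _ in intervals)
# Post-fix: each shard is restricted to its own interval.
constrained = concat(genotype_shard(records, iv) for iv in intervals)

print(len(unconstrained), len(set(unconstrained)))  # 8 4
print(len(constrained), len(set(constrained)))      # 4 4
```

In this model the unconstrained run emits each record once per shard, which is the simplest form of the duplicate calls the concatenation step would then carry through. With a single shard the two paths coincide, consistent with single-sample runs being largely unaffected.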
Evidence the input was being ignored
The sentieon driver invocation lacks any --interval flag despite path(intervals) being declared and staged. With the fix the command renders correctly as:

Note: the existing snapshot file does not demonstrate the bug behaviorally. The BED in the test-datasets happens to cover all variants in the test gVCF, so the constrained and unconstrained outputs produce the same md5. The bug is real (the flag was missing from the command), but exercising it from tests would require a gVCF with variants outside the BED's coverage, which the current test-datasets don't provide.
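Why an md5 comparison cannot expose the bug with the current data can be shown with a small sketch. The helpers `md5_of` and `restrict` and all the records and BED intervals below are made up for illustration; they are not the test-datasets files or the module's snapshot logic.

```python
import hashlib
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open, as in BED
Record = Tuple[str, int]         # (chrom, 1-based position)


def md5_of(records: List[Record]) -> str:
    """Hash a list of records, standing in for the md5 of the variant output."""
    payload = "\n".join(f"{chrom}\t{pos}" for chrom, pos in records)
    return hashlib.md5(payload.encode()).hexdigest()


def restrict(records: List[Record], bed: List[Interval]) -> List[Record]:
    """Keep only the records that fall inside at least one BED interval."""
    return [
        (chrom, pos)
        for chrom, pos in records
        if any(chrom == c and start < pos <= end for c, start, end in bed)
    ]


records = [("chr1", 100), ("chr1", 950), ("chr1", 1900)]
covering_bed = [("chr1", 0, 2000)]  # assumed: covers every record, like the current test BED
partial_bed = [("chr1", 0, 1000)]   # assumed: leaves one record outside its coverage

md5_unconstrained = md5_of(records)
md5_covering = md5_of(restrict(records, covering_bed))
md5_partial = md5_of(restrict(records, partial_bed))

print(md5_covering == md5_unconstrained)  # True: the BED is a no-op, the test cannot fail
print(md5_partial == md5_unconstrained)   # False: only this kind of data would surface the bug
```

A regression test becomes meaningful only in the second case, where at least one variant lies outside the BED.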
Tests
No new test added. With the available test data any regression test using it would be tautological (BED covers all gVCF variants, so the BED is a no-op regardless of whether --interval is honored). The fix mirrors the pattern in sentieon/haplotyper; existing tests continue to pass.

PR checklist
* existing module
* topic: versions (unchanged from existing module)
* label (label 'process_high' already present)
* nf-core modules test sentieon/gvcftyper --profile docker
* nf-core modules test sentieon/gvcftyper --profile singularity
* nf-core modules test sentieon/gvcftyper --profile conda