The code handled merging in a simplistic way: SNPs versus everything else. As pointed out in #2084, it also would not merge indels with `-m +indels`, only with `-m +both` or `-m +any`. Splitting by type also did not function properly because of an error where two incompatible bitmasks were used together (e.g. COLLAPSE_SNPS vs VCF_SNP).

This is now fixed and improved. The new behavior is as follows:

- multiallelic sites containing SNPs but not indels are split with `-m -snps` but not with `-m -indels`, and analogously for indels.
- multiallelic sites containing both SNPs and indels are split when any of the following is given: `-m -snps`, `-m -indels`, `-m -both`, `-m -any`
- merging with `-m +snps` and `-m +indels` should work as expected in case of pure SNP or indel sites. When the input sites contain a mixture of types (e.g. SNP + indel), such sites will not be merged.
- merging with `-m +both` will merge together not just SNPs with SNPs and indels with indels, but also "other types" with "other types".

Note: this could be improved by providing the user with a way to fine-tune the desired behaviour, for example something like `-m +snps+mnps,indels` to merge SNPs with MNPs together and indels together. This would not be too difficult to add, but would complicate the user interface.

Another improvement would be to make it possible to split multiallelic sites containing both SNPs and indels so that
a) two multiallelic sites are emitted, one with SNPs only and one with indels only
b) as above, but one is transformed into multiple biallelic sites and one multiallelic site

This could be further improved (and complicated) by considering other variant types.

Resolves #2084
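The split and merge rules listed in the commit message can be restated as a small decision table. The Python below is only an illustrative sketch, not the bcftools implementation: the names `classify`, `site_types`, `should_split` and `can_merge` are hypothetical, the allele typing is deliberately crude (the real classification is done in C by htslib), and the handling of `both`/`any` on pure sites is an assumption that the message does not spell out.

```python
def classify(ref, alt):
    """Crude allele typing (assumption: real htslib logic is more detailed)."""
    if alt.startswith("<") or alt == "*":
        return "other"              # symbolic alleles such as <DEL>, <DUP>
    if len(ref) == len(alt):
        return "snp" if len(ref) == 1 else "mnp"
    return "indel"


def site_types(ref, alts):
    """Set of allele types present at one site."""
    return {classify(ref, alt) for alt in alts}


def should_split(ref, alts, mode):
    """Would a multiallelic site be split under `-m -<mode>`?"""
    if len(alts) < 2:
        return False                # already biallelic, nothing to split
    types = site_types(ref, alts)
    if mode == "snps":
        return "snp" in types       # SNP-only and mixed SNP+indel sites
    if mode == "indels":
        return "indel" in types     # indel-only and mixed SNP+indel sites
    if mode == "both":
        # Assumption: any site carrying SNPs or indels is split.
        return bool(types & {"snp", "indel"})
    if mode == "any":
        # Assumption: every multiallelic site is split.
        return True
    raise ValueError(f"unknown mode: {mode}")


def can_merge(site_a, site_b, mode):
    """Would two (ref, alts) sites be merged under `-m +snps` / `-m +indels`?

    Only the two modes described for pure sites are covered here.
    """
    wanted = {"snps": "snp", "indels": "indel"}.get(mode)
    if wanted is None:
        raise ValueError(f"mode not covered by this sketch: {mode}")
    # Mixed-type sites (e.g. SNP + indel) are never merged in these modes.
    return site_types(*site_a) == {wanted} and site_types(*site_b) == {wanted}
```

For example, `should_split("C", ["T", "G", "A"], "indels")` is false because the site carries no indel, while a mixed site such as `C -> T,CTT` is split under either `snps` or `indels`.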
Showing 14 changed files with 207 additions and 71 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
##fileformat=VCFv4.2 | ||
##FILTER=<ID=PASS,Description="All filters passed"> | ||
##contig=<ID=1,length=248387328> | ||
##reference=file:ref.fa | ||
#CHROM POS ID REF ALT QUAL FILTER INFO | ||
1 1 . C T,CTT,A,CAA . . . | ||
1 2 . C <DEL>,<DUP> . . . |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
##fileformat=VCFv4.2 | ||
##FILTER=<ID=PASS,Description="All filters passed"> | ||
##contig=<ID=1,length=248387328> | ||
##reference=file:ref.fa | ||
#CHROM POS ID REF ALT QUAL FILTER INFO | ||
1 1 . C T,CTT . . . | ||
1 1 . C A,CAA . . . | ||
1 2 . C <DEL>,<DUP> . . . |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
##fileformat=VCFv4.2 | ||
##contig=<ID=1,length=248387328> | ||
##reference=file:ref.fa | ||
#CHROM POS ID REF ALT QUAL FILTER INFO | ||
1 1 . C T,CTT . . . | ||
1 1 . C A,CAA . . . | ||
1 2 . C <DEL> . . . | ||
1 2 . C <DUP> . . . |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
##fileformat=VCFv4.2 | ||
##FILTER=<ID=PASS,Description="All filters passed"> | ||
##contig=<ID=chr1,length=248387328> | ||
##contig=<ID=chr2,length=242696752> | ||
##contig=<ID=chr3,length=201105948> | ||
##reference=file:ref.fa | ||
#CHROM POS ID REF ALT QUAL FILTER INFO | ||
chr1 29291 . C T,G,A . . . | ||
chr2 29292 . T C . . . | ||
chr2 29292 . T TCCCTCTCCTTTCTCCTCTCTAGCC,TCTCTTTCTCACTGTCTCTCTAGCC,TCCCTCTCCTTTCTCCTCTCTAGC,TCCATCTGTATCCTCTCTAAGC,TCCCTCTCCTTTCTCCTCAGCC,TCCCTCTCCCTTTCTCCTCTCTAGCC,TCCTCTCCTTTCTCCTCTACCGC,TCCCTCTCCTTTCTCTCTCTAGCC,TCCCTCTCCTTTCTCCTCTAGCC,TCCCTCTCCTTTTCCTCCCCAGCC,TCCCTCTCCTTCTCCTCTCTAGCC,TCCCTCTCCCTTCTCCTCTCTCAC . . . | ||
chr3 29292 . T TCCCTCTCCTTTCTCCTCTCTAGCC,TCTCTTTCTCACTGTCTCTCTAGCC,TCCCTCTCCTTTCTCCTCTCTAGC,TCCATCTGTATCCTCTCTAAGC,TCCCTCTCCTTTCTCCTCAGCC,TCCCTCTCCCTTTCTCCTCTCTAGCC,TCCTCTCCTTTCTCCTCTACCGC,TCCCTCTCCTTTCTCTCTCTAGCC,TCCCTCTCCTTTCTCCTCTAGCC,TCCCTCTCCTTTTCCTCCCCAGCC,TCCCTCTCCTTCTCCTCTCTAGCC,TCCCTCTCCCTTCTCCTCTCTCAC . . . | ||
chr3 29292 . T CGTA . . . |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
##fileformat=VCFv4.2 | ||
##FILTER=<ID=PASS,Description="All filters passed"> | ||
##contig=<ID=chr1,length=248387328> | ||
##contig=<ID=chr2,length=242696752> | ||
##contig=<ID=chr3,length=201105948> | ||
##reference=file:ref.fa | ||
#CHROM POS ID REF ALT QUAL FILTER INFO | ||
chr1 29291 . C T . . . | ||
chr1 29291 . C G . . . | ||
chr1 29291 . C A . . . | ||
chr2 29292 . T C . . . | ||
chr2 29292 . T TCCCTCTCCTTTCTCCTCTCTAGCC,TCTCTTTCTCACTGTCTCTCTAGCC,TCCCTCTCCTTTCTCCTCTCTAGC,TCCATCTGTATCCTCTCTAAGC,TCCCTCTCCTTTCTCCTCAGCC,TCCCTCTCCCTTTCTCCTCTCTAGCC,TCCTCTCCTTTCTCCTCTACCGC,TCCCTCTCCTTTCTCTCTCTAGCC,TCCCTCTCCTTTCTCCTCTAGCC,TCCCTCTCCTTTTCCTCCCCAGCC,TCCCTCTCCTTCTCCTCTCTAGCC,TCCCTCTCCCTTCTCCTCTCTCAC . . . | ||
chr3 29292 . T TCCCTCTCCTTTCTCCTCTCTAGCC,TCTCTTTCTCACTGTCTCTCTAGCC,TCCCTCTCCTTTCTCCTCTCTAGC,TCCATCTGTATCCTCTCTAAGC,TCCCTCTCCTTTCTCCTCAGCC,TCCCTCTCCCTTTCTCCTCTCTAGCC,TCCTCTCCTTTCTCCTCTACCGC,TCCCTCTCCTTTCTCTCTCTAGCC,TCCCTCTCCTTTCTCCTCTAGCC,TCCCTCTCCTTTTCCTCCCCAGCC,TCCCTCTCCTTCTCCTCTCTAGCC,TCCCTCTCCCTTCTCCTCTCTCAC . . . | ||
chr3 29292 . T CGTA . . . |
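In the fixture above, the SNP-only site at chr1:29291 appears as three biallelic records while the remaining sites are left untouched, which is consistent with what the commit message describes for `-m -snps`. A minimal Python sketch of that splitting step for sites-only records follows. The helper name `split_record` is hypothetical, tab-separated fields are assumed, and per-allele INFO values and genotype columns, which bcftools also has to rewrite, are deliberately ignored.

```python
def split_record(line):
    """Split one sites-only VCF record into one record per ALT allele.

    Assumption: fields are tab-separated and carry no per-allele
    annotations, so only the ALT column needs rewriting.
    """
    fields = line.rstrip("\n").split("\t")
    alts = fields[4].split(",")     # ALT is the fifth VCF column
    return ["\t".join(fields[:4] + [alt] + fields[5:]) for alt in alts]


# Usage: the chr1 site from the fixture, written with tab separators.
for record in split_record("chr1\t29291\t.\tC\tT,G,A\t.\t.\t."):
    print(record)
```

Deciding whether a given site should be split at all is a separate question that depends on the allele types present and on the `-m` mode in use.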
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
##fileformat=VCFv4.2 | ||
##FILTER=<ID=PASS,Description="All filters passed"> | ||
##contig=<ID=chr1,length=248387328> | ||
##contig=<ID=chr2,length=242696752> | ||
##contig=<ID=chr3,length=201105948> | ||
##reference=file:ref.fa | ||
#CHROM POS ID REF ALT QUAL FILTER INFO | ||
chr1 29291 . C T,G,A . . . | ||
chr2 29292 . T C . . . | ||
chr2 29292 . T TCCCTCTCCTTTCTCCTCTCTAGCC . . . | ||
chr2 29292 . T TCTCTTTCTCACTGTCTCTCTAGCC . . . | ||
chr2 29292 . T TCCCTCTCCTTTCTCCTCTCTAGC . . . | ||
chr2 29292 . T TCCATCTGTATCCTCTCTAAGC . . . | ||
chr2 29292 . T TCCCTCTCCTTTCTCCTCAGCC . . . | ||
chr2 29292 . T TCCCTCTCCCTTTCTCCTCTCTAGCC . . . | ||
chr2 29292 . T TCCTCTCCTTTCTCCTCTACCGC . . . | ||
chr2 29292 . T TCCCTCTCCTTTCTCTCTCTAGCC . . . | ||
chr2 29292 . T TCCCTCTCCTTTCTCCTCTAGCC . . . | ||
chr2 29292 . T TCCCTCTCCTTTTCCTCCCCAGCC . . . | ||
chr2 29292 . T TCCCTCTCCTTCTCCTCTCTAGCC . . . | ||
chr2 29292 . T TCCCTCTCCCTTCTCCTCTCTCAC . . . | ||
chr3 29292 . T TCCCTCTCCTTTCTCCTCTCTAGCC . . . | ||
chr3 29292 . T TCTCTTTCTCACTGTCTCTCTAGCC . . . | ||
chr3 29292 . T TCCCTCTCCTTTCTCCTCTCTAGC . . . | ||
chr3 29292 . T TCCATCTGTATCCTCTCTAAGC . . . | ||
chr3 29292 . T TCCCTCTCCTTTCTCCTCAGCC . . . | ||
chr3 29292 . T TCCCTCTCCCTTTCTCCTCTCTAGCC . . . | ||
chr3 29292 . T TCCTCTCCTTTCTCCTCTACCGC . . . | ||
chr3 29292 . T TCCCTCTCCTTTCTCTCTCTAGCC . . . | ||
chr3 29292 . T TCCCTCTCCTTTCTCCTCTAGCC . . . | ||
chr3 29292 . T TCCCTCTCCTTTTCCTCCCCAGCC . . . | ||
chr3 29292 . T TCCCTCTCCTTCTCCTCTCTAGCC . . . | ||
chr3 29292 . T TCCCTCTCCCTTCTCCTCTCTCAC . . . | ||
chr3 29292 . T CGTA . . . |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
##fileformat=VCFv4.2 | ||
##FILTER=<ID=PASS,Description="All filters passed"> | ||
##contig=<ID=chr1,length=248387328> | ||
##contig=<ID=chr2,length=242696752> | ||
##contig=<ID=chr3,length=201105948> | ||
##reference=file:ref.fa | ||
#CHROM POS ID REF ALT QUAL FILTER INFO | ||
chr1 29291 . C T,G,A . . . | ||
chr2 29292 . T C,TCCCTCTCCTTTCTCCTCTCTAGCC,TCTCTTTCTCACTGTCTCTCTAGCC,TCCCTCTCCTTTCTCCTCTCTAGC,TCCATCTGTATCCTCTCTAAGC,TCCCTCTCCTTTCTCCTCAGCC,TCCCTCTCCCTTTCTCCTCTCTAGCC,TCCTCTCCTTTCTCCTCTACCGC,TCCCTCTCCTTTCTCTCTCTAGCC,TCCCTCTCCTTTCTCCTCTAGCC,TCCCTCTCCTTTTCCTCCCCAGCC,TCCCTCTCCTTCTCCTCTCTAGCC,TCCCTCTCCCTTCTCCTCTCTCAC . . . | ||
chr3 29292 . T CGTA,TCCCTCTCCTTTCTCCTCTCTAGCC,TCTCTTTCTCACTGTCTCTCTAGCC,TCCCTCTCCTTTCTCCTCTCTAGC,TCCATCTGTATCCTCTCTAAGC,TCCCTCTCCTTTCTCCTCAGCC,TCCCTCTCCCTTTCTCCTCTCTAGCC,TCCTCTCCTTTCTCCTCTACCGC,TCCCTCTCCTTTCTCTCTCTAGCC,TCCCTCTCCTTTCTCCTCTAGCC,TCCCTCTCCTTTTCCTCCCCAGCC,TCCCTCTCCTTCTCCTCTCTAGCC,TCCCTCTCCCTTCTCCTCTCTCAC . . . |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
##fileformat=VCFv4.2 | ||
##contig=<ID=chr1,length=248387328> | ||
##contig=<ID=chr2,length=242696752> | ||
##contig=<ID=chr3,length=201105948> | ||
##reference=file:ref.fa | ||
#CHROM POS ID REF ALT QUAL FILTER INFO | ||
chr1 29291 . C T,G,A . . . | ||
chr2 29292 . T C,TCCCTCTCCTTTCTCCTCTCTAGCC,TCTCTTTCTCACTGTCTCTCTAGCC,TCCCTCTCCTTTCTCCTCTCTAGC,TCCATCTGTATCCTCTCTAAGC,TCCCTCTCCTTTCTCCTCAGCC,TCCCTCTCCCTTTCTCCTCTCTAGCC,TCCTCTCCTTTCTCCTCTACCGC,TCCCTCTCCTTTCTCTCTCTAGCC,TCCCTCTCCTTTCTCCTCTAGCC,TCCCTCTCCTTTTCCTCCCCAGCC,TCCCTCTCCTTCTCCTCTCTAGCC,TCCCTCTCCCTTCTCCTCTCTCAC . . . | ||
chr3 29292 . T CGTA,TCCCTCTCCTTTCTCCTCTCTAGCC,TCTCTTTCTCACTGTCTCTCTAGCC,TCCCTCTCCTTTCTCCTCTCTAGC,TCCATCTGTATCCTCTCTAAGC,TCCCTCTCCTTTCTCCTCAGCC,TCCCTCTCCCTTTCTCCTCTCTAGCC,TCCTCTCCTTTCTCCTCTACCGC,TCCCTCTCCTTTCTCTCTCTAGCC,TCCCTCTCCTTTCTCCTCTAGCC,TCCCTCTCCTTTTCCTCCCCAGCC,TCCCTCTCCTTCTCCTCTCTAGCC,TCCCTCTCCCTTCTCCTCTCTCAC . . . |