
norm remove duplicates doesn't handle SVLEN, removes non-duplicate symbolic variants #2182

Closed
davmlaw opened this issue May 9, 2024 · 1 comment


davmlaw commented May 9, 2024

The following VCF contains three deletions at the same position, of length 1 kb, 2 kb and 3 kb:

##fileformat=VCFv4.1
##INFO=<ID=SVLEN,Number=.,Type=Integer,Description="Difference in length between REF and ALT alleles">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
##contig=<ID=NC_000012.11,length=141213431>
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO
NC_000012.11	88520131	23651	C	<DEL>	.	.	SVLEN=-1000;SVTYPE=DEL
NC_000012.11	88520131	24042	C	<DEL>	.	.	SVLEN=-2000;SVTYPE=DEL
NC_000012.11	88520131	24043	C	<DEL>	.	.	SVLEN=-3000;SVTYPE=DEL

Running the following command (even with "exact") removes the records that share CHROM/POS/REF/ALT, even though their SVLEN values differ and they are therefore separate variants:

bcftools norm --remove-duplicates --rm-dup=exact symbolic_uniq.vcf
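To make the collision concrete, here is a minimal Python sketch (not bcftools' actual code) that builds a duplicate key two ways: from CHROM/POS/REF/ALT only, and with INFO/SVLEN and INFO/END appended whenever an ALT allele is symbolic. The helper names (`naive_key`, `sv_aware_key`, `dedup`) are hypothetical, and treating SVLEN/END as part of a symbolic allele's identity is an assumption about what a fix could look like.

```python
# Hypothetical sketch, not bcftools' implementation: compare a duplicate key
# built from CHROM/POS/REF/ALT with one that also uses SVLEN/END for
# symbolic ALT alleles such as <DEL>.

VCF_BODY = (
    "NC_000012.11\t88520131\t23651\tC\t<DEL>\t.\t.\tSVLEN=-1000;SVTYPE=DEL\n"
    "NC_000012.11\t88520131\t24042\tC\t<DEL>\t.\t.\tSVLEN=-2000;SVTYPE=DEL\n"
    "NC_000012.11\t88520131\t24043\tC\t<DEL>\t.\t.\tSVLEN=-3000;SVTYPE=DEL\n"
)


def parse_info(info: str) -> dict:
    """Split an INFO column like 'SVLEN=-1000;SVTYPE=DEL' into a dict."""
    if info == ".":
        return {}
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value  # INFO flags end up with an empty-string value
    return fields


def naive_key(fields: list) -> tuple:
    """Key that looks only at CHROM, POS, REF and ALT."""
    return (fields[0], int(fields[1]), fields[3], fields[4])


def sv_aware_key(fields: list) -> tuple:
    """Extend the naive key with SVLEN and END when any ALT allele is symbolic."""
    key = naive_key(fields)
    if any(alt.startswith("<") for alt in fields[4].split(",")):
        info = parse_info(fields[7])
        key += (info.get("SVLEN"), info.get("END"))
    return key


def dedup(lines: list, key_fn) -> list:
    """Keep the first record seen for each key, dropping later matches."""
    seen = set()
    kept = []
    for line in lines:
        key = key_fn(line.split("\t"))
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept


records = VCF_BODY.splitlines()
print(len(dedup(records, naive_key)))     # 1: two distinct deletions lost
print(len(dedup(records, sv_aware_key)))  # 3: all deletions kept
```

With the naive key all three records collapse into one, which matches the behavior reported above; adding SVLEN to the key keeps the 1 kb, 2 kb and 3 kb deletions apart.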

If this is difficult to fix, it would be good to at least raise a warning, as the current behavior is silent data loss. Thanks
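As a stopgap, the warning could look something like the following Python sketch: a standalone pre-check (not a bcftools feature) that groups records by CHROM/POS/REF/ALT and reports any group whose records carry different SVLEN values. The function names and the warning text are hypothetical.

```python
# Hypothetical pre-check, not part of bcftools: report records that a
# CHROM/POS/REF/ALT duplicate filter would collapse even though their
# SVLEN values differ.
import sys
from collections import defaultdict


def svlen_of(info: str):
    """Return the raw INFO/SVLEN string, or None if it is absent."""
    for item in info.split(";"):
        key, _, value = item.partition("=")
        if key == "SVLEN":
            return value
    return None


def find_svlen_conflicts(lines) -> dict:
    """Map (chrom, pos, ref, alt) to its SVLEN values, for keys with more than one."""
    groups = defaultdict(set)
    for line in lines:
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref, alt, _qual, _filter, info = line.split("\t")[:8]
        groups[(chrom, int(pos), ref, alt)].add(svlen_of(info))
    # key=str lets a missing SVLEN (None) sort alongside string values
    return {k: sorted(v, key=str) for k, v in groups.items() if len(v) > 1}


def warn_on_conflicts(lines, stream=sys.stderr) -> int:
    """Print one warning per conflicting group and return how many were found."""
    conflicts = find_svlen_conflicts(lines)
    for (chrom, pos, ref, alt), svlens in conflicts.items():
        print(f"Warning: {chrom}:{pos} {ref}>{alt} has distinct SVLEN values "
              f"{svlens}; a CHROM/POS/REF/ALT duplicate filter would keep only one",
              file=stream)
    return len(conflicts)


SAMPLE = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "NC_000012.11\t88520131\t23651\tC\t<DEL>\t.\t.\tSVLEN=-1000;SVTYPE=DEL",
    "NC_000012.11\t88520131\t24042\tC\t<DEL>\t.\t.\tSVLEN=-2000;SVTYPE=DEL",
    "NC_000012.11\t88520131\t24043\tC\t<DEL>\t.\t.\tSVLEN=-3000;SVTYPE=DEL",
]

if __name__ == "__main__":
    warn_on_conflicts(SAMPLE)
```

Run against the three records from the report, it flags the single position 88520131 with SVLEN values -1000, -2000 and -3000.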

bcftools --version
bcftools 1.20
Using htslib 1.20