norm doesn't normalize symbolic variants #1919

Closed
davmlaw opened this issue May 10, 2023 · 2 comments

davmlaw commented May 10, 2023

Hi, thanks for bcftools!

The variant NC_000003.11:g.128204049_128206714del can be written in a VCF either with an explicit REF or with a symbolic ALT.

bcftools norm left-aligns the former (to position 128204042) but leaves the symbolic representation unchanged. I think it should normalize both.

#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO
NC_000003.11	128204048	.	GAGGAGATCAGGGAGCCATCGAAATCCCAAGATCAGACTGATTGAGTTAGAGACCCAGATTCCTTAATGGTCTGGAACCTCCGGAGTGCCTGAAACATGCACACACAACATGCACACAGACACACGTATACACATGCACACACGCTCCCAAACACATGCACATACAGAGTCACTTCCCTCGCTTCACATACTAAGTCCTGAGGTGGCTTGAATTTTCACTTAAGATTCAGGGAGGGAAAGGGGGAATTCCTGTTCTATGTTCTGGTCAGGCAGTGACCAACCCTGGGGCAAGGAACTGAACTTTGGGGGTACACTGGAAGCACTTAAGAAAATGGCAAAAGTTTTAGAGTCTCCTCTCCCTGACCCGGGGGTCTCAAACATCTGCTGGGGGCTATTAGAGCGAGACATCACCCATCCCCAGATCTGGGAAACCAACACTGCCACCTCTCCCAAGTCACAGCTCCCCACCACAAAAACGCAAATGCTCCCCTCTTCCACGAAGTCCCCAGCACCTGCCTTTACCTGAACAGGAACGAGCCTTGCTGCGCTGCTTAGGGGTGAAGCTGGAGGCCGGTCCCCCCAGGAAGCCTCCGGGGTGGAAGAGTCCGCTGCTGTAGTCGTGGGCAGCCGCCGGCACATAGGAGGGGTAGGTGGGGATGGGGTGGTGTGTAGCAGGCTGGGTGCCCATAGTAGCTAGGCCTGGGCGCAGGGGACTGCCACTTTCCATCTTCATGCTCTCCGTCAGTGACACCTGGTACTTGACGCCGTCCTTGTCCTCTCCTCGGGCTGCACTACCCCCCGCGGAAGATGAGGCTGGAGACGCAGCCCCCGTGGTGCTAGGGTCAGGAGACACTTCTTTGGGTGGCGTGGGTGGGAAGCCGAAAAGGTGGGAGCCAGAGTGGGCTGCTGTAGGGGTGAGGGAGGCCACTGAGCTCCCGCTGCCTCCCCCGCTCCCACCCCCAGCCCCTGGGTACACAGAGAGTGGGCCTCCAGGGCCTCCAGCAGCTGAGGGGTGCAGTGGCGTCTTGGAGAAGGGGCTCACGGTCCAGGGGTTGTGGTGGTGGGCCGCAGCGGCAGAGAGGGCTGCTTTGCCCCCGTCCAGCCAGGGCAAACCCGGGCTGTGCAACAAGTGTGGGCGGCACATCTGGCCTCCGGTCAGGCGGGCTGCGGGCAAAGAGAGAGAGGATCAGGGTGGGCAGAAAGATCAGGGTAGGCAGAGCTAGGGACGCCCCTGACAGACATTGAGATCACGACTCCCAGAACCAGCAGTCATCCCCTCCCCAAAGAAAGCCAGAAACATAATACCCCACCGGTAATAATCAGGAATGTCAGTCCAAGCTGAAGGACAAGTGGCATAGAAGGAACCCCACCGGACAGACCCTACAGGGAACCCTCACAGGCCAGCTGGAAGTGGGCAGAAACCCTGTGGGTCCCAGACCCTCCCCAATCGGCCGCTGCTCCCACCTCTCCCGCCCCAATTTTTCAGCAGCTCGATTCCTGCGGATCCTACATCCGGGAAGCAAGCAGACGGGCCCTCCTCCCCTCCCTCGCCTGGCGCGCGGCGCCTGGGTTCTCATCACCACGGGCCCAGTGCTCACCGTGCGCGGGGCTGTAGGAGACGCGCGCCCGCGCGTGAGCGGGGTTGGCATAGTAGGGGTTGCCCTGCGAGTCGAGGTGATTGAAGAAGACGTCCACCTCGTCTGGAGGCAGCAGCTGCGCGGGTTCCATGTAGTTGTGCGCCAGGCCCGGGTGGTGTGAGTCGGGGTGCTGCGCATTCAGCACGGCCGGGTGCGCCATCCAGCGCGGCTGCTCGGGCGCCACCTCCATGGCCGGCGGCGGCGGCTCAGGGTCTGGGTGCAGACGGCAACGGCCCTGCGCGAGGAAGGGGGAGTGAGGCGTGCCGCCAGCGCCTGACACCCCCCAAAGTCCCACCACGAGGTGTCCCGCACGCCACGGAGCCCCAGCCCAGATCCGG
CGAGAAAGAGCACCAGTCCCGGGTGGGAGGAAAGCCCAAGGCTCAAAACGAAAGGAAGGCGGGGGAGGGGGTTCAGCCACGCACACTCACGTGGTGACCCGCGGCTCCAGAATCACACACCCGTGCACATGGGGTCACGCCCGGGGACGGGTCCCGACACCAGTGACCCCAACAAACGCACAGAGCAGCACTTCAGTCAGACACTCACACTGAGCCCCCCCGCCCGGTAGACAAACACATGAACACAGACTCAAAAGTTGGAGACAGGCGCCCGGGCACCCAGTGTGGCACTTGATCCCAGCGACACGCACACACCCACACTTGGCGCCAGATACACATACTGATCTCAACCCCGAAAACATGCACACGCAGCCCCCTGAGCGCAGTACTAAGCGGCACAATCAGGACCTCTCAACAAAGCACACCAAAGCAGTCGCCCGCAGCCTGGCCCCCCGCCCTAAGTCCCCCCAGAGTCCCCTCAAAGCTAGGAGCGCCCCAGGCCCCCAGCCGGCTCTCAAACCCCAAACTTACACACGCAGCCGTGGGGAGGGGAGGGACTCGGCCTCTGAGAGTGAAGGAGTTCCGGCGGGAGCCCCGAGGGCGACGGGCCCAGGGACAGCACGTCCGGAGGCTGGCGGGGCTTACAGGGTAGGAGCTGGGGGTAGAGTGCGCCTCGGCCTCGGGCCCTCCCG	G	.	PASS	.
NC_000003.11	128204048	.	G	<DEL>	.	PASS	END=128206714;SVTYPE=DEL;SVLEN=-2666

I have attached this as a vcf (with .txt extension)
indel_normalise_test.GRCh37.txt
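
For illustration, left-alignment depends only on the deleted interval and the reference sequence, so the same shift can be applied to either representation. A minimal Python sketch, assuming the reference is available as an in-memory string (the function names and the toy sequence are hypothetical, and this is not how bcftools implements it):

```python
def left_align_deletion(ref_seq, start, end):
    """Shift the deleted interval [start, end] (1-based, inclusive) as far left as it goes."""
    # The interval can move one base left while the base just before it
    # equals the last deleted base. Stop at start == 2 so an anchor base remains.
    while start > 2 and ref_seq[start - 2] == ref_seq[end - 1]:
        start -= 1
        end -= 1
    return start, end


def as_explicit(ref_seq, start, end):
    """VCF-style (POS, REF, ALT) with the full deleted sequence in REF."""
    pos = start - 1  # anchor base immediately before the deletion
    return pos, ref_seq[pos - 1:end], ref_seq[pos - 1]


def as_symbolic(ref_seq, start, end):
    """VCF-style (POS, REF, ALT, INFO) using <DEL> plus END/SVLEN."""
    pos = start - 1
    info = {"END": end, "SVTYPE": "DEL", "SVLEN": -(end - start + 1)}
    return pos, ref_seq[pos - 1], "<DEL>", info


toy_ref = "GCACACAT"                              # hypothetical reference sequence
start, end = left_align_deletion(toy_ref, 6, 7)   # delete the last "CA" repeat
explicit = as_explicit(toy_ref, start, end)
symbolic = as_symbolic(toy_ref, start, end)
```

Both records end up with the same POS after the shift, which is the behaviour requested above.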

davmlaw commented May 22, 2023

In case I end up attempting to fix this (warning: I haven't written C in 15 years), I am interested in what people think a good fix would be. Perhaps the way to solve this is to add an option to convert symbolic variants to explicit ones, and back again, i.e. you could run:

bcftools cmd --symbolic-to-explicit | bcftools norm | bcftools cmd --explicit-to-symbolic --max-allele-length=1000

I am not sure which command to put this under. I am using "cmd" as a placeholder, though perhaps --symbolic-to-explicit could be part of the norm command.
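
The round trip proposed above could look roughly like the following Python sketch. Everything here is an assumption for illustration: the function names are hypothetical, the reference is a plain string, and `max_allele_length` only mirrors the proposed `--max-allele-length` option, which is part of the suggestion rather than an existing bcftools flag.

```python
def symbolic_to_explicit(ref_seq, pos, end):
    """Expand POS/END of a <DEL> record into explicit REF and ALT strings."""
    ref = ref_seq[pos - 1:end]   # anchor base plus the deleted bases
    alt = ref_seq[pos - 1]       # anchor base only
    return pos, ref, alt


def explicit_to_symbolic(pos, ref, alt, max_allele_length=1000):
    """Collapse a long explicit deletion back to <DEL>; leave other records alone."""
    is_deletion = len(alt) == 1 and len(ref) > 1 and ref[0] == alt
    if not is_deletion or len(ref) <= max_allele_length:
        return pos, ref, alt, {}
    end = pos + len(ref) - 1
    info = {"END": end, "SVTYPE": "DEL", "SVLEN": -(len(ref) - 1)}
    return pos, ref[0], "<DEL>", info


toy_ref = "GCACACAT"  # hypothetical reference sequence
expanded = symbolic_to_explicit(toy_ref, 3, 5)
# A tiny threshold is used only so the toy record gets collapsed again.
collapsed = explicit_to_symbolic(*expanded, max_allele_length=2)
unchanged = explicit_to_symbolic(*expanded)  # default threshold keeps it explicit
```

A normalization step would sit between the two calls, operating on the explicit form.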

pd3 closed this as completed in 7040c10 on May 23, 2023

pd3 (Member) commented May 23, 2023

Experimental support for normalizing symbolic alleles has been added. So far it only considers deletions (<DEL.*>) and nothing else.
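
As a rough guide to which ALT values that pattern covers, here is a small Python sketch. It assumes `<DEL.*>` is read as a regular expression that must match the whole allele. This is an interpretation of the comment, not code taken from bcftools, and the helper name is hypothetical.

```python
import re

# Assumption: "<DEL.*>" is treated as a regex over the full ALT string.
SYMBOLIC_DEL = re.compile(r"<DEL.*>")


def is_symbolic_deletion(alt):
    """True if the ALT allele looks like <DEL> or a <DEL:...> subtype."""
    return SYMBOLIC_DEL.fullmatch(alt) is not None


alts = ["<DEL>", "<DEL:ME:ALU>", "<DUP>", "<INS>", "G"]
covered = [alt for alt in alts if is_symbolic_deletion(alt)]
```

Under this reading, `<DUP>`, `<INS>` and plain sequence alleles fall outside the symbolic-deletion handling.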
