dbSNP 156 VCF now includes non-32 bit integers causing "Extreme INFO/RS value encountered and set to missing" errors #1961

Open
freeseek opened this issue Jul 13, 2023 · 4 comments
Labels
cannot-fix has-workaround htslib-dependent Cannot be fixed until htslib is fixed

Comments

@freeseek
Contributor

As of release 156, dbSNP includes rsIDs larger than 2^31, which no longer fit in a 32-bit signed integer and so cannot be handled properly by bcftools:

$ wget https://ftp.ncbi.nih.gov/snp/redesign/latest_release/VCF/GCF_000001405.40.gz{,.tbi}
$ tabix GCF_000001405.40.gz NC_000001.11:6259533-6259533
NC_000001.11	6259533	rs2148352434	C	T	.	.	RS=2148352434;dbSNPBuildID=156;SSR=0;GENEINFO=GPR153:387509;VC=SNV;INT;R5;GNO;FREQ=1000Genomes:0.9998,0.0001562
$ bcftools view -H GCF_000001405.40.gz -r NC_000001.11:6259533-6259533
[W::vcf_parse_info] Extreme INFO/RS value encountered and set to missing at NC_000001.11:6259533
NC_000001.11	6259533	rs2148352434	C	T	.	.	RS=.;dbSNPBuildID=156;SSR=0;GENEINFO=GPR153:387509;VC=SNV;INT;R5;GNO;FREQ=1000Genomes:0.9998,0.0001562
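
A minimal sketch of the range check behind that warning, using the RS value from the record above. The helper name fits_int32 is hypothetical, and the check is simplified: it tests the plain signed 32-bit range and ignores any values that BCF reserves for missing or end-of-vector markers.

```shell
# fits_int32 is a hypothetical helper, not part of bcftools or HTSlib.
# It succeeds when its argument fits in a signed 32-bit integer.
fits_int32() {
    [ "$1" -ge -2147483648 ] && [ "$1" -le 2147483647 ]
}

rsid=2148352434   # RS value from the dbSNP 156 record above

if fits_int32 "$rsid"; then
    echo "rs$rsid fits in 32 bits"
else
    echo "rs$rsid does not fit in 32 bits"
fi
```
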

If HTSlib is compiled with option -DVCF_ALLOW_INT64 then it works fine:

$ bcftools view -H GCF_000001405.40.gz -r NC_000001.11:6259533-6259533
NC_000001.11	6259533	rs2148352434	C	T	.	.	RS=2148352434;dbSNPBuildID=156;SSR=0;GENEINFO=GPR153:387509;VC=SNV;INT;R5;GNO;FREQ=1000Genomes:0.9998,0.0001562

However, this cannot be represented anymore as a binary VCF, which is a huge problem:

$ bcftools view -Ou GCF_000001405.40.gz -r NC_000001.11:6259533-6259533 | bcftools view -H
[E::bcf_write] Data at NC_000001.11:6259533 contains 64-bit values not representable in BCF. Please use VCF instead
[main_vcfview] Error: cannot write to (null)

Is there a discussion in samtools/hts-specs about updating the BCF specification to support 64-bit values?

@pd3
Member

pd3 commented Jul 14, 2023

Changing the BCF specification is not an easy task and may take a long time, even if there is the will to do it.
The problem could be addressed more easily on the dbSNP side if INFO/RS were a string rather than an integer.

@ShrutiBaikerikar

Hi,

I am getting the same error when trying to annotate dbSNP 156. I understand from the discussion that this issue can't be fixed for now. But could you help me compile HTSlib with the option -DVCF_ALLOW_INT64? I read the documentation, and it states that this option needs to be added manually to the makefile. I tried that and it's not working. I made this change in the makefile in the htslib-1.20 folder with bcftools-1.20.
Since I have no experience developing with C++ and make, could you please specify the exact changes to be made in the makefile?
Is this correct?
CFLAGS = -g -Wall -O2 -fvisibility=hidden -DVCF_ALLOW_INT64=1

@pd3
Member

pd3 commented Apr 17, 2024

Yes, that is correct, one must compile with -DVCF_ALLOW_INT64. Try to force recompilation of vcf.c with touch vcf.c, see what the standard make command line looks like and add -DVCF_ALLOW_INT64. It should be noted that this has not been terribly well tested, hopefully the code did not deteriorate too much.

Perhaps a simpler workaround is to edit the VCF header using the bcftools reheader command, changing the offending tag to Type=String:

bcftools view -h file.vcf.gz > hdr.txt
# edit hdr.txt and change the offending tag to Type=String
bcftools reheader -h hdr.txt -o new.bcf file.vcf.gz
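
The manual header edit in the steps above could be scripted along these lines. This is only a sketch: fix_rs_header is a hypothetical helper name, and it assumes the offending tag is INFO/RS and that its header line starts with ##INFO=<ID=RS, and declares Type=Integer, as in the dbSNP file discussed here.

```shell
# fix_rs_header is a hypothetical helper: it reads a VCF header on stdin
# and rewrites only the INFO/RS definition from Type=Integer to Type=String.
# All other header lines pass through unchanged.
fix_rs_header() {
    sed '/^##INFO=<ID=RS,/s/Type=Integer/Type=String/'
}

# Intended use (assumes bcftools is installed; not run here):
#   bcftools view -h file.vcf.gz | fix_rs_header > hdr.txt
```

The resulting hdr.txt can then be passed to the reheader step shown above. Anchoring the substitution on the ID=RS line keeps other integer tags, such as SSR, untouched.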

@ShrutiBaikerikar

Hi,

Thanks for the solutions. I tried recompiling after running touch vcf.c and adding -DVCF_ALLOW_INT64 to the makefile, but the error persisted.

The second solution, changing the tag to Type=String, worked, and I could successfully use bcftools view as well as bcftools annotate.

Thank you very much for your help.
