bcftools - issue when trying to merge two VCFs #1685

Closed
AshleyShi2000 opened this issue Mar 22, 2022 · 5 comments
Labels
htslib-dependent (Cannot be fixed until htslib is fixed)

Comments

@AshleyShi2000

I'm getting this error when trying to merge my two VCF files. I first used bgzip to compress my files and then indexed them before merging.

    Not ready for type [0]: SNP at 82113

Thanks for helping!

@pd3
Member

pd3 commented Mar 25, 2022

That's indeed a very cryptic message; I did not expect to ever encounter such a case. Could you please check the headers and show which tags are defined as Number=G/A/R? A small test case to reproduce the problem would be really helpful here.
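The header check can be scripted. The sketch below uses a hypothetical helper, `list_gar_tags`, which is not a bcftools command. It reads VCF header lines on standard input, for example from `bcftools view -h DTB003.vcf.gz`, and prints only the `##INFO`/`##FORMAT` definitions whose `Number` is `G`, `A` or `R`. It assumes each definition sits on one line with comma-separated `key=value` fields.

```shell
#!/usr/bin/env bash
# list_gar_tags (hypothetical helper, not part of bcftools):
# print INFO/FORMAT header definitions declared with Number=G, Number=A or Number=R.
# The Number field must start right after '<' or ',' and end at ',' or '>'.
list_gar_tags() {
  grep -E '^##(INFO|FORMAT)=<(.*,)?Number=[GAR][,>]'
}
```

Usage: `bcftools view -h DTB003.vcf.gz | list_gar_tags`.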

@AshleyShi2000
Author

I've uploaded the original files to OneDrive!
https://1drv.ms/u/s!AuZRdpWMK4MxgUVCAKTE_X6r9Y5a?e=JJ3vfe

These are the commands that I used:

    bgzip -c DTB-003.vcf > DTB003.vcf.gz
    bcftools index DTB003.vcf.gz
    bgzip -c DTB-005.vcf > DTB005.vcf.gz
    bcftools index DTB005.vcf.gz
    bcftools merge DTB003.vcf.gz DTB005.vcf.gz > merged.vcf
    Not ready for type [0]: SNP at 82113

The tags that are defined as Number=A are:

    ##INFO=<ID=VARTYPE,Number=A,Type=String,Description="Comma separated list of variant types. One per allele">
    ##INFO=<ID=SNP,Number=A,Type=Flag,Description="Variant is a SNP">
    ##INFO=<ID=MNP,Number=A,Type=Flag,Description="Variant is a MNP">
    ##INFO=<ID=INS,Number=A,Type=Flag,Description="Variant is a INS">
    ##INFO=<ID=DEL,Number=A,Type=Flag,Description="Variant is a DEL">
    ##INFO=<ID=MIXED,Number=A,Type=Flag,Description="Variant is a MIXED">
    ##INFO=<ID=HOM,Number=A,Type=Flag,Description="Variant is homozygous">
    ##INFO=<ID=HET,Number=A,Type=Flag,Description="Variant is heterozygous">

I hope these are what you're looking for. Thanks again for your kind help!

@pd3
Member

pd3 commented Mar 29, 2022

Thank you for the test files. The problem is caused by Flag-type INFO tags being defined as Number=A. This violates the VCF specification, which explicitly states that

The ‘Flag’ type indicates that the INFO field does not contain a Value entry, and hence the Number must be 0 in this case.
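To spot definitions that break this rule before running `bcftools merge`, something like the following sketch could be used. `find_bad_flags` is a hypothetical helper, not a bcftools command. It assumes the header is piped in (e.g. from `bcftools view -h DTB003.vcf.gz`) and that each `##INFO` definition is on a single line with comma-separated fields.

```shell
#!/usr/bin/env bash
# find_bad_flags (hypothetical helper, not part of bcftools):
# print ##INFO lines that declare Type=Flag with any Number other than 0,
# i.e. definitions that violate the VCF rule for Flag tags.
find_bad_flags() {
  # 1) keep only INFO definitions whose Type field is Flag
  # 2) drop those that already declare Number=0
  grep -E '^##INFO=<(.*,)?Type=Flag[,>]' | grep -Ev '^##INFO=<(.*,)?Number=0[,>]'
}
```

On the header quoted earlier in this thread, this should list SNP, MNP, INS, DEL, MIXED, HOM and HET, but not VARTYPE, which is a String.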

An easy workaround should be to fix the header:

    bcftools view -h file.bcf > hdr.txt
    # edit the header and change all Type=Flag tags to Number=0
    bcftools reheader -h hdr.txt file.bcf -o fixed.bcf
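The "edit the header" step can also be done non-interactively. The sketch below makes several assumptions:

- `sed` supports `-E` (extended regular expressions), as GNU and BSD `sed` do.
- Each `##INFO` definition is on one line.
- `Number` appears before any free-text field such as `Description`.

`fix_flag_numbers` is a hypothetical name, not part of bcftools.

```shell
#!/usr/bin/env bash
# fix_flag_numbers (hypothetical helper, not part of bcftools):
# on ##INFO lines declaring Type=Flag, rewrite the first Number=... field to Number=0.
# All other header lines (including #CHROM) pass through unchanged.
fix_flag_numbers() {
  sed -E '/^##INFO=<(.*,)?Type=Flag[,>]/ s/([<,])Number=[^,>]*/\1Number=0/'
}
```

With such a helper, the manual edit could become `bcftools view -h file.bcf | fix_flag_numbers > hdr.txt`, followed by the same `bcftools reheader` command. Each input would be fixed this way and re-indexed with `bcftools index` before merging.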

Ideally the program that produced the VCFs should be fixed.

pd3 added a commit to pd3/htslib that referenced this issue Mar 29, 2022
Invalid definitions are fixed internally and warnings such as

    [W::bcf_hdr_register_hrec] The definition of Flag "INFO/SNP" is invalid, forcing Number=0

are printed so that downstream analyses can work (e.g. `bcftools merge`).
However, output VCF headers are not fixed.

This could go one step further and also modify the headers.

See also samtools/bcftools#1685
@pd3
Member

pd3 commented Mar 29, 2022

Once the pull request samtools/htslib#1415 is merged, bcftools will be able to work with such files.

Note that this is only a workaround; the problem should really be addressed on the VCF producer's side.

pd3 added the htslib-dependent label (Cannot be fixed until htslib is fixed) Mar 29, 2022
daviesrob pushed a commit to samtools/htslib that referenced this issue Apr 1, 2022
@pd3
Member

pd3 commented Apr 1, 2022

This is now merged via samtools/htslib#1415

pd3 closed this as completed Apr 1, 2022
daviesrob pushed a commit to daviesrob/htslib that referenced this issue Apr 4, 2022
2 participants