[E::bcf_hdr_read] Failed to read BCF header #672
Comments
The error comes from here: https://github.com/samtools/htslib/blob/develop/vcf.c#L923. Most likely the header is too big to fit in SIZE_MAX bytes. I am not sure this check is a good idea: apparently the C standard only guarantees that SIZE_MAX is at least 65535, which would be far too limiting. As far as I know, the BCF specification does not impose any limit on the header size, certainly not one this small. @daviesrob, do you want to comment?
You are right, the header could cause this error. With the same data and commands, there are no errors if I use Pinus taeda as the reference. Could you help me fix this problem? Thank you!
BCF does have a limit on the size of the header, as it stores the length in the four bytes following the magic string at the start of the file. Given the number of scaffolds, it's quite possible that this header is longer than the limit. I'm afraid the only solutions would be to use VCF instead of BCF, or to shrink the header somehow. A four-byte length caps the header at 2^32 - 1 bytes (about 4 GiB), so with 58,407,655 scaffolds you would need to use fewer than 73 bytes per scaffold.
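The 73-byte figure comes from dividing the largest value a four-byte unsigned length can hold by the scaffold count. A quick shell check of that arithmetic, using the scaffold count from the original report:

```shell
#!/usr/bin/env bash
# Largest header length a 4-byte unsigned field can record: 2^32 - 1 bytes.
max_header_bytes=$(( (1 << 32) - 1 ))
# Scaffold count of the sugar pine reference from the original report.
scaffolds=58407655
# Integer division: whole bytes of header text available per scaffold.
per_scaffold=$(( max_header_bytes / scaffolds ))
echo "max header: $max_header_bytes bytes, budget per scaffold: $per_scaffold bytes"
# prints: max header: 4294967295 bytes, budget per scaffold: 73 bytes
```

The budget covers every byte of header text, so the `##contig` line for each scaffold plus its share of all other header lines has to fit in it.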
This would be a very bad vulnerability if it were allowed to happen.
@daviesrob My information about the SIZE_MAX macro was based on this thread (https://stackoverflow.com/questions/22514803/maximum-size-of-size-t); I did not investigate whether the claim in the accepted answer is correct. The limit on the size of the BCF header is of course part of the implementation, but it is not yet part of the specification. Perhaps the sanity check should be done differently and not rely on SIZE_MAX: we know exactly what the maximum value of a four-byte integer is, and the format cannot be platform dependent.
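To see the value such a sanity check is guarding, a small shell sketch can read the four-byte length straight from a BCF file. It assumes the BCF2 layout (the bytes `BCF` followed by major and minor version bytes, then a little-endian unsigned 32-bit header length), bash, and GNU `gzip`, whose `-dcf` decompresses compressed BCF (BGZF is gzip-compatible) and copies uncompressed input through unchanged. The function name `bcf_header_length` is hypothetical and not part of htslib. Because only four bytes are combined, the result can never exceed 4294967295, whatever the platform's SIZE_MAX.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the header-text length recorded in a BCF file.
# Assumed layout: 'B' 'C' 'F' <major> <minor>, then a 4-byte little-endian length.
bcf_header_length() {
    local m0 m1 m2 m3 m4 b0 b1 b2 b3
    # gzip -dcf decompresses BGZF/gzip input and copies uncompressed input as-is;
    # od -An -tu1 prints the first nine bytes as unsigned decimal numbers.
    read -r m0 m1 m2 m3 m4 b0 b1 b2 b3 < <(gzip -dcf "$1" 2>/dev/null | head -c 9 | od -An -tu1)
    # 66 67 70 are 'B' 'C' 'F'; the major version byte must be 2.
    if [ "$m0 $m1 $m2 $m3" != "66 67 70 2" ]; then
        echo "not a BCF2 file: $1" >&2
        return 1
    fi
    # Combine the four bytes least-significant first; the result is at most 2^32 - 1.
    echo $(( b0 | (b1 << 8) | (b2 << 16) | (b3 << 24) ))
}

# Example: bcf_header_length calls.bcf
```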
I will try commenting out that line and recompiling htslib and bcftools to see whether it works. I will report back later. Thank you.
Unless you are using some obscure system, I think @daviesrob is right and it will not help: the header is probably too big to be represented in this version of BCF.
Hi @daviesrob, what do you mean by "use VCF instead of BCF"? I want to use a samtools and bcftools pipeline for my project. I have more than 400 BAM files, so it is hard to shrink the header for each file. Can you give me some suggestions?
I mean you need to write out the text VCF format instead of the binary BCF format. You can get samtools and bcftools to do this by using the right command-line options, for example:
For bcftools:
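A minimal sketch of such a pipeline, assuming a bcftools release that includes `bcftools mpileup` and hypothetical file names (`ref.fa`, a `bam_list.txt` listing the BAM files one per line, and `calls.vcf.gz`). The function name `call_variants_vcf` is also hypothetical. The point is that every step writes VCF (`-O v` for the uncompressed stream between the two commands, `-O z` for the final bgzip-compressed file) rather than BCF (`-O b` or `-O u`), so the header never has to fit in BCF's four-byte length field.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: call variants while keeping every stream in VCF, not BCF.
call_variants_vcf() {
    local ref=$1 bam_list=$2 out=$3
    # mpileup: -f reference FASTA, -b file listing the BAMs, -O v uncompressed VCF.
    # call: -m multiallelic caller, -v variant sites only, -O z bgzip-compressed VCF,
    # reading the mpileup output from stdin ("-").
    bcftools mpileup -f "$ref" -b "$bam_list" -O v |
        bcftools call -m -v -O z -o "$out" -
}

# Example: call_variants_vcf ref.fa bam_list.txt calls.vcf.gz
```

With samtools releases whose `mpileup` still computes genotype likelihoods, the equivalent change is to use `samtools mpileup -v` (VCF output) instead of `-g` (BCF output).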
Hi,
I am using samtools and bcftools to call SNPs with sugar pine as the reference (58,407,655 scaffolds). I got the error message "[E::bcf_hdr_read] Failed to read BCF header". Could anyone help me fix this?
Best,
Wei