Fixed a raft of integer overflows in VCF land. #1044
- Cast data into size_t before multiplication to avoid wrapping around int32.
- Added checks for return values to align_mem and ks_resize.
- Simplified the byzantine calculation in align_mem.
- Fixed kroundup_size_t and kroundup32 so they cannot wrap around to zero and turn the realloc into a free.
- Also added a check for ~2Gb on total length of FORMAT fields, which nullifies the need for some of the above. We may wish to remove this at some point if we want to cope with truly mammoth multi-sample data, and the above fixes mean doing so will not expose bugs. However, for now this check adds protection against malformed data creating excessive memory usage and CPU requirements.

Credit to OSS-Fuzz
Fixes oss-fuzz 21139
Fixes oss-fuzz 20881
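To illustrate the round-up point in the description: a power-of-two round-up that adds one to an all-ones value wraps to zero, and a realloc with size zero may behave like a free. The sketch below shows one way to saturate instead. The helper name `roundup_pow2_saturating` is hypothetical and this is not the HTSlib macro itself, only an illustration of the idea under that assumption.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not HTSlib's kroundup_size_t): round x up to the
 * next power of two, saturating at SIZE_MAX instead of wrapping to 0. */
static size_t roundup_pow2_saturating(size_t x)
{
    if (x <= 1)
        return 1;

    size_t r = x - 1;
    /* Smear the highest set bit of r into every lower bit position. */
    for (size_t shift = 1; shift < sizeof(size_t) * CHAR_BIT; shift <<= 1)
        r |= r >> shift;

    /* If every bit is set, r + 1 would wrap around to zero: saturate. */
    if (r == SIZE_MAX)
        return SIZE_MAX;

    return r + 1;
}
```

With this shape, a request just above the largest representable power of two yields `SIZE_MAX`, so a subsequent allocation fails cleanly rather than being asked for zero bytes.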
  if ((z->y>>4&0xf) == BCF_HT_STR) {
  if (z->is_gt) { // genotypes
- int32_t is_phased = 0, *x = (int32_t*)(z->buf + z->size * m);
+ int32_t is_phased = 0, *x = (int32_t*)(z->buf + z->size * (size_t)m);
`m` is a local variable. Rather than all this casting, can it be declared more accurately as `size_t m` instead? This might trickle down to `fmt_aux_t::max_m` and/or `fmt_aux_t::size` and suggest changing them to `size_t` too, but that doesn't seem like a bad thing…
The function is so vast I lose track of the scope of these variables, so I just went with the minimalist change.
I'll take a look at changing the type for local var `m`. I don't wish to start changing data types though, as that requires a deeper understanding of vcf which I currently have neither the time nor the desire to acquire, given the cost of -10 sanity points!
Hmm, looking at it, `m` is overloaded. The bit of code you're referring to uses `m` as sample id. In other places it's just listed as a vector (unspecified which). If I change the type of `m`, should I also change the type of `g`? Of `r`? Of `l`? Or `j`? Etc. I could create a new var `sid`, say, for sample id and use that throughout the specific block of code you refer to, but that's just as much changed code to read.
I think I'll keep this as is and request that @pd3 looks at changing types if he feels it is valid. I simply don't understand all the knock-on implications of changing type here, given the overloading of var names.
Aarrgggghhhhh… fair enough
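For reference, the effect of the cast debated in this thread can be sketched in isolation. The helper `sample_offset` and its `int` parameters are assumptions made for illustration and do not appear in vcf.c; the point is only that casting an operand before the multiplication makes the product a `size_t`, whereas casting the result afterwards would be too late.

```c
#include <stddef.h>

/* Hypothetical helper: byte offset of sample `m` in a buffer that holds
 * `size` bytes per sample. Both inputs are assumed to be non-negative. */
static size_t sample_offset(int size, int m)
{
    /* Widen before multiplying, so the product is computed as size_t
     * rather than as int (where overflow is undefined behaviour). */
    return (size_t)size * (size_t)m;
}
```

For example, `sample_offset(50000, 50000)` gives 2500000000, which is above `INT32_MAX` but still representable in a 32-bit `size_t`.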
vcf.c
// Limit the total memory to ~2Gb per VCF row. This should mean
// malformed VCF data is less likely to take excessive memory and/or
// time.
if (v->n_sample * (size_t)f->size > INT_MAX) {
The cast needs to be to `uint64_t`, as with `size_t` the multiplication could still wrap around on 32-bit platforms. For the same reason, if this limitation is ever relaxed the test would have to be kept but with `INT_MAX` changed to `SIZE_MAX`.
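A minimal sketch of the check this comment asks for, written as a standalone function. The name `format_too_large` and the plain `int` parameter types are assumptions for illustration; the real fields are `v->n_sample` and `f->size`, whose exact types are not shown here.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if n_sample * size exceeds INT_MAX
 * (or if either input is negative), and 0 otherwise. */
static int format_too_large(int n_sample, int size)
{
    if (n_sample < 0 || size < 0)
        return 1;

    /* Multiply in 64 bits: the product of two non-negative ints always
     * fits in uint64_t, even where size_t is only 32 bits wide. */
    return (uint64_t)n_sample * (uint64_t)size > (uint64_t)INT_MAX;
}
```

With a 32-bit `size_t`, 65536 * 65536 would wrap to 0 and slip past the limit; with the 64-bit product, `format_too_large(65536, 65536)` reports the row as too large.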
Merged into develop as 29c294e
This issue was fixed in 1.11 by PRs samtools#1044 and samtools#1104. It was detected via fuzz testing (https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=24097) but the Reproducer Testcase also has an invalid `#CHROM` line which resulted in an error message in HTSlib versions <= 1.9. This error message masked the segfault caused by the actual issue, namely a VCF record whose in-memory representation requires more than 2GiB. A clean test case produces a segfault all the way back to HTSlib 1.0.