Fixed a raft of integer overflows in VCF land. #1044

Closed
wants to merge 2 commits into from

jkbonfield
Contributor

  • Cast data into size_t before multiplication to avoid wrapping around
    int32.

  • Added checks for return values to align_mem and ks_resize

  • Simplified the byzantine calculation in align_mem

  • Fixed kroundup_size_t and kroundup32 so they cannot wrap around to
    zero and turn the realloc into a free!

  • Also added a check for ~2Gb on total length of FORMAT fields, which
    nullifies the need for some of the above. We may wish to remove
    this at some point if we want to cope with truly mammoth
    multi-sample data, and the above fixes mean that doing so will not
    expose bugs.

    However for now this check adds protection against malformed data
    creating excessive memory usage and CPU requirements.

Credit to OSS-Fuzz
Fixes oss-fuzz 21139
Fixes oss-fuzz 20881

 if ((z->y>>4&0xf) == BCF_HT_STR) {
     if (z->is_gt) { // genotypes
-        int32_t is_phased = 0, *x = (int32_t*)(z->buf + z->size * m);
+        int32_t is_phased = 0, *x = (int32_t*)(z->buf + z->size * (size_t)m);
Member

m is a local variable. Rather than all this casting, can it be declared more accurately as size_t m instead? This might trickle down to fmt_aux_t::max_m and/or fmt_aux_t::size and suggest changing them to size_t too, but that doesn't seem like a bad thing…

Contributor Author

The function is so vast I lose track of the scope of these variables, so I just went with the minimalist change.

I'll take a look at changing the type of the local variable m. I don't wish to start changing data types though, as that requires a deeper understanding of the VCF code, which I currently have neither the time to acquire nor the desire, given the cost of -10 sanity points!

Contributor Author

Hmm, looking at it, m is overloaded. The bit of code you're referring to uses m as the sample ID. In other places it's just listed as a vector (unspecified which). If I change the type of m, should I also change the type of g? Of r? Of l? Of j? I could create a new variable, say sid for sample ID, and use that throughout the specific block of code you refer to, but that is just as much changed code to read.

I think I'll keep this as is and request that @pd3 looks at changing types if he feels it is valid. I simply don't understand all the knock-on implications of changing the type here, given the overloading of variable names.

Member

Aarrgggghhhhh… fair enough
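
The cast discussed in this thread can be sketched in isolation. Assuming size and m are both int (hypothetical stand-ins for z->size and the local m), the product size * m is evaluated in int and can overflow before it is ever added to the buffer pointer; converting to size_t first makes the multiplication happen in the wider type. The helper names below are illustrative only and do not exist in HTSlib.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if size * m fits in an int, 0 if the int multiplication
 * would overflow. Assuming a 32-bit int, the product is computed in
 * 64 bits so that the check itself cannot overflow. */
static int product_fits_int(int size, int m)
{
    int64_t wide = (int64_t)size * (int64_t)m;
    return wide >= INT_MIN && wide <= INT_MAX;
}

/* Byte offset of element m in a buffer of fixed-size elements.
 * Both operands are converted to size_t before multiplying, so the
 * product is not truncated to int. Assumes non-negative inputs. */
static size_t sample_offset(int size, int m)
{
    assert(size >= 0 && m >= 0);
    return (size_t)size * (size_t)m;
}
```

On a platform with a 64-bit size_t, sample_offset(65536, 65536) yields 2^32, a value that product_fits_int reports as too large for int.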

vcf.c Outdated
// Limit the total memory to ~2Gb per VCF row. This should mean
// malformed VCF data is less likely to take excessive memory and/or
// time.
if (v->n_sample * (size_t)f->size > INT_MAX) {
Member

The cast needs to be to uint64_t, as with size_t the multiplication could still wrap around on 32-bit platforms. For the same reason, if this limitation is ever relaxed the test would have to be kept but with INT_MAX changed to SIZE_MAX.
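
A minimal sketch of the check with the wider cast suggested here, assuming n_sample is an unsigned 32-bit count and size is a non-negative int (both parameter types and the function name fmt_exceeds_limit are assumptions for illustration, not HTSlib's code):

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Returns 1 if n_sample * size exceeds INT_MAX (~2 GiB), else 0.
 * The product is formed in uint64_t, so it cannot wrap even on
 * platforms where size_t is only 32 bits wide. */
static int fmt_exceeds_limit(uint32_t n_sample, int size)
{
    assert(size >= 0);
    return (uint64_t)n_sample * (uint64_t)size > (uint64_t)INT_MAX;
}
```

For example, 2^23 samples of 1024 bytes each need 2^33 bytes. With a 32-bit size_t that product wraps to 0 and would pass the check, whereas the 64-bit product is correctly rejected.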

@valeriuo
Contributor

Merged into develop as 29c294e

@valeriuo closed this Mar 11, 2020
jmarshall added a commit to jmarshall/htslib that referenced this pull request Jun 9, 2022
This issue was fixed in 1.11 by PRs samtools#1044 and samtools#1104. It was detected via
fuzz testing (https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=24097)
but the Reproducer Testcase also has an invalid `#CHROM` line which
resulted in an error message in HTSlib versions <= 1.9.

This error message masked the segfault caused by the actual issue, namely
a VCF record whose in-memory representation requires more than 2GiB.
A clean test case produces a segfault all the way back to HTSlib 1.0.
jkbonfield pushed a commit that referenced this pull request Jun 9, 2022