New implementation of methods readBitsInt{Be,Le}() (fixes known bugs) + bits int fuzzer #949

Closed
@generalmimon

The current algorithm used for reading bit-sized integers gives incorrect results in all languages that have fixed bit precision - see kaitai-io/kaitai_struct_javascript_runtime#20. The problem occurs notably when an integer with the highest possible number of bits (32 bits in JavaScript, 64 bits in C++/STL, C#, Go, Java, Nim, PHP) is read from an unaligned bit position - in that case some bits are always lost in the process.
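To illustrate why the widest integer is the problematic case, here is a minimal sketch. It is not the actual runtime code: the accumulator-style reader and the names `naiveAccumulate` and `exactAccumulate` are assumptions made up for this example. With 3 bits left over from the previous byte, a 32-bit read has to pull in 4 more bytes, so 35 bits pass through the accumulator, and JavaScript's bitwise operators keep only the low 32 of them.

```javascript
// Hypothetical illustration, NOT the Kaitai Struct runtime implementation.
// `leftover` holds bits still buffered from the previous byte (3 bits here).
function naiveAccumulate(leftover, bytes) {
  let bits = leftover;
  for (const b of bytes) {
    // JS bitwise operators work on 32-bit integers, so high bits fall off.
    bits = (bits << 8) | b;
  }
  return bits >>> 0; // reinterpret the result as an unsigned 32-bit value
}

// Same accumulation with BigInt, which has no fixed width.
function exactAccumulate(leftover, bytes) {
  let bits = BigInt(leftover);
  for (const b of bytes) {
    bits = (bits << 8n) | BigInt(b);
  }
  return bits;
}

const leftover = 0b101;
const bytes = [0xff, 0x00, 0xff, 0x00];
const naive = naiveAccumulate(leftover, bytes);
const exact = exactAccumulate(leftover, bytes);

console.log(naive.toString(16)); // ff00ff00  (the 3 leftover bits are gone)
console.log(exact.toString(16)); // 5ff00ff00 (all 35 bits preserved)
```

The same effect would appear with 64-bit accumulators in the other listed languages, just at a wider boundary.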

There is also another issue in the JS runtime (kaitai-io/kaitai_struct_javascript_runtime#22), which is a result of me "improving" the algorithm without taking into account the specifics of JS bitwise operators.

I've discovered other bugs in the past. For example, our implementation of readBitsIntLe() in JavaScript and Java was using a "sign-propagating" right shift (>>) instead of the correct unsigned right shift (>>>). This showed that >> is an awkward operator: if the MSB (sign bit) happens to be set to 1, it summons an entire train of 1-bits starting at the most significant bit. I mean, this may be great for division of negative integers (if you wondered what Math.floor(-13 / 4) equals, you can do -13 >> 2 = -4 = Math.floor(-3.25), nice), but I doubt it's useful for anything else.
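The difference between the two shifts is easy to reproduce. The snippet below is only a small demonstration of the operators themselves (the variable names are made up for this example), not an excerpt from any runtime:

```javascript
// A 32-bit pattern with the sign bit set; `| 0` makes it an int32.
const word = 0xf0000000 | 0;

// Sign-propagating shift: the vacated high bits are filled with 1s.
const signedShift = word >> 28;
// Unsigned shift: the vacated high bits are filled with 0s.
const unsignedShift = word >>> 28;

console.log(signedShift);   // -1
console.log(unsignedShift); // 15

// The one case where >> is handy: floor division by a power of two.
console.log(-13 >> 2, Math.floor(-13 / 4)); // -4 -4
```

When extracting a bit field, only the `>>>` result (15, i.e. the top four bits 1111) is the intended value.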

Actually, the same problem existed in PHP (kaitai-io/kaitai_struct_php_runtime@c0cc3e2), but PHP doesn't even have a built-in unsigned right shift, so I had to supply a tiny function to emulate it. Another funny thing in PHP (solved in the same commit) is this:

$ php -r "var_dump(1 << 63);"
int(-9223372036854775808)
$ php -r "var_dump((1 << 63) - 1);"
float(-9.2233720368548E+18)

(the subtraction overflows the integer range, so PHP silently switches to a float, which led to a loss of precision).


The point I'm making here is that writing reliable readBitsInt{Be,Le}() functions is hard. All bugs so far were revealed at random, so it's still quite likely that there are some inputs which wouldn't work but just have not been tried or reported yet. Therefore I also wrote a simple "fuzzer" for bit integer layouts and inputs - but instead of generating random data, it evaluates all possible bit layouts and combinations of "fillings" per configuration. The fuzzer is written in Python, but is supposed to work for all our target languages - the Python code just generates .ksy and .kst files, and then the existing infrastructure of https://github.com/kaitai-io/kaitai_struct_tests is used to generate unit tests, run them and aggregate the results. I will publish the code soon.
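Since the fuzzer itself is not published yet, the following is only a rough sketch of what exhaustive enumeration could look like, written in JavaScript rather than Python for illustration. Everything in it is an assumption: the function names `layouts` and `fillings` are hypothetical, a "layout" is taken to mean an ordered split of a bit budget into field widths, and a "filling" is taken to mean each field being all 0-bits or all 1-bits. The real fuzzer generates .ksy and .kst files, which this sketch does not attempt.

```javascript
// Hypothetical sketch, not the actual fuzzer.
// Yields every ordered split of `totalBits` into field widths (each >= 1).
function* layouts(totalBits) {
  if (totalBits === 0) {
    yield [];
    return;
  }
  for (let first = 1; first <= totalBits; first++) {
    for (const rest of layouts(totalBits - first)) {
      yield [first, ...rest];
    }
  }
}

// Assumed notion of a "filling": each field is all 0-bits or all 1-bits.
// Yields the resulting bit string for every combination of field values.
function* fillings(layout) {
  const n = layout.length;
  for (let mask = 0; mask < (1 << n); mask++) {
    yield layout
      .map((width, i) => ((mask >> i) & 1 ? '1' : '0').repeat(width))
      .join('');
  }
}

// Enumerate everything for a tiny 3-bit budget.
for (const layout of layouts(3)) {
  console.log(layout.join('+'), [...fillings(layout)].join(' '));
}
```

Under these assumptions the number of layouts doubles with every extra bit, so an exhaustive run is only practical for small bit budgets or with additional constraints on the layouts.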

I've already rewritten the functions in JavaScript to fix all known bugs and I'm quite happy with them, so now I'll gradually update all runtime libraries to use this new implementation. I think it's a notable change, so I'm creating this issue.

I also like the idea of the fuzzer and I'm wondering if we could also use it for other areas of Kaitai Struct - within the range of inputs it enumerates, it virtually eliminates the possibility of any hidden bug in the tested function. Therefore, I hope that this is the last implementation of readBitsInt{Be,Le}() we will ever need - I'm going to test all target languages with the fuzzer so that no language-specific quirks of the arithmetic and bitwise operators that could cause bugs go unnoticed.
