add fuzzing #1194

pauldreik · 2024-07-21T13:56:13Z

This adds fuzzing for selected parts of the json functionality.

Motivation

Security and untrusted input

JSON parsing is often used on external and/or untrusted input. It is therefore important to make sure there are no bugs such as out-of-bounds reads/writes, signed integer overflow or other kinds of UB that such input can trigger.
For this purpose, fuzzing is an excellent tool.

While developing this, a number of bugs were revealed and promptly fixed:

reading generic json can cause a stack overflow #1173 (stack overflow)
read_json on json_t invoked with invalid unicode causes out of bounds read in hex_to_u32 #1189 (out of bounds read)
out of bounds write in prettify_json #1185 (out of bounds write)
glz::minify_json invoked on "f" causes out of bounds read in minify.hpp #1176 (out of bounds read)
reading generic json on invalid input causes out of bounds read in read.hpp #1172 (out of bounds read)
prettify_json("\xf3") invokes out of bounds read in prettify.hpp #1170 (out of bounds read)
undefined behaviour, out of bounds in GLZ_SKIP_WS #1167 (out of bounds read)
prettify_json("\"") invokes memcpy on nullptr in dump.hpp #1175 (memcpy on nullptr, which is UB)

Correctness

This PR also adds roundtrip fuzzers, making sure data can be serialized and deserialized without change.
While developing this several problems were found:

Roundtrip failure for double -0x1.e42427b42cb42p+949 #1183 (roundtrip failure for double)
converting float -8536070.f to string gives "-08536070" #1178 (incorrect conversion of float)

What this PR adds

Json parser fuzzers

These are intended to cover what are presumably the most common use cases and are
based on the examples in the README.
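
As a rough sketch of the shape such a parser fuzzer takes: libFuzzer repeatedly calls `LLVMFuzzerTestOneInput` with mutated byte buffers, and the harness only has to hand them to the parser. The `parse_untrusted` function below is a hypothetical stand-in (a bracket-depth checker), not glaze code. A real harness would call the library instead, for example `glz::read_json` on a `glz::json_t`.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

// Hypothetical stand-in for the parser under test (NOT glaze code).
// Returns true if brackets/braces are balanced and nesting stays below a limit.
inline bool parse_untrusted(std::string_view input)
{
   constexpr std::size_t max_depth = 256; // bounded depth instead of unbounded recursion
   std::string stack;
   for (const char c : input) {
      if (c == '[' || c == '{') {
         if (stack.size() >= max_depth) return false;
         stack.push_back(c);
      }
      else if (c == ']' || c == '}') {
         const char open = (c == ']') ? '[' : '{';
         if (stack.empty() || stack.back() != open) return false;
         stack.pop_back();
      }
   }
   return stack.empty();
}

// libFuzzer entry point: called once per generated input.
extern "C" int LLVMFuzzerTestOneInput(const std::uint8_t* data, std::size_t size)
{
   // Copy into an owning std::string so the parser works on its own buffer.
   const std::string input(reinterpret_cast<const char*>(data), size);
   (void)parse_untrusted(input); // rejecting input is fine; crashing or UB is the bug
   return 0;
}
```

The harness deliberately ignores the parse result: the sanitizers, not the return value, decide whether an input counts as a failure.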

Roundtrip fuzzers

These check that conversions round-trip, that is, that serializing a value and parsing the result gives back the original value.
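
A minimal sketch of the round-trip property, assuming the C library's `snprintf("%.17g")` and `strtod` as stand-ins for the serializer and parser under test (the PR's fuzzers exercise glaze's own conversions instead). The helper names `roundtrips` and `check_bytes` are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Returns true if `value` survives double -> text -> double with identical bits.
inline bool roundtrips(double value)
{
   char text[64];
   const int n = std::snprintf(text, sizeof text, "%.17g", value);
   if (n <= 0 || static_cast<std::size_t>(n) >= sizeof text) return false;

   const double parsed = std::strtod(text, nullptr);

   // Compare bit patterns so that -0.0 and +0.0 are told apart.
   std::uint64_t before = 0, after = 0;
   std::memcpy(&before, &value, sizeof before);
   std::memcpy(&after, &parsed, sizeof after);
   return before == after;
}

// Fuzzer-style driver: reinterpret the first 8 input bytes as a double.
inline bool check_bytes(const std::uint8_t* data, std::size_t size)
{
   if (size < sizeof(double)) return true; // not enough bytes, nothing to check
   double value;
   std::memcpy(&value, data, sizeof value);
   if (value != value) return true; // skip NaN: payloads need not survive a text round trip
   return roundtrips(value);
}
```

Letting the fuzzer supply the raw bytes of the value means every bit pattern is reachable, including subnormals and infinities.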

Exhaustive tests

For 32 bit types it is feasible to test all possible values. Such a test for conversion of integers has been added.
In a future extension, it would be good to also do this for 32 bit floating point.
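
The idea can be sketched as a loop over every value of an integer type. This is an illustration only: `std::to_chars` and `std::from_chars` stand in for the library's conversion routines, and `roundtrips` and `count_failures` are hypothetical names.

```cpp
#include <charconv>
#include <cstdint>
#include <limits>
#include <system_error>

// Round-trips one integer through text; to_chars/from_chars are stand-ins
// for the conversion routines that would actually be under test.
template <class Int>
bool roundtrips(Int value)
{
   char buf[32];
   const auto out = std::to_chars(buf, buf + sizeof buf, value);
   if (out.ec != std::errc{}) return false;

   Int parsed{};
   const auto in = std::from_chars(buf, out.ptr, parsed);
   return in.ec == std::errc{} && in.ptr == out.ptr && parsed == value;
}

// Visits every value of Int exactly once and returns the number of failures.
template <class Int>
std::uint64_t count_failures()
{
   std::uint64_t failures = 0;
   Int value = std::numeric_limits<Int>::min();
   while (true) {
      if (!roundtrips(value)) ++failures;
      if (value == std::numeric_limits<Int>::max()) break; // stop before overflowing
      ++value;
   }
   return failures;
}
```

`count_failures<std::int16_t>()` finishes almost instantly, while the 32-bit instantiations visit about 4.3 billion values each, which is why optimization and multiple cores matter for the CI job.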

CI jobs

Short-fuzz

A CI job is added that runs all the fuzzers for a short while. This is often sufficient to uncover shallow bugs. Most of the problems uncovered while working on this took less than ten seconds to find. The job does not store the corpus between runs; that could be improved later if desired (using the GitHub Actions cache).
The output is only saved in case a fuzzer fails.

Exhaustive

This CI job compiles without sanitizers and with full optimization, so it runs quickly enough for CI. It uses all cores.

Preventing code rot

The fuzzers have been arranged so that they also compile on compilers that do not support libFuzzer. For those, a separate main function is provided. The fuzzers build by default, meaning API changes in glaze that break the fuzzers will not go undetected.
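
One way such an arrangement can look, as a sketch rather than the PR's actual code: the fuzz target is always compiled, and a small `main` that replays files is compiled only when libFuzzer is not supplying its own. `FUZZ_STANDALONE_MAIN` and `run_file` are hypothetical names. The printed message is modeled on the log line the standalone build prints later in this thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iostream>
#include <iterator>
#include <vector>

// The fuzz target itself; identical for libFuzzer and standalone builds.
extern "C" int LLVMFuzzerTestOneInput(const std::uint8_t* data, std::size_t size)
{
   // ... call the code under test here ...
   (void)data;
   (void)size;
   return 0;
}

// Feeds one file to the fuzz target; returns false if the file cannot be opened.
inline bool run_file(const char* path)
{
   std::ifstream file(path, std::ios::binary);
   if (!file) return false;
   const std::vector<char> bytes((std::istreambuf_iterator<char>(file)),
                                 std::istreambuf_iterator<char>());
   std::cout << "invoking fuzzer on data from file \"" << path << "\"\n";
   LLVMFuzzerTestOneInput(reinterpret_cast<const std::uint8_t*>(bytes.data()), bytes.size());
   return true;
}

// Hypothetical macro: the build system would define it only when the compiler
// lacks libFuzzer, since libFuzzer otherwise provides main() itself.
#ifdef FUZZ_STANDALONE_MAIN
int main(int argc, char* argv[])
{
   for (int i = 1; i < argc; ++i) {
      if (!run_file(argv[i])) return 1;
   }
   return 0;
}
#endif
```

Because the standalone binary takes file names as arguments, a crash file saved by a libFuzzer run can be replayed with a compiler that has no fuzzing support.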

What comes next

This could be improved in a number of ways. It is intended to be a small start, in the spirit of having something usable now rather than something perfect never. Here are possible improvements:

Adding an oss-fuzz build script, which would make it easy to run with memory sanitizer.
Fuzzing the binary formats as well.
Replaying fuzz data on platforms not supporting libFuzzer.
Storing the fuzz corpus between runs.
Adding a nightly fuzz job that is allowed to run for longer.

stephenberry · 2024-07-22T16:20:41Z

This is awesome! I was actually not aware of libFuzzer. Thanks for all the work you put into this! I'll merge in the latest code and get this merged with main.

pauldreik · 2024-07-22T16:42:43Z

Thanks, happy to help with this awesome library!

The quickfuzz job fails because it finds a problem - there seems to be one more problem with the generic json parsing. Do you want me to write a reproducer? The troublesome input is available as an artifact.

stephenberry · 2024-07-22T17:15:16Z

@pauldreik, I'm trying to make sense of the artifact:

Is this the input string that caused the error?
{\" \\\\\\\\\\\\[\\\\\\\\\\uuu\365uuu \\\\\\\\\\\\[\\\\\\\\\\uuuuuuuuuuuuuuuuPuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu\\\\\\\" @\000\000\\\\\"\"

From the artifact:

EraseBytes-CopyPart-
/usr/bin/../lib/gcc/x86_64-linux-gnu/14/../../../../include/c++/14/array:219:9: runtime error: addition of unsigned offset to 0x559ee23e6b80 overflowed to 0x559ee23e6b75
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /usr/bin/../lib/gcc/x86_64-linux-gnu/14/../../../../include/c++/14/array:219:9 
MS: 3 CopyPart-CopyPart-ChangeBit-; base unit: ad35dbc704368f64f81f8c4c572a72586cd6b194
0x7b,0x22,0x20,0x20,0x20,0x20,0x20,0x20,0x5c,0x5c,0x5c,0x5c,0x5c,0x5c,0x5b,0x5c,0x5c,0x5c,0x5c,0x5c,0x75,0x75,0x75,0xf5,0x75,0x75,0x75,0x20,0x20,0x20,0x20,0x20,0x5c,0x5c,0x5c,0x5c,0x5c,0x5c,0x5b,0x5c,0x5c,0x5c,0x5c,0x5c,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x50,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x5c,0x5c,0x5c,0x22,0x20,0x20,0x40,0x0,0x0,0x5c,0x5c,0x22,0x22,0x20,0x20,0x20,0x20,0x24,0x0,0x20,
{\"      \\\\\\\\\\\\[\\\\\\\\\\uuu\365uuu     \\\\\\\\\\\\[\\\\\\\\\\uuuuuuuuuuuuuuuuPuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu\\\\\\\"  @\000\000\\\\\"\"    $\000 
artifact_prefix='artifacts/fuzz_json_generic/'; Test unit written to artifacts/fuzz_json_generic/crash-e7ad8b1dfdf68006f77563e49d80617871a17958
Base64: eyIgICAgICBcXFxcXFxbXFxcXFx1dXX1dXV1ICAgICBcXFxcXFxbXFxcXFx1dXV1dXV1dXV1dXV1dXV1UHV1dXV1dXV1dXV1dXV1dXV1dXV1dXV1dXV1dXV1dXV1dXVcXFwiICBAAABcXCIiICAgICQAIA==

stephenberry · 2024-07-22T17:29:13Z

@pauldreik, I'm struggling a bit with replicating the issue with fuzz_json_generic.

pauldreik · 2024-07-22T17:48:37Z

Yes, that is the data.
I put that in a file called "bad" (using base64 -d on the bottom line of the log, or you can take the crash file).
Then I built with gcc:

pauldreik@privat-kodning:~/code/delaktig/glaze/fuzzing$ cmake -B blah -S .. -DCMAKE_BUILD_TYPE=Debug -DBUILD_TESTING=Off
pauldreik@privat-kodning:~/code/delaktig/glaze/fuzzing$ cd blah/
pauldreik@privat-kodning:~/code/delaktig/glaze/fuzzing/blah$ make -j $(nproc)

ran the fuzzer locally:

pauldreik@privat-kodning:~/code/delaktig/glaze/fuzzing/blah$ fuzzing/fuzz_json_generic ../bad 
invoking fuzzer on data from file "../bad"
=================================================================
==25240==ERROR: AddressSanitizer: global-buffer-overflow on address 0x56429db712d5 at pc 0x56429db1ae4f bp 0x7ffeaa9cab50 sp 0x7ffeaa9cab48
READ of size 1 at 0x56429db712d5 thread T0
    #0 0x56429db1ae4e in glz::detail::hex_to_u32(char const*) (/home/pauldreik/code/delaktig/glaze/fuzzing/blah/fuzzing/fuzz_json_generic+0x83e4e) (BuildId: cc67de5cff972849c1227a728062ef98a66adc5c)

pauldreik · 2024-07-22T17:59:36Z

And on a different machine where I have a working clang, I did this:

cd fuzzing
./build_and_run_fuzzers.sh  # interrupt when it starts running the fuzzer to avoid waiting
banan@debian11-template:~/cdoe/glaze/fuzzing$ build-fuzzer-clang/fuzzing/fuzz_json_generic bad
INFO: Running with entropic power schedule (0xFF, 100).
INFO: Seed: 2123226015
INFO: Loaded 1 modules   (5751 inline 8-bit counters): 5751 [0x55d759d674a0, 0x55d759d68b17), 
INFO: Loaded 1 PC tables (5751 PCs): 5751 [0x55d759d68b18,0x55d759d7f288), 
build-fuzzer-clang/fuzzing/fuzz_json_generic: Running 1 inputs 1 time(s) each.
Running: bad
/usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/array:61:33: runtime error: index 18446744073709551605 out of bounds for type 'const _Type' (aka 'const unsigned char[256]')
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/array:61:33

to get a smaller input, I ran this:
banan@debian11-template:~/cdoe/glaze/fuzzing$ ./minimize_and_cleanse.sh build-fuzzer-clang/fuzzing/fuzz_json_generic bad

and that also reproduces, but with a smaller input (using the file "cleaned_crash" created by the previous step)

banan@debian11-template:~/cdoe/glaze/fuzzing$ build-fuzzer-clang/fuzzing/fuzz_json_generic cleaned_crash 
INFO: Running with entropic power schedule (0xFF, 100).
INFO: Seed: 3394512031
INFO: Loaded 1 modules   (5751 inline 8-bit counters): 5751 [0x55bf500a54a0, 0x55bf500a6b17), 
INFO: Loaded 1 PC tables (5751 PCs): 5751 [0x55bf500a6b18,0x55bf500bd288), 
build-fuzzer-clang/fuzzing/fuzz_json_generic: Running 1 inputs 1 time(s) each.
Running: cleaned_crash
/usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/array:61:33: runtime error: index 18446744073709551615 out of bounds for type 'const _Type' (aka 'const unsigned char[256]')
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/array:61:33

Here is the minimized input:

banan@debian11-template:~/cdoe/glaze/fuzzing$ hd cleaned_crash 
00000000  22 5c 75 ff 22                                    |"\u."|
00000005

pauldreik · 2024-07-22T18:05:54Z

this replicates the issue in a test:

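    "invalid json_t read3"_test = [] {
       glz::json_t json{};
       // minimized crash input: "\u<0xff>" followed by a NUL terminator
       auto blah = std::vector<char>{0x22, 0x5c, 0x75, char(0xff), 0x22, 0x00};
       // read_json returns an error value that is truthy on failure, so this expects a parse error
       expect(glz::read_json(json, blah));
    };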

I get:

20:04:48: Starting /home/banan/cdoe/glaze/out/build/clang_latest/tests/json_test/json_test...
/usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/array:61:33: runtime error: index 18446744073709551615 out of bounds for type 'const _Type' (aka 'const unsigned char[256]')
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/array:61:33 
/usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/array:61:33: runtime error: addition of unsigned offset to 0x564ba01b0440 overflowed to 0x564ba01b043f
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/array:61:33 
FAILED "invalid json_t read3

stephenberry · 2024-07-22T18:13:16Z

Thanks, I'll look into this.

stephenberry · 2024-07-22T19:55:52Z

@pauldreik, I found the issue (another missing uint8_t cast) and now all these fuzz tests are passing. So, I'll merge.
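
The sanitizer output above fits this diagnosis. On a platform where `char` is signed, the byte 0xff is -1 and 0xf5 is -11. Converted directly to a 64-bit index these become 18446744073709551615 and 18446744073709551605, exactly the indices reported for the 256-entry table. A minimal sketch of the bug class and the fix follows; the table and function names are hypothetical, not glaze's actual code.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical lookup table: hex digit value per byte, 0xFF marks "not a hex digit".
inline std::array<std::uint8_t, 256> make_hex_table()
{
   std::array<std::uint8_t, 256> table{};
   table.fill(0xFF);
   for (int i = 0; i < 10; ++i) table[std::size_t('0' + i)] = std::uint8_t(i);
   for (int i = 0; i < 6; ++i) {
      table[std::size_t('a' + i)] = std::uint8_t(10 + i);
      table[std::size_t('A' + i)] = std::uint8_t(10 + i);
   }
   return table;
}

// The index a naive `table[c]` would use. Where char is signed, char(0xff)
// is -1 and this yields the largest size_t value, far outside the table.
inline std::size_t naive_index(char c) { return static_cast<std::size_t>(c); }

// Converting to uint8_t first keeps the index within 0..255.
inline std::uint8_t hex_value(char c)
{
   static const std::array<std::uint8_t, 256> table = make_hex_table();
   return table[static_cast<std::uint8_t>(c)];
}
```

With the cast in place, `hex_value(char(0xff))` returns the "not a hex digit" marker instead of reading out of bounds, so the parser can reject the malformed `\u` escape.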

pauldreik added 3 commits July 21, 2024 15:13

add json fuzzers

674e564

there is also an exhaustive test for making sure 16 and 32 bit integers roundtrip.

add CI job for running a short fuzz session

2186ebc

often, a minute of fuzzing is sufficient to uncover problems so it is beneficial to run this as a pre merge job. longer fuzzing sessions can be run manually or through other ci jobs.

add CI job running exhaustive 16 and 32 bit integer roundtrip tests

0622494

Merge branch 'main' into pr/1194

4b53f96

Merge branch 'main' into pr/1194

1918ee9

Attempting to replicate fuzz issue with json_t

b73e9f5

Added necessary unsigned conversion for unicode table lookup

a96cb7e

stephenberry merged commit c0803f6 into stephenberry:main Jul 22, 2024
10 checks passed
