add fuzzing #1194

Merged
merged 7 commits into from
Jul 22, 2024

pauldreik
Contributor

This adds fuzzing for selected parts of the JSON functionality.

Motivation

Security and untrusted input

JSON parsing is often applied to external and/or untrusted input. It is therefore important to make sure there are no bugs, such as out-of-bounds reads/writes, signed integer overflow, or other kinds of undefined behavior (UB), that can be triggered by such input.
For this purpose, fuzzing is an excellent tool.

While developing this, a number of bugs were revealed and promptly fixed.

Correctness

This PR also adds roundtrip fuzzers, which make sure data can be serialized and deserialized without change.
While developing this, several problems were found:

What this PR adds

JSON parser fuzzers

These are intended to cover the supposedly most common use cases and were
based on the examples in the README.

Roundtrip fuzzers

These check that conversions roundtrip, i.e. that a value which is serialized and then deserialized comes back unchanged.
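A minimal sketch of what a roundtrip fuzz target can look like. It assumes nothing about glaze's API: the hypothetical helpers `to_text` and `from_text`, built on `std::to_chars`/`std::from_chars`, stand in for a JSON writer and reader, and the target aborts on any mismatch so that libFuzzer records the offending input. Only the `LLVMFuzzerTestOneInput` entry point is libFuzzer's actual convention.

```cpp
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <optional>
#include <string>
#include <system_error>

// Hypothetical serializer, standing in for a JSON writer.
std::string to_text(std::int32_t value) {
  char buf[16]; // large enough for "-2147483648"
  auto [ptr, ec] = std::to_chars(buf, buf + sizeof(buf), value);
  if (ec != std::errc{}) std::abort();
  return std::string(buf, ptr);
}

// Hypothetical deserializer, standing in for a JSON reader.
// Rejects parse errors and trailing characters.
std::optional<std::int32_t> from_text(const std::string& text) {
  std::int32_t value{};
  const char* first = text.data();
  const char* last = first + text.size();
  auto [ptr, ec] = std::from_chars(first, last, value);
  if (ec != std::errc{} || ptr != last) return std::nullopt;
  return value;
}

// libFuzzer entry point: take the first 4 bytes as an int32 and check it roundtrips.
extern "C" int LLVMFuzzerTestOneInput(const std::uint8_t* data, std::size_t size) {
  if (size < sizeof(std::int32_t)) return 0; // too short, nothing to test
  std::int32_t original;
  std::memcpy(&original, data, sizeof(original));
  const auto restored = from_text(to_text(original));
  if (!restored || *restored != original) std::abort(); // crash so the input is saved
  return 0;
}
```

With clang, a target like this is typically compiled with `-fsanitize=fuzzer` (plus e.g. `address,undefined`), which supplies libFuzzer's own `main`.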

Exhaustive tests

For 32-bit types, it is feasible to test all possible values (2^32 of them). Such a test has been added for integer conversion.
In a future extension, it would be good to do this for 32-bit floating point as well.
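A sketch of the exhaustive approach, using hypothetical `std::to_chars`/`std::from_chars` helpers in place of glaze's JSON functions. It walks every value of an integer type from lowest to highest; for `std::int32_t` that is about 4.3 billion iterations, which is why such a job needs an optimized build.

```cpp
#include <charconv>
#include <cstdint>
#include <limits>
#include <system_error>

// Returns true if `value` survives a to_chars -> from_chars roundtrip.
template <typename T>
bool roundtrips(T value) {
  char buf[32];
  auto [end, wec] = std::to_chars(buf, buf + sizeof(buf), value);
  if (wec != std::errc{}) return false;
  T restored{};
  auto [ptr, rec] = std::from_chars(buf, end, restored);
  return rec == std::errc{} && ptr == end && restored == value;
}

// Checks every representable value of T, min to max inclusive.
// Returns the number of values that failed to roundtrip.
template <typename T>
std::uint64_t count_roundtrip_failures() {
  std::uint64_t failures = 0;
  T value = std::numeric_limits<T>::min();
  while (true) {
    if (!roundtrips(value)) ++failures;
    if (value == std::numeric_limits<T>::max()) break; // stop before overflowing
    ++value;
  }
  return failures;
}
```

`count_roundtrip_failures<std::int32_t>()` performs the full 32-bit sweep; the 16-bit and 8-bit instantiations finish almost instantly.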

CI jobs

Short-fuzz

A CI job is added that runs all the fuzzers for a short while. This is often sufficient to find shallow bugs: most of the problems uncovered while working on this took less than ten seconds to find. The job does not store the corpus between runs; that could be improved later if desired (using the GitHub Actions cache).
The output is only saved if a fuzzer fails.

Exhaustive

This CI job compiles without sanitizers and with full optimization so that it runs quickly enough for CI. It uses all available cores.

Preventing code rot

The fuzzers are arranged so that they also compile with compilers that do not support libFuzzer; for those, a separate main function is provided. The fuzzers are built by default, so API changes in glaze that break them will not go undetected.
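The fallback driver might look roughly like the sketch below. The name `replay_main` and the byte-counting example target are hypothetical stand-ins; the only fixed convention is libFuzzer's `LLVMFuzzerTestOneInput` entry point, which the driver calls once per file named on the command line. The printed message mirrors the one seen in the log later in this thread, but the actual glaze driver may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <vector>

// Example fuzz target for demonstration only: it just counts the bytes it receives.
// In a real build each fuzzer source file provides its own LLVMFuzzerTestOneInput.
std::size_t total_bytes_seen = 0;
extern "C" int LLVMFuzzerTestOneInput(const std::uint8_t* data, std::size_t size) {
  (void)data;
  total_bytes_seen += size;
  return 0;
}

// Hypothetical replacement for libFuzzer's driver: reads each file given on the
// command line and feeds its contents to the fuzz target once.
int replay_main(int argc, char* argv[]) {
  for (int i = 1; i < argc; ++i) {
    std::ifstream file(argv[i], std::ios::binary);
    if (!file) {
      std::fprintf(stderr, "could not open \"%s\"\n", argv[i]);
      return 1;
    }
    std::vector<std::uint8_t> bytes((std::istreambuf_iterator<char>(file)),
                                    std::istreambuf_iterator<char>());
    std::printf("invoking fuzzer on data from file \"%s\"\n", argv[i]);
    LLVMFuzzerTestOneInput(bytes.data(), bytes.size());
  }
  return 0;
}
```

In a build without libFuzzer, `main` could simply `return replay_main(argc, argv);`, which also makes it possible to replay crash files on such platforms.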

What comes next

This could be improved in a number of ways; it is intended as a small start, in the spirit of having something usable now rather than something perfect never. Possible improvements:

  • An OSS-Fuzz build script, which would make it easy to run with MemorySanitizer
  • Fuzzing the binary formats as well
  • Replaying fuzz data on platforms not supporting libFuzzer
  • Storing the fuzz corpus between runs
  • Adding a nightly fuzz job that can run for longer

There is also an exhaustive test that makes sure 16- and 32-bit integers roundtrip.
Often, a minute of fuzzing is sufficient to uncover problems, so it
is beneficial to run this as a pre-merge job.

Longer fuzzing sessions can be run manually or through other CI jobs.
@stephenberry
Owner

This is awesome! I was actually not aware of libFuzzer. Thanks for all the work you put into this! I'll merge in the latest code and get this merged with main.

@pauldreik
Contributor Author

pauldreik commented Jul 22, 2024

Thanks, happy to help with this awesome library!

The quickfuzz job fails because it finds a problem: there seems to be one more issue with the generic JSON parsing. Do you want me to write a reproducer? The troublesome input is available as an artifact.

@stephenberry
Owner

@pauldreik, I'm trying to make sense of the artifact:

Is this the input string that caused the error?
{\" \\\\\\\\\\\\[\\\\\\\\\\uuu\365uuu \\\\\\\\\\\\[\\\\\\\\\\uuuuuuuuuuuuuuuuPuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu\\\\\\\" @\000\000\\\\\"\"

From the artifact:

EraseBytes-CopyPart-
/usr/bin/../lib/gcc/x86_64-linux-gnu/14/../../../../include/c++/14/array:219:9: runtime error: addition of unsigned offset to 0x559ee23e6b80 overflowed to 0x559ee23e6b75
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /usr/bin/../lib/gcc/x86_64-linux-gnu/14/../../../../include/c++/14/array:219:9 
MS: 3 CopyPart-CopyPart-ChangeBit-; base unit: ad35dbc704368f64f81f8c4c572a72586cd6b194
0x7b,0x22,0x20,0x20,0x20,0x20,0x20,0x20,0x5c,0x5c,0x5c,0x5c,0x5c,0x5c,0x5b,0x5c,0x5c,0x5c,0x5c,0x5c,0x75,0x75,0x75,0xf5,0x75,0x75,0x75,0x20,0x20,0x20,0x20,0x20,0x5c,0x5c,0x5c,0x5c,0x5c,0x5c,0x5b,0x5c,0x5c,0x5c,0x5c,0x5c,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x50,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x75,0x5c,0x5c,0x5c,0x22,0x20,0x20,0x40,0x0,0x0,0x5c,0x5c,0x22,0x22,0x20,0x20,0x20,0x20,0x24,0x0,0x20,
{\"      \\\\\\\\\\\\[\\\\\\\\\\uuu\365uuu     \\\\\\\\\\\\[\\\\\\\\\\uuuuuuuuuuuuuuuuPuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu\\\\\\\"  @\000\000\\\\\"\"    $\000 
artifact_prefix='artifacts/fuzz_json_generic/'; Test unit written to artifacts/fuzz_json_generic/crash-e7ad8b1dfdf68006f77563e49d80617871a17958
Base64: eyIgICAgICBcXFxcXFxbXFxcXFx1dXX1dXV1ICAgICBcXFxcXFxbXFxcXFx1dXV1dXV1dXV1dXV1dXV1UHV1dXV1dXV1dXV1dXV1dXV1dXV1dXV1dXV1dXV1dXV1dXVcXFwiICBAAABcXCIiICAgICQAIA==

@stephenberry
Owner

@pauldreik, I'm struggling a bit to replicate the issue with fuzz_json_generic.

@pauldreik
Contributor Author

Yes, that is the data.
I put it in a file called "bad" (using base64 -d on the bottom line of the log; alternatively, you can take the crash file).
Then I built with gcc:

pauldreik@privat-kodning:~/code/delaktig/glaze/fuzzing$ cmake -B blah -S .. -DCMAKE_BUILD_TYPE=Debug -DBUILD_TESTING=Off
pauldreik@privat-kodning:~/code/delaktig/glaze/fuzzing$ cd blah/
pauldreik@privat-kodning:~/code/delaktig/glaze/fuzzing/blah$ make -j $(nproc)

Then I ran the fuzzer locally:

pauldreik@privat-kodning:~/code/delaktig/glaze/fuzzing/blah$ fuzzing/fuzz_json_generic ../bad 
invoking fuzzer on data from file "../bad"
=================================================================
==25240==ERROR: AddressSanitizer: global-buffer-overflow on address 0x56429db712d5 at pc 0x56429db1ae4f bp 0x7ffeaa9cab50 sp 0x7ffeaa9cab48
READ of size 1 at 0x56429db712d5 thread T0
    #0 0x56429db1ae4e in glz::detail::hex_to_u32(char const*) (/home/pauldreik/code/delaktig/glaze/fuzzing/blah/fuzzing/fuzz_json_generic+0x83e4e) (BuildId: cc67de5cff972849c1227a728062ef98a66adc5c)


@pauldreik
Contributor Author

And on a different machine where I have a working clang, I did this:

cd fuzzing
./build_and_run_fuzzers.sh  # interrupt when it starts running the fuzzer to avoid waiting
banan@debian11-template:~/cdoe/glaze/fuzzing$ build-fuzzer-clang/fuzzing/fuzz_json_generic bad
INFO: Running with entropic power schedule (0xFF, 100).
INFO: Seed: 2123226015
INFO: Loaded 1 modules   (5751 inline 8-bit counters): 5751 [0x55d759d674a0, 0x55d759d68b17), 
INFO: Loaded 1 PC tables (5751 PCs): 5751 [0x55d759d68b18,0x55d759d7f288), 
build-fuzzer-clang/fuzzing/fuzz_json_generic: Running 1 inputs 1 time(s) each.
Running: bad
/usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/array:61:33: runtime error: index 18446744073709551605 out of bounds for type 'const _Type' (aka 'const unsigned char[256]')
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/array:61:33

To get a smaller input, I ran this:
banan@debian11-template:~/cdoe/glaze/fuzzing$ ./minimize_and_cleanse.sh build-fuzzer-clang/fuzzing/fuzz_json_generic bad

That also reproduces the problem, but with a smaller input (the file "cleaned_crash" created by the previous step):

banan@debian11-template:~/cdoe/glaze/fuzzing$ build-fuzzer-clang/fuzzing/fuzz_json_generic cleaned_crash 
INFO: Running with entropic power schedule (0xFF, 100).
INFO: Seed: 3394512031
INFO: Loaded 1 modules   (5751 inline 8-bit counters): 5751 [0x55bf500a54a0, 0x55bf500a6b17), 
INFO: Loaded 1 PC tables (5751 PCs): 5751 [0x55bf500a6b18,0x55bf500bd288), 
build-fuzzer-clang/fuzzing/fuzz_json_generic: Running 1 inputs 1 time(s) each.
Running: cleaned_crash
/usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/array:61:33: runtime error: index 18446744073709551615 out of bounds for type 'const _Type' (aka 'const unsigned char[256]')
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/array:61:33 

Here is the minimized input:

banan@debian11-template:~/cdoe/glaze/fuzzing$ hd cleaned_crash 
00000000  22 5c 75 ff 22                                    |"\u."|
00000005

@pauldreik
Contributor Author

pauldreik commented Jul 22, 2024

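This replicates the issue in a test:

    "invalid json_t read3"_test = [] {
       glz::json_t json{};
       // Minimized fuzzer input: '"', '\', 'u', 0xff, '"', followed by a null byte
       auto blah = std::vector<char>{0x22, 0x5c, 0x75, char(0xff), 0x22, 0x00};
       // The input is malformed, so read_json is expected to return an error
       expect(glz::read_json(json, blah));
    };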

I get:

20:04:48: Starting /home/banan/cdoe/glaze/out/build/clang_latest/tests/json_test/json_test...
/usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/array:61:33: runtime error: index 18446744073709551615 out of bounds for type 'const _Type' (aka 'const unsigned char[256]')
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/array:61:33 
/usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/array:61:33: runtime error: addition of unsigned offset to 0x564ba01b0440 overflowed to 0x564ba01b043f
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/array:61:33 
FAILED "invalid json_t read3

@stephenberry
Owner

Thanks, I'll look into this.

@stephenberry
Owner

@pauldreik, I found the issue (another missing uint8_t cast) and now all these fuzz tests are passing. So, I'll merge.
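Why a missing `uint8_t` cast produces these reports: the UBSan index 18446744073709551615 is `SIZE_MAX`, i.e. `(size_t)-1`. The byte 0xff stored in a signed `char` (as on x86-64 Linux) is -1, and using it directly as an index into a 256-entry table converts it to that huge value. Likewise, 18446744073709551605 in the first clang report is `(size_t)-11`, which presumably corresponds to the byte 0xf5 in that input. The sketch below shows the pattern with the cast applied; the table and `hex_to_u32_sketch` are hypothetical illustrations, not glaze's actual `glz::detail::hex_to_u32`.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string_view>

// A negative signed char becomes a huge index once converted to size_t.
static_assert(static_cast<std::size_t>(static_cast<signed char>(-1)) == SIZE_MAX,
              "0xff as signed char indexes at SIZE_MAX");
static_assert(static_cast<std::size_t>(static_cast<signed char>(-11)) == SIZE_MAX - 10,
              "0xf5 as signed char indexes at SIZE_MAX - 10");

// Hypothetical lookup table: hex digit value for each byte, or 0xff if not a hex digit.
constexpr std::array<std::uint8_t, 256> make_hex_table() {
  std::array<std::uint8_t, 256> table{};
  for (auto& entry : table) entry = 0xff;
  for (int c = '0'; c <= '9'; ++c) table[static_cast<std::size_t>(c)] = static_cast<std::uint8_t>(c - '0');
  for (int c = 'a'; c <= 'f'; ++c) table[static_cast<std::size_t>(c)] = static_cast<std::uint8_t>(c - 'a' + 10);
  for (int c = 'A'; c <= 'F'; ++c) table[static_cast<std::size_t>(c)] = static_cast<std::uint8_t>(c - 'A' + 10);
  return table;
}
inline constexpr auto hex_table = make_hex_table();

// Decodes the four hex digits that follow "\u"; returns -1 on invalid or short input.
inline std::int64_t hex_to_u32_sketch(std::string_view digits) {
  if (digits.size() < 4) return -1; // never read past the end of the input
  std::uint32_t result = 0;
  for (std::size_t i = 0; i < 4; ++i) {
    // The cast keeps the index in [0, 255] even when char is signed.
    const std::uint8_t value = hex_table[static_cast<std::uint8_t>(digits[i])];
    if (value == 0xff) return -1;
    result = (result << 4) | value;
  }
  return result;
}
```

With the cast, the minimized input's 0xff byte simply looks up `hex_table[255]`, is recognized as an invalid hex digit, and the function reports an error instead of reading out of bounds.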

@stephenberry stephenberry merged commit c0803f6 into stephenberry:main Jul 22, 2024
10 checks passed