Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial checkin of CPython fuzz tests. #731

Closed
wants to merge 5 commits into from
Closed

Conversation

ssbr
Copy link
Contributor

@ssbr ssbr commented Jul 21, 2017

This depends on changes that haven't landed in CPython yet. You can test it by replacing the git clone with:

RUN git clone -b fix-issue-29505 --depth 1 https://github.com/ssbr/cpython.git cpython

The corresponding pull request to CPython is here: python/cpython#2878

Those are the first fuzz tests I've ever written, and maybe are too trivial or something. Will get to better fuzz tests soon.

Extra notes:
Google bug: b/37562550 (and there are others for other bits of CPython)
Python bug: https://bugs.python.org/issue29505

This depends on changes that haven't landed in CPython yet. You can test it by replacing the git clone with:

  RUN git clone -b fix-issue-29505 --depth 1 https://github.com/ssbr/cpython.git cpython

I would definitely appreciate some comments on the actual fuzz tests in that branch before I send it in a PR to CPython upstream. Those are the first fuzz tests I've ever written, and maybe are too trivial or something?
Copy link
Contributor

@kcc kcc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some comments on the actual fuzz tests

I don't know the code here but my guess is that

  • _Py_HashBytes is unlikely to be interesting alone (too simple??) -- hash functions are usually very straight code.
  • PyUnicode_FromStringAndSize and PyFloat_FromString could be interesting as it involves some kind of parsing.

Maybe I would also add some regular expressions -- those are likely to produce interesting results quickly.

@@ -0,0 +1,41 @@
#!/bin/bash -eu
# Copyright 2016 Google Inc.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2017

################################################################################

# Workaround for distutils, which doesn't copy $CC except on OSX.
export LDSHARED="clang -shared"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$CC instead of clang

@@ -0,0 +1,47 @@
"""Generate a fuzz test.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this file be part of the python trunk?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, do you really need to write this in python?
Even a simple C include will work

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this file be part of the python trunk?

I expect the code in this script to eventually be shared across both cpython2/cpython3, and probably also non-cpython fuzz tests that use the python runtime (e.g. simplejson). It could go in cpython, it just seems like there'll be slightly more duplication if I do. (Also, it'd be "dead code" in cpython's repo as it would never be executed).

Does that make sense?

Also, do you really need to write this in python?
Even a simple C include will work

I guess I am terrified of #include SOME_CONSTANT. Does that work if I do -D 'SOME_CONSTANT="path/to/fuzz_foo.inc"'? Or are you thinking I should use SOME_CONSTANT and pass -D "source=$(cat path/to/fuzz_foo.inc)"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, it'd be "dead code" in cpython's repo as it would never be executed

That's the point. We encourage everyone to not have fuzz targets as dead code, and instead use them as part of CI
(w/o fuzzing): see https://github.com/google/oss-fuzz/blob/master/docs/ideal_integration.md

I guess I am terrified of #include SOME_CONSTANT

Yea, don't do that.

Also, why do you have this two-level complexity where you first define e.g. fuzz_builtin_unicode
and then call it in LLVMFuzzerTestOneInput.

The way we propose is to just have LLVMFuzzerTestOneInput in your code for every API you want to test,
see e.g. https://github.com/google/boringssl/tree/master/fuzz
Then you won't need this python script, and it will be very easy to incorporate the fuzz targets into the upstream build system
(e.g. using https://github.com/llvm-mirror/llvm/tree/master/lib/Fuzzer/standalone or similar)

Copy link
Contributor Author

@ssbr ssbr Jul 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way we propose is to just have LLVMFuzzerTestOneInput in your code for every API you want to test,
see e.g. https://github.com/google/boringssl/tree/master/fuzz
Then you won't need this python script, and it will be very easy to incorporate the fuzz targets into the upstream build system
(e.g. using https://github.com/llvm-mirror/llvm/tree/master/lib/Fuzzer/standalone or similar)

The upstream build system runs all tests in a single process against a single binary, so this would be an ODR violation. That's the original reason for the separation. That, and I couldn't figure out how to integrate dynamic code generation with Python's uild system... :/

But thinking on it more, I think I can move this upstream and share code without ODR violations, by making the function name parameterized, or by only invoking it once. Not quite sure how to invoke it (either as a script or a .h) from Python's build system though...

I'll think on this and try to come up with something that moves this to cpython.

PyErr_Print();
abort();
}
return rv;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LLVMFuzzerTestOneInput should return 0 (other values are reserved for future)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I see that functions from https://github.com/ssbr/cpython/tree/fix-issue-29505/Modules/_fuzz only return 0, i.e. this is fine

* It's 2017
* $CC, not clang.

Did not (yet) fix anything else -- e.g. where the script should live, or if it
should use cpp or m4 or something instead of Python. ;)
@ssbr
Copy link
Contributor Author

ssbr commented Jul 24, 2017

I don't know how code review on github works, but I uploaded a second commit, PTAL.

ssbr added a commit to ssbr/cpython that referenced this pull request Jul 25, 2017
To see how it's defined, see the current WIP PR for CPython:

    python/cpython@master...ssbr:fix-issue-29505
@ssbr
Copy link
Contributor Author

ssbr commented Jul 26, 2017

Updated again, now moved the definition to CPython. It's a little wonky, so feel free to request more changes in the CPython branch. (I know the CPython devs will...)

https://github.com/python/cpython/compare/master...ssbr:fix-issue-29505?expand=1

Thanks for all your review so far, PTAL!


# Build python to run the app.
pushd cpython
./configure --prefix=$OUT/ --with-ensurepip=no
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. Will this copy everything into $OUT?
$OUT needs to contain only the fuzzer binaries and their dependencies, nothing extra

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CPython is dynamically linked, so we need it -- it isn't extra.

It's possible we can prune a lot of CPython though. (e.g. we don't need modules like TKinter/tk.) Most of those should already be excluded from the build because they're optional and we probably didn't install their dependencies.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, try to not put anything other then the fuzzer binaries and their deps into $OUT -- otherwise ClusterFuzz will get confused and will try to execute something that it shouldn't.

Is the python part of the change submitted?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Python part isn't submitted -- the fuzz test needs to be written in C instead of C++, both to get approved and to integrate into the build system (which, annoyingly, passes -std=c99 even for C++ code). I tagged you in the review thread.

Copy link
Collaborator

@oliverchang oliverchang Sep 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On our fuzzing infrastructure, $OUT is different from the one that the binaries are built in. Will this still work, or are there hardcoded paths that will break?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the question. It works if I run this command, for example:

python infra/helper.py build_image cpython3 && python infra/helper.py build_fuzzers --sanitizer=address cpython3 && python infra/helper.py run_fuzzer cpython3 fuzz_builtin_unicode

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we build, $OUT is something like /out

When we run this on our fuzzing infrastructure, $OUT is something completely different. Are there paths etc that depend on things being in /out ?

Copy link
Contributor Author

@ssbr ssbr Sep 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know Python's build process well enough to answer. Is there a way to test this locally?

@alex
Copy link
Contributor

alex commented Jul 26, 2017

@ssbr Just reviewed your PR for CPython. Do you have a problem with me adding myself to the CCs for this? I'm a Python core developer and a member of the python security team.

@ssbr
Copy link
Contributor Author

ssbr commented Jul 26, 2017

@alex (I can't figure out how to reply to your comment, can you tell I'm a GitHub noob?), yes, that sounds like a great idea!

@ssbr
Copy link
Contributor Author

ssbr commented Sep 6, 2017

Those initial fuzz tests were merged into upstream, want to take another look?

@kcc
Copy link
Contributor

kcc commented Sep 8, 2017

How much stuff will it put into $OUT?
Only the stuff that is needed to run the fuzzers, or all of the python?

@ssbr
Copy link
Contributor Author

ssbr commented Sep 8, 2017

It will put all of Python into $OUT, which makes sense, because the fuzzer needs the entirety of the Python runtime to run. We could probably filter out the pure-python part of the stdlib if we put some effort into it.

@kcc
Copy link
Contributor

kcc commented Sep 8, 2017

@oliverchang will this work from OSS-Fuzz POV if $OUT has the entirety of Python?

@ssbr
Copy link
Contributor Author

ssbr commented Sep 8, 2017

If not, I think we'll need to do something internally with Hermetic Python to avoid all this dynamic linking shenanigans. And then all fuzz tests would need to be run on the internal infrastructure using the versions that are vendored in third_party. Maybe not optimal, but at least we'll be able to get them running.

Does that sound right?

(I just want to be sure I'm not wasting my time here...)

@alex
Copy link
Contributor

alex commented Sep 8, 2017

The current fuzzers we have only rely on libpython, so if we could get libpython as a .a, I guess that might work.

The desired fuzzers for things like json or cvs would be more of a challenge. Those modules are generally dynamically loaded .sos -- I don't think there's an existing way to link those in at compile time. (Though perhaps there is, sys and other builtin modules happen somehow)

@oliverchang
Copy link
Collaborator

@oliverchang will this work from OSS-Fuzz POV if $OUT has the entirety of Python?

Yep. We look for ELF files with LLVMFuzzerTestOneInput in its contents so this should work,

But, as for #731 (comment),

I built this locally, and did

python infra/helper.py build_image cpython3
python infra/helper.py build_fuzzers cpython3
python infra/helper.py shell cpython3

$ mkdir /out2
$ mv /out/* /out2
$ /out2/fuzz_builtin_float

And see:

Fatal Python error: Py_Initialize: Unable to get the locale encoding
ModuleNotFoundError: No module named 'encodings'

Current thread 0x00007f6c4dc9e780 (most recent call first):
==62== ERROR: libFuzzer: deadly signal
    #0 0x533243 in __sanitizer_print_stack_trace /src/llvm/projects/compiler-rt/lib/asan/asan_stack.cc:38
    #1 0x57eeb5 in fuzzer::Fuzzer::CrashCallback() /src/libfuzzer/FuzzerLoop.cpp:196:5
    #2 0x57ee4c in fuzzer::Fuzzer::StaticCrashSignalCallback() /src/libfuzzer/FuzzerLoop.cpp:175:6
   ....

So it does look like the build expects things to always exist in the same $OUT that's passed to configure. @ssbr do you see any way around this so that we can relocate $OUT to anywhere and things will still work?

(We probably should improve our run_fuzzer command to always to this to catch more errors like this).

@ssbr
Copy link
Contributor Author

ssbr commented Sep 8, 2017

(We probably should improve our run_fuzzer command to always to this to catch more errors like this).

Filed #837

@ssbr do you see any way around this so that we can relocate $OUT to anywhere and things will still work?

Is there an environment variable we can use to know the true later value of $OUT?

I can think of some hacky ways to try to fix this (PYTHONPATH environment variable, for starters), but I suspect this is fighting a losing battle. It'd be better to just use the same $OUT or something, or else to give up and use a different fuzz runner (like one where we can use hermetic python / bazel).

@kcc
Copy link
Contributor

kcc commented Sep 8, 2017

You can use bazel here (grpc does this: https://github.com/google/oss-fuzz/blob/master/projects/grpc/build.sh)

@oliverchang
Copy link
Collaborator

Is there an environment variable we can use to know the true later value of $OUT?

Unfortunately not. This can change at any time on our actual fuzzing infrastructure.

@ssbr
Copy link
Contributor Author

ssbr commented Sep 8, 2017

You can use bazel here (grpc does this: https://github.com/google/oss-fuzz/blob/master/projects/grpc/build.sh)

We can't use hermetic python though, right? (Requires a patched glibc or something.) That's only available internally to Google. So maybe what I mean is "blaze".

@kcc
Copy link
Contributor

kcc commented Sep 8, 2017

That I don't know :(

@ssbr
Copy link
Contributor Author

ssbr commented Sep 8, 2017

So here is my suggestion: I've asked the python-dev IRC channel if they have any ideas on how to make it work with how oss-fuzz builds/tests binaries. If it can be made to work, great! If not, I propose we don't use oss-fuzz to fuzz this project, and instead fuzz the version stored in google's version control system with our internal fuzzing test tools.

@kcc
Copy link
Contributor

kcc commented Sep 8, 2017

That's up to you, of course.
One of the benefits of oss-fuzz is that anyone on the python team (not only googlers) can see the results.

@oliverchang
Copy link
Collaborator

oliverchang commented Sep 8, 2017

Is there an environment variable we can use to know the true later value of $OUT?

I can think of some hacky ways to try to fix this (PYTHONPATH environment variable, for starters), but I > suspect this is fighting a losing battle. It'd be better to just use the same $OUT or something, or else to > give up and use a different fuzz runner (like one where we can use hermetic python / bazel).

@ssbr Ah, I think I may have misunderstood your comment.

If you mean an environment variable during runtime, we can certainly provide one. It does look like PYTHONPATH fixes this.

Following my previous example, this works:

PYTHONPATH=/out2/lib/python3.7 /out2/fuzz_builtin_float

I guess the fuzz target can initialize this path during initialization?

@ssbr
Copy link
Contributor Author

ssbr commented Sep 8, 2017

You had me right the first time -- I don't trust PYTHONPATH as a solution.

Moreover I don't see a meaningful difference between running the test in oss-fuzz vs in google's internal things. Either way the bug reports will make it upstream and we will hopefully catch security issues. But if we run it in some internal not-open-source tests, then we have the benefit of it working with the usual google build system which has already applied every hack in the book to make python hermetic and copyable, so we can piggyback on that work and stop worrying.

@ssbr
Copy link
Contributor Author

ssbr commented Sep 8, 2017

I'm probably going to need to talk it over with @gpshead and/or @Yhg1s, who work on and maintain Python (and Hermetic Python) at Google.

@oliverchang
Copy link
Collaborator

Moreover I don't see a meaningful difference between running the test in oss-fuzz vs in google's internal things.

I'm not too familiar with how things are internally these days, but OSS-Fuzz will likely reduce the amount of work you have to do wrt bug reports making their way upstream. We continuously test on the latest upstream revisions, and automatically handle triage, de-duping and reporting.

@ssbr
Copy link
Contributor Author

ssbr commented Sep 8, 2017

My assumption was that ISE-team, or whoever else runs the fuzz tests, will handle it. Much like oss-fuzz. 😅

@gpshead
Copy link
Contributor

gpshead commented Sep 11, 2017

fwiw - I object to us running any of this internally at Google. We need to be part of the main oss-fuzz project pulling from upstream revisions. Doing this testing within our blackhole of internal stuff adds more work for us internally (read: which we're not going to do) and wouldn't provide results feedback to the upstream CPython project in a useful timely manner.

We must figure out how to get this to build and run on the external oss-fuzz infrastructure.

@Dor1s
Copy link
Contributor

Dor1s commented Jul 18, 2018

Ping. What's the status here?

@ssbr
Copy link
Contributor Author

ssbr commented Jul 19, 2018

I wasn't able to figure out something that would work except the aforementioned PYTHONPATH, which I never tried. I also ended up leaving the team (youtube-security), so if nobody else picks this up it's dead.

@Dor1s
Copy link
Contributor

Dor1s commented Jul 19, 2018

@ssbr, thanks for the info!

@gpshead, are you interested in getting this landed?

@Dor1s
Copy link
Contributor

Dor1s commented Dec 27, 2018

@durin42 @markus-kusano @gpshead ping

@markus-kusano
Copy link
Contributor

We need to at least figure out how to make Python still work in OSS-Fuzz when /out is moved from /out. Mercurial does this by parsing the path of argv[0]:

https://www.mercurial-scm.org/repo/hg/file/default/contrib/fuzz/pyutil.cc

A use-site of this library is here:

https://www.mercurial-scm.org/repo/hg/file/default/contrib/fuzz/manifest.cc

The fuzz target is in the CPython repo so we'd need to put these changes in cpython and not oss-fuzz.

@Lucas-C
Copy link

Lucas-C commented Jan 8, 2019

I took a look at this, and based on @markus-kusano PR in #2031 and with his help, I think I managed to get it working.
My branch is here: https://github.com/Lucas-C/oss-fuzz/commits/cpython3

The following commands work fine now:

python infra/helper.py run_fuzzer --sanitizer address cpython3 fuzz_builtin_float
python infra/helper.py run_fuzzer --sanitizer address cpython3 fuzz_builtin_int
python infra/helper.py run_fuzzer --sanitizer address cpython3 fuzz_builtin_unicode

At some point I got the ModuleNotFoundError: No module named 'encodings' error, and it was due to the shared modules (Python builtin .py libs) not being installed.

Should I do more tests and can I submit a PR ?

@markus-kusano
Copy link
Contributor

ping @Dor1s

Were you able to move the fuzz target out of /out and into a new directory and still have it work? see #731 (comment). If so: amazing :)

If that is working then I can think of anything else that is blocking this, but someone from OSS-Fuzz would have to take a look. Probably easiest to just open a new pull request for your OSS-Fuzz changes.

@Dor1s
Copy link
Contributor

Dor1s commented Jan 10, 2019

No, I didn't try that (and don't think that I ever said that I was going to :) )

@markus-kusano
Copy link
Contributor

Sorry for the lack of context in that ping :) Just was wondering if someone from OSS-Fuzz could see if this was working, though, I'm not sure if all the CPython changes are in the forked repo.

@Lucas-C
Copy link

Lucas-C commented Jan 11, 2019

I also added a couple of fuzz targets in the last commit there : https://github.com/Lucas-C/cpython-1/commits/master
But there are not needed for the CPython integration to oss-fuzz to work.
And as a side note, with the fuzz_builtin_exec target, libfuzzer generates input like 52**52**32 after a few hours, and they trigger a timeout. As this valid Python code, is it possible to ignore those timeout in some way ?

Were you able to move the fuzz target out of /out and into a new directory and still have it work?

Yes it all works when reproducing @oliverchang's test:

cp -r /out /out2
mkdir input
echo 0 > input/int.txt
/out2/fuzz_builtin_int input

@Dor1s
Copy link
Contributor

Dor1s commented Jan 11, 2019

libfuzzer generates input like 525232 after a few hours, and they trigger a timeout. As this valid Python code, is it possible to ignore those timeout in some way ?

I'm afraid you'll have to blacklist things like this in a fuzz target OR patch the Python code using #ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION, see http://llvm.org/docs/LibFuzzer.html#fuzzer-friendly-build-mode

@Lucas-C
Copy link

Lucas-C commented Jan 11, 2019

Ok thanks for your answer @Dor1s

I'm going to open a new PR as soon as I have finished polishing thins a little

@ethanhs
Copy link

ethanhs commented Feb 7, 2019

@Lucas-C any update on that? I'm excited to see CPython fuzzed! Thank you for working on it :)

@jonathanmetzman
Copy link
Contributor

Superseeded by #2493

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants