Initial checkin of CPython fuzz tests. #731
This depends on changes that haven't landed in CPython yet. You can test it by replacing the git clone with:
RUN git clone -b fix-issue-29505 --depth 1 https://github.com/ssbr/cpython.git cpython
I would definitely appreciate some comments on the actual fuzz tests in that branch before I send it in a PR to CPython upstream. Those are the first fuzz tests I've ever written, and maybe are too trivial or something?
some comments on the actual fuzz tests
I don't know the code here but my guess is that
- _Py_HashBytes is unlikely to be interesting alone (too simple?) -- hash functions are usually very straightforward code.
- PyUnicode_FromStringAndSize and PyFloat_FromString could be interesting, as they involve some kind of parsing.
Maybe I would also add some regular expressions -- those are likely to produce interesting results quickly.
projects/cpython3/build.sh
@@ -0,0 +1,41 @@
#!/bin/bash -eu
# Copyright 2016 Google Inc.
2017
projects/cpython3/build.sh
################################################################################

# Workaround for distutils, which doesn't copy $CC except on OSX.
export LDSHARED="clang -shared"
$CC instead of clang
projects/cpython3/gen_fuzz.py
@@ -0,0 +1,47 @@
"""Generate a fuzz test.
shouldn't this file be part of the python trunk?
Also, do you really need to write this in python?
Even a simple C include will work
shouldn't this file be part of the python trunk?
I expect the code in this script to eventually be shared across both cpython2/cpython3, and probably also non-cpython fuzz tests that use the python runtime (e.g. simplejson). It could go in cpython, it just seems like there'll be slightly more duplication if I do. (Also, it'd be "dead code" in cpython's repo as it would never be executed).
Does that make sense?
Also, do you really need to write this in python?
Even a simple C include will work
I guess I am terrified of #include SOME_CONSTANT. Does that work if I do -D 'SOME_CONSTANT="path/to/fuzz_foo.inc"'? Or are you thinking I should use SOME_CONSTANT and pass -D "source=$(cat path/to/fuzz_foo.inc)"?
Also, it'd be "dead code" in cpython's repo as it would never be executed
That's the point. We encourage everyone to not have fuzz targets as dead code, and instead use them as part of CI (w/o fuzzing): see https://github.com/google/oss-fuzz/blob/master/docs/ideal_integration.md
I guess I am terrified of #include SOME_CONSTANT
Yea, don't do that.
Also, why do you have this two-level complexity where you first define e.g. fuzz_builtin_unicode and then call it in LLVMFuzzerTestOneInput?
The way we propose is to just have LLVMFuzzerTestOneInput in your code for every API you want to test, see e.g. https://github.com/google/boringssl/tree/master/fuzz
Then you won't need this python script, and it will be very easy to incorporate the fuzz targets into the upstream build system (e.g. using https://github.com/llvm-mirror/llvm/tree/master/lib/Fuzzer/standalone or similar).
The way we propose is to just have LLVMFuzzerTestOneInput in your code for every API you want to test,
see e.g. https://github.com/google/boringssl/tree/master/fuzz
Then you won't need this python script, and it will be very easy to incorporate the fuzz targets into the upstream build system
(e.g. using https://github.com/llvm-mirror/llvm/tree/master/lib/Fuzzer/standalone or similar)
The upstream build system runs all tests in a single process against a single binary, so this would be an ODR violation. That's the original reason for the separation. That, and I couldn't figure out how to integrate dynamic code generation with Python's build system... :/
But thinking on it more, I think I can move this upstream and share code without ODR violations, by making the function name parameterized, or by only invoking it once. Not quite sure how to invoke it (either as a script or a .h) from Python's build system though...
I'll think on this and try to come up with something that moves this to cpython.
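The "parameterized function name" idea discussed above could look roughly like the sketch below. It is only an illustration, not the code from the CPython branch: `fuzz_api_a`, `fuzz_api_b`, the call counters, and the `FUZZ_FUNCTION` macro are hypothetical names, and the two fuzz functions are trivial stand-ins for real API tests. Each fuzzer binary would be compiled with a different `-DFUZZ_FUNCTION=...`, so every binary defines `LLVMFuzzerTestOneInput` exactly once, while a test runner that links everything into one process can call the named functions directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for per-API fuzz functions. Real ones would call
   into CPython; these only count invocations so the dispatch is observable. */
int fuzz_api_a_calls = 0;
int fuzz_api_b_calls = 0;

int fuzz_api_a(const uint8_t *data, size_t size) {
    (void)data;
    (void)size;
    fuzz_api_a_calls++;
    return 0;
}

int fuzz_api_b(const uint8_t *data, size_t size) {
    (void)data;
    (void)size;
    fuzz_api_b_calls++;
    return 0;
}

/* Assumption: the build selects the function under test at compile time,
   e.g. with -DFUZZ_FUNCTION=fuzz_api_b. The default keeps the sketch
   buildable without any flags. */
#ifndef FUZZ_FUNCTION
#define FUZZ_FUNCTION fuzz_api_a
#endif

/* The single definition of the libFuzzer entry point in this binary. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    assert(data != NULL || size == 0);
    (void)FUZZ_FUNCTION(data, size);
    /* libFuzzer expects 0 here regardless of what the fuzz function did. */
    return 0;
}
```

Because the named functions have distinct symbols, linking several of them into one test binary does not redefine anything; only the thin entry point is per-binary.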
projects/cpython3/gen_fuzz.py
PyErr_Print();
abort();
}
return rv;
LLVMFuzzerTestOneInput should return 0 (other values are reserved for future use)
But I see that functions from https://github.com/ssbr/cpython/tree/fix-issue-29505/Modules/_fuzz only return 0, i.e. this is fine
* It's 2017
* $CC, not clang.
Did not (yet) fix anything else -- e.g. where the script should live, or if it should use cpp or m4 or something instead of Python. ;)
I don't know how code review on github works, but I uploaded a second commit, PTAL.
To see how it's defined, see the current WIP PR for CPython: python/cpython@master...ssbr:fix-issue-29505
Updated again, now moved the definition to CPython. It's a little wonky, so feel free to request more changes in the CPython branch. (I know the CPython devs will...)
https://github.com/python/cpython/compare/master...ssbr:fix-issue-29505?expand=1
Thanks for all your review so far, PTAL!
# Build python to run the app.
pushd cpython
./configure --prefix=$OUT/ --with-ensurepip=no
Hm. Will this copy everything into $OUT?
$OUT needs to contain only the fuzzer binaries and their dependencies, nothing extra
CPython is dynamically linked, so we need it -- it isn't extra.
It's possible we can prune a lot of CPython though. (e.g. we don't need modules like TKinter/tk.) Most of those should already be excluded from the build because they're optional and we probably didn't install their dependencies.
Yea, try to not put anything other than the fuzzer binaries and their deps into $OUT -- otherwise ClusterFuzz will get confused and will try to execute something that it shouldn't.
Is the python part of the change submitted?
The Python part isn't submitted -- the fuzz test needs to be written in C instead of C++, both to get approved and to integrate into the build system (which, annoyingly, passes -std=c99 even for C++ code). I tagged you in the review thread.
On our fuzzing infrastructure, $OUT is different from the one that the binaries are built in. Will this still work, or are there hardcoded paths that will break?
I don't understand the question. It works if I run this command, for example:
python infra/helper.py build_image cpython3 && python infra/helper.py build_fuzzers --sanitizer=address cpython3 && python infra/helper.py run_fuzzer cpython3 fuzz_builtin_unicode
When we build, $OUT is something like /out.
When we run this on our fuzzing infrastructure, $OUT is something completely different. Are there paths, etc., that depend on things being in /out?
I don't know Python's build process well enough to answer. Is there a way to test this locally?
@ssbr Just reviewed your PR for CPython. Do you have a problem with me adding myself to the CCs for this? I'm a Python core developer and a member of the python security team.
@alex (I can't figure out how to reply to your comment, can you tell I'm a GitHub noob?), yes, that sounds like a great idea!
Those initial fuzz tests were merged into upstream, want to take another look?
How much stuff will it put into $OUT?
It will put all of Python into $OUT, which makes sense, because the fuzzer needs the entirety of the Python runtime to run. We could probably filter out the pure-python part of the stdlib if we put some effort into it.
@oliverchang will this work from OSS-Fuzz POV if $OUT has the entirety of Python?
If not, I think we'll need to do something internally with Hermetic Python to avoid all this dynamic linking shenanigans. And then all fuzz tests would need to be run on the internal infrastructure using the versions that are vendored in third_party. Maybe not optimal, but at least we'll be able to get them running. Does that sound right? (I just want to be sure I'm not wasting my time here...)
The current fuzzers we have only rely on libpython, so if we could get libpython as a The desired fuzzers for things like json or csv would be more of a challenge. Those modules are generally dynamically loaded
Yep. We look for ELF files with LLVMFuzzerTestOneInput in their contents, so this should work. But, as for #731 (comment), I built this locally, and did
And see:
So it does look like the build expects things to always exist in the same (We probably should improve our
Filed #837
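The detection rule described above ("ELF files with LLVMFuzzerTestOneInput in its contents") can be pictured with the following sketch. It is an assumption about the general shape of such a check, not ClusterFuzz's actual implementation, and `looks_like_fuzz_target` is a hypothetical name. It works on an in-memory buffer to stay self-contained; a real scanner would read each file under $OUT.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if buf starts with the ELF magic number and contains the
   libFuzzer entry point name somewhere in its bytes, 0 otherwise.
   Heuristic sketch only. */
int looks_like_fuzz_target(const unsigned char *buf, size_t len) {
    static const unsigned char elf_magic[4] = {0x7f, 'E', 'L', 'F'};
    static const char needle[] = "LLVMFuzzerTestOneInput";
    const size_t needle_len = sizeof(needle) - 1;

    assert(buf != NULL || len == 0);

    /* Not an ELF file: scripts and pure-Python modules stop here. */
    if (len < sizeof(elf_magic) ||
        memcmp(buf, elf_magic, sizeof(elf_magic)) != 0) {
        return 0;
    }

    /* Plain byte search for the symbol name. */
    for (size_t i = 0; i + needle_len <= len; i++) {
        if (memcmp(buf + i, needle, needle_len) == 0) {
            return 1;
        }
    }
    return 0;
}
```

Under this kind of rule, the pure-Python part of the stdlib and shared libraries that never mention the entry point would be skipped, which is consistent with the "this should work" answer above.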
Is there an environment variable we can use to know the true later value of $OUT? I can think of some hacky ways to try to fix this (PYTHONPATH environment variable, for starters), but I suspect this is fighting a losing battle. It'd be better to just use the same $OUT or something, or else to give up and use a different fuzz runner (like one where we can use hermetic python / bazel).
You can use bazel here (grpc does this: https://github.com/google/oss-fuzz/blob/master/projects/grpc/build.sh)
Unfortunately not. This can change at any time on our actual fuzzing infrastructure.
We can't use hermetic python though, right? (Requires a patched glibc or something.) That's only available internally to Google. So maybe what I mean is "blaze".
That I don't know :(
So here is my suggestion: I've asked the python-dev IRC channel if they have any ideas on how to make it work with how oss-fuzz builds/tests binaries. If it can be made to work, great! If not, I propose we don't use oss-fuzz to fuzz this project, and instead fuzz the version stored in google's version control system with our internal fuzzing test tools.
That's up to you, of course.
@ssbr Ah, I think I may have misunderstood your comment. If you mean an environment variable during runtime, we can certainly provide one. It does look like PYTHONPATH fixes this. Following my previous example, this works:
I guess the fuzz target can initialize this path during initialization?
You had me right the first time -- I don't trust PYTHONPATH as a solution. Moreover I don't see a meaningful difference between running the test in oss-fuzz vs in google's internal things. Either way the bug reports will make it upstream and we will hopefully catch security issues. But if we run it in some internal not-open-source tests, then we have the benefit of it working with the usual google build system which has already applied every hack in the book to make python hermetic and copyable, so we can piggyback on that work and stop worrying.
I'm not too familiar with how things are internally these days, but OSS-Fuzz will likely reduce the amount of work you have to do wrt bug reports making their way upstream. We continuously test on the latest upstream revisions, and automatically handle triage, de-duping and reporting.
My assumption was that ISE-team, or whoever else runs the fuzz tests, will handle it. Much like oss-fuzz. 😅
fwiw - I object to us running any of this internally at Google. We need to be part of the main oss-fuzz project pulling from upstream revisions. Doing this testing within our blackhole of internal stuff adds more work for us internally (read: which we're not going to do) and wouldn't provide results feedback to the upstream CPython project in a useful timely manner. We must figure out how to get this to build and run on the external oss-fuzz infrastructure.
Ping. What's the status here?
I wasn't able to figure out something that would work except the aforementioned
We need to at least figure out how to make Python still work in OSS-Fuzz when the contents of /out are moved somewhere other than /out. Mercurial does this by parsing the path of argv[0]: https://www.mercurial-scm.org/repo/hg/file/default/contrib/fuzz/pyutil.cc
A use-site of this library is here: https://www.mercurial-scm.org/repo/hg/file/default/contrib/fuzz/manifest.cc
The fuzz target is in the CPython repo so we'd need to put these changes in cpython and not oss-fuzz.
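A rough sketch of the argv[0] approach, for illustration only: this is not Mercurial's pyutil.cc, and `dir_of_argv0` is a hypothetical helper. It assumes the Python installation is copied together with the fuzzer binary and that the binary sits at the root of the install prefix (as with `--prefix=$OUT/`), so the binary's directory can serve as PYTHONHOME. `LLVMFuzzerInitialize` is the optional libFuzzer hook that runs once before fuzzing starts.

```c
#define _POSIX_C_SOURCE 200112L /* for setenv() */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Copies the directory part of argv0 into out, without a trailing slash.
   Uses "." when argv0 contains no slash. Returns 0 on success, or -1 if
   out is too small. Hypothetical helper for this sketch. */
int dir_of_argv0(const char *argv0, char *out, size_t outlen) {
    assert(argv0 != NULL && out != NULL && outlen > 0);

    const char *slash = strrchr(argv0, '/');
    const char *src = argv0;
    size_t n;

    if (slash == NULL) {
        src = ".";
        n = 1;
    } else if (slash == argv0) {
        n = 1; /* binary directly under "/" */
    } else {
        n = (size_t)(slash - argv0);
    }

    if (n + 1 > outlen) {
        return -1;
    }
    memcpy(out, src, n);
    out[n] = '\0';
    return 0;
}

/* Called once by libFuzzer before any input is run, if defined. */
int LLVMFuzzerInitialize(int *argc, char ***argv) {
    char dir[4096];
    (void)argc;
    if (dir_of_argv0((*argv)[0], dir, sizeof(dir)) == 0) {
        /* Assumption: the install prefix is the binary's own directory.
           The final 0 keeps any value the environment already provides. */
        setenv("PYTHONHOME", dir, 0);
    }
    return 0;
}
```

Deriving the location at startup avoids baking the build-time value of $OUT into the binary, which is the property the thread is asking for. Whether PYTHONHOME, PYTHONPATH, or an explicit API call is the right knob would need to be decided in the CPython fuzz target itself.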
I took a look at this, and based on @markus-kusano's PR in #2031 and with his help, I think I managed to get it working. The following commands work fine now:
At some point I got the Should I do more tests and can I submit a PR?
ping @Dor1s Were you able to move the fuzz target out of /out and into a new directory and still have it work? See #731 (comment). If so: amazing :) If that is working then I can't think of anything else that is blocking this, but someone from OSS-Fuzz would have to take a look. Probably easiest to just open a new pull request for your OSS-Fuzz changes.
No, I didn't try that (and don't think that I ever said that I was going to :) )
Sorry for the lack of context in that ping :) Just was wondering if someone from OSS-Fuzz could see if this was working, though, I'm not sure if all the CPython changes are in the forked repo.
I also added a couple of fuzz targets in the last commit there: https://github.com/Lucas-C/cpython-1/commits/master
Yes it all works when reproducing @oliverchang's test:
I'm afraid you'll have to blacklist things like this in a fuzz target OR patch the Python code using
Ok thanks for your answer @Dor1s I'm going to open a new PR as soon as I have finished polishing things a little
@Lucas-C any update on that? I'm excited to see CPython fuzzed! Thank you for working on it :)
Superseded by #2493
This depends on changes that haven't landed in CPython yet. You can test it by replacing the git clone with:
The corresponding pull request to CPython is here: python/cpython#2878
Those are the first fuzz tests I've ever written, and maybe are too trivial or something. Will get to better fuzz tests soon.
Extra notes:
Google bug: b/37562550 (and there are others for other bits of CPython)
Python bug: https://bugs.python.org/issue29505