Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check min size needed with Arbitrary::size_hint and early exit if the data isn't long enough #59

Closed
fitzgen opened this issue Feb 25, 2020 · 8 comments · Fixed by #60
Closed

Comments

@fitzgen
Copy link
Member

fitzgen commented Feb 25, 2020

@bnjbvr was reporting that starting fuzzing from scratch with a fuzz target that takes an Arbtirary impl was spending a lot of time on three bytes long inputs, where the Arbitrary implementation required more input bytes than given. The fuzzer wasn't giving more input bytes fast enough to get to the meaty bits of the fuzz target.

In theory, we could use Arbitrary::size_hint to create a bunch of random seeds for the corpus of the appropriate size and/or control the maximum length of inputs that libfuzzer will generate.

I'm not sure exactly what this would look like, but it probably requires another env var dance between cargo fuzz and libfuzzer-sys like how the debug printing works.

As far as how this is exposed to users, maybe we should add cargo fuzz seed <target> as a new subcommand?

Or maybe we can just document that users can do something like head -c 100 /dev/random > fuzz/corpus/my-target/my-random-seed to add a random 100 bytes seed to their fuzz target's corpus to get things off the ground...

@Manishearth
Copy link
Member

it feels like libfuzzer should have an option for this? if not perhaps we should ask for one?

@fitzgen
Copy link
Member Author

fitzgen commented Feb 26, 2020

What are you imagining that libfuzzer would do? It doesn't know whether it provided enough bytes for Arbitrary or not. I'm not sure what you mean here.

@rohanpadhye
Copy link

Hello! I'm new to cargo-fuzz but I've encountered similar issues when implementing the Zest algorithm in JQF (a structured-input fuzzer for Java). The solution for this problem was to extend the input on-demand when the Quickcheck generators (analogous to Arbitrary) request more bytes, and then cap this extension to some max limit. The relevant source code is here: https://github.com/rohanpadhye/jqf/blob/aeff24a79e409f6ab2c1b8a5c987dbafb091461d/fuzz/src/main/java/edu/berkeley/cs/jqf/fuzz/ei/ZestGuidance.java#L1068-L1099

I don't know much about libfuzzer internals, but it would be good if a test driver could actually modify and extend the byte array that libfuzzer provides during test execution, so that cargo-fuzz (or libfuzzer-sys) could just populate any extra bytes with freshly generated random values on demand. That would be such a useful feature for libfuzzer to have in many different scenarios other than powering Arbitrary...

@Manishearth
Copy link
Member

@fitzgen ./fuzz --min-len=foo, and a way to pass that down. This would require executing the fuzz target with another env var to get the size hint.

@Manishearth
Copy link
Member

@rohanpadhye Worth proposing to libFuzzer. I suspect that given the general fuzzer focus on fuzzer-driven mutation this may not actually work that well, but worth a shot either way.

@fitzgen
Copy link
Member Author

fitzgen commented Feb 27, 2020

Thanks for the insights @rohanpadhye!

The solution for this problem was to extend the input on-demand when the Quickcheck generators (analogous to Arbitrary) request more bytes, and then cap this extension to some max limit. The relevant source code is here: https://github.com/rohanpadhye/jqf/blob/aeff24a79e409f6ab2c1b8a5c987dbafb091461d/fuzz/src/main/java/edu/berkeley/cs/jqf/fuzz/ei/ZestGuidance.java#L1068-L1099

@Manishearth, this reminds me of how during the redesign of Unstructured, we debated whether to error early when running out of bytes or to artificially extend it with zero bytes. We thought that exiting early would avoid confusing the fuzzer, and keep it in control of exploring the input space. Maybe this was the wrong choice?

We could fairly easily start returning zeros after we've exhausted the input (and have a max limit on how many zeros we return just in case something gets in a loop).

But if we were to actually pursue this, first I would want to take a real world fuzz target (or multiple targets) and test how much code coverage we get after X period of time with and without the zero extending, when starting from an empty corpus. Unfortunately, I don't have the cycles to run the experiment...

Anyways, I'll file an issue for a -min_len flag for libfuzzer.

@fitzgen
Copy link
Member Author

fitzgen commented Feb 27, 2020

Oh another idea is we could do something like this:

if data.len() < Arbitrary::<FuzzTargetType>::size_hint().0 {
    return;
}

...

in the libfuzzer_sys::fuzz_target! macro. That would at least cut down on the reported code covered inside Arbitrary impls and also be a bit faster.

@fitzgen
Copy link
Member Author

fitzgen commented Feb 27, 2020

In fact, I think that would be the best way to handle this.

@fitzgen fitzgen transferred this issue from rust-fuzz/cargo-fuzz Feb 27, 2020
@fitzgen fitzgen transferred this issue from rust-fuzz/arbitrary Feb 27, 2020
@fitzgen fitzgen changed the title Somehow leverage Arbitrary::size_hint to create initial seeds for corpus or control -max_len and -len_control? Check min size needed with Arbitrary::size_hint and early exit if the data isn't long enough Feb 27, 2020
fitzgen added a commit to fitzgen/libfuzzer that referenced this issue Feb 27, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants