Use fewer input bytes for arbitrary_loop #117
Asking for an arbitrary bool to decide whether the loop should keep going consumes a byte per loop iteration, while calling `int_in_range` instead consumes at most four bytes for any combination of arguments, and often less.
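The byte accounting can be made concrete with a small Rust model. `Input`, `count_with_bools`, and `count_with_range` are hypothetical names for illustration only, not part of the `arbitrary` crate's API. The range read follows the description above: it takes only as many bytes as `max - min` needs, at most four for a `u32`. Treating an odd byte as "keep going" is an assumption of the model.

```rust
/// Hypothetical stand-in for a fuzzer input buffer (not `arbitrary::Unstructured`).
struct Input<'a> {
    data: &'a [u8],
}

impl<'a> Input<'a> {
    fn new(data: &'a [u8]) -> Self {
        Input { data }
    }

    /// Bytes not yet consumed.
    fn remaining(&self) -> usize {
        self.data.len()
    }

    /// Take the next byte, or `None` once the input is exhausted.
    fn byte(&mut self) -> Option<u8> {
        let (&first, rest) = self.data.split_first()?;
        self.data = rest;
        Some(first)
    }

    /// Bool-per-iteration strategy: the first `min` iterations are free, then
    /// every further "keep going?" decision costs one byte. In this model an
    /// odd byte means "continue" (an assumption).
    fn count_with_bools(&mut self, min: u32, max: u32) -> u32 {
        let mut n = min;
        while n < max {
            match self.byte() {
                Some(b) if b & 1 == 1 => n += 1,
                _ => break,
            }
        }
        n
    }

    /// Single-draw strategy: read only as many bytes as are needed to cover
    /// `max - min` (at most 4 for a `u32`), then map the value into range.
    fn count_with_range(&mut self, min: u32, max: u32) -> u32 {
        assert!(min <= max, "min must not exceed max");
        let delta = u64::from(max - min);
        let mut raw: u64 = 0;
        let mut used: u32 = 0;
        // Stop once the remaining high bytes of `delta` are all zero.
        while used < 4 && (delta >> (used * 8)) > 0 {
            match self.byte() {
                Some(b) => raw = (raw << 8) | u64::from(b),
                None => break,
            }
            used += 1;
        }
        min + (raw % (delta + 1)) as u32
    }
}
```

Either strategy can run out of input early. The practical difference is that the bool version spends a byte on every extra iteration, while the range version's cost depends only on the width of the bounds (one byte for `0..=10`, four for `0..=u32::MAX`, none when `min == max`). That is why the choice of bounds matters.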
This seems valuable to have even if it ends up consuming the same number of bytes, just from a convenience perspective. Will it help the fuzzer explore the space more efficiently? No idea. It probably depends on how well the bounds are chosen.
Unfortunately, we don't have a rigorous benchmark corpus of generators to use here, or any way to reason formally about this kind of change in general. In these situations, I think the best approach (balancing the practicality of running an experiment against the trustworthiness of its results) is to show the effect on coverage/time for some real, non-trivial …

To speculate about this PR's change a bit: …
LGTM, nice logic cleanup too -- do you want to do an experiment on coverage/time?
That is already a better plan than I had: thanks!
Do you want that before merging, or is the logic cleanup a good enough reason to merge this? I want to think about the experiment setup a little.
I'll put together a PR for that at some point too, then!
If you are going to do an experiment, we might as well wait for its results. FWIW, pretty much any experiment is better than none here, so I wouldn't stress over the experimental setup too much. Since this is unlikely to affect fuzzer throughput, you could do a fixed number of fuzz runs to even out the comparison and mitigate bias from noise on your system, if you wanted. (Pass …)

If you don't have time to do an experiment at all, then maybe we will just merge based on the cleanups and on this probably(?) not making much of a difference to fuzzer exploration.
The wasm-tools … I tested the existing version for 1,000,000 runs. I'm not testing on a machine configured for benchmarking, but it took roughly 5 minutes and ended with this result:
Then I applied a patch to wasm-smith that's equivalent to this PR and spent 20 minutes getting this result:
So... huh. That's interesting!
A second run with the patched wasm-smith took only 7 minutes and got a little closer to the unpatched coverage:
While another run of the unpatched fuzz target still took 5 minutes but had lower coverage, closer to the patched case:
So maybe I got extraordinarily unlucky with the RNG the first time?

I think that because … we can go ahead and merge this. Thanks for digging into this!
I'm trying to develop an intuition for what helps or hinders libFuzzer when driving a fuzz target that uses `arbitrary`, but I don't know enough. Do you suppose this is likely to work better? Do you have any advice on how to reason about questions like this?

I've been thinking about a `bounded_arbitrary_len` (or `arbitrary_bounded_len`? `arbitrary_len_in_range`?) that takes optional min/max bounds like `arbitrary_loop` does, but takes bytes from the end like `arbitrary_len` does. That could similarly consume fewer bytes than just calling `arbitrary_len` and clamping the result. Is that likely to help a fuzzer explore the state space more effectively?
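The bounded-length idea can be sketched the same way. Everything below is hypothetical. `len_in_range_from_end` is an illustrative name for the proposed function. `unbounded_len_prefix_bytes` is an assumed model of an unbounded length read whose prefix width grows with the remaining input; it is not a description of how `arbitrary_len` is actually implemented.

```rust
/// Bytes needed to encode any value in `0..=delta` (at most 4 for a `u32`).
fn bytes_for_delta(delta: u32) -> usize {
    let delta = u64::from(delta);
    let mut n = 0;
    while n < 4 && (delta >> (n * 8)) > 0 {
        n += 1;
    }
    n
}

/// Assumed model of an unbounded length read from the end of the input:
/// the prefix width depends only on how much data remains, not on any bound.
fn unbounded_len_prefix_bytes(remaining: usize) -> usize {
    match remaining {
        0 => 0,
        1..=256 => 1,
        257..=65_536 => 2,
        _ => 4,
    }
}

/// Hypothetical bounded read: take bytes from the *end* of `data`, but only as
/// many as `max - min` needs. Returns the length and the unconsumed prefix.
fn len_in_range_from_end(data: &[u8], min: u32, max: u32) -> (u32, &[u8]) {
    assert!(min <= max, "min must not exceed max");
    let delta = max - min;
    // Use fewer bytes if the input is shorter than the ideal width.
    let width = bytes_for_delta(delta).min(data.len());
    let (rest, tail) = data.split_at(data.len() - width);
    // Interpret the tail bytes as a big-endian integer.
    let raw = tail.iter().fold(0u64, |acc, &b| (acc << 8) | u64::from(b));
    (min + (raw % (u64::from(delta) + 1)) as u32, rest)
}
```

Under these assumptions, a bound such as `0..=10` costs one byte no matter how much input remains. The unbounded model, by contrast, spends two bytes once more than 256 bytes are left, before any clamping happens.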