
meta: forge fuzzer improvements #387

Closed
mds1 opened this issue Jan 6, 2022 · 12 comments
Labels
A-testing Area: testing C-forge Command: forge

Comments

@mds1
Collaborator

mds1 commented Jan 6, 2022

The current state of fuzzing is naive (we think, more on this below)—it uses proptest strategies to generate random, uniformly distributed inputs based on Solidity types. This is an ok start, but isn't great. There are many improvements we could make to the fuzzer to increase the chances of finding bugs.

This issue is to spec out (1) what fuzzing features and functionality we want to have, and (2) how to approach that, e.g. whether custom proptest strategies are sufficient, whether we should switch to a different crate, etc.

I'm certainly no fuzzing expert, but as a jumping-off point here's my assessment of the functionality we'd want to have, as well as a list of other Rust fuzzing crates to consider.

Fuzzing Features

These are just the ones that come to mind based on my limited knowledge of fuzzing and my use of foundry so far.

Bias random values towards extremes

Based on the proptest docs and this open proptest issue, it appears proptest does not bias towards extreme values and edge cases such as 0, max, min, etc.
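A minimal sketch of one way to add that bias on top of proptest (the real forge fuzzer generates ABI-level Solidity types; plain u64 is used here purely for illustration): weight a handful of edge values against the uniform generator.

```rust
use proptest::prelude::*;

// Sketch: a uint-like strategy that yields an edge case roughly a third of
// the time and a uniformly random value otherwise.
fn biased_u64() -> impl Strategy<Value = u64> {
    prop_oneof![
        1 => Just(0u64),
        1 => Just(1u64),
        1 => Just(u64::MAX),
        6 => any::<u64>(),
    ]
}

proptest! {
    #[test]
    fn boundaries_show_up_often(x in biased_u64(), y in biased_u64()) {
        // With the extra weight on 0, 1, and MAX, boundary conditions are hit
        // far more often than under uniform sampling.
        let _ = x.checked_add(y);
    }
}
```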

"Synchronize" the fuzzed inputs

Some bugs require multiple inputs to be in sync, e.g. all zeros or all empty arrays. If each input is generated independently and uniformly, this is unlikely to happen. I think this is the current behavior of proptest, but I may be wrong, because the fuzzer did find a failure case with a counterexample of [], [], [], 0x, 0x for inputs of types (address,uint256[],uint256[],uint256[],bytes,bytes). I believe this specific example came up on each fuzz run, suggesting proptest may be smarter than I and the docs give it credit for.
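If we do want to force that correlation explicitly, proptest's prop_flat_map can derive several inputs from one shared value; a sketch using Vec<u64> as a stand-in for the ABI array types:

```rust
use proptest::collection::vec;
use proptest::prelude::*;

// Sketch: draw a single length first, then make every array input share it,
// so cases like "all empty" or "all the same size" are generated directly
// rather than by coincidence.
fn synced_arrays() -> impl Strategy<Value = (Vec<u64>, Vec<u64>, Vec<u64>)> {
    (0usize..16).prop_flat_map(|len| {
        (
            vec(any::<u64>(), len),
            vec(any::<u64>(), len),
            vec(any::<u64>(), len),
        )
    })
}
```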

Some degree of control over array and bytes lengths

Currently the maximum sizes of array and bytes inputs are bounded, but some bugs may only surface with larger inputs. Always allowing very large inputs might slow down tests, so it'd be useful to either allow them to be unbounded (or have larger bounds) via a flag (e.g. for use in CI), or to bias the fuzzer so that large inputs are only used occasionally.
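In proptest terms that bound is just the size range handed to the collection strategy, so a flag could simply raise it; a sketch with a hypothetical max_len knob:

```rust
use proptest::collection::vec;
use proptest::prelude::*;

// Sketch: `max_len` is a hypothetical knob that a config flag (e.g. in CI)
// could raise above the default so that very large inputs get exercised.
fn bytes_strategy(max_len: usize) -> impl Strategy<Value = Vec<u8>> {
    vec(any::<u8>(), 0..=max_len)
}
```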

Constant mining

Details and motivation can be found here. The summary is that Echidna uses constant mining to extract return values from other methods and uses those as fuzzer inputs. As shown in the link, this helps find bugs that would not be found otherwise.

assume functionality

Allow inputs that don't meet certain conditions (e.g. the zero address, or values that violate x + y < z) to be discarded and a new set of inputs generated. Discarded fuzz runs are not counted toward the number of executed runs, so new inputs would be generated and the test re-run.
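proptest itself exposes a rejection mechanism along these lines via its prop_assume! macro, which a forge-level assume could plausibly build on; a minimal sketch:

```rust
use proptest::prelude::*;

proptest! {
    #[test]
    fn sum_stays_below_threshold(x in any::<u64>(), y in any::<u64>(), z in any::<u64>()) {
        // Discard this case and draw fresh inputs unless the precondition holds;
        // rejected cases do not count toward the configured number of runs.
        prop_assume!(x.checked_add(y).map_or(false, |sum| sum < z));
        assert!(x + y < z);
    }
}
```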

Bounding inputs to a range

It's debatable whether this should be part of the fuzzer, or if users should just manage it manually by bounding the fuzzer's inputs, but a way to control the range of inputs is often useful to avoid reverts due to overflows, etc.
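For reference, proptest already treats integer ranges as strategies, so bounding at generation time is expressible there today; whether forge should surface that or leave it to manual bounding is the question above. A minimal sketch:

```rust
use proptest::prelude::*;

proptest! {
    #[test]
    fn amount_is_always_in_range(amount in 1u128..=1_000_000u128) {
        // `amount` is only ever drawn from [1, 1_000_000], so the test body
        // never has to early-return on values that would overflow or revert.
        assert!(amount >= 1 && amount <= 1_000_000);
    }
}
```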

Symbolically execute to seed fuzzer inputs

See #387 (comment) below:

Consider using SMT solvers like z3 directly to generate random valid inputs, instead of hoping some generic fuzzer will support smart contract specific needs.

h/t @yaronvel
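As a rough illustration of the idea (foundry has no symbolic execution today, and this is not a description of anyone's actual design), the z3 crate can be asked for concrete inputs that satisfy a path condition, and those models could seed the fuzzer's corpus; the constraints below are made up:

```rust
use z3::{ast::Int, Config, Context, SatResult, Solver};

fn main() {
    let cfg = Config::new();
    let ctx = Context::new(&cfg);
    let solver = Solver::new(&ctx);

    // Made-up path condition: find x, y with x > y and x + y < 1000,
    // i.e. inputs that would reach a particular branch in the contract.
    let x = Int::new_const(&ctx, "x");
    let y = Int::new_const(&ctx, "y");
    solver.assert(&x.gt(&y));
    solver.assert(&(&x + &y).lt(&Int::from_i64(&ctx, 1000)));

    if matches!(solver.check(), SatResult::Sat) {
        // The model's assignments for x and y could be fed into the fuzzer
        // as seed inputs instead of (or alongside) random values.
        println!("{}", solver.get_model().unwrap());
    }
}
```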

Rust Fuzzing Crates

There are probably others, but here are a few I've come across so far:

  • proptest is what we currently use
  • quickcheck — a Rust implementation of QuickCheck; the original Haskell QuickCheck is what dapptools uses
  • afl.rs allows fuzzing with AFL
  • cargo-fuzz allows fuzzing with libFuzzer
  • arbitrary is intended to be combined with a fuzzer like libFuzzer and cargo-fuzz or AFL, and to help you turn the raw, untyped byte buffers that they produce into well-typed, valid, structured values (h/t @mattsse)
@mattsse
Member

mattsse commented Jan 6, 2022

there's also
https://github.com/rust-fuzz/arbitrary

@gakonst
Member

gakonst commented Jan 6, 2022

We could also explore making our own Strategy implementations for proptest.
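For instance (a sketch only, not an existing forge strategy), a domain-specific strategy can often be composed from existing ones with prop_map rather than implementing the Strategy trait from scratch:

```rust
use proptest::prelude::*;

// Sketch: generate token amounts as whole "ether" units scaled to 18 decimals,
// which exercises realistic magnitudes instead of arbitrary bit patterns.
fn wad_amount() -> impl Strategy<Value = u128> {
    (0u128..1_000_000_000).prop_map(|whole| whole * 1_000_000_000_000_000_000)
}
```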

@yaronvel

yaronvel commented Jan 6, 2022

Consider using SMT solvers like z3 directly to generate random valid inputs, instead of hoping some generic fuzzer will support smart contract specific needs.
Can elaborate more on this if you want, in my previous life I once built a randomized testing tool with z3 from scratch.

@wuestholz

@mds1 @gakonst Great to hear that there are other efforts to get more users hooked on fuzzing! :) You might want to take a look at our Harvey fuzzer; see https://mariachris.github.io/Pubs/FSE-2020-Harvey.pdf for ideas on what features make a big difference in practice. Harvey is used both in Diligence's MythX service and our dedicated Diligence fuzzing solution.

gakonst added the C-forge Command: forge label Jan 8, 2022
@mds1
Collaborator Author

mds1 commented Jan 13, 2022

Consider using SMT solvers like z3 directly to generate random valid inputs, instead of hoping some generic fuzzer will support smart contract specific needs.
Can elaborate more on this if you want, in my previous life I once built a randomized testing tool with z3 from scratch.

Hey @yaronvel—I'm not too familiar with how you'd implement this in practice, and foundry doesn't yet have any symbolic execution capabilities, so this probably won't be the first path we go down. However I've always thought seeding the fuzzer with solver outputs would be valuable, so would love if you could elaborate more!

@mds1
Collaborator Author

mds1 commented Jan 13, 2022

Great to hear that there are other efforts to get more users hooked on fuzzing! :) You might want to take a look at our Harvey fuzzer; see https://mariachris.github.io/Pubs/FSE-2020-Harvey.pdf for ideas on what features make a big difference in practice. Harvey is used both in Diligence's MythX service and our dedicated Diligence fuzzing solution.

Thanks @wuestholz this looks great, planning to read this soon. Only question until then is: looks like the paper is from November 2020, so is there anything you've learned since then that isn't covered in the paper, or anything you'd change if writing a similar paper today? It's only about a year old, but a year is a long time in the smart contract world! 😁

@wuestholz

Great to hear that there are other efforts to get more users hooked on fuzzing! :) You might want to take a look at our Harvey fuzzer; see https://mariachris.github.io/Pubs/FSE-2020-Harvey.pdf for ideas on what features make a big difference in practice. Harvey is used both in Diligence's MythX service and our dedicated Diligence fuzzing solution.

Thanks @wuestholz this looks great, planning to read this soon. Only question until then is: looks like the paper is from November 2020, so is there anything you've learned since then that isn't covered in the paper, or anything you'd change if writing a similar paper today? It's only about a year old, but a year is a long time in the smart contract world! 😁

😁 We're continuously improving Harvey based on feedback from auditors and clients. We're planning to write up one of the novel techniques we added soon (spoiler alert: up to 30% coverage increase for some of our benchmarks). I think all the lessons in the paper still apply today, but we've definitely been optimizing Harvey more for property checking (for instance, in combination with our Scribble specification language). We've also found several cool use-cases for the input prediction technique we presented in our FSE paper.

@gakonst
Member

gakonst commented Jan 13, 2022

How much of the specification/property writing do you think we can overload into Solidity syntax, vs having a separate language?

@wuestholz

@gakonst I'm hoping a lot! 😁 We're trying to keep Scribble as simple as possible. Ideally, a Solidity developer can understand most specifications "out-of-the-box"; for instance, we allow regular Solidity expressions whenever possible, and only extend the syntax slightly when needed (for instance, old-expressions in postconditions).

From my perspective, the Solidity team can "steal" as much of Scribble as possible. 😉 We partly started Scribble to explore the language design space and are more than happy to evolve the language with feedback from the community.

@transmissions11
Contributor

Here's a concrete example of the fuzzer missing multiple bugs that other fuzzers (dapptools) would detect easily:

```solidity
function testEquality(uint256 x, uint256 y) public {
    uint256 xy;

    unchecked {
        xy = x * y;
    }

    if (x != 0 && xy / x != y) return;

    assertEq(((xy - 1) / 1e18) + 1, (xy - 1) / (1e18 + 1));
}
```


@mds1
Collaborator Author

mds1 commented Feb 10, 2022

The primary fuzzing technique we should focus on right now is building the best dictionary we can, and then selecting inputs from either the dictionary or a randomly generated value, with a user-defined weight specifying the frequency of each.

  • Biasing towards extremes can be handled by adding values to the dictionary
  • Constant mining can be handled by dumping the EVM state into the dictionary
  • Can also add values by parsing solc output or using slither
  • Instead of coverage guidance, add a flag to output a coverage report, so the user knows what wasn't hit
  • A cheatcode to add values to the dictionary can help improve coverage, based on the coverage report

To implement this we need a weighted union of two strategies (a sketch follows this list):

  • Strategy 1: Randomly generated values, which we currently have
  • Strategy 2: The dictionary, using proptest's select, which creates a strategy that uniformly selects one value from the given values
  • Union: Use the new_weighted function to create a strategy that selects from the above two strategies with the specified weights
  • Echidna's default dictionary frequency is 40%, so we should use that value too. This means random values are used 1.5 times more often than dictionary values, and based on how new_weighted requires weights to be specified, this is equivalent to a weight of 1 for dictionary values and 1.5 for random values
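A sketch of what that weighted union could look like, using plain u64 values and a hypothetical dictionary vector (new_weighted takes integer weights, so the 40/60 split is written as 2:3):

```rust
use proptest::prelude::*;
use proptest::sample::select;
use proptest::strategy::Union;

fn dictionary_or_random(dictionary: Vec<u64>) -> impl Strategy<Value = u64> {
    // Strategy 1: uniformly pick one of the mined/extracted dictionary values.
    let from_dictionary = select(dictionary).boxed();
    // Strategy 2: the fully random values the fuzzer already generates today.
    let random = any::<u64>().boxed();
    // Weighted union: ~40% dictionary, ~60% random (Echidna's default split).
    Union::new_weighted(vec![(2, from_dictionary), (3, random)])
}
```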

@mds1
Collaborator Author

mds1 commented Feb 26, 2023

Closing this since the fuzzer has improved since this issue was created, so it's now stale. Will create a new issue to track future fuzzing improvements.
