
testing: add fuzz test support #44551

Open
katiehockman opened this issue Feb 23, 2021 · 46 comments
@katiehockman (Member) commented Feb 23, 2021

This proposal is to add fuzz test support to Go. This will add a new testing.F type, support FuzzFoo functions in _test.go files, and add new go command behavior.
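
As a rough sketch (the API was still experimental at this point, and all function and package names below are illustrative), a fuzz target under this design would live in a _test.go file and look something like:

```go
package main // sketch; in practice this would be a _test.go file in your package

import (
	"bytes"
	"testing"
)

// Double is a hypothetical function under test.
func Double(b []byte) []byte {
	return append(append([]byte(nil), b...), b...)
}

// FuzzDouble is a fuzz target: go test runs it against the seed and
// testdata corpora; go test -fuzz additionally runs the mutation engine.
func FuzzDouble(f *testing.F) {
	f.Add([]byte("hello")) // seed the corpus
	f.Fuzz(func(t *testing.T, b []byte) {
		d := Double(b)
		if len(d) != 2*len(b) {
			t.Errorf("len(Double(%q)) = %d, want %d", b, len(d), 2*len(b))
		}
		if len(b) > 0 && !bytes.HasPrefix(d, b) {
			t.Errorf("Double(%q) does not begin with its input", b)
		}
	})
}
```

Without -fuzz, go test only replays the seeds and any testdata inputs; go test -fuzz=FuzzDouble additionally turns on mutation.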

A design draft has already been published and iterated on based on feedback from the Go community. This is the next step to propose that this design draft become a language feature.

This feature will be considered experimental in Go 1.17, and the API will not be covered by the Go 1 compatibility promise yet. The functionality that goes into this release is expected to have bugs and be missing features, but should serve as a proof-of-concept with which Go developers can experiment and provide feedback. Since this will be an experimental feature to start, we also expect there will be room for growth for the mutator and fuzzing engine in future Go releases.

Below are the parts of the design draft which will not make it into 1.17, and will be considered later on:

  • support for fuzzing with -race and -msan
  • support for fuzzing with -keepfuzzing
  • deduplication of similar crashes caused by different mutations, which would be a prerequisite to implementing -keepfuzzing (to reduce noise)
  • allowing special options while fuzzing (e.g. maximum input size)
  • dictionary support
  • customizable coverage instrumentation while fuzzing (e.g. to only instrument certain packages or files)
  • custom generators for the mutator
  • structured fuzzing support for struct and non-primitive types
  • [Stretch goal for 1.17] structured fuzzing support for primitive types other than []byte (e.g. string, int, float64)
@katiehockman katiehockman added this to the Proposal milestone Feb 23, 2021
@katiehockman katiehockman self-assigned this Feb 23, 2021
@ianlancetaylor ianlancetaylor added this to Incoming in Proposals Feb 23, 2021
@robpike (Contributor) commented Feb 23, 2021

How can we keep the fuzzing corpora small, and the fuzzing process efficient and on point? Although fuzzing is good at finding certain classes of bugs, it is very expensive in CPU and storage, and the cost/benefit ratio remains unclear. I worry about wasting energy and filling up git repos with testdata noise. Is there a way to make the testing focused and result-driven? Also, will we wake up one day and find fuzzing is a mandatory part of all builds? That would be terrifying.

Apologies if I'm late to the party; I did not comment on the original proposal.

@FiloSottile (Member) commented Feb 23, 2021

How can we keep the fuzzing corpora small, and the fuzzing process efficient and on point? Although fuzzing is good at finding certain classes of bugs, it is very expensive in CPU and storage, and the cost/benefit ratio remains unclear.

The track record of go-fuzz provides pretty amazing evidence that fuzzing is good at finding bugs that humans had not found. In my experience, just a few CPU minutes of fuzzing can be extremely effective at the margin. There aren't a lot of other ways to turn fixed amounts of CPU time (as opposed to dynamic, runtime checks) into bugs caught.

I worry about wasting energy and filling up git repos with testdata noise. Is there a way to make the testing focused and result-driven?

Yeah, filling repositories with random-looking stuff is definitely undesirable. To be clear, in this design the mutator-generated corpus is not in testdata, we agree that it would be way too noisy to check in.

Samples are only placed in testdata when they resulted in a test failure, and then it's up to the developer to check them in as a regression test, or turn them into a more full-fledged unit test.

Also, will we wake up one day and find fuzzing is a mandatory part of all builds? That would be terrifying.

I'm not sure what you mean with being a part of all builds. If you mean running as part of go build, definitely not.

The expensive, open-ended part of fuzzing won't even be a default part of go test: without the -fuzz flag only seed and testdata corpora will be tested, and the mutator won't run. (That would make tests non-deterministic, aside from resources concerns.)

@robpike (Contributor) commented Feb 24, 2021

Thanks, Filo. Your point about how we store stuff is important, and I missed that.

I understand the value of fuzzing, and my grumpy old man concerns are not an attempt to reduce that. (I will say that some of the bugs it finds are not worth fixing, and certainly not worth the cost of finding, although it's very hard to make that case.) But I have three general concerns:

  1. A concentration of testing on fuzzing. It's great, but it's not the only way to find bugs, although some act as if it is.

  2. The massive computational resources it consumes. It's directed, but not that far off a million monkeys at a million keyboards.

  3. Its adoption as part of the process. This is where the proposal to incorporate it needs attention. The technical process and the culture must make it clear that (1) and (2) mean that it cannot be part of the standard build process.

go build itself will surely never run it, but organizations will. Being automated, companies will require it, just as they require zero lint errors even as we protest that lint is not perfect.

I am just expressing caution, care, and an eye on the big picture here. I am not saying we shouldn't do this, just that we should do it with all considerations in mind.

@everestmz commented Feb 24, 2021

This is awesome - really excited to see native Go fuzzing in the proposal stage now.

Hopefully it's not too early to make a suggestion for the actual implementation of this, but it would be super useful if the testing.F type was an interface, rather than a struct like testing.T/B are.

I think fuzzing is in sort of a unique spot among the supported testing types in that there's no one correct way to fuzz test your code. Having the ability to build a type that implements the testing.F interface would make it simple to swap out fuzzing backends, enabling the community to take advantage of all the awesome open source work out there by building integrations to things like AFL++.

There's already been interest in something like this from the original Reddit thread mentioned on the proposal document (like this comment here), and I think it would be a missed opportunity to not allow developers to hook these functions into property-based testing tools (like Gopter) to get some minimal testing done at the go test stage, while allowing the same test to later be fuzzed to discover deeper bugs.

There are probably details I've missed here and I'd like to hear some other opinions on this, but given all the potential use cases I think it's worth some discussion.

@FiloSottile (Member) commented Feb 24, 2021

  1. A concentration of testing on fuzzing. It's great, but it's not the only way to find bugs, although some act as if it is.

Agreed, I like to say that fuzzing finds bugs at the margin. It's why we are interested in it as the Security team: bugs caught at the margin are ones that don't make it into production to become vulnerabilities. Just like it finds bugs that are often not found in other ways, it can't find bugs that other ways can.

We'll do our best to communicate to developers in the coming months about what fuzzing can (and can't) do for them, and we'd like to hear it if that content doesn't push in the right direction.

  2. The massive computational resources it consumes. It's directed, but not that far off a million monkeys at a million keyboards.
  3. Its adoption as part of the process. This is where the proposal to incorporate it needs attention. The technical process and the culture must make it clear that (1) and (2) mean that it cannot be part of the standard build process.

Indeed, this is the hardest part: creating a culture in the ecosystem that makes the practice successful. We want fuzzing to be part of the development—not build or security—process: make a change to the relevant code, run go test -fuzz with the locally cached corpus that can already hit most code branches, find a bug quickly, debug-fix-verify with go test, turn the crasher into a unit test or check it in as a regression test.

There is probably also space for fuzzing in CI, like OSS-Fuzz demonstrated and so that the corpus can be centralized, but since it's non-deterministic it can't meaningfully block a build. Hopefully the CLI choices, like not having a fixed fuzzing time, nudge in that direction.

Likewise, I think writing will be the best tool we have to shape usage, and welcome feedback on future drafts!

@bcmills (Member) commented Feb 24, 2021

@everestmz

it would be super useful if the testing.F type was an interface, rather than a struct like testing.T/B are.

Why would such an interface type need to be defined in the testing package itself? Keeping testing.F a concrete type allows for future methods to be added and documented, and packages that want to abstract over different F-like implementations can already define their own interface for the specific methods they need.
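
A minimal sketch of that pattern (all names below are hypothetical): a helper package declares the subset of methods it needs, and *testing.F, or any stand-in, satisfies it:

```go
package main

import "testing"

// seeder is a consumer-defined interface covering only the
// *testing.F methods this helper needs (hypothetical name).
type seeder interface {
	Add(args ...interface{})
}

// seedCommonCases seeds a few generally useful inputs.
func seedCommonCases(f seeder) {
	f.Add([]byte(""))
	f.Add([]byte{0x00})
	f.Add([]byte("\xff\xfe"))
}

// *testing.F satisfies seeder without the testing package
// exporting any interface at all.
var _ seeder = (*testing.F)(nil)

// recorder is a toy implementation, shown only to demonstrate substitution.
type recorder struct{ calls int }

func (r *recorder) Add(args ...interface{}) { r.calls++ }
```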

@JAicewizard commented Feb 24, 2021

turn the crasher into a unit test or check it in as a regression test.

Absolute wild wild dream, but... wouldn't it be amazing if there were an option to transform bugs found with fuzzing into unit test cases automatically? It could be as simple as copying the function, giving it a unique name, and replacing testData := {source of random data} with testData := {the data that makes the test fail}.

This would make it very clear that fuzzing generates data that makes your test fail, and is not in itself a way to test your code. It's a random edge-case generator, not a validator.

@bcmills (Member) commented Feb 24, 2021

The flow of control in the design doc seems a bit off to me. The Fuzz method in particular carries a lot of type-state invariants:

  • “Only one call to Fuzz is allowed per fuzz target, and any subsequent calls will panic.”
  • and, Add “cannot be invoked after or within the Fuzz function.”

Making Fuzz a method seems to invite user code to try to perform setup before the call to Fuzz and/or cleanup afterward. (Is that the intent?)

But then I wonder if there is a cleaner alternative, especially given the (*T).Cleanup methods added in #32111. The restriction to a single Fuzz invocation per test makes me think that it functions more like a constructor-function than a method call on an existing object. Would it make sense to make it a constructor method on a *T, returning an explicit object on which Add could then be called? That would make it clearer that the fuzz function should not call Add, since the user would have to go out of their way to even have the object with the Add method in scope at that point.

I'm imagining something like:

package testing

func (*T) NewFuzzer(ff interface{}) *F
func (*F) Add(args interface{})

That would make the example in the doc more like:

func TestFuzzMarshalFoo(t *testing.T) {
	f := t.NewFuzzer(func(t *testing.T, a string, num *big.Int) {
		t.Parallel() // seed corpus tests can run in parallel
		if num.Sign() <= 0 {
			t.Skip() // only test positive numbers
		}
		val, err := MarshalFoo(a, num)
		if err != nil {
			t.Skip()
		}
		if val == nil {
			t.Fatal("MarshalFoo: val == nil, err == nil")
		}
		a2, num2, err := UnmarshalFoo(val)
		if err != nil {
			t.Fatalf("failed to unmarshal valid Foo: %v", err)
		}
		if a2 == nil || num2 == nil {
			t.Error("UnmarshalFoo: a==nil, num==nil, err==nil")
		}
		if a2 != a || !num2.Equal(num) {
			t.Error("UnmarshalFoo does not match the provided input")
		}
	})

	// Seed the initial corpus
	f.Add("cat", big.NewInt(1341))
	f.Add("!mouse", big.NewInt(0))
}

If the -fuzz flag is set, the additional fuzzing would occur after TestFuzzMarshalFoo returns but before its Cleanup callbacks are executed.

@katiehockman (Member, Author) commented Feb 24, 2021

@JAicewizard

wouldn't it be amazing if there would be an option to transform bugs found with fuzzing into unit test cases automatically?

There were discussions about code generation in some of the first designs, but it was eventually decided that it added complexity without much gain, especially since we can get the same benefits in a simpler way. With this design, new crashes are put into testdata and will automatically run as a regression test on the next go test run (even when -fuzz isn't used).

And if someone would prefer to put this crash into the code instead of using the one in testdata, they have a couple options:

  1. They can use f.Add with the existing fuzz target (which works basically the same way as it would if it were in testdata; it's just in the code instead)
  2. They can copy-paste the Fuzz target and make slight alterations to the code to make it a unit test. We intentionally designed fuzz targets to look very similar to unit tests, which allows fuzz targets to be easily translated into unit tests, but also allows unit tests to be easily translated into fuzz targets.
@josharian (Contributor) commented Feb 24, 2021

Fuzz functions will change over time, as all code does. If/when they change, the crashes in testdata may no longer be effectively testing against regressions. It seems to me that crashes need to be reified in regular tests.

That said, a trained corpus (not just crashes) can make fuzzing much more effective at finding new issues. I understand not wanting to pollute a repo with a corpus, but I think we should design an intended way for people to keep and share corpora. I have used git submodules in the past reasonably happily, but that's git-specific (and submodules are generally frustrating to work with). I'm not sure what the answer is here, but I think at least trying for an answer is an important part of the fuzzing big picture.

@dbrumley commented Feb 24, 2021

A concentration of testing on fuzzing. It's great, but it's not the only way to find bugs, although some act as if it is.

I think part of the attraction is not just it finds bugs, but also how it integrates with developer workflow. The nice thing about fuzzing is that each bug has a witness input. That input, and the ability to debug with your own tools, makes fuzzing-as-a-process quite nice.

That being said, definitely no silver bullets here.

The massive computational resources it consumes. It's directed, but not that far off a million monkeys at a million keyboards.

This also relates to the proposal "This type of testing works best when able to run more mutations quickly, rather than fewer mutations intelligently.".

AFL/libfuzzer really focus on iteration speed, but there are other approaches like symbolic execution which are slow, but generate new coverage (more or less) by definition on each iteration. Less fast million monkeys; more like applied model checking using heavy-weight SMT/SAT solvers. Most research, and some products, pair the two together. I know for a fact Mayhem and Mechanical Phish did this in the cyber grand challenge, for instance.

The great thing about the proposal is the interface and build process is standardized. I view this proposal as relevant beyond fuzzing to the broader set of dynamic analysis tools.

@everestmz commented Feb 24, 2021

@bcmills

Why would such an interface type need to be defined in the testing package itself?

Putting the interface directly into the testing package gives any codebase that uses the default Go fuzzing tooling significant flexibility when it comes to changing up the testing methodology.

In the world of C++ fuzzing, the fuzz interface has been de-facto standardized on the LLVMFuzzerTestOneInput function, originally created by libFuzzer. Because this is just a function that expects a value, every other C++ fuzzing tool works with the same interface, allowing users to take a codebase full of fuzz functions and use any of the available open-source tools with no code changes.

This ability has been put to great use by tools like OSS-Fuzz/ClusterFuzz, which allow users to write one fuzz test and have it attacked by 3 different fuzzing engines without any more developer effort.

We constantly see evidence (like this example) of new fuzzing techniques uncovering bugs in code that had previously been thoroughly fuzzed. Making testing.F a concrete type locks down any method that uses it to the standard Go fuzzing tooling, and trying out a different tool would require significant developer effort on any codebase with a healthy collection of fuzz tests.

It would feel like an oversight to hamstring Go fuzzing right out of the gate by limiting available testing methodologies to the default - the fuzzing community consistently develops more and more effective techniques and algorithms, and it would be a shame to not take advantage of that.

Keeping testing.F a concrete type allows for future methods to be added and documented

I might be misunderstanding you here, but how would turning testing.F into an interface preclude adding more functions in the future?

@rsc (Contributor) commented Feb 24, 2021

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc rsc moved this from Incoming to Active in Proposals Feb 24, 2021
@katiehockman (Member, Author) commented Feb 24, 2021

@everestmz

I might be misunderstanding you here, but how would turning testing.F into an interface preclude adding more functions in the future?

I think what @bcmills is saying here is that if we define testing.F as an interface, the interface is basically locked as soon as it's no longer experimental. Otherwise, if we added a new method, then any types that don't update would no longer implement the interface, breaking backwards compatibility. If testing.F is a concrete type, that's not a concern.

I am almost positive that we will come up with more ways to extend the testing.F type in the future, e.g. for providing options while fuzzing (which is still a future consideration). Making it an interface greatly limits flexibility for growth.
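
The hazard can be seen in a small sketch (F here is a hypothetical interface version of testing.F, shown only to illustrate the point):

```go
package main

// F is a hypothetical interface version of testing.F.
type F interface {
	Add(args ...interface{})
	Fuzz(ff interface{})
	// If a later release added a method here, say SetMaxInputSize(int),
	// myFuzzer below would stop satisfying F until its author updated it.
}

// myFuzzer is a third-party implementation (hypothetical).
type myFuzzer struct{ seeds [][]interface{} }

func (m *myFuzzer) Add(args ...interface{}) { m.seeds = append(m.seeds, args) }
func (m *myFuzzer) Fuzz(ff interface{})     {}

// Compiles today; would fail to compile the moment F grows a method.
var _ F = (*myFuzzer)(nil)
```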

It would feel like an oversight to hamstring Go fuzzing right out of the gate by limiting available testing methodologies to the default - the fuzzing community consistently develops more and more effective techniques and algorithms, and it would be a shame to not take advantage of that.

This is a fair point. What do you think about supporting different modes with go build to allow running with different fuzzing engines? For example, a go build -libFuzzer mode could instrument a Go binary with what is needed for running with libFuzzer, and produce a binary that can be run directly by libFuzzer. I realize that is still somewhat limiting, since we would need to provide a different mode for each supported instrumentation, but it may prove useful.

@katiehockman (Member, Author) commented Feb 24, 2021

@bcmills
re: func (*T) NewFuzzer(ff interface{}) *F

I can see your point that it has some type-state invariants. However, I'm not as concerned by that. A few things:

The initial designs looked like this: f.Fuzz(func(f *testing.F, b []byte) {...}). A big reason why we decided to change it to f.Fuzz(func(t *testing.T, b []byte) {...}) was in an attempt to make the scope of the fuzz function clearer. Because the fuzz function takes a *testing.T type, it should be clearer that it's basically a standalone unit test, and that you shouldn't be calling *testing.F functions within it (since you have a *testing.T).

I also think that if someone calls f.Add inside of f.Fuzz, then they are confused about what f.Add is doing. I believe that if we document these functions really well, it should be apparent that seeding the corpus while fuzzing doesn't really make sense, but is instead something you do beforehand.

I also think go vet should be able to help with this.

The main reason I like f.Fuzz instead of f.NewFuzzer is that f.Fuzz is clearly an action. f.Fuzz says "Now start fuzzing this function that I've provided". I find it more confusing that, with your proposal, one could call f.NewFuzzer, creating a testing.F value, and then do nothing else with it. It just has to be assumed that fuzzing will start at the end of the function, which isn't obvious to the reader. It also requires defining the function being fuzzed before you can seed the corpus, even though the actual code would first seed the corpus and then run the function. Basically, the "pre-work" must be written after the actual fuzz function, which feels awkward.

Making Fuzz a method seems to invite user code to try to perform setup before the call to Fuzz and/or cleanup afterward. (Is that the intent?)

Yes and no. Pre-work should be done before calling f.Fuzz (i.e. before fuzzing). But if there is cleanup work that needs to be done, then they should call f.Cleanup before f.Fuzz, or t.Cleanup inside the fuzz function, depending on what they want to happen. For example, someone could do this:

func FuzzFoo(f *testing.F) {
	f.Add([]byte{0})
	f.Cleanup(func() { f.Log("done!") })
	f.Fuzz(func(t *testing.T, b []byte) {
		t.Cleanup(func() { t.Logf("inner") })
	})
}
@zeebo (Contributor) commented Feb 25, 2021

It doesn't seem necessary to me to have testing.F be an interface to support other implementations because the concrete type does not contain or expose any details about how the fuzzing is happening. For example, it seems to me like the testing package could grow some API that is expected to be called in testing.Main (or something) that can control exactly how all of the testing.F values will later behave.

In other words, it doesn't seem necessary to do the virtual dispatch on the Add and Fuzz methods directly as long as the concrete type internally is able to do the virtual dispatch instead.
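
A sketch of that shape (all names hypothetical, simplified to []byte): the exported type stays concrete, and dispatch happens on an unexported engine interface inside it:

```go
package main

// engine is an unexported interface; alternative fuzzing backends
// could be selected internally without changing the exported API.
type engine interface {
	run(seeds [][]byte, target func([]byte))
}

// F is a concrete type, as in the proposal; it delegates to its engine.
type F struct {
	eng   engine
	seeds [][]byte
}

func (f *F) Add(b []byte)             { f.seeds = append(f.seeds, b) }
func (f *F) Fuzz(target func([]byte)) { f.eng.run(f.seeds, target) }

// replayEngine is a toy engine that just replays the seeds once.
type replayEngine struct{ ran int }

func (e *replayEngine) run(seeds [][]byte, target func([]byte)) {
	for _, s := range seeds {
		target(s)
		e.ran++
	}
}
```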

@bcmills (Member) commented Feb 25, 2021

This feature will be considered experimental in Go 1.17, and the API will not be covered by the Go 1 compatibility promise yet.

In that case, could we gate it behind a build tag or GOEXPERIMENT setting?
(See also the (lengthy!) discussion in #34409.)

@cristian-ilies-vasile commented Feb 25, 2021

@FiloSottile
"We'll do our best to communicate to developers in the coming months about what fuzzing can (and can't) do for them"

Well, I recommend the great online book The Fuzzing Book (Tools and Techniques for Generating Software Tests):
https://www.fuzzingbook.org/

@katiehockman (Member, Author) commented Feb 25, 2021

@bcmills

In that case, could we gate it behind a build tag or GOEXPERIMENT setting?
(See also the (lengthy!) discussion in #34409.)

@jayconrod and I discussed this, and ended up deciding that there wasn't much to gain from gating it. Writing a fuzz target is already opt-in, since a developer would need to choose to write a target into their own source code.
And since these targets are in _test.go files, even if a module you depend on decides to add a fuzz target to one of their packages, that wouldn't be included in the build or impact your testing, unless you did something like go test ./....

It's not out of the question if we feel that not gating the feature could cause problems, but we couldn't come up with a compelling need for it. LMK if we are missing something here.

@jayconrod (Contributor) commented Feb 25, 2021

About testing.F being an interface or not:

In our experimental implementation, testing.F doesn't have much logic on its own. It serves as an entry point into the internal/fuzz package, which contains most of the logic. We're trying to keep internal/fuzz only loosely coupled with testing. In the future, we may want to "promote" that package out of internal/, making its API available for uses outside of testing.

@bcmills (Member) commented Feb 25, 2021

@katiehockman, ideally a Go user ought to be able to run go test all and get back a meaningful result about the correctness and consistency of the transitive dependencies of their module. (That is especially important when moving to a new, untested platform, or when upgrading one or more dependencies in order to check for accidental regressions or accidental reliance on undocumented behavior.)

If the fuzzing API is stable as of Go 1.17, then users who opt in to fuzzing can guard some of their _test.go files with //go:build go1.17, and not break go test all for their users when they eventually upgrade to (say) Go 1.18 or Go 1.19. However, if the fuzzing API changes between Go 1.17 and Go 1.18, then go test as of Go 1.18 may be completely unable to even build the test for the package.

So I would prefer that we put the unstable parts of the API behind an explicit build tag — perhaps go117fuzz or unstable — so that early adopters won't break go test all for their downstream consumers in a future release.
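
Under that scheme, an early adopter's fuzz test file would carry the constraint, e.g. (tag name per the suggestion above; package name and the //go:build spelling are illustrative):

```go
//go:build go117fuzz

package foo_test
```

A downstream go test all run on a toolchain without the tag set would then simply skip the file instead of failing to compile it.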

@katiehockman (Member, Author) commented Mar 1, 2021

@bcmills That's a good point about go test all. Let's plan to add a build tag then. go117fuzz sounds good, since it's more specific.

@rsc (Contributor) commented Mar 10, 2021

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

@rsc rsc moved this from Active to Likely Accept in Proposals Mar 10, 2021
@beoran commented Mar 11, 2021

I agree that fuzzing is great, but I also agree with @robpike's concerns. So I would like to propose three small but significant changes to address them:

Instead of go test -fuzz, make it go fuzz; instead of the testing.F package and type, make it fuzzing.F; and require fuzz tests to be placed in files named xxxx_fuzz.go.

This way we create a separation between testing and fuzzing that IMO is very desirable. It helps avoid fuzzing becoming incorporated into the normal, existing tests, and it also makes it clearer to the user whether fuzz tests are available.

@FiloSottile (Member) commented Mar 11, 2021

Instead of go test -fuzz, make it go fuzz; instead of the testing.F package and type, make it fuzzing.F; and require fuzz tests to be placed in files named xxxx_fuzz.go.

The cost of doing this would be losing the extremely natural "go test -fuzz → crash found → go test FAIL → write fix → go test PASS" cycle, which is one of the highlights of this UX.

The -fuzz flag, which is not and will never be set by default, keeps the expensive, non-deterministic, and "different" part of fuzz tests firmly opt-in already.

Benchmarks are also expensive and not a replacement for unit tests, but we don't have a go bench command or _bench.go files.

@josharian (Contributor) commented Mar 11, 2021

@FiloSottile I think a point I made earlier bears re-iterating:

Fuzz functions will change over time, as all code does. If/when they change, the crashes in testdata may no longer be effectively testing against regressions.

So I think your cycle needs an additional step: crash found → write explicit test → go test FAIL.

@FiloSottile (Member) commented Mar 11, 2021

@FiloSottile I think a point I made earlier bears re-iterating:

Fuzz functions will change over time, as all code does. If/when they change, the crashes in testdata may no longer be effectively testing against regressions.

So I think your cycle needs an additional step: crash found → write explicit test → go test FAIL.

The cycle I am talking about would be within the same debugging session. The fuzzer finds a crasher, you sprinkle some printf, run go test, write a fix, run go test. There is no interval of time for the code to change in there, and being able to reproduce the crasher with just go test instead of having to write a unit test before you understand the issue is critical.

About checked-in testdata, I am trying to think through what makes fuzzer inputs more likely to rot silently than, say, table-driven tests. The testdata entry will have type information; it won't be just a binary blob that can silently get out of sync with the parser, and the plan is to make the test fail if types don't match. However, fuzz targets are written to ignore and skip nonsensical inputs, while unit tests are not, so maybe we should make Skip() fail the test if it happens with a testdata input?

@josharian (Contributor) commented Mar 11, 2021

I am trying to think through what makes fuzzer inputs more likely to rot silently than, say, table-driven tests.

Visibility. If you're editing a table-driven test, the data is right there, easy to see and easy to edit.

@bcmills (Member) commented Mar 11, 2021

But doesn't that same argument apply to benchmarks? If you edit the benchmark itself, then it may no longer be valid to compare against a benchmark from a previous commit, and that benchmarking data is not explicit in the code either.

@josharian (Contributor) commented Mar 11, 2021

Sort of? Except people typically run before and after benchmarks side by side for other reasons (same machine, same toolchain, same conditions).

And if you change a benchmark, you're explicitly saying "I disavow all previous benchmark results". If you change a table-driven test, you modify the table as needed. If you change a fuzz function you...what? You don't want to give up on the existing tests (a la benchmarks). It's not clear how to modify the previous inputs (a la table-driven tests). And without being able to see and play with the existing data, I don't even know how you'd form an accurate mental model of what changes to the fuzz function are safe to make without losing test coverage.

@jayconrod (Contributor) commented Mar 11, 2021

Our existing model is that if you change the type signature of a fuzz function, errors will be reported for seed corpus inputs whose types no longer match. go test would report that (without the need for -fuzz). That prevents accidental loss of coverage: if you want to drop those inputs, you can delete the corresponding files in testdata.

We plan to ignore cached inputs with the wrong type though. These are valuable, but it's not clear to me they should cause errors after changing a type. Maybe they should, and you have to run go clean -fuzzcache to proceed (or something more precise)?

About table driven tests, at some point, we were talking about building a tool that could convert a file in the testdata seed corpus into a call to F.Add. I think my preference is to keep those inputs in testdata though; they're not easy to read or edit.

@thepudds

@thepudds thepudds commented Mar 11, 2021

The testdata entry will have type information, it won't be just a binary blob that can silently get out of sync with the parser, and the plan is to make the test fail if types don't match.

One approach would be to semi-gracefully handle, for example, adding or removing arguments to a function under test by mapping the on-disk corpus item to whatever types do match (in order, skipping over mismatches, using zero values / zero-length arguments if needed), or some other simple matching logic that accepts partial matches. In some common cases, this would allow the value of an on-disk example to only partially decay in the face of a signature change.

(Obviously, not needed for a first cut experimental MVP).

@timothy-king
Contributor

@timothy-king timothy-king commented Mar 11, 2021

@jayconrod In the libFuzzer interface, the assumption is that any []byte input is valid (so the precondition is effectively just true). Normally a well-behaved fuzzer in that world parses the input and treats malformed inputs as non-crashing by returning early. This is basically the distinction between:

```go
func fuzz(b []byte) {
	x, y, z, err := parse(b)
	if err != nil {
		return // stop, but don't report a crash
	}
	doSomething(x, y, z)
}
```

and

```go
func fuzz(b []byte) {
	x, y, z, err := parse(b)
	if err != nil {
		panic("crash here")
	}
	doSomething(x, y, z)
}
```

My understanding is that you are proposing the latter. OSS-Fuzz and similar fuzzing infrastructure keep both the current coverage corpus and previous crashes around and transfer them between compilations. The infrastructure will want to reuse these. You will not want the coverage corpus to become crashes. You also do not want to discard the corpus in this situation. (It may have a few CPU-decades behind it.)

@bcmills
Member

@bcmills bcmills commented Mar 11, 2021

@thepudds, @josharian: I feel like that use-case might be better served by “just writing another fuzz test”, and leaving the existing fuzz test well enough alone.

That does bring up an interesting use-case, though: if I've replaced a previous fuzz test with a new fuzz test, then I probably want to continue testing the old one against existing inputs to check for regressions, but I want to run only the new one during go test -fuzz going forward. Is there a straightforward way to do that with this API?
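One plausible way to express that under this proposal, assuming hypothetical targets FuzzOld and FuzzNew and the flag semantics from the design draft (-run selects which targets run their seed corpora; -fuzz selects the single target whose inputs get mutated):

```
# Seed corpora for both targets run as regression tests,
# but only FuzzNew is fuzzed.
go test -run='^(FuzzOld|FuzzNew)$' -fuzz='^FuzzNew$' .
```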

@beoran

@beoran beoran commented Mar 13, 2021

@FiloSottile, actually, go bench with _bench.go files seems like a great idea to me. I can never remember the combination of flags I need for benchmarking.

And go test -fuzz is not as easy to remember or explain as a simple go fuzz. With a dedicated go fuzz command, all sorts of sub-commands or options that are relevant only to fuzzing could be added more easily as well.

@bcmills
Member

@bcmills bcmills commented Mar 15, 2021

Thinking about go fuzz vs. go test -fuzz, the analogy to go test -bench does not quite hold.

If I run go test -bench=., then the go command runs all of the tests, plus the named benchmarks, and then emits the output for all of those tests and benchmarks together.

In contrast, AFAICT go test -fuzz=. by default will not return at all: it will not have an exit code, and it's not clear to me whether it will produce any output on success. That makes it different enough from go test that I think I could justify making it a separate command, even though fuzzing is deeply related to testing.

@katiehockman
Member Author

@katiehockman katiehockman commented Mar 17, 2021

In contrast, AFAICT go test -fuzz=. by default will not return at all: it will not have an exit code, and it's not clear to me whether it will produce any output on success.

This isn't entirely true.
go test -fuzz works much like go test -bench in that it will run all of the unit tests and the seed corpus for all of the fuzz targets before it starts fuzzing. There will be a non-zero exit code if any of those tests fail. Similarly, once it starts fuzzing, it will return a non-zero exit code if a crasher is found.

You are correct in that go test -fuzz by default won't return anything, but if someone wants a return code, then they could run go test -fuzz -fuzztime=60s which would run fuzzing for a minute, then report a success if no crashes were found.
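As a concrete sketch of that usage in, say, a CI job (the package path and target name are hypothetical):

```
# Run unit tests and seed corpora, then fuzz FuzzParse for one minute;
# exits non-zero if any test fails or a crasher is found.
go test -fuzz='^FuzzParse$' -fuzztime=60s ./parser
```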

...we create a separation between testing and fuzzing that IMO is very desirable.

I disagree with the general sentiment that fuzz targets and unit tests should be viewed and treated as entirely separate.

To re-iterate, fuzz targets are run in two main ways:

  1. with go test, where the seed corpus from testdata and from f.Add are run much like a unit test
  2. with go test -fuzz, where the seed corpus is run, but the fuzzing engine runs and mutates inputs to try to find new crashes

I agree that go test -fuzz in mode (2) is very different from a unit test, and fuzzing shouldn't be run by default with go test. However, I counter that in mode (1) a fuzz target behaves very similarly to a unit test. No one should be making commits to add new corpus entries into testdata that would fail as a regression without first providing a fix, much in the same way that someone shouldn't check in a new unit test that's broken before fixing the problem. Running the seed corpus with go test makes it clear that the seed corpus for these fuzz targets acts as a regression test, and should be maintained just like a unit test should.

If go test runs the fuzz targets, just without fuzzing them, then it follows that go test -fuzz also runs the fuzz targets, just in a different mode (i.e. with fuzzing). A big part of this proposal is to intentionally make fuzz targets look much like unit tests, and allow the contents of a unit test to be virtually copy-pastable into an f.Fuzz function to act as a fuzz target (where the biggest change would be to move the table test inputs into f.Add calls).

It may even be the case that developers eventually prefer to migrate or start writing new unit tests as fuzz targets instead, since they run much the same way by default. The only difference is that they have the added capability of running a fuzzing engine to attempt to expand the set of tests being run when running with go test -fuzz.
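To illustrate that copy-paste relationship, here is a sketch of what such a conversion might look like, using a hypothetical Reverse function and the proposed testing.F API (package main is used only to keep the snippet self-contained; in practice this would live in a _test.go file in the package under test):

```go
package main

import (
	"testing"
	"unicode/utf8"
)

// Reverse reverses a string rune by rune (hypothetical function under test).
func Reverse(s string) string {
	r := []rune(s)
	for i, j := 0, len(r)-1; i < j; i, j = i+1, j-1 {
		r[i], r[j] = r[j], r[i]
	}
	return string(r)
}

// FuzzReverse: the former table-test inputs become f.Add seeds, and the
// former test body becomes the function passed to f.Fuzz, parameterized
// over the fuzzed input.
func FuzzReverse(f *testing.F) {
	for _, seed := range []string{"", "hello", "日本語"} {
		f.Add(seed) // seed corpus; also run on a plain `go test`
	}
	f.Fuzz(func(t *testing.T, s string) {
		rev := Reverse(s)
		if got := Reverse(rev); got != s {
			t.Errorf("Reverse(Reverse(%q)) = %q, want %q", s, got, s)
		}
		if utf8.ValidString(s) && !utf8.ValidString(rev) {
			t.Errorf("Reverse(%q) produced invalid UTF-8 %q", s, rev)
		}
	})
}
```

On a plain go test, only the three f.Add seeds run, much like a small table test; under go test -fuzz=FuzzReverse the engine mutates them to search for new failing inputs.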

@beoran

@beoran beoran commented Mar 18, 2021

@katiehockman

To re-iterate, fuzz targets are run in two main ways:

  1. with go test, where the seed corpus from testdata and from f.Add are run much like a unit test

At least for me, this is a reason for some concern. With this proposal, go test will not just run the unit tests anymore, but also some fuzz tests and a seed corpus. How will this affect performance? How will it affect test binaries compiled with -c, which are useful for debugging if something goes wrong? I think go test should not run the fuzz targets, just as it doesn't run benchmarks without the -bench flag.

@FiloSottile
Member

@FiloSottile FiloSottile commented Mar 18, 2021

How will this affect performance? How will it affect test binaries compiled with -c, which are useful for debugging if something goes wrong?

It will have the same performance and -c behavior as a regular unit test with inputs in a table or in testdata/.

This is detailed in the design draft, so it's not something new introduced in #44551 (comment), nor the point of that comment.

@rsc
Contributor

@rsc rsc commented Mar 24, 2021

Discussion seems to be ongoing, so will leave this likely accept for another week.

For what it's worth, it seems clear from a go command UX point of view that this should be part of go test and not its own command. It is a specific kind of testing, not at the same general level as build / install / test / get.

@rsc rsc moved this from Likely Accept to Accepted in Proposals Apr 1, 2021
@rsc
Contributor

@rsc rsc commented Apr 1, 2021

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

@rsc rsc changed the title proposal: testing: add fuzz test support testing: add fuzz test support Apr 1, 2021
@rsc rsc modified the milestones: Proposal, Backlog Apr 1, 2021