proposal: cmd/go: make fuzzing a first class citizen, like tests or benchmarks #19109

Open
bradfitz opened this Issue Feb 15, 2017 · 82 comments

bradfitz (Member) commented Feb 15, 2017

Filing a proposal on behalf of @kcc and @dvyukov:

They request that cmd/go support fuzzing natively, just like it does tests and benchmarks and race detection today.

https://github.com/dvyukov/go-fuzz exists but it's not as easy as writing tests and benchmarks and running "go test -race" today.

Should we make this easier?

bradfitz added the Proposal label Feb 15, 2017

bradfitz added this to the Proposal milestone Feb 15, 2017

ianlancetaylor (Contributor) commented Feb 15, 2017

I think it would be easier to evaluate the idea if it were slightly less abstract.

For example:

  • _test.go files are permitted to contain functions of the form FuzzXxx(f *testing.F, data []byte)
  • these functions are expected to run some test based on the random bytes in data
  • errors are reported using the testing.F argument in the usual way
  • f.Useful() may be called to indicate useful data, i.e., data that parses correctly
  • f.Discard() may be called to indicate that the data should be discarded
  • go test -fuzz=. runs the fuzz functions, selected by a regexp as with -run and -bench
  • naturally go test -fuzz must also rebuild the package in fuzz mode
  • the data is cached somewhere under $GOROOT/pkg, but where?
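For comparison, the Useful/Discard hooks map onto go-fuzz's existing entry point, func Fuzz(data []byte) int: returning 1 asks the fuzzer to give the input more priority, -1 keeps it out of the corpus even if it adds coverage, and 0 is neutral. The sketch below shows that convention on a made-up property (strconv round-tripping); the 64-byte discard cutoff is an arbitrary assumption, not part of any proposal.

```go
package main

import (
	"fmt"
	"strconv"
)

// Fuzz follows the go-fuzz return convention:
//
//	 1 -> "useful": the input parsed correctly, prioritize it
//	-1 -> "discard": never add this input to the corpus
//	 0 -> neutral
func Fuzz(data []byte) int {
	if len(data) > 64 {
		return -1 // hypothetical policy: overly long inputs are not interesting
	}
	n, err := strconv.ParseInt(string(data), 10, 64)
	if err != nil {
		return 0 // not a valid integer: keep fuzzing, but don't prioritize
	}
	// Property under test: format/parse must round-trip exactly.
	m, err := strconv.ParseInt(strconv.FormatInt(n, 10), 10, 64)
	if err != nil || m != n {
		panic(fmt.Sprintf("round trip failed for %q", data))
	}
	return 1
}

func main() {
	for _, in := range []string{"42", "-7", "abc", ""} {
		fmt.Printf("%q -> %d\n", in, Fuzz([]byte(in)))
	}
}
```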

bradfitz (Member) commented Feb 15, 2017

@ianlancetaylor, yes, FuzzXxx(f *testing.F, ...) is what this is about. The exact API is probably TBD.

I think the first step before it's designed completely is to determine whether there's interest.

ianlancetaylor (Contributor) commented Feb 15, 2017

As a general concept, I'm in favor.

dsnet (Member) commented Feb 15, 2017

I would expect that there would be an additional required flag (when fuzzing) where you specify the corpus directory.

ianlancetaylor (Contributor) commented Feb 15, 2017

Can we just cache the corpus somewhere under $GOROOT/pkg? Are there cases where a typical user would be expected to modify the corpus themselves?

dsnet (Member) commented Feb 15, 2017

I think it's wrong to think of the corpus as strictly a cache. The corpus is the saved state of the fuzzer, and the documentation for go-fuzz even recommends committing it into a version control system. The pkg directory is treated strictly as a cache, and it is not uncommon for people to recommend clearing it out, which would unfortunately delete the fuzzer state.

A specified corpus directory is not so much for users to modify the corpus themselves as for them to specify how the corpus data is persisted.

jimmyfrasche (Member) commented Feb 15, 2017

Could there be some default convention, say a _fuzz/xxx directory (where xxx corresponds to FuzzXxx), plus a method on the *testing.F object to load a different corpus from the _fuzz/ directory if necessary? It seems like it should just know where the corpus is.

minux (Member) commented Feb 15, 2017

cznic (Contributor) commented Feb 16, 2017

Quoting @dvyukov

I would appreciate if you drop a line there if you found fuzzing useful and a brief of your success story.

It was very useful for me - found bugs in several lexers.

mvdan (Member) commented Feb 16, 2017

I use it regularly on a lexer/parser/formatter for Bash (https://github.com/mvdan/sh).

Having it be a first-class citizen would simplify things for me and for contributors.

dsnet (Member) commented Feb 16, 2017

Found a bug in the C decoder for google/brotli by fuzzing a Go implementation of a Brotli decoder.

Also found some divergences in Go bzip2 decoders from the canonical C decoder (this and #18516). All by fuzzing.

fatih (Member) commented Feb 16, 2017

My coworker at DigitalOcean was working on a side project to make fuzzing easier. Check out his repo here: https://github.com/tam7t/cautious-pancake

I'm adding it here because I think it is valuable information for this discussion.

dgryski (Contributor) commented Feb 16, 2017

The README for go-fuzz lists a number of "Trophies", ( https://github.com/dvyukov/go-fuzz#trophies ) the majority of which are from the standard library, but about 20% of which are external to the Go standard libraries.

A GitHub search for Go source files with the gofuzz build tag gives ~2500 results: https://github.com/search?l=Go&q=gofuzz&type=Code&utf8=%E2%9C%93

My tutorial on fuzzing ( https://medium.com/@dgryski/go-fuzz-github-com-arolek-ase-3c74d5a3150c ) gets 50-60 "reads" per month (according to medium's stats).

Kubuxu commented Feb 16, 2017

A feature that would also be important (at least for me) is the ease of turning selected fuzz test cases into permanent tests. The simplest way to do that would be to export the case data as a Go byte slice and call the FuzzXXX function from a TestXXX function, but if FuzzXXX accepts a *testing.F, that won't be possible.
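One way to keep this use case open is to leave the fuzz target callable without a *testing.F, so frozen inputs can be replayed from ordinary code. A sketch assuming the go-fuzz signature; parseKV, the inputs, and runRegression are all hypothetical, and in a real package the loop would sit inside a TestXxx function rather than main.

```go
package main

import (
	"fmt"
	"strings"
)

// regressionInputs are inputs copied out of a fuzz corpus and frozen as
// Go byte slices (hypothetical values).
var regressionInputs = [][]byte{
	[]byte(""),
	[]byte("\xff\xfe"),
	[]byte("key=value"),
	[]byte("=="),
}

// parseKV is a hypothetical function under test.
func parseKV(s string) (key, value string, ok bool) {
	i := strings.IndexByte(s, '=')
	if i < 0 {
		return "", "", false
	}
	return s[:i], s[i+1:], true
}

// Fuzz is a go-fuzz-style target. It takes no *testing.F, so any code,
// including a plain TestXxx function, can call it directly.
func Fuzz(data []byte) int {
	k, v, ok := parseKV(string(data))
	if !ok {
		return 0
	}
	if k+"="+v != string(data) {
		panic("parseKV lost data")
	}
	return 1
}

// runRegression calls Fuzz on each frozen input and returns the indexes
// of the inputs that panicked.
func runRegression(inputs [][]byte) (failures []int) {
	for i, in := range inputs {
		func() {
			defer func() {
				if r := recover(); r != nil {
					failures = append(failures, i)
				}
			}()
			Fuzz(in)
		}()
	}
	return failures
}

func main() {
	fmt.Println("failing cases:", runRegression(regressionInputs))
}
```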

DavidVorick commented Feb 16, 2017

Yes, we've found fuzzing useful in our projects multiple times. Especially on sensitive code, the fuzzer will frequently find edge cases that we missed: encoding, networking, and generally anything that depends on user input.

I will say that most of the benefit is usually seen in the first short stretch of fuzzing. Returns diminish pretty sharply as you continue to fuzz, at least in our experience.

dvyukov (Member) commented Feb 16, 2017

As you can understand, I am very supportive of this. Traditional testing is just not enough for modern development speeds. I am ready to dedicate some time to work on parts of this.

Throwing some ideas onto the table:

  1. To flesh out the interface, we don't need to implement coverage nor any actual smartness. The interface should work if we just feed in completely random data, it will be just less useful. But I think it's the right first step. We can transparently add smartness later.

  2. It would be nice to have some default location for corpus, because it will make onboarding easier. The location probably needs to be overridable with go test flag or GOFUZZ env var.

  3. I think it's a "must have" that the fuzz function runs during normal testing. If a corpus is present, each input from the corpus is executed once. Plus we can run N random inputs.

  4. Thinking how we can integrate continuous fuzzing into Go std lib testing (including corpus management) would be useful to ensure that it will also work for end users in their setups.

  5. go command (or whatever runs fuzz function) might need some additional modes. For example, execute 1 given input, useful for crash debugging. Or, run all programs from corpus and dump code coverage report.

  6. I am ready to give up on f.Useful() and f.Discard() for simplicity (as far as I understand, they come from go-fuzz return values). They were never proven to be useful enough. For Discard, the Fuzz function can just return. And the fuzzer can try to figure out Useful automatically.

  7. In some cases the Fuzz function needs more than just []byte. For example, a regexp test needs the regular expression and a string to match. Other tests may need some additional ints and bools. It's possible to manually split []byte into several strings and also take some bits as ints and bools. But it's quite inconvenient and can negatively affect fuzzing efficiency (the fuzzer can do better if it understands more about the input structure). So we could consider allowing the Fuzz function to accept a set of inputs with some restrictions on types, e.g. FuzzFoo(f *testing.F, s1, s2 string, x int, b bool). But this can be added later as a backwards-compatible extension. Just something to keep in mind.

  8. An alternative interface could be along the following lines:

func FuzzFoo(f *testing.F) {
  var data []byte
  f.GetRandomData(&data)
  // use data
}

GetRandomData must be called once and always with the same type.
Since the function now does not accept the additional argument, we can make it a normal test:

func TestFoo(t *testing.T) {
  var data []byte
  testing.GetRandomData(&data)
  // use data
}

This closely resembles the testing/quick interface, so maybe we could just use testing/quick for this.
The go tool will need to figure out that this is a fuzzing function based on the call to testing.GetRandomData.
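Items 1 and 3 need no compiler support to illustrate: a driver can replay every corpus entry once and then feed N purely random inputs, converting panics into errors. This is a sketch with hypothetical names (check, runOnce, fragile); it has no coverage feedback, which is exactly the "completely random data" baseline from item 1.

```go
package main

import (
	"fmt"
	"math/rand"
)

// target is any go-fuzz-style entry point.
type target func(data []byte) int

// runOnce executes one input and converts a panic into an error.
func runOnce(fn target, data []byte) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("input %q: panic: %v", data, r)
		}
	}()
	fn(data)
	return nil
}

// check replays every corpus entry once, then feeds n random inputs of up
// to 31 bytes generated from seed. It stops at the first crash.
func check(fn target, corpus [][]byte, n int, seed int64) error {
	for _, in := range corpus {
		if err := runOnce(fn, in); err != nil {
			return err
		}
	}
	rng := rand.New(rand.NewSource(seed))
	for i := 0; i < n; i++ {
		buf := make([]byte, rng.Intn(32))
		rng.Read(buf)
		if err := runOnce(fn, buf); err != nil {
			return err
		}
	}
	return nil
}

// fragile is a hypothetical target that crashes on inputs starting with 0xFF.
func fragile(data []byte) int {
	if len(data) > 0 && data[0] == 0xFF {
		panic("bad header")
	}
	return 0
}

func main() {
	fmt.Println(check(fragile, [][]byte{[]byte("ok")}, 10000, 1))
}
```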

dvyukov (Member) commented Feb 16, 2017

I will say that most of the benefit is usually seen in the first tiny bit of fuzzing. There's a pretty strong diminishing returns as you continue to fuzz, at least that's what we've found.

That's true to some degree, but not completely. It depends on (1) the complexity of your code, (2) the rate of change of your code, and (3) the smartness of the fuzzing engine. If your code is simple and doesn't change, then the fuzzer will find everything it can in minutes. However, if your code changes often, you want to run fuzzing continuously as regression testing. If your code is large and complex and the fuzzer is smart enough, it can uncover complex bugs, but only after significant time.
One example is this bug in OpenSSL bignum asm implementation that we've found after several CPU years of fuzzing: https://github.com/google/fuzzer-test-suite/tree/master/openssl-1.1.0c
Another example is our Linux kernel fuzzing which uncovers bugs at roughly constant rate over more than a year (due to complexity of the code and frequent changes): https://github.com/google/syzkaller/wiki/Found-Bugs

webRat commented Feb 16, 2017

I'm fine with fuzzing, but the problem is that if you vendor in a library that fuzzes, then... you inherit all their corpus. So, I'm not a fan of corpus being checked into the project.

Case in point:
[Screenshot, 2017-02-16]

Overall, I think fuzzing is a must have. Glad to see a proposal to make it easier.

btracey (Contributor) commented Feb 16, 2017

To echo @dvyukov in #19109 (comment): it would be really nice to support types other than []byte. We found bugs in both the gonum/blas implementation and the OpenBLAS library using fuzzing. It's possible to use go-fuzz, but it's kind of a pain to parse the []byte directly (https://github.com/btracey/blasfuzz/blob/master/onevec/idamax.go).

kardianos (Contributor) commented Feb 16, 2017

I suggest it go under the testdata subdirectory. Then any tools that ignore tests will also ignore this directory.

dsnet (Member) commented Feb 16, 2017

@dvyukov

I think it's "must have" that fuzz function runs during normal testing. If corpus is present, each input from corpus is executed once. Plus we can run N random inputs.

I have concerns about how much time this is going to add to testing. My experience with fuzzing is that compiling with the fuzz instrumentation takes a significant amount of time. I'm not sure this is something we want to inflict upon every use of go test.

Kubuxu commented Feb 16, 2017

@dsnet, to execute the corpus and check that it doesn't fail, instrumentation isn't needed. Instrumentation is needed when you want to expand/improve the corpus.

CAFxX (Contributor) commented Feb 16, 2017

Should there be a story to make it easy to use external fuzzing engines?

dsnet (Member) commented Feb 16, 2017

@Kubuxu, I'm comfortable with running the Fuzz functions as a form of test without special instrumentation, but Dmitry's comment suggested also running N random inputs, which implies having the instrumentation involved.

kcc commented Feb 16, 2017

My 2c (I am utterly ignorant about Go, but have some ideas about fuzzing)

There are several major parts in coverage-guided fuzzing as I can see it:

  • instrumentation
  • interface
  • fuzzing engines' logic (how to mutate, choose elements to add to the corpus, etc)
  • integration with the rest of Go testing infra (I won't comment on this one -- no opinion)

Instrumentation is best done in the compiler: that way it's the most efficient and the easiest to use.
In LLVM we have these two kinds of instrumentation used for guided fuzzing:
https://clang.llvm.org/docs/SanitizerCoverage.html#tracing-pcs-with-guards (control flow feedback)
https://clang.llvm.org/docs/SanitizerCoverage.html#tracing-data-flow (data flow feedback)

The interface must be as simple as possible. For C/C++ our interface (which we use with libFuzzer, AFL, honggfuzz, and a few others) is:

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
  DoSomethingInterestingWithMyAPI(Data, Size);
  return 0;  // Non-zero return values are reserved for future use.
}

and the only thing I regret is that the return type is not void.
IMHO, for the first version of the interface for Go fuzzing, the go-fuzz approach is perfect:

func Fuzz(data []byte) int

(again, not confident about int return value)

Fuzzing engines and the interface should be independent.
It should be possible to plug in any fuzzing engine (not necessarily written in Go) to fuzz a Go fuzz target.
Such fuzzing engine may need to understand the feedback provided by the instrumentation though.
E.g. I'd love to try libFuzzer/AFL on the Go targets.
And by fuzzing engine we should understand a wider class of tools, including e.g. concolic execution tools.

And it would be nice for the new fuzzing engine(s) to behave similarly to AFL, libFuzzer, and go-fuzz
so that they are easier to integrate with continuous fuzzing services (e.g. OSS-Fuzz).

Should there be a story to make it easy to use external fuzzing engines?

Absolutely, see above.

it would be really nice to have supported types other than []byte.

Maybe.
For the really complex data structures our current answer is to use protobufs as the input:
https://github.com/google/libprotobuf-mutator
There is also a middle ground where you need to fuzz e.g. a pair of strings.
But I would rather have a standard adapter from []byte into a pair of strings than complicate the interface.
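A sketch of what such an adapter could look like while keeping the func(data []byte) int interface unchanged. The encoding (the first byte chooses the split point, modulo the remaining length, so every input maps to some pair) is one arbitrary choice among many; twoStrings is a hypothetical name, and the regexp target follows the regexp example given earlier in the thread.

```go
package main

import (
	"fmt"
	"regexp"
)

// twoStrings splits raw fuzz input into two strings. The first byte picks
// the split point modulo the remaining length, so no input is wasted.
func twoStrings(data []byte) (string, string) {
	if len(data) == 0 {
		return "", ""
	}
	rest := data[1:]
	n := int(data[0]) % (len(rest) + 1)
	return string(rest[:n]), string(rest[n:])
}

// Fuzz treats the first string as a pattern and the second as the subject.
func Fuzz(data []byte) int {
	pattern, subject := twoStrings(data)
	re, err := regexp.Compile(pattern)
	if err != nil {
		return 0 // not a valid pattern; nothing more to check
	}
	re.MatchString(subject) // must not panic
	return 1
}

func main() {
	fmt.Println(twoStrings([]byte{3, 'a', 'b', 'c', 'd'}))
	fmt.Println(Fuzz([]byte{2, 'a', '+', 'a', 'a', 'a'}))
}
```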

Are there cases where a typical user would be expected to modify the corpus themselves?

Corpus is not a constant. It evolves as long as the code under test changes, fuzzing techniques evolve, and simply more CPU hours are spent fuzzing.
We typically store a seed corpus in RCS, maintain a larger corpus on the fuzzing service,
and periodically merge it back to RCS.

Note: a corpus stored in RCS allows regression testing (w/o fuzzing).

I also want to have cmd/cover built on compiler instrumentation to support branch coverage, but that's off-topic to this issue.

Not too much off-topic.
This approach in LLVM allows us to get various kinds of coverage data with the same compiler instrumentation, by just re-implementing the callbacks.

I'm fine with fuzzing, but the problem is that if you vendor in a library that fuzzes, then... you inherit all their corpus. So, I'm not a fan of corpus being checked into the project.

This is a price worth paying since the corpus often turns out to be a great regression test.

I have concerns about how much time this is going to add to testing. My experience with fuzzing is that compiling with the fuzz instrumentation takes a significant amount of time. I'm not sure this is something we want to inflict upon every use of go test.

If you don't enable fuzzing instrumentation (which won't be on by default, I think) you won't pay for it.

kcc commented Feb 16, 2017

A separate topic worth thinking about is fuzzing for equivalence between two implementations of the same protocol.

Imagine your code has
func ReferenceFoo(data []byte) SomeType and
func ExperimentalOptimizedFoo(data []byte) SomeType.

Then you can fuzz the following target to verify that the two implementations match:

func Fuzz(data []byte) int {
    if ReferenceFoo(data) != ExperimentalOptimizedFoo(data) {
       panic("ouch!")
    }
    return 0
}

This works pretty well when both things are implemented in Go.
But imagine you are porting to Go something previously written in C.
Here is a write up that describes one possible solution:
https://moderncrypto.org/mail-archive/hacs/2017/000001.html
(in short: have two processes running in parallel and exchanging data via shared memory or some such)
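When both implementations are in Go, the pattern above works as-is. A self-contained sketch: a bit-by-bit popcount stands in for ReferenceFoo and a word-at-a-time version using math/bits stands in for ExperimentalOptimizedFoo (both functions are illustrative, not from the thread). The random loop in main is a crude substitute for a real fuzzing engine.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/bits"
	"math/rand"
)

// referenceOnes counts set bits one bit at a time: slow but obviously correct.
func referenceOnes(data []byte) int {
	n := 0
	for _, b := range data {
		for i := 0; i < 8; i++ {
			n += int(b>>i) & 1
		}
	}
	return n
}

// optimizedOnes processes 8 bytes at a time, then the leftover bytes.
func optimizedOnes(data []byte) int {
	n := 0
	for len(data) >= 8 {
		n += bits.OnesCount64(binary.LittleEndian.Uint64(data))
		data = data[8:]
	}
	for _, b := range data {
		n += bits.OnesCount8(b)
	}
	return n
}

// Fuzz panics if the two implementations ever disagree.
func Fuzz(data []byte) int {
	if referenceOnes(data) != optimizedOnes(data) {
		panic(fmt.Sprintf("mismatch on %x", data))
	}
	return 0
}

func main() {
	rng := rand.New(rand.NewSource(1))
	for i := 0; i < 1000; i++ {
		buf := make([]byte, rng.Intn(64))
		rng.Read(buf)
		Fuzz(buf)
	}
	fmt.Println("no mismatches")
}
```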

FiloSottile (Member) commented Feb 17, 2017

I love this.

And I think a good solution to the corpus location, like

  • defaulting to testdata/FuzzXxx/
  • run (only) the corpus cases w/o flags

would

  • remove the need to duplicate code to "freeze" certain testcases
  • avoid sacrificing the API to fit it in a testing.T
  • be a more elegant solution that doesn't require putting binary data in source files

Projects that don't commit the corpus could use -fuzzcorpus (or similar) when fuzzing, and then copy the test cases they want to run every time in the testdata folder and check them in.

Actual fuzzing could be controlled by -fuzztime (like -benchtime).

dvyukov (Member) commented Dec 4, 2017

My main question: what aspects do we want to shake out with another prototype?

Kubuxu commented Dec 4, 2017

[LATER] change from current Fuzz(data []byte) int signature to a testing.TB signature

This can be done the other way around: introduce a custom structure in go-fuzz and, in the future, make it an alias of testing.F. This is how x/net/context.Context was migrated.
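The mechanism is the type alias added in Go 1.9: an alias declaration makes two names denote the same type, so code written against one accepts the other without conversion (golang.org/x/net/context declares type Context = context.Context on Go 1.9 and later). A sketch with a hypothetical legacyF standing in for the custom go-fuzz structure; here the alias runs in the opposite direction purely to keep the example in one file.

```go
package main

import "fmt"

// legacyF stands in for a custom struct a standalone go-fuzz could ship
// (hypothetical; go-fuzz has no such type today).
type legacyF struct {
	failed bool
	msgs   []string
}

func (f *legacyF) Errorf(format string, args ...interface{}) {
	f.failed = true
	f.msgs = append(f.msgs, fmt.Sprintf(format, args...))
}

// F is an alias, not a new type: F and legacyF are identical. In the
// migration described above, go-fuzz would later declare F = testing.F.
type F = legacyF

// FuzzLen was written against legacyF but accepts *F unchanged.
func FuzzLen(f *legacyF, data []byte) {
	if len(data) > 4 {
		f.Errorf("input too long: %d bytes", len(data))
	}
}

func main() {
	var f F
	FuzzLen(&f, []byte("hello"))
	fmt.Println(f.failed, f.msgs)
}
```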

andybons self-assigned this Dec 23, 2017

dvyukov (Member) commented Dec 24, 2017

One thing that popped up after discussions re OSS-Fuzz integration:
OSS-Fuzz would prefer if a fuzzer binary exits with non-zero status on first bug found. But for manual local runs one would prefer it to continue until explicitly stopped. So potentially we may need a flag for this. Could reuse -count (i.e. "find that many crashes"); or -short ("do a short run")?
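A sketch of how a single knob could cover both modes, assuming a hypothetical maxCrashes parameter (whether it would be spelled -count or -short is still open): 1 stops at the first crash, as OSS-Fuzz wants, and 0 keeps collecting crashes like a local run. fuzzLoop, sliceSource, and zeroPanics are illustrative names.

```go
package main

import "fmt"

// crashed runs one input and reports whether the target panicked.
func crashed(fn func([]byte) int, data []byte) (c bool) {
	defer func() {
		if recover() != nil {
			c = true
		}
	}()
	fn(data)
	return false
}

// fuzzLoop pulls inputs from next until it is exhausted or maxCrashes
// crashes have been collected. maxCrashes <= 0 means "never stop early".
func fuzzLoop(fn func([]byte) int, next func() ([]byte, bool), maxCrashes int) [][]byte {
	var crashes [][]byte
	for {
		data, ok := next()
		if !ok {
			return crashes
		}
		if crashed(fn, data) {
			crashes = append(crashes, data)
			if maxCrashes > 0 && len(crashes) >= maxCrashes {
				return crashes
			}
		}
	}
}

// sliceSource turns a fixed list of inputs into a next() function.
func sliceSource(inputs [][]byte) func() ([]byte, bool) {
	i := 0
	return func() ([]byte, bool) {
		if i >= len(inputs) {
			return nil, false
		}
		i++
		return inputs[i-1], true
	}
}

// zeroPanics is a hypothetical target that crashes on inputs starting with 0.
func zeroPanics(data []byte) int {
	if len(data) > 0 && data[0] == 0 {
		panic("zero header")
	}
	return 0
}

func main() {
	inputs := [][]byte{{1}, {0}, {2}, {0}, {3}}
	first := fuzzLoop(zeroPanics, sliceSource(inputs), 1)
	all := fuzzLoop(zeroPanics, sliceSource(inputs), 0)
	// A real OSS-Fuzz-mode binary would call os.Exit(1) if first is non-empty.
	fmt.Println(len(first), len(all))
}
```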

dvyukov (Member) commented Jul 20, 2018

If you want to go with source-to-source transformation outside of the Go toolchain, we need at least support for such s2s transformations in the go command. It's not possible to create and typecheck cgo packages outside of the go tool. I am not sure what the interface for this should be, because it can't be just a command that accepts a file and produces a modified version of it (e.g. go build -transform=mytool): the command needs to typecheck the package, including all dependencies and the C package, and be able to locate all of these dependencies (the already transformed versions of them).
On top of that there is a constant stream of vendor, internal, modules, etc. It's not possible to do source-to-source transformation of Go code today.
I don't know if such support will be useful for anything else. Maybe. It enables a simple way to do complex and powerful things.

hugelgupf (Contributor) commented Jul 20, 2018

As someone on a completely different project that actually does implement Go source-to-source transformations on non-cgo code, I would really appreciate Go toolchain support for this, but I understand there would be a lot of challenges.

On top of that there is constant stream of vendor, internal, modules, etc. It's not possible to do source-to-source transformation of Go code today.

I know that pain. Add to that the differences of the bazel/blaze/buck Go toolchain (and the differences in Go rule implementations between bazel, blaze, and buck). See this code, which rewrites Go commands to be Go packages (blaze Skylark rules exist, not yet open source as bazel Go rules are very different).

dvyukov (Member) commented Jul 20, 2018

@hugelgupf can you describe what you are doing with source-to-source transformation? It would be really useful to have more than one real use case when designing such support.

hugelgupf (Contributor) commented Jul 20, 2018

I should write this up in a readme in our project.

We take multiple Go commands and compile them into one binary, busybox-style.

This means we take Go commands' source and do the following source-to-source transformation:

cmds/foo/foo.go:

package main

import (
  "flag"
  "log"
)

var global = flag.String("name", "", "")

func init() {
  log.Printf("init")
}

func main() {
  log.Printf("main")
}

to

package foo // package name based on directory name or go_binary rule name

import (
  "flag"
  "log"

  "github.com/u-root/u-root/pkg/bb"
)

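// The type must be inferred by type-checking flag.String (*string here).
// This means we must resolve dependencies through vendor, modules, bazel.
var global *string

func Init0() {
  // Package-level variable initializers, in types.Info.InitOrder order [1].
  global = flag.String("name", "", "")
}

func Init1() {
  // Body of the original init function.
  log.Printf("init")
}

func Init() {
  // Go initializes package-level variables before running init functions,
  // so the variable initializers (Init0) must come first.
  Init0()
  Init1()
}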

func Main() {
  log.Printf("main")
}

func init() {
  bb.Register("foo", Init, Main) // [2]
}

[1] https://golang.org/pkg/go/types/#Info
[2] https://github.com/u-root/u-root/blob/master/pkg/bb/register.go

The rewritten packages are added as _ imports to this main.go file, which can then be compiled into one binary.

You can then access the command busybox-style through

./bb foo other-args...

or

ln -s bb foo
./foo other-args...

Use case is really space-constrained embedded environments (LinuxBoot).
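The dispatch side of this busybox scheme can be sketched in a few lines. The registry below is a hypothetical stand-in, not u-root's actual pkg/bb code, whose API may differ. One plausible reason for turning each init() into an Init function (an inference, not stated above) is that only the selected command's initialization should run; for example, two commands defining the same flag on flag.CommandLine would otherwise panic at startup.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// command pairs the rewritten Init and Main of one tool.
type command struct {
	initFn func()
	mainFn func(args []string) int
}

// registry is a hypothetical stand-in for u-root's pkg/bb.
var registry = map[string]command{}

func register(name string, initFn func(), mainFn func(args []string) int) {
	registry[name] = command{initFn: initFn, mainFn: mainFn}
}

// dispatch selects a command by argv[0] ("ln -s bb foo; ./foo args") or,
// when the binary is invoked as bb itself, by argv[1] ("./bb foo args").
func dispatch(argv []string) (int, error) {
	if len(argv) == 0 {
		return 0, fmt.Errorf("empty argv")
	}
	name, args := filepath.Base(argv[0]), argv[1:]
	if name == "bb" {
		if len(args) == 0 {
			return 0, fmt.Errorf("usage: bb <command> [args...]")
		}
		name, args = args[0], args[1:]
	}
	cmd, ok := registry[name]
	if !ok {
		return 0, fmt.Errorf("unknown command %q", name)
	}
	cmd.initFn() // only the selected command's former init() runs
	return cmd.mainFn(args), nil
}

func main() {
	register("foo", func() { fmt.Println("init") }, func(args []string) int {
		fmt.Println("main", args)
		return len(args)
	})
	fmt.Println(dispatch([]string{"./bb", "foo", "other-args"}))
	fmt.Println(dispatch([]string{"/usr/bin/foo", "other-args"}))
}
```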

dvyukov (Member) commented Sep 25, 2018

Crash in x/net/html: #27846
We are also seeing some in compress/flate: google/syzkaller#731
Both are potentially remotely triggerable.

dvyukov (Member) commented Nov 23, 2018

Over the past few days 3 new bugs in go-fuzz were reported:

  • another issue with cgo
  • modules not working
  • source-to-source instrumentation produces broken code on a weird corner case

@Sajmani

knaxo commented Nov 23, 2018

Is there any specific reason why the Go project is not picking this up? In the world of distributed software, Go has become a go-to language because of its properties. I am guessing that because of those very same properties (managed code, etc.), fuzzing has been neglected. In distributed software a DoS can have devastating implications, and the only way to build confidence is through fuzzing.

Beyond distributed software, at my company we are working on a range of cloud-based services that do DICOM (medical data format) parsing, authentication, and a ton more. Currently there is no acceptable way to do integration-based fuzzing, and the only software that tries to solve that problem (go-fuzz) rests on the shoulders of one guy.

Either decide to pick fuzzing up and have it properly supported in a reasonable amount of time or at least decide that the feature is not going to be supported, so the community can try to work out some other solution.

Keeping this proposal on hold signals to the community that no action is needed.

PS. I had no luck with using AFL's -Q (qemu) feature for fuzzing go binaries and don't know if this would be possible at all.

ianlancetaylor (Contributor) commented Nov 24, 2018

@knaxo There is no specific reason. Someone has to do the work. You suggest that the community could perhaps work out some other solution; I would suggest that the community could perhaps implement this solution.

knaxo commented Nov 24, 2018

That was an expected answer and I fully understand and apologize if I sound pushy. I have no idea how you guys assign / pick work.

It would be nice if the Go team delivered proper instrumentation and coverage support, since doing that well requires very specific knowledge of compiler internals. I doubt that we are going to be able to find anybody outside of the Go developers who is able to do that work properly. My intuition is that the community will be able to get traction from there and deliver the rest.

ianlancetaylor (Contributor) commented Nov 24, 2018

It's not obvious to me why anything has to be done inside the compiler. The cover tool does not involve the compiler at all. The cover tool itself rewrites the Go code. That seems to me to be the way to go for a fuzzer; I think we would need a clear reason why that is insufficient.

dvyukov (Member) commented Nov 25, 2018

We don't have the necessary support for source-to-source (s2s) transformation tools. The current coverage tool has custom ad-hoc support both in the go tool and in bazel/blaze. It's not possible to mimic all of the build system logic and cgo support on the side.
If we decide to go s2s route, we need corresponding support for this in go tool and in bazel/blaze. As I see it an s2s tool should receive sets of Go files that correspond to a single package, including all cgo magic, and be able to reasonably easily parse the files with go/types (this includes importing all depending packages somehow) and finally produce modified files.
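The type-checking step such a tool needs can be sketched with go/types for a single self-contained file: it recovers, for instance, that a variable initialized from a function returning *string has type *string, plus the order in which Go runs the package-level initializers (types.Info.InitOrder). The source deliberately has no imports, so no importer is configured; resolving imports through vendor, modules, cgo, and bazel/blaze is exactly the missing piece discussed here. analyze is a hypothetical helper.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
)

// src stands in for a command being rewritten. derived is declared first
// but depends on global, so global must be initialized first.
const src = `package foo

func newName() *string { s := ""; return &s }

var derived = len(*global)
var global = newName()
`

// analyze type-checks one file and returns each package-level variable's
// type and the names of the variables in initialization order.
func analyze(source string) (map[string]string, []string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "foo.go", source, 0)
	if err != nil {
		return nil, nil, err
	}
	info := &types.Info{}
	pkg, err := (&types.Config{}).Check("foo", fset, []*ast.File{file}, info)
	if err != nil {
		return nil, nil, err
	}
	typeOf := map[string]string{}
	for _, name := range pkg.Scope().Names() {
		if v, ok := pkg.Scope().Lookup(name).(*types.Var); ok {
			typeOf[name] = v.Type().String()
		}
	}
	var order []string
	for _, ini := range info.InitOrder {
		order = append(order, ini.Lhs[0].Name())
	}
	return typeOf, order, nil
}

func main() {
	typeOf, order, err := analyze(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(typeOf["global"], order)
}
```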

hugelgupf (Contributor) commented Nov 25, 2018

As I see it an s2s tool should receive sets of Go files that correspond to a single package, including all cgo magic, and be able to reasonably easily parse the files with go/types (this includes importing all depending packages somehow) and finally produce modified files.

Someone will have to come up with a standardized way for blaze/bazel/buck ("BBB") to pass type & dependency information to the s2s tool. I think you could conceivably come up with an interface from the s2s tool which can be implemented by both kinds of build systems separately.

The interface would/could probably be as simple as passing in & back out a version of https://godoc.org/golang.org/x/tools/go/packages#Package, which is so far the most comprehensive package information struct I've seen.

I think this could be implemented as a separate project outside of the toolchain, but you'd likely annoyingly always be trailing compiler features. u-root now has s2s transformation support for both the standard Go tool chain and a proposed (bit stale now..) PR for bazel/blaze support at u-root/u-root#927. See e.g. _uroot_rewrite_ast and the corresponding Go tool it passes cmdline information to -- it's a "simple" matter of getting BBB to collect the type and dependency information you need, passing them via cmdline args, and parsing them back out. Then pounding your s2s code into shape such that you can collect the same information from both BBB and the standard Go toolchain go/build stuff. (And yet, we haven't even had time for cgo or Go modules. That's just... a lot of work.)

I guess what I'm saying is... go/types won't be able to implement how blaze works. You have to get BBB to pass the information to you in the first place to get this right.

thepudds commented Nov 25, 2018

@ianlancetaylor

I do not know the degree to which this is still accurate, but some additional concerns listed in the initial draft proposal document include:

However, go-fuzz suffers from several problems:

  • It breaks multiple times per Go release because it's tied to the way go build works, std lib package structure and dependencies, etc. It broke due to internal packages (multiple times), vendoring (multiple times), changed dependencies in std lib, etc.
  • It tries to do compiler work regarding coverage instrumentation without compiler help. This leads to build breakages on corner case code; poor performance; suboptimal quality of coverage instrumentation (missed edges).
  • Considerable difficulty in integrating it into other build systems and non-standard contexts as it uses source pre-processing.

Goal of this proposal is to make fuzzing as easy to use as unit testing.

Sajmani (Contributor) commented Nov 25, 2018

thepudds commented Nov 25, 2018

Backing up, in early 2017, @dvyukov had said above "I am ready to dedicate some time to work on parts of this".

After some discussion, the core Go team asked for a prototype before deciding whether or not to accept the proposal. For example, comments from Russ #19109 (comment), #19109 (comment), and #19109 (comment), including:

it still seems like the right next step is to make 'go-fuzz' the separate command as close to 'go fuzz' the proposed standard command as possible, and to add fuzz tests to at least the x subrepos and maybe the standard library, so that we can understand the implications of having them in the source repos (including how much space we're going to spend on testdata/fuzz corpus directories).

Putting this on hold until go-fuzz is more like the proposed 'go fuzz'.

As a random member of the community, it seems reasonable to ask for a prototype, especially given the care and thought that the Go team has put into things like how 'go test' works.

Another observation is that there is a fair amount of interest in the proposal from the broader Go community. Currently, if you sort by +1 reactions, this issue is ranked number 4 in the open GitHub issue list.

Something that I think could help a prototype move forward faster would be a slightly longer comment from the core Go team about what the goals and non-goals of a prototype might be, especially regarding what might be required in a prototype to reach the point where the Go proposal review team could review a proposal.

That in turn might help different people from the community see how they might be able to help this.

However, I can also imagine it might be difficult for the core Go team to enumerate exactly what should be in a prototype, given I think at least part of the intent of asking for a prototype is to have greater clarity about exactly what is being proposed.

All that said, I will make up some goals. I am sure these will be incomplete or otherwise not reflective of the actual desired goals, but I wanted to throw out a strawman.

Draft Goals for a Prototype

To be done before an evaluation can be made by the Go proposal review committee:

  1. Prototype proposed CLI, including interaction with existing 'go test'.

  2. Add some sample fuzz tests to at least the x subrepos and maybe the standard library.

  3. Start an initial set of corpus directories for the x repos and maybe the standard library (for example, earlier, the proposal suggested "For the standard library it is proposed to check in corpus into golang.org/x/fuzz repo").

  4. Understand how much space is used in corpus directories for x subrepos and/or standard library based on those sample fuzz tests.

  5. Add a new fuzzing signature (or change the existing Fuzz(data []byte) int signature) to work with testing.TB.

Draft Non-Goals for a Prototype

  6. Build 100% of the exact desired compiler-level integration.

  7. Allow the fuzzed function to take a *testing.F for error reporting (and instead start with testing.TB, as suggested by Russ in #19109 (comment)).

My personal reason for the split between goals/non-goals includes that items 1-5 are more externally visible aspects. Item 6 might be something that could be accepted or rejected at the proposal review stage off a design document and/or perhaps a basic exploratory proof-of-concept. However, that is pure conjecture on my part, and perhaps item 6 is actually considered high risk, and perhaps item 6 is considered an absolute requirement of a prototype prior to review by the proposal committee.

In terms of tapping into the community interest here, items 1-4 are likely within the skillset of a decent segment of the larger Go community. Item 5 might also be something that could be accomplished without deep knowledge of go-fuzz or Go itself.
