x/fuzzdata: new repository for fuzzing corpus data #31215
The follow-up item in this issue is to add a new
Alternatively, the new corpus repo could be named
Note that this issue is currently intended to cover only creating the repository itself; it does not cover checking in any corresponding fuzzing corpus (which is likely to be a follow-up issue).
As part of the #19109 proposal discussion, there were multiple comments and requests from the core Go team for a better understanding of how a fuzzing corpus looks and behaves when it resides in a repository. For example, this March 2017 request in #19109 (comment) from Russ:
Personally, I think
Populating the corpus
The corpus for a given Fuzz function could likely be seeded initially from https://github.com/dvyukov/go-fuzz-corpus. If that happens, two corpus directories would currently be populated: go-fuzz-corpus/png/corpus and go-fuzz-corpus/tiff/corpus.
In parallel, multiple people are making progress on integrating
However, the exact mechanism of populating and updating
I am of course happy to discuss anything here, and happy to be corrected if any of the above is different than how people would like to proceed.
If it's only for the seed corpus, then the seed corpus may be better placed in the main repo. We do want to use it in unit tests too! See dvyukov/go-fuzz#218 (comment) for more explanation.
Part of what is tricky is that there are really two things running in parallel, I think:
For 2., it seems important that golang.org/x/fuzzdata gets created so that the stdlib and x subrepos can serve as a pilot: a decent-sized project storing a decent-sized corpus, with some churn, in a repo. I think that is important even under the split corpus idea in dvyukov/go-fuzz#218 (comment), which I commented on there yesterday (probably too enthusiastically, sorry).
Hi @katiehockman, I had a question for you about the generated corpus under the current draft design for first-class fuzzing, which in turn might have some implications for what to do with this issue.
The draft includes:
It makes sense that the details of what is inside the generated corpus can effectively be opaque to ordinary users, but the generated corpus itself is fairly valuable, and often worth keeping and building up over time.
Given that the generated corpus is valuable, it can be quite useful to allow more direct control over its location (for easier storage in a separate repo, in cloud storage, or on a shared filesystem; for easier integration with the diversity of CI systems in use; or for purpose-built fuzzing systems like OSS-Fuzz).
The prior first-class fuzzing proposal went through a couple of iterations on corpus behavior, but the fzgo prototype of that proposal ultimately read from testdata by default (e.g., for a seed corpus) and wrote to GOPATH/pkg/fuzz by default as a local cache for the generated corpus, while also allowing more direct control of the generated corpus via an optional
GOCACHE certainly can be made to work in different scenarios, but what are your thoughts on a more convenient flag or fuzz-specific environment variable to allow more direct control of the location of the generated corpus?
Thanks, and exciting to see the progress you have been making!
I’d like to second the notion that corpuses are valuable and should be (able to be) retained and shared. It’s not just the computational resources spent generating them. Some corpuses I’ve developed alongside the code, so they already contain good coverage for alternative implementations. And other corpuses are seeded manually, which can make a dramatic difference to fuzzing effectiveness.
Agreed. Even though it defaults to
But to your point about making this customizable, I've gone ahead and added that to the open issues section of the design draft. It's something we'll likely consider in the future; it is just unlikely to make it into the first experimental release: https://go.googlesource.com/proposal/+/master/design/draft-fuzzing.md