Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/fuzzdata: new repository for fuzzing corpus data #31215

Open
thepudds opened this issue Apr 2, 2019 · 10 comments

Comments

@thepudds
Copy link

commented Apr 2, 2019

Summary

In #30719 and #30979, dvyukov/go-fuzz compatible fuzz functions were landed in:

The follow-up item in this issue here is to add a new golang.org/x/corpus repository to hold an example fuzzing corpus for portions of the standard library and x subrepos. This is to help with the exploration requested by the core Go team in discussion of the #19109 proposal to "make fuzzing a first class citizen".

The name for the new corpus repo alternatively could be golang.org/x/fuzz or something else; some additional naming discussion below.

Note that this issue is at least currently intended to be solely about creating the repository itself, and this issue does not cover checking in any corresponding fuzzing corpus (which is likely to be a follow-up issue).

Background

See the "Background" section of #30719 or #19109 (comment).

Additional Details

As part of the #19109 proposal discussion, there were multiple comments/requests from the core Go team asking to develop a better understanding of how a fuzzing corpus looks and behaves when it resides in a repository. For example, this March 2017 request in #19109 (comment) from Russ:

add fuzz tests to at least the x subrepos and maybe the standard library, so that we can understand the implications of having them in the source repos (including how much space we're going to spend on testdata/fuzz corpus directories

In the March 2017 proposal document in #19109 (comment), @dvyukov proposed:

For the standard library it is proposed to check in corpus into golang.org/x/fuzz repo.

golang.org/x/fuzz seems to be a very reasonable name. Two alternative possible names:

  • golang.org/x/corpus
  • golang.org/x/fuzzcorpus

Personally, I think x/corpus or x/fuzzcorpus might be more evocative than x/fuzz (e.g., someone might incorrectly think x/fuzz is where the fuzzing implementation or fuzzing functions live), but I do not have a strong opinion on the name and suspect others will have stronger opinions. Until there is additional feedback on the name, the rest of this comment here will use the term golang.org/x/corpus.

Populating the corpus

The initial seeding of the corpus can likely come from https://github.com/dvyukov/go-fuzz-corpus for a given Fuzz function. If that happens, then right now, there would be two corpus directories populated: go-fuzz-corpus/png/corpus and go-fuzz-corpus/tiff/corpus.

In parallel, multiple people are making progress on integrating dvyukov/go-fuzz into oss-fuzz in google/oss-fuzz#36, google/oss-fuzz#2188, dvyukov/go-fuzz#213 and elsewhere. Most likely, it will make sense to periodically update the golang.org/x/corpus repo with the output from oss-fuzz (which otherwise by default is stored in a Google Cloud Storage bucket).

However, the exact mechanism of populating and updating golang.org/x/corpus repo I think can be discussed outside of this particular issue. (Reason: In general, it seems more tractable to make progress if things are broken down into more manageable discrete chunks of work, especially to break out into separate steps the things that must be done by someone on the core Go team, vs. could be done by someone from the broader community. This issue is focused on the creation of the repo, which presumably must be done by someone on the core Go team, whereas populating the corpus can be done by a greater range of people from the broader community in consultation with the core Go team).

I am of course happy to discuss anything here, and happy to be corrected if any of the above is different than how people would like to proceed.

CC @dvyukov @josharian @FiloSottile @bradfitz @acln0

@FiloSottile FiloSottile changed the title create a 'golang.org/x/corpus' repo for prototype of "fuzzing as a first class citizen" x/corpus: new repository for fuzzing corpus data Apr 2, 2019

@gopherbot gopherbot added this to the Unreleased milestone Apr 2, 2019

@FiloSottile

This comment has been minimized.

Copy link
Member

commented Apr 2, 2019

LGTM. I suspect bringing the full oss-fuzz corpus back into the repo might bloat it too much, but we definitely need a place for seed corpus, so this sounds good.

@dmitshur

This comment has been minimized.

Copy link
Member

commented Apr 2, 2019

Related issue: #14304.

@FiloSottile

This comment has been minimized.

Copy link
Member

commented Apr 2, 2019

Oh, how about x/fuzzdata? x/corpus does not imply fuzzing to everyone I suspect. (Bikeshed!)

@kybin

This comment has been minimized.

Copy link
Contributor

commented Apr 2, 2019

will the repository used only for fuzzing?

@thepudds

This comment has been minimized.

Copy link
Author

commented Apr 2, 2019

For reasons of clarity, I personally like x/fuzzdata or x/fuzzcorpus over x/corpus or x/fuzz. I led off the issue using x/corpus mainly because it seems the core Go team generally places great weight on brevity.

@bradfitz

This comment has been minimized.

Copy link
Member

commented Apr 2, 2019

I was also going to suggest fuzzdata.

@thepudds

This comment has been minimized.

Copy link
Author

commented Apr 2, 2019

@kybin

will the repository used only for fuzzing?

That is the current intent of the suggestion in this issue.

@FiloSottile FiloSottile changed the title x/corpus: new repository for fuzzing corpus data x/fuzzdata: new repository for fuzzing corpus data Apr 2, 2019

@dvyukov

This comment has been minimized.

Copy link
Member

commented Apr 3, 2019

LGTM. I suspect bringing the full oss-fuzz corpus back into the repo might bloat it too much, but we definitely need a place for seed corpus, so this sounds good.

If it's only for seed corpus, then seed corpus may belong to the main repo better. We do want to use it in unit tests too! See dvyukov/go-fuzz#218 (comment) for more explanation.

@dvyukov

This comment has been minimized.

Copy link
Member

commented Apr 3, 2019

To clarify, I mean only seed corpus, i.e. small number of high-quality inputs without significant churn (no more than what happens with code). For a fuzzer-generated corpus we do need a separate repo.

@thepudds

This comment has been minimized.

Copy link
Author

commented Apr 3, 2019

Part of what is tricky is there are really 2 things running in parallel, I think:

  1. Actually fuzzing the stdlib much more (mainly via oss-fuzz).

  2. Use the stdlib and x subrepos as a “pilot” for the “first class fuzzing” proposal (at Russ’ suggestion).

For 2., it seems important that golang.org/x/fuzzdata gets created so that the stdlib and x subrepos can be a pilot of a decent size project that stores a decent sized corpus with some churn in a repo... and I think that is important even under the split corpus idea in dvyukov/go-fuzz#218 (comment), which I commented on there yesterday (probably too enthusiastically, sorry).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants
You can’t perform that action at this time.