Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: testing: support naming seed corpus values provided with (*testing.F).Add #50456

Closed
katiehockman opened this issue Jan 5, 2022 · 21 comments
Labels
FeatureRequest fuzz Issues related to native fuzzing support Proposal
Milestone

Comments

@katiehockman
Copy link
Contributor

katiehockman commented Jan 5, 2022

Seed corpus entries can be added to a fuzz test by calling (*testing.F).Add. There is no way to name these seed corpus entries, so they default to "seed#{num}". When run with go test -v, it will look something like this:

--- PASS: FuzzFoo (0.00s)
    --- PASS: FuzzFoo/seed#0 (0.00s)
    --- PASS: FuzzFoo/seed#1 (0.00s)
    --- PASS: FuzzFoo/seed#2 (0.00s)

The proposal is to allow a way to name these entries, much like you would with an execution of t.Run. This would help with error messages and debugging. We could do this a few different ways:

  1. Amend f.Add to accept an optional first/last name string param. Since f.Add takes a []interface{} today, this is functionality that could be added later that wouldn't break backwards compatibility.

e.g. existing fuzz tests that look like this:

func FuzzFoo(f *testing.F) {
	f.Add("a3g1f3", 10)
	f.Add("----", 100)
	f.Add("", 0)
	f.Fuzz(func(*testing.T, string, int) {})
}

would change to

func FuzzFoo(f *testing.F) {
	f.Add("valid", "a3g1f3", 10)
	f.Add("invalid", "----", 100)
	f.Add("empty", "", 0)
	f.Fuzz(func(*testing.T, string, int) {})
}
  1. We could add another method on *testing.F. For example:
    (*testing.F).AddNamed(string, []interface{})

Originally proposed by @dnwe

/cc @golang/fuzzing

@katiehockman katiehockman added Proposal FeatureRequest fuzz Issues related to native fuzzing support labels Jan 5, 2022
@katiehockman katiehockman added this to the Proposal milestone Jan 5, 2022
@thepudds
Copy link

thepudds commented Jan 5, 2022

CC @dnwe

@ianlancetaylor ianlancetaylor added this to Incoming in Proposals (old) Jan 5, 2022
@mvdan
Copy link
Member

mvdan commented Jan 6, 2022

  1. Since f.Add takes a []interface{} today, this is functionality that could be added later that wouldn't break backwards compatibility.

While this will work just fine for the call sites, it would make for a pretty tricky signature: args ...any, but if it begins with an extra string, that's actually the name.

It would be a lot saner if we did name string, args ...any. Is it too late for that? beta1 already shipped, but rc1 hasn't yet, and I don't imagine a lot of people have written native fuzzers yet. We could also leverage go fix to automatically fix fuzz funcs, or supply a gofmt -r expression that people can run.

@mvdan
Copy link
Member

mvdan commented Jan 6, 2022

2. (*testing.F).AddNamed(string, []interface{})

For completeness: I realise this is also an option, but leaving Add around as an extra is equally unfortunate, I'd say :) Having just Add(name string, args ...any) feels like the best result overall, especially as it mimics other methods like Run.

@rsc rsc moved this from Incoming to Active in Proposals (old) Jan 12, 2022
@rsc
Copy link
Contributor

rsc commented Jan 12, 2022

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc
Copy link
Contributor

rsc commented Jan 26, 2022

It's getting very late to change the API, so it's looking like AddNamed is the right approach?
Or maybe we don't need this feature at all?

Another possibility is to define a new string type like testing.FuzzName and then you could

f.Add(testing.FuzzName("myname"), "data")

but that's a bit verbose.

@mvdan
Copy link
Member

mvdan commented Jan 26, 2022

I reckon we're not late to change the API; this user report was given three weeks ago, a couple of weeks after the beta came out. That's very much within the timeframe that you would expect users to give the beta an honest try and report issues. At the time of writing there are 38 open issues with the release-blocker label, so I also reckon the RC is at least a couple of weeks away, and shouldn't be further delayed by a minor API change.

I do think we need this feature; if we say "one shouldn't need to name seed corpus entries", presumably the same could have applied to sub-test and sub-benchmark names :) And I still believe that altering Add while we still can is the best and most consistent approach long-term.

@mvdan
Copy link
Member

mvdan commented Jan 26, 2022

Also, if it comes between AddNamed and FuzzName, my vote would go for FuzzName; that way we avoid two "add" entrypoints, and it's clear which argument is the name versus the variable number of fuzz values.

Finally, just a thought: I seem to recall that the proposal review committee only meets once a week, and the RC isn't very far, so we should probably expedite a resolution here before more weeks go by and we really are too late.

@mvdan
Copy link
Member

mvdan commented Jan 29, 2022

I also reckon the RC is at least a couple of weeks away

It seems like we'll get beta2 next week, which seems to support my reasoning above - the RC is probably four weeks away or so. And, if we can manage to get this small change into master early next week, users can also test it as part of beta2 :)

@rsc
Copy link
Contributor

rsc commented Jan 31, 2022

I am not sure about the need to name specific seed values. This reminds me of the pets vs cattle discussion for managing computers.

Every test function we write is typically an important, unique, carefully tended thing (like a pet).
I am not convinced fuzzing seed corpus values are like that at all: do we really want the overhead of having to name them?
Seeds are usually small: why not just give the subtest for it a sequence number and then print the actual value when it fails?

We are talking about adding complexity, and I am not convinced we have established that the complexity is needed.
And it is very late in the cycle to add complexity that we're not sure we need.
(Package constraints just missed the cut! Is this really more important than that? And are we so sure?)

@mvdan
Copy link
Member

mvdan commented Jan 31, 2022

Perhaps you're right that not every seed corpus entry will need a name, but I think that already applies to sub-tests and sub-benchmarks - one can leave the name empty and they get a number instead. That seems like the best of both worlds to me.

Seeds are usually small: why not just give the subtest for it a sequence number and then print the actual value when it fails?

That assumption worries me a little; most of the fuzzers I've written take bytes or strings as input, and some of those inputs do get reasonably long fairly quickly when I want the seed corpus to also cover some particularly interesting and complex edge cases.

That said, if we think that unnamed seed corpus entries will be more common, then perhaps the FuzzName solution is a good middle ground. I can't say I have enough data to point one way or another.

I'm not sure if the comparison with package constraints applies; we can always add that package in 1.19, whereas we can't change the signature of testing.F.Add once it gets released :)

@katiehockman
Copy link
Contributor Author

katiehockman commented Jan 31, 2022

whereas we can't change the signature of testing.F.Add once it gets released :)

I don't think that adding the testing.FuzzName type would be a non-backwards compatible change to the signature of testing.F.Add if we were to add this in Go 1.19. Either way, the fuzz name should be optional. Given that, we could make both of the following testing.F.Add calls valid, whereas in Go 1.18 only the first would be:

f.Add("data")
f.Add(testing.FuzzName("myname"), "data")

testing.F.Add takes args ...any today, so we can certainly add an optional additional value later that will be processed as the name, once we've had more time to collect evidence, and when we don't have to rush.

@dnwe
Copy link

dnwe commented Jan 31, 2022

I originally raised this merely as an API discrepancy rather than necessarily a strong desire for the capability.

My 2p would be that testing.F.AddNamed jars too much and doesn't fit with the rest of testing, so I'd personally discount that immediately.

I similarly think that the functionality we gain isn't worth the overhead and unintuitiveness of a special testing.FuzzName type).

If it's too late to make testing.F.Add match the arg style of testing.T.Run then personally I'd discount the feature and just add a note in the docs that if you want to name seed values then you should directly write to testdata/fuzz/Fuzz_Name/Seed_Name yourself

@dnwe
Copy link

dnwe commented Jan 31, 2022

(I actually prefer storing my seed values under the testdata hierarchy anyway)

@AlekSi
Copy link
Contributor

AlekSi commented Jan 31, 2022

One option could be to combine ideas from #46780 and #47413 and add support for testing.F.Run:

func FuzzFoo(f *testing.F) {
    for _, tc := range []struct {
        name string
        s string
        expected int
    }{
        {name: "random", s: "a3g1f3", expected: 10},
        {name: "dashes", s: "----", expected: 100},
        {name: "empty", s: "", expected: 0},
    } {
        tc := tc
        f.Run(tc.name, func(f *testing.F) {
            // test tc
            if res := foo(tc.s); res != tc.expected {
                f.Errorf("foo(%s) = %d, want %d", tc.s, res, tc.expected)
            }

            f.Add(tc.s) // named tc.name
        })
    }

    f.Fuzz(...)
}

(I actually prefer storing my seed values under the testdata hierarchy anyway)

Interestingly, I prefer my seed values to be a part of the table-driver test.

@rsc
Copy link
Contributor

rsc commented Jan 31, 2022

@mvdan I would have thought that large seed inputs would be better handled as files under testdata/fuzz than placed in programs? Or are these mechanically constructed seeds?

@dnwe
Copy link

dnwe commented Jan 31, 2022

Interestingly, I prefer my seed values to be a part of the table-driver test.

what do you do after fuzzing found an interesting input and dumps it in testdata for you to re-run against? Do you manually move it to your seed table or do you just discard it after you’ve fixed the crasher?

@AlekSi
Copy link
Contributor

AlekSi commented Jan 31, 2022

Do you manually move it to your seed table

That, with an expected result or error. For example: https://github.com/FerretDB/FerretDB/blob/main/internal/bson/double_test.go

(It would be nice to have some tooling for that or a function to read seed corpus files, but that's offtopic)

@rsc
Copy link
Contributor

rsc commented Feb 2, 2022

It sounds like this is either a likely decline or a 'on hold'. Given that we are not making changes for Go 1.18, perhaps it should be declined for now, and then a new proposal with a different API can be proposed if we need it?

@mvdan
Copy link
Member

mvdan commented Feb 2, 2022

I would have thought that large seed inputs would be better handled as files under testdata/fuzz than placed in programs? Or are these mechanically constructed seeds?

I admit I hadn't realised it was supported to add seed corpus entries directly as files while using custom names. Taking another look at https://go.dev/doc/fuzz/, I do see it's mentioned. If one seed input is too large to be named after its own value, then I think it's a reasonable tradeoff to instead include it as a named file.

I do have one case where I'm mechanically constructing reasonably sized seeds, but I'm not sure that it would be a huge problem either. I could always write a go generate program to construct the seeds and write them into testdata.

It sounds like this is either a likely decline or a 'on hold'.

I've grown less convinced that we need this feature right now as the conversation has progressed; declining for now seems reasonable.

@rsc rsc moved this from Active to Likely Decline in Proposals (old) Feb 9, 2022
@rsc
Copy link
Contributor

rsc commented Feb 9, 2022

Based on the discussion above, this proposal seems like a likely decline.
— rsc for the proposal review group

@rsc rsc moved this from Likely Decline to Declined in Proposals (old) Feb 16, 2022
@rsc
Copy link
Contributor

rsc commented Feb 16, 2022

No change in consensus, so declined.
— rsc for the proposal review group

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
FeatureRequest fuzz Issues related to native fuzzing support Proposal
Projects
Status: No status
Status: Declined
Development

No branches or pull requests

6 participants