cmd/go: uses of os/exec can trigger fuzz failures when stopping via ^C #50228
In this case, I hit
This is a minimal reproducer, using /bin/sleep for the sake of simplicity. My real fuzz func calls a reasonable program that exits within milliseconds. But if I stop the fuzzer just at the right moment with
I don't think my code is to blame here. I could possibly expand my
I think the solution should come from cmd/go's fuzzing - if the user hits
When you hit ^C you're sending SIGINT to the foreground process group, including all the
If instead of ^C you use
I don't think this eliminates the race. The fuzz failure can happen before
To extend the idea, when the fuzzer is exiting due to a signal, it can suppress any failures that occurred during the last, say, 50ms. That should reduce the race window to ~0 but it seems really hacky.
(In general using
I'm not sure I agree. It can be hard to estimate how long one should fuzz for. Sometimes I fuzz for five minutes, and the corpus stops expanding almost entirely. Sometimes I fuzz for twenty minutes, and the corpus keeps expanding. Being able to stop the fuzzing process interactively is helpful.
Good observation, I hadn't thought of that.
From the user's perspective, this solution seems fine. If I'm manually hitting ^C, I really don't care that much about a few hundred milliseconds being lost. That's usually what happens with "cleanup" until the process actually exits, anyway. It may not be a foolproof mechanism, but if it reduces the odds of false positive failures to near-zero, that seems like a big step forward.
This is a bit complicated. I thought we explicitly recommended against using fuzzing in this way. We don't, but we probably should. Given how the fuzzing engine is designed, fuzzing things outside of the runtime is likely to cause strange behaviors beyond this one anyway.
Throwing away things that happen in some interval before exiting could work, but it would also end up masking some real failures. I also cannot think of an elegant implementation of this that doesn't simply rely on buffering findings, which would add a non-trivial amount of complexity.
Given that this is a pretty narrow issue, and that we don't have all that much time left, I think for 1.18 we should just document that this is not a recommended usage of the fuzzer and is likely to lead to such issues, and consider whether there is a larger structural change we could make for this in 1.19.
What exactly do "this way" and "fuzzing things outside of the runtime" mean? (IOW, what are you proposing to recommend against for Go 1.18?)
Do you have examples of the "other strange behaviors"?
As long as the suppression window is a lot smaller than the fuzzing duration, this doesn't seem significant. For example, suppose you suppress failures from the last 50ms of the fuzzing run. In this scenario, running the fuzzer for 5 seconds would be equivalent to running the fuzzer for 4.95s without suppression.
Also, you wouldn't suppress results when exiting due to
Given finding crashers is rare, rather than continually buffering, would it be reasonable to instead effectively pause when a crasher is found but before reporting it? In other words, the new behavior could kick in only when it would have been about to report a crasher? Or maybe that’s what you were already saying…
And sorry for the double post, but just one more quick comment to agree with @mvdan that it would be nice to handle this gracefully eventually. It can be helpful to be able to execute an external program for comparative fuzzing, even if you don’t get coverage with the external binary.
For example, from the go-fuzz-corpus:
which found a bunch of real bugs comparing go/types, cmd/compile, and gccgo, where only go/types had coverage instrumentation in that example (I think).
All that said, it seems reasonable to do whatever is expedient for 1.18, which might just be a note in the documentation or similar.