testing: apparent memory leak in fuzzer #44517
Comments
Thanks @dsnet. We'll definitely look into this.
I suspect this will have been fixed by https://golang.org/cl/316030. @dsnet could you re-run your target with the current head of dev.fuzz and see if it still OOMs?
It seems that there may still be a memory leak. In the chart, the horizontal axis is in seconds and the vertical axis is memory utilization in percent (on a system with 8GiB). This is the memory usage for the worker; the parent stayed closer to 0.1% memory usage at all times. It seems that https://golang.org/cl/316030 had a notable impact. Of note, the worker process allocated more and more memory until it hit 75% memory utilization around 2 hours in; afterwards, it held at that level. It never triggered an OOM from the kernel, but it did make the system sufficiently unusable that I couldn't SSH into the machine to kill the fuzzer. It's unclear to me whether the process hovering around 75% utilization is intended behavior. I'm using 60f16d7.
@dsnet I'll take another look. We expect that the fuzzer will typically use a fair amount of memory, especially if it is finding a lot of interesting inputs, but the chart does look strange. Could you share the target you are using? It's hard to debug without knowing what is actually happening. You can email me at bracewell@ if you'd like it to remain private.
For a fuzz target with a rather large base corpus, I see huge memory allocations as well. I've had to reduce the base set to only include inputs under 100K to keep memory usage from exploding.
I am currently unable to reproduce the behavior described in your second comment on the head of dev.fuzz (b2ff3e8) or at 60f16d7, @dsnet. Running your target on my system results in workers each using a stable ~200MB of memory (tested over ~4 hours). One thing that may be happening is that the coordinator process currently stores a copy of the corpus in memory, so if the input corpus is extremely large, or if the workers find a lot of new inputs, the memory footprint will be quite large. We can address this by always reading the inputs from disk rather than keeping a copy in memory. Another factor may be that we currently don't bias the mutator towards smaller inputs, so over time inputs will trend larger, although input size is capped at ~100MB, so as long as there isn't a leak in the worker we shouldn't see unbounded memory growth over time.
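A minimal sketch of that disk-backed idea, using hypothetical names (`diskCorpus`, `loadCorpus`) rather than the actual coordinator code: keep only file names in memory and re-read an input's bytes from disk each time it is needed, so the coordinator's footprint stays proportional to the number of inputs rather than their total size.

```go
package fuzzcorpus

import (
	"os"
	"path/filepath"
)

// diskCorpus is a hypothetical corpus that holds only file names in
// memory and reads each input from disk on demand, instead of caching
// every input's bytes in the coordinator process.
type diskCorpus struct {
	dir     string   // corpus directory, e.g. a cache or testdata dir
	entries []string // file names only: O(len(name)) memory per input
}

func loadCorpus(dir string) (*diskCorpus, error) {
	files, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	c := &diskCorpus{dir: dir}
	for _, f := range files {
		c.entries = append(c.entries, f.Name())
	}
	return c, nil
}

// input re-reads the i'th entry from disk; the returned bytes can be
// garbage-collected as soon as the caller drops them.
func (c *diskCorpus) input(i int) ([]byte, error) {
	return os.ReadFile(filepath.Join(c.dir, c.entries[i]))
}
```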
Anecdotal evidence: it seems that there might be a memory leak related to the use of sub-tests. In the snippet above, I eventually encounter an OOM (within 30 minutes) if the fuzz function spawns sub-tests with t.Run.
Okay, it seems that this is the problem. Trivial reproduction:

```go
func FuzzOOM(f *testing.F) {
	f.Fuzz(func(t *testing.T, b []byte) {
		for i := 0; i < 1000; i++ {
			t.Run("", func(t *testing.T) {})
		}
	})
}
```

This will allocate and hold onto gigabytes of memory within minutes.
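For anyone trying to reproduce: with a toolchain that has fuzzing support (Go 1.18+, or the dev.fuzz branch at the time), running `go test -fuzz=FuzzOOM` while watching the worker's RSS should show the growth within minutes. As the commit message quoted further down explains, each `t.Run` call registers another generated subtest name in an internal `map[string]int64`, so that map gains a thousand entries per fuzz iteration.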
Change https://golang.org/cl/341336 mentions this issue:
Change https://golang.org/cl/351452 mentions this issue:
Another case to consider: the test can spawn arbitrarily many subtests with names based on the fuzz inputs, so the set of user-provided subtest names will grow over time even if the testing package only stores O(1) bits per user-provided test name. Consider this related test:

```go
func FuzzInputAsSubtest(f *testing.F) {
	f.Fuzz(func(t *testing.T, b []byte) {
		t.Run(string(b), func(t *testing.T) {
		})
	})
}
```

As far as I can tell, in order to make this scale we must ensure that the fuzz worker periodically drops the accumulated set of subtest names.
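A sketch of that "periodically drop" idea, with an invented `nameRegistry` helper standing in for the real testing internals: if generated names only need to stay unique within a single invocation of the fuzz function, the worker could reset the registry between iterations so that input-derived names cannot accumulate without bound.

```go
package fuzznames

import "fmt"

// nameRegistry mirrors the shape of the bookkeeping the testing
// package keeps per test: one map entry per subtest name seen so far.
type nameRegistry struct {
	seen map[string]int64
}

func newNameRegistry() *nameRegistry {
	return &nameRegistry{seen: make(map[string]int64)}
}

// unique returns name on first use, or a name#NN variant for duplicates.
func (r *nameRegistry) unique(name string) string {
	n, ok := r.seen[name]
	r.seen[name]++
	if !ok {
		return name
	}
	return fmt.Sprintf("%s#%02d", name, n)
}

// reset drops every accumulated name. A fuzz worker calling this
// between iterations would bound the memory used by subtest names
// derived from fuzz inputs, as in t.Run(string(b), ...).
func (r *nameRegistry) reset() {
	r.seen = make(map[string]int64)
}
```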
FWIW I'm unable to reproduce unbounded memory growth with the example in #44517 (comment) from the current head, and I don't observe a leak with #44517 (comment) either. I'm not sure how to resolve that discrepancy. @dsnet, can you still reproduce the problem?
As of
Since each individual fuzz worker has its own namespace, running with
Hmm, I wonder if the
Change https://golang.org/cl/354430 mentions this issue:
Change https://golang.org/cl/354749 mentions this issue:
This is heavily based on CL 341336 by Joe Tsai and CL 351452 by Jay Conrod.

T.Run and T.Name use a map[string]int64 to hold the next suffix to use when duplicate names are passed to T.Run. This map necessarily retains one entry per unique name. However, it's a waste of memory to retain one entry per duplicate name: when we encounter the Nth duplicate, we know that names 00 through N-1 have been used just by looking at N.

We do still need to store (and check for collisions against) explicit names provided by the caller. For example, if the user passes in "a", then "a#01", then "a" again, we cannot deduplicate the second "a" to "a#01" — we need to instead skip ahead to "a#02". We can do so by checking the count of "a", then generating a proposed deduplicated name, then double-checking that proposed name against only the explicit names so far.

This somewhat reduces memory usage for tests that spawn large numbers of duplicate subtests, but doesn't solve the problem of memory growth for fuzzing — we still have to track all of the explicit, user-provided subtest names, and in a long-running fuzz test that set alone may be unbounded.

This fixes memory growth for the example described in https://golang.org/issue/44517#issuecomment-897104060, but not the one in https://golang.org/issue/44517#issuecomment-933825661.

For #44517

Change-Id: Ia159ecfcf44561ba67508d3af6377c27856df31d
Reviewed-on: https://go-review.googlesource.com/c/go/+/354749
Trust: Bryan C. Mills <bcmills@google.com>
Run-TryBot: Bryan C. Mills <bcmills@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Emmanuel Odeke <emmanuel@orijtech.com>
Reviewed-by: Jay Conrod <jayconrod@google.com>
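A rough Go sketch of the scheme the commit message describes, simplified relative to the real code in the testing package (the `uniquer` type is illustrative): one counter per base name, a set containing only the caller-provided names, and a collision check of each proposed #NN suffix against that set.

```go
package fuzznames

import "fmt"

// uniquer deduplicates subtest names as described above: it stores one
// counter per base name plus the set of explicit caller-provided names,
// rather than one map entry per generated duplicate.
type uniquer struct {
	count    map[string]int64    // next duplicate suffix per base name
	explicit map[string]struct{} // names the caller passed verbatim
}

func newUniquer() *uniquer {
	return &uniquer{
		count:    make(map[string]int64),
		explicit: make(map[string]struct{}),
	}
}

// unique returns name itself the first time it is seen, and otherwise
// a name#NN variant that does not collide with any explicit name.
func (u *uniquer) unique(name string) string {
	if _, ok := u.explicit[name]; !ok {
		u.explicit[name] = struct{}{}
		return name
	}
	// Duplicate: propose suffixes from the counter, skipping any that
	// the caller already used explicitly. Passing "a", "a#01", then "a"
	// again yields "a#02", matching the example in the commit message.
	for {
		u.count[name]++
		proposed := fmt.Sprintf("%s#%02d", name, u.count[name])
		if _, taken := u.explicit[proposed]; !taken {
			return proposed
		}
	}
}
```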
If I leave a fuzzer running for sufficiently long, it eventually crashes with an OOM. Snippet from `dmesg`:

I don't believe this is because the code being tested is OOMing, but rather because the fuzzer itself is retaining too much memory.
Here's a graph of the RSS memory usage over time:
The machine has 32GiB of RAM.
There are large jumps in memory usage at various intervals. I don't have much understanding of how the fuzzer works, but maybe this could be the mutator discovering that some input expands coverage and adding it to some internal data structure?
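For what it's worth, that guess matches how coverage-guided fuzzers behave in general. A toy sketch of the loop, with invented types and callbacks rather than Go's actual engine:

```go
package fuzzloop

import "math/rand"

// engine is a toy coverage-guided fuzzer. Any input that produces new
// coverage is appended to an in-memory corpus and kept for the life of
// the process, so RSS jumps each time the mutator finds something new.
type engine struct {
	corpus [][]byte        // assumed non-empty (seeded before stepping)
	seen   map[uint64]bool // coverage edges observed so far
}

// step mutates a random corpus entry, runs the target, and retains the
// input if it hit any edge we have not seen before.
func (e *engine) step(mutate func([]byte) []byte, run func([]byte) []uint64) {
	input := mutate(e.corpus[rand.Intn(len(e.corpus))])
	grew := false
	for _, edge := range run(input) {
		if !e.seen[edge] {
			e.seen[edge] = true
			grew = true
		}
	}
	if grew {
		e.corpus = append(e.corpus, input) // the memory jump happens here
	}
}
```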
I should further note that when the fuzzer crashes, it produces a `testdata/FuzzXXX/YYY` file as the "reproducer". Running the test with that "reproducer" does not fail the fuzz test. If possible, the fuzzer should be able to distinguish between OOMs caused by the fuzzer itself and OOMs caused by the code being tested. The former should not result in any "repro" corpus files being added, while the latter should.

I'm using 5aacd47.
(I can provide the code I'm fuzzing; contact me privately.)
/cc @katiehockman @jayconrod