runtime: mark assist blocks GC microbenchmark for 7ms #27732
Please answer these questions before submitting your issue. Thanks!
What version of Go are you using (
I think #16528 (now closed) was cited in discussion of a previous incarnation of this GC benchmark.
Also probably related is this prior blog post:
And also cited in that blog is this mailing list conversation from 2016:
Expanding my comment slightly:
The benchmark code appearing within the text of that 2016 blog:
seems to match the benchmark code that @dr2chase says he used as his starting point above:
Both are from the same original author.
That 2016 blog reported ~7 ms pause time for Go 1.7.3, and at the time #16528 was theorized to be the root cause of why the results weren't better (e.g., see the "Why Are the Go Results Not Better?" section there).
In any event, mainly just wanted to mention the older investigation in case that is useful (and apologies if not useful).
changed the title from "Mark assist blocks GC microbenchmark for 7ms." to "runtime: mark assist blocks GC microbenchmark for 7ms" on Sep 18, 2018
referenced this issue on Sep 19, 2018
A bit more info: it turns out that in the original, the 4800000 byte circular buffer was allocated on the stack, and large stack frames are not handled incrementally in the same way that large objects are.
When the benchmark is modified to allocate the circular buffer on the heap and store the pointer in a global, the latency falls to 2ms, which is better, though still far worse than expected. In the snapshot from the trace, you can see that the GC work is now shared among several threads, but the worker thread is 100% running mark assist during that interval.
A different modification, to move the declaration of var c circularBuffer to a global, also shortens the worst case latency in the same way, also with the same 100% mark assist for about 2ms.
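For readers following along, here is a rough sketch of what the two modifications might look like. It is not the actual benchmark source: the names windowSize, message, mkMessage, push and cp are assumptions for illustration, and only var c circularBuffer comes from the comment above. The window size is chosen so that the array of slice headers comes to 4800000 bytes on a 64-bit platform.

```go
package main

import "fmt"

// windowSize is an assumed value: 200000 slice headers * 24 bytes each
// is 4800000 bytes on a 64-bit platform.
const windowSize = 200000

type message []byte

type circularBuffer [windowSize]message

// Modification 2: the buffer is declared as a package-level variable,
// so it is no longer part of main's stack frame.
var c circularBuffer

// Modification 1: the buffer is allocated separately and only a pointer
// to it is kept in a global (cp is a hypothetical name).
var cp *circularBuffer

// mkMessage allocates a 1 KiB message filled with the low byte of n.
func mkMessage(n int) message {
	m := make(message, 1024)
	for i := range m {
		m[i] = byte(n)
	}
	return m
}

// push overwrites the oldest slot in the buffer with a fresh message.
func push(b *circularBuffer, id int) {
	b[id%windowSize] = mkMessage(id)
}

func main() {
	cp = new(circularBuffer) // stored in a global, so it lives on the heap

	for i := 0; i < 1000; i++ {
		push(&c, i) // global-variable variant
		push(cp, i) // heap-pointer variant
	}
	fmt.Println(len(c), len(cp))
}
```

In the original form described above, the buffer would instead be a local variable inside main, which is what put the large array in a stack frame.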
Still to investigate:
Once mark assist is dealt with, this microbenchmark is likely to have problems with long sweeps, a different, known bug ( #18155 ) that I've also seen in some of these traces. That looks like:
New summary of apparent subissues:
For reliable measurement of steady-state latencies, the benchmark ought to do a warmup run first, because rapid heap growth around startup is more likely to provoke mysterious OS interference with progress.
Lack of credit for mark assist of roots is a contributor to long pauses.
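As an illustration of the warmup point in the summary above, here is a minimal sketch of a benchmark that runs the workload once, discards that result, and only then records worst-case latency. The names run, msgCount and windowSize and their values are assumptions, not taken from the benchmark under discussion.

```go
package main

import (
	"fmt"
	"time"
)

// Assumed sizes, for illustration only.
const (
	windowSize = 200000
	msgCount   = 1000000
)

type message []byte

type circularBuffer [windowSize]message

var c circularBuffer

// mkMessage allocates a 1 KiB message filled with the low byte of n.
func mkMessage(n int) message {
	m := make(message, 1024)
	for i := range m {
		m[i] = byte(n)
	}
	return m
}

// run pushes count messages into the buffer and returns the longest
// time any single push took.
func run(count int) time.Duration {
	var worst time.Duration
	for i := 0; i < count; i++ {
		start := time.Now()
		c[i%windowSize] = mkMessage(i)
		if d := time.Since(start); d > worst {
			worst = d
		}
	}
	return worst
}

func main() {
	// Warmup pass: lets the heap grow to its steady-state size.
	// The measured latency is deliberately thrown away.
	run(msgCount)

	// Measured pass: the heap is already populated, so startup heap
	// growth should not dominate the result.
	fmt.Println("worst push latency after warmup:", run(msgCount))
}
```

Reusing the same buffer for both passes keeps the live heap roughly constant between the warmup and the measured run, which is the property the warmup is meant to establish.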