
runtime: memory use regressions in various benchmarks on perf.golang.org/dashboard #53738

Closed
3 tasks done
mknyszek opened this issue Jul 7, 2022 · 5 comments
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. Performance release-blocker
Milestone

Comments


mknyszek commented Jul 7, 2022

A number of the Sweet benchmarks have memory use regressions on the order of about 5% vs. Go 1.18 on tip. This should be resolved before release.

Three commits appear to be the culprits: 13b18ec (for the Tile38 benchmarks), 79db59d (for some of the others, like the markdown benchmark), and 129dcb7 (for one of the fogleman benchmarks and a few others).

  • 13b18ec - Causes a change to fractional worker pacing, since the builders run the Tile38 server with GOMAXPROCS=6. Not reproducible without a fractional worker (needs verification).

  • 79db59d - heapLive is now an underestimate, which causes the GC to make a more optimistic assumption that's less accurate for smaller heaps. The exact mechanism is unknown, but I've verified that switching back to an overestimate fixes it. Fix incoming (a partial revert).

    • Update: Fixed.
  • 129dcb7 - Clearly there's a pacing change, but I'm not sure about this one yet. This is the least investigated.

    • Update: This is WAI, as it turns out. The regressions here are a result of faulty math in Go 1.18's pacing code that ultimately produced better performance (especially in microbenchmarks). That faulty math caused a bunch of overshoot and additional assists whose impact was bigger the smaller the heap was (but both of these effects cause shorter and fewer GC cycles for the same GOGC, so effectively a performance improvement). The math was fixed in 1.19, and I've confirmed that (1) the 1.19 binaries are more stable w.r.t. the pacer and (2) use fewer GC assists.

(Note that 54bd44e may appear to be implicated on some recent graphs but that actually fixed a pretty significant latency issue and brought performance back in line with Go 1.18, so we're looking at deltas from earlier. In effect, the behavior before that CL (back to when the relevant benchmarks start exhibiting that behavior) was just plain wrong.)

@mknyszek mknyszek added Performance NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker labels Jul 7, 2022
@mknyszek mknyszek added this to the Go1.19 milestone Jul 7, 2022
@mknyszek mknyszek self-assigned this Jul 7, 2022
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 7, 2022
@gopherbot

Change https://go.dev/cl/416417 mentions this issue: runtime: overestimate the amount of allocated memory in heapLive

gopherbot pushed a commit that referenced this issue Jul 7, 2022
CL 377516 made it so that memory metrics are truly monotonic, but also
updated how heapLive tracked allocated memory to also be monotonic.

The result is that cached spans with allocated memory aren't fully
accounted for by the GC, causing it to make a worse assumption (the
exact mechanism is at this time unknown), resulting in a memory
regression, especially for smaller heaps.

This change is a partial revert of CL 377516 that makes heapLive a
non-monotonic overestimate again, which appears to resolve the
regression.

For #53738.

Change-Id: I5c51067abc0b8e0a6b89dd8dbd4a0be2e8c0c1b2
Reviewed-on: https://go-review.googlesource.com/c/go/+/416417
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
fnxpt pushed a commit to fnxpt/go that referenced this issue Jul 8, 2022
@mknyszek

I've updated the top post, but there's only one regression left to get to the bottom of before release. I expect to resolve that this week and I'm on top of it.

@mknyszek

I may have narrowed down the cause of the tile38 memory regression on tip (and the original culprit I suspect seems innocent now; reverting on tip changes nothing).

My guess is that it's related to the fact that the GC pacer math now, in some places, uses the actual trigger point instead of the precomputed GOGC-based trigger. This is technically "more" correct because that math (like the cons/mark calculation) really cares more about how much was actually allocated than about when we intended to trigger. It also really shouldn't make much of a difference in most cases.

However, where it does make a difference is in the PI controller used to smooth the cons/mark measurements a bit. Because we're using the actual trigger point now, a bunch of cons/mark recalculations get skipped because, as it turns out, during the initialization phase of the tile38 server, the number of bytes allocated between the trigger point and the end of the GC cycle is zero. We can't do anything with such a sample because it can cause divide-by-zero issues, and it's also just not very informative.

But these are cases where the 1.18 pacer takes that information (because it uses a lower precomputed trigger rather than the actual trigger point) and decides to flail wildly for a few GCs. This flailing now ends up in the PI controller's history, and by the time we actually get to the main phase of the server (the part we actually measure), the PI controller ends up insisting that the cons/mark ratio is higher than the samples it's getting (and it would take more GCs than the benchmark currently executes to settle). The actual measured cons/mark in this phase is roughly the same across GC cycles between Go 1.18 and Go 1.19. Go 1.19's pacer doesn't flail like this, so it ends up underestimating the cons/mark a bit.

This difference in cons/mark causes Go 1.19's pacer to give the GC a lot less runway, and sometimes the underestimate swings low enough that it's not enough runway, incurring more assists. More GC cycles in the benchmark would smooth this out, but the benchmark already runs for a long time as it is.

Also, as a result of this fix, Go 1.19's pacer tends to be a little more on-target when it comes to the heap goal, which is why it ends up using slightly more memory.

The end result is: more memory used, roughly equal performance (throughput and latency). We don't get the benefits of fewer GC cycles on average because of (AFAICT) the additional assists triggered by the occasional insufficient runway.

One thing that I'm kind of starting to dislike in all this is the PI controller. I'm regretting the decision to use that as a smoothing function now. Yes, it's the only thing that has a chance at really truly finding the steady-state, but it's pretty swingy and I don't think we can afford to damp it more.

I've been experimenting with a much simpler smoothing function: a moving average of cons/mark over the last 2 cycles. It appears to work OK, is far more stable, and on at least two of the tile38 benchmarks turns down the memory use regression a bit. It will also prevent these kinds of surprises in the future.

This seems to work OK across the board in terms of mitigating the regression.

As an alternative, I am looking into switching back to the precomputed trigger to see if it also resolves the issues.

@mknyszek

Good news, looks like using the precomputed trigger resolves the remaining issues. That's probably safer for Go 1.19, so I'm going to advocate for landing that. For Go 1.20, I think I'd like to go back to using the actual trigger and switching to a simpler smoothing function, like the 2-GC moving average I described above.

@gopherbot

Change https://go.dev/cl/417557 mentions this issue: runtime: revert to using the precomputed trigger for pacer calculations

@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Aug 1, 2022
jproberts pushed a commit to jproberts/go that referenced this issue Aug 10, 2022
jproberts pushed a commit to jproberts/go that referenced this issue Aug 10, 2022
Issue golang#53738 describes in detail how switching to using the actual
trigger point over the precomputed trigger causes a memory regression
that arises from the fact that the PI controller in front of the
cons/mark ratio has a long time constant (for overdamping), so it
retains a long history of inputs.

This change, for the Go 1.19 cycle, just reverts to using the
precomputed trigger because it's safer, but in the future we should
consider moving away from such a history-sensitive smoothing function.

See the big comment in the diff and golang#53738 for more details.

Performance difference vs. 1.18 after this change:
https://perf.golang.org/search?q=upload:20220714.15

Fixes golang#53738.

Change-Id: I636993a730a3eaed25da2a2719860431b296c6f9
Reviewed-on: https://go-review.googlesource.com/c/go/+/417557
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
@golang golang locked and limited conversation to collaborators Aug 1, 2023