
runtime: memory use regressions in various benchmarks on perf.golang.org/dashboard #53738

Closed
3 tasks done
mknyszek opened this issue Jul 7, 2022 · 5 comments
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. Performance release-blocker
Milestone

Comments


mknyszek commented Jul 7, 2022

A number of the Sweet benchmarks have memory use regressions on the order of about 5% vs. Go 1.18 on tip. This should be resolved before release.

Three commits appear to be the culprits: 13b18ec (for the Tile38 benchmarks), 79db59d (for some of the others, like the markdown benchmark), and 129dcb7 (for one of the fogleman benchmarks and a few others).

  • 13b18ec - Causes a change to fractional worker pacing, since the builders run the Tile38 server with GOMAXPROCS=6. Not reproducible without a fractional worker (needs verification).

  • 79db59d - heapLive is now an underestimate, which causes the GC to make a more optimistic assumption that's less accurate for smaller heaps. The exact mechanism is unknown, but I've verified that switching back to an overestimate fixes it. Fix incoming (a partial revert).

    • Update: Fixed.
  • 129dcb7 - Clearly there's a pacing change, but I'm not sure about this one yet. This is the least investigated.

    • Update: This is WAI, as it turns out. The regressions here are a result of faulty math in Go 1.18's pacing code that ultimately produced better performance (especially in microbenchmarks). That faulty math caused a bunch of overshoot and additional assists whose impact was bigger the smaller the heap was (but both of these effects cause shorter and fewer GC cycles for the same GOGC, so effectively a performance improvement). The math was fixed in 1.19, and I've confirmed that (1) the 1.19 binaries are more stable w.r.t. the pacer and (2) use fewer GC assists.

(Note that 54bd44e may appear to be implicated on some recent graphs but that actually fixed a pretty significant latency issue and brought performance back in line with Go 1.18, so we're looking at deltas from earlier. In effect, the behavior before that CL (back to when the relevant benchmarks start exhibiting that behavior) was just plain wrong.)

@mknyszek mknyszek added Performance NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker labels Jul 7, 2022
@mknyszek mknyszek added this to the Go1.19 milestone Jul 7, 2022
@mknyszek mknyszek self-assigned this Jul 7, 2022
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 7, 2022
@gopherbot

Change https://go.dev/cl/416417 mentions this issue: runtime: overestimate the amount of allocated memory in heapLive

gopherbot pushed a commit that referenced this issue Jul 7, 2022
CL 377516 made it so that memory metrics are truly monotonic, but also
updated how heapLive tracked allocated memory to also be monotonic.

The result is that cached spans with allocated memory aren't fully
accounted for by the GC, causing it to make a worse assumption (the
exact mechanism is at this time unknown), resulting in a memory
regression, especially for smaller heaps.

This change is a partial revert of CL 377516 that makes heapLive a
non-monotonic overestimate again, which appears to resolve the
regression.

For #53738.

Change-Id: I5c51067abc0b8e0a6b89dd8dbd4a0be2e8c0c1b2
Reviewed-on: https://go-review.googlesource.com/c/go/+/416417
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
fnxpt pushed a commit to fnxpt/go that referenced this issue Jul 8, 2022
@mknyszek

I've updated the top post, but there's only one regression left to get to the bottom of before release. I expect to resolve that this week and I'm on top of it.

@mknyszek

I may have narrowed down the cause of the tile38 memory regression on tip (and the original culprit I suspect seems innocent now; reverting on tip changes nothing).

My guess is that it's related to the fact that the GC pacer math now, in some places, uses the actual trigger point instead of the precomputed GOGC-based trigger. This is technically "more" correct because that math (like the cons/mark calculation) really cares more about how much was actually allocated than about when we intended to trigger. It also really shouldn't make much of a difference in most cases.

However, where it does make a difference is in the PI controller used to smooth the cons/mark measurements a bit. Because we're using the actual trigger point now, a bunch of cons/mark recalculations get skipped because, as it turns out, during the initialization phase of the tile38 server, the number of bytes allocated between the trigger point and the end of the GC cycle is zero. We can't do anything with such a sample because it can cause divide-by-zero issues, and it's also just not very informative.

But these are cases where the 1.18 pacer takes that information (because it uses a lower precomputed trigger rather than the actual trigger point) and decides to flail wildly for a few GCs. This flailing now ends up in the PI controller's history, and by the time we actually get to the main phase of the server (the part we actually measure), the PI controller ends up insisting that the cons/mark ratio is higher than the samples it's getting (and it would take more GCs than the benchmark currently executes to settle). The actual measured cons/mark in this phase is roughly the same across GC cycles between Go 1.18 and Go 1.19. Go 1.19's pacer doesn't flail like this, so it ends up underestimating the cons/mark a bit.

This difference in cons/mark causes Go 1.19's pacer to give the GC a lot less runway, and sometimes the underestimate swings low enough that it's not enough runway, incurring more assists. More GC cycles in the benchmark would smooth this out, but the benchmark already runs for a long time as it is.

Also, as a result of this fix, Go 1.19's pacer tends to be a little more on-target when it comes to the heap goal, which is why it ends up using slightly more memory.

The end result is: more memory used, roughly equal performance (throughput and latency). We don't get the benefits of fewer GC cycles on average because of (AFAICT) the additional assists triggered by the occasional insufficient runway.

One thing that I'm kind of starting to dislike in all this is the PI controller. I'm regretting the decision to use that as a smoothing function now. Yes, it's the only thing that has a chance at really truly finding the steady-state, but it's pretty swingy and I don't think we can afford to damp it more.

I've been experimenting with a much simpler smoothing function: a moving average of cons/mark over the last 2 cycles. It appears to work OK, is far more stable, and on at least two of the tile38 benchmarks turns down the memory use regression a bit. It will also prevent these kinds of surprises in the future.

This seems to work OK across the board in terms of mitigating the regression.

As an alternative, I am looking into switching back to the precomputed trigger to see if it also resolves the issues.

@mknyszek

Good news, looks like using the precomputed trigger resolves the remaining issues. That's probably safer for Go 1.19, so I'm going to advocate for landing that. For Go 1.20, I think I'd like to go back to using the actual trigger and switching to a simpler smoothing function, like the 2-GC moving average I described above.

@gopherbot

Change https://go.dev/cl/417557 mentions this issue: runtime: revert to using the precomputed trigger for pacer calculations

@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Aug 1, 2022
jproberts pushed a commit to jproberts/go that referenced this issue Aug 10, 2022
jproberts pushed a commit to jproberts/go that referenced this issue Aug 10, 2022
Issue golang#53738 describes in detail how switching to using the actual
trigger point over the precomputed trigger causes a memory regression
that arises from the fact that the PI controller in front of the
cons/mark ratio has a long time constant (for overdamping), so it
retains a long history of inputs.

This change, for the Go 1.19 cycle, just reverts to using the
precomputed trigger because it's safer, but in the future we should
consider moving away from such a history-sensitive smoothing function.

See the big comment in the diff and golang#53738 for more details.

Performance difference vs. 1.18 after this change:
https://perf.golang.org/search?q=upload:20220714.15

Fixes golang#53738.

Change-Id: I636993a730a3eaed25da2a2719860431b296c6f9
Reviewed-on: https://go-review.googlesource.com/c/go/+/417557
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
@golang golang locked and limited conversation to collaborators Aug 1, 2023