cmd/compile: k8s scalability load test failure (latency regression) with unified IR #54593

Closed
mknyszek opened this issue Aug 22, 2022 · 4 comments
Labels: compiler/runtime, NeedsInvestigation, Performance, release-blocker

Comments

Contributor

mknyszek commented Aug 22, 2022

Kubernetes runs a nightly load test built with Go tip. Recently there was a regression in latency that caused various thresholds to be exceeded.

(See the "gathering measurements" step, though that on its own isn't very descriptive.)

The team has narrowed the culprit down to

833367e98a internal/buildcfg: enable unified IR by default

It's unclear at this time what the issue is. We're currently waiting on more metrics to investigate this more thoroughly.

CC @mdempsky

mknyszek added the NeedsInvestigation and release-blocker labels Aug 22, 2022
gopherbot added the compiler/runtime label Aug 22, 2022
mknyszek added the Performance label and removed the compiler/runtime label Aug 22, 2022
mknyszek added this to the Go1.20 milestone Aug 22, 2022
Contributor Author

mknyszek commented Aug 22, 2022

Assigning to myself for now to gather more information.

mknyszek self-assigned this Aug 22, 2022
Member

mdempsky commented Aug 22, 2022

Just to confirm, this is an issue with run-time code being slower? Like it's not a problem with the compiler having slowed down too much?

One thing that comes to mind is that inlining has changed. In particular, the generated IR is somewhat different now, while the inlining heuristics are still tuned for the non-unified frontend's IR. Unified IR also still has a more primitive heuristic for when to re-export function bodies, so inlining may sometimes fail. In turn, this might affect escape analysis.

Contributor Author

mknyszek commented Aug 22, 2022

Nope, it's about code at runtime. Now that I'm looking at some of the graphs, it actually looks more like occasional stalls than a uniform slowdown, though I think there is some smaller uniform impact on tail latencies as well in between those stalls (unclear if they're related yet).

Here's what the API call latency looks like over time (note the threshold for the first graph is 1 second):

[image: API call latency over time]

And here are some GC-related statistics (note the breaks in the graphs that line up with the spikes):

[image: GC-related statistics]

Member

mdempsky commented Aug 30, 2022

Kubernetes reports this has been fixed as of 6a801d3.
