runtime: scavenger pacing fails to account for fragmentation [1.13 backport] #34149

Closed
gopherbot opened this issue Sep 6, 2019 · 7 comments

@gopherbot commented Sep 6, 2019

@mknyszek requested issue #34048 to be considered for backport to the next 1.13 minor release.

Since this has the potential to cause a severe performance loss, I think we should backport this to Go 1.13.

@gopherbot Please open a backport issue for 1.13.

@bcmills (Member) commented Sep 25, 2019

this has the potential to cause a severe performance loss

@mknyszek, @aclements: could you describe the conditions under which this performance degradation occurs? (Is it predictable? Does it show up during testing? Is there a workaround available? Do we have a rough estimate for the fraction of users affected?)

@mknyszek (Contributor) commented Sep 25, 2019

Is it predictable?

Yes. If the application's span fragmentation (the amount by which heap_inuse exceeds heap_alloc) is 10% or more, the scavenger can spiral into a state in which every free span is scavenged, leading to a large number of page faults and syscalls.
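
For readers who want to gauge whether their own application is in that regime, here is a minimal sketch (not from this issue) that reads runtime.MemStats; HeapInuse and HeapAlloc are the exported counterparts of the heap_inuse and heap_alloc counters mentioned above, and the 10% check simply mirrors the figure quoted here.

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)

	// Fraction of in-use span memory not occupied by live objects.
	frag := float64(ms.HeapInuse-ms.HeapAlloc) / float64(ms.HeapInuse)
	fmt.Printf("heap_inuse=%d heap_alloc=%d fragmentation=%.1f%%\n",
		ms.HeapInuse, ms.HeapAlloc, frag*100)

	if frag >= 0.10 {
		fmt.Println("span fragmentation is at or above the 10% figure discussed above")
	}
}
```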

Does it show up during testing?

I don't have a specific test, but I can recreate the case with both large and small programs. I'd have to write a short program that reproduces the problem, but that shouldn't be difficult to do.
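
As an illustration only (this is not the reproducer referred to above, and it is not guaranteed to trigger the pacing problem), a hypothetical allocation pattern that produces heavy span fragmentation looks roughly like this: fill many spans with small objects, then drop most of them while keeping at least one live object per span.

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	// Fill many 8 KiB spans with 512-byte objects (16 per span).
	const n = 1 << 18 // roughly 128 MiB of allocations
	objs := make([]*[512]byte, n)
	for i := range objs {
		objs[i] = new([512]byte)
	}

	// Keep roughly one object per span alive. The survivors pin their
	// spans, so heap_inuse stays high while heap_alloc drops sharply,
	// i.e. span fragmentation well above 10%.
	for i := range objs {
		if i%16 != 0 {
			objs[i] = nil
		}
	}
	runtime.GC()

	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	fmt.Printf("heap_alloc=%d heap_inuse=%d\n", ms.HeapAlloc, ms.HeapInuse)
	runtime.KeepAlive(objs)
}
```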

Is there a workaround available?

No. There is no way to work around this because it's embedded in the runtime's pacing. You'd have to somehow keep your application's span fragmentation under 10%, which isn't reasonable to ask of anyone.

Do we have a rough estimate for the fraction of users affected?

No, and I'm not sure exactly how we'd be able to get that. We know for a fact that it severely affects Kubernetes' API latency and is one reason why they're blocked on moving to Go 1.13 (see #32828). There is another (medium-sized, I suppose?) Google application which consistently took 80% longer to run under Go 1.13 as a direct consequence of this issue. The patch on the original (non-backport) issue fixes this behavior and brings performance back to Go 1.12 levels.

@bcmills (Member) commented Sep 25, 2019

By “Is it predictable?” and “Does it show up during testing?”, I meant more, “will users be able to predict and/or identify that they are affected?”

(That is: this seems more important to backport if it can crop up suddenly after a user has already tested and deployed a new build.)

@mknyszek (Contributor) commented Sep 25, 2019

Ah, I think I understand. Then it's not predictable. It can certainly show up suddenly in production without anyone noticing it while running tests, running microbenchmarks, or even running a smaller version of the full application. It's as difficult to predict as how much heap fragmentation your application produces, which may be a function of the input in some cases.

@aclements (Member) commented Oct 2, 2019

I agree that this should be backported for the reasons @mknyszek laid out.

Specifically, the CL to backport is https://golang.org/cl/193040.

@gopherbot (Author) commented Oct 2, 2019

Change https://golang.org/cl/198487 mentions this issue: [release-branch.go1.13] runtime: redefine scavenge goal in terms of heap_inuse

gopherbot pushed a commit that referenced this issue Oct 4, 2019:
[release-branch.go1.13] runtime: redefine scavenge goal in terms of heap_inuse

This change makes it so that the scavenge goal is defined primarily in
terms of heap_inuse at the end of the last GC rather than next_gc. The
reason behind this change is that next_gc doesn't take into account
fragmentation, and we can fall into a situation where the scavenger thinks
it should have work to do but there's no free and unscavenged memory
available.

In order to ensure the scavenge goal still tracks next_gc, we multiply
heap_inuse by the ratio between the current heap goal and the last heap
goal, which describes whether the heap is growing or shrinking, and by
how much.

Finally, this change updates the documentation for scavenging and
elaborates on why the scavenge goal is defined the way it is.

Fixes #34149

Change-Id: I8deaf87620b5dc12a40ab8a90bf27932868610da
Reviewed-on: https://go-review.googlesource.com/c/go/+/193040
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
(cherry picked from commit 9b30811)
Reviewed-on: https://go-review.googlesource.com/c/go/+/198487
Run-TryBot: Andrew Bonventre <andybons@golang.org>
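
For orientation, here is a minimal sketch of the pacing calculation the commit message above describes; the names heapInuse, heapGoal, and lastHeapGoal are stand-ins for the runtime's internal counters (heap_inuse, the current GC goal, and the previous cycle's goal), not the actual code, which lives in CL 193040.

```go
package scavenge

// retainedGoal estimates how much heap memory the runtime should aim to
// retain (i.e. not return to the OS). Basing it on heapInuse rather than
// the GC goal accounts for span fragmentation; scaling by the ratio of
// the current heap goal to the last heap goal tracks whether the heap is
// growing or shrinking, and by how much.
func retainedGoal(heapInuse, heapGoal, lastHeapGoal uint64) uint64 {
	goalRatio := float64(heapGoal) / float64(lastHeapGoal)
	return uint64(float64(heapInuse) * goalRatio)
}
```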

@gopherbot (Author) commented Oct 4, 2019

Closed by merging cd951ae to release-branch.go1.13.

@gopherbot closed this Oct 4, 2019
@katiehockman modified the milestones: Go1.13.2, Go1.13.3 Oct 17, 2019
@mm4tt mentioned this issue Oct 24, 2019