runtime: long latency in mark assist #40225
@choleraehyq thanks for the detailed report! I'll look through runtime changes that could be relevant, since nothing comes to mind off the top of my head. Judging by the … My other observation is that we're spending comparatively more CPU time in mark assists than in background marking (very roughly, 410 vs. 180 on average).
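Assist vs. background figures like these are typically read off the `ms cpu` field of `GODEBUG=gctrace=1` output. The runtime package documentation gives that field as `#+#/#/#+# ms cpu`: STW sweep termination, then concurrent mark CPU time split into assist/background/idle, then STW mark termination. The following is a minimal sketch that extracts the mark split from one trace line, assuming that format. `parseMarkCPU` and `markCPU` are hypothetical helpers, and the sample line is illustrative, not taken from this issue.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// markCPU holds the concurrent-mark CPU times (in ms) from one gctrace line.
// Hypothetical helper type for illustration.
type markCPU struct {
	Assist, Background, Idle float64
}

// parseMarkCPU finds the "#+#/#/#+# ms cpu" field of a GODEBUG=gctrace=1 line
// and splits its middle (concurrent mark) term into assist/background/idle.
func parseMarkCPU(line string) (markCPU, error) {
	for _, field := range strings.Split(line, ", ") {
		if !strings.HasSuffix(field, " ms cpu") {
			continue
		}
		// sweep termination + mark (assist/background/idle) + mark termination
		phases := strings.Split(strings.TrimSuffix(field, " ms cpu"), "+")
		if len(phases) != 3 {
			return markCPU{}, fmt.Errorf("unexpected cpu field %q", field)
		}
		parts := strings.Split(phases[1], "/")
		if len(parts) != 3 {
			return markCPU{}, fmt.Errorf("unexpected mark field %q", phases[1])
		}
		var vals [3]float64
		for i, p := range parts {
			v, err := strconv.ParseFloat(p, 64)
			if err != nil {
				return markCPU{}, err
			}
			vals[i] = v
		}
		return markCPU{Assist: vals[0], Background: vals[1], Idle: vals[2]}, nil
	}
	return markCPU{}, fmt.Errorf("no cpu field in %q", line)
}

func main() {
	// Illustrative line in the documented gctrace format; values are made up.
	line := "gc 1 @0.012s 2%: 0.026+0.39+0.10 ms clock, 0.21+0.88/0.52/0.0036+0.82 ms cpu, 4->4->0 MB, 5 MB goal, 8 P"
	m, err := parseMarkCPU(line)
	if err != nil {
		panic(err)
	}
	fmt.Printf("assist=%.2fms background=%.2fms idle=%.4fms\n", m.Assist, m.Background, m.Idle)
}
```

Matching on the `ms cpu` suffix rather than on a fixed field position keeps the sketch tolerant of extra trailing fields that newer Go versions append to each line.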
@mknyszek Thanks for your response. I'll produce
@mknyszek
1.14.4
Thanks for the quick turnaround! Ah, OK. I see what's going on re: early GCs. GCs appear to be relatively cheap for your application, so we're maxing out on how late we can start a GC. This happens in both 1.14 and 1.15, so on its own it shouldn't be the problem. Otherwise, at first glance, these two traces look very similar. I'll take a closer look.
Sorry for the delayed reply. Taking a closer look at both traces, I don't see any significant differences that would explain what you see in the execution trace. CPU time spent in assists and in background GC is similar in both, though far more time is spent in assists than in background GC. This suggests that your application is allocating fairly heavily, but there isn't much work for the GC to do (either there are just a few big allocations, the points-to graph is shallow, or most of the heap doesn't survive). Maybe one difference is that … @randall77 @aclements, any thoughts on this? Just to be totally clear, is the execution trace screenshot you included reproducible in Go 1.14, or is this issue totally new? The data in each GC trace suggests that it should be reproducible.
@mknyszek I took a look at the execution trace from Go 1.14, and it's indeed quite similar to Go 1.15. But my application's latency really is higher with Go 1.15, with the same application code and workload. I'll dig into it later.
At a high level, the runtime garbage collector can impact user goroutine latency in two ways. The first is that it pauses all goroutines during its stop-the-world sweep termination and mark termination phases. The second is that it applies backpressure to memory allocation by having user goroutines assist with scanning and marking in response to a high allocation rate.

There is plenty of observability into the first of these sources of user-visible latency, but significantly less into the second. As a result, it is often harder to diagnose latency problems caused by over-assist (e.g. golang#14812, golang#27732, golang#40225). Until now, the ways to determine that GC assist was a problem were execution tracing or GODEBUG=gctrace=1 tracing, neither of which is easy to access programmatically in a running system. CPU profiles also give some insight, but are rarely as instructive as one might expect, because heavy GC assist time is scattered across the profile. Even in https://tip.golang.org/doc/gc-guide, the guidance on recognizing and remedying performance problems due to GC assist is sparse.

This commit adds a counter called AssistTotalNs to the MemStats and GCStats structs, which tracks the cumulative nanoseconds spent in GC assist since the program started. This provides a new form of observability into GC assist delays, and one that can be consumed programmatically. There's more work to be done in this area, but this feels like a reasonable first step.
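The AssistTotalNs field described in that commit message belongs to the proposed change. As a point of comparison, released Go (1.20 and later) exposes a comparable cumulative counter through `runtime/metrics` under the name `/cpu/classes/gc/mark/assist:cpu-seconds`. The runtime documents this metric as an estimate that should be compared only with other `/cpu/classes` metrics. The following is a minimal sketch that reads it programmatically, assuming Go 1.20 or later. `gcAssistSeconds` is a hypothetical helper, and on older toolchains the sample reports `KindBad`.

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// assistMetric is the runtime/metrics name for estimated cumulative GC assist
// CPU time. Assumption: it is only available in Go 1.20 and later.
const assistMetric = "/cpu/classes/gc/mark/assist:cpu-seconds"

// sink keeps allocations reachable long enough to create GC work.
var sink []byte

// gcAssistSeconds (hypothetical helper) returns the estimated CPU seconds spent
// in GC mark assists since program start, and whether the metric is supported.
func gcAssistSeconds() (float64, bool) {
	s := []metrics.Sample{{Name: assistMetric}}
	metrics.Read(s)
	if s[0].Value.Kind() != metrics.KindFloat64 {
		return 0, false // unsupported metric name reports KindBad
	}
	return s[0].Value.Float64(), true
}

func main() {
	// Allocate heavily so the GC runs and may ask this goroutine to assist.
	for i := 0; i < 2000; i++ {
		sink = make([]byte, 64<<10)
	}
	if v, ok := gcAssistSeconds(); ok {
		fmt.Printf("estimated GC assist CPU time so far: %.6fs\n", v)
	} else {
		fmt.Println("assist metric not supported by this Go version")
	}
}
```

Sampling this counter periodically and diffing consecutive values gives an assist rate that can be exported alongside other service metrics, which is the kind of programmatic access the commit message argues for.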
Change https://go.dev/cl/431877 mentions this issue:
Mark assists should have gone down significantly in a variety of cases thanks to the pacer changes in Go 1.18 (specifically, the target mark assist percentage is now 0). If you have a chance @choleraehyq, please double-check! I'm inclined to mark this as resolved for now.
@mknyszek I don't see any latency problems now, thanks!
What version of Go are you using (`go version`)?
Does this issue reproduce with the latest release?
Yes, 1.15beta1 is the latest release.
What operating system and processor architecture are you using (`go env`)?
Linux amd64
What did you do?
I upgraded one production service instance from Go 1.14.4 to Go 1.15beta1. It's a fairly complicated proxy-like service; currently I can't reproduce the problem with a simple test case.
What did you expect to see?
The service runs as well as it did before.
What did you see instead?
The latency of this service instance increased sharply. I dumped an execution trace of the process and found that, after sweep termination, mark assists use nearly all CPUs for about 20 ms:
[Figure: MMU (minimum mutator utilization) plot]
[Part of the GC trace]