
runtime: long GC STW pauses (≥80ms) #19378

Closed
obeattie opened this issue Mar 3, 2017 · 9 comments

@obeattie commented Mar 3, 2017

What version of Go are you using (go version)?

go1.8 linux/amd64

What operating system and processor architecture are you using (go env)?

GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/go"
GORACE=""
GOROOT="/usr/local/go"
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0"
CXX="g++"
CGO_ENABLED="1"
PKG_CONFIG="pkg-config"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"

The host where the application runs does not have Go installed (just the compiled binary), so this output is actually from the host that built it.

What did you do?

Following from a discussion on golang-nuts, I was advised to create an issue.

In our applications we regularly see GC pauses of ≥80ms, even in extremely simple programs with light workloads. These long pauses are sporadic; most GCs take fractions of a millisecond, as I would expect, but in some runs the sweep termination phase (which I understand is stop-the-world) takes a very long time. Since these applications are network RPC servers, pauses like these are extremely problematic for tail latencies.

I have captured a runtime/trace here showing one such pause, along with the output of GODEBUG=gctrace=1 below for the same period. The long GC is at 451,008.477 ms in the trace and corresponds to gc 12 below.

I'd really appreciate any help investigating what could cause this.

GC forced
gc 9 @1111.485s 0%: 79+1.8+0.037 ms clock, 159+0.60/1.7/0.85+0.074 ms cpu, 3->3->1 MB, 4 MB goal, 2 P
scvg7: 0 MB released
scvg7: inuse: 4, idle: 1, sys: 6, released: 0, consumed: 5 (MB)
GC forced
gc 10 @1232.068s 0%: 0.032+98+0.30 ms clock, 0.065+0/97/1.9+0.60 ms cpu, 2->2->1 MB, 4 MB goal, 2 P
scvg8: 0 MB released
scvg8: inuse: 4, idle: 1, sys: 6, released: 1, consumed: 4 (MB)
GC forced
gc 11 @1352.178s 0%: 0.037+5.7+0.021 ms clock, 0.074+0/5.6/0+0.043 ms cpu, 2->2->1 MB, 4 MB goal, 2 P
GC forced
gc 12 @1472.566s 0%: 98+2.7+0.025 ms clock, 197+0/2.7/1.9+0.051 ms cpu, 3->3->1 MB, 4 MB goal, 2 P
scvg9: inuse: 4, idle: 1, sys: 6, released: 1, consumed: 4 (MB)
GC forced
gc 13 @1593.066s 0%: 0.029+105+0.030 ms clock, 0.058+0/102/6.0+0.061 ms cpu, 3->3->1 MB, 4 MB goal, 2 P
scvg10: 0 MB released
scvg10: inuse: 4, idle: 1, sys: 6, released: 0, consumed: 5 (MB)
GC forced
gc 14 @1713.512s 0%: 0.092+55+0.051 ms clock, 0.18+53/53/1.7+0.10 ms cpu, 3->3->1 MB, 4 MB goal, 2 P
scvg11: inuse: 4, idle: 1, sys: 6, released: 0, consumed: 5 (MB)
GC forced
gc 15 @1834.066s 0%: 0.020+2.1+0.019 ms clock, 0.041+0/2.0/1.4+0.039 ms cpu, 2->2->1 MB, 4 MB goal, 2 P

@ALTree ALTree changed the title Long GC STW pauses (≥80ms) runtime: long GC STW pauses (≥80ms) Mar 3, 2017

@obeattie (Author) commented Mar 3, 2017

Here is another runtime/trace that shows this a little more clearly – the first GC captured has fast STW phases, but the second apparently spends 83ms in sweep termination. Associated gctrace output:

GC forced
gc 224 @26972.566s 0%: 0.019+104+0.15 ms clock, 0.039+0.66/103/101+0.30 ms cpu, 3->3->1 MB, 4 MB goal, 2 P
scvg179: inuse: 4, idle: 1, sys: 6, released: 0, consumed: 5 (MB)
GC forced
gc 225 @27092.882s 0%: 83+1.5+0.024 ms clock, 167+0.91/82/1.8+0.048 ms cpu, 2->2->1 MB, 4 MB goal, 2 P
scvg180: inuse: 4, idle: 1, sys: 6, released: 0, consumed: 5 (MB)

@obeattie (Author) commented Mar 10, 2017

Is there anything I can do/any more information I can gather to help debug this?

I know this probably isn't very high priority, but this is having a fairly severe effect on our system, so I'd love to do anything I can to get to the bottom of this.

@bradfitz bradfitz added this to the Go1.9 milestone Mar 10, 2017

@bradfitz (Member) commented Mar 10, 2017

@obeattie, sorry, I just assigned it to @aclements. It's likely nobody on the runtime team saw it until now.

@aclements (Member) commented Mar 10, 2017

@obeattie, thanks for the execution traces. Unfortunately, I spent a while looking at them and wasn't able to glean much about the actual problem.

It's interesting that these are always periodic GCs and that your heap is under 4 MB. Looking through sweep termination, I could make guesses about what's going on, but they would all be shots in the dark. Background sweeping has long finished before we enter sweep termination, so I don't think the sweeping part of sweep termination is the problem.

Could you collect a pprof profile? Given how idle your program generally is, I suspect you would get some samples under runtime.forcegchelper that may give us a better idea of what's holding it up. You might want to crank up the profile sampling rate with runtime.SetCPUProfileRate. In the pprof tool, use something like "web forcegchelper" to focus on samples including forcegchelper.

@obeattie (Author) commented Mar 10, 2017

Thanks for getting back to me 😄 I'll get this data as soon as I can.

@aclements (Member) commented May 18, 2017

@obeattie, ping.

@obeattie (Author) commented May 19, 2017

So sorry for leaving this open for so long with no follow-up. 🤦‍♂️ Thank you for the ping!

We have been able to mitigate the impact of this issue somewhat but it is still happening. Our investigations were hampered by some infrastructure issues. We've resolved those issues in the last week though, so we should be able to look into this more deeply now.

My current thinking is that some behaviour of the Linux scheduler is involved. The fact that the GC times are so closely clustered around 100ms – which is the default cfs_period_us – is very suspicious. Getting a patch for #19497 would be super helpful to help confirm this. Is that a possibility? I'm happy to take a stab myself if need be, but I'd need some pointers.

@jeffallen (Contributor) commented May 30, 2017

Hello @obeattie,

I got interested in your issue and decided to try to see if I could prove or disprove your theory about CFS before digging into it from a Go point of view.

I wrote a little program that does network transactions (talking to itself via localhost) and dies as soon as it sees one transaction that takes more than 80ms. It is here: https://play.golang.org/p/7ZXSoCeb18

(Put it in a directory called "server" and do "go build" on it.)

Running "./server" does not trigger the long transactions, even with significant other load on the machine. (See the -spin argument for how I was generating load.)

But it is possible to trigger the 80 ms transactions by putting the server inside of a cgroup which is using cpu.cfs_quota_us. To see it in action:

# mkdir /sys/fs/cgroup/issue19378
# cd /sys/fs/cgroup/issue19378
# echo $$ > tasks
# echo 100000 > cpu.cfs_period_us
# echo -1 > cpu.cfs_quota_us
# ./server

In another window, do the following:

# echo 400000 > cpu.cfs_quota_us
# echo 200000 > cpu.cfs_quota_us
# echo 100000 > cpu.cfs_quota_us

As quota approaches period, the max latency goes up, eventually reaching 80ms and the log.Fatal triggers.

If quota is at 200% of period (i.e. quota = 200000, period = 100000), you can even run "./server -spin" next to "./server" and you'll end up with max latencies of 50ms. (I can't quite explain why that is; I expected the max latency to still be 100 ms in that case. Maybe I just got lucky.)

What's happening is that when the quota is exhausted, all processes in the cgroup are starved of time slices until the next period starts. If period is 100ms, you can eventually expect to get unlucky and have one transaction take up to 100ms.

The solution is to use a lower period, so that the worst case wait to finish any given transaction is lower. This config allows the spinner and the server to coexist with max latencies of 50 ms, at least on my machine:

# echo 10000 > cpu.cfs_period_us
# echo 10000 > cpu.cfs_quota_us

Are you running your server inside Docker, managed by Kubernetes?

If so, there's still some work to do to figure out how to get your cgroups to use lower periods. Here are some references:

The second URL seems to indicate that Kubernetes will not allow you to change the period. :(

@obeattie (Author) commented Jun 2, 2017

@jeffallen Thank you so much for contributing your time to investigate this. Your findings match mine, and indeed these problems are seen when running under Kubernetes with CFS quotas enforced. I will need to work out some way to change the period under k8s…

Based on these findings I'm now confident that this is not a Go bug and I'm going to close the ticket. I think that fixing #19497 would be very helpful to anyone facing similar issues; debugging behaviour like this is somewhat like trying to find a needle in a haystack at the moment.

Thank you to everyone who's helped with this ❤️
