
runtime: 10ms-26ms latency from GC in go1.14rc1, possibly due to 'GC (idle)' work #37116

Open
thepudds opened this issue Feb 7, 2020 · 8 comments

@thepudds thepudds commented Feb 7, 2020

What version of Go are you using (go version)?

$ go1.14rc1 version

go version go1.14rc1 linux/amd64

Does this issue reproduce with the latest release?

Yes, go1.14rc1.

What operating system and processor architecture are you using (go env)?

Ubuntu 18.04.2 LTS (stock GCE image). 4 vCPUs, 15 GB.

go env Output
$ go1.14rc1 env

GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/user/.cache/go-build"
GOENV="/home/user/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/user/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/user/sdk/go1.14rc1"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/user/sdk/go1.14rc1/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build656951356=/tmp/go-build -gno-record-gcc-switches"

Background

A recent blog post reported an issue seen in the now more-than-two-year-old Go 1.10 when an LRU held "tens of millions" of entries, with CPU spikes and response time spikes every two minutes, and a p95 response time reportedly hitting 50ms-300ms.

The blog explained the two-minute periodicity with the following, which seems plausible:

we had written the Go code very efficiently and had very few allocations. We were not creating a lot of garbage.
After digging through the Go source code, we learned that Go will force a garbage collection run every 2 minutes at minimum. In other words, if garbage collection has not run for 2 minutes, regardless of heap growth, go will still force a garbage collection.

A separate recent post from one of the engineers theorized the problem was due to a very long linked list, though I am less sure that this comment precisely describes the actual root cause:

we used a large free-list like structure, a very long linked list. The head of the list is maintained as a variable, which means that Go's mark phase must start scanning from the head and then pointer chase its way through the list. For whatever reason, Go does (did?) this section in a single-threaded manner with a global lock held. As a result, everything must wait until this extremely long pointer chase occurs.

An engineer from the same team may have also posted this roughly 2.5 years ago in #14812 (comment):

I've also got a service that's encountering long request latencies due to CPU starvation when the GC runs. Like a couple others have mentioned, I think the GC may act poorly on long linked lists - I have a 1M-long list/map used as an LRU cache.
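The structure described in these quotes — a large map whose entries are also chained on a linked list so the oldest can be expired — can be sketched minimally as follows. Names and details here are my own illustration, not the actual code from the blog's service:

```go
package main

import (
	"container/list"
	"fmt"
)

type entry struct {
	key string
	val int
}

// LRU pairs a map (for O(1) lookup) with a doubly linked list ordered by
// recency (front = most recently used), so eviction pops from the back.
type LRU struct {
	max   int
	items map[string]*list.Element
	order *list.List
}

func NewLRU(max int) *LRU {
	return &LRU{max: max, items: make(map[string]*list.Element), order: list.New()}
}

func (c *LRU) Get(key string) (int, bool) {
	el, ok := c.items[key]
	if !ok {
		return 0, false
	}
	c.order.MoveToFront(el) // mark as most recently used
	return el.Value.(*entry).val, true
}

func (c *LRU) Put(key string, val int) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).val = val
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&entry{key, val})
	if c.order.Len() > c.max {
		oldest := c.order.Back() // expire the least recently used entry
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

func main() {
	c := NewLRU(2)
	c.Put("a", 1)
	c.Put("b", 2)
	c.Put("c", 3) // evicts "a"
	_, ok := c.Get("a")
	fmt.Println(ok) // false
}
```

With tens of millions of entries, the list portion of this structure is exactly the long pointer chain the quotes describe: the GC can only discover each node by following the pointer from the previous one.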

What did you do?

Wrote a simple GC benchmark intended to emulate an LRU with a separate free list.

It's not 100% clear what their exact data structures were: it may have been just an LRU whose elements were linked together to help expire older items, or there may have been an additional free list separate from the LRU, or something else entirely.

Last night, without too much to go on, I put together a simple GC benchmark of an LRU cache along with a separate free list.

The benchmark is here:

https://play.golang.org/p/-Vo1hqIAoNz

Here is a sample run with 50M items in an LRU cache (a map whose items are also linked into a list), 1M items on a separate free list, and a background allocation rate of about 10k/sec:

$ go1.14rc1 build 
$ GODEBUG=gctrace=1 ./lru-gc-benchmark -free-list-items=1000000 -lru-items=50000000 -duration=20m -bg-alloc-sleep=10ms

What did you expect to see?

Ideally, sub-ms worst-case latency impact of GC.

What did you see instead?

This was fairly quick and dirty, and might not be related to the problem reported in the blog at all, but it did occasionally observe higher latency values.

It is important to note that the issue here does not seem to be anywhere near the scale reported in the blog:

  • The amplitude of the latency impact here (10ms-26ms) seems substantially smaller than what was reported on the blog (50ms-300ms).
  • The numbers here are worst-case latencies, whereas the blog reported p95 latencies.

Here is an example where a single 26ms worst-case latency was observed when looking up an element in the LRU during a forced GC.

$ go1.14rc1 build  && GODEBUG=gctrace=1 ./lru-gc-benchmark -free-list-items=1000000 -lru-items=50000000 -duration=20m -bg-alloc-sleep=10ms 

gc 1 @0.011s 5%: 0.066+4.1+0.023 ms clock, 0.26+0.30/3.0/6.9+0.095 ms cpu, 4->4->3 MB, 5 MB goal, 4 P
gc 2 @0.022s 10%: 0.008+9.0+0.021 ms clock, 0.033+0.33/8.7/10+0.087 ms cpu, 7->7->6 MB, 8 MB goal, 4 P
gc 3 @0.044s 10%: 0.012+13+0.020 ms clock, 0.049+0/12/23+0.083 ms cpu, 13->13->12 MB, 14 MB goal, 4 P
gc 4 @0.087s 10%: 0.009+26+0.030 ms clock, 0.038+0/24/46+0.12 ms cpu, 27->27->25 MB, 28 MB goal, 4 P
gc 5 @0.177s 10%: 0.010+49+0.023 ms clock, 0.042+0/47/89+0.095 ms cpu, 53->54->50 MB, 54 MB goal, 4 P
gc 6 @0.369s 10%: 0.015+98+0.022 ms clock, 0.060+0/98/179+0.089 ms cpu, 107->108->100 MB, 108 MB goal, 4 P
gc 7 @0.767s 10%: 0.012+195+0.021 ms clock, 0.049+0/194/380+0.086 ms cpu, 213->216->200 MB, 214 MB goal, 4 P
gc 8 @1.586s 11%: 7.4+377+0.028 ms clock, 29+71/376/742+0.11 ms cpu, 426->430->398 MB, 427 MB goal, 4 P
gc 9 @3.343s 11%: 0.024+787+0.023 ms clock, 0.099+236/785/1501+0.094 ms cpu, 853->859->795 MB, 854 MB goal, 4 P
gc 10 @7.071s 11%: 0.023+1806+0.023 ms clock, 0.092+497/1803/3513+0.095 ms cpu, 1706->1719->1591 MB, 1707 MB goal, 4 P
gc 11 @14.905s 12%: 49+3796+0.023 ms clock, 196+1084/3796/7271+0.093 ms cpu, 3411->3439->3182 MB, 3412 MB goal, 4 P
gc 12 @31.292s 14%: 226+8162+0.043 ms clock, 904+5028/8161/14256+0.17 ms cpu, 6823->6866->6353 MB, 6824 MB goal, 4 P
LRU count:       50.00M
Free list count: 1.00M
BG allocation:   10000 / sec (approx.)
BG retain count: 10000
Printing slowest lookup times:
i=0          bgAllocs=540000   at=64.026s  lookup=0.009ms
i=4096       bgAllocs=540100   at=64.028s  lookup=0.016ms
i=8944       bgAllocs=540100   at=64.030s  lookup=0.118ms
i=9410       bgAllocs=540100   at=64.031s  lookup=0.123ms
i=52120      bgAllocs=540300   at=64.053s  lookup=0.141ms
i=54295      bgAllocs=540300   at=64.054s  lookup=0.148ms
i=81167      bgAllocs=540400   at=64.068s  lookup=0.185ms
i=141525     bgAllocs=540700   at=64.099s  lookup=0.193ms
i=473297     bgAllocs=542400   at=64.271s  lookup=0.205ms
i=820934     bgAllocs=544200   at=64.451s  lookup=0.233ms
i=1045129    bgAllocs=545300   at=64.572s  lookup=0.323ms
i=49675599   bgAllocs=810200   at=91.600s  lookup=0.651ms
i=59934276   bgAllocs=863000   at=97.000s  lookup=0.661ms
i=63432689   bgAllocs=881700   at=98.900s  lookup=0.696ms
GC forced
i=173370432  bgAllocs=1476800   at=159.688s  lookup=3.407ms
i=173381145  bgAllocs=1476800   at=159.727s  lookup=26.235ms
gc 13 @159.687s 5%: 0.096+17356+0.020 ms clock, 0.38+0.65/17348/34668+0.081 ms cpu, 9642->9659->8463 MB, 12706 MB goal, 4 P
GC forced
gc 14 @297.047s 4%: 0.12+17250+0.021 ms clock, 0.49+30/17250/34376+0.086 ms cpu, 8696->8713->8463 MB, 16926 MB goal, 4 P
GC forced
gc 15 @434.306s 4%: 0.10+16952+0.024 ms clock, 0.41+0.20/16949/33820+0.096 ms cpu, 8696->8712->8463 MB, 16927 MB goal, 4 P

I would guess that the 26ms lookup latency is likely related to this 26ms 'GC (idle)' period, during which the main goroutine did not seem to be on a P:

[execution trace image]

Another sample run:


$ go1.14rc1 build  && GODEBUG=gctrace=1 ./lru-gc-benchmark -free-list-items=1000000 -lru-items=50000000 -duration=20m -bg-alloc-sleep=10ms 

gc 1 @0.011s 6%: 0.014+4.3+0.025 ms clock, 0.056+0.023/3.7/6.6+0.10 ms cpu, 4->4->3 MB, 5 MB goal, 4 P
gc 2 @0.021s 10%: 0.009+8.4+0.026 ms clock, 0.038+0.34/7.8/11+0.10 ms cpu, 7->7->6 MB, 8 MB goal, 4 P
gc 3 @0.043s 10%: 0.009+12+0.023 ms clock, 0.036+0/11/23+0.092 ms cpu, 13->13->12 MB, 14 MB goal, 4 P
gc 4 @0.084s 11%: 0.010+24+0.023 ms clock, 0.041+0/24/42+0.093 ms cpu, 27->27->25 MB, 28 MB goal, 4 P
gc 5 @0.171s 10%: 0.014+48+0.024 ms clock, 0.058+0/47/93+0.097 ms cpu, 53->54->50 MB, 54 MB goal, 4 P
gc 6 @0.364s 10%: 0.017+97+0.022 ms clock, 0.070+0/96/177+0.088 ms cpu, 107->108->100 MB, 108 MB goal, 4 P
gc 7 @0.758s 10%: 5.6+203+0.022 ms clock, 22+0.49/201/386+0.091 ms cpu, 213->216->200 MB, 214 MB goal, 4 P
gc 8 @1.605s 10%: 0.019+411+0.023 ms clock, 0.078+0/409/796+0.095 ms cpu, 426->432->399 MB, 427 MB goal, 4 P
gc 9 @3.312s 11%: 28+790+0.031 ms clock, 114+194/787/1520+0.12 ms cpu, 853->860->795 MB, 854 MB goal, 4 P
gc 10 @6.964s 12%: 46+1586+0.033 ms clock, 185+730/1585/2988+0.13 ms cpu, 1706->1716->1587 MB, 1707 MB goal, 4 P
gc 11 @14.638s 13%: 79+3760+0.022 ms clock, 316+1164/3758/7168+0.088 ms cpu, 3411->3438->3182 MB, 3412 MB goal, 4 P
gc 12 @31.187s 18%: 196+6723+0.007 ms clock, 785+11516/6720/7379+0.028 ms cpu, 6823->6824->6311 MB, 6824 MB goal, 4 P
LRU count:       50.00M
Free list count: 1.00M
BG allocation:   10000 / sec (approx.)
BG retain count: 10000
Printing slowest lookup times:
i=0          bgAllocs=530900   at=63.844s  lookup=0.006ms
i=100        bgAllocs=530900   at=63.844s  lookup=0.006ms
i=671        bgAllocs=530900   at=63.844s  lookup=0.028ms
i=2407       bgAllocs=530900   at=63.845s  lookup=0.030ms
i=4258       bgAllocs=530900   at=63.847s  lookup=0.120ms
i=80012      bgAllocs=531300   at=63.886s  lookup=0.634ms
i=134006913  bgAllocs=1258500   at=138.086s  lookup=0.679ms
i=145128912  bgAllocs=1321200   at=144.486s  lookup=0.703ms
GC forced
i=170136450  bgAllocs=1454900   at=158.133s  lookup=16.008ms
gc 13 @158.108s 6%: 0.11+17483+0.022 ms clock, 0.44+0.23/17480/34884+0.089 ms cpu, 9639->9656->8463 MB, 12622 MB goal, 4 P
GC forced
gc 14 @295.596s 5%: 0.13+16977+0.024 ms clock, 0.52+28/16977/33847+0.099 ms cpu, 8695->8712->8463 MB, 16927 MB goal, 4 P
GC forced
i=632749868  bgAllocs=3978800   at=432.596s  lookup=11.794ms
gc 15 @432.576s 4%: 0.073+17356+0.021 ms clock, 0.29+0.23/17355/34654+0.084 ms cpu, 8697->8714->8463 MB, 16927 MB goal, 4 P
GC forced
gc 16 @569.938s 4%: 0.17+16911+0.021 ms clock, 0.71+4.9/16908/33760+0.086 ms cpu, 8696->8713->8463 MB, 16926 MB goal, 4 P
GC forced
gc 17 @706.859s 3%: 0.19+17107+0.023 ms clock, 0.76+0.41/17106/34143+0.093 ms cpu, 8697->8714->8463 MB, 16927 MB goal, 4 P
GC forced
i=1327474340 bgAllocs=7763400   at=843.987s  lookup=8.791ms
gc 18 @843.969s 3%: 0.28+17297+0.022 ms clock, 1.1+0.82/17295/34532+0.088 ms cpu, 8696->8713->8463 MB, 16927 MB goal, 4 P
i=1344317800 bgAllocs=7849300   at=861.294s  lookup=13.015ms
...
^C

I suspect this trace section corresponds to the at=432.596s lookup=11.794ms observed latency:

[execution trace image]

In both cases, the traces started roughly 100 seconds after the start of the process, so the timestamps in the log messages and in the images are offset by roughly 100 seconds.

Related issues

This issue might very well be a duplicate. For example, there is a recent list of 5 known issues in #14812 (comment). That said, if this issue is explained by one of the 5 items there, it is not immediately obvious to me which one; some of the issues in that list are resolved now, yet this still happens in go1.14rc1. Also possibly related to #14179, #18155, or others.

@cagedmantis cagedmantis modified the milestones: Backlog, Go1.14 Feb 7, 2020
@cagedmantis cagedmantis commented Feb 7, 2020

This comment has been minimized.

@heschik heschik commented Feb 9, 2020

I think this is working as expected. At the heart of github.com/golang/groupcache/lru is a single large map. AFAIK the map has to be marked all in one go, which will take a long time. In the execution trace above you can see a MARK ASSIST segment below G21. If you click on that I think you'll find that it spent all of its time marking a single object, presumably the map. Austin and others will know better, of course.

Since a real server would need to lock the map, sharding it would be a good idea, which would also reduce the latency impact of a mark.
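The sharding suggested here can be sketched as follows: split one huge map into N independently locked maps keyed by a hash of the key, so each map object is smaller and lock contention drops. This is my own illustrative sketch, not code from the issue:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

const numShards = 16

type shard struct {
	mu sync.Mutex
	m  map[string]int
}

// ShardedMap spreads keys over numShards smaller maps. Each shard is a
// smaller object for the GC to mark, and each has its own lock.
type ShardedMap struct {
	shards [numShards]*shard
}

func NewShardedMap() *ShardedMap {
	sm := &ShardedMap{}
	for i := range sm.shards {
		sm.shards[i] = &shard{m: make(map[string]int)}
	}
	return sm
}

func (sm *ShardedMap) shardFor(key string) *shard {
	h := fnv.New32a()
	h.Write([]byte(key))
	return sm.shards[h.Sum32()%numShards]
}

func (sm *ShardedMap) Put(key string, val int) {
	s := sm.shardFor(key)
	s.mu.Lock()
	s.m[key] = val
	s.mu.Unlock()
}

func (sm *ShardedMap) Get(key string) (int, bool) {
	s := sm.shardFor(key)
	s.mu.Lock()
	v, ok := s.m[key]
	s.mu.Unlock()
	return v, ok
}

func main() {
	sm := NewShardedMap()
	sm.Put("x", 42)
	v, ok := sm.Get("x")
	fmt.Println(v, ok) // 42 true
}
```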

(cc @rsc, since I think @cagedmantis made a typo.)

@RLH RLH commented Feb 9, 2020

This comment has been minimized.

@randall77 randall77 commented Feb 9, 2020

This comment has been minimized.

@cagedmantis cagedmantis commented Feb 10, 2020

Thanks @heschik

@aclements aclements commented Feb 10, 2020

Thanks for trying to put something together to recreate this behavior, especially from the little scraps of information we have. :)

Based on the execution traces, I think you're probably right that the idle GC worker is the root of the problem here. The idle worker currently runs until a scheduler preemption, which happens only every 10 to 20ms. The trace shows a fifth goroutine that appears to enter the run queue a moment after the idle worker kicks in, which is presumably the user goroutine.

I agree with @randall77 that this is not likely to be related to the large map, since we do break up large objects when marking.

In general, long linked lists are not very GC-friendly, since the GC has no choice but to walk the list sequentially. That will inherently serialize that part of marking (assuming it's the only work left), though I don't understand the quoted comment about this happening "with a global lock". I don't think there's been anything like a global lock here since we switched to a concurrent GC in Go 1.5.

@aclements aclements modified the milestones: Go1.14, Go1.15 Feb 10, 2020
@thepudds thepudds commented Feb 11, 2020

The Discord blog post described above had said they saw GC-related latency problems with Go 1.9.2 (and also later said elsewhere they had tested but seemingly not deployed Go 1.10), so part of the intent here was to see if the pauses they reported could be reproduced in Go 1.9.2.

Here are some results from running that benchmark above in a loop overnight, with ten 15-minute runs first on 1.9.2 and then on 1.14rc1.

Comparing 1.9.2 to 1.14rc1

  • 1.9.2 was seeing occasional delays in the range 32->64ms, as well as above 64ms. None of those were present in 1.14rc1.
  • The frequency of delays in the range 4->16ms reduced by a factor of roughly 40x with 1.14rc1.
  • The frequency of delays in the range 1->4ms reduced by a factor of roughly 100x with 1.14rc1.
  • This was a latency benchmark and not really focused on throughput, but 1.14rc1 did about 30% more work over the same time interval.
  • Delays in the range 16->32ms were rare in both cases (1 or 0 out of a billion lookups or so), but persist in Go 1.14rc1, which is partly why this issue was opened.

Each line here in these two tables shows the counts of delays in different ranges for one 15-minute run:

Go 1.9.2 counts of delays

 0->1ms: 1034773027  1->4ms: 1640  4->16ms: 188  16->32ms: 0  32->64ms: 1  64ms->∞: 2
 0->1ms: 1020927457  1->4ms: 1232  4->16ms: 144  16->32ms: 1  32->64ms: 2  64ms->∞: 0
 0->1ms: 1027859639  1->4ms: 1724  4->16ms: 189  16->32ms: 0  32->64ms: 1  64ms->∞: 2
 0->1ms: 1023974382  1->4ms: 981   4->16ms: 109  16->32ms: 0  32->64ms: 0  64ms->∞: 0
 0->1ms: 1024773755  1->4ms: 1767  4->16ms: 186  16->32ms: 1  32->64ms: 0  64ms->∞: 1
 0->1ms: 1028694483  1->4ms: 1755  4->16ms: 187  16->32ms: 1  32->64ms: 0  64ms->∞: 2
 0->1ms: 1028308359  1->4ms: 2226  4->16ms: 236  16->32ms: 1  32->64ms: 0  64ms->∞: 1
 0->1ms: 1034795065  1->4ms: 1273  4->16ms: 143  16->32ms: 0  32->64ms: 0  64ms->∞: 0
 0->1ms: 1033581601  1->4ms: 1643  4->16ms: 172  16->32ms: 0  32->64ms: 0  64ms->∞: 0
 0->1ms: 1027894298  1->4ms: 1412  4->16ms: 152  16->32ms: 1  32->64ms: 0  64ms->∞: 0

Go 1.14rc1 counts of delays

 0->1ms: 1359388697  1->4ms: 6     4->16ms: 6    16->32ms: 1  32->64ms: 0  64ms->∞: 0
 0->1ms: 1355108389  1->4ms: 9     4->16ms: 5    16->32ms: 0  32->64ms: 0  64ms->∞: 0
 0->1ms: 1344033293  1->4ms: 13    4->16ms: 3    16->32ms: 0  32->64ms: 0  64ms->∞: 0
 0->1ms: 1333374534  1->4ms: 21    4->16ms: 5    16->32ms: 1  32->64ms: 0  64ms->∞: 0
 0->1ms: 1340122378  1->4ms: 19    4->16ms: 4    16->32ms: 0  32->64ms: 0  64ms->∞: 0
 0->1ms: 1353391580  1->4ms: 14    4->16ms: 6    16->32ms: 1  32->64ms: 0  64ms->∞: 0
 0->1ms: 1354059777  1->4ms: 9     4->16ms: 5    16->32ms: 0  32->64ms: 0  64ms->∞: 0
 0->1ms: 1366321263  1->4ms: 14    4->16ms: 3    16->32ms: 0  32->64ms: 0  64ms->∞: 0
 0->1ms: 1350138536  1->4ms: 16    4->16ms: 6    16->32ms: 0  32->64ms: 0  64ms->∞: 0
 0->1ms: 1366766376  1->4ms: 15    4->16ms: 1    16->32ms: 0  32->64ms: 0  64ms->∞: 0

I'm including the raw results here so that anyone running this benchmark in the future can see some of the current run-to-run variability.

@thepudds thepudds commented Feb 11, 2020

Also, the 5 worst lookups seen in 1.9.2 across those ten runs:

i=509412400  bgAllocs=4601400   at=486.606s  lookup=260.360ms
i=863625700  bgAllocs=7347600   at=775.641s  lookup=171.862ms
i=4738900    bgAllocs=760700    at=89.159s   lookup=137.284ms
i=163862600  bgAllocs=2008200   at=216.857s  lookup=117.707ms
i=429438200  bgAllocs=4006925   at=423.171s  lookup=117.672ms

As I noted above, this is a simplified benchmark: the lookups are all done sequentially, so a GC-induced delay of >100ms might be recorded in only one measurement here, whereas in a real system with parallel inbound requests it might impact many requests at once and hence show up in multiple response time measurements.
