
runtime: clearpools causes excessive STW1 time #22331

Closed
bryanpkc opened this Issue Oct 18, 2017 · 5 comments

bryanpkc commented Oct 18, 2017

Our application uses a large number of goroutines, mutexes, and defers. Using Go 1.8.3 on linux/amd64, we observed that the STW1 time is dominated by the runtime.clearpools function in mgc.go. Here is an excerpt from the GC trace:

gc 12 @27.641s 4%: 16+865+0.82 ms clock, 522+6169/9551/1514+26 ms cpu, 8375->8694->5362 MB, 8943 MB goal, 48 P

By instrumenting the runtime, we determined that, of the 16 ms STW1 time, 9.3 ms was spent in the loop that disconnects the sched.sudogcache linked list, and 6.7 ms in the loop that disconnects the linked lists in sched.deferpool. These O(n) loops end up causing a very long pause.

I understand why the loops were originally introduced in #9110. However, looking at tip around the call sites of releaseSudog and freedefer, I couldn't find a case where a released object would still be referenced by a live pointer elsewhere in the system. Are these loops really still necessary?

FWIW, I replaced the loops with a simple zeroing of the heads of the linked lists, and test/fixedbugs/issue9110.go still passed.

I propose making these loops optional to avoid the excessive STW1 time, and enabling them only with a GODEBUG option for debugging purposes.
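For illustration, the two approaches look roughly like this. This is a minimal sketch with a stand-in sudog type, not the runtime's actual code:

```go
package main

import "fmt"

// sudog is a stand-in for the runtime's sudog struct; only the link
// field matters for this sketch (hypothetical, not the real definition).
type sudog struct {
	next *sudog
}

// disconnectAll mimics the O(n) loop in runtime.clearpools: it walks
// the cached list and severs every link, so a freed object can never
// reach a live one through a stale pointer. This walk runs during STW1
// and scales with the number of pooled objects.
func disconnectAll(head *sudog) {
	for sg := head; sg != nil; {
		next := sg.next
		sg.next = nil
		sg = next
	}
}

// dropHead is the proposed O(1) alternative: forget the list head and
// let the garbage collector reclaim the whole chain at once.
func dropHead(head **sudog) {
	*head = nil
}

func main() {
	cache := &sudog{next: &sudog{next: &sudog{}}}
	disconnectAll(cache)
	fmt.Println("links severed:", cache.next == nil)
	cache = &sudog{next: &sudog{}}
	dropHead(&cache)
	fmt.Println("head cleared:", cache == nil)
}
```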


@bradfitz bradfitz changed the title from "runtime.clearpools causes excessive STW1 time" to "runtime: clearpools causes excessive STW1 time" Oct 18, 2017

@bradfitz bradfitz added this to the Go1.10 milestone Oct 18, 2017

@rsc rsc modified the milestones: Go1.10, Go1.11 Nov 22, 2017

@ianlancetaylor ianlancetaylor modified the milestones: Go1.11, Go1.12 Jul 9, 2018

@aclements aclements modified the milestones: Go1.12, Go1.13 Jan 8, 2019

gopherbot commented Mar 11, 2019

Change https://golang.org/cl/166957 mentions this issue: sync: internal fixed size lock-free queue for sync.Pool

gopherbot commented Mar 11, 2019

Change https://golang.org/cl/166958 mentions this issue: sync: internal dynamically sized lock-free queue for sync.Pool

gopherbot commented Mar 11, 2019

Change https://golang.org/cl/166959 mentions this issue: sync: add Pool benchmarks to stress STW and reuse

gopherbot commented Mar 11, 2019

Change https://golang.org/cl/166960 mentions this issue: sync: use lock-free structure for Pool stealing

gopherbot pushed a commit that referenced this issue Apr 5, 2019

sync: internal fixed size lock-free queue for sync.Pool
This is the first step toward fixing multiple issues with sync.Pool.
This adds a fixed size, lock-free, single-producer, multi-consumer
queue that will be used in the new Pool stealing implementation.

For #22950, #22331.

Change-Id: I50e85e3cb83a2ee71f611ada88e7f55996504bb5
Reviewed-on: https://go-review.googlesource.com/c/go/+/166957
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
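A simplified sketch of a fixed-size, single-producer, multi-consumer lock-free queue in the spirit of the CL above. The real runtime version packs head and tail into a single word and nils out slots on pop; this sketch trades those tricks for readability, and all names here are illustrative, not the runtime's:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// spmcQueue is a fixed-size ring buffer: one producer pushes at head,
// many consumers claim elements at tail via compare-and-swap.
type spmcQueue struct {
	head  uint64         // next slot to fill; written only by the producer
	tail  uint64         // next slot to drain; advanced by consumers via CAS
	slots []atomic.Value // ring buffer of queued values
}

func newSPMCQueue(size int) *spmcQueue {
	return &spmcQueue{slots: make([]atomic.Value, size)}
}

// push appends v. Only the single producer goroutine may call it.
func (q *spmcQueue) push(v interface{}) bool {
	h := atomic.LoadUint64(&q.head)
	t := atomic.LoadUint64(&q.tail)
	if h-t == uint64(len(q.slots)) {
		return false // queue is full
	}
	q.slots[h%uint64(len(q.slots))].Store(v)
	atomic.StoreUint64(&q.head, h+1) // publish the new element
	return true
}

// pop removes the oldest element; safe to call from many consumers.
func (q *spmcQueue) pop() (interface{}, bool) {
	for {
		t := atomic.LoadUint64(&q.tail)
		h := atomic.LoadUint64(&q.head)
		if t == h {
			return nil, false // queue is empty
		}
		// Read the value before claiming the slot; if the CAS below
		// fails, another consumer won the race and we retry.
		v := q.slots[t%uint64(len(q.slots))].Load()
		if atomic.CompareAndSwapUint64(&q.tail, t, t+1) {
			return v, true
		}
	}
}

func main() {
	q := newSPMCQueue(4)
	for i := 1; i <= 4; i++ {
		q.push(i)
	}
	fmt.Println(q.push(5)) // false: the queue is full
	v, _ := q.pop()
	fmt.Println(v) // 1: FIFO order
}
```

Because consumers never block the producer and only race each other on a single CAS, clearing such a queue at GC time does not need an O(n) walk under STW.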

gopherbot pushed a commit that referenced this issue Apr 5, 2019

sync: internal dynamically sized lock-free queue for sync.Pool
This adds a dynamically sized, lock-free, single-producer,
multi-consumer queue that will be used in the new Pool stealing
implementation. It's built on top of the fixed-size queue added in the
previous CL.

For #22950, #22331.

Change-Id: Ifc0ca3895bec7e7f9289ba9fb7dd0332bf96ba5a
Reviewed-on: https://go-review.googlesource.com/c/go/+/166958
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
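The growth strategy described above — a chain of fixed-size segments, each twice the size of the last — can be sketched as follows. For brevity this sketch is single-threaded; the real queue layers the lock-free protocol from the previous CL on top, and all names here are illustrative:

```go
package main

import "fmt"

// ring is one fixed-capacity segment of the chain; buf fills by append
// up to cap(buf), and i tracks the next element to pop.
type ring struct {
	buf  []interface{}
	i    int
	next *ring
}

// chainQueue links fixed-size segments into a dynamically sized queue.
type chainQueue struct {
	head *ring // segment the producer appends to
	tail *ring // oldest segment, drained first
}

func (c *chainQueue) push(v interface{}) {
	if c.head == nil {
		r := &ring{buf: make([]interface{}, 0, 2)} // real code starts larger
		c.head, c.tail = r, r
	}
	if len(c.head.buf) == cap(c.head.buf) {
		// Current segment is full: link a new one with double the capacity.
		r := &ring{buf: make([]interface{}, 0, 2*cap(c.head.buf))}
		c.head.next = r
		c.head = r
	}
	c.head.buf = append(c.head.buf, v)
}

func (c *chainQueue) pop() (interface{}, bool) {
	for t := c.tail; t != nil; t = t.next {
		if t.i < len(t.buf) {
			v := t.buf[t.i]
			t.i++
			return v, true
		}
		if t.next == nil {
			return nil, false
		}
		c.tail = t.next // drop the drained segment
	}
	return nil, false
}

func main() {
	var q chainQueue
	for i := 1; i <= 5; i++ {
		q.push(i)
	}
	for {
		v, ok := q.pop()
		if !ok {
			break
		}
		fmt.Println(v)
	}
}
```

Doubling segment sizes keeps the number of segments logarithmic in the queue length while letting drained segments be garbage-collected individually.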

gopherbot pushed a commit that referenced this issue Apr 5, 2019

sync: add Pool benchmarks to stress STW and reuse
This adds two benchmarks that will highlight two problems in Pool that
we're about to address.

The first benchmark measures the impact of large Pools on GC STW time.
Currently, STW time is O(# of items in Pools), and this benchmark
demonstrates 70µs STW times.

The second benchmark measures the impact of fully clearing all Pools
on each GC. Typically this is a problem in heavily-loaded systems
because it causes a spike in allocation. This benchmark stresses this
by simulating an expensive "New" function, so the cost of creating new
objects is reflected in the ns/op of the benchmark.

For #22950, #22331.

Change-Id: I0c8853190d23144026fa11837b6bf42adc461722
Reviewed-on: https://go-review.googlesource.com/c/go/+/166959
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
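The allocation spike the second benchmark stresses can be seen directly: every item cleared out of a sync.Pool at GC must be rebuilt by its New function. A small sketch (not one of the actual benchmarks; since Go 1.13 a victim cache lets pooled items survive one collection, so two are forced here):

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// newCallsAcrossGC counts how often the Pool's New function runs when
// we warm the pool with one item, force the given number of garbage
// collections, and then Get again. Clearing pools at GC means every
// cleared item shows up later as a fresh allocation through New.
func newCallsAcrossGC(gcs int) int {
	calls := 0
	p := &sync.Pool{New: func() interface{} {
		calls++
		return new(int)
	}}
	p.Put(p.Get()) // first Get misses, so New runs once; Put warms the pool
	for i := 0; i < gcs; i++ {
		runtime.GC()
	}
	p.Get() // after enough GCs the pool is empty and New runs again
	return calls
}

func main() {
	fmt.Println(newCallsAcrossGC(2))
}
```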

@gopherbot gopherbot closed this in d5fd2dd Apr 5, 2019
