Skip to content

runtime: latency in sweep assists when there's no garbage #18155

@rhysh

Description

@rhysh

go version devel +ae9712c766 Thu Dec 1 22:20:31 2016 +0000 darwin/amd64, and linux/amd64

I wrote a test case for the GC latency described by @WillSewell in https://blog.pusher.com/golangs-real-time-gc-in-theory-and-practice/ . I recorded the timestamps of when pauses (during allocation) were observed, and matched them against GODEBUG=gctrace=1 output.

The pauses come after the GC cycle, during the concurrent sweep phase. Also, they happen as the heap grows and do not happen once the heap size is stable. And the pauses are often longer than the clock time of any single gctrace line.

A goroutine that needs to allocate memory right after a GC cycle is required to help with the sweep work. This means looking through the heap until suitable memory is found. If the heap is growing rapidly with little or no garbage created, this mutator goroutine will need to search for a long time: it can end up sweeping half of the heap (shared with the background sweep worker).

The GC is pretty good at controlling the mark assist burden on any single goroutine—and starting the GC cycle before the allowable heap growth is exhausted, allowing tradeoffs between assist burden and heap growth during the cycle. These protections don't seem to be in place for sweep assists.

The execution tracer's view of this is confusing since it emits events for each call to runtime.(*mspan).sweep, which is called in a loop from runtime.(*mcentral).cacheSpan. This makes it look like the user goroutine is making a lot of progress on sweep work, since a single call to runtime.mallocgc can lead to thousands of tiny "SWEEP" blocks.

Frustrated assists like this are perhaps more likely to happen in microbenchmarks than in production applications, but the way sweep work is shown in the execution tracer makes it hard to say for sure. This could be a contributing factor to #14812 — one of the goroutines in the recent execution trace spent over 17ms in sweep assists.

The test is below, followed by some example runs.

package gc_test

import (
	"flag"
	"testing"
	"time"
)

var (
	start       = time.Now()
	allocSize   = flag.Int("alloc-size", 1000, "Bytes per allocation")
	retainCount = flag.Int("retain-count", 1000000, "Number of allocations to retain")
	loopCount   = flag.Int("loop-count", 10, "Number of times to execute loop")
	preallocate = flag.Bool("preallocate", false, "Preallocate retention slice")
)

func TestHeapGrowthStalls(t *testing.T) {
	var mem [][]byte
	if *preallocate {
		mem = make([][]byte, *retainCount)
	}

	var slowDelta time.Duration

	for i := 0; i < (*loopCount)*(*retainCount); i++ {
		// Allow the program's heap to start small, instead of pre-allocating
		// the entire slice.
		if len(mem) < *retainCount {
			mem = append(mem, nil)
		}

		j := i % len(mem)
		before := time.Now()
		mem[j] = make([]byte, *allocSize)
		delta := time.Since(before)
		if delta > slowDelta || (delta > time.Millisecond && delta > slowDelta/2) {
			slowDelta = delta
			at := time.Since(start)
			t.Logf("i=%-8d at=%.3fs   delta=%.3fms",
				i,
				at.Seconds(),
				delta.Seconds()*float64(time.Second/time.Millisecond))
		}
	}
}

On a darwin/amd64 laptop:

$ GODEBUG=gctrace=1 /tmp/gc.test -test.v
=== RUN   TestHeapGrowthStalls
gc 1 @0.002s 1%: 0.035+0.14+0.019 ms clock, 0.10+0.047/0.12/0.12+0.059 ms cpu, 4->4->3 MB, 5 MB goal, 8 P
gc 2 @0.003s 2%: 0.003+0.63+0.028 ms clock, 0.024+0.054/0.17/0.14+0.22 ms cpu, 6->7->6 MB, 7 MB goal, 8 P
gc 3 @0.004s 3%: 0.003+0.29+0.034 ms clock, 0.026+0.16/0.16/0.095+0.27 ms cpu, 12->12->12 MB, 13 MB goal, 8 P
gc 4 @0.007s 4%: 0.004+1.2+0.048 ms clock, 0.033+0.94/0.059/0.29+0.39 ms cpu, 23->23->22 MB, 24 MB goal, 8 P
gc 5 @0.012s 3%: 0.005+0.59+0.024 ms clock, 0.044+0.054/0.59/0.58+0.19 ms cpu, 44->44->42 MB, 45 MB goal, 8 P
gc 6 @0.020s 2%: 0.005+5.9+0.033 ms clock, 0.040+0.093/1.1/5.7+0.26 ms cpu, 84->84->80 MB, 85 MB goal, 8 P
gc 7 @0.045s 2%: 0.006+1.7+0.036 ms clock, 0.051+0.25/2.2/2.3+0.29 ms cpu, 156->157->152 MB, 160 MB goal, 8 P
gc 8 @0.077s 1%: 0.006+2.1+0.027 ms clock, 0.055+1.0/2.9/7.3+0.21 ms cpu, 297->300->286 MB, 305 MB goal, 8 P
gc 9 @0.154s 1%: 0.008+4.4+0.044 ms clock, 0.070+2.3/4.6/18+0.35 ms cpu, 558->563->537 MB, 573 MB goal, 8 P
gc 10 @0.268s 1%: 0.008+7.5+0.044 ms clock, 0.067+3.7/12/32+0.35 ms cpu, 1047->1056->1004 MB, 1074 MB goal, 8 P
gc 11 @0.428s 1%: 0.008+7.8+0.043 ms clock, 0.070+1.9/13/36+0.34 ms cpu, 1959->1974->1018 MB, 2009 MB goal, 8 P
gc 12 @0.924s 0%: 0.008+8.6+0.17 ms clock, 0.071+1.8/15/38+1.3 ms cpu, 1985->2004->1022 MB, 2036 MB goal, 8 P
gc 13 @1.426s 0%: 0.006+9.0+0.034 ms clock, 0.053+1.8/17/41+0.27 ms cpu, 1993->2000->1010 MB, 2044 MB goal, 8 P
gc 14 @1.658s 0%: 0.008+9.4+0.026 ms clock, 0.070+1.8/15/45+0.21 ms cpu, 1969->1982->1015 MB, 2020 MB goal, 8 P
gc 15 @1.905s 0%: 0.015+9.5+0.075 ms clock, 0.12+1.9/18/41+0.60 ms cpu, 1980->1994->1017 MB, 2031 MB goal, 8 P
gc 16 @2.143s 0%: 0.016+11+0.045 ms clock, 0.12+1.8/21/54+0.36 ms cpu, 1983->1999->1019 MB, 2034 MB goal, 8 P
gc 17 @2.385s 0%: 0.015+10+0.066 ms clock, 0.12+2.4/20/49+0.53 ms cpu, 1987->2002->1018 MB, 2038 MB goal, 8 P
gc 18 @2.616s 0%: 0.015+11+0.077 ms clock, 0.12+2.4/15/59+0.61 ms cpu, 1985->2004->1021 MB, 2036 MB goal, 8 P
--- PASS: TestHeapGrowthStalls (2.85s)
	gc_test.go:42: i=0        at=0.000s   delta=0.000ms
	gc_test.go:42: i=2        at=0.000s   delta=0.001ms
	gc_test.go:42: i=18       at=0.000s   delta=0.002ms
	gc_test.go:42: i=26       at=0.001s   delta=0.002ms
	gc_test.go:42: i=114      at=0.001s   delta=0.003ms
	gc_test.go:42: i=170      at=0.001s   delta=0.004ms
	gc_test.go:42: i=218      at=0.001s   delta=0.014ms
	gc_test.go:42: i=3226     at=0.002s   delta=0.157ms
	gc_test.go:42: i=12016    at=0.005s   delta=0.385ms
	gc_test.go:42: i=22534    at=0.009s   delta=0.408ms
	gc_test.go:42: i=41997    at=0.014s   delta=0.797ms
	gc_test.go:42: i=79541    at=0.028s   delta=1.451ms
	gc_test.go:42: i=152493   at=0.051s   delta=3.489ms
	gc_test.go:42: i=286213   at=0.084s   delta=5.107ms
	gc_test.go:42: i=286229   at=0.087s   delta=2.922ms
	gc_test.go:42: i=536053   at=0.169s   delta=10.281ms
	gc_test.go:42: i=1001709  at=0.295s   delta=20.094ms
PASS

The at=0.169s delta=10.281ms line matches up with gc 9 @0.154s 1%: 0.008+4.4+0.044 ms clock. The GC cycle began at 0.154s and completed around 0.158s. The record-breaking allocation delay finished at 0.169s and began at 0.159s.

On a linux/amd64 server:

$ GOMAXPROCS=8 GODEBUG=gctrace=1 ./gc.test.linux_amd64 -test.v -retain-count=20000000 -preallocate=true -loop-count=3
=== RUN   TestHeapGrowthStalls
gc 1 @0.045s 11%: 0.11+21+0.072 ms clock, 0.22+20/43/106+0.14 ms cpu, 458->459->458 MB, 459 MB goal, 8 P
gc 2 @0.068s 14%: 0.004+27+0.10 ms clock, 0.036+0.063/46/93+0.87 ms cpu, 459->513->513 MB, 917 MB goal, 8 P
gc 3 @0.172s 10%: 0.019+30+0.084 ms clock, 0.15+0.20/54/135+0.67 ms cpu, 739->785->784 MB, 1027 MB goal, 8 P
gc 4 @0.389s 6%: 0.014+33+0.039 ms clock, 0.11+0.71/64/161+0.31 ms cpu, 1315->1388->1388 MB, 1569 MB goal, 8 P
gc 5 @0.795s 4%: 0.026+33+0.044 ms clock, 0.21+1.6/62/164+0.35 ms cpu, 2486->2557->2557 MB, 2777 MB goal, 8 P
gc 6 @1.589s 3%: 0.025+84+0.046 ms clock, 0.20+2.3/159/406+0.37 ms cpu, 4780->4915->4915 MB, 5115 MB goal, 8 P
gc 7 @3.202s 2%: 0.015+63+0.041 ms clock, 0.12+4.3/126/313+0.32 ms cpu, 9377->9499->9499 MB, 9830 MB goal, 8 P
gc 8 @6.350s 1%: 0.018+204+0.091 ms clock, 0.14+55/395/998+0.72 ms cpu, 18438->18685->18685 MB, 18999 MB goal, 8 P
gc 9 @13.058s 1%: 0.019+300+0.066 ms clock, 0.15+112/565/1496+0.52 ms cpu, 36436->36706->20259 MB, 37370 MB goal, 8 P
gc 10 @34.001s 0%: 0.017+265+0.11 ms clock, 0.13+83/507/1313+0.91 ms cpu, 39506->39732->20215 MB, 40519 MB goal, 8 P
--- PASS: TestHeapGrowthStalls (53.37s)
        gc_test.go:42: i=0        at=0.045s   delta=0.581ms
        gc_test.go:42: i=57055    at=0.097s   delta=2.220ms
        gc_test.go:42: i=331266   at=0.200s   delta=2.323ms
        gc_test.go:42: i=334871   at=0.221s   delta=19.590ms
        gc_test.go:42: i=952975   at=0.454s   delta=32.232ms
        gc_test.go:42: i=2150327  at=0.889s   delta=61.852ms
        gc_test.go:42: i=4564455  at=1.799s   delta=126.545ms
        gc_test.go:42: i=9258703  at=3.533s   delta=268.414ms
        gc_test.go:42: i=18664799 at=7.354s   delta=799.231ms
PASS

/cc @aclements @RLH

Metadata

Metadata

Assignees

No one assigned

    Labels

    NeedsFixThe path to resolution is known, but the work has not been done.compiler/runtimeIssues related to the Go compiler and/or runtime.

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions