Skip to content

runtime: long latency of sweep assists #57523

@zyguan

Description

@zyguan

What version of Go are you using (go version)?

$ go version
go version go1.19.1 linux/amd64

Does this issue reproduce with the latest release?

Not sure, but at least with v1.19.1.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/zyguan/.cache/go-build"
GOENV="/home/zyguan/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/zyguan/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/zyguan/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/lib/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/lib/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.19.1"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/usr/local/lib/go/src/go.mod"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build2199602951=/tmp/go-build -gno-record-gcc-switches"

What did you do?

I'm investigating a long tail latency issue of tidb and found that it's sweep assists that lead to those slow queries.

2022-12-30_110129

I also added a debug log and found that the heapLive might be less than the sweepHeapLiveBasis here. As a result, we got a extremely large pagesTarget (because of underflow) and then spent a lot of time on sweeping (need to sweep 2767037498505953 pages for allocating 8192 bytes).

2022-12-30_121552

diff --git a/src/runtime/mgcsweep.go b/src/runtime/mgcsweep.go
index de57f18..d26d03b 100644
--- a/src/runtime/mgcsweep.go
+++ b/src/runtime/mgcsweep.go
@@ -810,6 +810,7 @@ func deductSweepCredit(spanBytes uintptr, callerSweepPages uintptr) {
                return
        }
 
+       startTime := nanotime()
        if trace.enabled {
                traceGCSweepStart()
        }
@@ -818,7 +819,8 @@ retry:
        sweptBasis := mheap_.pagesSweptBasis.Load()
 
        // Fix debt if necessary.
-       newHeapLive := uintptr(atomic.Load64(&gcController.heapLive)-mheap_.sweepHeapLiveBasis) + spanBytes
+       heapLive := atomic.Load64(&gcController.heapLive)
+       newHeapLive := uintptr(heapLive-mheap_.sweepHeapLiveBasis) + spanBytes
        pagesTarget := int64(mheap_.sweepPagesPerByte*float64(newHeapLive)) - int64(callerSweepPages)
        for pagesTarget > int64(mheap_.pagesSwept.Load()-sweptBasis) {
                if sweepone() == ^uintptr(0) {
@@ -834,6 +836,9 @@ retry:
        if trace.enabled {
                traceGCSweepDone()
        }
+       if dur := nanotime()-startTime; dur > 100000000 {
+               print(">>> deductSweepCredit dur:", dur/1000000, "ms span-bytes:", spanBytes, " heap-live:", heapLive, " heap-live-basis:", mheap_.sweepHeapLiveBasis, " target:", pagesTarget, "\n")
+       }
 }
 
 // clobberfree sets the memory content at x to bad content, for debugging

What did you expect to see?

reasonable latency of sweep assists.

What did you see instead?

it takes hundreds of milliseconds for deductSweepCredit.

Metadata

Metadata

Assignees

Labels

FrozenDueToAgeNeedsFixThe path to resolution is known, but the work has not been done.compiler/runtimeIssues related to the Go compiler and/or runtime.

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions