Skip to content

runtime: difficult to find memory leak that is a result lots of defers in a hot loop #42357

Open
@leighhopcroft

Description

@leighhopcroft

What version of Go are you using (go version)?

$ go version
go version go1.15.3 linux/amd64

Cross compiled for GOOS=linux GOARCH=arm GOARM=7.

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE="auto"
GOARCH="arm"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOENV="/root/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/go/pkg/mod"
GONOPROXY="gitlab.corp.wabtec.com"
GONOSUMDB="gitlab.corp.wabtec.com"
GOOS="linux"
GOPATH="/go"
GOPRIVATE="gitlab.corp.wabtec.com"
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
GOARM="7"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="0"
GOMOD="/workspaces/rm-edge-hst/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -marm -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build052064669=/tmp/go-build -gno-record-gcc-switches"

What did you do?

I tried to use pprof to debug a memory leak in a long running program that was showing growth in live objects (MemStats.Mallocs - MemStats.Frees) over time.

What did you expect to see?

pprof to reveal the problematic code that was causing the leak.

What did you see instead?

Setting sample_index=inuse_objects and running top showed lots of objects coming from runtime.systemstack. pprof was not giving useful info regarding what is actually running on the stack and causing the leak of live objects. web and peek showed that runtime.mstart was calling runtime.systemstack suggesting a possible leak of threads.
image (2)

However this was not actually relevant, I think due to this line in asm_arm.s that states "make it look like mstart called systemstack on g0, to stop traceback".

After lots of manual debugging, the culprit was found to be a bug in my code where a defer was being called in a hot and long running loop, rather than being called once before the loop.

e.g.
With bug - http://play.golang.org/p/0AzEwnbplyq
Bug fixed - http://play.golang.org/p/VbTEFgYXmMI

This was causing lots of calls to runtime.newdefer which was appending to the defer pool in a systemstack block:

systemstack(func() {
    lock(&sched.deferlock)
    for len(pp.deferpool[sc]) < cap(pp.deferpool[sc])/2 && sched.deferpool[sc] != nil {
        d := sched.deferpool[sc]
	sched.deferpool[sc] = d.link
	d.link = nil
	pp.deferpool[sc] = append(pp.deferpool[sc], d)
    }
    unlock(&sched.deferlock)
})

I'm not sure what could be done to fix this, but it would have been nice if pprof was able to point me to the offending code more directly.

Metadata

Metadata

Assignees

No one assigned

    Labels

    NeedsInvestigationSomeone must examine and confirm this is a valid issue and not a duplicate of an existing one.compiler/runtimeIssues related to the Go compiler and/or runtime.

    Type

    No type

    Projects

    Status

    Triage Backlog

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions