Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: difficult to find memory leak that is a result lots of defers in a hot loop #42357

Open
leighhopcroft opened this issue Nov 3, 2020 · 5 comments
Milestone

Comments

@leighhopcroft
Copy link

@leighhopcroft leighhopcroft commented Nov 3, 2020

What version of Go are you using (go version)?

$ go version
go version go1.15.3 linux/amd64

Cross compiled for GOOS=linux GOARCH=arm GOARM=7.

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE="auto"
GOARCH="arm"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOENV="/root/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/go/pkg/mod"
GONOPROXY="gitlab.corp.wabtec.com"
GONOSUMDB="gitlab.corp.wabtec.com"
GOOS="linux"
GOPATH="/go"
GOPRIVATE="gitlab.corp.wabtec.com"
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
GOARM="7"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="0"
GOMOD="/workspaces/rm-edge-hst/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -marm -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build052064669=/tmp/go-build -gno-record-gcc-switches"

What did you do?

I tried to use pprof to debug a memory leak in a long running program that was showing growth in live objects (MemStats.Mallocs - MemStats.Frees) over time.

What did you expect to see?

pprof to reveal the problematic code that was causing the leak.

What did you see instead?

Setting sample_index=inuse_objects and running top showed lots of objects coming from runtime.systemstack. pprof was not giving useful info regarding what is actually running on the stack and causing the leak of live objects. web and peek showed that runtime.mstart was calling runtime.systemstack suggesting a possible leak of threads.
image (2)

However this was not actually relevant, I think due to this line in asm_arm.s that states "make it look like mstart called systemstack on g0, to stop traceback".

After lots of manual debugging, the culprit was found to be a bug in my code where a defer was being called in a hot and long running loop, rather than being called once before the loop.

e.g.
With bug - http://play.golang.org/p/0AzEwnbplyq
Bug fixed - http://play.golang.org/p/VbTEFgYXmMI

This was causing lots of calls to runtime.newdefer which was appending to the defer pool in a systemstack block:

systemstack(func() {
    lock(&sched.deferlock)
    for len(pp.deferpool[sc]) < cap(pp.deferpool[sc])/2 && sched.deferpool[sc] != nil {
        d := sched.deferpool[sc]
	sched.deferpool[sc] = d.link
	d.link = nil
	pp.deferpool[sc] = append(pp.deferpool[sc], d)
    }
    unlock(&sched.deferlock)
})

I'm not sure what could be done to fix this, but it would have been nice if pprof was able to point me to the offending code more directly.

@davecheney
Copy link
Contributor

@davecheney davecheney commented Nov 3, 2020

Defers are function scoped not block scoped. It appears that you have solved your problem, what is left to address in this issue?

@leighhopcroft
Copy link
Author

@leighhopcroft leighhopcroft commented Nov 3, 2020

@josharian encouraged me to file an issue after helping me debug this over on gophers #performance slack. I have solved my memory leak bug but pprof didn't really help in this case - I thought it might be useful to share my experience and start a discussion on whether pprof could be improved to catch issues like this in the future.

@davecheney
Copy link
Contributor

@davecheney davecheney commented Nov 3, 2020

Could you post a cpu svg for the same trace, I’m hoping it might point to newdefer in some way.

@toothrot toothrot added this to the Backlog milestone Nov 3, 2020
@toothrot
Copy link
Contributor

@toothrot toothrot commented Nov 3, 2020

/cc @hyangah

@josharian
Copy link
Contributor

@josharian josharian commented Nov 3, 2020

-lines would have helped a little. The fundamental problem here is the interaction between: (1) systemstack is a backtracing roadblock and (2) defer allocations occur in a call to systemstack.

@ianlancetaylor ianlancetaylor changed the title pprof: difficult to find memory leak that is a result lots of defers in a hot loop runtime: difficult to find memory leak that is a result lots of defers in a hot loop Nov 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
4 participants
You can’t perform that action at this time.