Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/pprof: unable to profile CPU intensive code (even with available threads) #32652

Closed
jaffee opened this issue Jun 17, 2019 · 9 comments
Closed

Comments

@jaffee
Copy link

@jaffee jaffee commented Jun 17, 2019

What version of Go are you using (go version)?

$ go version
go version go1.12.6 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/ubuntu/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/ubuntu/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build117733995=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Ran a simple program which launches a configurable number of goroutines all which do CPU intensive work in a tight loop. It also imports net/http/pprof and starts a server to expose the debugging endpoint.

go get github.com/jaffee/pegger/cmd/pegger
pegger -concurrency 3
# in another terminal:
go tool pprof localhost:6060/debug/pprof/profile?seconds=3

Try setting the concurrency argument lower than what runtime.NumCPU() reports on your system. By default it is equal to runtime.NumCPU().

What did you expect to see?

A profile after 3ish seconds.

What did you see instead?

A few different things:

$ go tool pprof localhost:6060/debug/pprof/profile
Fetching profile over HTTP from http://localhost:6060/debug/pprof/profile
localhost:6060/debug/pprof/profile: Get http://localhost:6060/debug/pprof/profile: net/http: timeout awaiting response headers
failed to fetch any source profiles
$ go tool pprof localhost:6060/debug/pprof/profile
Fetching profile over HTTP from http://localhost:6060/debug/pprof/profile
localhost:6060/debug/pprof/profile: Get http://localhost:6060/debug/pprof/profile: dial tcp 127.0.0.1:6060: connect: connection refused
failed to fetch any source profiles

On OSX, I see the connection refused message if -concurrency is equal to the number of hardware threads available, and otherwise it works.

On a 32 core Linux VM in Azure, I've seen both the timeout and connection refused messages at concurrencies ranging from 2 to 32. At concurrency 1, things work as expected. At concurrency 2, I've seen it work, but it usually doesn't. It has failed consistently at all higher concurrencies in my testing.

I can understand why, if I have 8 cores, and I'm running 8 goroutines all in tight loops, that nothing else might get to run, and so I wouldn't be able to connect to the debug endpoint. But why if I'm only running 3 or 4 goroutines on a 32 core machine?

@jaffee
Copy link
Author

@jaffee jaffee commented Jun 18, 2019

Here is a condensed version of the pegger code in the playground: https://play.golang.org/p/BVSZLO6Kc3B

(You know, in case you don't want to go get and blindly run arbitrary code from a stranger)

@katiehockman katiehockman changed the title unable to profile CPU intensive code (even with available threads) cmd/pprof: unable to profile CPU intensive code (even with available threads) Jun 18, 2019
@katiehockman
Copy link
Member

@katiehockman katiehockman commented Jun 18, 2019

/cc @hyangah

@hyangah
Copy link
Contributor

@hyangah hyangah commented Jun 19, 2019

Is it reproducible other than on VM?

@yuce
Copy link

@yuce yuce commented Jun 19, 2019

@hyangah I can reproduce this with -concurrency 6 and up on Ubuntu 18.04 on an 8 thread CPU (i7-6700HQ). /cc @jaffee .

go version:

go version go1.12.5 linux/amd64

uname -a:

Linux ichi 4.15.0-51-generic #55-Ubuntu SMP Wed May 15 14:27:21 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

go env:

GOARCH="amd64"
GOBIN=""
GOCACHE="/home/yuce/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/yuce/go_path"
GOPROXY=""
GORACE=""
GOROOT="/home/yuce/go"
GOTMPDIR=""
GOTOOLDIR="/home/yuce/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build113306053=/tmp/go-build -gno-record-gcc-switches"

EDIT: Added the output of go version.

@hyangah
Copy link
Contributor

@hyangah hyangah commented Jun 19, 2019

I think this is an inherent limitation of profiling/monitoring cpu or memory from runtime, especially, when the runtime doesn't treat profiling and monitoring work differently from any other work the runtime is doing. How can we rely on the process to reliably capture the correct profile or performance signals when the process itself is overloaded or about to oom?

Maybe runtime team may have a better idea to mitigate the issue from runtime (@aclements)

Otherwise, for processes with extremely high cpu usage, I think we may have better luck by collecting profiles outside runtime (e.g. linux perf, ...)

@jaffee
Copy link
Author

@jaffee jaffee commented Jun 19, 2019

Keep in mind, that we're talking about running with 4 CPU-heavy goroutines on a 32 core host. There are 28 other threads available to the runtime for execution, memory usage is minimal.

I attached to the pegger process with strace, and interestingly, when the pprof request occurs, you see SIGPROF events:

--- SIGPROF {si_signo=SIGPROF, si_code=SI_KERNEL} ---
strace: [ Process PID=13912 runs in 64 bit mode. ]
rt_sigreturn({mask=[]})                 = 5542608
--- SIGPROF {si_signo=SIGPROF, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = 7136556669451735562
--- SIGPROF {si_signo=SIGPROF, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = 824635219968
--- SIGPROF {si_signo=SIGPROF, si_code=SI_KERNEL} ---

then they abruptly stop, and shortly thereafter the pprof request times out.

If you try to hit the pprof endpoint again, you see NO new system calls through strace.

@hyangah
Copy link
Contributor

@hyangah hyangah commented Jun 19, 2019

@jaffee Yeah, that's unarguably strange that pprof couldn't find cpu to run on when there are 28 other cores are idle in theory. But I am not familiar with Azure's VM environment, so I was curious whether such dramatic differences (32 cores >> 4 working cores) were observed in non-VM environment.

@jaffee
Copy link
Author

@jaffee jaffee commented Jun 19, 2019

Thanks @hyangah. I have a couple boxes I can test on when I get home (nothing with 32 cores though!). Can also pretty easily test on AWS/GCP/OCI if that helps.

@jaffee
Copy link
Author

@jaffee jaffee commented Jul 10, 2019

After more investigation (and opening another ticket #33015), this is likely also a dup of #10958 and will likely be fixed by #24543.

I think the missing piece of my understanding was that the runtime is waiting for all running goroutines to reach a pre-emption point so that it can do a STW phase of GC (or something), which is why more things won't run even though there are available threads.

@jaffee jaffee closed this Jul 10, 2019
@golang golang locked and limited conversation to collaborators Jul 9, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
5 participants
You can’t perform that action at this time.