Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/pprof: macOS 12.6.1 profile overcounts system calls (again) #57722

Open
felixge opened this issue Jan 10, 2023 · 9 comments
Open

cmd/pprof: macOS 12.6.1 profile overcounts system calls (again) #57722

felixge opened this issue Jan 10, 2023 · 9 comments
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Darwin
Milestone

Comments

@felixge
Copy link
Contributor

felixge commented Jan 10, 2023

What version of Go are you using (go version)?

$ go version
go version go1.19.4 darwin/arm64

This is on macOS 12.6.1 with an M1 chip, but the problem seems to affect Intel as well.

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE="auto"
GOARCH="arm64"
GOBIN=""
GOCACHE="/Users/MyUser/Library/Caches/go-build"
GOENV="/Users/MyUser/Library/Application Support/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS="-count=1"
GOHOSTARCH="arm64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/MyUser/go/pkg/mod"
GONOPROXY="github.com/MyOrg"
GONOSUMDB="github.com/MyOrg,go.myorg.io"
GOOS="darwin"
GOPATH="/Users/MyUser/go"
GOPRIVATE="github.com/MyOrg"
GOPROXY="binaries.myorg.io,https://proxy.golang.org,direct"
GOROOT="/opt/homebrew/Cellar/go/1.19.4/libexec"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/opt/homebrew/Cellar/go/1.19.4/libexec/pkg/tool/darwin_arm64"
GOVCS=""
GOVERSION="go1.19.4"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/MyUser/go/src/github.com/felixge/macos-profiler-bias/go.mod"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/0t/nwrdnfnd2mjdgr0z_p_f6xww0000gq/T/go-build3012663781=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

Create a CPU profile of a CPU-bound workload that is interleaved with short system calls.

For example TestInterleavedIO is a reproducer that is compressing a 250MB file containing random data.

What did you expect to see?

A profile that is similar to the one shown below for linux. Most of time should be spent on gzip compression, and a little bit on syscalls.

interleaved linux

What did you see instead?

On macOS the majority of time is attributed to read and write syscalls. The gzip compression is barely showing up at all.

interleaved darwin

Sanity Check

TestSequentialIO implements the same workload, but instead of interleaved I/O, it does one big read first, followed by gzip compression, followed by a big write.

As expected, Linux produces a profile dominated by gzip compression.

sequential linux pprof

And macOS now shows the right profile too.

sequential darwin pprof

Conclusion

macOS setitimer(2) seems to bias towards syscalls. This issue was reported and fixed in the past, see #17406 (2016) and #6047 (2013). So this could be a macOS regression.

I've uploaded all code, pprofs and screenshots to this google drive folder.

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jan 10, 2023
@cbedard
Copy link
Contributor

cbedard commented Jan 10, 2023

Testing on an Intel MacOS (MacOS 12.6.1)

go version

go version
go version go1.19.4 darwin/amd64

go env

go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/cameron.bedard/Library/Caches/go-build"
GOENV="/Users/cameron.bedard/Library/Application Support/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/cameron.bedard/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/cameron.bedard/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GOVCS=""
GOVERSION="go1.19.4"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/dev/null"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch x86_64 -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/zs/r6mt7f1n66n9mw2rz8f9w2540000gp/T/go-build1170536968=/tmp/go-build -gno-record-gcc-switches -fno-common"

I'm a bit of a go noob (start during advent of code lol), so I wasn't able to set up the UI you used but my IDE's built in pprof output looks similar to yours. If you want the matching profiler UI and send me a how-to link I can set it up :)

TestInterleavedIO
Screen Shot 2023-01-10 at 2 34 40 PM

TestSequentialIO
Screen Shot 2023-01-10 at 2 35 13 PM

@felixge
Copy link
Contributor Author

felixge commented Jan 10, 2023

@cbedard thanks, this looks like Intel Macs have the same issue. The UI doesn't matter, this is fine :).

@felixge felixge changed the title cmd/pprof: macOS 12.6.1 (M1) profile overcounts system calls (again) cmd/pprof: macOS 12.6.1 profile overcounts system calls (again) Jan 10, 2023
@cagedmantis cagedmantis added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jan 11, 2023
@cagedmantis
Copy link
Contributor

@golang/runtime

@prattmic
Copy link
Member

prattmic commented Jan 11, 2023

It would not surprise me if darwin has unfair SIGPROF delivery. e.g., if the system call return path updates the CPU timers, it may be more likely to trigger overflow and deliver SIGPROF to itself vs threads that do not make system calls.

@prattmic
Copy link
Member

I think this could be tested with a simple C program. Two threads, one running a busy loop, the other with a busy loop that makes a system call. Do they both receive ~50% of SIGPROF signals?

@cherrymui
Copy link
Member

I haven't read the sample code. Just wonder, for the "interleaved" case, are the compression and I/O running concurrently on different goroutines/threads, or they are on the same thread but just interleaved? If the latter, do you know what frequency or time period the interleaving is, and is it faster than the profile sampling frequency (100 Hz)? Thanks.

@felixge
Copy link
Contributor Author

felixge commented Jan 11, 2023

I haven't read the sample code. Just wonder, for the "interleaved" case, are the compression and I/O running concurrently on different goroutines/threads, or they are on the same thread but just interleaved?

It's the latter. But I suspect that there could be concurrent/parallel read-ahead happening on the OS level.

If the latter, do you know what frequency or time period the interleaving is, and is it faster than the profile sampling frequency (100 Hz)? Thanks.

According to the trace (shown below), the workload issues 53k syscalls in 3.7s, so at ~14kHz it's much faster than the profiling frequency.

CleanShot 2023-01-11 at 22 37 34@2x

It would not surprise me if darwin has unfair SIGPROF delivery. e.g., if the system call return path updates the CPU timers, it may be more likely to trigger overflow and deliver SIGPROF to itself vs threads that do not make system calls.
I think this could be tested with a simple C program. Two threads, one running a busy loop, the other with a busy loop that makes a system call. Do they both receive ~50% of SIGPROF signals?

What syscall would you use? It would have to be a non-blocking one to see 50%? But yeah, a small C reproducer could be useful for this.

@prattmic
Copy link
Member

prattmic commented Jan 18, 2023

I'd start with getpid or something similar, though perhaps it needs to go through the scheduler, so a very briefly blocking system call may be needed.

@mknyszek
Copy link
Contributor

In triage now, since it seems like the next step is a C reproducer to confirm unfair SIGPROF delivery? Does that sound right? (The assignee is mostly just for tracking, there's no obligation; feel free to remove it.)

@mknyszek mknyszek added this to the Go1.21 milestone Jan 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Darwin
Projects
Status: In Progress
Development

No branches or pull requests

7 participants