
cmd/pprof: macOS 10.12 profile overcounts system calls #17406

Closed
lucas-clemente opened this issue Oct 11, 2016 · 15 comments
@lucas-clemente (Contributor) commented Oct 11, 2016

I've encountered a large discrepancy between pprof and dtrace while profiling quic-go. I've managed to reduce it to a fairly simple test case (link), in which data is sent in a loop via UDP to localhost and a CPU-intensive computation that takes a few milliseconds runs.

Profiling this via dtrace gives the expected result: dtrace (in my opinion correctly) shows doSomething taking almost all of the runtime, whereas pprof says 62% is spent in syscall.Syscall6, called from netFD.Write.

The dtrace profile is generated using flamegraph with

sudo dtrace -x ustackframes=1000 -n 'profile-99 /execname == "pprof-test" && arg1/ { @[ustack()] = count(); } tick-20s { exit(0); }' -o out.dtrace && ~/src/FlameGraph/stackcollapse.pl out.dtrace > folded.dtrace && ~/src/FlameGraph/flamegraph.pl folded.dtrace > dtrace.svg

(screenshot, 2016-10-11: dtrace flame graph dominated by doSomething)

The pprof profile is generated using

go build && ./pprof-test
# in another shell
go tool pprof pprof-test localhost:8080/debug/pprof/profile

(screenshot, 2016-10-11: pprof graph dominated by syscall.Syscall6)

### All files

- [The dtrace flamegraph SVG](https://dl.dropboxusercontent.com/u/1428776/pprof-report/dtrace.svg)
- [The dtrace output](https://dl.dropboxusercontent.com/u/1428776/pprof-report/out.dtrace)
- [The pprof SVG](https://dl.dropboxusercontent.com/u/1428776/pprof-report/pprof.svg)
- [The pprof profile](https://dl.dropboxusercontent.com/u/1428776/pprof-report/pprof.pprof-test.localhost%3A8080.samples.cpu.002.pb.gz)
- [The compiled binary from which the profile was generated](https://dl.dropboxusercontent.com/u/1428776/pprof-report/pprof-test)

### What version of Go are you using (`go version`)?

go version go1.7.1 darwin/amd64

### What operating system and processor architecture are you using (`go env`)?

macOS 10.12 (16A323)

~/D/pprof-test → go env
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/lucas/src/go"
GORACE=""
GOROOT="/usr/local/Cellar/go/1.7.1/libexec"
GOTOOLDIR="/usr/local/Cellar/go/1.7.1/libexec/pkg/tool/darwin_amd64"
CC="clang"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/8c/y8j1444j1fbddn6bl788p1ym0000gn/T/go-build975132377=/tmp/go-build -gno-record-gcc-switches -fno-common"
CXX="clang++"
CGO_ENABLED="1"

@ianlancetaylor (Contributor) commented Oct 11, 2016

Would it be possible for you to try a similar C program with the C compiler's -pg option? The Go runtime simply uses setitimer with ITIMER_PROF. While there could be a bug in how Go collects profiling data, it is also possible that this is simply due to differences between how Darwin handles setitimer/ITIMER_PROF and what dtrace measures.

@lucas-clemente (Contributor, Author) commented Oct 11, 2016

I'm not sure whether -pg works properly on macOS; I couldn't get it to produce the profiler output. I did, however, try a similar thing with a C version (link; it also needs this Go server) and gperftools, and got the same result as with Go's pprof: the syscall is wrongly shown as taking the majority of the runtime. How much code do gperftools and Go's pprof share? Is it possible there's a common bug?

I'm not sure how to put this, but as a pprof user I would be quite unhappy if the answer was "there's simply a difference in reporting". In that case, we should find out exactly in which cases pprof is unreliable, document this somewhere, and/or alert users on Darwin about these issues. A profiler that is sometimes wrong can cause quite a lot of lost time ;)

@ianlancetaylor (Contributor) commented Oct 11, 2016

gperftools and Go's pprof command are based on the same code, but I doubt that is the issue. If the same thing happens with C and Go, then the issue is almost certainly that the Darwin kernel's setitimer call and dtrace measure different things. The Darwin man page for setitimer says that ITIMER_PROF "decrements both in process virtual time and when the system is running on behalf of the process." That is, it measures both user time and system time. I don't really know what dtrace measures.

If there is a bug here, it's likely to be in the kernel implementation of setitimer, not in any part of the Go distribution. But further investigation will require someone familiar with the Darwin kernel.

@rsc (Contributor) commented Oct 25, 2016

Thanks for the program. I ran it on my OS X 10.11.6 laptop and I get a profile that matches dtrace:

(pprof) top10
29.48s of 29.67s total (99.36%)
Dropped 14 nodes (cum <= 0.15s)
Showing top 10 nodes out of 14 (cum >= 0.34s)
      flat  flat%   sum%        cum   cum%
    24.60s 82.91% 82.91%     24.60s 82.91%  main.doSomething
     4.67s 15.74% 98.65%      4.67s 15.74%  syscall.Syscall
     0.20s  0.67% 99.33%      0.34s  1.15%  syscall.Recvfrom
     0.01s 0.034% 99.36%      0.35s  1.18%  net.(*UDPConn).readFrom
         0     0% 99.36%     29.28s 98.69%  main.main
         0     0% 99.36%      0.35s  1.18%  main.udpSink
         0     0% 99.36%      0.35s  1.18%  net.(*UDPConn).ReadFromUDP
         0     0% 99.36%      4.68s 15.77%  net.(*conn).Write
         0     0% 99.36%      4.68s 15.77%  net.(*netFD).Write
         0     0% 99.36%      0.34s  1.15%  net.(*netFD).readFrom

@quentinmit tried it on an OS X 10.12.1 beta laptop and got a similar profile (actually an even higher percentage of doSomething).

So neither of us can reproduce this.

Prior to OS X 10.11, there were problems with profiling signals being sent to the wrong thread, and the only workaround was to apply a binary patch to the kernel (see #6047). It is possible that the bug was reintroduced in OS X 10.12 and fixed in 10.12.1 beta, of course, but that seems unlikely.

@lucas-clemente, two more pieces of data that would be useful from your machine that exhibits the problem:

  1. The full output of go test -v runtime/pprof. I'd like to see all the output but the most important part is whether there are any SKIP lines.
  2. The output of running the C program in the 071d212 commit message.

Thanks.

@rsc rsc changed the title cmd/pprof: Syscalls profiled incorrectly on macOS cmd/pprof: macOS 10.12 profile overcounts system calls Oct 25, 2016

@lucas-clemente (Contributor, Author) commented Oct 26, 2016

I have since upgraded to 10.12.1 (16B2555), and can still reproduce the problem. I also got @marten-seemann to try it on his machine, and he's seeing the same issue, so we're 2:2 ;)

The output of go test -v runtime/pprof is here: https://gist.github.com/lucas-clemente/676c3ad5d5a92fd79c33f8315579b2ec. I can't see any SKIP lines.

The output of the C file from the commit message (compiled with clang test.c -o test) is just 10 repetitions of "signal on cpu-chewing looper thread", so I guess it's a new issue, not #6047 again.

I'd be really happy to help in any way, but I'm not sure what to do next :/

@rsc (Contributor) commented Oct 26, 2016

@lucas-clemente, is this a work machine that might have some kind of antivirus software on it watching the network I/O? Would the same be true of @marten-seemann's machine?

Failing that, I'm trying to figure out what's different between your machines and ours. The two I tried are laptops (MacBook Pros). What hardware are you using?

It's possible that dtrace is simply not reporting stacks at all for user code in a system call, so if the problem here is that the kernel is spending a lot more time in syscall.Syscall on your machine than it does on mine, that might not show up in dtrace. Even on our machines I see 5-15% of time in the system calls, and that's just not on your flame graph at all. Surely the program does spend some time in the writes, which makes me wonder if they are missing entirely from the dtrace.

@lucas-clemente (Contributor, Author) commented Oct 26, 2016

Both are personal Retina MacBook Pros; mine is from late 2013 (2.3 GHz i7). I'm not aware of anything running that should interfere with the network.

I also checked that I'm not running any strange kexts, and only found VirtualBox (which I assume @marten-seemann has installed too). I tried uninstalling it and testing again, but that didn't change anything. I'm also using Docker's Mac app (which uses Hypervisor.framework), so I tried shutting that down too. Nothing changed.

~ → kextstat | grep -v com.apple
Index Refs Address            Size       Wired      Name (Version) UUID <Linked Against>
  142    3 0xffffff7f837bf000 0x61000    0x61000    org.virtualbox.kext.VBoxDrv (5.1.6) 2432AE53-EC50-36E5-B33B-799910A80FB2 <7 5 4 3 1>
  147    0 0xffffff7f83820000 0x8000     0x8000     org.virtualbox.kext.VBoxUSB (5.1.6) 4F861754-8C13-3623-BE58-893A2303341D <146 142 40 7 5 4 3 1>
  148    0 0xffffff7f83828000 0x5000     0x5000     org.pqrs.driver.VirtualHIDManager (1.1.0) 07DBA1E3-533B-3F80-B1F9-F2B0F417D96D <32 5 4 3 1>
  149    0 0xffffff7f8382d000 0x5000     0x5000     org.virtualbox.kext.VBoxNetFlt (5.1.6) 89A8D50A-DB9F-34B7-B1D0-20E305507610 <142 7 5 4 3 1>
  150    0 0xffffff7f83832000 0x6000     0x6000     org.virtualbox.kext.VBoxNetAdp (5.1.6) C1DDA422-665A-316D-AD52-FA64268ADA2C <142 5 4 1>
  162    0 0xffffff7f83838000 0x6000     0x6000     com.getdropbox.dropbox.kext (1.8.1) 81F3EECE-2B2A-35D5-A57A-62066A404BFC <7 5 4 2 1>

@rsc (Contributor) commented Nov 3, 2016

Today I am getting 30-50% in syscall, both from master and from Go 1.7, so I guess I've reproduced the problem? Not sure what has changed.

@rsc rsc modified the milestones: Go1.8Maybe, Go1.8 Nov 3, 2016

@rsc (Contributor) commented Nov 11, 2016

Even so, I don't think we're going to get to this for Go 1.8.

@rsc rsc modified the milestones: Go1.9, Go1.8Maybe Nov 11, 2016

@bradfitz (Contributor) commented Jun 26, 2017

I've also found that CPU profiling on macOS is pretty much useless lately.

@rsc, do you want to take a look for Go 1.9 or should we move to Go 1.10?

@rsc (Contributor) commented Jun 27, 2017

Not going to happen for Go 1.9, sorry.

@rsc rsc modified the milestones: Go1.10, Go1.9 Jun 27, 2017

@stuartcarnie commented Jul 23, 2017

I can also confirm profiling is broken (again) on 10.12.6 from what appears to be a kernel bug.

I am running High Sierra 10.13 Beta 2 on a second laptop (identical hardware) and everything works as expected.

dtrace is also broken on 10.12.6 per this stack-trace bug yet works fine on 10.13.

@rsc (Contributor) commented Nov 22, 2017

Happy to hear this is fixed in High Sierra. Closing.

@rsc rsc closed this Nov 22, 2017

@Allendar commented Dec 16, 2017

I seem to be having this issue with net/http/pprof as well (running a light benchmark with 10 simultaneous connections).

The trace clearly shows the following path dominating (overshadowing everything else, even though some heavier tasks were performed):
1 net/http.(*conn).serve /usr/local/opt/go/libexec/src/net/http/server.go (42.58s)
2 net/http.(*response).finishRequest /usr/local/opt/go/libexec/src/net/http/server.go (11.77s)
3 bufio.(*Writer).Flush /usr/local/opt/go/libexec/src/bufio/bufio.go (9s)
4 time.Now /usr/local/opt/go/libexec/src/time/time.go (14.88s)
5 time.now /usr/local/opt/go/libexec/src/runtime/sys_darwin_amd64.s (14.88s)
and it branches out from there; it also gets an entry from my project.com/projectname/vendor/github.com/go-sql-driver/mysql.(*mysqlConn).writePacket /Users/erwin/Go/src/espaldd.com/espal/vendor/github.com/go-sql-driver/mysql/packets.go (8.49s):
5 net.(*conn).Write /usr/local/opt/go/libexec/src/net/net.go (10.14s)
6 net.(*netFD).Write /usr/local/opt/go/libexec/src/net/fd_unix.go (10.14s)
7 internal/poll.(*FD).Write /usr/local/opt/go/libexec/src/internal/poll/fd_unix.go (10.12s)
8 syscall.Syscall /usr/local/opt/go/libexec/src/syscall/asm_darwin_amd64.s (20.88s)

Running:
macOS High Sierra 10.13.2 (17C88)
go version go1.9.2 darwin/amd64

@ianlancetaylor (Contributor) commented Dec 17, 2017

@Allendar This issue is closed because we think it is fixed. Please open a new issue with full details. Thanks.

@golang locked and limited conversation to collaborators Dec 17, 2018