CPU profiling with go tool pprof attributes CPU time to the instruction following the one that consumed it. This can cause CPU time to be attributed to the wrong line number. For comparison, the Linux perf command attributes the time correctly. In the example below, pprof reports the time as spent in the for loop, when it is actually spent on expensive memory accesses.
AFAICT this issue has existed for a long time (at least since 1.4). I can't find an existing report of it, which is surprising. It occurs with recent toolchains as well:
go version devel +d277a36123 Fri Sep 11 02:58:36 2020 +0000 linux/amd64
go version go1.15.2 linux/amd64
Tested on: Linux localhost.localdomain 5.8.4-200.fc32.x86_64 #1 SMP Wed Aug 26 22:28:08 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Here is a script which demonstrates the fault:
#!/bin/sh
cat > prof_test.go <<EOF
// Demonstrate that CPU profiles refer to the wrong instruction.
package prof

import "testing"

var buf [1024 * 1024 * 1024]byte

func Benchmark(b *testing.B) {
	for i := 0; i < b.N; i++ {
		// Expensive memory accesses.
		buf[(2097169*i)%len(buf)] = 42
	}
}
EOF
go mod init prof
go test -c -o prof.test
perf record -- ./prof.test -test.bench . -test.benchtime 3s -test.cpuprofile cpu.prof
echo
echo '=== pprof list shows the "for" loop is expensive, not the memory access. ==='
go tool pprof -list Benchmark prof.test cpu.prof
echo
echo '=== pprof disasm shows the loop "INC" instruction is most expensive (not the "MOV"). ==='
go tool pprof -disasm Benchmark prof.test cpu.prof
echo
echo '=== Linux perf correctly shows the "MOV" instruction is expensive. ==='
perf annotate --stdio -s prof.Benchmark | cat
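For readers wondering why the store, rather than the loop increment, should be the expensive instruction, here is a minimal sketch of the benchmark's access pattern. It only reproduces the index arithmetic from the script; the names `index` and `distinctPages` are made up for illustration, and the 4096-byte page size is an assumption (typical for linux/amd64, not something the script states).

```go
package main

import "fmt"

const (
	bufLen   = 1024 * 1024 * 1024 // same size as buf in the benchmark
	stride   = 2097169            // multiplier from the benchmark's index expression
	pageSize = 4096               // ASSUMED page size; typical on linux/amd64
)

// index mirrors the benchmark's expression (2097169*i) % len(buf).
func index(i int) int {
	return (stride * i) % bufLen
}

// distinctPages counts how many different pages the first n iterations touch.
func distinctPages(n int) int {
	seen := make(map[int]struct{}, n)
	for i := 0; i < n; i++ {
		seen[index(i)/pageSize] = struct{}{}
	}
	return len(seen)
}

func main() {
	// Print the first few store targets and the page each one falls on.
	for i := 0; i < 4; i++ {
		fmt.Printf("i=%d index=%d page=%d\n", i, index(i), index(i)/pageSize)
	}
	fmt.Println("distinct pages in first 1000 iterations:", distinctPages(1000))
}
```

Under these assumptions, each of the first 1000 stores lands on a different page, roughly 2 MiB from the previous one, so it is plausible that cache and TLB misses on the store dominate the loop's cost.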
This sounds like it is likely related to the proposal in #36821. Go's pprof profiling uses ITIMER_PROF, which can skid several instructions past the instruction that is actually expensive.
Linux's perf, on the other hand, can use Intel PEBS (or the AMD equivalent) for precise sampling.
I don't think perf enables PEBS by default (-e cycles:pp is required), but even the standard hardware counter likely has less skid than ITIMER_PROF.
odeke-em changed the title from "cmd/pprof: Pprof provides wrong address/line for CPU profiles" to "cmd/pprof: pprof provides wrong address/line for CPU profiles" on Sep 11, 2020.
Cc @hyangah @randall77