Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: pprof output is broken for go test -cpuprofile #16891

Closed
smithjacobj opened this issue Aug 26, 2016 · 10 comments

Comments

Projects
None yet
6 participants
@smithjacobj
Copy link

commented Aug 26, 2016

Please answer these questions before submitting your issue. Thanks!

  1. What version of Go are you using (go version)?
    1.7 windows/386
  2. What operating system and processor architecture are you using (go env)?
    Windows, 386
  3. What did you do?
    If possible, provide a recipe for reproducing the error.
    A complete runnable program is good.
    A link on play.golang.org is best.
    run go test on a package with the -cpuprofile flag.
  4. What did you expect to see?
    Using output from 1.6.3, I get the following result from pprof top10:
(pprof) top10
2470ms of 2880ms total (85.76%)
Dropped 61 nodes (cum <= 14.40ms)
Showing top 10 nodes out of 91 (cum >= 40ms)
      flat  flat%   sum%        cum   cum%
    1440ms 50.00% 50.00%     1720ms 59.72%  <redacted>
     540ms 18.75% 68.75%      540ms 18.75%  runtime.cgocall
     210ms  7.29% 76.04%      210ms  7.29%  runtime.osyield
      50ms  1.74% 77.78%      190ms  6.60%  <redacted>
      50ms  1.74% 79.51%      100ms  3.47%  runtime.mallocgc
      40ms  1.39% 80.90%       40ms  1.39%  runtime.heapBitsForObject
      40ms  1.39% 82.29%       40ms  1.39%  runtime.heapBitsSetType
      40ms  1.39% 83.68%       80ms  2.78%  runtime.mapaccess2
      30ms  1.04% 84.72%       30ms  1.04%  runtime.scanobject
      30ms  1.04% 85.76%       40ms  1.39%  runtime.typedmemmove
  1. What did you see instead?
    Using output from 1.7, I get the following result from pprof top10
(pprof) top10
2.87s of 2.87s total (  100%)
      flat  flat%   sum%        cum   cum%
     2.87s   100%   100%      2.87s   100%  [traffic.test.exe]

Additionally, go-torch gives me this output:

$ go-torch traffic.test.exe hint.prof --raw > hint.torch
INFO[12:11:03] Run pprof command: go tool pprof -raw -seconds 30 traffic.test.exe hint.prof
FATA[12:11:03] Failed: could not parse raw pprof output: malformed location line: 1: 0x4f94ad M=1
@josharian

This comment has been minimized.

Copy link
Contributor

commented Aug 26, 2016

If you're up for some investigation, want to do a git bisect to find the commit that broke this?

@josharian josharian changed the title pprof output is broken for go test -cpuprofile runtime: pprof output is broken for go test -cpuprofile Aug 26, 2016

@smithjacobj

This comment has been minimized.

Copy link
Author

commented Aug 27, 2016

I haven't built Go from the source tree yet so it may take me a little bit to get set up. I'll try to do it when I have time (likely Monday).

@alexbrainman

This comment has been minimized.

Copy link
Member

commented Aug 28, 2016

I tried reproducing this:

C:\>cd %TEMP%

C:\DOCUME~1\brainman\LOCALS~1\Temp>go version
go version go1.7 windows/386

C:\DOCUME~1\brainman\LOCALS~1\Temp>go test -o=a.exe -cpuprofile=cpu.out compress/gzip
ok      compress/gzip   0.531s

C:\DOCUME~1\brainman\LOCALS~1\Temp>go tool pprof a.exe cpu.out
Entering interactive mode (type "help" for commands)
(pprof) top10
320ms of 320ms total (  100%)
Showing top 10 nodes out of 51 (cum >= 70ms)
      flat  flat%   sum%        cum   cum%
     260ms 81.25% 81.25%      260ms 81.25%  runtime.cgocall
      20ms  6.25% 87.50%       20ms  6.25%  runtime.gentraceback
      10ms  3.12% 90.62%       10ms  3.12%  path/filepath.Join
      10ms  3.12% 93.75%       10ms  3.12%  runtime.freedefer
      10ms  3.12% 96.88%       10ms  3.12%  runtime.lock
      10ms  3.12%   100%       10ms  3.12%  runtime.stringtoslicerune
         0     0%   100%       70ms 21.88%  bufio.(*Reader).Read
         0     0%   100%       50ms 15.62%  bufio.(*Reader).fill
         0     0%   100%       70ms 21.88%  compress/gzip.(*Reader).Reset
         0     0%   100%       70ms 21.88%  compress/gzip.(*Reader).readHeader
(pprof) quit

C:\DOCUME~1\brainman\LOCALS~1\Temp>

I don't see any problem. @smithjacobj what did I do wrong?

Alex

@smithjacobj

This comment has been minimized.

Copy link
Author

commented Aug 29, 2016

Strangely enough I can't repro it for any other package, but it's reliable for this one (which I unfortunately can't share) when I flip between 1.6.3/1.7.

I built a single-benchmark test package and was able to get usable results from pprof so there's some edge case I seem to be running into.

@quentinmit

This comment has been minimized.

Copy link
Contributor

commented Sep 13, 2016

@smithjacobj Can you share the profile file generated from this code, even if you can't share the code?

@smithjacobj

This comment has been minimized.

Copy link
Author

commented Sep 19, 2016

Apologies for my tardiness, here's an example. list returns Total: 8.21s and top10 returns:

8.21s of 8.21s total (  100%)
      flat  flat%   sum%        cum   cum%
     8.21s   100%   100%      8.21s   100%  [traffic.test.exe]

https://www.dropbox.com/s/4txd41vacim2sca/broken.prof?dl=0

@quentinmit

This comment has been minimized.

Copy link
Contributor

commented Oct 4, 2016

@smithjacobj
I have a suspicion what's happening. The behavior you see is exactly what happens when pprof can't symbolize the profile. Are you perhaps accidentally trying to use pprof with the binary from 1.6.2 but a profile from 1.7, or vice versa? You need to explicitly generate the binary with go test -o foo.exe before you can run pprof; maybe you did that once and it's hanging around in your directory, but you didn't do it again when you switched to 1.7?

Note that there is indeed profile information in the file you sent:

$ go tool pprof broken.prof 
Entering interactive mode (type "help" for commands)
(pprof) addresses
(pprof) top10
3260ms of 8210ms total (39.71%)
Dropped 486 nodes (cum <= 41.05ms)
Showing top 10 nodes out of 152 (cum >= 80ms)
      flat  flat%   sum%        cum   cum%
    1990ms 24.24% 24.24%     1990ms 24.24%  0000000000402710
     360ms  4.38% 28.62%      360ms  4.38%  000000000042951c
     190ms  2.31% 30.94%      190ms  2.31%  000000000048196c
     130ms  1.58% 32.52%      130ms  1.58%  00000000004819ae
     130ms  1.58% 34.10%      130ms  1.58%  00000000004932fb
     120ms  1.46% 35.57%      120ms  1.46%  0000000000481557
      90ms  1.10% 36.66%       90ms  1.10%  000000000041bb08
      90ms  1.10% 37.76%       90ms  1.10%  00000000004e2c50
      80ms  0.97% 38.73%       80ms  0.97%  0000000000481974
      80ms  0.97% 39.71%       80ms  0.97%  0000000000481983

@rsc

This comment has been minimized.

Copy link
Contributor

commented Nov 22, 2016

This is definitely a symbolization failure. Go 1.8 should make this impossible by putting the symbols in the profile.

@rsc

This comment has been minimized.

Copy link
Contributor

commented Nov 22, 2016

I'm going to close this since I believe Go 1.8 will fix it.

@rsc rsc closed this Nov 22, 2016

@smithjacobj

This comment has been minimized.

Copy link
Author

commented Dec 22, 2016

I'm not sure this will get seen but I had an idea and the time to test the idea. The issue seems to be related to use of the github.com/google/gopacket/pcap package, which binds to libpcap via CGo

I tested with a bare CGo sample, and also a control.

Control results:

$ go tool pprof cgo_17_bug_control.test.exe cpu.prof
Entering interactive mode (type "help" for commands)
(pprof) top10
1650ms of 2040ms total (80.88%)
Dropped 4 nodes (cum <= 10.20ms)
Showing top 10 nodes out of 27 (cum >= 50ms)
      flat  flat%   sum%        cum   cum%
     370ms 18.14% 18.14%      370ms 18.14%  sync/atomic.AddUint32
     310ms 15.20% 33.33%      310ms 15.20%  sync/atomic.CompareAndSwapUint32
     210ms 10.29% 43.63%     1360ms 66.67%  math/rand.(*lockedSource).Int63
     210ms 10.29% 53.92%      210ms 10.29%  math/rand.(*rngSource).Int63
     130ms  6.37% 60.29%      440ms 21.57%  sync.(*Mutex).Lock
     120ms  5.88% 66.18%     1480ms 72.55%  math/rand.(*Rand).Int63
     120ms  5.88% 72.06%      500ms 24.51%  sync.(*Mutex).Unlock
      70ms  3.43% 75.49%      370ms 18.14%  runtime.newstack
      60ms  2.94% 78.43%     1610ms 78.92%  scratch/cgo_17_bug_control.grind
      50ms  2.45% 80.88%       50ms  2.45%  runtime.casgstatus
(pprof)

Bug results:

$ go tool pprof cgo_17_bug.test.exe cpu.prof
Entering interactive mode (type "help" for commands)
(pprof) top10
1.33s of 1.33s total (  100%)
      flat  flat%   sum%        cum   cum%
     1.33s   100%   100%      1.33s   100%  [cgo_17_bug.test.exe]
(pprof)

Control:

impl

package cgo_17_bug_control

import (
	"math/rand"
)

func grind(in int) (out int) {
	for i := 0; i < 1000; i++ {
		out += in + rand.Int()
	}
	return
}

test

package cgo_17_bug_control

import (
	"testing"
)

func BenchmarkGrind(b *testing.B) {
	for i := 0; i < b.N; i++ {
		grind(i)
	}
}

Bug

impl

package cgo_17_bug

// #include <stdlib.h>
//
// unsigned int grind(unsigned int in) {
// 	unsigned int out = 0;
// 	for (int i = 0; i < 1000; i++) {
// 		out += in + rand();
// 	}
// 	return out;
// }
import "C"

func grind(in uint) (out uint) {
	return uint(C.grind(C.uint(in)))
}

test

package cgo_17_bug

import (
	"testing"
)

func BenchmarkGrind(b *testing.B) {
	for i := 0; i < b.N; i++ {
		grind(uint(i))
	}
}
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
You can’t perform that action at this time.