Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/pprof: list not accurate in 1.9 #21697

Closed
ardan-bkennedy opened this issue Aug 30, 2017 · 19 comments
Closed

cmd/pprof: list not accurate in 1.9 #21697

ardan-bkennedy opened this issue Aug 30, 2017 · 19 comments
Milestone

Comments

@ardan-bkennedy
Copy link

@ardan-bkennedy ardan-bkennedy commented Aug 30, 2017

What version of Go are you using (go version)?

go version go1.9 darwin/amd64

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/bill/code/go"
GORACE=""
GOROOT="/usr/local/go"
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/_v/4r515ktx08g5yr6dgkxhfyfr0000gn/T/go-build016620058=/tmp/go-build -gno-record-gcc-switches -fno-common"
CXX="clang++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"

Code to reproduce the bug?

https://github.com/ardanlabs/gotraining/tree/master/topics/go/profiling/memcpu

What did you do?

$ go test -run none -bench . -benchtime 3s -memprofile mem.out
$ go tool pprof -alloc_space mem.out
Type: alloc_space
Time: Aug 30, 2017 at 7:33am (PDT)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) list algOne
Total: 54.01MB
ROUTINE ======================== github.com/ardanlabs/gotraining/topics/go/profiling/memcpu.algOne in /Users/bill/code/go/src/github.com/ardanlabs/gotraining/topics/go/profiling/memcpu/stream.go
       2MB    54.01MB (flat, cum)   100% of Total
         .          .     78:
         .          .     79:// algOne is one way to solve the problem.
         .          .     80:func algOne(data []byte, find []byte, repl []byte, output *bytes.Buffer) {
         .          .     81:
         .          .     82:	// Use a bytes Buffer to provide a stream to process.
         .    52.01MB     83:	input := bytes.NewBuffer(data)
         .          .     84:
         .          .     85:	// The number of bytes we are looking for.
         .          .     86:	size := len(find)
         .          .     87:
         .          .     88:	// Declare the buffers we need to process the stream.
       2MB        2MB     89:	buf := make([]byte, size)
         .          .     90:	end := size - 1
         .          .     91:
         .          .     92:	// Read in an initial number of bytes we need to get started.
         .          .     93:	if n, err := io.ReadFull(input, buf[:end]); err != nil {
         .          .     94:		output.Write(buf[:n])

The allocation for input is not being reported in pprof under flat but the escape analysis shows that the call to bytes.NewBuffer was inlined

$ go build -gcflags "-m -m"
./stream.go:83:26: inlining call to bytes.NewBuffer func([]byte) *bytes.Buffer { return &bytes.Buffer literal }

What did you expect to see?

/Users/bill/code/go/src/github.com/ardanlabs/gotraining/topics/go/profiling/memcpu/stream.go
       2MB    54.01MB (flat, cum)   100% of Total
         .          .     78:
         .          .     79:// algOne is one way to solve the problem.
         .          .     80:func algOne(data []byte, find []byte, repl []byte, output *bytes.Buffer) {
         .          .     81:
         .          .     82:	// Use a bytes Buffer to provide a stream to process.
   52.01MB    52.01MB     83:	input := bytes.NewBuffer(data)
         .          .     84:
         .          .     85:	// The number of bytes we are looking for.
         .          .     86:	size := len(find)
         .          .     87:
         .          .     88:	// Declare the buffers we need to process the stream.
       2MB        2MB     89:	buf := make([]byte, size)
         .          .     90:	end := size - 1
         .          .     91:
         .          .     92:	// Read in an initial number of bytes we need to get started.
         .          .     93:	if n, err := io.ReadFull(input, buf[:end]); err != nil {
         .          .     94:		output.Write(buf[:n])
@ianlancetaylor ianlancetaylor changed the title pprof list not accurate in 1.9 cmd/pprof: list not accurate in 1.9 Aug 30, 2017
@ianlancetaylor ianlancetaylor added this to the Go1.10 milestone Aug 30, 2017
@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Aug 30, 2017

@ardan-bkennedy
Copy link
Author

@ardan-bkennedy ardan-bkennedy commented Oct 17, 2017

I am seeing this on linux as well when using a Docker container running Alpine and/or Debian.

What version of Go are you using (go version)?

go version go1.9.1 linux/amd64

What operating system and processor architecture are you using (go env)?

Using Alpine3.6 in a Docker Container

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/go"
GORACE=""
GOROOT="/usr/local/go"
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0"
CXX="g++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"

Code to reproduce the bug?

https://github.com/ardanlabs/gotraining/tree/master/topics/go/profiling/memcpu

What did you do?

$ go test -run none -bench . -benchtime 3s -memprofile mem.out
$ go tool pprof -alloc_space mem.out

File: memcpu.test
Type: alloc_space
Time: Oct 17, 2017 at 11:31pm (UTC)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) list algOne
Total: 325.53MB
ROUTINE ======================== github.com/ardanlabs/gotraining/topics/go/profiling/memcpu.algOne in /go/src/github.com/ardanlabs/gotraining/topics/go/profiling/memcpu/stream.go
  325.53MB   325.53MB (flat, cum)   100% of Total
         .          .      8:// The solution cannot meaningfully buffer to the end of the stream and
         .          .      9:// then process the replacement.
         .          .     10:package main
         .          .     11:
         .          .     12:import (
  310.03MB   310.03MB     13:	"bytes"
         .          .     14:	"fmt"
         .          .     15:	"io"
         .          .     16:)
         .          .     17:
         .          .     18:// data represents a table of input and expected output.
         .          .     19:var data = []struct {
         .          .     20:	input  []byte
         .          .     21:	output []byte
         .          .     22:}{
         .          .     23:	{[]byte("abc"), []byte("abc")},
         .          .     24:	{[]byte("elvis"), []byte("Elvis")},
         .          .     25:	{[]byte("aElvis"), []byte("aElvis")},
         .          .     26:	{[]byte("abcelvis"), []byte("abcElvis")},
         .          .     27:	{[]byte("eelvis"), []byte("eElvis")},
         .          .     28:	{[]byte("aelvis"), []byte("aElvis")},
         .          .     29:	{[]byte("aabeeeelvis"), []byte("aabeeeElvis")},
         .          .     30:	{[]byte("e l v i s"), []byte("e l v i s")},
         .          .     31:	{[]byte("aa bb e l v i saa"), []byte("aa bb e l v i saa")},
         .          .     32:	{[]byte(" elvi s"), []byte(" elvi s")},
         .          .     33:	{[]byte("elvielvis"), []byte("elviElvis")},
         .          .     34:	{[]byte("elvielvielviselvi1"), []byte("elvielviElviselvi1")},
         .          .     35:	{[]byte("elvielviselvis"), []byte("elviElvisElvis")},
         .          .     36:}
         .          .     37:
         .          .     38:// assembleInputStream combines all the input into a
         .          .     39:// single stream for processing.
         .          .     40:func assembleInputStream() []byte {
         .          .     41:	var in []byte
         .          .     42:	for _, d := range data {
         .          .     43:		in = append(in, d.input...)
         .          .     44:	}
         .          .     45:	return in
         .          .     46:}
         .          .     47:
         .          .     48:// assembleOutputStream combines all the output into a
         .          .     49:// single stream for comparing.
         .          .     50:func assembleOutputStream() []byte {
         .          .     51:	var out []byte
         .          .     52:	for _, d := range data {
         .          .     53:		out = append(out, d.output...)
         .          .     54:	}
         .          .     55:	return out
         .          .     56:}
         .          .     57:
         .          .     58:func main() {
         .          .     59:	var output bytes.Buffer
         .          .     60:	in := assembleInputStream()
         .          .     61:	out := assembleOutputStream()
         .          .     62:
         .          .     63:	find := []byte("elvis")
         .          .     64:	repl := []byte("Elvis")
         .          .     65:
         .          .     66:	fmt.Println("=======================================\nRunning Algorithm One")
         .          .     67:	output.Reset()
         .          .     68:	algOne(in, find, repl, &output)
         .          .     69:	matched := bytes.Compare(out, output.Bytes())
         .          .     70:	fmt.Printf("Matched: %v\nInp: [%s]\nExp: [%s]\nGot: [%s]\n", matched == 0, in, out, output.Bytes())
         .          .     71:
         .          .     72:	fmt.Println("=======================================\nRunning Algorithm Two")
         .          .     73:	output.Reset()
         .          .     74:	algTwo(in, find, repl, &output)
         .          .     75:	matched = bytes.Compare(out, output.Bytes())
         .          .     76:	fmt.Printf("Matched: %v\nInp: [%s]\nExp: [%s]\nGot: [%s]\n", matched == 0, in, out, output.Bytes())
         .          .     77:}
         .          .     78:
         .          .     79:// algOne is one way to solve the problem.
         .          .     80:func algOne(data []byte, find []byte, repl []byte, output *bytes.Buffer) {
         .          .     81:
         .          .     82:	// Use a bytes Buffer to provide a stream to process.
         .          .     83:	input := bytes.NewBuffer(data)
         .          .     84:
         .          .     85:	// The number of bytes we are looking for.
         .          .     86:	size := len(find)
         .          .     87:
         .          .     88:	// Declare the buffers we need to process the stream.
   15.50MB    15.50MB     89:	buf := make([]byte, size)
         .          .     90:	end := size - 1
         .          .     91:
         .          .     92:	// Read in an initial number of bytes we need to get started.
         .          .     93:	if n, err := io.ReadFull(input, buf[:end]); err != nil {
         .          .     94:		output.Write(buf[:n])
(pprof)

What did you expect to see?

I expected to only see algOne

  325.53MB   325.53MB (flat, cum)   100% of Total

         .          .     79:// algOne is one way to solve the problem.
         .          .     80:func algOne(data []byte, find []byte, repl []byte, output *bytes.Buffer) {
         .          .     81:
         .          .     82:	// Use a bytes Buffer to provide a stream to process.
         .          .     83:	input := bytes.NewBuffer(data)
         .          .     84:
         .          .     85:	// The number of bytes we are looking for.
         .          .     86:	size := len(find)
         .          .     87:
         .          .     88:	// Declare the buffers we need to process the stream.
   15.50MB    15.50MB     89:	buf := make([]byte, size)
         .          .     90:	end := size - 1
         .          .     91:
         .          .     92:	// Read in an initial number of bytes we need to get started.
         .          .     93:	if n, err := io.ReadFull(input, buf[:end]); err != nil {
         .          .     94:		output.Write(buf[:n])

I also expected to see the allocations (both Flat and Cum) on line 83

$ go build -gcflags "-m -m"

./stream.go:83:26: inlining call to bytes.NewBuffer func([]byte) *bytes.Buffer { return &bytes.Buffer literal }

./stream.go:83:26: &bytes.Buffer literal escapes to heap
./stream.go:83:26: 	from ~r0 (assign-pair) at ./stream.go:83:26
./stream.go:83:26: 	from input (assigned) at ./stream.go:83:8
./stream.go:83:26: 	from input (interface-converted) at ./stream.go:93:26
./stream.go:83:26: 	from input (passed to call[argument escapes]) at ./stream.go:93:26
@aclements
Copy link
Member

@aclements aclements commented Oct 27, 2017

This looks like bad line number information, probably from bytes.NewBuffer getting inlined across a package boundary. @mdempsky

@ardan-bkennedy, can you try this with current master? It's possible this has already been fixed.

@ardan-bkennedy
Copy link
Author

@ardan-bkennedy ardan-bkennedy commented Oct 27, 2017

I tried with 1.9.2 this evening and the cum number showed but not the flat. That was on Darwin. I’ll try master this weekend. Thank You!

@ardan-bkennedy
Copy link
Author

@ardan-bkennedy ardan-bkennedy commented Oct 30, 2017

@aclements I just tried with master and the same thing. The allocation does not show up in the flat column for line 83.

$ go version
go version devel +4d0151ede5 Mon Oct 30 13:56:39 2017 +0000 darwin/amd64
Type: alloc_space
Time: Oct 30, 2017 at 12:16pm (EDT)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) list algOne
Total: 46.50MB
ROUTINE ======================== github.com/ardanlabs/gotraining/topics/go/profiling/memcpu.algOne in /Users/bill/code/go/src/github.com/ardanlabs/gotraining/topics/go/profiling/memcpu/stream.go
    3.50MB    46.50MB (flat, cum)   100% of Total
         .          .     78:
         .          .     79:// algOne is one way to solve the problem.
         .          .     80:func algOne(data []byte, find []byte, repl []byte, output *bytes.Buffer) {
         .          .     81:
         .          .     82:	// Use a bytes Buffer to provide a stream to process.
         .       43MB     83:	input := bytes.NewBuffer(data)
         .          .     84:
         .          .     85:	// The number of bytes we are looking for.
         .          .     86:	size := len(find)
         .          .     87:
         .          .     88:	// Declare the buffers we need to process the stream.
    3.50MB     3.50MB     89:	buf := make([]byte, size)
         .          .     90:	end := size - 1
         .          .     91:
         .          .     92:	// Read in an initial number of bytes we need to get started.
         .          .     93:	if n, err := io.ReadFull(input, buf[:end]); err != nil {
         .          .     94:		output.Write(buf[:n])
(pprof) exit
@ardan-bkennedy
Copy link
Author

@ardan-bkennedy ardan-bkennedy commented Oct 30, 2017

It does look like the bug of only showing the code in question has been fixed!

@aclements
Copy link
Member

@aclements aclements commented Oct 31, 2017

Sorry, I think there are multiple issues going on here and I was confused. @ardan-bkennedy, the output you just posted looks 100% correct to me. bytes.NewBuffer is a function call, so it shouldn't be showing anything in the flat column (and it's not). It doesn't matter that the compiler made an internal decision to inline it; the allocation doesn't happen on that line, it happens on a line in NewBuffer.

It does look like we fixed the mis-attribution of the allocation to the import line, which was clearly a bug. @ardan-bkennedy, from another issue it sounded like you were interested in having that fixed in 1.9. If so, I'm not positive it passes the bar for back-porting, but if you can bisect down to which commit fixed it (or, @mdempsky, maybe you know off the top of your head?) we could consider it.

@ardan-bkennedy
Copy link
Author

@ardan-bkennedy ardan-bkennedy commented Oct 31, 2017

That call is inlined and I expected it to be flat because in version 1.8.x this was the case. So this is a change in the way the tooling is reporting allocations then? Are you saying it is more acurate to show it this way instead of the old way? I don't care so much about the answer only so I can be accurate when teaching. I have material and videos on 1.8.x that teach the inlining will cause the value to the flat.

@ardan-bkennedy
Copy link
Author

@ardan-bkennedy ardan-bkennedy commented Oct 31, 2017

I checked the release notes for 1.9, it does not document this change in the way allocations are reported. Since the escape analysis report go build -gcflags "-m -m" shows inlining decisions, I would expect those decisions to be relevant to the output.

@aclements
Copy link
Member

@aclements aclements commented Oct 31, 2017

That call is inlined and I expected it to be flat because in version 1.8.x this was the case. So this is a change in the way the tooling is reporting allocations then? Are you saying it is more acurate to show it this way instead of the old way?

Right. Reporting it as flat was a bug in 1.8 caused by Go's lack of inlining information. That was fixed in 1.9 and now the report is accurate (or, at least, it will be accurate in 1.10; clearly attributing the import line was not accurate and we might back-port the fix if we know what it was).

Inlining is an implementation detail. Previously you would have gotten a different report if you compiled with certain debug flags, or if bytes.NewBuffer changed in some minor way that stopped it from being inlined, or if the compiler's inlining heuristics changed. Now Go has accurate inlining information so tools like pprof have access to accurate call stacks and can accurately report allocation sites. This information reflects what's going on at a (stable) language level rather than what's going on inside the (constantly evolving) implementation details.

I have material and videos on 1.8.x that teach the inlining will cause the value to the flat.

Now you can simplify that. :)

@ardan-bkennedy
Copy link
Author

@ardan-bkennedy ardan-bkennedy commented Oct 31, 2017

Perfect! Thanks but I have a question. If I have a function like NewBuffer which can be inlined and the value never escapes the heap, because the calling function does not cause the allocation. Then I would expect to see no allocation at all? I am going to test this.

@ardan-bkennedy
Copy link
Author

@ardan-bkennedy ardan-bkennedy commented Oct 31, 2017

I think this leads to more confusion. I can make the call to bytes.NewBuffer and sometimes it shows a cum allocation and sometimes it doesn't. All based on what the calling function is doing with the return value. I don't think this simplifies anything. Now I have to explain that the value is owned by the calling function becuase of the inlining, but it does not show as flat because it came from the function that was called (which was never really called).

How would you describe what the flat value means in respect to the allocations it shows?

@aclements
Copy link
Member

@aclements aclements commented Oct 31, 2017

I don't think you should involve inlining at all in the explanation. This is purely a question of whether or not there's a heap allocation. The compiler optimizes heap allocations into stack allocations all the time for lots of reasons, and this is just another case of that. The profiler shows you the truth of heap allocations because you want to optimize what actually gets allocated in places where it actually matters, and you shouldn't be guessing about what the compiler will or won't do in subtle cases (that's why there's a profiler :)

How would you describe what the flat value means in respect to the allocations it shows?

The flat column reports bytes of heap allocations that occurred directly on that line of code.

@mdempsky
Copy link
Member

@mdempsky mdempsky commented Oct 31, 2017

@aclements

If so, I'm not positive it passes the bar for back-porting, but if you can bisect down to which commit fixed it (or, @mdempsky, maybe you know off the top of your head?) we could consider it.

The only fix that comes to mind is CL 64470 (7c8a961), but I'd be surprised if that was relevant to a simple bytes.NewBuffer call.

@ardan-bkennedy
Copy link
Author

@ardan-bkennedy ardan-bkennedy commented Oct 31, 2017

I will let it go but I think this is a mistake. The profiler should show the allocations accurately. If the function was inlined, then the allocation should show as flat. In this case, the heap allocations did occur directly on that line of code.

Thanks for everything you do!

@mdempsky
Copy link
Member

@mdempsky mdempsky commented Nov 7, 2017

If the function was inlined, then the allocation should show as flat.

As a compiler developer, I would expect cmd/pprof should report the allocation the same way, whether or not a function was inlined. Put differently, in an ideal world, I expect function inlining to be a purely toolchain-internal decision that users are isolated from. (In the real world this often doesn't happen, but that's at least what we strive for.)

I'm not really a cmd/pprof expert. Are you saying you expect cmd/pprof output to vary depending on whether the function was inlined or not? (I would disagree with this, but then I'd be interested to know why you think that.)

Or are you saying when the function is not inlined, the allocation does show as flat, so it's an error that inlining is changing this? (I would agree this is a bug.)

@ardan-bkennedy
Copy link
Author

@ardan-bkennedy ardan-bkennedy commented Nov 8, 2017

I wanted the profile to show flat numbers based on the decisions made during escape analysis. I believe it makes it easier to reason about the profile. I appreciate that profiling and escape analysis are two separate concerns but they do come together when trying to understand a profile. A profile can only show you what allocated and the escape analysis can tell you why.

When a function is inlined, the value is technically not own by that function anymore, but the calling function. This is why I wanted to see the allocation as flat inside the calling function. Technically it's the calling function that is causing the allocation because the calling function owns the creation of the value (since the function call was inlined).

I just believe it makes the profiles easier to read. This is how it has been for a long time, so this change is confusing.

It would have been nice to see some release notes on this change and to have some discusion on the decision before it was made. The ship has sailed, so no more need to discuss it.

@rsc
Copy link
Contributor

@rsc rsc commented Nov 22, 2017

The new behavior, with accurate call stacks even in the presence of inlining, is correct. Working as intended.

@rsc rsc closed this Nov 22, 2017
@ardan-bkennedy
Copy link
Author

@ardan-bkennedy ardan-bkennedy commented Nov 22, 2017

After talking to Sameer I agree. I’ve adjusted my teaching accordingly. Thank You.

@golang golang locked and limited conversation to collaborators Nov 22, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
6 participants
You can’t perform that action at this time.