Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/pprof: weblist gives wrong flat count #52000

Open
fumin opened this issue Mar 29, 2022 · 0 comments
Open

cmd/pprof: weblist gives wrong flat count #52000

fumin opened this issue Mar 29, 2022 · 0 comments
Labels
NeedsInvestigation
Milestone

Comments

@fumin
Copy link

@fumin fumin commented Mar 29, 2022

What version of Go are you using (go version)?

$ go version
go version go1.17.6 windows/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
set GO111MODULE=
set GOARCH=amd64
set GOBIN=
set GOCACHE=C:\Users\a3367\AppData\Local\go-build
set GOENV=C:\Users\a3367\AppData\Roaming\go\env
set GOEXE=.exe
set GOEXPERIMENT=
set GOFLAGS=
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOINSECURE=
set GOMODCACHE=C:\Users\a3367\go\pkg\mod
set GONOPROXY=
set GONOSUMDB=
set GOOS=windows
set GOPATH=C:\Users\a3367\go
set GOPRIVATE=
set GOPROXY=https://proxy.golang.org,direct
set GOROOT=C:\Program Files\Go
set GOSUMDB=sum.golang.org
set GOTMPDIR=
set GOTOOLDIR=C:\Program Files\Go\pkg\tool\windows_amd64
set GOVCS=
set GOVERSION=go1.17.6
set GCCGO=gccgo
set AR=ar
set CC=gcc
set CXX=g++
set CGO_ENABLED=1
set GOMOD=NUL
set CGO_CFLAGS=-g -O2
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-g -O2
set CGO_FFLAGS=-g -O2
set CGO_LDFLAGS=-g -O2
set PKG_CONFIG=pkg-config
set GOGCCFLAGS=-m64 -mthreads -fmessage-length=0 -fdebug-prefix-map=C:\Users\a3367\AppData\Local\Temp\go-build2334633044=/tmp/go-build -gno-record-gcc-switches
gdb --version: GNU gdb (GDB) 8.1

What did you do?

I profiled the standard library's png.Encode function and looked at this attached profile:

png_encode.zip

I then ran both weblist and list on the png.filter function.

What did you expect to see?

I expect both weblist and list to show the same result.

What did you see instead?

I saw list showing the correct result, with the sum of the flat time of png.abs8 in png.filter tallying with the total time spent on png.abs8.
In particular, the below log shows that png.abs8 took 1.56 seconds in total.

PS C:\Users\a3367\work\misc\seg\output> go tool pprof .\png_encode.pprof
Type: cpu
Time: Mar 29, 2022 at 1:28pm (CST)
Duration: 31.43s, Total samples = 17.53s (55.77%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 16.92s, 96.52% of 17.53s total
Dropped 91 nodes (cum <= 0.09s)
Showing top 10 nodes out of 29
      flat  flat%   sum%        cum   cum%
     7.96s 45.41% 45.41%     13.07s 74.56%  image/png.filter
        2s 11.41% 56.82%      2.01s 11.47%  image/png.abs
     1.74s  9.93% 66.74%      1.74s  9.93%  runtime.memmove
     1.56s  8.90% 75.64%      1.56s  8.90%  image/png.abs8
     1.44s  8.21% 83.86%      3.46s 19.74%  image/png.paeth
     0.90s  5.13% 88.99%      0.90s  5.13%  hash/adler32.update
     0.80s  4.56% 93.55%      0.81s  4.62%  compress/flate.(*deflateFast).matchLen
     0.29s  1.65% 95.21%      0.31s  1.77%  compress/flate.(*huffmanEncoder).bitCounts
     0.12s  0.68% 95.89%      0.12s  0.68%  runtime.asyncPreempt
     0.11s  0.63% 96.52%      1.04s  5.93%  compress/flate.(*deflateFast).encode
(pprof)

Adding the time spent on png.abs8 in png.filter also shows the same 1.56 seconds.

(pprof) list png.filter
Total: 17.53s
ROUTINE ======================== image/png.filter in C:\Program Files\Go\src\image\png\writer.go
     7.96s     13.07s (flat, cum) 74.56% of Total
         .          .    202:   return len(b), nil
         .          .    203:}
         .          .    204:
         .          .    205:// Chooses the filter to use for encoding the current row, and applies it.
         .          .    206:// The return value is the index of the filter and also of the row in cr that has had it applied.
      10ms       10ms    207:func filter(cr *[nFilter][]byte, pr []byte, bpp int) int {
         .          .    208:   // We try all five filter types, and pick the one that minimizes the sum of absolute differences.
         .          .    209:   // This is the same heuristic that libpng uses, although the filters are attempted in order of
         .          .    210:   // estimated most likely to be minimal (ftUp, ftPaeth, ftNone, ftSub, ftAverage), rather than
         .          .    211:   // in their enumeration order (ftNone, ftSub, ftUp, ftAverage, ftPaeth).
         .          .    212:   cdat0 := cr[0][1:]
     110ms      110ms    213:   cdat1 := cr[1][1:]
         .          .    214:   cdat2 := cr[2][1:]
         .          .    215:   cdat3 := cr[3][1:]
         .          .    216:   cdat4 := cr[4][1:]
         .          .    217:   pdat := pr[1:]
         .          .    218:   n := len(cdat0)
         .          .    219:
         .          .    220:   // The up filter.
         .          .    221:   sum := 0
     310ms      310ms    222:   for i := 0; i < n; i++ {
     1.02s      1.03s    223:           cdat2[i] = cdat0[i] - pdat[i]
     230ms      940ms    224:           sum += abs8(cdat2[i])
         .          .    225:   }
         .          .    226:   best := sum
         .          .    227:   filter := ftUp
         .          .    228:
         .          .    229:   // The Paeth filter.
         .          .    230:   sum = 0
         .          .    231:   for i := 0; i < bpp; i++ {
         .          .    232:           cdat4[i] = cdat0[i] - pdat[i]
         .          .    233:           sum += abs8(cdat4[i])
         .          .    234:   }
     790ms      790ms    235:   for i := bpp; i < n; i++ {
     3.49s      6.99s    236:           cdat4[i] = cdat0[i] - paeth(cdat0[i-bpp], pdat[i], pdat[i-bpp])
     240ms      760ms    237:           sum += abs8(cdat4[i])
     180ms      180ms    238:           if sum >= best {
         .          .    239:                   break
         .          .    240:           }
         .          .    241:   }
      40ms       40ms    242:   if sum < best {
         .          .    243:           best = sum
         .          .    244:           filter = ftPaeth
         .          .    245:   }
         .          .    246:
         .          .    247:   // The none filter.
         .          .    248:   sum = 0
         .          .    249:   for i := 0; i < n; i++ {
         .          .    250:           sum += abs8(cdat0[i])
      10ms       10ms    251:           if sum >= best {
         .          .    252:                   break
         .          .    253:           }
         .          .    254:   }
         .          .    255:   if sum < best {
         .          .    256:           best = sum
         .          .    257:           filter = ftNone
         .          .    258:   }
         .          .    259:
         .          .    260:   // The sub filter.
         .          .    261:   sum = 0
         .          .    262:   for i := 0; i < bpp; i++ {
         .          .    263:           cdat1[i] = cdat0[i]
         .          .    264:           sum += abs8(cdat1[i])
         .          .    265:   }
     190ms      210ms    266:   for i := bpp; i < n; i++ {
     990ms         1s    267:           cdat1[i] = cdat0[i] - cdat0[i-bpp]
      80ms      390ms    268:           sum += abs8(cdat1[i])
     150ms      150ms    269:           if sum >= best {
         .          .    270:                   break
         .          .    271:           }
         .          .    272:   }
         .          .    273:   if sum < best {
         .          .    274:           best = sum
         .          .    275:           filter = ftSub
         .          .    276:   }
         .          .    277:
         .          .    278:   // The average filter.
         .          .    279:   sum = 0
         .          .    280:   for i := 0; i < bpp; i++ {
      70ms       80ms    281:           cdat3[i] = cdat0[i] - pdat[i]/2
         .          .    282:           sum += abs8(cdat3[i])
         .          .    283:   }
         .          .    284:   for i := bpp; i < n; i++ {
      30ms       30ms    285:           cdat3[i] = cdat0[i] - uint8((int(cdat0[i-bpp])+int(pdat[i]))/2)
      10ms       30ms    286:           sum += abs8(cdat3[i])
      10ms       10ms    287:           if sum >= best {
         .          .    288:                   break
         .          .    289:           }
         .          .    290:   }
         .          .    291:   if sum < best {
         .          .    292:           filter = ftAverage
(pprof)

However, weblist shows the wrong result, it seems to exagerate the result.
Pprof listing.zip
For example, on line 268, list shows only 80ms spent, whereas weblist shows 360ms spent.

@cherrymui cherrymui added the NeedsInvestigation label Apr 8, 2022
@cherrymui cherrymui added this to the Backlog milestone Apr 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation
Projects
None yet
Development

No branches or pull requests

2 participants