
runtime: CPU instruction "NOP" utilizes ~50% CPU and pprof doesn't explain that #30708

Open
xaionaro opened this Issue Mar 9, 2019 · 1 comment


xaionaro commented Mar 9, 2019

What version of Go are you using (go version)?

$ go version
go version go1.12 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/xaionaro/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/xaionaro/go"
GOPROXY=""
GORACE=""
GOROOT="/home/xaionaro/.gimme/versions/go1.12.linux.amd64"
GOTMPDIR=""
GOTOOLDIR="/home/xaionaro/.gimme/versions/go1.12.linux.amd64/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build465181040=/tmp/go-build -gno-record-gcc-switches"

What did you do?

go get github.com/xaionaro-go/atomicmap
cd "$(go env GOPATH)"/src/github.com/xaionaro-go/atomicmap
git pull # just in case
git checkout performance_experiments
go get ./...
go test ./ -bench=Benchmark_atomicmap_Get_intKeyType_blockSize16777216_keyAmount1048576_trueThreadSafety -benchmem -benchtime 5s -timeout 60s -cpuprofile /tmp/cpu.prof
go tool pprof /tmp/cpu.prof
(pprof) web
(pprof) web increaseReadersStage0Sub0
(pprof) web increaseReadersStage0Sub0Sub0
(pprof) web increaseReadersStage0Sub0Sub1
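For readers without the atomicmap checkout, here is a minimal, self-contained sketch that writes a comparable CPU profile using the standard runtime/pprof package. The workload and the names hotCounter and runContended are hypothetical and are not taken from atomicmap. The workload has many goroutines atomically incrementing and decrementing one shared counter, loosely mirroring the LOCK XADDL at storage.go:102 in the reported disassembly. The resulting file can be opened with go tool pprof in the same way as above.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"runtime"
	"runtime/pprof"
	"sync"
	"sync/atomic"
)

// hotCounter is a hypothetical stand-in for the reader counter that
// atomicmap's storageItem increments; it is NOT atomicmap's actual type.
type hotCounter struct {
	readers int32
}

// runContended starts one goroutine per P. Each goroutine performs iters
// increment/decrement pairs on the same counter, so every atomic operation
// contends for the same cache line. It returns the final counter value.
func runContended(c *hotCounter, iters int) int32 {
	var wg sync.WaitGroup
	for g := 0; g < runtime.GOMAXPROCS(0); g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < iters; i++ {
				atomic.AddInt32(&c.readers, 1)
				atomic.AddInt32(&c.readers, -1)
			}
		}()
	}
	wg.Wait()
	// Every increment is paired with a decrement, so this is 0.
	return atomic.LoadInt32(&c.readers)
}

func main() {
	// On Linux with TMPDIR unset this is /tmp/cpu.prof, as in the report.
	path := filepath.Join(os.TempDir(), "cpu.prof")
	f, err := os.Create(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer f.Close()

	if err := pprof.StartCPUProfile(f); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	final := runContended(&hotCounter{}, 5000000)
	pprof.StopCPUProfile()

	fmt.Println("final counter value:", final, "profile written to", path)
}
```

Running go tool pprof on the written file and using disasm runContended.func1 should show where the samples land around the locked instructions. Whether they land on the same kind of instruction as in this report is not verified here.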

What did you expect to see?

I expected to see some information about what is actually using the CPU.

What did you see instead?

I see an effectively empty function (one that does nothing by itself) that accounts for ~50% of CPU time. More specifically, the CPU time is attributed to a "NOPL" instruction, which makes no sense.

[Three pprof screenshots taken 2019-03-10 (00-57-55, 00-58-22, 00-58-37); images not reproduced.]

The method:

func (slot *storageItem) increaseReadersStage0Sub0() isSet {
    slot.increaseReadersStage0Sub0Sub0()
    return slot.increaseReadersStage0Sub0Sub1()
}

According to pprof (see the screenshots), neither of these sub-calls accounts for any significant CPU time, yet the method itself accounts for about 50% of it.

I split these functions this way intentionally to demonstrate the problem. The problem persists if I remove these extra call levels, even if I manually inline the code into the getByHashValue method.
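The bodies of increaseReadersStage0Sub0Sub0 and increaseReadersStage0Sub0Sub1 are not quoted in this report. The disassembly further down shows a LOCK XADDL of 1 into offset 0x4 of the item at storage.go:102, and a plain 32-bit load from offset 0 inside IsSet at storage.go:21/41. Judging only from that, the callees plausibly look something like the hypothetical sketch below. The field names, the underlying type of isSet, and the use of sync/atomic for the load are guesses, not the actual atomicmap source.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// isSet mirrors the return type named in the report; its underlying type
// (uint32) is an assumption based on the 32-bit MOVL load in the disasm.
type isSet uint32

// storageItem is a HYPOTHETICAL layout: a 32-bit state at offset 0 and a
// 32-bit reader counter at offset 4, matching the offsets in the disasm.
type storageItem struct {
	state   isSet // offset 0x0, read by IsSet
	readers int32 // offset 0x4, target of LOCK XADDL
}

// increaseReadersStage0Sub0Sub0 atomically registers one more reader,
// which compiles to LOCK XADDL on amd64.
func (slot *storageItem) increaseReadersStage0Sub0Sub0() {
	atomic.AddInt32(&slot.readers, 1)
}

// IsSet performs a 32-bit load of the state. atomic.LoadUint32 compiles to
// a plain MOVL on amd64; whether the real code uses it is unknown.
func (slot *storageItem) IsSet() isSet {
	return isSet(atomic.LoadUint32((*uint32)(&slot.state)))
}

func (slot *storageItem) increaseReadersStage0Sub0Sub1() isSet {
	return slot.IsSet()
}

// increaseReadersStage0Sub0 matches the method quoted in the report. After
// inlining, its only real work is one atomic add followed by one load.
func (slot *storageItem) increaseReadersStage0Sub0() isSet {
	slot.increaseReadersStage0Sub0Sub0()
	return slot.increaseReadersStage0Sub0Sub1()
}

func main() {
	slot := &storageItem{state: 1}
	fmt.Println(slot.increaseReadersStage0Sub0(), slot.readers)
}
```

If this reconstruction is roughly right, the "empty" method does not do anything by itself. It does, however, contain the only locked read-modify-write on the hot path once everything is inlined.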

         .          .    107:func (slot *storageItem) increaseReadersStage0Sub0() isSet {
         .       30ms    108:   slot.increaseReadersStage0Sub0Sub0()
     9.05s      9.10s    109:   return slot.increaseReadersStage0Sub0Sub1()
         .          .    110:}

Another attempt gives the same result. In this version I split the return statement into two lines and replaced the isSet return type with uint32:

         .          .    107:func (slot *storageItem) increaseReadersStage0Sub0() uint32 {
         .       10ms    108:   slot.increaseReadersStage0Sub0Sub0()
     8.16s      8.89s    109:   r := slot.increaseReadersStage0Sub0Sub1()
         .          .    110:   return r
         .          .    111:}
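A further experiment, not performed in the report, is to keep the helpers out of line with the //go:noinline compiler directive. Each helper then gets a real stack frame, and pprof's per-line and per-function figures no longer depend on inline marks. The sketch below uses hypothetical names and is not atomicmap's code. Whether the ~50% would then show up under the function doing the atomic add is an open question, not a verified result.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// item is a hypothetical stand-in for atomicmap's storageItem.
type item struct {
	flag    uint32
	readers int32
}

// addReader is kept out of line, so a CPU profile can attribute its samples
// to this function's own frame rather than to an inline mark in the caller.
//
//go:noinline
func (it *item) addReader() {
	atomic.AddInt32(&it.readers, 1)
}

// loadFlag is also kept out of line for the same reason.
//
//go:noinline
func (it *item) loadFlag() uint32 {
	return atomic.LoadUint32(&it.flag)
}

// stage0Sub0 mirrors the shape of increaseReadersStage0Sub0 in the report.
func (it *item) stage0Sub0() uint32 {
	it.addReader()
	return it.loadFlag()
}

func main() {
	it := &item{flag: 1}
	var last uint32
	for i := 0; i < 1000000; i++ {
		last = it.stage0Sub0()
	}
	fmt.Println(last, atomic.LoadInt32(&it.readers))
}
```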

The disasm output:

ROUTINE ======================== github.com/xaionaro-go/atomicmap.(*storageItem).increaseReaders
     4.61s     14.29s (flat, cum) 191.04% of Total
         .          .     4bb370: MOVQ FS:0xfffffff8, CX                  ;storage.go:146
      10ms       10ms     4bb379: CMPQ 0x10(CX), SP                       ;github.com/xaionaro-go/atomicmap.(*storageItem).increaseReaders storage.go:146
         .          .     4bb37d: JBE 0x4bb3ef                            ;storage.go:146
         .          .     4bb37f: SUBQ $0x18, SP
         .          .     4bb383: MOVQ BP, 0x10(SP)
         .          .     4bb388: LEAQ 0x10(SP), BP
      10ms      4.59s     4bb38d: NOPL                                    ;github.com/xaionaro-go/atomicmap.(*storageItem).increaseReaders storage.go:147
         .      4.58s     4bb38e: NOPL                                    ;github.com/xaionaro-go/atomicmap.(*storageItem).increaseReadersStage0 storage.go:113
         .       10ms     4bb38f: NOPL                                    ;github.com/xaionaro-go/atomicmap.(*storageItem).increaseReadersStage0Sub0 storage.go:108
         .          .     4bb390: MOVL $0x1, AX                           ;storage.go:147
      10ms       10ms     4bb395: MOVQ 0x20(SP), CX                       ;github.com/xaionaro-go/atomicmap.(*storageItem).increaseReadersStage0Sub0Sub0 storage.go:102
         .          .     4bb39a: LOCK XADDL AX, 0x4(CX)                  ;storage.go:102
     4.07s      4.57s     4bb39f: NOPL                                    ;github.com/xaionaro-go/atomicmap.(*storageItem).increaseReadersStage0Sub0 storage.go:109
     490ms      500ms     4bb3a0: NOPL                                    ;github.com/xaionaro-go/atomicmap.(*storageItem).increaseReadersStage0Sub0Sub1 storage.go:105
      10ms       10ms     4bb3a1: NOPL                                    ;github.com/xaionaro-go/atomicmap.(*storageItem).IsSet storage.go:41
         .          .     4bb3a2: MOVL 0(CX), AX                          ;storage.go:21
         .          .     4bb3a4: TESTL AX, AX                            ;storage.go:117
         .          .     4bb3a6: JNE 0x4bb3d4
         .          .     4bb3a8: MOVL $-0x1, DX                          ;storage.go:118
         .          .     4bb3ad: LOCK XADDL DX, 0x4(CX)
         .          .     4bb3b2: CMPL $0xa, AX                           ;storage.go:148
         .          .     4bb3b5: JE 0x4bb3c5
      10ms       10ms     4bb3b7: MOVL AX, 0x28(SP)                       ;github.com/xaionaro-go/atomicmap.(*storageItem).increaseReaders storage.go:151
         .          .     4bb3bb: MOVQ 0x10(SP), BP                       ;storage.go:151
         .          .     4bb3c0: ADDQ $0x18, SP
         .          .     4bb3c4: RET
         .          .     4bb3c5: MOVQ CX, 0(SP)                          ;storage.go:149
         .          .     4bb3c9: CALL github.com/xaionaro-go/atomicmap.(*storageItem).increaseReadersStage1(SB)
         .          .     4bb3ce: MOVL 0x8(SP), AX
         .          .     4bb3d2: JMP 0x4bb3b7
         .          .     4bb3d4: CMPL $0x1, AX                           ;storage.go:115
         .          .     4bb3d7: JE 0x4bb3b2
         .          .     4bb3d9: CMPL $0x4, AX                           ;storage.go:117
         .          .     4bb3dc: JE 0x4bb3a8
         .          .     4bb3de: MOVL $-0x1, DX                          ;storage.go:121
         .          .     4bb3e3: LOCK XADDL DX, 0x4(CX)
         .          .     4bb3e8: MOVL $0xa, AX
         .          .     4bb3ed: JMP 0x4bb3b2                            ;storage.go:147
         .          .     4bb3ef: CALL runtime.morestack_noctxt(SB)       ;storage.go:146
         .          .     4bb3f4: ?
         .          .     4bb3f5: JA 0x4bb3f6
         .          .     4bb3f7: ?
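One hypothesis worth testing, which this report does not establish, concerns how samples are attributed. A profiling signal that arrives while a slow instruction is completing may be recorded at the instruction after it. Here that would be the NOPL that immediately follows LOCK XADDL. A minimal sketch for checking whether the atomic itself is expensive under contention is shown below. It times the same atomic increment/decrement pair on a single shared counter versus per-goroutine counters kept on separate cache lines. The names are hypothetical, and the 64-byte cache-line size is an assumption for typical amd64 CPUs.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
	"time"
)

// paddedCounter is padded to 64 bytes (an assumed cache-line size), so the
// n fields of adjacent counters in a slice land on different cache lines.
type paddedCounter struct {
	n int32
	_ [60]byte
}

// hammer runs workers goroutines, each doing iters atomic increment/decrement
// pairs. With shared=true all workers use counters[0]; otherwise worker w uses
// counters[w]. It returns the elapsed time and the sum of all counters, which
// is always 0 once every pair has completed.
func hammer(counters []paddedCounter, workers, iters int, shared bool) (time.Duration, int32) {
	var wg sync.WaitGroup
	start := time.Now()
	for w := 0; w < workers; w++ {
		c := &counters[0]
		if !shared {
			c = &counters[w]
		}
		wg.Add(1)
		go func(c *paddedCounter) {
			defer wg.Done()
			for i := 0; i < iters; i++ {
				atomic.AddInt32(&c.n, 1)
				atomic.AddInt32(&c.n, -1)
			}
		}(c)
	}
	wg.Wait()
	elapsed := time.Since(start)

	var sum int32
	for i := range counters {
		sum += atomic.LoadInt32(&counters[i].n)
	}
	return elapsed, sum
}

func main() {
	workers := runtime.GOMAXPROCS(0)
	const iters = 2000000

	sharedDur, _ := hammer(make([]paddedCounter, workers), workers, iters, true)
	privateDur, _ := hammer(make([]paddedCounter, workers), workers, iters, false)

	fmt.Printf("shared counter:   %v\n", sharedDur)
	fmt.Printf("private counters: %v\n", privateDur)
}
```

A large gap between the two timings would suggest that the contended atomic accounts for the cost, whichever instruction the profiler happens to charge it to. A small gap would point elsewhere.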

xaionaro changed the title from "Something (effectively empty method) utilizes ~50% CPU and pprof doesn't show what it is" to "NOPL utilizes ~50% CPU and pprof explain that" on Mar 10, 2019

xaionaro changed the title from "NOPL utilizes ~50% CPU and pprof explain that" to "Instruction "NOPL" utilizes ~50% CPU and pprof explain that" on Mar 10, 2019

xaionaro changed the title from "Instruction "NOPL" utilizes ~50% CPU and pprof explain that" to "CPU instruction "NOPL" utilizes ~50% CPU and pprof doesn't explain that" on Mar 10, 2019

xaionaro changed the title from "CPU instruction "NOPL" utilizes ~50% CPU and pprof doesn't explain that" to "CPU instruction "NOP" utilizes ~50% CPU and pprof doesn't explain that" on Mar 10, 2019


xaionaro commented Mar 10, 2019

Just in case: the same problem occurs if I remove the LOCK prefix from the increment (the LOCK XADDL at storage.go:102 becomes a plain INCL):

ROUTINE ======================== github.com/xaionaro-go/atomicmap.(*storageItem).increaseReaders
     4.48s     13.27s (flat, cum) 175.99% of Total
         .          .     4bb370: MOVQ FS:0xfffffff8, CX                  ;storage.go:147
      10ms       10ms     4bb379: CMPQ 0x10(CX), SP                       ;github.com/xaionaro-go/atomicmap.(*storageItem).increaseReaders storage.go:147
         .          .     4bb37d: JBE 0x4bb3e8                            ;storage.go:147
         .          .     4bb37f: SUBQ $0x18, SP
         .          .     4bb383: MOVQ BP, 0x10(SP)
      10ms       10ms     4bb388: LEAQ 0x10(SP), BP                       ;github.com/xaionaro-go/atomicmap.(*storageItem).increaseReaders storage.go:147
      20ms      4.42s     4bb38d: NOPL                                    ;github.com/xaionaro-go/atomicmap.(*storageItem).increaseReaders storage.go:148
         .      4.39s     4bb38e: NOPL                                    ;github.com/xaionaro-go/atomicmap.(*storageItem).increaseReadersStage0 storage.go:114
         .          .     4bb38f: NOPL                                    ;storage.go:109
         .          .     4bb390: MOVQ 0x20(SP), AX                       ;storage.go:148
         .          .     4bb395: INCL 0x4(AX)                            ;storage.go:102
     4.39s      4.39s     4bb398: NOPL                                    ;github.com/xaionaro-go/atomicmap.(*storageItem).increaseReadersStage0Sub0 storage.go:110
         .          .     4bb399: NOPL                                    ;storage.go:106
         .          .     4bb39a: NOPL                                    ;storage.go:41
         .          .     4bb39b: MOVL 0(AX), CX                          ;storage.go:21
      10ms       10ms     4bb39d: TESTL CX, CX                            ;github.com/xaionaro-go/atomicmap.(*storageItem).increaseReadersStage0 storage.go:118
         .          .     4bb39f: JNE 0x4bb3cd                            ;storage.go:118
         .          .     4bb3a1: MOVL $-0x1, DX                          ;storage.go:119
         .          .     4bb3a6: LOCK XADDL DX, 0x4(AX)
         .          .     4bb3ab: CMPL $0xa, CX                           ;storage.go:149
         .          .     4bb3ae: JE 0x4bb3be
         .          .     4bb3b0: MOVL CX, 0x28(SP)                       ;storage.go:152
         .          .     4bb3b4: MOVQ 0x10(SP), BP
         .          .     4bb3b9: ADDQ $0x18, SP
      40ms       40ms     4bb3bd: RET                                     ;github.com/xaionaro-go/atomicmap.(*storageItem).increaseReaders storage.go:152
         .          .     4bb3be: MOVQ AX, 0(SP)                          ;storage.go:150
         .          .     4bb3c2: CALL github.com/xaionaro-go/atomicmap.(*storageItem).increaseReadersStage1(SB)
         .          .     4bb3c7: MOVL 0x8(SP), CX
         .          .     4bb3cb: JMP 0x4bb3b0
         .          .     4bb3cd: CMPL $0x1, CX                           ;storage.go:116
         .          .     4bb3d0: JE 0x4bb3ab
         .          .     4bb3d2: CMPL $0x4, CX                           ;storage.go:118
         .          .     4bb3d5: JE 0x4bb3a1
         .          .     4bb3d7: MOVL $-0x1, DX                          ;storage.go:122
         .          .     4bb3dc: LOCK XADDL DX, 0x4(AX)
         .          .     4bb3e1: MOVL $0xa, CX
         .          .     4bb3e6: JMP 0x4bb3ab                            ;storage.go:148
         .          .     4bb3e8: CALL runtime.morestack_noctxt(SB)       ;storage.go:147
         .          .     4bb3ed: ?

ALTree changed the title from "CPU instruction "NOP" utilizes ~50% CPU and pprof doesn't explain that" to "runtime: CPU instruction "NOP" utilizes ~50% CPU and pprof doesn't explain that" on Mar 10, 2019

ALTree added this to the Go1.13 milestone on Mar 10, 2019
