runtime: freedefer performance oddity #18923

@petermattis

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

go version go1.8rc3 darwin/amd64

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/pmattis/Development/go"
GORACE=""
GOROOT="/Users/pmattis/Development/go-1.8"
GOTOOLDIR="/Users/pmattis/Development/go-1.8/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/qc/fpqpgdqd167c70dtc6840xxh0000gn/T/go-build232974522=/tmp/go-build -gno-record-gcc-switches -fno-common"
CXX="clang++"
CGO_ENABLED="1"
PKG_CONFIG="pkg-config"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"

What did you do?

While profiling CockroachDB, I noticed runtime.freedefer consuming a surprising amount of time (this is from a 30s profile and, yes, we make a lot of cgo calls):

(pprof) top10
127.59s of 159.49s total (80.00%)
Dropped 1044 nodes (cum <= 0.80s)
Showing top 10 nodes out of 202 (cum >= 1.74s)
      flat  flat%   sum%        cum   cum%
    54.35s 34.08% 34.08%     54.41s 34.11%  runtime.cgocall
    20.79s 13.04% 47.11%     20.81s 13.05%  syscall.Syscall
    14.99s  9.40% 56.51%     14.99s  9.40%  runtime.kevent
    13.04s  8.18% 64.69%     13.04s  8.18%  runtime.mach_semaphore_signal
     6.83s  4.28% 68.97%      6.83s  4.28%  runtime.mach_semaphore_wait
     6.80s  4.26% 73.23%      6.80s  4.26%  [cockroach]
     5.03s  3.15% 76.39%      5.03s  3.15%  runtime.usleep
     2.17s  1.36% 77.75%      4.79s  3.00%  runtime.scanobject
     1.85s  1.16% 78.91%      1.93s  1.21%  runtime.freedefer
     1.74s  1.09% 80.00%      1.74s  1.09%  runtime.duffcopy

Examining where the time is going within freedefer shows:

(pprof) list runtime.freedefer
Total: 2.66mins
...
         .          .    272:			})
         .          .    273:		}
     1.82s      1.89s    274:		*d = _defer{}
      20ms       30ms    275:		pp.deferpool[sc] = append(pp.deferpool[sc], d)

_defer is a simple structure with 7 fields. How can clearing it possibly take that long? As an experiment, I tweaked this code to clear each field "manually":

diff --git a/src/runtime/panic.go b/src/runtime/panic.go
index 876bca7..c7cbd3f 100644
--- a/src/runtime/panic.go
+++ b/src/runtime/panic.go
@@ -271,7 +271,14 @@ func freedefer(d *_defer) {
                                unlock(&sched.deferlock)
                        })
                }
-               *d = _defer{}
+               // *d = _defer{}
+               d.siz = 0
+               d.started = false
+               d.sp = 0
+               d.pc = 0
+               d.fn = nil
+               d._panic = nil
+               d.link = nil
                pp.deferpool[sc] = append(pp.deferpool[sc], d)
        }
 }

With this change, freedefer consumes 110ms for the exact same workload.

Is this a real problem, or is some sort of profile oddity incorrectly pointing blame at the *d = _defer{} line? It seems real, as the above change produces a small improvement on BenchmarkDefer:

name     old time/op  new time/op  delta
Defer-8  51.2ns ± 1%  50.1ns ± 1%  -2.13%  (p=0.000 n=19+20)
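
For anyone who wants to look at the two clearing styles outside the runtime, here is a minimal sketch. The `record` type is a hypothetical stand-in that reuses the field names from the diff above with simplified pointer targets; it is not the runtime's `_defer`, and ordinary user code is not necessarily compiled the same way as runtime code, so the numbers it prints may not reproduce the difference seen in the profile.

```go
package main

import (
	"fmt"
	"testing"
)

// record is a hypothetical stand-in for runtime._defer. The field names
// follow the diff above; the pointer target types are simplified.
type record struct {
	siz     int32
	started bool
	sp      uintptr
	pc      uintptr
	fn      *func()
	_panic  *int
	link    *record
}

// Sample values used to populate the pointer fields before each clear.
var (
	sampleFn    = func() {}
	samplePanic = 1
)

// clearWhole zeroes the record with a single composite-literal assignment.
func clearWhole(r *record) { *r = record{} }

// clearFields zeroes the record one field at a time.
func clearFields(r *record) {
	r.siz = 0
	r.started = false
	r.sp = 0
	r.pc = 0
	r.fn = nil
	r._panic = nil
	r.link = nil
}

// fill sets every field to a non-zero value so each clear has work to do.
func fill(r, next *record) {
	r.siz = 8
	r.started = true
	r.sp = 0x1000
	r.pc = 0x2000
	r.fn = &sampleFn
	r._panic = &samplePanic
	r.link = next
}

// bench times repeated fill+clear cycles using the given clear function.
func bench(clear func(*record)) testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) {
		r, next := new(record), new(record)
		for i := 0; i < b.N; i++ {
			fill(r, next)
			clear(r)
		}
	})
}

func main() {
	fmt.Println("whole-struct:", bench(clearWhole))
	fmt.Println("per-field:   ", bench(clearFields))
}
```

Both clear functions are called through a function value, so the measured time includes the indirect call as well as the clear itself; treat the output as a rough comparison only.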

The per-field diff above still does too much work, as many of the fields are already clear or will be overwritten by the caller of newdefer:

diff --git a/src/runtime/panic.go b/src/runtime/panic.go
index 876bca7..39db94d 100644
--- a/src/runtime/panic.go
+++ b/src/runtime/panic.go
@@ -271,7 +271,8 @@ func freedefer(d *_defer) {
                                unlock(&sched.deferlock)
                        })
                }
-               *d = _defer{}
+               d.started = false
+               d.link = nil
                pp.deferpool[sc] = append(pp.deferpool[sc], d)
        }
 }

Which results in:

name     old time/op  new time/op  delta
Defer-8  51.2ns ± 1%  49.2ns ± 1%  -4.01%  (p=0.000 n=19+20)

Despite the repeatability of the above, I'm still dubious about this change, as I don't have any explanation for why it makes a difference.
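
The reasoning behind the two-field reset can be sketched with a toy free list. Everything here (`node`, `pool`, `get`, `put`) is hypothetical and only mirrors the shape of the idea: the release path resets the fields that the acquire path does not set, and the acquire path overwrites the rest. Which runtime fields fall into which group is an assumption taken from the second diff, not something this sketch verifies.

```go
package main

import "fmt"

// node is a hypothetical pooled object with a subset of the _defer fields.
type node struct {
	siz     int32
	started bool
	sp, pc  uintptr
	link    *node
}

// pool is a simple LIFO free list of nodes.
type pool struct{ free []*node }

// get pops a node from the free list (or allocates one) and overwrites
// every field the caller owns, so stale values in those fields are harmless.
func (p *pool) get(siz int32, sp, pc uintptr) *node {
	var n *node
	if k := len(p.free); k > 0 {
		n = p.free[k-1]
		p.free[k-1] = nil
		p.free = p.free[:k-1]
	} else {
		n = new(node)
	}
	n.siz, n.sp, n.pc = siz, sp, pc
	return n
}

// put resets only the fields that get does not overwrite, then returns
// the node to the free list.
func (p *pool) put(n *node) {
	n.started = false
	n.link = nil
	p.free = append(p.free, n)
}

func main() {
	var p pool
	a := p.get(8, 0x1000, 0x2000)
	a.started = true
	a.link = new(node)
	p.put(a)

	b := p.get(16, 0x3000, 0x4000)
	// The node is reused, and no stale state from its previous use is visible.
	fmt.Println(a == b, b.started, b.link == nil, b.siz) // prints: true false true 16
}
```

The cost of this design is an invariant split across two functions: if get ever stops setting one of its fields, put has to start clearing it.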
