Please answer these questions before submitting your issue. Thanks!
What version of Go are you using (go version)?
go version go1.8rc3 darwin/amd64
What operating system and processor architecture are you using (go env)?
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/pmattis/Development/go"
GORACE=""
GOROOT="/Users/pmattis/Development/go-1.8"
GOTOOLDIR="/Users/pmattis/Development/go-1.8/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/qc/fpqpgdqd167c70dtc6840xxh0000gn/T/go-build232974522=/tmp/go-build -gno-record-gcc-switches -fno-common"
CXX="clang++"
CGO_ENABLED="1"
PKG_CONFIG="pkg-config"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
What did you do?
While profiling cockroachdb I noticed runtime.freedefer consuming a surprising amount of time (this is from a 30s profile and, yes, we make a lot of cgo calls):
(pprof) top10
127.59s of 159.49s total (80.00%)
Dropped 1044 nodes (cum <= 0.80s)
Showing top 10 nodes out of 202 (cum >= 1.74s)
flat flat% sum% cum cum%
54.35s 34.08% 34.08% 54.41s 34.11% runtime.cgocall
20.79s 13.04% 47.11% 20.81s 13.05% syscall.Syscall
14.99s 9.40% 56.51% 14.99s 9.40% runtime.kevent
13.04s 8.18% 64.69% 13.04s 8.18% runtime.mach_semaphore_signal
6.83s 4.28% 68.97% 6.83s 4.28% runtime.mach_semaphore_wait
6.80s 4.26% 73.23% 6.80s 4.26% [cockroach]
5.03s 3.15% 76.39% 5.03s 3.15% runtime.usleep
2.17s 1.36% 77.75% 4.79s 3.00% runtime.scanobject
1.85s 1.16% 78.91% 1.93s 1.21% runtime.freedefer
1.74s 1.09% 80.00% 1.74s 1.09% runtime.duffcopy
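(For anyone wanting to poke at this themselves: the numbers above come from a 30-second CPU profile loaded into go tool pprof. A generic in-process way to capture a comparable profile with runtime/pprof, not CockroachDB's actual profiling hookup, would look roughly like this:)

package main

import (
    "os"
    "runtime/pprof"
    "time"
)

func main() {
    // Write the CPU profile to a file that can later be fed to `go tool pprof`.
    f, err := os.Create("cpu.prof")
    if err != nil {
        panic(err)
    }
    defer f.Close()

    if err := pprof.StartCPUProfile(f); err != nil {
        panic(err)
    }
    // Stand-in for ~30 seconds of the real workload being sampled.
    time.Sleep(30 * time.Second)
    pprof.StopCPUProfile()
}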
Examining where the time is going within freedefer shows:
(pprof) list runtime.freedefer
Total: 2.66mins
...
. . 272: })
. . 273: }
1.82s 1.89s 274: *d = _defer{}
20ms 30ms 275: pp.deferpool[sc] = append(pp.deferpool[sc], d)
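(For reference, _defer is defined in the Go 1.8 runtime roughly as follows; this is reproduced from memory for illustration, so comments and exact declaration details may differ from the sources:)

type _defer struct {
    siz     int32    // size of the deferred call's argument frame
    started bool     // whether the deferred call has started executing
    sp      uintptr  // sp at time of defer
    pc      uintptr  // pc at time of defer
    fn      *funcval // the deferred function
    _panic  *_panic  // panic that is running this defer, if any
    link    *_defer  // next _defer on the goroutine's list
}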
_defer is a simple structure of 7 fields. How is clearing the structure possibly taking that long? As an experiment, I tweaked this code to "manually" clear each field:
diff --git a/src/runtime/panic.go b/src/runtime/panic.go
index 876bca7..c7cbd3f 100644
--- a/src/runtime/panic.go
+++ b/src/runtime/panic.go
@@ -271,7 +271,14 @@ func freedefer(d *_defer) {
unlock(&sched.deferlock)
})
}
- *d = _defer{}
+ // *d = _defer{}
+ d.siz = 0
+ d.started = false
+ d.sp = 0
+ d.pc = 0
+ d.fn = nil
+ d._panic = nil
+ d.link = nil
pp.deferpool[sc] = append(pp.deferpool[sc], d)
}
}
With this change, freedefer consumes 110ms of time for the exact same workload.
Is this a real problem, or is there some sort of profiling oddity that is incorrectly pointing the blame at the *d = _defer{} line? It seems like something real, as the above change produces a small improvement on BenchmarkDefer:
name old time/op new time/op delta
Defer-8 51.2ns ± 1% 50.1ns ± 1% -2.13% (p=0.000 n=19+20)
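(BenchmarkDefer is the runtime package's own micro-benchmark; from memory it looks roughly like the following, so treat this as a sketch rather than an exact copy of runtime_test.go:)

// In package runtime_test; requires the standard "testing" import.
func BenchmarkDefer(b *testing.B) {
    for i := 0; i < b.N; i++ {
        defer1()
    }
}

// defer1 defers a small three-argument closure and returns, exercising
// deferproc/deferreturn and therefore newdefer/freedefer on every call.
func defer1() {
    defer func(x, y, z int) {
        if recover() != nil || x != 1 || y != 2 || z != 3 {
            panic("bad recover")
        }
    }(1, 2, 3)
}

The old/new comparison above is in benchstat's output format, presumably gathered with something like go test -run=NONE -bench=Defer -count=20 runtime against both trees.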
The above diff is still doing too much work, as many of the fields are already clear or will be overwritten by the caller of newdefer:
diff --git a/src/runtime/panic.go b/src/runtime/panic.go
index 876bca7..39db94d 100644
--- a/src/runtime/panic.go
+++ b/src/runtime/panic.go
@@ -271,7 +271,8 @@ func freedefer(d *_defer) {
unlock(&sched.deferlock)
})
}
- *d = _defer{}
+ d.started = false
+ d.link = nil
pp.deferpool[sc] = append(pp.deferpool[sc], d)
}
}
Which results in:
name old time/op new time/op delta
Defer-8 51.2ns ± 1% 49.2ns ± 1% -4.01% (p=0.000 n=19+20)
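(For context on why clearing only started and link should be safe: newdefer itself sets siz and splices the _defer onto the goroutine's chain, and deferproc unconditionally overwrites sp, pc, and fn before the recycled _defer is used again. From memory, the relevant part of deferproc in Go 1.8 looks roughly like this, with the argument-copying details elided:)

func deferproc(siz int32, fn *funcval) { // arguments of fn follow fn
    // ... caller sp/pc recovery elided ...
    d := newdefer(siz) // sets d.siz and links d into the goroutine's defer chain
    if d._panic != nil {
        throw("deferproc: d.panic != nil after newdefer")
    }
    d.fn = fn
    d.pc = callerpc
    d.sp = sp
    // ... copying of fn's arguments into the defer frame elided ...
    return0()
}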
Despite the repeatability of the above, I'm still dubious about this change, as I don't have any explanation for why it makes a difference.