
runtime: freedefer performance oddity #18923

Closed
petermattis opened this issue Feb 3, 2017 · 9 comments

@petermattis

commented Feb 3, 2017

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

go version go1.8rc3 darwin/amd64

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/pmattis/Development/go"
GORACE=""
GOROOT="/Users/pmattis/Development/go-1.8"
GOTOOLDIR="/Users/pmattis/Development/go-1.8/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/qc/fpqpgdqd167c70dtc6840xxh0000gn/T/go-build232974522=/tmp/go-build -gno-record-gcc-switches -fno-common"
CXX="clang++"
CGO_ENABLED="1"
PKG_CONFIG="pkg-config"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"

What did you do?

While profiling cockroachdb I noticed runtime.freedefer consuming a surprising amount of time (this is from a 30s profile and, yes, we make a lot of cgo calls):

(pprof) top10
127.59s of 159.49s total (80.00%)
Dropped 1044 nodes (cum <= 0.80s)
Showing top 10 nodes out of 202 (cum >= 1.74s)
      flat  flat%   sum%        cum   cum%
    54.35s 34.08% 34.08%     54.41s 34.11%  runtime.cgocall
    20.79s 13.04% 47.11%     20.81s 13.05%  syscall.Syscall
    14.99s  9.40% 56.51%     14.99s  9.40%  runtime.kevent
    13.04s  8.18% 64.69%     13.04s  8.18%  runtime.mach_semaphore_signal
     6.83s  4.28% 68.97%      6.83s  4.28%  runtime.mach_semaphore_wait
     6.80s  4.26% 73.23%      6.80s  4.26%  [cockroach]
     5.03s  3.15% 76.39%      5.03s  3.15%  runtime.usleep
     2.17s  1.36% 77.75%      4.79s  3.00%  runtime.scanobject
     1.85s  1.16% 78.91%      1.93s  1.21%  runtime.freedefer
     1.74s  1.09% 80.00%      1.74s  1.09%  runtime.duffcopy
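For reference, a CPU profile like the one above can be captured in-process with runtime/pprof. This is a minimal sketch of the general technique, not cockroachdb's actual setup; the file name cpu.prof and the stand-in defer-heavy workload are illustrative:

```go
package main

import (
	"fmt"
	"os"
	"runtime/pprof"
)

// profileWorkload runs fn under the CPU profiler and writes the
// profile to path for later inspection with `go tool pprof`.
func profileWorkload(path string, fn func()) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	if err := pprof.StartCPUProfile(f); err != nil {
		return err
	}
	defer pprof.StopCPUProfile()
	fn()
	return nil
}

func main() {
	err := profileWorkload("cpu.prof", func() {
		// Stand-in workload: lots of deferred calls so that
		// runtime.freedefer shows up in the profile.
		for i := 0; i < 1e6; i++ {
			func() { defer func() {}() }()
		}
	})
	fmt.Println("profile written:", err == nil)
}
```

The resulting file can then be examined with go tool pprof and the top10 / list freedefer commands shown here.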

Examining where the time is going within freedefer shows:

(pprof) list runtime.freedefer
Total: 2.66mins
...
         .          .    272:			})
         .          .    273:		}
     1.82s      1.89s    274:		*d = _defer{}
      20ms       30ms    275:		pp.deferpool[sc] = append(pp.deferpool[sc], d)

_defer is a simple structure of 7 fields. How can clearing the structure possibly take that long? As an experiment, I tweaked this code to "manually" clear each field:

diff --git a/src/runtime/panic.go b/src/runtime/panic.go
index 876bca7..c7cbd3f 100644
--- a/src/runtime/panic.go
+++ b/src/runtime/panic.go
@@ -271,7 +271,14 @@ func freedefer(d *_defer) {
                                unlock(&sched.deferlock)
                        })
                }
-               *d = _defer{}
+               // *d = _defer{}
+               d.siz = 0
+               d.started = false
+               d.sp = 0
+               d.pc = 0
+               d.fn = nil
+               d._panic = nil
+               d.link = nil
                pp.deferpool[sc] = append(pp.deferpool[sc], d)
        }
 }

With this change freedefer consumes 110ms of time for the exact same workload.

Is this a real problem, or is there some sort of profile oddity incorrectly pointing blame at the *d = _defer{} line? It seems like something real, as the above change produces a small improvement on BenchmarkDefer:

name     old time/op  new time/op  delta
Defer-8  51.2ns ± 1%  50.1ns ± 1%  -2.13%  (p=0.000 n=19+20)
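For reference, here is a simplified mirror of the seven fields involved, with the two clearing strategies side by side. The field names come from the diff above; the types are approximated stand-ins, not the runtime's actual declarations (the real type lives in the runtime package):

```go
package main

import "fmt"

// deferRecord is a simplified mirror of go1.8's runtime._defer: seven
// fields, three of which (fn, _panic, link) are pointers the garbage
// collector cares about. Types here are stand-ins, not the real ones.
type deferRecord struct {
	siz     int32
	started bool
	sp      uintptr
	pc      uintptr
	fn      *func()   // really *funcval in the runtime
	_panic  *struct{} // really *_panic
	link    *deferRecord
}

// clearBulk mimics *d = _defer{}: one bulk clear of the whole struct.
func clearBulk(d *deferRecord) { *d = deferRecord{} }

// clearFields mimics the diff above: zero each field individually.
func clearFields(d *deferRecord) {
	d.siz = 0
	d.started = false
	d.sp = 0
	d.pc = 0
	d.fn = nil
	d._panic = nil
	d.link = nil
}

func main() {
	a := deferRecord{siz: 8, started: true, sp: 1, pc: 2, link: &deferRecord{}}
	b := a
	clearBulk(&a)
	clearFields(&b)
	fmt.Println(a == b) // both approaches leave the record fully zeroed
}
```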

The above diff is still doing too much work, as many of the fields are already clear or will be overwritten by the caller of newdefer:

diff --git a/src/runtime/panic.go b/src/runtime/panic.go
index 876bca7..39db94d 100644
--- a/src/runtime/panic.go
+++ b/src/runtime/panic.go
@@ -271,7 +271,8 @@ func freedefer(d *_defer) {
                                unlock(&sched.deferlock)
                        })
                }
-               *d = _defer{}
+               d.started = false
+               d.link = nil
                pp.deferpool[sc] = append(pp.deferpool[sc], d)
        }
 }

Which results in:

name     old time/op  new time/op  delta
Defer-8  51.2ns ± 1%  49.2ns ± 1%  -4.01%  (p=0.000 n=19+20)

Despite the repeatability of the above, I'm still dubious about this change, as I don't have any explanation for why it makes a difference.
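The effect can be approximated outside the runtime with a micro-benchmark comparing a bulk struct clear against the two-field clear. This is a rough analogue using a hypothetical stand-in type rec, not the runtime change itself; absolute numbers won't match BenchmarkDefer:

```go
package main

import (
	"fmt"
	"testing"
)

// rec mirrors the shape of _defer (hypothetical stand-in type).
type rec struct {
	siz     int32
	started bool
	sp, pc  uintptr
	fn      *func()
	_panic  *struct{}
	link    *rec
}

var sink rec // package-level sink so the stores cannot be optimized away

func main() {
	// Analogue of *d = _defer{}: bulk-clear the whole struct.
	bulk := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = rec{}
		}
	})
	// Analogue of the two-field version: clear only started and link.
	partial := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink.started = false
			sink.link = nil
		}
	})
	fmt.Printf("bulk: %dns/op  partial: %dns/op\n",
		bulk.NsPerOp(), partial.NsPerOp())
}
```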

@petermattis

Author

commented Feb 3, 2017

My assembly is rusty, but this seems suspicious (from disasm runtime.freedefer of the unmodified go1.8rc3 code):

...
         .          .    402c8a4: MOVQ R10, 0x50(SP)
         .          .    402c8a9: CMPQ R9, R8
         .          .    402c8ac: JE 0x402ca1d
         .          .    402c8b2: LEAQ 0x78(SP), DI
     1.47s      1.47s    402c8b7: XORPS X0, X0
         .          .    402c8ba: ADDQ $-0x10, DI
         .          .    402c8be: MOVQ BP, -0x10(SP)
      10ms       10ms    402c8c3: LEAQ -0x10(SP), BP
         .       10ms    402c8c8: CALL 0x405be00
      10ms       10ms    402c8cd: MOVQ 0(BP), BP
...
@randall77

Contributor

commented Feb 3, 2017

Looks like that's a call to duffzero. The XORPS is just zeroing a source register for duffzero to use.
That instruction couldn't possibly take any time. Unless it is the first use of the SSE registers and they need to be initialized by the kernel somehow?
@TocarIP
It is certainly OK to replace the *d = _defer{} with explicit zeroing. I'd like to understand what is going on first, though.

@bradfitz bradfitz added this to the Go1.9 milestone Feb 3, 2017

@TocarIP

Contributor

commented Feb 3, 2017

I don't think XORPS is guilty. The slow case takes more stack space and has an extra call to duffcopy. I suspect there is some extra copying going on.

@aclements

Member

commented Jun 8, 2017

@randall77, ping.

/cc @josharian

@randall77

Contributor

commented Jun 8, 2017

I cannot reproduce this problem on either linux or darwin. On a simple tight defer benchmark I get about 10% of the time in freedefer, and 0 samples of that are on the XORPS. That all seems normal to me.

I think to make any more progress we're going to need some way to reproduce the issue.

Here's my repro attempt:

package issue18923

import "testing"

//go:noinline
func f() {
	defer func() {}()
}

func Benchmark1(b *testing.B) {
	for i := 0; i < b.N; i++ {
		f()
	}
}

I'm going to punt this to 1.10.

@randall77 randall77 modified the milestones: Go1.10, Go1.9 Jun 8, 2017

@petermattis

Author

commented Jun 12, 2017

This still reproduces for me on both go1.8.3 and tip (b0d592c) on Darwin. This profile is from tip:

(pprof) list freedefer
...
         .          .    273:		}
     900ms      1.24s    274:		*d = _defer{}
      30ms       90ms    275:		pp.deferpool[sc] = append(pp.deferpool[sc], d)
         .          .    276:	}

Is there some other debug data I can provide? I can also provide the exact steps I'm performing.

This might be a clue: disasm freedefer now shows part of the time attributed to runtime.typedmemclr.

         .          .    402e332: MOVQ DI, 0x8(SP)
         .      340ms    402e337: CALL runtime.typedmemclr(SB)	           ;runtime.freedefer panic.go:274
         .          .    402e33c: MOVQ 0x48(SP), CX	                      ;panic.go:274
@josharian

Contributor

commented Jun 13, 2017

Exact steps never hurt.

Sounds like the way forward (for 1.10) is probably just to replace with explicit zeroing. I also see that type _defer's layout could probably be improved. There may be other easy pickings as well.

I think that whether time is spent in typedmemclr vs duffzero depends on what's happening with the garbage collector at the time.

Perhaps the XORPS mystery is just due to event skid?

@mvdan

Member

commented Sep 25, 2017

Perhaps related to the recent 332719f by @ianlancetaylor.

@ianlancetaylor

Contributor

commented Sep 25, 2017

The line has been replaced by explicit zeroing for other reasons in https://golang.org/cl/64070.

I'll close this, but feel free to reopen if you want to continue investigating the general issue.

@golang golang locked and limited conversation to collaborators Sep 25, 2018
