Skip to content

runtime: performance regression about 2~3ms since Go 1.15 on macOS, due to linking two more shared libraries #40727

@itchyny

Description

@itchyny

What version of Go are you using (go version)?

$ go version
go version go1.15 darwin/amd64

1.15, especially on macOS

Does this issue reproduce with the latest release?

Yes, this is a performance comparison issue, compared 1.15 against 1.14.7.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/itchyny/Library/Caches/go-build"
GOENV="/Users/itchyny/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/itchyny/.local/share/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/itchyny/.local/share/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/Users/itchyny/Downloads/go1.15.darwin-amd64/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/Users/itchyny/Downloads/go1.15.darwin-amd64/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/1c/r313z3nn77b3qyzyx6cnwcv80000gn/T/go-build806662573=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

Compile a hello world program using Go 1.15 and 1.14.7 and compare the performance using hyperfine.

 % cat main.go
package main

import "fmt"

func main() {
        fmt.Println("Hello, world!")
}

 % ./go1.15.darwin-amd64/go/bin/go build -o hello-go115 main.go

 % ./go1.14.7.darwin-amd64/go/bin/go build -o hello-go114 main.go

 % hyperfine --warmup 10 ./hello-go115 ./hello-go114 # first run
Benchmark #1: ./hello-go115
  Time (mean ± σ):       7.3 ms ±   0.9 ms    [User: 2.6 ms, System: 3.2 ms]
  Range (min … max):     6.1 ms …  12.3 ms    228 runs

Benchmark #2: ./hello-go114
  Time (mean ± σ):       5.0 ms ±   1.5 ms    [User: 1.3 ms, System: 2.2 ms]
  Range (min … max):     3.5 ms …  12.8 ms    261 runs

Summary
  './hello-go114' ran
    1.46 ± 0.49 times faster than './hello-go115'

  Warning: Command took less than 5 ms to complete. Results might be inaccurate.

 % hyperfine --warmup 10 ./hello-go115 ./hello-go114 # second run
Benchmark #1: ./hello-go115
  Time (mean ± σ):       7.3 ms ±   0.7 ms    [User: 2.6 ms, System: 3.0 ms]
  Range (min … max):     6.3 ms …  11.6 ms    211 runs

Benchmark #2: ./hello-go114
  Time (mean ± σ):       4.8 ms ±   0.6 ms    [User: 1.2 ms, System: 2.1 ms]
  Range (min … max):     3.8 ms …   7.9 ms    287 runs

Summary
  './hello-go114' ran
    1.54 ± 0.23 times faster than './hello-go115'

  Warning: Command took less than 5 ms to complete. Results might be inaccurate.

 % # This hello world program is too small to compare the performance so let's do a loop.

 % cat main.go
package main

import "fmt"

func main() {
        for i := 0; i < 50000; i++ {
                fmt.Println("Hello, world!")
        }
}

 % ./go1.15.darwin-amd64/go/bin/go build -o hello-go115 main.go

 % ./go1.14.7.darwin-amd64/go/bin/go build -o hello-go114 main.go

 % hyperfine --warmup 10 ./hello-go115 ./hello-go114 # first run
Benchmark #1: ./hello-go115
  Time (mean ± σ):      44.1 ms ±   1.9 ms    [User: 26.7 ms, System: 15.9 ms]
  Range (min … max):    39.9 ms …  49.3 ms    58 runs

Benchmark #2: ./hello-go114
  Time (mean ± σ):      41.1 ms ±   2.5 ms    [User: 23.9 ms, System: 15.6 ms]
  Range (min … max):    36.5 ms …  48.6 ms    67 runs

Summary
  './hello-go114' ran
    1.07 ± 0.08 times faster than './hello-go115'

 % hyperfine --warmup 10 ./hello-go115 ./hello-go114 # second run
Benchmark #1: ./hello-go115
  Time (mean ± σ):      44.4 ms ±   2.7 ms    [User: 26.5 ms, System: 16.1 ms]
  Range (min … max):    39.1 ms …  50.5 ms    58 runs

Benchmark #2: ./hello-go114
  Time (mean ± σ):      41.2 ms ±   2.2 ms    [User: 23.8 ms, System: 15.8 ms]
  Range (min … max):    37.1 ms …  46.7 ms    62 runs

Summary
  './hello-go114' ran
    1.08 ± 0.09 times faster than './hello-go115'

What did you expect to see?

I upgraded the version of Go so I see no performance regression or even faster.

What did you see instead?

  • Single fmt.Println program took 4.9 ms in Go 1.14.7 but 7.3 ms in Go 1.15, so the regression is around 2.4 ms.
  • Repeating fmt.Println 50000 times took 41.2 ms in Go 1.14.7 but 44.4 ms in Go 1.15 so the regression is around 3.2 ms.

Why is this happening?

This regression is caused by https://go-review.googlesource.com/c/go/+/227037/ (crypto/x509: use Security.framework without cgo for roots on macOS). As you can see with otool command (ldd equivalent on macOS), two more shared library are linked in Go 1.15 compared to 1.14.7. This will lead (almost) constant overhead to any program built for macOS.

 % otool -L hello-go114
hello-go114:
        /usr/lib/libSystem.B.dylib (compatibility version 0.0.0, current version 0.0.0)

 % otool -L hello-go115
hello-go115:
        /usr/lib/libSystem.B.dylib (compatibility version 0.0.0, current version 0.0.0)
        /System/Library/Frameworks/Security.framework/Versions/A/Security (compatibility version 0.0.0, current version 0.0.0)
        /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation (compatibility version 0.0.0, current version 0.0.0)

Actually, when I removed the two lines and build the Go command, it produces the executable without linking to the two libraries and the performance is now comparable.

The Go 1.15 Release Notes says On macOS, binaries are now always linked against Security.framework to extract the system trust roots, regardless of whether cgo is available. The resulting behavior should be more consistent with the OS verifier. Thus I think this is an intended overhead we have to accept since Go 1.15. But I want to mention here in this issue tracker because this is where I searched for some regression issues after seeing the weird profiling results on upgrading Go. 2~3 ms addition for hello world program was actually surprising. Maybe I have to accept the fact that the change is applied to any program, even if it is not related to networking.

Go devs, I always appreciate your tough works and please close this issue if the overhead is expected. Thanks @koron for the big clue for this problem.

Regards

Metadata

Metadata

Assignees

No one assigned

    Labels

    FrozenDueToAgeNeedsInvestigationSomeone must examine and confirm this is a valid issue and not a duplicate of an existing one.OS-Darwin

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions