Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: performance regression about 2~3ms since Go 1.15 on macOS, due to linking two more shared libraries #40727

Closed
itchyny opened this issue Aug 12, 2020 · 10 comments

Comments

@itchyny
Copy link
Contributor

@itchyny itchyny commented Aug 12, 2020

What version of Go are you using (go version)?

$ go version
go version go1.15 darwin/amd64

1.15, especially on macOS

Does this issue reproduce with the latest release?

Yes, this is a performance comparison issue, compared 1.15 against 1.14.7.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/itchyny/Library/Caches/go-build"
GOENV="/Users/itchyny/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/itchyny/.local/share/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/itchyny/.local/share/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/Users/itchyny/Downloads/go1.15.darwin-amd64/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/Users/itchyny/Downloads/go1.15.darwin-amd64/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/1c/r313z3nn77b3qyzyx6cnwcv80000gn/T/go-build806662573=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

Compile a hello world program using Go 1.15 and 1.14.7 and compare the performance using hyperfine.

 % cat main.go
package main

import "fmt"

func main() {
        fmt.Println("Hello, world!")
}

 % ./go1.15.darwin-amd64/go/bin/go build -o hello-go115 main.go

 % ./go1.14.7.darwin-amd64/go/bin/go build -o hello-go114 main.go

 % hyperfine --warmup 10 ./hello-go115 ./hello-go114 # first run
Benchmark #1: ./hello-go115
  Time (mean ± σ):       7.3 ms ±   0.9 ms    [User: 2.6 ms, System: 3.2 ms]
  Range (min … max):     6.1 ms …  12.3 ms    228 runs

Benchmark #2: ./hello-go114
  Time (mean ± σ):       5.0 ms ±   1.5 ms    [User: 1.3 ms, System: 2.2 ms]
  Range (min … max):     3.5 ms …  12.8 ms    261 runs

Summary
  './hello-go114' ran
    1.46 ± 0.49 times faster than './hello-go115'

  Warning: Command took less than 5 ms to complete. Results might be inaccurate.

 % hyperfine --warmup 10 ./hello-go115 ./hello-go114 # second run
Benchmark #1: ./hello-go115
  Time (mean ± σ):       7.3 ms ±   0.7 ms    [User: 2.6 ms, System: 3.0 ms]
  Range (min … max):     6.3 ms …  11.6 ms    211 runs

Benchmark #2: ./hello-go114
  Time (mean ± σ):       4.8 ms ±   0.6 ms    [User: 1.2 ms, System: 2.1 ms]
  Range (min … max):     3.8 ms …   7.9 ms    287 runs

Summary
  './hello-go114' ran
    1.54 ± 0.23 times faster than './hello-go115'

  Warning: Command took less than 5 ms to complete. Results might be inaccurate.

 % # This hello world program is too small to compare the performance so let's do a loop.

 % cat main.go
package main

import "fmt"

func main() {
        for i := 0; i < 50000; i++ {
                fmt.Println("Hello, world!")
        }
}

 % ./go1.15.darwin-amd64/go/bin/go build -o hello-go115 main.go

 % ./go1.14.7.darwin-amd64/go/bin/go build -o hello-go114 main.go

 % hyperfine --warmup 10 ./hello-go115 ./hello-go114 # first run
Benchmark #1: ./hello-go115
  Time (mean ± σ):      44.1 ms ±   1.9 ms    [User: 26.7 ms, System: 15.9 ms]
  Range (min … max):    39.9 ms …  49.3 ms    58 runs

Benchmark #2: ./hello-go114
  Time (mean ± σ):      41.1 ms ±   2.5 ms    [User: 23.9 ms, System: 15.6 ms]
  Range (min … max):    36.5 ms …  48.6 ms    67 runs

Summary
  './hello-go114' ran
    1.07 ± 0.08 times faster than './hello-go115'

 % hyperfine --warmup 10 ./hello-go115 ./hello-go114 # second run
Benchmark #1: ./hello-go115
  Time (mean ± σ):      44.4 ms ±   2.7 ms    [User: 26.5 ms, System: 16.1 ms]
  Range (min … max):    39.1 ms …  50.5 ms    58 runs

Benchmark #2: ./hello-go114
  Time (mean ± σ):      41.2 ms ±   2.2 ms    [User: 23.8 ms, System: 15.8 ms]
  Range (min … max):    37.1 ms …  46.7 ms    62 runs

Summary
  './hello-go114' ran
    1.08 ± 0.09 times faster than './hello-go115'

What did you expect to see?

I upgraded the version of Go so I see no performance regression or even faster.

What did you see instead?

  • Single fmt.Println program took 4.9 ms in Go 1.14.7 but 7.3 ms in Go 1.15, so the regression is around 2.4 ms.
  • Repeating fmt.Println 50000 times took 41.2 ms in Go 1.14.7 but 44.4 ms in Go 1.15 so the regression is around 3.2 ms.

Why is this happening?

This regression is caused by https://go-review.googlesource.com/c/go/+/227037/ (crypto/x509: use Security.framework without cgo for roots on macOS). As you can see with otool command (ldd equivalent on macOS), two more shared library are linked in Go 1.15 compared to 1.14.7. This will lead (almost) constant overhead to any program built for macOS.

 % otool -L hello-go114
hello-go114:
        /usr/lib/libSystem.B.dylib (compatibility version 0.0.0, current version 0.0.0)

 % otool -L hello-go115
hello-go115:
        /usr/lib/libSystem.B.dylib (compatibility version 0.0.0, current version 0.0.0)
        /System/Library/Frameworks/Security.framework/Versions/A/Security (compatibility version 0.0.0, current version 0.0.0)
        /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation (compatibility version 0.0.0, current version 0.0.0)

Actually, when I removed the two lines and build the Go command, it produces the executable without linking to the two libraries and the performance is now comparable.

The Go 1.15 Release Notes says On macOS, binaries are now always linked against Security.framework to extract the system trust roots, regardless of whether cgo is available. The resulting behavior should be more consistent with the OS verifier. Thus I think this is an intended overhead we have to accept since Go 1.15. But I want to mention here in this issue tracker because this is where I searched for some regression issues after seeing the weird profiling results on upgrading Go. 2~3 ms addition for hello world program was actually surprising. Maybe I have to accept the fact that the change is applied to any program, even if it is not related to networking.

Go devs, I always appreciate your tough works and please close this issue if the overhead is expected. Thanks @koron for the big clue for this problem.

Regards

@randall77
Copy link
Contributor

@randall77 randall77 commented Aug 12, 2020

@FiloSottile

Is there any way to avoid the reference to the new frameworks if none of the security code that uses it is linked in?

@andybons andybons added this to the Unplanned milestone Aug 12, 2020
@cherrymui
Copy link
Contributor

@cherrymui cherrymui commented Aug 12, 2020

I was looking at that the other day, and have a fix locally. Will send it out soon.

@FiloSottile
Copy link
Member

@FiloSottile FiloSottile commented Aug 12, 2020

It's reasonable to wire it up so it's only linked when crypto/x509 is imported, happy to review the CL @cherrymui.

@gopherbot
Copy link

@gopherbot gopherbot commented Aug 13, 2020

Change https://golang.org/cl/248333 mentions this issue: cmd/link: link dynamic library automatically

@hajimehoshi
Copy link
Member

@hajimehoshi hajimehoshi commented Aug 13, 2020

I was wondering if this will be ported to 1.15.1 or not.

@FiloSottile
Copy link
Member

@FiloSottile FiloSottile commented Aug 13, 2020

@gopherbot please open a backport issue for Go 1.15.

I am not sure this qualifies to be backported, but it might as it's a regression with no workaround.

The team will evaluate it in the backport issue.

@gopherbot
Copy link

@gopherbot gopherbot commented Aug 13, 2020

Backport issue(s) opened: #40764 (for 1.15).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.

@networkimprov
Copy link

@networkimprov networkimprov commented Aug 14, 2020

@gopherbot add OS-Darwin

@gopherbot
Copy link

@gopherbot gopherbot commented Aug 17, 2020

Change https://golang.org/cl/248719 mentions this issue: [release-branch.go1.15] cmd/link: link dynamic library automatically

@cherrymui
Copy link
Contributor

@cherrymui cherrymui commented Aug 17, 2020

I'm not sure it qualifies backporting either. But I made a CL anyway. It backports cleanly and does work.

@dmitshur dmitshur modified the milestones: Unplanned, Go1.16 Aug 21, 2020
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue Oct 13, 2020
CHANGELOG
=========

0.23.1
------
- Added `--preview-window` options for disabling flags
    - `nocycle`
    - `nohidden`
    - `nowrap`
    - `default`
- Built with Go 1.14.9 due to performance regression
    - golang/go#40727

0.23.0
------
- Support preview scroll offset relative to window height
  ```sh
  git grep --line-number '' |
    fzf --delimiter : \
        --preview 'bat --style=numbers --color=always --highlight-line {2} {1}' \
        --preview-window +{2}-/2
  ```
- Added `--preview-window` option for sharp edges (`--preview-window sharp`)
- Added `--preview-window` option for cyclic scrolling (`--preview-window cycle`)
- Reduced vertical padding around the preview window when `--preview-window
  noborder` is used
- Added actions for preview window
    - `preview-half-page-up`
    - `preview-half-page-down`
- Vim
    - Popup width and height can be given in absolute integer values
    - Added `fzf#exec()` function for getting the path of fzf executable
        - It also downloads the latest binary if it's not available by running
          `./install --bin`
- Built with Go 1.15.2
    - We no longer provide 32-bit binaries

0.22.0
------
- Added more options for `--bind`
    - `backward-eof` event
      ```sh
      # Aborts when you delete backward when the query prompt is already empty
      fzf --bind backward-eof:abort
      ```
    - `refresh-preview` action
      ```sh
      # Rerun preview command when you hit '?'
      fzf --preview 'echo $RANDOM' --bind '?:refresh-preview'
      ```
    - `preview` action
      ```sh
      # Default preview command with an extra preview binding
      fzf --preview 'file {}' --bind '?:preview:cat {}'

      # A preview binding with no default preview command
      # (Preview window is initially empty)
      fzf --bind '?:preview:cat {}'

      # Preview window hidden by default, it appears when you first hit '?'
      fzf --bind '?:preview:cat {}' --preview-window hidden
      ```
- Added preview window option for setting the initial scroll offset
  ```sh
  # Initial scroll offset is set to the line number of each line of
  # git grep output *minus* 5 lines
  git grep --line-number '' |
    fzf --delimiter : --preview 'nl {1}' --preview-window +{2}-5
  ```
- Added support for ANSI colors in `--prompt` string
- Smart match of accented characters
    - An unaccented character in the query string will match both accented and
      unaccented characters, while an accented character will only match
      accented characters. This is similar to how "smart-case" match works.
- Vim plugin
    - `tmux` layout option for using fzf-tmux
      ```vim
      let g:fzf_layout = { 'tmux': '-p90%,60%' }
      ```

0.21.1
------
- Shell extension
    - CTRL-R will remove duplicate commands
- fzf-tmux
    - Supports tmux popup window (require tmux 3.2 or above)
        - ```sh
          # 50% width and height
          fzf-tmux -p

          # 80% width and height
          fzf-tmux -p 80%

          # 80% width and 40% height
          fzf-tmux -p 80%,40%
          fzf-tmux -w 80% -h 40%

          # Window position
          fzf-tmux -w 80% -h 40% -x 0 -y 0
          fzf-tmux -w 80% -h 40% -y 1000

          # Write ordinary fzf options after --
          fzf-tmux -p -- --reverse --info=inline --margin 2,4 --border
          ```
        - On macOS, you can build the latest tmux from the source with
          `brew install tmux --HEAD`
- Bug fixes
    - Fixed Windows file traversal not to include directories
    - Fixed ANSI colors with `--keep-right`
    - Fixed _fzf_complete for zsh
- Built with Go 1.14.1

0.21.0
------
- `--height` option is now available on Windows as well (@kelleyma49)
- Added `--pointer` and `--marker` options
- Added `--keep-right` option that keeps the right end of the line visible
  when it's too long
- Style changes
    - `--border` will now print border with rounded corners around the
      finder instead of printing horizontal lines above and below it.
      The previous style is available via `--border=horizontal`
    - Unicode spinner
- More keys and actions for `--bind`
- Added PowerShell script for downloading Windows binary
- Vim plugin: Built-in floating windows support
  ```vim
  let g:fzf_layout = { 'window': { 'width': 0.9, 'height': 0.6 } }
  ```
- bash: Various improvements in key bindings (CTRL-T, CTRL-R, ALT-C)
    - CTRL-R will start with the current command-line as the initial query
    - CTRL-R properly supports multi-line commands
- Fuzzy completion API changed
  ```sh
  # Previous: fzf arguments given as a single string argument
  # - This style is still supported, but it's deprecated
  _fzf_complete "--multi --reverse --prompt=\"doge> \"" "$@" < <(
    echo foo
  )

  # New API: multiple fzf arguments before "--"
  # - Easier to write multiple options
  _fzf_complete --multi --reverse --prompt="doge> " -- "$@" < <(
    echo foo
  )
  ```
- Bug fixes and improvements

0.20.0
------
- Customizable preview window color (`preview-fg` and `preview-bg` for `--color`)
  ```sh
  fzf --preview 'cat {}' \
      --color 'fg:#bbccdd,fg+:#ddeeff,bg:#334455,preview-bg:#223344,border:#778899' \
      --border --height 20 --layout reverse --info inline
  ```
- Removed the immediate flicking of the screen on `reload` action.
  ```sh
  : | fzf --bind 'change:reload:seq {q}' --phony
  ```
- Added `clear-query` and `clear-selection` actions for `--bind`
- It is now possible to split a composite bind action over multiple `--bind`
  expressions by prefixing the later ones with `+`.
  ```sh
  fzf --bind 'ctrl-a:up+up'

  # Can be now written as
  fzf --bind 'ctrl-a:up' --bind 'ctrl-a:+up'

  # This is useful when you need to write special execute/reload form (i.e. `execute:...`)
  # to avoid parse errors and add more actions to the same key
  fzf --multi --bind 'ctrl-l:select-all+execute:less {+f}' --bind 'ctrl-l:+deselect-all'
  ```
- Fixed parse error of `--bind` expression where concatenated execute/reload
  action contains `+` character.
  ```sh
  fzf --multi --bind 'ctrl-l:select-all+execute(less {+f})+deselect-all'
  ```
- Fixed bugs of reload action
    - Not triggered when there's no match even when the command doesn't have
      any placeholder expressions
    - Screen not properly cleared when `--header-lines` not filled on reload

0.19.0
------

- Added `--phony` option which completely disables search functionality.
  Useful when you want to use fzf only as a selector interface. See below.
- Added "reload" action for dynamically updating the input list without
  restarting fzf. See junegunn/fzf#1750 to learn
  more about it.
  ```sh
  # Using fzf as the selector interface for ripgrep
  RG_PREFIX="rg --column --line-number --no-heading --color=always --smart-case "
  INITIAL_QUERY="foo"
  FZF_DEFAULT_COMMAND="$RG_PREFIX '$INITIAL_QUERY' || true" \
    fzf --bind "change:reload:$RG_PREFIX {q} || true" \
        --ansi --phony --query "$INITIAL_QUERY"
  ```
- `--multi` now takes an optional integer argument which indicates the maximum
  number of items that can be selected
  ```sh
  seq 100 | fzf --multi 3 --reverse --height 50%
  ```
- If a placeholder expression for `--preview` and `execute` action (and the
  new `reload` action) contains `f` flag, it is replaced to the
  path of a temporary file that holds the evaluated list. This is useful
  when you multi-select a large number of items and the length of the
  evaluated string may exceed [`ARG_MAX`][argmax].
  ```sh
  # Press CTRL-A to select 100K items and see the sum of all the numbers
  seq 100000 | fzf --multi --bind ctrl-a:select-all \
                   --preview "awk '{sum+=\$1} END {print sum}' {+f}"
  ```
- `deselect-all` no longer deselects unmatched items. It is now consistent
  with `select-all` and `toggle-all` in that it only affects matched items.
- Due to the limitation of bash, fuzzy completion is enabled by default for
  a fixed set of commands. A helper function for easily setting up fuzzy
  completion for any command is now provided.
  ```sh
  # usage: _fzf_setup_completion path|dir COMMANDS...
  _fzf_setup_completion path git kubectl
  ```
- Info line style can be changed by `--info=STYLE`
    - `--info=default`
    - `--info=inline` (same as old `--inline-info`)
    - `--info=hidden`
- Preview window border can be disabled by adding `noborder` to
  `--preview-window`.
- When you transform the input with `--with-nth`, the trailing white spaces
  are removed.
- `ctrl-\`, `ctrl-]`, `ctrl-^`, and `ctrl-/` can now be used with `--bind`
- See https://github.com/junegunn/fzf/milestone/15?closed=1 for more details

[argmax]: https://unix.stackexchange.com/questions/120642/what-defines-the-maximum-size-for-a-command-single-argument
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
9 participants
You can’t perform that action at this time.