Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: Go 1.14.rc1 3-5% performance regression from 1.13 during protobuf marshalling #37086

Closed
howardjohn opened this issue Feb 6, 2020 · 18 comments
Assignees
Milestone

Comments

@howardjohn
Copy link

@howardjohn howardjohn commented Feb 6, 2020

$ go1.14rc1 version
go version go1.14rc1 linux/amd64
$ go version
go version go1.13 linux/amd64

Does this issue reproduce with the latest release?

This is an issue with the latest release

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE="on"
GOARCH="amd64"
GOBIN=""
GOCACHE="/usr/local/google/home/howardjohn/.cache/go-build"
GOENV="/usr/local/google/home/howardjohn/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/usr/local/google/home/howardjohn/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/usr/local/google/home/howardjohn/go/src/istio.io/istio/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build462288476=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Ran a benchmark for our application on go 1.13 and go 1.14rc1. performance took about a ~5% hit.

$ go test ./pilot/pkg/proxy/envoy/v2/ -bench=BenchmarkEDS -run=^$ -benchmem -benchtime=20s
goos: linux
goarch: amd64
pkg: istio.io/istio/pilot/pkg/proxy/envoy/v2
BenchmarkEDS/100/1-6              264200             90238 ns/op            8964 B/op        361 allocs/op
BenchmarkEDS/1000/1-6              30642            775560 ns/op           62452 B/op       3061 allocs/op
BenchmarkEDS/10000/1-6              3147           6909406 ns/op          604528 B/op      30086 allocs/op
BenchmarkEDS/100/100-6              3217           7292090 ns/op          876827 B/op      35645 allocs/op
BenchmarkEDS/1000/100-6              355          67245095 ns/op         6318974 B/op     307880 allocs/op
BenchmarkEDS/10000/100-6              34         657385493 ns/op        70324914 B/op    3240990 allocs/op
PASS
ok      istio.io/istio/pilot/pkg/proxy/envoy/v2 176.394s
$ go1.14rc1 test ./pilot/pkg/proxy/envoy/v2/ -bench=BenchmarkEDS -run=^$ -benchmem -benchtime=20s
goos: linux
goarch: amd64
pkg: istio.io/istio/pilot/pkg/proxy/envoy/v2
BenchmarkEDS/100/1-6              246732             97406 ns/op            8964 B/op        361 allocs/op
BenchmarkEDS/1000/1-6              29126            823575 ns/op           62454 B/op       3061 allocs/op
BenchmarkEDS/10000/1-6              3036           7492450 ns/op          604574 B/op      30087 allocs/op
BenchmarkEDS/100/100-6              3120           7801905 ns/op          876848 B/op      35645 allocs/op
BenchmarkEDS/1000/100-6              339          70948842 ns/op         6323512 B/op     307986 allocs/op
BenchmarkEDS/10000/100-6              33         689077376 ns/op        70629433 B/op    3248162 allocs/op
PASS
ok      istio.io/istio/pilot/pkg/proxy/envoy/v2 180.654

What did you expect to see?

Hopefully performance improves or remains constant

What did you see instead?

Performance decrease

Additional Info

This benchmark is basically just doing a bunch of protobuf marshalling.

Profile shows 1.14 spends 16% of time on gcBgMarkWorker, but 1.13 only 14.5%.

CPU profiles:
profiles.tar.gz

Let me know if there is more info I can provide, I realize these reports typically need a lot of info to track down the root cause. You might/should be able to reproduce by running the same command after checking out the repo at this PR: istio/istio#20899

@howardjohn howardjohn changed the title golang 1.14.rc1 performance regression from golang 1.13 golang 1.14.rc1 performance regression from golang 1.13 during protobuf marshalling Feb 6, 2020
@howardjohn howardjohn changed the title golang 1.14.rc1 performance regression from golang 1.13 during protobuf marshalling golang 1.14.rc1 3-5% performance regression from golang 1.13 during protobuf marshalling Feb 6, 2020
@cagedmantis cagedmantis added this to the Go1.14 milestone Feb 6, 2020
@cagedmantis cagedmantis changed the title golang 1.14.rc1 3-5% performance regression from golang 1.13 during protobuf marshalling runtime: golang 1.14.rc1 3-5% performance regression from golang 1.13 during protobuf marshalling Feb 6, 2020
@cagedmantis
Copy link
Contributor

@cagedmantis cagedmantis commented Feb 6, 2020

@mknyszek
Copy link
Contributor

@mknyszek mknyszek commented Feb 6, 2020

@howardjohn Thanks so much for the issue and the detailed information. Looking into it.

@mknyszek mknyszek self-assigned this Feb 6, 2020
@odeke-em
Copy link
Member

@odeke-em odeke-em commented Feb 6, 2020

@howardjohn thank you for the report! Just a comment that in the future you can also use benchstat https://godoc.org/golang.org/x/perf/cmd/benchstat for an isolated summary of benchmark results.

@mknyszek
Copy link
Contributor

@mknyszek mknyszek commented Feb 6, 2020

@howardjohn I can reproduce, and I think I know the source of the problem.

First let me say that I tried pretty hard to reproduce the profiles you attached but I was unable to using the commands in the original post. Every time I compared 1.14 and 1.13 profiles the diff between them was completely different. Part of the problem is that because the number of iterations isn't fixed, each run could be doing different amounts of work. It's harder to compare along these lines.

Next, I need to point out that there's a memory leak in the benchmark. Every time BenchmarkEDS gets invoked, the heap size increases. This is especially noticeable with GODEBUG=gctrace=1 running and -count=N where N>1, but the leak also affects N=1 if the number of benchmark iterations are not fixed (since BenchmarkEDS gets called multiple times by the testing framework).

So, given this, I collected profiles on the BenchmarkEDS/100/100 benchmark with -benchtime=1000x to fix the iteration count. This is roughly the same amount of work as -benchtime=10s.

In the end, I consistently get two profiles which implicate mapaccess1, runtime.interhash, and callees as the culprit in the diff. Specifically, aeshash64 is gone in the 1.14 profile, replaced with aeshashbody (called through runtime.typehash) which takes more time.

For documentation purposes, the command I ran (after cloning istio at the PR mentioned above; if you don't do that the benchmark won't run) was:

perflock go test ./pilot/pkg/proxy/envoy/v2/ -bench=BenchmarkEDS/100$/100$ -count=1 -run=^$ -benchmem -benchtime=1000x -cpuprofile=cpu.prof

Pinging @randall77 since @cherrymui suggested you were involved in some map changes in the Go 1.14 cycle, and I don't know much about the map implementation.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Feb 6, 2020

Uses of runtime.aeshash64 in Go 1.13 should have been replaced by uses of runtime.memhash64 in Go 1.14.

Oh, I see, this program must be using hash maps with keys of interface types. See https://golang.org/cl/191198. Perhaps we should have a release note for that?

@mknyszek
Copy link
Contributor

@mknyszek mknyszek commented Feb 6, 2020

@ianlancetaylor Yeah that's right. @cherrymui and I dug a little deeper and it's actually map with keys of type reflect.Type that is causing >70% of the mapaccess1 calls. :)

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Feb 6, 2020

Might be worth adding a special case for pointers in runtime.typehash, to call the sized memhash function rather than the general one. For 1.14.1.

@randall77
Copy link
Contributor

@randall77 randall77 commented Feb 7, 2020

Certainly we could swap the order of the regular memory check and the kind switch, and add a kindPtr case.

@cherrymui
Copy link
Contributor

@cherrymui cherrymui commented Feb 7, 2020

I think if regular memory check is true, we could special case 32-bit and 64-bit sizes?

@dmitshur dmitshur added the Performance label Feb 7, 2020
@SlinSo
Copy link

@SlinSo SlinSo commented Feb 13, 2020

i am not sure if this is related, but i saw some differences in map access performance, too. In my case I am not using interface{} as a key, but instead a map[int]int

https://github.com/SlinSo/random/blob/master/mapaccess_test.go

pprof excerpt:
go1.13.8:
0.73s 9.53% 82.64% 0.73s 9.53% runtime.aeshash64

go1.14rc1:
1230ms 15.81% 66.20% 1410ms 18.12% runtime.memhash64Fallback
140ms 1.80% 97.43% 140ms 1.80% runtime.memhash64

@randall77
Copy link
Contributor

@randall77 randall77 commented Feb 13, 2020

That's a bug. Yikes!
Fix coming.

@randall77
Copy link
Contributor

@randall77 randall77 commented Feb 13, 2020

(Unrelated to this issue, though.)

@gopherbot
Copy link

@gopherbot gopherbot commented Feb 13, 2020

Change https://golang.org/cl/219338 mentions this issue: runtime: special case interface hashing for pointers

@gopherbot gopherbot closed this in afd691c Feb 24, 2020
@thepudds
Copy link

@thepudds thepudds commented Feb 24, 2020

@randall77 is the intent for this to be tracked as a back port candidate? Ian had mentioned in #37086 (comment)

Might be worth adding a special case for pointers in runtime.typehash, to call the sized memhash function rather than the general one. For 1.14.1.

@randall77
Copy link
Contributor

@randall77 randall77 commented Feb 24, 2020

@gopherbot please open a backport issue.

I don't think this performance regression qualifies for a backport, but I'll let the release guys make that call.

@gopherbot
Copy link

@gopherbot gopherbot commented Feb 24, 2020

Backport issue(s) opened: #37420 (for 1.12), #37421 (for 1.13).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.

@odeke-em
Copy link
Member

@odeke-em odeke-em commented Feb 25, 2020

@randall77 oddly it didn't backport for Go1.14.1, it only created PRs for 1.12 and 1.13.

@dmitshur
Copy link
Member

@dmitshur dmitshur commented Mar 2, 2020

Made #37613 for 1.14.

@dmitshur dmitshur changed the title runtime: golang 1.14.rc1 3-5% performance regression from golang 1.13 during protobuf marshalling runtime: Go 1.14.rc1 3-5% performance regression from 1.13 during protobuf marshalling Mar 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
You can’t perform that action at this time.