
runtime: large map GC regression from 1.8 to 1.9 #23283

Closed
rubensayshi opened this issue Dec 29, 2017 · 7 comments

6 participants
@rubensayshi commented Dec 29, 2017

What version of Go are you using (go version)?

go version go1.8 linux/amd64
go version go1.9 linux/amd64
go version devel +acce826 Wed Dec 27 15:03:09 2017 +0000 linux/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOOS="linux"
GCCGO="gccgo"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build229268567=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
CGO_LDFLAGS="-g -O2"

What did you do?

When running the examples from this old issue (#9477), there's a pretty significant increase in GC time between 1.8 and 1.9 for all 4 cases: http://play.golang.org/p/AeHSLyLz_c

# go 1.4
With map[int32]*int32, GC took 2.721806965s
With map[int32]int32, GC took 397.239916ms
With map shards ([]map[int32]int32), GC took 67.779482ms
With a plain slice ([]main.t), GC took 245.545µs
# go 1.5
With map[int32]*int32, GC took 5.261619858s
With map[int32]int32, GC took 35.010579ms
With map shards ([]map[int32]int32), GC took 5.158465ms
With a plain slice ([]main.t), GC took 198.938µs
# go 1.8
With map[int32]*int32, GC took 569.737537ms
With map[int32]int32, GC took 8.801144ms
With map shards ([]map[int32]int32), GC took 3.085143ms
With a plain slice ([]main.t), GC took 93.366µs
# go 1.9
With map[int32]*int32, GC took 411.635813ms
With map[int32]int32, GC took 19.754026ms
With map shards ([]map[int32]int32), GC took 4.728023ms
With a plain slice ([]main.t), GC took 122.316µs
$ ./bin/go version
go version devel +acce826 Wed Dec 27 15:03:09 2017 +0000 linux/amd64

$ ./bin/go run map.go 2
gc 14 @7.478s 0%: 0.003+20+0.045 ms clock, 0.024+0/20/0.33+0.31 ms cpu, 1074->1074->1074 MB, 2149 MB goal, 7 P (forced)
With map[int32]int32, GC took 22.396922ms

What did you expect to see?

Better or equal performance

What did you see instead?

A regression in GC time

@davecheney (Contributor) commented Dec 29, 2017

@rubensayshi (Author) commented Dec 29, 2017

Using GODEBUG=gctrace=1 for the map[int32]int32 example:

# go 1.8
gc 16 @7.318s 0%: 0.001+0+8.2 ms clock, 0.008+0/25/0.69+57 ms cpu, 1041->1041->1041 MB, 1041 MB goal, 7 P (forced)
With map[int32]int32, GC took 8.309103ms
# go 1.9 
gc 17 @7.255s 0%: 0.003+18+0.14 ms clock, 0.024+0/18/0.26+1.0 ms cpu, 1074->1074->1074 MB, 2149 MB goal, 7 P (forced)
With map[int32]int32, GC took 20.601241ms

So it seems that, at least in my example on my environment, runtime.GC() does block properly and the times do match more or less, but it's useful to know that relying on that behaviour is not advised. Thanks!

@odeke-em odeke-em changed the title large map GC regression from 1.8 to 1.9 runtime: large map GC regression from 1.8 to 1.9 Dec 29, 2017

@mvdan (Member) commented Dec 29, 2017

Have you tried with 1.10, such as tip or the latest beta?

@rubensayshi (Author) commented Dec 29, 2017

Using master gives a similar result to 1.9:

$ ./bin/go version
go version devel +acce826 Wed Dec 27 15:03:09 2017 +0000 linux/amd64

$ ./bin/go run map.go 2
gc 14 @7.478s 0%: 0.003+20+0.045 ms clock, 0.024+0/20/0.33+0.31 ms cpu, 1074->1074->1074 MB, 2149 MB goal, 7 P (forced)
With map[int32]int32, GC took 22.396922ms

@RLH (Contributor) commented Dec 29, 2017

The CPU numbers don't look bad and show improvements from 1.8 to 1.9. The overall GC % is rounded down to 0%, which is also not bad. Using wall clock as an indication of concurrent GC performance is fine for some things, but not so good as an indication of overall performance improvements.
Removing runtime.GC() and running the program with GODEBUG=gctrace=1 is the first step to understanding GC performance.
There is nothing actionable here. Can we close this?

@cespare (Contributor) commented Dec 29, 2017

Hi @rubensayshi. I originally filed #9477 and wrote that repro code.

Back in the dark ages of Go 1.4, the GC wasn't concurrent and measuring the time it takes for runtime.GC() to run was a proxy for the GC pause time.

As of 1.5, the GC is concurrent, so measuring runtime.GC() (or looking at the overall time it takes for a GC cycle to run in the gctrace output) doesn't seem that meaningful to me. The things I would care about are (1) whether the GC is blocking my code for a long time (e.g., by having a long STW phase) and (2) whether the CPU time spent in GC is inordinately high. Neither of those seems to be the case here. For instance, in this gctrace line:

gc 14 @7.478s 0%: 0.003+20+0.045 ms clock, 0.024+0/20/0.33+0.31 ms cpu, 1074->1074->1074 MB, 2149 MB goal, 7 P (forced)

the CPU time is rounded to 0% and the STW phases are 0.003ms and 0.045ms, respectively.

@rubensayshi (Author) commented Dec 29, 2017

That sounds clear enough to me: runtime.GC() blocks, but the time it blocks is not the time the program would actually be blocked when the GC is triggered normally, because the collector is now concurrent. Thanks!

@golang locked and limited conversation to collaborators Dec 29, 2018
