Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

testing: Inconsistent benchmark data when  GOMAXPROCS=1 #31599

Open
xuanjiazhen opened this issue Apr 21, 2019 · 5 comments

Comments

@xuanjiazhen
Copy link

commented Apr 21, 2019

What version of Go are you using (go version)?

$ go version
go version go1.12.4 darwin/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/xuanjiazhen/Library/Caches/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/xuanjiazhen/work"
GOPROXY=""
GORACE=""
GOROOT="/Users/xuanjiazhen/Desktop/go"
GOTMPDIR=""
GOTOOLDIR="/Users/xuanjiazhen/Desktop/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/w3/64tyfz3n4cs2jw5lrl7mc5c80000gn/T/go-build723171853=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

$GOMAXPROCS=1 go test runtime -v -run none -bench  ^BenchmarkConvT2ILarge$ -count 10
goos: darwin
goarch: amd64
pkg: runtime
BenchmarkConvT2ILarge   50000000                46.8 ns/op
BenchmarkConvT2ILarge   30000000                46.9 ns/op
BenchmarkConvT2ILarge   50000000                39.5 ns/op
BenchmarkConvT2ILarge   30000000                39.1 ns/op
BenchmarkConvT2ILarge   30000000                47.6 ns/op
BenchmarkConvT2ILarge   30000000                46.8 ns/op
BenchmarkConvT2ILarge   30000000                49.6 ns/op
BenchmarkConvT2ILarge   30000000                43.6 ns/op
BenchmarkConvT2ILarge   30000000                52.2 ns/op
BenchmarkConvT2ILarge   50000000                46.9 ns/op
PASS
ok      runtime 16.874s
$GOMAXPROCS=1 go test runtime -v -run none -bench  ^BenchmarkConvT2ILarge -count 10
goos: darwin
goarch: amd64
pkg: runtime
BenchmarkConvT2ILarge   100000000               19.6 ns/op
BenchmarkConvT2ILarge   100000000               19.8 ns/op
BenchmarkConvT2ILarge   100000000               20.3 ns/op
BenchmarkConvT2ILarge   100000000               21.2 ns/op
BenchmarkConvT2ILarge   100000000               21.0 ns/op
BenchmarkConvT2ILarge   100000000               20.6 ns/op
BenchmarkConvT2ILarge   100000000               21.1 ns/op
BenchmarkConvT2ILarge   100000000               20.6 ns/op
BenchmarkConvT2ILarge   100000000               20.6 ns/op
BenchmarkConvT2ILarge   100000000               20.8 ns/op
PASS
ok      runtime 21.803s

What did you expect to see?

The test results of the two types of writing should be consistent.

What did you see instead?

Test results with $ are much worse than test results without it.

$GODEBUG=gctrace=1 GOMAXPROCS=1 go test runtime -v -run none -bench  ^BenchmarkConvT2ILarge -count 10
...
gc 3984 @22.195s 2%: 0.001+0.12+0.001 ms clock, 0.001+0.12/0/0+0.001 ms cpu, 4->4->0 MB, 5 MB goal, 1 P
gc 3985 @22.202s 2%: 0.003+0.13+0.001 ms clock, 0.003+0.12/0/0+0.001 ms cpu, 4->4->0 MB, 5 MB goal, 1 P
gc 3986 @22.207s 2%: 0.001+0.11+0.001 ms clock, 0.001+0.11/0/0+0.001 ms cpu, 4->4->0 MB, 5 MB goal, 1 P
gc 3987 @22.213s 2%: 0.001+0.11+0 ms clock, 0.001+0.11/0/0+0 ms cpu, 4->4->0 MB, 5 MB goal, 1 P
gc 3988 @22.218s 2%: 0.001+0.12+0.001 ms clock, 0.001+0.12/0/0+0.001 ms cpu, 4->4->0 MB, 5 MB goal, 1 P
gc 3989 @22.224s 2%: 0.001+0.12+0 ms clock, 0.001+0.12/0/0+0 ms cpu, 4->4->0 MB, 5 MB goal, 1 P
gc 3990 @22.230s 2%: 0+0.12+0 ms clock, 0+0.12/0/0+0 ms cpu, 4->4->0 MB, 5 MB goal, 1 P
gc 3991 @22.236s 2%: 0.001+0.14+0.001 ms clock, 0.001+0.14/0/0+0.001 ms cpu, 4->4->0 MB, 5 MB goal, 1 P
100000000               22.8 ns/op
PASS
$GODEBUG=gctrace=1 GOMAXPROCS=1 go test runtime -v -run none -bench  ^BenchmarkConvT2ILarge$ -count 10
...
gc 1052 @18.124s 0%: 0.001+20+0.002 ms clock, 0.001+0.13/0.021/0+0.002 ms cpu, 6->11->5 MB, 10 MB goal, 1 P
gc 1053 @18.146s 0%: 0.001+21+0.001 ms clock, 0.001+0.11/0.013/0+0.001 ms cpu, 6->12->6 MB, 11 MB goal, 1 P
gc 1054 @18.169s 0%: 0.001+19+0.001 ms clock, 0.001+0.11/0.014/0+0.001 ms cpu, 7->12->5 MB, 12 MB goal, 1 P
gc 1055 @18.190s 0%: 0.001+23+0.001 ms clock, 0.001+0.11/0.015/0+0.001 ms cpu, 6->12->6 MB, 10 MB goal, 1 P
gc 1056 @18.216s 0%: 0.002+19+0.001 ms clock, 0.002+0.13/0.013/0+0.001 ms cpu, 7->12->5 MB, 12 MB goal, 1 P
gc 1057 @18.237s 0%: 0.002+19+0.001 ms clock, 0.002+0.13/0.017/0+0.001 ms cpu, 6->11->5 MB, 10 MB goal, 1 P
gc 1058 @18.258s 0%: 0.001+20+0.002 ms clock, 0.001+0.11/0.020/0+0.002 ms cpu, 6->12->6 MB, 11 MB goal, 1 P
gc 1059 @18.281s 0%: 0.001+19+0.001 ms clock, 0.001+0.11/0.016/0+0.001 ms cpu, 7->12->5 MB, 12 MB goal, 1 P
gc 1060 @18.302s 0%: 0.001+19+0.002 ms clock, 0.001+0.11/0.017/0+0.002 ms cpu, 6->11->5 MB, 11 MB goal, 1 P
gc 1061 @18.323s 0%: 0.002+11+0.002 ms clock, 0.002+0.13/0.023/0+0.002 ms cpu, 6->8->2 MB, 10 MB goal, 1 P
50000000                51.7 ns/op
PASS
ok      runtime 18.342s
gc 21 @19.662s 0%: 0.015+1.6+0.002 ms clock, 0.015+0.099/0.52/0.90+0.002 ms cpu, 4->4->1 MB, 5 MB goal, 1 P
@ALTree

This comment has been minimized.

Copy link
Member

commented Apr 21, 2019

Thanks for the report.

I can't reproduce this using 1.12.4 on linux/amd64, the two results are statistically the same (with vs without $):

$ benchstat with.txt without.txt 

name          old time/op  new time/op  delta
ConvT2ILarge  28.0ns ± 1%  28.0ns ± 1%   ~     (p=0.914 n=9+9)

Can you reproduce this consistently? Are you running the benchmarks on a very quiet machine with no other processes using the cpu?

@ALTree ALTree added this to the Go1.13 milestone Apr 21, 2019

@xuanjiazhen

This comment has been minimized.

Copy link
Author

commented Apr 21, 2019

Thanks for the report.

I can't reproduce this using 1.12.4 on linux/amd64, the two results are statistically the same (with vs without $):

$ benchstat with.txt without.txt 

name          old time/op  new time/op  delta
ConvT2ILarge  28.0ns ± 1%  28.0ns ± 1%   ~     (p=0.914 n=9+9)

Can you reproduce this consistently? Are you running the benchmarks on a very quiet machine with no other processes using the cpu?

Yes,I can reproduce that consistently. Not only on my mac (go1.12.4), but also on x86 and arm64 linux (go1.11.5).They are all quiet enough.

@ALTree ALTree removed the WaitingForInfo label Apr 22, 2019

@ALTree

This comment has been minimized.

Copy link
Member

commented Apr 22, 2019

Ah, I was able to reproduce this by doing more runs.

It doesn't happen every time, but when it happens the benchmark is indeed either 1) consistently slower 2) faster in the first iteration, slower from the 2nd.

Here 10 runs with ^BenchmarkConvT2ILarge$ and count=5
BenchmarkConvT2ILarge 	50000000	        27.9 ns/op
BenchmarkConvT2ILarge 	30000000	        52.5 ns/op
BenchmarkConvT2ILarge 	30000000	        59.9 ns/op
BenchmarkConvT2ILarge 	50000000	        40.5 ns/op
BenchmarkConvT2ILarge 	30000000	        59.6 ns/op

BenchmarkConvT2ILarge 50000000 41.6 ns/op BenchmarkConvT2ILarge 30000000 59.8 ns/op BenchmarkConvT2ILarge 50000000 53.0 ns/op BenchmarkConvT2ILarge 30000000 61.6 ns/op BenchmarkConvT2ILarge 30000000 61.5 ns/op
BenchmarkConvT2ILarge 50000000 40.9 ns/op BenchmarkConvT2ILarge 30000000 61.6 ns/op BenchmarkConvT2ILarge 30000000 61.7 ns/op BenchmarkConvT2ILarge 30000000 61.6 ns/op BenchmarkConvT2ILarge 30000000 61.6 ns/op
BenchmarkConvT2ILarge 50000000 27.7 ns/op BenchmarkConvT2ILarge 50000000 28.2 ns/op BenchmarkConvT2ILarge 50000000 28.0 ns/op BenchmarkConvT2ILarge 50000000 27.9 ns/op BenchmarkConvT2ILarge 50000000 27.9 ns/op
BenchmarkConvT2ILarge 50000000 55.8 ns/op BenchmarkConvT2ILarge 30000000 61.8 ns/op BenchmarkConvT2ILarge 30000000 61.9 ns/op BenchmarkConvT2ILarge 30000000 61.7 ns/op BenchmarkConvT2ILarge 30000000 59.9 ns/op
BenchmarkConvT2ILarge 50000000 28.0 ns/op BenchmarkConvT2ILarge 30000000 57.9 ns/op BenchmarkConvT2ILarge 30000000 61.6 ns/op BenchmarkConvT2ILarge 30000000 61.9 ns/op BenchmarkConvT2ILarge 30000000 61.6 ns/op
BenchmarkConvT2ILarge 50000000 27.8 ns/op BenchmarkConvT2ILarge 50000000 57.2 ns/op BenchmarkConvT2ILarge 50000000 51.5 ns/op BenchmarkConvT2ILarge 30000000 61.9 ns/op BenchmarkConvT2ILarge 30000000 59.7 ns/op
BenchmarkConvT2ILarge 50000000 27.9 ns/op BenchmarkConvT2ILarge 30000000 52.2 ns/op BenchmarkConvT2ILarge 30000000 59.5 ns/op BenchmarkConvT2ILarge 30000000 61.4 ns/op BenchmarkConvT2ILarge 30000000 61.4 ns/op
BenchmarkConvT2ILarge 50000000 47.5 ns/op BenchmarkConvT2ILarge 30000000 59.6 ns/op BenchmarkConvT2ILarge 30000000 59.7 ns/op BenchmarkConvT2ILarge 50000000 51.5 ns/op BenchmarkConvT2ILarge 30000000 61.8 ns/op
BenchmarkConvT2ILarge 50000000 27.9 ns/op BenchmarkConvT2ILarge 50000000 27.7 ns/op BenchmarkConvT2ILarge 50000000 27.9 ns/op BenchmarkConvT2ILarge 50000000 27.9 ns/op BenchmarkConvT2ILarge 50000000 27.7 ns/op

While with ^BenchmarkConvT2ILarge is always around 28 ns. This is on 1.12.4.

OTOH, tip is consistent, even with the $, but always at 40ns, which is slower than the 28ns in 1.12.

@xuanjiazhen Any chance you could test with tip (the current master) and confirm that the issue with the benchmark results being bimodal is indeed no longer present?

@xuanjiazhen

This comment has been minimized.

Copy link
Author

commented Apr 22, 2019

I tried it on the latest master branch.The result looks like the two become consistent.But they reached consistent in the poor result!!! I think it is still not normal. @ALTree
(The number of times is a bit strange, is it a new improvement?)

$go version
go version devel +d17d41e58d Mon Apr 22 14:45:16 2019 +0000 darwin/amd64
$GOMAXPROCS=1 go test runtime -v -run none -bench  ^BenchmarkConvT2ILarg
e -count 10
goos: darwin
goarch: amd64
pkg: runtime
BenchmarkConvT2ILarge 	36258176	        32.3 ns/op
BenchmarkConvT2ILarge 	38381629	        32.3 ns/op
BenchmarkConvT2ILarge 	41117751	        32.2 ns/op
BenchmarkConvT2ILarge 	38234461	        32.7 ns/op
BenchmarkConvT2ILarge 	42858790	        32.2 ns/op
BenchmarkConvT2ILarge 	39587468	        32.6 ns/op
BenchmarkConvT2ILarge 	42929428	        32.7 ns/op
BenchmarkConvT2ILarge 	37120336	        32.4 ns/op
BenchmarkConvT2ILarge 	38552698	        32.9 ns/op
BenchmarkConvT2ILarge 	42029185	        32.2 ns/op
PASS
ok  	runtime	13.210s
$GOMAXPROCS=1 go test runtime -v -run none -bench  ^BenchmarkConvT2ILarg
e$ -count 10
goos: darwin
goarch: amd64
pkg: runtime
BenchmarkConvT2ILarge 	35498226	        32.5 ns/op
BenchmarkConvT2ILarge 	41142729	        32.2 ns/op
BenchmarkConvT2ILarge 	43043185	        32.1 ns/op
BenchmarkConvT2ILarge 	39090960	        32.3 ns/op
BenchmarkConvT2ILarge 	43341727	        32.4 ns/op
BenchmarkConvT2ILarge 	42690354	        32.1 ns/op
BenchmarkConvT2ILarge 	40104470	        32.1 ns/op
BenchmarkConvT2ILarge 	41782971	        32.2 ns/op
BenchmarkConvT2ILarge 	42257203	        32.1 ns/op
BenchmarkConvT2ILarge 	42138686	        32.4 ns/op
PASS
ok  	runtime	13.561s
$GOMAXPROCS=2 go test runtime -v -run none -bench  ^BenchmarkConvT2ILarg
e -count 10
goos: darwin
goarch: amd64
pkg: runtime
BenchmarkConvT2ILarge-2   	53497600	        20.2 ns/op
BenchmarkConvT2ILarge-2   	60137784	        20.0 ns/op
BenchmarkConvT2ILarge-2   	60011299	        20.1 ns/op
BenchmarkConvT2ILarge-2   	60673358	        20.2 ns/op
BenchmarkConvT2ILarge-2   	59793987	        20.1 ns/op
BenchmarkConvT2ILarge-2   	60232131	        20.0 ns/op
BenchmarkConvT2ILarge-2   	60023706	        20.1 ns/op
BenchmarkConvT2ILarge-2   	60491013	        20.1 ns/op
BenchmarkConvT2ILarge-2   	60667680	        19.9 ns/op
BenchmarkConvT2ILarge-2   	60528424	        20.0 ns/op
PASS
ok  	runtime	12.187s
@josharian

This comment has been minimized.

Copy link
Contributor

commented Apr 22, 2019

But they reached consistent in the poor result!!!

The bimodal distribution depending on the number of iterations is because the benchmark is not fully linear in the number of iterations, often due to GC interactions. This is unfortunate but hard to fix in the general case. Whether you hit the "poor result" or not depends on how many iterations you run. If you run with -benchtime=100ms or -benchtime=10ms or the like, you may get different results.

(The number of times is a bit strange, is it a new improvement?)

Yes. And in fact, one of the goals of that change was to avoid bimodal distributions in non-linear benchmarks. :)

@andybons andybons modified the milestones: Go1.13, Go1.14 Jul 8, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.