New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: possible missed optimization in append benchmark #25916

Open
perillo opened this Issue Jun 15, 2018 · 8 comments

Comments

Projects
None yet
3 participants
@perillo

perillo commented Jun 15, 2018

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

go version go1.10.3 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN="/home/manlio/.local/bin"
GOCACHE="/home/manlio/.cache/go-build"
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/manlio/.local/lib/go:/home/manlio/code/src/go"
GORACE=""
GOROOT="/usr/lib/go"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build931873105=/tmp/go-build -gno-record-gcc-switches"

What did you do?

I wrote a simple benchmark to check the performance of append versus copy.
The benchmark is here:
https://play.golang.org/p/jA7Fb0oON6Z

The benchmark result is:

Benchmark_Append-4   	30000000	        43.1 ns/op	       0 B/op	       0 allocs/op
Benchmark_Copy-4     	30000000	        43.0 ns/op	       0 B/op	       0 allocs/op

The unexpected result is when bug is set to true. In this case the benchmarks results are:

Benchmark_Append-4   	30000000	        37.2 ns/op	       0 B/op	       0 allocs/op
Benchmark_Copy-4     	30000000	        43.2 ns/op	       0 B/op	       0 allocs/op

The same result is produced when I change bug from a const to a var, even if it is set to false.

When I set GOARCH=386 with bug=true, the benchmark result is:

Benchmark_Append-4   	20000000	        63.5 ns/op	       0 B/op	       0 allocs/op
Benchmark_Copy-4     	20000000	        59.6 ns/op	       0 B/op	       0 allocs/op

This seems to be an issue with the amd64 compiler.

This is the assembly listing when bug is false:
https://pastebin.com/kDVTypHF

and this is the assembly listing when bug is true
https://pastebin.com/VErtqZw6

This is the discussion of golang-nuts:
https://groups.google.com/forum/#!topic/golang-nuts/lJvBonZg62g

What did you expect to see?

The benchmark result should be the same, with bug set to false or bug set to true.

What did you see instead?

When bug is set to true, the Append benchmark is faster than the Copy benchmark.

@martisch

This comment has been minimized.

Member

martisch commented Jun 15, 2018

Please run the benchmarks with -count=20 -cpu=1 on a quiet system (no network or browser) and pipe the output to a file then use https://godoc.org/golang.org/x/perf/cmd/benchstat to compare the results to remove some problems with outliers.

I see some variance (between var bug = false vs true) from run to rune and some sub ns differences which is not unusual even on a quiet system:
name old time/op new time/op delta

_Append  44.2ns ± 5%  44.6ns ± 6%   ~     (p=0.765 n=10+9)
_Copy    57.8ns ± 1%  57.4ns ± 2%   ~     (p=0.139 n=8+10)
@perillo

This comment has been minimized.

perillo commented Jun 15, 2018

Sorry but currently I can't shutdown the network and the browser.
The result of the benchmark is:

name     time/op
_Append  36.8ns ± 0%
_Copy    43.0ns ± 0%

name     alloc/op
_Append   0.00B     
_Copy     0.00B     

name     allocs/op
_Append    0.00     
_Copy      0.00

During the weekend I will try to run again the benchmark after a system reboot.

@martisch

This comment has been minimized.

Member

martisch commented Jun 15, 2018

The output does not show a comparison between false and true and no multiple runs.

I used with go tip (to be 1.11 in the future):

  • go test -bench=Benchmark -cpu=1 -count=10 > ~/append.false
  • change bug variable to false
  • go test -bench=Benchmark -cpu=1 -count=10 > ~/append.true
  • benchstat ~/append.true ~/append.false

Doesnt show any significant or unusual difference between variable values.

@perillo

This comment has been minimized.

perillo commented Jun 15, 2018

Sorry, I have never used benchstat before; the data I posted was from benchstat append.true.

Here is the correct data:

name     old time/op    new time/op    delta
_Append    36.9ns ± 0%    43.0ns ± 0%  +16.69%  (p=0.000 n=20+17)
_Copy      43.0ns ± 0%    43.0ns ± 0%   -0.07%  (p=0.013 n=20+18)

name     old alloc/op   new alloc/op   delta
_Append     0.00B          0.00B          ~     (all equal)
_Copy       0.00B          0.00B          ~     (all equal)

name     old allocs/op  new allocs/op  delta
_Append      0.00           0.00          ~     (all equal)
_Copy        0.00           0.00          ~     (all equal)

I will try to use go tip during the weekend.

@martisch

This comment has been minimized.

Member

martisch commented Jun 15, 2018

Thanks is that with bug as const or var?

@perillo

This comment has been minimized.

perillo commented Jun 15, 2018

With const.

@martisch

This comment has been minimized.

Member

martisch commented Jun 15, 2018

Ok. I see a speed up with const bug set to true too but not for changing bug between true and false with bug as a var.

That said the loop alignment/code layout is not unexpectedly different for bug const true vs false (as the compiler can optimize some code away) which can lead to performance differences. If so i would not count this as a bug (unless the outcome is wrong) but an opportunity to investigate and change the compiler to generate better code for the const bug is true case.

Removing the "if" block below (and thereby changing the code layout) makes the benchmark a lot slower on my system:

buf := make([]byte, 0, n)
buf = append(buf, data...)
if buf == nil {
  print("true")
}

@josharian josharian changed the title from Possible code generation bug with amd64 to cmd/compile: possible missed optimization in append benchmark Jun 16, 2018

@josharian

This comment has been minimized.

Contributor

josharian commented Jun 16, 2018

Our layout pass could use some improvements in general. (I worked on this some in 1.11 and got some improvements but never quite mailed anything.) See also #20356 and #18977 for some other discussion around layout.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment