
runtime: append is slower than make+assignments for small slices with known length #26734

Open
opennota opened this Issue Aug 1, 2018 · 7 comments

opennota commented Aug 1, 2018

What version of Go are you using (go version)?

go1.10.3 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOCACHE="/home/opennota/.cache/go-build"
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/opennota/gocode"
GORACE=""
GOROOT="/home/opennota/go"
GOTMPDIR=""
GOTOOLDIR="/home/opennota/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build289966161=/tmp/go-build -gno-record-gcc-switches"

What did you do?

https://play.golang.org/p/e34FVX-sqQx

go test -bench=.
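
The exact benchmark code is in the playground link above; what follows is a rough reconstruction of the two functions being compared (the names and the element count of five are assumptions on my part, the count inferred from the 80 B/op reported further down):

package bench

import "testing"

// Keep results live so the compiler cannot optimize the slices away.
var sink []string

func BenchmarkAppend(b *testing.B) {
	for i := 0; i < b.N; i++ {
		var s []string
		s = append(s, "a", "b", "c", "d", "e")
		sink = s
	}
}

func BenchmarkMakeAssign(b *testing.B) {
	for i := 0; i < b.N; i++ {
		s := make([]string, 5)
		s[0] = "a"
		s[1] = "b"
		s[2] = "c"
		s[3] = "d"
		s[4] = "e"
		sink = s
	}
}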

What did you expect to see?

The ns/op numbers in the two tests are close.

What did you see instead?

append is 24% slower:

BenchmarkAppend-4               20000000                89.9 ns/op
BenchmarkMakeAssign-4           20000000                73.1 ns/op

meirf added the Performance label Aug 1, 2018

ctriple (Contributor) commented Aug 1, 2018

please run with go test -bench=. -benchmem

martisch (Member) commented Aug 1, 2018

Note that it could be that append allocates and zeroes more memory than make+assign, since append rounds the requested capacity up to the next allocation size class. In addition, s = append(s, ...) will copy the existing contents of s on top of writing the new elements if s needs to be resized. Please also run the benchmarks on a quiet system and with -count=20, as noise in the ns range can be quite high when allocations are involved.
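
A small illustration of the rounding point, not taken from the issue's benchmark (ints are used here so the rounding is visible):

package main

import "fmt"

func main() {
	// Growing from nil, append asks the runtime for room for 5 ints (40 bytes)
	// and receives the next allocation size class, so cap can exceed len.
	var a []int
	a = append(a, 1, 2, 3, 4, 5)
	fmt.Println(len(a), cap(a)) // e.g. 5 6 on a typical 64-bit build

	// make gets exactly the requested length.
	m := make([]int, 5)
	fmt.Println(len(m), cap(m)) // 5 5
}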

opennota commented Aug 1, 2018

With -benchmem:

BenchmarkAppend-4               20000000                89.8 ns/op            80 B/op          1 allocs/op
BenchmarkMakeAssign-4           20000000                75.5 ns/op            80 B/op          1 allocs/op

With -count=20:

BenchmarkAppend-4               20000000                88.3 ns/op
BenchmarkAppend-4               20000000                84.9 ns/op
BenchmarkAppend-4               20000000                86.8 ns/op
BenchmarkAppend-4               20000000                84.8 ns/op
BenchmarkAppend-4               20000000                84.7 ns/op
BenchmarkAppend-4               20000000                86.2 ns/op
BenchmarkAppend-4               20000000                84.7 ns/op
BenchmarkAppend-4               20000000                94.7 ns/op
BenchmarkAppend-4               20000000                88.1 ns/op
BenchmarkAppend-4               20000000                85.4 ns/op
BenchmarkAppend-4               20000000                85.4 ns/op
BenchmarkAppend-4               20000000                86.0 ns/op
BenchmarkAppend-4               20000000                84.8 ns/op
BenchmarkAppend-4               20000000                87.1 ns/op
BenchmarkAppend-4               20000000                85.1 ns/op
BenchmarkAppend-4               20000000                85.4 ns/op
BenchmarkAppend-4               20000000                86.5 ns/op
BenchmarkAppend-4               20000000                85.1 ns/op                                                                    
BenchmarkAppend-4               20000000                84.8 ns/op
BenchmarkAppend-4               20000000                86.7 ns/op
BenchmarkMakeAssign-4           20000000                71.5 ns/op
BenchmarkMakeAssign-4           20000000                72.4 ns/op
BenchmarkMakeAssign-4           20000000                73.8 ns/op
BenchmarkMakeAssign-4           20000000                71.9 ns/op
BenchmarkMakeAssign-4           20000000                71.3 ns/op
BenchmarkMakeAssign-4           20000000                71.5 ns/op
BenchmarkMakeAssign-4           20000000                71.7 ns/op
BenchmarkMakeAssign-4           20000000                74.4 ns/op
BenchmarkMakeAssign-4           20000000                71.6 ns/op
BenchmarkMakeAssign-4           20000000                73.2 ns/op
BenchmarkMakeAssign-4           20000000                71.3 ns/op
BenchmarkMakeAssign-4           20000000                71.9 ns/op
BenchmarkMakeAssign-4           20000000                72.8 ns/op
BenchmarkMakeAssign-4           20000000                71.3 ns/op
BenchmarkMakeAssign-4           20000000                71.2 ns/op
BenchmarkMakeAssign-4           20000000                73.6 ns/op
BenchmarkMakeAssign-4           20000000                71.4 ns/op
BenchmarkMakeAssign-4           20000000                71.7 ns/op
BenchmarkMakeAssign-4           20000000                71.5 ns/op
BenchmarkMakeAssign-4           20000000                74.5 ns/op

benchstat:

name      old time/op  new time/op  delta
Test-4*  72.2ns ± 3%  85.8ns ± 3%  +18.84%  (p=0.000 n=20+19)

* I had to rename both benchmarks to the same name so that benchstat would compare them.
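
For reference, a table like this comes from the usual benchstat workflow, something along these lines (the file names are made up; renaming the benchmarks is what the footnote refers to):

go test -bench=MakeAssign -count=20 > old.txt
go test -bench=Append -count=20 > new.txt
# edit the benchmark names in the two files so they match, then:
benchstat old.txt new.txt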

martisch (Member) commented Aug 1, 2018

I modified the benchmarks a bit to make them more similar:
https://play.golang.org/p/2gWjX7-5D28
and I also get that make+assign is faster for this case:

name        time/op
MakeAssign  83.8ns ± 4%
Append       118ns ± 5%

meirf (Member) commented Aug 1, 2018

@martisch should the next step be comparing the assembly of the two implementations? (I don't think most people expect append to be that close to make+assign since make+assign seems optimal.)

martisch (Member) commented Aug 1, 2018

Yes, looking at the assembly could reveal whether make+assign applies an optimization (e.g. combining writes, avoiding some write barriers) that append does not. But it could also come down to runtime.growslice simply having more overhead, since it is a more general function and is not specialized for a compile-time-known length of the appended slice (if that is the case, maybe the compiler could detect it and optimize it away).
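
For anyone who wants to do that comparison, the standard toolchain already covers it; one way (nothing issue-specific here, just the usual flags):

go test -c -o bench.test
go tool objdump -s BenchmarkAppend bench.test
go tool objdump -s BenchmarkMakeAssign bench.test
# go test -gcflags=-S also dumps the compiler's assembly while building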

When increasing the size of the string array created/appended and the number of values copied by 8x, append seems to become faster:

name        time/op
MakeAssign   263ns ± 3%
Append       246ns ± 3%

Some thoughts:

  • It could be looked into whether the compiler could better optimize appends when the length of the appended slice/argument list/string is known at compile time.
  • Possibly (special to this case) it would suffice to check only the bounds of the largest index of the string slice elements being copied in; a rough sketch of that idiom follows below. I am not sure how often this comes up to warrant such an optimization.
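
Loosely, the second point corresponds to the bounds-check-hint idiom at the user level; a hypothetical helper just to illustrate what checking only the largest index means (not code from the issue):

package bench

// fill writes four values into dst. Touching dst[3] up front lets the
// compiler prove that the later constant-index stores are in bounds,
// so their individual bounds checks can be dropped.
func fill(dst []string, a, b, c, d string) {
	_ = dst[3]
	dst[0] = a
	dst[1] = b
	dst[2] = c
	dst[3] = d
}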

martisch changed the title from "runtime: append is 24% slower than make+assign" to "runtime: append is slower than make+assignments for small slices with known length" Aug 1, 2018

martisch added this to the Unplanned milestone Aug 1, 2018

go101 commented Aug 1, 2018

As @martisch suggests, this might be related to write barriers.

https://github.com/go101/go-benchmarks/tree/master/append-vs-make

related issue: #23306
