Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: “dumb” slice zeroing is faster than “normal” ones #49997

Open
ainar-g opened this issue Dec 6, 2021 · 2 comments
Open
Labels
NeedsFix The path to resolution is known, but the work has not been done. Performance
Milestone

Comments

@ainar-g
Copy link
Contributor

ainar-g commented Dec 6, 2021

What version of Go are you using (go version)?

$ go version
go version go1.17.4 linux/amd64
go version devel go1.18-ecf6b52b7f Sun Dec 5 12:50:44 2021 +0000 linux/amd64

Does this issue reproduce with the latest release?

Yes, see the second line of go version output.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE="on"
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/ainar/.cache/go-build"
GOENV="/home/ainar/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/ainar/go/pkg/mod"
GONOPROXY="REMOVED"
GONOSUMDB="REMOVED"
GOOS="linux"
GOPATH="/home/ainar/go"
GOPRIVATE="REMOVED"
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/ainar/go/go1.17"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/ainar/go/go1.17/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.17.4"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/dev/null"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build2360179353=/tmp/go-build -gno-record-gcc-switches"

What did you do?

https://go.dev/play/p/QAzxEwqCboZ

The “dumb” zeroing version:

func anonDumb(ip net.IP) (res net.IP) {
	_ = ip[15]

	ip[net.IPv6len-10],
		ip[net.IPv6len-9],
		ip[net.IPv6len-8],
		ip[net.IPv6len-7],
		ip[net.IPv6len-6],
		ip[net.IPv6len-5],
		ip[net.IPv6len-4],
		ip[net.IPv6len-3],
		ip[net.IPv6len-2],
		ip[net.IPv6len-1] =
		0, 0, 0, 0, 0, 0, 0, 0, 0, 0

	// Return a value to make sure that the function isn't
	// eliminated.
	return ip
}

The non-dumb versions:

func anonLoop(ip net.IP) (res net.IP) {
	_ = ip[15]

	for i := 6; i < net.IPv6len; i++ {
		ip[i] = 0
	}

	return ip
}

func anonCopy(ip net.IP) (res net.IP) {
	_ = ip[15]

	var a [10]byte
	copy(ip[6:], a[:])

	return ip
}

What did you expect to see?

One of the non-dumb versions working as fast as the dumb one.

What did you see instead?

The dumb one is much faster. Judging by the output of --gcflags '-S', the dumb version compiles to two register moves (MOVQ and MOVW), while the other two call runtime.memmove.

goos: linux
goarch: amd64
cpu: AMD Ryzen 7 PRO 4750U with Radeon Graphics
BenchmarkIP/dumb-16             901660257                1.230 ns/op           0 B/op          0 allocs/op
BenchmarkIP/loop-16             155647414                7.581 ns/op           0 B/op          0 allocs/op
BenchmarkIP/copy-16             160617816                7.264 ns/op           0 B/op          0 allocs/op
PASS
ok      command-line-arguments  4.618s
@randall77
Copy link
Contributor

randall77 commented Dec 6, 2021

Go does not currently unroll loops, which is why anonLoop is slower.

We do however detect zeroing loops like this:

	x := ip[6:]
	for i := range x {
		x[i] = 0
	}

With that, you'll get one call to memclrNoHeapPointers instead of a loop.

anonCopy is just the prove pass not being able to determine that the size is constant. (Or not having a way to push that info downstream.) We do optimize small, constant-sized memmoves to direct instructions.

If you do copy(ip[6:16], a[:]), it works. (Although it is still a move, not a zero. That might be fixable as well.)

@renthraysk
Copy link

renthraysk commented Dec 6, 2021

Using append() will get the compiler to generate the same instruction sequence as anonDumb().

const zero = "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"

func anonAppend(ip net.IP) (res net.IP) {
	_ = ip[15]
	return append(ip[:6], zero[:10]...)
}

@toothrot toothrot added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Dec 6, 2021
@toothrot toothrot added this to the Backlog milestone Dec 6, 2021
@randall77 randall77 added Performance NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Dec 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsFix The path to resolution is known, but the work has not been done. Performance
Projects
None yet
Development

No branches or pull requests

4 participants