cmd/compile: “dumb” slice zeroing is faster than “normal” ones #49997

Open
ainar-g opened this issue Dec 6, 2021 · 2 comments

@ainar-g ainar-g commented Dec 6, 2021

What version of Go are you using (go version)?

$ go version
go version go1.17.4 linux/amd64
go version devel go1.18-ecf6b52b7f Sun Dec 5 12:50:44 2021 +0000 linux/amd64

Does this issue reproduce with the latest release?

Yes, see the second line of go version output.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE="on"
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/ainar/.cache/go-build"
GOENV="/home/ainar/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/ainar/go/pkg/mod"
GONOPROXY="REMOVED"
GONOSUMDB="REMOVED"
GOOS="linux"
GOPATH="/home/ainar/go"
GOPRIVATE="REMOVED"
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/ainar/go/go1.17"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/ainar/go/go1.17/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.17.4"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/dev/null"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build2360179353=/tmp/go-build -gno-record-gcc-switches"

What did you do?

https://go.dev/play/p/QAzxEwqCboZ

The “dumb” zeroing version:

func anonDumb(ip net.IP) (res net.IP) {
	_ = ip[15]

	ip[net.IPv6len-10],
		ip[net.IPv6len-9],
		ip[net.IPv6len-8],
		ip[net.IPv6len-7],
		ip[net.IPv6len-6],
		ip[net.IPv6len-5],
		ip[net.IPv6len-4],
		ip[net.IPv6len-3],
		ip[net.IPv6len-2],
		ip[net.IPv6len-1] =
		0, 0, 0, 0, 0, 0, 0, 0, 0, 0

	// Return a value to make sure that the function isn't
	// eliminated.
	return ip
}

The non-dumb versions:

func anonLoop(ip net.IP) (res net.IP) {
	_ = ip[15]

	for i := 6; i < net.IPv6len; i++ {
		ip[i] = 0
	}

	return ip
}

func anonCopy(ip net.IP) (res net.IP) {
	_ = ip[15]

	var a [10]byte
	copy(ip[6:], a[:])

	return ip
}

What did you expect to see?

One of the non-dumb versions working as fast as the dumb one.

What did you see instead?

The dumb one is much faster. Judging by the output of --gcflags '-S', the dumb version compiles to two move instructions (an 8-byte MOVQ and a 2-byte MOVW), while the other two call runtime.memmove.

goos: linux
goarch: amd64
cpu: AMD Ryzen 7 PRO 4750U with Radeon Graphics
BenchmarkIP/dumb-16             901660257                1.230 ns/op           0 B/op          0 allocs/op
BenchmarkIP/loop-16             155647414                7.581 ns/op           0 B/op          0 allocs/op
BenchmarkIP/copy-16             160617816                7.264 ns/op           0 B/op          0 allocs/op
PASS
ok      command-line-arguments  4.618s

@randall77 randall77 commented Dec 6, 2021

Go does not currently unroll loops, which is why anonLoop is slower.

We do however detect zeroing loops like this:

	x := ip[6:]
	for i := range x {
		x[i] = 0
	}

With that, you'll get one call to memclrNoHeapPointers instead of a loop.

anonCopy is slow only because the prove pass cannot determine that the copy size is constant (or has no way to push that information downstream). We do optimize small, constant-sized memmoves to direct instructions.

If you do copy(ip[6:16], a[:]), it works. (Although it is still a move, not a zero. That might be fixable as well.)

@renthraysk renthraysk commented Dec 6, 2021

Using append() will get the compiler to generate the same instruction sequence as anonDumb().

const zero = "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"

func anonAppend(ip net.IP) (res net.IP) {
	_ = ip[15]
	return append(ip[:6], zero[:10]...)
}
