Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile,unicode/utf8: utf8.EncodeRune has different performance from the equivalent string conversion #48684

Open
bcmills opened this issue Sep 29, 2021 · 8 comments

Comments

@bcmills
Copy link
Member

@bcmills bcmills commented Sep 29, 2021

What version of Go are you using (go version)?

$ go version
go version devel go1.18-435718edd Tue Sep 28 23:59:17 2021 +0000 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/usr/local/google/home/bcmills/.cache/go-build"
GOENV="/usr/local/google/home/bcmills/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/tmp/tmp.ysR1YLE79p/.gopath/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/tmp/tmp.ysR1YLE79p/.gopath"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/google/home/bcmills/sdk/gotip"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/google/home/bcmills/sdk/gotip/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="devel go1.18-435718edd Tue Sep 28 23:59:17 2021 +0000"
GCCGO="/usr/local/google/home/bcmills/bin/gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="c++"
CGO_ENABLED="1"
GOMOD="/tmp/tmp.ysR1YLE79p/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build4264475613=/tmp/go-build -gno-record-gcc-switches"

What did you do?

package main

import (
	"runtime"
	"testing"
	"unicode/utf8"
)

func BenchmarkEncodeRune(b *testing.B) {
	b.ReportAllocs()

	const r = '💩'
	for n := b.N; n > 0; n-- {
		b := [utf8.UTFMax]byte{}
		_ = utf8.EncodeRune(b[:], r)
		runtime.KeepAlive(b)
	}
}

func BenchmarkByteConvert(b *testing.B) {
	b.ReportAllocs()

	const r = '💩'
	for n := b.N; n > 0; n-- {
		b := [utf8.UTFMax]byte{}
		_ = copy(b[:], string(r))
		runtime.KeepAlive(b)
	}
}

What did you expect to see?

utf8.EncodeRune performance exactly equal to the code using copy and a string conversion, since they are (as far as I can tell) semantically equivalent (compare #3939).

I wrote a fuzz test to demonstrate equivalence, and found no counterexamples after 5 minutes of fuzzing:

func FuzzEncodeRune(f *testing.F) {
	f.Fuzz(func(t *testing.T, r rune) {
		b1 := [utf8.UTFMax]byte{}
		n1 := utf8.EncodeRune(b1[:], r)
		t.Logf("EncodeRune(_, %c) = %d; wrote %q", r, n1, b1[:n1])

		b2 := [utf8.UTFMax]byte{}
		n2 := copy(b2[:], string(r))
		t.Logf("copy(_, string(%c)) = %d; wrote %q", r, n2, b2[:n2])

		if n2 != n1 || !bytes.Equal(b2[:], b1[:]) {
			t.FailNow()
		}
	})
}

What did you see instead?

utf8.EncodeRune takes something like 9x longer than the string-and-copy equivalent:

$ go test -bench=. .
goos: linux
goarch: amd64
pkg: example
cpu: Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.60GHz
BenchmarkEncodeRune-12          157204260                7.206 ns/op           0 B/op          0 allocs/op
BenchmarkByteConvert-12         1000000000               0.7992 ns/op          0 B/op          0 allocs/op
PASS
ok      example 2.704s
@bcmills bcmills added this to the Backlog milestone Sep 29, 2021
@bcmills bcmills changed the title cmd/compile,utf8: utf8.EncodeRune is ~10x slower than the equivalent string conversion cmd/compile,utf8: utf8.EncodeRune is ~9x slower than the equivalent string conversion Sep 29, 2021
@bcmills
Copy link
Member Author

@bcmills bcmills commented Sep 29, 2021

At the very least, this may be a useful data point for #27400.

@bcmills
Copy link
Member Author

@bcmills bcmills commented Sep 29, 2021

Yep, this is in part #27400. When I sprinkle the benchmarks with go:noinline fairy dust, the dramatic difference in one direction becomes a smaller difference in the other direction:

package main

import (
	"bytes"
	"runtime"
	"testing"
	"unicode/utf8"
)

//go:noinline
func utf8EncodeRune(b []byte, r rune) int {
	return utf8.EncodeRune(b[:], r)
}

//go:noinline
func stringEncodeRune(b []byte, r rune) int {
	return copy(b[:], string(r))
}

func BenchmarkEncodeRune(b *testing.B) {
	b.ReportAllocs()

	const r = '💩'
	for n := b.N; n > 0; n-- {
		b := [utf8.UTFMax]byte{}
		n := utf8EncodeRune(b[:], r)
		runtime.KeepAlive(b)
		runtime.KeepAlive(n)
	}
}

func BenchmarkCopyConvertedRune(b *testing.B) {
	b.ReportAllocs()

	const r = '💩'
	for n := b.N; n > 0; n-- {
		b := [utf8.UTFMax]byte{}
		n := stringEncodeRune(b[:], r)
		runtime.KeepAlive(b)
		runtime.KeepAlive(n)
	}
}
$ go test -bench=. .
goos: linux
goarch: amd64
pkg: example
cpu: Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.60GHz
BenchmarkEncodeRune-12          154603531                7.458 ns/op           0 B/op          0 allocs/op
BenchmarkByteConvert-12         112319052               10.42 ns/op            0 B/op          0 allocs/op
PASS
ok      example 3.815s

@bcmills bcmills changed the title cmd/compile,utf8: utf8.EncodeRune is ~9x slower than the equivalent string conversion cmd/compile,utf8: utf8.EncodeRune has different performance from the equivalent string conversion Sep 29, 2021
@bcmills bcmills changed the title cmd/compile,utf8: utf8.EncodeRune has different performance from the equivalent string conversion cmd/compile,unicode/utf8: utf8.EncodeRune has different performance from the equivalent string conversion Sep 29, 2021
@randall77
Copy link
Contributor

@randall77 randall77 commented Sep 29, 2021

I don't think this is a fair benchmark. Most of BenchmarkByteConvert compiles away.
In particular, BenchmarkByteConvert never needs to compute the number of bytes in the encoded rune.

KeepAlive isn't really doing what you think it is doing. It doesn't force its argument to be calculated. It only forces any pointers it contains to be treated as live. When the argument is a stack-allocated pointer-free thing (like b) it is a no-op.

@bcmills
Copy link
Member Author

@bcmills bcmills commented Sep 29, 2021

KeepAlive isn't really doing what you think it is doing. It doesn't force its argument to be calculated. It only forces any pointers it contains to be treated as live. When the argument is a stack-allocated pointer-free thing (like b) it is a no-op.

That's fair — but is also a pretty strong argument that testing.B is not measuring what it appears to be measuring, which is certainly a usability issue with the benchmarking API. Writing a correct benchmark should not require users to sprinkle their code with magic comments.

@randall77
Copy link
Contributor

@randall77 randall77 commented Sep 29, 2021

I'm sympathetic, but not sure what the fix might be. Sometimes you want to benchmark the fully-optimized code, including any dead code elimination.

Generally any benchmark with "_ = " in it is ripe for dead code elimination that maybe wasn't intended. Certainly fixing #27400 would help.

@cespare
Copy link
Contributor

@cespare cespare commented Sep 29, 2021

@randall77 For a few years I have been under the impression that runtime.KeepAlive could be used exactly for this purpose based on your comment here.

I think we need a testing.Use(v interface{}) or something (with compiler coordination).

@randall77
Copy link
Contributor

@randall77 randall77 commented Sep 29, 2021

@cespare So I think I misspoke a bit here:

It doesn't force its argument to be calculated.

It does force its argument to be generated, in the sense that the value must be available at the KeepAlive point. But when the thing it is calculating is a compile-time constant (like len(b) or the contents of b in OP's code), it doesn't really mean the value must be calculated. For instance, everything in BenchmarkByteConvert could be lifted outside the loop.

@bcmills
Copy link
Member Author

@bcmills bcmills commented Sep 29, 2021

In this case I suspect that the problem may be on the input side rather than the output side.

The input r is a constant, so if the compiler applies enough constant-propagation it can just stick the results directly into the destination variables.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
3 participants