
cmd/compile: adding panic to a never taken branch of an if inside a loop significantly speeds it up by removing knots in the CFG and marking that branch cold #70030

@bboreham

Go version

go version go1.23.2 linux/amd64

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/root/.cache/go-build'
GOENV='/root/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/local/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='local'
GOTOOLDIR='/usr/local/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.23.2'
GODEBUG=''
GOTELEMETRY='local'
GOTELEMETRYDIR='/root/.config/go/telemetry'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/tmp/bryan/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build2605208413=/tmp/go-build -gno-record-gcc-switches'

What did you do?

Ran this benchmark:
(I don't think I can post benchmarks on go.dev/play.)

package bryan

import (
	"testing"
)

func toLower(s string, a []byte) []byte {
	var buf []byte
	for i := 0; i < len(s); i++ {
		c := s[i]
		if 'A' <= c && c <= 'Z' {
			if buf == nil {
				if cap(a) > len(s) {
					buf = a[:len(s)]
					copy(buf, s)
					//panic("copy")  // Goes faster with this line uncommented.
				} else {
					buf = []byte(s)
				}
			}
			buf[i] = c + 'a' - 'A'
		}
	}
	return buf
}

func BenchmarkToLower(b *testing.B) {

	inputs := make([]string, 10)
	for i := range inputs {
		chars := "abcdefghijklmnopqrstuvwxyz"
		// Swap the alphabet to make alternatives.
		inputs[i] = chars[i%len(chars):] + chars[:i%len(chars)]
	}
	b.ResetTimer()
	for n := 0; n < b.N; n++ {
		var a [256]byte
		toLower(inputs[n%len(inputs)], a[:])
	}
}

What did you see happen?

With the panic line uncommented it takes 25ns per op; with it commented out it takes 45ns. (The panic is never executed.)
According to the profiler, the function is calling copy, which it shouldn't do. I added the panic to see how copy was being reached, and the benchmark then ran faster.
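
The claim that the panic is never executed can be checked outside the test harness. The following is a minimal sketch, not part of the original report: it places both variants side by side under the hypothetical names toLowerPlain and toLowerPanic, confirms that every benchmark input leaves buf nil (the inputs contain only lowercase letters, so the inner branch is never entered), and then times both with testing.Benchmark. The helper benchInputs and the standalone main are also assumptions made for illustration; timings will vary by machine and Go version.

```go
package main

import (
	"fmt"
	"testing"
)

// toLowerPlain is the function from the report with the panic line left out.
func toLowerPlain(s string, a []byte) []byte {
	var buf []byte
	for i := 0; i < len(s); i++ {
		c := s[i]
		if 'A' <= c && c <= 'Z' {
			if buf == nil {
				if cap(a) > len(s) {
					buf = a[:len(s)]
					copy(buf, s)
				} else {
					buf = []byte(s)
				}
			}
			buf[i] = c + 'a' - 'A'
		}
	}
	return buf
}

// toLowerPanic is identical except that the panic line is uncommented.
func toLowerPanic(s string, a []byte) []byte {
	var buf []byte
	for i := 0; i < len(s); i++ {
		c := s[i]
		if 'A' <= c && c <= 'Z' {
			if buf == nil {
				if cap(a) > len(s) {
					buf = a[:len(s)]
					copy(buf, s)
					panic("copy")
				} else {
					buf = []byte(s)
				}
			}
			buf[i] = c + 'a' - 'A'
		}
	}
	return buf
}

// benchInputs rebuilds the ten rotated lowercase alphabets used by the benchmark.
func benchInputs() []string {
	const chars = "abcdefghijklmnopqrstuvwxyz"
	inputs := make([]string, 10)
	for i := range inputs {
		inputs[i] = chars[i%len(chars):] + chars[:i%len(chars)]
	}
	return inputs
}

var inputs = benchInputs()

func benchPlain(b *testing.B) {
	for n := 0; n < b.N; n++ {
		var a [256]byte
		toLowerPlain(inputs[n%len(inputs)], a[:])
	}
}

func benchPanic(b *testing.B) {
	for n := 0; n < b.N; n++ {
		var a [256]byte
		toLowerPanic(inputs[n%len(inputs)], a[:])
	}
}

func main() {
	// No input contains an uppercase letter, so buf stays nil and
	// neither copy nor panic is reached on any of them.
	for _, s := range inputs {
		if toLowerPlain(s, make([]byte, 256)) != nil {
			fmt.Println("unexpected non-nil result for", s)
		}
	}
	fmt.Println("without panic:", testing.Benchmark(benchPlain))
	fmt.Println("with panic:   ", testing.Benchmark(benchPanic))
}
```

If the sanity loop prints nothing, any difference between the two timings comes from the code the compiler generates for the loop, not from work done in the branch.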

What did you expect to see?

It shouldn't call copy. It shouldn't go faster if I add a redundant panic.
