What version of Go are you using (go version)?
$ gotip version
go version devel go1.21-26a90e4e Mon Jun 5 19:18:13 2023 +0000 linux/amd64
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (go env)?
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/eliben/.cache/go-build"
GOENV="/home/eliben/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/eliben/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/eliben/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.20.1"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
GOWORK=""
CGO_CFLAGS="-O2 -g"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-O2 -g"
CGO_FFLAGS="-O2 -g"
CGO_LDFLAGS="-O2 -g"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build1665273903=/tmp/go-build -gno-record-gcc-switches"
What did you do?
These two functions are equivalent:
func countFor1(s string) int {
	result := 0
	for i := 0; i < len(s); {
		if s[i]&1 == 1 {
			result += 1
		}
		i++
	}
	return result
}

func countFor2(s string) int {
	result := 0
	for i := 0; i < len(s); {
		if s[i]&1 == 1 {
			result += 1
			i++
		} else {
			i++
		}
	}
	return result
}
However, the second is ~2x slower on AMD64 (benchmark: https://go.dev/play/p/IlYydhYWUGl)
Looking at the disassembly, it seems that for countFor2 a useless bounds check is generated on the hot loop path; it is missing from the disassembly of countFor1. Moreover, countFor1 generates a conditional move instruction to update result, while countFor2 uses another branch.
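For anyone who wants to try this locally without the playground, here is a minimal standalone sketch. It is not the linked benchmark: the makeInput helper, the input size, the seed, and the sink variable are assumptions made for illustration, and the measured ratio will likely depend on the input data and the machine.

```go
package main

import (
	"fmt"
	"math/rand"
	"testing"
)

// countFor1 and countFor2 are copied from the report above.
func countFor1(s string) int {
	result := 0
	for i := 0; i < len(s); {
		if s[i]&1 == 1 {
			result += 1
		}
		i++
	}
	return result
}

func countFor2(s string) int {
	result := 0
	for i := 0; i < len(s); {
		if s[i]&1 == 1 {
			result += 1
			i++
		} else {
			i++
		}
	}
	return result
}

// sink keeps the compiler from discarding the calls in the benchmark loops.
var sink int

// makeInput is a hypothetical helper: it builds a reproducible pseudo-random
// string. The seed and the byte distribution are arbitrary choices.
func makeInput(n int) string {
	rng := rand.New(rand.NewSource(1))
	buf := make([]byte, n)
	for i := range buf {
		buf[i] = byte(rng.Intn(256))
	}
	return string(buf)
}

func main() {
	input := makeInput(1 << 16)

	// Sanity check: both functions should count the same number of odd bytes.
	if countFor1(input) != countFor2(input) {
		panic("countFor1 and countFor2 disagree")
	}

	// testing.Benchmark runs a benchmark function outside of "go test".
	r1 := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = countFor1(input)
		}
	})
	r2 := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = countFor2(input)
		}
	})

	fmt.Println("countFor1:", r1)
	fmt.Println("countFor2:", r2)
}
```

To compare the generated code for the two loops, building with go build -gcflags=-S prints the compiler's assembly output.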