Open
Labels: NeedsInvestigation, Performance, compiler/runtime
Description
Go version
go1.21.6
Output of go env in your module/workspace:
GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOCACHE='/Users/egon/Library/Caches/go-build'
GOENV='/Users/egon/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/egon/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='darwin'
GOPATH='/Users/egon/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/local/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/local/go/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.21.6'
GCCGO='gccgo'
AR='ar'
CC='clang'
CXX='clang++'
CGO_ENABLED='1'
GOMOD='/Users/egon/tmp/opt/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/yr/rzc9gn3d1mddybrx9v7220x80000gn/T/go-build390599184=/tmp/go-build -gno-record-gcc-switches -fno-common'
What did you do?
Compiling the following code produces suboptimal assembly:
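import "unsafe"

// Axpy computes ys[i*incy] += alpha * xs[i*incx] for i in [0, n),
// with strides given in elements (4 bytes per float32).
func Axpy(alpha float32, xs *float32, incx uintptr, ys *float32, incy uintptr, n uintptr) {
	xp := unsafe.Pointer(xs)
	yp := unsafe.Pointer(ys)
	// xn is the end pointer for xs: n elements, incx apart.
	xn := unsafe.Add(xp, 4*n*incx)
	for uintptr(xp) < uintptr(xn) {
		*(*float32)(yp) += alpha * *(*float32)(xp)
		xp, yp = unsafe.Add(xp, 4*incx), unsafe.Add(yp, 4*incy)
	}
}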
The optimization seems to be missing on both arm64 and amd64. I'm only showing the arm64 output here because the amd64 output is similar.
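For anyone who wants to reproduce this, here is a minimal driver sketch for the function above. The `main` function, the input values, and the spare trailing element in each slice are my own assumptions for illustration and are not part of the original report. The spare element is there so that the end pointer `xn` and the final values of `xp` and `yp` stay inside the allocation, since Go's `unsafe.Pointer` rules do not allow a pointer just past the end of an allocation.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Axpy is the function from the report, unchanged in behavior.
func Axpy(alpha float32, xs *float32, incx uintptr, ys *float32, incy uintptr, n uintptr) {
	xp := unsafe.Pointer(xs)
	yp := unsafe.Pointer(ys)
	xn := unsafe.Add(xp, 4*n*incx)
	for uintptr(xp) < uintptr(xn) {
		*(*float32)(yp) += alpha * *(*float32)(xp)
		xp, yp = unsafe.Add(xp, 4*incx), unsafe.Add(yp, 4*incy)
	}
}

func main() {
	// Hypothetical inputs: four payload elements plus one spare element
	// so that all computed pointers remain inside the backing arrays.
	xs := []float32{1, 2, 3, 4, 0}
	ys := []float32{10, 20, 30, 40, 0}

	// Unit stride over the first four elements: ys[i] += 2 * xs[i].
	Axpy(2, &xs[0], 1, &ys[0], 1, 4)
	fmt.Println(ys) // prints "[12 24 36 48 0]"
}
```

Only the loop in `Axpy` matters for the codegen question; the driver exists just to give the function a caller and a checkable result.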
What did you see happen?
The code gets compiled into this:
TEXT main.Axpy(SB) /Users/egon/tmp/opt/main.go
xn := unsafe.Add(xp, 4*n*incx)
0x100056880 d37ef484 LSL $2, R4, R4
0x100056884 9b040024 MADD R4, R0, R1, R4
for uintptr(xp) < uintptr(xn) {
0x100056888 14000008 JMP 8(PC)
*(*float32)(yp) += alpha * *(*float32)(xp)
0x10005688c bd400041 FMOVS (R2), F1
0x100056890 bd4000a2 FMOVS (R5), F2
0x100056894 1f000441 FMADDS F0, F1, F2, F1
0x100056898 bd000041 FMOVS F1, (R2)
xp, yp = unsafe.Add(xp, 4*incx), unsafe.Add(yp, 4*incy)
0x10005689c 8b0108a0 ADD R1<<2, R5, R0 // <-- related to R0, R5 juggling
0x1000568a0 8b030842 ADD R3<<2, R2, R2
for uintptr(xp) < uintptr(xn) {
0x1000568a4 aa0603e4 MOVD R6, R4 // <------------------
0x1000568a8 aa0003e5 MOVD R0, R5 // <------------------
0x1000568ac aa0403e6 MOVD R4, R6 // <------------------
0x1000568b0 eb00009f CMP R0, R4
0x1000568b4 54fffec8 BHI -10(PC)
}
0x1000568b8 d65f03c0 RET
0x1000568bc 00000000 ?
What did you expect to see?
I would've expected code more along the lines of:
LSL $2, R4, R4
MADD R4, R0, R1, R4
JMP check_boundary
loop:
FMOVS (R2), F1
FMOVS (R5), F2
FMADDS F0, F1, F2, F1
FMOVS F1, (R2)
ADD R1<<2, R5, R5
ADD R3<<2, R2, R2
check_boundary:
CMP R5, R4
BHI loop
RET
PS: I just realized that maybe that's happening because it's trying to preserve the register state for returning from the func... if that's the case, the whole logic could happen on registers that don't need to be preserved.
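A possible way to narrow this down is to compare the disassembly against a variant whose loop condition does not depend on `xp`. The sketch below is hypothetical: `AxpyCounted` is a name I made up, and I am not claiming it compiles to better or worse code, only that it gives a second data point. Note one behavioral difference: with `incx == 0` the original loop runs zero times, while this one runs `n` times.

```go
package main

import (
	"fmt"
	"unsafe"
)

// AxpyCounted is a hypothetical variant of Axpy that drives the loop
// with an explicit counter instead of comparing xp against an end pointer.
func AxpyCounted(alpha float32, xs *float32, incx uintptr, ys *float32, incy uintptr, n uintptr) {
	xp := unsafe.Pointer(xs)
	yp := unsafe.Pointer(ys)
	for i := uintptr(0); i < n; i++ {
		*(*float32)(yp) += alpha * *(*float32)(xp)
		xp, yp = unsafe.Add(xp, 4*incx), unsafe.Add(yp, 4*incy)
	}
}

func main() {
	// Illustrative inputs with one spare trailing element, so the
	// pointers computed after the last iteration stay in bounds.
	xs := []float32{1, 2, 3, 4, 0}
	ys := []float32{10, 20, 30, 40, 0}

	// Stride 2 over xs, stride 1 over ys, two iterations:
	// ys[0] += 2*xs[0], ys[1] += 2*xs[2].
	AxpyCounted(2, &xs[0], 2, &ys[0], 1, 2)
	fmt.Println(ys) // prints "[12 26 30 40 0]"
}
```

If the extra `MOVD` instructions disappear in this form, that would suggest the register shuffling is tied to how the pointer-based loop condition is handled rather than to preserving state for the return.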