cmd/compile: code generates unnecessary mov-s #65192

@egonelbre

Go version

go1.21.6

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOCACHE='/Users/egon/Library/Caches/go-build'
GOENV='/Users/egon/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/egon/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='darwin'
GOPATH='/Users/egon/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/local/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/local/go/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.21.6'
GCCGO='gccgo'
AR='ar'
CC='clang'
CXX='clang++'
CGO_ENABLED='1'
GOMOD='/Users/egon/tmp/opt/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/yr/rzc9gn3d1mddybrx9v7220x80000gn/T/go-build390599184=/tmp/go-build -gno-record-gcc-switches -fno-common'

What did you do?

Compiling the following code produces suboptimal assembly:

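package main

import "unsafe"

// Axpy computes ys[i*incy] += alpha * xs[i*incx] for i in [0, n),
// walking both arrays with raw pointers (strides are in elements, 4 bytes each).
func Axpy(alpha float32, xs *float32, incx uintptr, ys *float32, incy uintptr, n uintptr) {
	xp := unsafe.Pointer(xs)
	yp := unsafe.Pointer(ys)
	xn := unsafe.Add(xp, 4*n*incx)
	for uintptr(xp) < uintptr(xn) {
		*(*float32)(yp) += alpha * *(*float32)(xp)
		xp, yp = unsafe.Add(xp, 4*incx), unsafe.Add(yp, 4*incy)
	}
}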

The optimization seems to be missing on both arm64 and amd64. I'm only showing the arm64 output here, because the amd64 output is similar.

What did you see happen?

The code gets compiled into this:

TEXT main.Axpy(SB) /Users/egon/tmp/opt/main.go
	xn := unsafe.Add(xp, 4*n*incx)
  0x100056880		d37ef484		LSL $2, R4, R4		
  0x100056884		9b040024		MADD R4, R0, R1, R4	
	for uintptr(xp) < uintptr(xn) {
  0x100056888		14000008		JMP 8(PC)		
		*(*float32)(yp) += alpha * *(*float32)(xp)
  0x10005688c		bd400041		FMOVS (R2), F1		
  0x100056890		bd4000a2		FMOVS (R5), F2		
  0x100056894		1f000441		FMADDS F0, F1, F2, F1	
  0x100056898		bd000041		FMOVS F1, (R2)		
		xp, yp = unsafe.Add(xp, 4*incx), unsafe.Add(yp, 4*incy)
  0x10005689c		8b0108a0		ADD R1<<2, R5, R0	// <-- related to R0, R5 juggling
  0x1000568a0		8b030842		ADD R3<<2, R2, R2	
	for uintptr(xp) < uintptr(xn) {
  0x1000568a4		aa0603e4		MOVD R6, R4		// <------------------
  0x1000568a8		aa0003e5		MOVD R0, R5		// <------------------
  0x1000568ac		aa0403e6		MOVD R4, R6		// <------------------
  0x1000568b0		eb00009f		CMP R0, R4		
  0x1000568b4		54fffec8		BHI -10(PC)		
}
  0x1000568b8		d65f03c0		RET			
  0x1000568bc		00000000		?			

What did you expect to see?

I would've expected code more along the lines of:

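        LSL     $2, R4, R4              // R4 = 4*n
        MADD    R4, R0, R1, R4          // R4 = xn = xp + 4*n*incx
        JMP     check_boundary
loop:
        FMOVS   (R2), F1                // F1 = *yp
        FMOVS   (R0), F2                // F2 = *xp (xp stays in R0)
        FMADDS  F0, F1, F2, F1          // F1 = F1 + alpha*F2
        FMOVS   F1, (R2)                // *yp = F1
        ADD     R1<<2, R0, R0           // xp += 4*incx
        ADD     R3<<2, R2, R2           // yp += 4*incy
check_boundary:
        CMP     R0, R4                  // loop while xp < xn
        BHI     loop
        RET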

PS: I just realized this may be happening because the compiler is trying to preserve the register state for returning from the func. If that's the case, the whole loop could run on registers that don't need to be preserved.

Labels: NeedsInvestigation, Performance, compiler/runtime