cmd/asm: ARM64 NEON floating point instructions (VFABD VFMAX, VFMAXNM, VFMINNM VFMIN VFADD, VFSUB VFMUL, VFDIV VFMLA, VFMLS VCVT*) #41092

Open
paultag opened this issue Aug 28, 2020 · 4 comments
Labels
NeedsInvestigation
Milestone

Comments


@paultag paultag commented Aug 28, 2020

What version of Go are you using (go version)?

$ go version
go version go1.15 linux/arm64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="arm64"
GOBIN=""
GOCACHE="/home/ubuntu/.cache/go-build"
GOENV="/home/ubuntu/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="arm64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/ubuntu/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/ubuntu/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/ubuntu/xx/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/ubuntu/xx/go/pkg/tool/linux_arm64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build141213394=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Whilst writing some NEON code, I found myself in need of floating point operations in NEON. I was able to load my data to the V* registers (and write it out!), but when I attempted to use VF* instructions, such as VFADD or VFMUL, those opcodes have not been implemented by any intrepid engineer on arm64.

What did you expect to see?

Vectorized floating point addition or multiplication.

What did you see instead?

unrecognized instruction "VFADD"

Test code

neon.go

package fptest

// AddFloat is implemented in neon_arm64.s.
func AddFloat(a, b, dst []float32)

neon_test.go

package fptest_test

import (
        "testing"
        fptest "."

        "github.com/stretchr/testify/assert"
)

func TestAddFloat(t *testing.T) {
        dst := make([]float32, 4)
        fptest.AddFloat([]float32{1, 2, 3, 4}, []float32{10, 20, 30, 40}, dst)
        assert.Equal(t, []float32{11, 22, 33, 44}, dst)
}

neon_arm64.s

// func AddFloat(a, b, dst []float32)
TEXT ·AddFloat(SB), $0-72
    // For the sake of simplicity, this only handles the first 4 elements.

    // Load the addresses of a, b and dst into R8, R9 and R10.
    MOVD a+0(FP),    R8
    MOVD b+24(FP),   R9
    MOVD dst+48(FP), R10

    // Load [4]float32 from a and b into V1 and V2.
    VLD1 (R8), [V1.S4]
    VLD1 (R9), [V2.S4]

    VFADD V1.S4, V2.S4, V1.S4
    // Raw encoding of the same instruction, usable as a workaround:
    // WORD $0x4e21d441;

    // Write [4]float32 to dst.
    VST1 [V1.S4], (R10)

    RET
@paultag paultag changed the title ARM64 NEON floating point instructions (VFADD, VFMUL, etc) cmd/asm: ARM64 NEON floating point instructions (VFADD, VFMUL, etc) Aug 28, 2020
Contributor

@davecheney davecheney commented Aug 28, 2020

Please see #40725

Author

@paultag paultag commented Aug 28, 2020

(I think the above comment is a reference to the following comment from that thread:)

Please don't take this as a criticism, but as an observer of a number of this class of request, the best results are obtained when the OP, you in this case, can enumerate exactly which instructions to add. I have no explanation why requests for all XXX instructions are unsuccessful, but encourage you to list precisely the instructions you would like to see added as there is anecdotal evidence that requests formed in this way are resolved faster.

I'll go shopping for opcodes, thanks @davecheney. I was going to try to put together a changeset to go with this issue, but figured I'd file it ahead of that in case it wound up being an insurmountable pile of internals changes. I'll list the opcodes I need here, and see if I can produce a changeset (famous last bug report words).

Author

@paultag paultag commented Aug 28, 2020

After looking at a few similar changes adding arm64 opcodes, I suspect I'm in over my head. I'm still going to try my hand at a changeset for sheer sport, but I wouldn't block on me if anyone competent comes across this issue.

These are the most burning opcodes to help remove some bottlenecks I've hit:

  • VFABD
  • VFMAX, VFMAXNM, VFMINNM, VFMIN
  • VFADD, VFSUB
  • VFMUL, VFDIV
  • VFMLA, VFMLS
  • VCVT (Not sure what the correct opcode(s) in idiomatic Go ASM would be here. The variants I'd want are "float to unsigned integer", "float to signed integer", "signed integer to float" and "unsigned integer to float".)

@paultag paultag changed the title cmd/asm: ARM64 NEON floating point instructions (VFADD, VFMUL, etc) cmd/asm: ARM64 NEON floating point instructions (VFABD VFMAX, VFMAXNM, VFMINNM VFMIN VFADD, VFSUB VFMUL, VFDIV VFMLA, VFMLS VCVT*)) Aug 28, 2020
@paultag paultag changed the title cmd/asm: ARM64 NEON floating point instructions (VFABD VFMAX, VFMAXNM, VFMINNM VFMIN VFADD, VFSUB VFMUL, VFDIV VFMLA, VFMLS VCVT*)) cmd/asm: ARM64 NEON floating point instructions (VFABD VFMAX, VFMAXNM, VFMINNM VFMIN VFADD, VFSUB VFMUL, VFDIV VFMLA, VFMLS VCVT*) Aug 28, 2020
@cagedmantis cagedmantis added the NeedsInvestigation label Aug 31, 2020
@cagedmantis cagedmantis added this to the Backlog milestone Aug 31, 2020
Contributor

@cagedmantis cagedmantis commented Aug 31, 2020

/cc @cherrymui
