-
Notifications
You must be signed in to change notification settings - Fork 248
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make ARM SVE code vector length agnostic #285
Conversation
These assume 64 bytes/loop when processing: reedsolomon/galois_gen_switch_arm64.go Line 48 in 7202dfc
You probably also want to make adjustments to reedsolomon/galois_gen_switch_arm64.go Line 25 in 7202dfc
|
See latest commits which adds support for Graviton 4. There is no need to change the remaining bytes left since all *_64/_64Xor functions still always process 64 bytes (and same for the other routines that always process 32 bytes at a time). Note however that with a smaller vector length (only case now being 16 for Graviton 4), it will always loop twice as much in the routines. So even when just processing 64 bytes in eg. |
Here are the testing results on Graviton 4 (as identified by the support for sve2 which Graviton 3 misses) $ lscpu | grep sve2
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
$
$ go test --short
Using ARM+SVE
PASS
ok github.com/klauspost/reedsolomon 3.071s |
AFAICT this should also have vector-length instead of 32 hardcoded: Line 582 in 7202dfc
.. otherwise the temporary slice will be too small.. |
Good point, but (for now) this is not the case since the vector length is capped at 32 bytes (see code below) plus the now vl-agnostic SVE assembly routines always either process a multiple of 64 or 32 bytes (exactly like the AVX2 code from which it is derived). if vl, _ := getVectorLength(); vl <= 256 {
// set vector length in bytes
defaultOptions.vectorLength = int(vl) >> 3
} else {
// disable SVE for hardware implementatons over 256 bits (only know to be Fujitsu A64FX atm)
defaultOptions.useSVE = false
} So the current implementation has minimal impact on the current code. However, once we get our hands on SVE hardware that supports vectors that are larger than 256 bits, we will indeed have to do some more "surgery" to enlarge buffers etc. The only known implementation ATM is the Fujitsu chip listed above, so it might be a while before we can get access and work on this .... |
This PR updates the ARM SVE code to be vector length agnostic (so no more "hardcoded" constants but using
addvl
etc. everywhere).Code has been tested on Graviton 3 with 256-bits vector length (so same vector length as is now supported).
Support for Graviton 4 (128-bits vector length) will be added in a separate PR but should not require any changes to the assembly code itself.