NEON intrinsics are broken on big-endian #1484

Open
Amanieu opened this issue Oct 19, 2023 · 4 comments

Comments

@Amanieu
Member

Amanieu commented Oct 19, 2023

The NEON intrinsics are currently broken on big-endian because the order of elements inside vectors is reversed there: the ARM ABI requires that element 0 is located at the highest address of the vector type. However, LLVM intrinsics expect element 0 to be located at the lowest address.

See https://llvm.org/docs/BigEndianNEON.html and arm_neon.h in Clang for more details.

@RalfJung
Member

> the ARM ABI requires that element 0 is located at the highest address of the vector type. However LLVM intrinsics expect element 0 to be located at the lowest address.

What exactly does this mean? Is there a bug in LLVM? If so, where is it tracked?

Or is the problem that Rust stdarch wants to expose the intrinsics the way they work on hardware, but LLVM doesn't provide those semantics? If so, could that be fixed by doing appropriate translation of indices before calling the intrinsics?
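
The index translation suggested above can be sketched on a toy model. This is only an illustration, not stdarch code: `LANES`, `backend_get_lane`, `translate_lane`, and `get_lane` are hypothetical names, and a plain array stands in for a vector whose slot 0 is the lowest address.

```rust
const LANES: usize = 4;

/// Hypothetical backend operation: reads a lane assuming element 0 sits in
/// slot 0 (the lowest address), as the issue describes for LLVM.
fn backend_get_lane(slots: [u32; LANES], lane: usize) -> u32 {
    slots[lane]
}

/// Maps an element index to its slot when element 0 is stored at the
/// highest address, as the issue describes for the ARM ABI on big-endian.
fn translate_lane(lane: usize) -> usize {
    assert!(lane < LANES, "lane out of range");
    LANES - 1 - lane
}

/// Lane read with the index translated before calling the backend.
fn get_lane(slots: [u32; LANES], lane: usize) -> u32 {
    backend_get_lane(slots, translate_lane(lane))
}

fn main() {
    // Elements e0..e3 = 1, 2, 3, 4, stored with e0 in the highest slot.
    let stored = [4u32, 3, 2, 1];

    // Without translation the backend reads e3 when asked for element 0.
    assert_eq!(backend_get_lane(stored, 0), 4);
    // With translation, element 0 is found.
    assert_eq!(get_lane(stored, 0), 1);
    println!("element 0 = {}", get_lane(stored, 0));
}
```

Translation of this kind only covers operations that take an explicit lane index; operations whose lane order is implicit have no index to translate.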

@Amanieu
Member Author

Amanieu commented Feb 14, 2024

The short answer is that, on big-endian, LLVM portable vectors have a different element ordering than the one in the vector types used by the NEON intrinsics.

The C intrinsics work around this by reversing the element ordering in vectors before & after each intrinsic. We need to do the same in stdarch.
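
A minimal sketch of that reverse, call, reverse pattern, using plain arrays rather than real vector types. Everything here is an assumption made for illustration: `Slots`, `reverse`, `backend_rotate`, and `wrapped_rotate` are hypothetical names, and `backend_rotate` merely stands in for an order-sensitive operation that expects element 0 in slot 0.

```rust
const LANES: usize = 4;
type Slots = [u32; LANES];

/// Reverses the order of the lanes.
fn reverse(v: Slots) -> Slots {
    let mut out = v;
    out.reverse();
    out
}

/// Hypothetical backend operation that assumes element i lives in slot i:
/// result element i = input element (i + 1) % LANES.
fn backend_rotate(v: Slots) -> Slots {
    std::array::from_fn(|i| v[(i + 1) % LANES])
}

/// Wrapper for vectors stored with element 0 in the highest slot:
/// reverse into backend order, run the operation, reverse back.
fn wrapped_rotate(v: Slots) -> Slots {
    reverse(backend_rotate(reverse(v)))
}

fn main() {
    // Elements e0..e3 = 1, 2, 3, 4, stored with e0 in the highest slot.
    let stored: Slots = [4, 3, 2, 1];

    // Calling the backend directly rotates in the wrong direction:
    // read back as elements, [3, 2, 1, 4] is e0..e3 = 4, 1, 2, 3.
    assert_eq!(backend_rotate(stored), [3, 2, 1, 4]);

    // With the reversals the result is e0..e3 = 2, 3, 4, 1, again stored
    // with e0 in the highest slot.
    assert_eq!(wrapped_rotate(stored), [1, 4, 3, 2]);
    println!("{:?}", wrapped_rotate(stored));
}
```

In this model a purely lane-wise operation such as addition gives the same result with or without the reversals; only operations that depend on lane position are affected.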

@RalfJung
Member

Oh I see, so this is a mismatch about the simd_x intrinsics vs vendor-specific intrinsics? Okay makes sense.

OTOH this is good news for portable-simd, seems like there we'll be getting consistent behavior across platforms without extra work then.

calebzulawski pushed a commit to rust-lang/portable-simd that referenced this issue Feb 17, 2024
calebzulawski pushed a commit to rust-lang/portable-simd that referenced this issue Apr 9, 2024
he32 added a commit to he32/memchr that referenced this issue Sep 29, 2024
As noted in rust-lang/stdarch#1484,
the NEON intrinsics are broken on big-endian aarch64.

This is part of fixing rust to build for & on big-endian aarch64,
following up rust-lang/rust#129819.
he32 added a commit to he32/zerocopy that referenced this issue Oct 2, 2024
Neon / SIMD is known to be problematical in rust, ref.
rust-lang/stdarch#1484, even
though the CPU itself supports it.
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue Oct 2, 2024
This is done by avoiding attempts at using neon / SIMD in
big-endian mode by patching some of the vendored crates.
Neon / SIMD is known to be problematical in rust, ref.
rust-lang/stdarch#1484, even
though the CPU itself supports it.

I've also tried reporting the memchr fixes upstream, ref.
BurntSushi/memchr#162
So far not yet adopted.

Zerocopy has also received a pull request:
google/zerocopy#1795
he32 added a commit to he32/bytecount that referenced this issue Oct 2, 2024
Do this by avoiding trying to use neon / SIMD on big-endian aarch64.
Neon intrinsics are problematical on big-endian targets, ref.
rust-lang/stdarch#1484
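
The workaround described in these commits, skipping the NEON path on big-endian aarch64, can be sketched with `cfg` attributes. This is a hypothetical example and not code from memchr, zerocopy, or bytecount: `count_byte` and `count_byte_scalar` are made-up names, and the NEON body is only one possible way to count matching bytes.

```rust
/// Portable fallback, also used for the tail of the NEON path.
fn count_byte_scalar(haystack: &[u8], needle: u8) -> usize {
    haystack.iter().filter(|&&b| b == needle).count()
}

/// NEON path, compiled only for little-endian aarch64.
#[cfg(all(target_arch = "aarch64", target_endian = "little"))]
fn count_byte(haystack: &[u8], needle: u8) -> usize {
    use std::arch::aarch64::{vaddvq_u8, vceqq_u8, vdupq_n_u8, vld1q_u8, vshrq_n_u8};

    let mut chunks = haystack.chunks_exact(16);
    let mut total = 0usize;
    for chunk in &mut chunks {
        // SAFETY: `chunk` is exactly 16 bytes long, and this sketch assumes
        // a target where NEON is available (the default for Rust's hosted
        // aarch64 targets).
        let matches = unsafe {
            let eq = vceqq_u8(vld1q_u8(chunk.as_ptr()), vdupq_n_u8(needle));
            // Each matching lane is 0xFF; shift it down to 1 and sum lanes.
            vaddvq_u8(vshrq_n_u8::<7>(eq))
        };
        total += matches as usize;
    }
    total + count_byte_scalar(chunks.remainder(), needle)
}

/// Every other target, including big-endian aarch64, uses the scalar path.
#[cfg(not(all(target_arch = "aarch64", target_endian = "little")))]
fn count_byte(haystack: &[u8], needle: u8) -> usize {
    count_byte_scalar(haystack, needle)
}

fn main() {
    assert_eq!(count_byte(b"abracadabra", b'a'), 5);
    println!("{}", count_byte(b"abracadabra", b'a'));
}
```

Gating on `target_endian` keeps the SIMD path on the configuration where it is known to work and leaves the behavior elsewhere unchanged.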
@workingjubilee
Member

@he32

> I currently appear unable to find the actual connecting tissue between library/stdarch/crates/core_arch/src/aarch64/neon/mod.rs and LLVM

Intrinsics are handled in two places. An architecture-specific intrinsic uses link_llvm_intrinsics, which effectively specifies a lowering directly to LLVM textual IR. For one of rustc's "portable" intrinsics (simd_add and the like), the primary definition is in rustc_codegen_llvm: https://github.com/rust-lang/rust/blob/master/compiler/rustc_codegen_llvm/src/intrinsic.rs
