Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failure in run-pass/simd/simd-intrinsic-generic-select on Big-Endian Targets #59356

Closed
smaeul opened this issue Mar 22, 2019 · 2 comments · Fixed by #62774
Closed

Failure in run-pass/simd/simd-intrinsic-generic-select on Big-Endian Targets #59356

smaeul opened this issue Mar 22, 2019 · 2 comments · Fixed by #62774
Labels
A-simd Area: SIMD (Single Instruction Multiple Data) C-bug Category: This is a bug. O-PowerPC Target: PowerPC processors

Comments

@smaeul
Copy link
Contributor

smaeul commented Mar 22, 2019

This was found while building rustc 1.33.0 on powerpc64-unknown-linux-musl. Test run-pass/simd/simd-intrinsic-generic-select fails at the third simd_select_bitmask test:

thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `u32x8(8, 1, 10, 3, 12, 5, 14, 7)`,
 right: `u32x8(0, 9, 2, 11, 4, 13, 6, 15)`', src/test/run-pass/simd/simd-intrinsic-generic-select.rs:159:9

The LLVM IR generated for this test case on x86_64-unknown-linux-musl and powerpc64-unknown-linux-musl (respectively) is identical:

  %1036 = load <8 x i32>, <8 x i32>* %a83, align 32
  %1037 = load <8 x i32>, <8 x i32>* %b84, align 32
  %1038 = select <8 x i1> bitcast (<1 x i8> <i8 -16> to <8 x i1>), <8 x i32> %1036, <8 x i32> %1037
  store <8 x i32> %1038, <8 x i32>* %r101, align 32
  %1036 = load <8 x i32>, <8 x i32>* %a83, align 32
  %1037 = load <8 x i32>, <8 x i32>* %b84, align 32
  %1038 = select <8 x i1> bitcast (<1 x i8> <i8 -16> to <8 x i1>), <8 x i32> %1036, <8 x i32> %1037
  store <8 x i32> %1038, <8 x i32>* %r101, align 32

The test appears to expect that the bitmask is interpreted as bitwise little-endian (in other words, the LSB selects the first element from the vectors). However, the implementation uses a bitcast to a vector of i1. On big-endian architectures such as powerpc64, the LSB becomes the last element of this i1 vector, not the first.

Unfortunately, all of the upstream test cases are symmetrical, so "choosing the wrong vector" and "reading the bitmask backwards" are indistinguishable. I added an additional test case:

        let r: u32x8 = simd_select_bitmask(0b11110101u8, a, b);
        let e = u32x8(0, 9, 2, 11, 4, 5, 6, 7);
        assert_eq!(r, e);

This passes on x86_64-unknown-linux-musl, but fails on powerpc64-unknown-linux-musl with:

thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `u32x8(0, 1, 2, 3, 12, 5, 14, 7)`,
 right: `u32x8(0, 9, 2, 11, 4, 5, 6, 7)`', src/test/run-pass/simd/simd-intrinsic-generic-select.rs:50:9

The two "unlike" elements were chosen from the wrong place in the vector.

Since the vectors are 256 bits, and the POWER VSX registers are only 128 bits wide, LLVM must split each vector across two registers. The following powerpc64 assembly is from the last test case (0b11110000u8), and clearly shows that LLVM (not the hardware) picks the first half of a and the second half of b:

   0x000000000000416c <+36>:    li      r3,0                                                                                                                                                   

36          unsafe {
37              let a = u32x8(0, 1, 2, 3, 4, 5, 6, 7);
   0x0000000000004170 <+40>:    stw     r3,288(r1)                                                                                                                                             
   0x0000000000004174 <+44>:    li      r3,1                                                                                                                                                   
   0x0000000000004178 <+48>:    stw     r3,292(r1)
   0x000000000000417c <+52>:    li      r3,2
   0x0000000000004180 <+56>:    stw     r3,296(r1)
   0x0000000000004184 <+60>:    li      r3,3
   0x0000000000004188 <+64>:    stw     r3,300(r1)
   0x000000000000418c <+68>:    li      r3,4
   0x0000000000004190 <+72>:    stw     r3,304(r1)                                                                                                                                             
   0x0000000000004194 <+76>:    li      r3,5
   0x0000000000004198 <+80>:    stw     r3,308(r1)
   0x000000000000419c <+84>:    li      r3,6
   0x00000000000041a0 <+88>:    stw     r3,312(r1)
   0x00000000000041a4 <+92>:    li      r3,7                                                                                                                                                   
   0x00000000000041a8 <+96>:    stw     r3,316(r1)                                                                                                                                             
   0x00000000000041ac <+100>:   li      r3,8

38              let b = u32x8(8, 9, 10, 11, 12, 13, 14, 15);
   0x00000000000041b0 <+104>:   stw     r3,320(r1)
   0x00000000000041b4 <+108>:   li      r3,9
   0x00000000000041b8 <+112>:   stw     r3,324(r1)                                                                                                                                             
   0x00000000000041bc <+116>:   li      r3,10
   0x00000000000041c0 <+120>:   stw     r3,328(r1)                                                                                                                                             
   0x00000000000041c4 <+124>:   li      r3,11
   0x00000000000041c8 <+128>:   stw     r3,332(r1)
   0x00000000000041cc <+132>:   addi    r3,r1,336
   0x00000000000041d0 <+136>:   li      r4,12
   0x00000000000041d4 <+140>:   stw     r4,336(r1)
   0x00000000000041d8 <+144>:   li      r4,13                                                                                                                                                  
   0x00000000000041dc <+148>:   stw     r4,340(r1)
   0x00000000000041e0 <+152>:   li      r4,14
   0x00000000000041e4 <+156>:   stw     r4,344(r1)
   0x00000000000041e8 <+160>:   li      r4,15
   0x00000000000041ec <+164>:   stw     r4,348(r1)
...
60              let r: u32x8 = simd_select_bitmask(0b11110000u8, a, b);
   0x00000000000048cc <+1924>:  addi    r3,r1,288
   0x00000000000048d0 <+1928>:  lvx     v2,0,r3
   0x00000000000048d4 <+1932>:  addi    r3,r1,336
   0x00000000000048d8 <+1936>:  lvx     v3,0,r3
   0x00000000000048dc <+1940>:  addi    r3,r1,1488
   0x00000000000048e0 <+1944>:  stvx    v3,0,r3
   0x00000000000048e4 <+1948>:  addi    r3,r1,1472
   0x00000000000048e8 <+1952>:  stvx    v2,0,r3

So is this a bug in the test, because it should be ensuring that the bitmask is in native vector/endian order? Or in the implementation of simd_select_bitmask, because it should always take a little-endian bitmask and reverse the bits as necessary?

@smaeul
Copy link
Contributor Author

smaeul commented Mar 22, 2019

From rust-lang/stdarch#310 (comment) it looks like that the bitcast is the intended semantics, so that would mean the test is invalid on big-endian targets.

Thoughts, @gnzlbg @alexcrichton ?

@jonas-schievink jonas-schievink added O-PowerPC Target: PowerPC processors A-simd Area: SIMD (Single Instruction Multiple Data) C-bug Category: This is a bug. labels Mar 22, 2019
@gnzlbg
Copy link
Contributor

gnzlbg commented Mar 25, 2019

So is this a bug in the test, because it should be ensuring that the bitmask is in native vector/endian order?

This, we should only enable the test for little endian platforms, maybe adding a new test for big endian platforms.

smaeul added a commit to smaeul/rust that referenced this issue May 27, 2019
c-link-to-rust-va-list-fn: unstable feature, broken on aarch64, rust-lang#56475
env-funky-keys: can't handle LD_PRELOAD (e.g. sandbox)
long-linker-command-lines: takes >10 minutes to run (but still passes)
simd-intrinsic-generic-bitmask.rs: broken on BE, rust-lang#59356
simd-intrinsic-generic-select.rs: broken on BE, rust-lang#59356
sparc-struct-abi: no sparc target
sysroot-crates-are-unstable: can't run rustc without RPATH
smaeul added a commit to smaeul/rust that referenced this issue Jul 18, 2019
Per rust-lang#59356 it is expected that the interpretation of the bitmask depends
on target endianness.

Closes rust-lang#59356
Mark-Simulacrum added a commit to Mark-Simulacrum/rust that referenced this issue Jul 18, 2019
Disable simd_select_bitmask test on big endian

Per rust-lang#59356 it is expected that the interpretation of the bitmask depends
on target endianness.

Closes rust-lang#59356
smaeul added a commit to smaeul/rust that referenced this issue Nov 5, 2019
c-link-to-rust-va-list-fn: unstable feature, broken on aarch64, rust-lang#56475
env-funky-keys: can't handle LD_PRELOAD (e.g. sandbox)
long-linker-command-lines: takes >10 minutes to run (but still passes)
simd-intrinsic-generic-bitmask.rs: broken on BE, rust-lang#59356
sparc-struct-abi: no sparc target
sysroot-crates-are-unstable: can't run rustc without RPATH
smaeul added a commit to smaeul/rust that referenced this issue Mar 14, 2020
c-link-to-rust-va-list-fn: unstable feature, broken on aarch64, rust-lang#56475
env-funky-keys: can't handle LD_PRELOAD (e.g. sandbox)
long-linker-command-lines: takes >10 minutes to run (but still passes)
simd-intrinsic-generic-bitmask.rs: broken on BE, rust-lang#59356
sparc-struct-abi: no sparc target
sysroot-crates-are-unstable: can't run rustc without RPATH
smaeul added a commit to smaeul/rust that referenced this issue Mar 14, 2020
c-link-to-rust-va-list-fn: unstable feature, broken on aarch64, rust-lang#56475
env-funky-keys: can't handle LD_PRELOAD (e.g. sandbox)
long-linker-command-lines: takes >10 minutes to run (but still passes)
simd-intrinsic-generic-bitmask.rs: broken on BE, rust-lang#59356
sparc-struct-abi: no sparc target
sysroot-crates-are-unstable: can't run rustc without RPATH
smaeul added a commit to smaeul/rust that referenced this issue Nov 2, 2020
c-link-to-rust-va-list-fn: unstable feature, broken on aarch64, rust-lang#56475
env-funky-keys: can't handle LD_PRELOAD (e.g. sandbox)
long-linker-command-lines: takes >10 minutes to run (but still passes)
simd-intrinsic-generic-bitmask.rs: broken on BE, rust-lang#59356
sparc-struct-abi: no sparc target
sysroot-crates-are-unstable: can't run rustc without RPATH
smaeul added a commit to smaeul/rust that referenced this issue Nov 2, 2020
c-link-to-rust-va-list-fn: unstable feature, broken on aarch64, rust-lang#56475
env-funky-keys: can't handle LD_PRELOAD (e.g. sandbox)
long-linker-command-lines: takes >10 minutes to run (but still passes)
simd-intrinsic-generic-bitmask.rs: broken on BE, rust-lang#59356
sparc-struct-abi: no sparc target
sysroot-crates-are-unstable: can't run rustc without RPATH
smaeul added a commit to smaeul/rust that referenced this issue Jun 24, 2024
c-link-to-rust-va-list-fn: unstable feature, broken on aarch64, rust-lang#56475
env-funky-keys: can't handle LD_PRELOAD (e.g. sandbox)
long-linker-command-lines: takes >10 minutes to run (but still passes)
simd-intrinsic-generic-bitmask.rs: broken on BE, rust-lang#59356
sparc-struct-abi: no sparc target
sysroot-crates-are-unstable: can't run rustc without RPATH
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-simd Area: SIMD (Single Instruction Multiple Data) C-bug Category: This is a bug. O-PowerPC Target: PowerPC processors
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants