srem-related vector miscompile on AArch64 #77169

regehr · 2024-01-06T02:55:48Z

here's a function @f that I believe is being miscompiled and also a glue function @g that'll let us test it from C:

define <4 x i32> @f(<4 x i32> %0) {
  %2 = srem <4 x i32> %0, <i32 1, i32 1, i32 2, i32 3>
  %3 = icmp eq <4 x i32> %2, zeroinitializer
  %4 = zext <4 x i1> %3 to <4 x i32>
  ret <4 x i32> %4
}

define void @g(ptr %p) {
  %a = load <4 x i32>, ptr %p
  %b = call <4 x i32> @f(<4 x i32> %a);
  store <4 x i32> %b, ptr %p
  ret void
}

here's a test driver:

#include <stdio.h>

int x[] = {0, 0, -2147483648, 0};

void g(int *);

int main(void) {
  printf("%d %d %d %d\n", x[0], x[1], x[2], x[3]);
  g(x);
  printf("%d %d %d %d\n", x[0], x[1], x[2], x[3]);
}

on my M1 Mac it gives:

Johns-MacBook-Pro:build regehr$ ~/llvm-project/for-alive/bin/llc foo2.ll
Johns-MacBook-Pro:build regehr$ clang test.c foo2.s
Johns-MacBook-Pro:build regehr$ ./a.out 
0 0 -2147483648 0
1 1 0 1
Johns-MacBook-Pro:build regehr$

but, by inspection, we can see that the second line should have been 1 1 1 1

the codegen for this one is kind of lengthy, so I hope this explanation suffices, thanks

The text was updated successfully, but these errors were encountered:

llvmbot · 2024-01-06T02:56:02Z

@llvm/issue-subscribers-backend-aarch64

Author: John Regehr (regehr)

here's a function `@f` that I believe is being miscompiled and also a glue function `@g` that'll let us test it from C: ```llvm define <4 x i32> @f(<4 x i32> %0) { %2 = srem <4 x i32> %0, <i32 1, i32 1, i32 2, i32 3> %3 = icmp eq <4 x i32> %2, zeroinitializer %4 = zext <4 x i1> %3 to <4 x i32> ret <4 x i32> %4 }

define void @g(ptr %p) {
%a = load <4 x i32>, ptr %p
%b = call <4 x i32> @f(<4 x i32> %a);
store <4 x i32> %b, ptr %p
ret void
}


here's a test driver:
```c
#include &lt;stdio.h&gt;

int x[] = {0, 0, -2147483648, 0};

void g(int *);

int main(void) {
  printf("%d %d %d %d\n", x[0], x[1], x[2], x[3]);
  g(x);
  printf("%d %d %d %d\n", x[0], x[1], x[2], x[3]);
}

on my M1 Mac it gives:

Johns-MacBook-Pro:build regehr$ ~/llvm-project/for-alive/bin/llc foo2.ll
Johns-MacBook-Pro:build regehr$ clang test.c foo2.s
Johns-MacBook-Pro:build regehr$ ./a.out 
0 0 -2147483648 0
1 1 0 1
Johns-MacBook-Pro:build regehr$

but, by inspection, we can see that the second line should have been 1 1 1 1

the codegen for this one is kind of lengthy, so I hope this explanation suffices, thanks

topperc · 2024-01-06T22:04:57Z

I think this reproduces on X86 too?

regehr · 2024-01-06T22:12:23Z

it does repro on x86-64. sorry, I should have checked that last night, was being lazy

topperc · 2024-01-06T22:28:44Z

Looks like maybe an issue with the algorithm in TargetLowering::prepareSREMEqFold for N=INT_MIN and D is a power of 2. I think that comes from Hacker's Delight.

topperc · 2024-01-06T23:20:03Z

I think the issue is that q=mod(n*multiplicative_inverse(d0), 2^W) is not in the range [-2147483647, 2147483647] when n is -2147483648. So the thoerem that gives the rotate right trick doesn't apply.

CC: @RKSimon @LebedevRI

a power of two. The expansion previously used, derived from Hacker's Delight, does not work correctly when the dividend is INT_MIN and the divisor is a power of two. We now use an alternate derivation of the A and Q constants specifically for the power-of-two divisor case to avoid this problem. Credit to Fabian Giesen for the new derivation. Fixes llvm#77169

a power of two. The expansion previously used, derived from Hacker's Delight, does not work correctly when the dividend is INT_MIN and the divisor is a power of two. We now use an alternate derivation of the A and Q constants specifically for the power-of-two divisor case to avoid this problem. Credit to Fabian Giesen for the new derivation. I have exhaustively tested the new derivation at W=32 for all values of N and all D=2^K. Fixes llvm#77169

…er of two. (#82706) The expansion previously used, derived from Hacker's Delight, does not work correctly when the dividend is INT_MIN and the divisor is a power of two. We now use an alternate derivation of the A and Q constants specifically for the power-of-two divisor case to avoid this problem. Credit to Fabian Giesen for the new derivation. Fixes #77169

regehr added backend:AArch64 llvm:codegen miscompilation NEON ARM NEON labels Jan 6, 2024

resistor mentioned this issue Feb 22, 2024

Fix non-splat vector SREM expansion when one of the divisors is a power of two. #82706

Merged

resistor closed this as completed in #82706 Feb 25, 2024

EugeneZelenko added llvm:SelectionDAG SelectionDAGISel as well and removed backend:AArch64 llvm:codegen NEON ARM NEON labels Feb 25, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

srem-related vector miscompile on AArch64 #77169

srem-related vector miscompile on AArch64 #77169

regehr commented Jan 6, 2024

llvmbot commented Jan 6, 2024

topperc commented Jan 6, 2024

regehr commented Jan 6, 2024

topperc commented Jan 6, 2024 •

edited

topperc commented Jan 6, 2024

srem-related vector miscompile on AArch64 #77169

srem-related vector miscompile on AArch64 #77169

Comments

regehr commented Jan 6, 2024

llvmbot commented Jan 6, 2024

topperc commented Jan 6, 2024

regehr commented Jan 6, 2024

topperc commented Jan 6, 2024 • edited

topperc commented Jan 6, 2024

topperc commented Jan 6, 2024 •

edited