Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

srem-related vector miscompile on AArch64 #77169

Closed
regehr opened this issue Jan 6, 2024 · 5 comments · Fixed by #82706
Closed

srem-related vector miscompile on AArch64 #77169

regehr opened this issue Jan 6, 2024 · 5 comments · Fixed by #82706
Labels
llvm:SelectionDAG SelectionDAGISel as well miscompilation

Comments

@regehr
Copy link
Contributor

regehr commented Jan 6, 2024

here's a function @f that I believe is being miscompiled and also a glue function @g that'll let us test it from C:

define <4 x i32> @f(<4 x i32> %0) {
  %2 = srem <4 x i32> %0, <i32 1, i32 1, i32 2, i32 3>
  %3 = icmp eq <4 x i32> %2, zeroinitializer
  %4 = zext <4 x i1> %3 to <4 x i32>
  ret <4 x i32> %4
}

define void @g(ptr %p) {
  %a = load <4 x i32>, ptr %p
  %b = call <4 x i32> @f(<4 x i32> %a);
  store <4 x i32> %b, ptr %p
  ret void
}

here's a test driver:

#include <stdio.h>

int x[] = {0, 0, -2147483648, 0};

void g(int *);

int main(void) {
  printf("%d %d %d %d\n", x[0], x[1], x[2], x[3]);
  g(x);
  printf("%d %d %d %d\n", x[0], x[1], x[2], x[3]);
}

on my M1 Mac it gives:

Johns-MacBook-Pro:build regehr$ ~/llvm-project/for-alive/bin/llc foo2.ll
Johns-MacBook-Pro:build regehr$ clang test.c foo2.s
Johns-MacBook-Pro:build regehr$ ./a.out 
0 0 -2147483648 0
1 1 0 1
Johns-MacBook-Pro:build regehr$ 

but, by inspection, we can see that the second line should have been 1 1 1 1

the codegen for this one is kind of lengthy, so I hope this explanation suffices, thanks

@llvmbot
Copy link
Collaborator

llvmbot commented Jan 6, 2024

@llvm/issue-subscribers-backend-aarch64

Author: John Regehr (regehr)

here's a function `@f` that I believe is being miscompiled and also a glue function `@g` that'll let us test it from C: ```llvm define <4 x i32> @f(<4 x i32> %0) { %2 = srem <4 x i32> %0, <i32 1, i32 1, i32 2, i32 3> %3 = icmp eq <4 x i32> %2, zeroinitializer %4 = zext <4 x i1> %3 to <4 x i32> ret <4 x i32> %4 }

define void @g(ptr %p) {
%a = load <4 x i32>, ptr %p
%b = call <4 x i32> @f(<4 x i32> %a);
store <4 x i32> %b, ptr %p
ret void
}


here's a test driver:
```c
#include &lt;stdio.h&gt;

int x[] = {0, 0, -2147483648, 0};

void g(int *);

int main(void) {
  printf("%d %d %d %d\n", x[0], x[1], x[2], x[3]);
  g(x);
  printf("%d %d %d %d\n", x[0], x[1], x[2], x[3]);
}

on my M1 Mac it gives:

Johns-MacBook-Pro:build regehr$ ~/llvm-project/for-alive/bin/llc foo2.ll
Johns-MacBook-Pro:build regehr$ clang test.c foo2.s
Johns-MacBook-Pro:build regehr$ ./a.out 
0 0 -2147483648 0
1 1 0 1
Johns-MacBook-Pro:build regehr$ 

but, by inspection, we can see that the second line should have been 1 1 1 1

the codegen for this one is kind of lengthy, so I hope this explanation suffices, thanks

@topperc
Copy link
Collaborator

topperc commented Jan 6, 2024

I think this reproduces on X86 too?

@regehr
Copy link
Contributor Author

regehr commented Jan 6, 2024

it does repro on x86-64. sorry, I should have checked that last night, was being lazy

@topperc
Copy link
Collaborator

topperc commented Jan 6, 2024

Looks like maybe an issue with the algorithm in TargetLowering::prepareSREMEqFold for N=INT_MIN and D is a power of 2. I think that comes from Hacker's Delight.

@topperc
Copy link
Collaborator

topperc commented Jan 6, 2024

I think the issue is that q=mod(n*multiplicative_inverse(d0), 2^W) is not in the range [-2147483647, 2147483647] when n is -2147483648. So the thoerem that gives the rotate right trick doesn't apply.

CC: @RKSimon @LebedevRI

resistor added a commit to resistor/llvm-project that referenced this issue Feb 22, 2024
a power of two.

The expansion previously used, derived from Hacker's Delight,
does not work correctly when the dividend is INT_MIN and the
divisor is a power of two. We now use an alternate derivation
of the A and Q constants specifically for the power-of-two divisor
case to avoid this problem. Credit to Fabian Giesen for the
new derivation.

Fixes llvm#77169
resistor added a commit to resistor/llvm-project that referenced this issue Feb 25, 2024
a power of two.

The expansion previously used, derived from Hacker's Delight,
does not work correctly when the dividend is INT_MIN and the
divisor is a power of two. We now use an alternate derivation
of the A and Q constants specifically for the power-of-two divisor
case to avoid this problem. Credit to Fabian Giesen for the
new derivation. I have exhaustively tested the new derivation
at W=32 for all values of N and all D=2^K.

Fixes llvm#77169
resistor added a commit that referenced this issue Feb 25, 2024
…er of two. (#82706)

The expansion previously used, derived from Hacker's Delight,
does not work correctly when the dividend is INT_MIN and the
divisor is a power of two. We now use an alternate derivation
of the A and Q constants specifically for the power-of-two divisor
case to avoid this problem. Credit to Fabian Giesen for the
new derivation.

Fixes #77169
@EugeneZelenko EugeneZelenko added llvm:SelectionDAG SelectionDAGISel as well and removed backend:AArch64 llvm:codegen NEON ARM NEON labels Feb 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:SelectionDAG SelectionDAGISel as well miscompilation
Projects
None yet
4 participants