RISCV64 miscompile at -O2/-O1 #80052

Closed
patrick-rivos opened this issue Jan 30, 2024 · 7 comments · Fixed by #80242

Comments

@patrick-rivos
Contributor

Testcase:

int printf(const char *, ...);
int a[10000];
int b, c, d, e, f;
char g, h;
int j[100][100];
int m(long n) {
  b = ((b >> 4) ^ (b ^ n));
  b = ((b >> 4)) ^ a[(b ^ 0x0f) & 0x0F];
  b = ((b >> 4)) ^ a[(b ^ 0x0f) & 0x0F];
  b = ((b >> 4)) ^ a[(b ^ 0x0f) & 0x0F];
  printf("%s %lX\n", &g, b ^ 0xFFFFFFFFUL);
  return 0;
}
int main() {
  int i = 0;
  for (; c < 256; c++)
    a[c] = c;
  volatile int k = 3;
  h = 0;
  for (; (h <= 4); h += 1) {
    int *l = &d;
    *l = k;
    k = 0;
  }
  for (; i < 6; i++)
    for (; e < 1; e++) {
      m(j[i][e]);
      if (f)
        printf("%d%d\n", i, e);
    }
  return m(d);
}

Commands:

> /scratch/tc-testing/llvm-jan-24/build/bin/clang -O2 red.c -o user-config-O2.out
> QEMU_CPU=rv64 /scratch/tc-testing/llvm-jan-24/build/bin/qemu-riscv64 user-config-O2.out
 FFFFFFF0
 FFFFFFFC
> /scratch/tc-testing/llvm-jan-24/build/bin/clang -O0 red.c -o user-config-O0.out
> QEMU_CPU=rv64 /scratch/tc-testing/llvm-jan-24/build/bin/qemu-riscv64 user-config-O0.out
 FFFFFFF0
 FFFFFFFF

This raw testcase differs at -O2, but after feeding it through bugpoint, I got a .bc file that differs at -O1.

> /scratch/tc-testing/llvm-jan-24/build/bin/clang red.c -O2 -o user-config.bc -emit-llvm -c
> /scratch/tc-testing/llvm-jan-24/build/build-llvm-linux/bin/bugpoint user-config.bc --run-llc
> /scratch/tc-testing/llvm-jan-24/build/build-llvm-linux/bin/clang bugpoint-reduced-named-md.bc -O0 -o bugpoint-O0.out
> QEMU_CPU=rv64 /scratch/tc-testing/llvm-jan-24/build/bin/qemu-riscv64 bugpoint-O0.out
 FFFFFFF0
 FFFFFFFF
> /scratch/tc-testing/llvm-jan-24/build/build-llvm-linux/bin/clang bugpoint-reduced-named-md.bc -O1 -o bugpoint-O1.out
> QEMU_CPU=rv64 /scratch/tc-testing/llvm-jan-24/build/bin/qemu-riscv64 bugpoint-O1.out
 FFFFFFF0
 FFFFFFFC

bugpoint.zip

@patrick-rivos
Contributor Author

Found using 3a92b20. I just kicked off a fresh build of LLVM to confirm it's still on trunk.

@llvmbot
Collaborator

llvmbot commented Jan 30, 2024

@llvm/issue-subscribers-backend-risc-v

Author: Patrick O'Neill (patrick-rivos)


@topperc
Collaborator

topperc commented Jan 30, 2024

opt-bisect-limit points at Stack Slot Coloring on function (main) on my relatively up-to-date downstream repo.

@topperc
Collaborator

topperc commented Jan 30, 2024

I think this is related to using X0 as a sink for a volatile load whose result we don't need. We end up with an X0 = load followed by a store of X0 to the same stack slot, which makes StackSlotColoring::RemoveDeadStores think the store is dead because the register matches.
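
For reference, a simplified sketch of the kind of check that goes wrong. This is illustrative only, not the actual StackSlotColoring code: the function name removeDeadStoresSketch and the exact control flow are made up, while TargetInstrInfo::isLoadFromStackSlot / isStoreToStackSlot are the real hooks. With x0 as the sink, LoadReg == StoreReg holds even though the store writes zero rather than the value just loaded, so a store that is still needed gets deleted:

// Simplified sketch of a dead-store scan over load/store pairs on the same
// stack slot. Not the real pass; it only models the problematic comparison.
#include <iterator>
#include <vector>

#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/TargetInstrInfo.h"

using namespace llvm;

static void removeDeadStoresSketch(MachineBasicBlock &MBB,
                                   const TargetInstrInfo &TII) {
  std::vector<MachineInstr *> ToErase;
  for (auto I = MBB.begin(), E = MBB.end(); I != E; ++I) {
    int LoadFI = 0, StoreFI = 0;
    // Is this instruction a load of a stack slot?
    Register LoadReg = TII.isLoadFromStackSlot(*I, LoadFI);
    if (!LoadReg)
      continue;
    auto NextMI = std::next(I);
    if (NextMI == E)
      continue;
    // Is the very next instruction a store to the same slot?
    Register StoreReg = TII.isStoreToStackSlot(*NextMI, StoreFI);
    if (!StoreReg || StoreFI != LoadFI)
      continue;
    // Problematic assumption: equal registers imply the store writes back
    // the value that was just loaded. For a constant register like x0 that
    // is false -- the loaded value was discarded and the store writes zero.
    if (LoadReg == StoreReg)
      ToErase.push_back(&*NextMI);
  }
  for (MachineInstr *MI : ToErase)
    MI->eraseFromParent();
}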

@topperc
Collaborator

topperc commented Jan 30, 2024

AArch64 won't replace a destination with the zero register if the instruction uses a FrameIndex. I think that was to fix a different issue, but maybe it hides this issue for them.

I commented out that code for AArch64 but still couldn't reproduce the problem, because this test case doesn't generate any spills on AArch64, so the stack slot coloring pass ends early without trying to remove dead stores.

@patrick-rivos
Contributor Author

Here's the preprocessed, unreduced testcase. It has a lot more going on, so it might generate a spill for this case on AArch64.
unreduced-red.zip

@topperc
Collaborator

topperc commented Jan 30, 2024

The two best fixes seem to be not deleting volatile stores or teaching the pass about constant physical registers.
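
Sketched loosely, those two directions might look like the helpers below. The helper names are made up for illustration; MachineMemOperand::isVolatile and TargetRegisterInfo::isConstantPhysReg are the existing LLVM queries such checks would likely lean on:

#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineMemOperand.h"
#include "llvm/CodeGen/TargetRegisterInfo.h"

using namespace llvm;

// Option 1: never treat a store as dead if it (or the paired load) touches
// volatile memory.
static bool touchesVolatileMemory(const MachineInstr &MI) {
  for (const MachineMemOperand *MMO : MI.memoperands())
    if (MMO->isVolatile())
      return true;
  return false;
}

// Option 2: don't trust the register match when the register is a constant
// physical register such as RISC-V's x0 -- its contents are never the value
// that was just loaded.
static bool isConstantRegSink(const TargetRegisterInfo &TRI, Register Reg) {
  return Reg.isPhysical() && TRI.isConstantPhysReg(Reg.asMCReg());
}

The fix that ultimately landed takes a third route and restricts the dead-store scan to spill slots, as described in the commit message below.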

topperc added a commit to topperc/llvm-project that referenced this issue Feb 1, 2024
The stack slot coloring pass is concerned with optimizing spill
slots. If any changes are made, a pass is run over the function to
remove stack stores that use the same register and stack slot as an
immediately preceding load.

The register check is too simple for constant registers like the
AArch64 and RISC-V zero register. This register can be used as the
result of a load when we want to discard the result but still have
the memory access performed, as for a volatile or atomic load.

If the pass sees a load into the zero register followed by a store
of the zero register to the same stack slot, it mistakenly believes
the store isn't needed.

Since the main stack coloring optimization is only concerned with
spill slots, it seems reasonable that RemoveDeadStores should
only be concerned with spills. Since we never generate a reload
into x0, this avoids the issue seen on RISC-V.

The test case concept is adapted from pr30821.mir from X86. That
test had to be updated to mark the stack slot as a spill slot.

Fixes llvm#80052.
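
The shape of that guard, sketched as a hedged standalone predicate (not the actual diff; MachineFrameInfo::isSpillSlotObjectIndex is the existing query the description points at):

#include "llvm/CodeGen/MachineFrameInfo.h"

using namespace llvm;

// Only frame indexes that are genuine spill slots are considered by the
// dead-store cleanup. A volatile stack object like `k` in the testcase is
// not a spill slot, so its store can no longer be mistaken for a dead spill.
static bool isSpillSlotCandidate(const MachineFrameInfo &MFI, int FrameIndex) {
  return FrameIndex >= 0 && MFI.isSpillSlotObjectIndex(FrameIndex);
}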
topperc added a commit that referenced this issue Feb 1, 2024
agozillon pushed a commit to agozillon/llvm-project that referenced this issue Feb 5, 2024