
[RISC-V] Fix crash with late stack realignment requirement #83496

Status: Open. Wants to merge 3 commits into base: main.
Conversation

nemanjai (Member)
In order to perform a stack re-alignment, the RISC-V back end requires the frame pointer. Of course, the frame pointer requires a register to be reserved. The decision on which registers to reserve is made prior to any spilling introduced by RA. If RA has to spill a register of a class that requires greater alignment than the stack alignment, this triggers a requirement to re-align the stack, but the FP register hasn't been reserved.
This patch adds a check for any use of a register that could cause stack re-alignment if it were to be spilled.

Currently, the only situation that triggers this issue is when we have Zdinx with the E extension (for which we align the stack to 4 bytes).

llvmbot (Collaborator) commented Feb 29, 2024

@llvm/pr-subscribers-backend-risc-v

Author: Nemanja Ivanovic (nemanjai)

Changes



Full diff: https://github.com/llvm/llvm-project/pull/83496.diff

3 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVFrameLowering.cpp (+33-1)
  • (modified) llvm/lib/Target/RISCV/RISCVFrameLowering.h (+3)
  • (added) llvm/test/CodeGen/RISCV/zdinx-realignment-with-e.ll (+64)
diff --git a/llvm/lib/Target/RISCV/RISCVFrameLowering.cpp b/llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
index 8bac41372b5a83..26cf6a24d15cca 100644
--- a/llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
@@ -306,9 +306,12 @@ static Register getMaxPushPopReg(const MachineFunction &MF,
 // variable sized allocas, or if the frame address is taken.
 bool RISCVFrameLowering::hasFP(const MachineFunction &MF) const {
   const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();
-
+  bool HasExtE =
+      MF.getSubtarget<RISCVSubtarget>().getTargetABI() == RISCVABI::ABI_ILP32E;
   const MachineFrameInfo &MFI = MF.getFrameInfo();
+
   return MF.getTarget().Options.DisableFramePointerElim(MF) ||
+         (HasExtE && maxPossibleSpillAlign(MF) > Align(4)) ||
          RegInfo->hasStackRealignment(MF) || MFI.hasVarSizedObjects() ||
          MFI.isFrameAddressTaken();
 }
@@ -1175,6 +1178,35 @@ static unsigned estimateFunctionSizeInBytes(const MachineFunction &MF,
   return FnSize;
 }
 
+Align RISCVFrameLowering::maxPossibleSpillAlign(
+    const MachineFunction &MF) const {
+  if (MaxSpillAlign.contains(&MF))
+    return MaxSpillAlign.at(&MF);
+
+  const MachineRegisterInfo &MRI = MF.getRegInfo();
+  const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
+  Align CurrMaxAlign = Align(1);
+  // The maximum spill alignment has not yet been computed. Compute it.
+  for (const MachineBasicBlock &MBB : MF) {
+    for (const MachineInstr &MI : MBB) {
+      for (const MachineOperand &MO : MI.operands()) {
+        if (!MO.isReg())
+          continue;
+        Register Reg = MO.getReg();
+        const TargetRegisterClass *RC;
+        if (Reg.isPhysical())
+          RC = TRI->getMinimalPhysRegClass(Reg);
+        else
+          RC = MRI.getRegClassOrNull(MO.getReg());
+        if (RC && TRI->getSpillAlign(*RC) > CurrMaxAlign)
+          CurrMaxAlign = TRI->getSpillAlign(*RC);
+      }
+    }
+  }
+  MaxSpillAlign[&MF] = CurrMaxAlign;
+  return CurrMaxAlign;
+}
+
 void RISCVFrameLowering::processFunctionBeforeFrameFinalized(
     MachineFunction &MF, RegScavenger *RS) const {
   const RISCVRegisterInfo *RegInfo =
diff --git a/llvm/lib/Target/RISCV/RISCVFrameLowering.h b/llvm/lib/Target/RISCV/RISCVFrameLowering.h
index 210f8c1064724a..cb5dbf82dc1b11 100644
--- a/llvm/lib/Target/RISCV/RISCVFrameLowering.h
+++ b/llvm/lib/Target/RISCV/RISCVFrameLowering.h
@@ -13,6 +13,7 @@
 #ifndef LLVM_LIB_TARGET_RISCV_RISCVFRAMELOWERING_H
 #define LLVM_LIB_TARGET_RISCV_RISCVFRAMELOWERING_H
 
+#include "llvm/ADT/DenseMap.h"
 #include "llvm/CodeGen/TargetFrameLowering.h"
 #include "llvm/Support/TypeSize.h"
 
@@ -37,6 +38,7 @@ class RISCVFrameLowering : public TargetFrameLowering {
   void processFunctionBeforeFrameFinalized(MachineFunction &MF,
                                            RegScavenger *RS) const override;
 
+  Align maxPossibleSpillAlign(const MachineFunction &MF) const;
   bool hasFP(const MachineFunction &MF) const override;
 
   bool hasBP(const MachineFunction &MF) const;
@@ -82,6 +84,7 @@ class RISCVFrameLowering : public TargetFrameLowering {
 
 protected:
   const RISCVSubtarget &STI;
+  mutable llvm::DenseMap<const MachineFunction *, Align> MaxSpillAlign;
 
 private:
   void determineFrameLayout(MachineFunction &MF) const;
diff --git a/llvm/test/CodeGen/RISCV/zdinx-realignment-with-e.ll b/llvm/test/CodeGen/RISCV/zdinx-realignment-with-e.ll
new file mode 100644
index 00000000000000..1b0e6d1967ff92
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/zdinx-realignment-with-e.ll
@@ -0,0 +1,64 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 2
+; RUN: llc --mtriple=riscv32 --mattr=+e,+zdinx --target-abi ilp32e \
+; RUN:   --verify-machineinstrs < %s | FileCheck %s
+define void @func(ptr %__first, ptr %__start, ptr %add.ptr23) #0 {
+; CHECK-LABEL: func:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    addi sp, sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw ra, 20(sp) # 4-byte Folded Spill
+; CHECK-NEXT:    sw s0, 16(sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset ra, -4
+; CHECK-NEXT:    .cfi_offset s0, -8
+; CHECK-NEXT:    addi s0, sp, 24
+; CHECK-NEXT:    .cfi_def_cfa s0, 0
+; CHECK-NEXT:    andi sp, sp, -8
+; CHECK-NEXT:    lui a1, %hi(.LCPI0_0)
+; CHECK-NEXT:    lw a4, %lo(.LCPI0_0)(a1)
+; CHECK-NEXT:    lw a5, %lo(.LCPI0_0+4)(a1)
+; CHECK-NEXT:    sw a4, 8(sp)
+; CHECK-NEXT:    sw a5, 12(sp)
+; CHECK-NEXT:    lui a1, %hi(.LCPI0_1)
+; CHECK-NEXT:    lw a4, %lo(.LCPI0_1)(a1)
+; CHECK-NEXT:    lw a5, %lo(.LCPI0_1+4)(a1)
+; CHECK-NEXT:    sw a4, 0(sp)
+; CHECK-NEXT:    sw a5, 4(sp)
+; CHECK-NEXT:  .LBB0_1: # %do.body
+; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
+; CHECK-NEXT:    lw a4, 0(zero)
+; CHECK-NEXT:    lw a5, 4(zero)
+; CHECK-NEXT:    fcvt.d.w t1, zero
+; CHECK-NEXT:    sw t1, 0(a0)
+; CHECK-NEXT:    sw t2, 4(a0)
+; CHECK-NEXT:    lw t1, 8(sp)
+; CHECK-NEXT:    lw t2, 12(sp)
+; CHECK-NEXT:    flt.d a1, t1, a4
+; CHECK-NEXT:    neg a1, a1
+; CHECK-NEXT:    and a1, a1, a2
+; CHECK-NEXT:    lw a4, 0(a1)
+; CHECK-NEXT:    lw a5, 4(a1)
+; CHECK-NEXT:    lw t1, 0(sp)
+; CHECK-NEXT:    lw t2, 4(sp)
+; CHECK-NEXT:    flt.d a1, a4, t1
+; CHECK-NEXT:    beqz a1, .LBB0_1
+; CHECK-NEXT:  # %bb.2: # %do.end
+; CHECK-NEXT:    addi sp, s0, -24
+; CHECK-NEXT:    lw ra, 20(sp) # 4-byte Folded Reload
+; CHECK-NEXT:    lw s0, 16(sp) # 4-byte Folded Reload
+; CHECK-NEXT:    addi sp, sp, 24
+; CHECK-NEXT:    ret
+entry:
+  br label %do.body
+
+do.body:                                          ; preds = %do.body, %entry
+  store double 0.000000e+00, ptr %__first, align 8
+  %0 = load double, ptr null, align 8
+  %cmp.i51 = fcmp olt double 0x7FF8000000000000, %0
+  %__child_i.2 = select i1 %cmp.i51, ptr %add.ptr23, ptr null
+  %1 = load double, ptr %__child_i.2, align 8
+  %cmp.i52 = fcmp olt double %1, 1.000000e+00
+  br i1 %cmp.i52, label %do.end, label %do.body
+
+do.end:                                           ; preds = %do.body
+  ret void
+}

nemanjai (Member, Author) left a comment:

Another possible solution would be to change the alignment requirement for GPRPair to 32. Perhaps this would need to only be done when the E extension is enabled, so maybe it would require a separate register class.
In any case, thought I'd propose these suggestions and we can converge on some solution to this problem.
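For illustration, the separate-register-class idea could look roughly like the following. This is a hypothetical sketch only: `GPRPairE` and the pair-register operands are illustrative names, not actual entries from RISCVRegisterInfo.td; the point is that the third `RegisterClass` template argument is the spill alignment in bits, so a class declared with 32 would spill with 4-byte alignment and avoid the re-alignment requirement under ILP32E.

```tablegen
// Hypothetical sketch: a pair class with 32-bit (4-byte) spill alignment
// that could be selected when the E extension / ILP32E ABI is in use.
// The class name and register list are illustrative, not real definitions.
def GPRPairE : RegisterClass<"RISCV", [untyped], 32,
                             (add X10_X11, X12_X13, X14_X15)> {
  let Size = 64;
}
```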

continue;
Register Reg = MO.getReg();
const TargetRegisterClass *RC;
if (Reg.isPhysical())
nemanjai (Member, Author) commented:

Strictly speaking, this is not needed to resolve the observed problem as we won't insert a spill of a physical register. However, I implemented it this way to catch any use that might require a realignment - I guess a paranoid position.

nemanjai (Member, Author) commented:

Decided to remove it after all. It fails on implicit defs that don't have a register class anyway.

topperc (Collaborator) commented Mar 1, 2024

> Another possible solution would be to change the alignment requirement for GPRPair to 32. Perhaps this would need to only be done when the E extension is enabled, so maybe it would require a separate register class. In any case, thought I'd propose these suggestions and we can converge on some solution to this problem.

It should have an alignment of 32, but then we have a potential issue spilling a GPRPair. We need to be able to calculate the address of the second half of the spill slot without needing a temporary register. By having the entire spill slot aligned to 8, we guarantee that we can fold +4 into the load/store address in RISCVExpandPseudo::expandRV32ZdinxStore and RISCVExpandPseudo::expandRV32ZdinxLoad. Maybe there's a better way to solve that?

wangpc-pp (Contributor) left a comment:

Thanks for fixing this problem! I know there will be a lot of problems when ILP32E meets the latest ratified extensions…

@@ -306,9 +306,12 @@ static Register getMaxPushPopReg(const MachineFunction &MF,
// variable sized allocas, or if the frame address is taken.
bool RISCVFrameLowering::hasFP(const MachineFunction &MF) const {
const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();

bool HasExtE =
wangpc-pp (Contributor) commented:
HasExtE -> HasEABI?

return MaxSpillAlign.at(&MF);

const MachineRegisterInfo &MRI = MF.getRegInfo();
const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
wangpc-pp (Contributor) commented:

Suggested change:
-  const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
+  const TargetRegisterInfo *TRI = STI.getRegisterInfo();

nemanjai (Member, Author) commented Mar 1, 2024

> Another possible solution would be to change the alignment requirement for GPRPair to 32. Perhaps this would need to only be done when the E extension is enabled, so maybe it would require a separate register class. In any case, thought I'd propose these suggestions and we can converge on some solution to this problem.
>
> It should have an alignment of 32, but then we have potential issue spilling a GPRPair. We need to be able to calculate the address of the second half of the spill slot without needing a temporary register. By having the entire spill slot aligned to 8, we guarantee that we can fold +4 into the load/store address in RISCVExpandPseudo::expandRV32ZdinxStore and RISCVExpandPseudo::expandRV32ZdinxLoad. Maybe there's a better way to solve that?

I suppose we could solve it very pessimistically for the Global/CPI case by adding a lui/addi pair into the register that gets freed with the first load/store (and addi for the immediate case), but that seems worse than the potential stack re-alignment for the E ABI. Especially since we would presumably have to restore the address if the register isn't dead.

topperc (Collaborator) commented Mar 1, 2024

> Another possible solution would be to change the alignment requirement for GPRPair to 32. Perhaps this would need to only be done when the E extension is enabled, so maybe it would require a separate register class. In any case, thought I'd propose these suggestions and we can converge on some solution to this problem.
>
> It should have an alignment of 32, but then we have potential issue spilling a GPRPair. We need to be able to calculate the address of the second half of the spill slot without needing a temporary register. By having the entire spill slot aligned to 8, we guarantee that we can fold +4 into the load/store address in RISCVExpandPseudo::expandRV32ZdinxStore and RISCVExpandPseudo::expandRV32ZdinxLoad. Maybe there's a better way to solve that?
>
> I suppose we could solve it very pessimistically for the Global/CPI case by adding a lui/addi pair into the register that gets freed with the first load/store (and addi for the immediate case), but that seems worse than the potential stack re-alignment for the E ABI. Especially since we would presumably have to restore the address if the register isn't dead.

The Global/CPI case should be independent of the stack align. Those should only come from user loads/stores. I wonder if we should be splitting 64-bit loads/stores during isel instead of generating pseudos.

nemanjai (Member, Author) commented Mar 5, 2024

> ...
>
> I suppose we could solve it very pessimistically for the Global/CPI case by adding a lui/addi pair into the register that gets freed with the first load/store (and addi for the immediate case), but that seems worse than the potential stack re-alignment for the E ABI. Especially since we would presumably have to restore the address if the register isn't dead.
>
> The Global/CPI case should be independent of the stack align. Those should only come from user loads/stores. I wonder if we should be splitting 64-bit loads/stores during isel instead of generating pseudos.

I'm not sure I fully follow how this will help resolve this problem. Do you mean that we would set the alignment of the paired registers to 4, expand the loads/stores of f64 to the two loads/stores plus the necessary build_pair/split nodes? Then we can ensure that the loads/stores are able to access each of the pieces just as we do with any other load/store and if we end up spilling any pairs, we won't have the re-alignment issue?

topperc (Collaborator) commented Mar 13, 2024

> ...
>
> I suppose we could solve it very pessimistically for the Global/CPI case by adding a lui/addi pair into the register that gets freed with the first load/store (and addi for the immediate case), but that seems worse than the potential stack re-alignment for the E ABI. Especially since we would presumably have to restore the address if the register isn't dead.
>
> The Global/CPI case should be independent of the stack align. Those should only come from user loads/stores. I wonder if we should be splitting 64-bit loads/stores during isel instead of generating pseudos.
>
> I'm not sure I fully follow how this will help resolve this problem. Do you mean that we would set the alignment of the paired registers to 4, expand the loads/stores of f64 to the two loads/stores plus the necessary build_pair/split nodes? Then we can ensure that the loads/stores are able to access each of the pieces just as we do with any other load/store and if we end up spilling any pairs, we won't have the re-alignment issue?

Globals/CPI are unrelated to the stack realignment issue, so let's set them aside.

When I mentioned RISCVExpandPseudo::expandRV32ZdinxStore and RISCVExpandPseudo::expandRV32ZdinxLoad, I was referring to the else case that asserts isInt<12>(MBBI->getOperand(2).getImm() + 4). For pair loads/stores created by spill/fill, we have the alignment set to 2*XLen so that this assert can never fail even if the offset from SP is near the limit of simm12. The +4 will always be an OR for the stack spill since the spill address should be 8-byte aligned. This keeps it from generating a carry.

I wonder if we could change the alignment of the pair to 4, and scavenge a register in the rare case that MBBI->getOperand(2).getImm() is a simm12, but MBBI->getOperand(2).getImm() + 4 isn't.

Or change the spill alignment to 4 and teach eliminateFrameIndex that it can't fold an offset >= 2044 into the address of a pair load/store and needs to use a separate ADDI instead. It already knows to use an ADDI for offsets > 2047 or < -2048 for regular loads/stores.

4 participants