
[RISC-V] Fix crash with late stack realignment requirement #83496

Status: Open. Wants to merge 3 commits into base: main.
Conversation

nemanjai (Member)
In order to perform a stack re-alignment, the RISC-V back end requires the frame pointer. Of course, the frame pointer requires a register to be reserved. The decision on which registers to reserve is made prior to any spilling introduced by RA. If RA has to spill a register of a class that requires greater alignment than the stack alignment, this triggers a requirement to re-align the stack, but the FP register hasn't been reserved.
This patch adds a check for any use of a register that could cause stack re-alignment if it were to be spilled.

Currently, the only situation that triggers this issue is when we have Zdinx with the E extension (for which we align the stack to 4 bytes).

llvmbot (Collaborator) commented Feb 29, 2024

@llvm/pr-subscribers-backend-risc-v

Author: Nemanja Ivanovic (nemanjai)

Changes



Full diff: https://github.com/llvm/llvm-project/pull/83496.diff

3 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVFrameLowering.cpp (+33-1)
  • (modified) llvm/lib/Target/RISCV/RISCVFrameLowering.h (+3)
  • (added) llvm/test/CodeGen/RISCV/zdinx-realignment-with-e.ll (+64)
diff --git a/llvm/lib/Target/RISCV/RISCVFrameLowering.cpp b/llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
index 8bac41372b5a83..26cf6a24d15cca 100644
--- a/llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
@@ -306,9 +306,12 @@ static Register getMaxPushPopReg(const MachineFunction &MF,
 // variable sized allocas, or if the frame address is taken.
 bool RISCVFrameLowering::hasFP(const MachineFunction &MF) const {
   const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();
-
+  bool HasExtE =
+      MF.getSubtarget<RISCVSubtarget>().getTargetABI() == RISCVABI::ABI_ILP32E;
   const MachineFrameInfo &MFI = MF.getFrameInfo();
+
   return MF.getTarget().Options.DisableFramePointerElim(MF) ||
+         (HasExtE && maxPossibleSpillAlign(MF) > Align(4)) ||
          RegInfo->hasStackRealignment(MF) || MFI.hasVarSizedObjects() ||
          MFI.isFrameAddressTaken();
 }
@@ -1175,6 +1178,35 @@ static unsigned estimateFunctionSizeInBytes(const MachineFunction &MF,
   return FnSize;
 }
 
+Align RISCVFrameLowering::maxPossibleSpillAlign(
+    const MachineFunction &MF) const {
+  if (MaxSpillAlign.contains(&MF))
+    return MaxSpillAlign.at(&MF);
+
+  const MachineRegisterInfo &MRI = MF.getRegInfo();
+  const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
+  Align CurrMaxAlign = Align(1);
+  // The maximum spill alignment has not yet been computed. Compute it.
+  for (const MachineBasicBlock &MBB : MF) {
+    for (const MachineInstr &MI : MBB) {
+      for (const MachineOperand &MO : MI.operands()) {
+        if (!MO.isReg())
+          continue;
+        Register Reg = MO.getReg();
+        const TargetRegisterClass *RC;
+        if (Reg.isPhysical())
+          RC = TRI->getMinimalPhysRegClass(Reg);
+        else
+          RC = MRI.getRegClassOrNull(MO.getReg());
+        if (RC && TRI->getSpillAlign(*RC) > CurrMaxAlign)
+          CurrMaxAlign = TRI->getSpillAlign(*RC);
+      }
+    }
+  }
+  MaxSpillAlign[&MF] = CurrMaxAlign;
+  return CurrMaxAlign;
+}
+
 void RISCVFrameLowering::processFunctionBeforeFrameFinalized(
     MachineFunction &MF, RegScavenger *RS) const {
   const RISCVRegisterInfo *RegInfo =
diff --git a/llvm/lib/Target/RISCV/RISCVFrameLowering.h b/llvm/lib/Target/RISCV/RISCVFrameLowering.h
index 210f8c1064724a..cb5dbf82dc1b11 100644
--- a/llvm/lib/Target/RISCV/RISCVFrameLowering.h
+++ b/llvm/lib/Target/RISCV/RISCVFrameLowering.h
@@ -13,6 +13,7 @@
 #ifndef LLVM_LIB_TARGET_RISCV_RISCVFRAMELOWERING_H
 #define LLVM_LIB_TARGET_RISCV_RISCVFRAMELOWERING_H
 
+#include "llvm/ADT/DenseMap.h"
 #include "llvm/CodeGen/TargetFrameLowering.h"
 #include "llvm/Support/TypeSize.h"
 
@@ -37,6 +38,7 @@ class RISCVFrameLowering : public TargetFrameLowering {
   void processFunctionBeforeFrameFinalized(MachineFunction &MF,
                                            RegScavenger *RS) const override;
 
+  Align maxPossibleSpillAlign(const MachineFunction &MF) const;
   bool hasFP(const MachineFunction &MF) const override;
 
   bool hasBP(const MachineFunction &MF) const;
@@ -82,6 +84,7 @@ class RISCVFrameLowering : public TargetFrameLowering {
 
 protected:
   const RISCVSubtarget &STI;
+  mutable llvm::DenseMap<const MachineFunction *, Align> MaxSpillAlign;
 
 private:
   void determineFrameLayout(MachineFunction &MF) const;
diff --git a/llvm/test/CodeGen/RISCV/zdinx-realignment-with-e.ll b/llvm/test/CodeGen/RISCV/zdinx-realignment-with-e.ll
new file mode 100644
index 00000000000000..1b0e6d1967ff92
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/zdinx-realignment-with-e.ll
@@ -0,0 +1,64 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 2
+; RUN: llc --mtriple=riscv32 --mattr=+e,+zdinx --target-abi ilp32e \
+; RUN:   --verify-machineinstrs < %s | FileCheck %s
+define void @func(ptr %__first, ptr %__start, ptr %add.ptr23) #0 {
+; CHECK-LABEL: func:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    addi sp, sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw ra, 20(sp) # 4-byte Folded Spill
+; CHECK-NEXT:    sw s0, 16(sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset ra, -4
+; CHECK-NEXT:    .cfi_offset s0, -8
+; CHECK-NEXT:    addi s0, sp, 24
+; CHECK-NEXT:    .cfi_def_cfa s0, 0
+; CHECK-NEXT:    andi sp, sp, -8
+; CHECK-NEXT:    lui a1, %hi(.LCPI0_0)
+; CHECK-NEXT:    lw a4, %lo(.LCPI0_0)(a1)
+; CHECK-NEXT:    lw a5, %lo(.LCPI0_0+4)(a1)
+; CHECK-NEXT:    sw a4, 8(sp)
+; CHECK-NEXT:    sw a5, 12(sp)
+; CHECK-NEXT:    lui a1, %hi(.LCPI0_1)
+; CHECK-NEXT:    lw a4, %lo(.LCPI0_1)(a1)
+; CHECK-NEXT:    lw a5, %lo(.LCPI0_1+4)(a1)
+; CHECK-NEXT:    sw a4, 0(sp)
+; CHECK-NEXT:    sw a5, 4(sp)
+; CHECK-NEXT:  .LBB0_1: # %do.body
+; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
+; CHECK-NEXT:    lw a4, 0(zero)
+; CHECK-NEXT:    lw a5, 4(zero)
+; CHECK-NEXT:    fcvt.d.w t1, zero
+; CHECK-NEXT:    sw t1, 0(a0)
+; CHECK-NEXT:    sw t2, 4(a0)
+; CHECK-NEXT:    lw t1, 8(sp)
+; CHECK-NEXT:    lw t2, 12(sp)
+; CHECK-NEXT:    flt.d a1, t1, a4
+; CHECK-NEXT:    neg a1, a1
+; CHECK-NEXT:    and a1, a1, a2
+; CHECK-NEXT:    lw a4, 0(a1)
+; CHECK-NEXT:    lw a5, 4(a1)
+; CHECK-NEXT:    lw t1, 0(sp)
+; CHECK-NEXT:    lw t2, 4(sp)
+; CHECK-NEXT:    flt.d a1, a4, t1
+; CHECK-NEXT:    beqz a1, .LBB0_1
+; CHECK-NEXT:  # %bb.2: # %do.end
+; CHECK-NEXT:    addi sp, s0, -24
+; CHECK-NEXT:    lw ra, 20(sp) # 4-byte Folded Reload
+; CHECK-NEXT:    lw s0, 16(sp) # 4-byte Folded Reload
+; CHECK-NEXT:    addi sp, sp, 24
+; CHECK-NEXT:    ret
+entry:
+  br label %do.body
+
+do.body:                                          ; preds = %do.body, %entry
+  store double 0.000000e+00, ptr %__first, align 8
+  %0 = load double, ptr null, align 8
+  %cmp.i51 = fcmp olt double 0x7FF8000000000000, %0
+  %__child_i.2 = select i1 %cmp.i51, ptr %add.ptr23, ptr null
+  %1 = load double, ptr %__child_i.2, align 8
+  %cmp.i52 = fcmp olt double %1, 1.000000e+00
+  br i1 %cmp.i52, label %do.end, label %do.body
+
+do.end:                                           ; preds = %do.body
+  ret void
+}

nemanjai (Member, Author) left a comment:

Another possible solution would be to change the alignment requirement for GPRPair to 32. Perhaps this would need to only be done when the E extension is enabled, so maybe it would require a separate register class.
In any case, thought I'd propose these suggestions and we can converge on some solution to this problem.
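For illustration, the separate-register-class idea could look roughly like the following. This is a hypothetical sketch only: `GPRPairE` and the pair-register operands are illustrative names, not actual entries from RISCVRegisterInfo.td; the point is that the third `RegisterClass` template argument is the spill alignment in bits, so a class declared with 32 would spill with 4-byte alignment and avoid the re-alignment requirement under ILP32E.

```tablegen
// Hypothetical sketch: a pair class with 32-bit (4-byte) spill alignment
// that could be selected when the E extension / ILP32E ABI is in use.
// The class name and register list are illustrative, not real definitions.
def GPRPairE : RegisterClass<"RISCV", [untyped], 32,
                             (add X10_X11, X12_X13, X14_X15)> {
  let Size = 64;
}
```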

continue;
Register Reg = MO.getReg();
const TargetRegisterClass *RC;
if (Reg.isPhysical())
nemanjai (Member, Author) commented:

Strictly speaking, this is not needed to resolve the observed problem as we won't insert a spill of a physical register. However, I implemented it this way to catch any use that might require a realignment - I guess a paranoid position.

nemanjai (Member, Author) commented:

Decided to remove it after all. It fails on implicit defs that don't have a register class anyway.

topperc (Collaborator) commented Mar 1, 2024

> Another possible solution would be to change the alignment requirement for GPRPair to 32. Perhaps this would need to only be done when the E extension is enabled, so maybe it would require a separate register class. In any case, thought I'd propose these suggestions and we can converge on some solution to this problem.

It should have an alignment of 32, but then we have a potential issue spilling a GPRPair. We need to be able to calculate the address of the second half of the spill slot without needing a temporary register. By having the entire spill slot aligned to 8, we guarantee that we can fold +4 into the load/store address in RISCVExpandPseudo::expandRV32ZdinxStore and RISCVExpandPseudo::expandRV32ZdinxLoad. Maybe there's a better way to solve that?

wangpc-pp (Contributor) left a comment:

Thanks for fixing this problem! I know there will be a lot of problems when ILP32E meets the latest ratified extensions…

@@ -306,9 +306,12 @@ static Register getMaxPushPopReg(const MachineFunction &MF,
// variable sized allocas, or if the frame address is taken.
bool RISCVFrameLowering::hasFP(const MachineFunction &MF) const {
const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();

bool HasExtE =
wangpc-pp (Contributor) commented:
HasExtE -> HasEABI?

return MaxSpillAlign.at(&MF);

const MachineRegisterInfo &MRI = MF.getRegInfo();
const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
wangpc-pp (Contributor) commented:

Suggested change:
-  const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
+  const TargetRegisterInfo *TRI = STI.getRegisterInfo();

nemanjai (Member, Author) commented Mar 1, 2024

> Another possible solution would be to change the alignment requirement for GPRPair to 32. Perhaps this would need to only be done when the E extension is enabled, so maybe it would require a separate register class. In any case, thought I'd propose these suggestions and we can converge on some solution to this problem.
>
> It should have an alignment of 32, but then we have potential issue spilling a GPRPair. We need to be able to calculate the address of the second half of the spill slot without needing a temporary register. By having the entire spill slot aligned to 8, we guarantee that we can fold +4 into the load/store address in RISCVExpandPseudo::expandRV32ZdinxStore and RISCVExpandPseudo::expandRV32ZdinxLoad. Maybe there's a better way to solve that?

I suppose we could solve it very pessimistically for the Global/CPI case by adding a lui/addi pair into the register that gets freed with the first load/store (and addi for the immediate case), but that seems worse than the potential stack re-alignment for the E ABI. Especially since we would presumably have to restore the address if the register isn't dead.

topperc (Collaborator) commented Mar 1, 2024

> Another possible solution would be to change the alignment requirement for GPRPair to 32. Perhaps this would need to only be done when the E extension is enabled, so maybe it would require a separate register class. In any case, thought I'd propose these suggestions and we can converge on some solution to this problem.
>
> It should have an alignment of 32, but then we have potential issue spilling a GPRPair. We need to be able to calculate the address of the second half of the spill slot without needing a temporary register. By having the entire spill slot aligned to 8, we guarantee that we can fold +4 into the load/store address in RISCVExpandPseudo::expandRV32ZdinxStore and RISCVExpandPseudo::expandRV32ZdinxLoad. Maybe there's a better way to solve that?
>
> I suppose we could solve it very pessimistically for the Global/CPI case by adding a lui/addi pair into the register that gets freed with the first load/store (and addi for the immediate case), but that seems worse than the potential stack re-alignment for the E ABI. Especially since we would presumably have to restore the address if the register isn't dead.

The Global/CPI case should be independent of the stack align. Those should only come from user loads/stores. I wonder if we should be splitting 64-bit loads/stores during isel instead of generating pseudos.

nemanjai (Member, Author) commented Mar 5, 2024

> ...
>
> I suppose we could solve it very pessimistically for the Global/CPI case by adding a lui/addi pair into the register that gets freed with the first load/store (and addi for the immediate case), but that seems worse than the potential stack re-alignment for the E ABI. Especially since we would presumably have to restore the address if the register isn't dead.
>
> The Global/CPI case should be independent of the stack align. Those should only come from user loads/stores. I wonder if we should be splitting 64-bit loads/stores during isel instead of generating pseudos.

I'm not sure I fully follow how this will help resolve this problem. Do you mean that we would set the alignment of the paired registers to 4, expand the loads/stores of f64 to the two loads/stores plus the necessary build_pair/split nodes? Then we can ensure that the loads/stores are able to access each of the pieces just as we do with any other load/store and if we end up spilling any pairs, we won't have the re-alignment issue?

topperc (Collaborator) commented Mar 13, 2024

> ...
>
> I suppose we could solve it very pessimistically for the Global/CPI case by adding a lui/addi pair into the register that gets freed with the first load/store (and addi for the immediate case), but that seems worse than the potential stack re-alignment for the E ABI. Especially since we would presumably have to restore the address if the register isn't dead.
>
> The Global/CPI case should be independent of the stack align. Those should only come from user loads/stores. I wonder if we should be splitting 64-bit loads/stores during isel instead of generating pseudos.
>
> I'm not sure I fully follow how this will help resolve this problem. Do you mean that we would set the alignment of the paired registers to 4, expand the loads/stores of f64 to the two loads/stores plus the necessary build_pair/split nodes? Then we can ensure that the loads/stores are able to access each of the pieces just as we do with any other load/store and if we end up spilling any pairs, we won't have the re-alignment issue?

Globals/CPI are unrelated to the stack realignment issue, so let's set them aside.

When I mentioned RISCVExpandPseudo::expandRV32ZdinxStore and RISCVExpandPseudo::expandRV32ZdinxLoad, I was referring to the else case that asserts isInt<12>(MBBI->getOperand(2).getImm() + 4). For pair loads/stores created by spill/fill, we have the alignment set to 2*XLen so that this assert can never fail even if the offset from SP is near the limit of simm12. The +4 will always be an OR for the stack spill since the spill address should be 8-byte aligned. This keeps it from generating a carry.

I wonder if we could change the alignment of the pair to 4, and scavenge a register in the rare case that MBBI->getOperand(2).getImm() is a simm12, but MBBI->getOperand(2).getImm() + 4 isn't.

Or change the spill alignment to 4 and teach eliminateFrameIndex that it can't fold an offset >= 2044 into the address of a pair load/store and needs to use a separate ADDI instead. It already knows to use an ADDI for offsets > 2047 or < -2048 for regular loads/stores.

4 participants