[RISCV][WIP] Let RA do the CSR saves. #90819

mgudim · 2024-05-02T05:08:50Z

We turn the problem of saving and restoring callee-saved registers (CSRs) efficiently into a register allocation problem. This has the advantage that the register allocator can essentially do shrink-wrapping on a per-register basis. Currently, the shrink-wrapping pass saves all CSRs in the same place, which may be suboptimal. Also, improvements to register allocation / coalescing will translate into improvements in shrink-wrapping.

In `finalizeLowering()` we copy each callee-saved register from its physical register to a virtual one. In all return blocks we do the reverse.
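
The transformation described above can be sketched outside LLVM. The snippet below is a toy model, not the patch itself: `Instr`, `Block`, and `saveCSRsInVRegs` are hypothetical stand-ins for machine IR, and registers are plain strings. It only shows the shape of the idea: one `COPY` into a fresh virtual register per callee-saved register at function entry, and the reverse `COPY` in front of every return.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-ins for machine IR. These are NOT LLVM types.
struct Instr {
  std::string Opcode; // "COPY", "RET", or any other mnemonic
  std::string Dst;
  std::string Src;
};

struct Block {
  std::vector<Instr> Instrs;
  // A block counts as a return block if its last instruction is a RET.
  bool isReturnBlock() const {
    return !Instrs.empty() && Instrs.back().Opcode == "RET";
  }
};

// Copies every register in CSRs into a fresh virtual register at the top of
// the entry block, and copies it back right before the RET of each return
// block. Returns the virtual register names, in the same order as CSRs.
std::vector<std::string> saveCSRsInVRegs(std::vector<Block> &Func,
                                         const std::vector<std::string> &CSRs) {
  assert(!Func.empty() && "function needs an entry block");

  std::vector<std::string> VRegs;
  for (std::size_t I = 0; I < CSRs.size(); ++I)
    VRegs.push_back("%v" + std::to_string(I));

  // Entry block: %vI = COPY csrI
  std::vector<Instr> Saves;
  for (std::size_t I = 0; I < CSRs.size(); ++I)
    Saves.push_back({"COPY", VRegs[I], CSRs[I]});
  Block &Entry = Func.front();
  Entry.Instrs.insert(Entry.Instrs.begin(), Saves.begin(), Saves.end());

  // Return blocks: csrI = COPY %vI, placed just before the final RET.
  for (Block &B : Func) {
    if (!B.isReturnBlock())
      continue;
    std::vector<Instr> Restores;
    for (std::size_t I = 0; I < CSRs.size(); ++I)
      Restores.push_back({"COPY", CSRs[I], VRegs[I]});
    B.Instrs.insert(B.Instrs.end() - 1, Restores.begin(), Restores.end());
  }
  return VRegs;
}
```

In this model nothing is stored to the stack yet. Whether a given virtual register stays in a register or is spilled, and where, is left to a later allocation step, which is the point of the proposal.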

llvmbot · 2024-05-02T05:09:27Z

@llvm/pr-subscribers-backend-risc-v

Author: Mikhail Gudim (mgudim)

Full diff: https://github.com/llvm/llvm-project/pull/90819.diff

6 Files Affected:

(modified) llvm/lib/Target/RISCV/RISCVFrameLowering.cpp (+57-6)
(modified) llvm/lib/Target/RISCV/RISCVFrameLowering.h (+1)
(modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (+77)
(modified) llvm/lib/Target/RISCV/RISCVISelLowering.h (+2)
(modified) llvm/lib/Target/RISCV/RISCVSubtarget.cpp (+9)
(modified) llvm/lib/Target/RISCV/RISCVSubtarget.h (+2)

diff --git a/llvm/lib/Target/RISCV/RISCVFrameLowering.cpp b/llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
index cb41577c5d9435..b725bfb56389bc 100644
--- a/llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
@@ -1026,12 +1026,51 @@ RISCVFrameLowering::getFrameIndexReference(const MachineFunction &MF, int FI,
   return Offset;
 }
 
-void RISCVFrameLowering::determineCalleeSaves(MachineFunction &MF,
-                                              BitVector &SavedRegs,
-                                              RegScavenger *RS) const {
-  TargetFrameLowering::determineCalleeSaves(MF, SavedRegs, RS);
-  // Unconditionally spill RA and FP only if the function uses a frame
-  // pointer.
+void RISCVFrameLowering::determineMustCalleeSaves(MachineFunction &MF,
+                                              BitVector &SavedRegs) const {
+  const TargetRegisterInfo &TRI = *MF.getSubtarget().getRegisterInfo();
+
+  // Resize before the early returns. Some backends expect that
+  // SavedRegs.size() == TRI.getNumRegs() after this call even if there are no
+  // saved registers.
+  SavedRegs.resize(TRI.getNumRegs());
+
+  // When interprocedural register allocation is enabled caller saved registers
+  // are preferred over callee saved registers.
+  if (MF.getTarget().Options.EnableIPRA &&
+      isSafeForNoCSROpt(MF.getFunction()) &&
+      isProfitableForNoCSROpt(MF.getFunction()))
+    return;
+
+  // Get the callee saved register list...
+  const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs();
+
+  // Early exit if there are no callee saved registers.
+  if (!CSRegs || CSRegs[0] == 0)
+    return;
+
+  // In Naked functions we aren't going to save any registers.
+  if (MF.getFunction().hasFnAttribute(Attribute::Naked))
+    return;
+
+  // Noreturn+nounwind functions never restore CSR, so no saves are needed.
+  // Purely noreturn functions may still return through throws, so those must
+  // save CSR for caller exception handlers.
+  //
+  // If the function uses longjmp to break out of its current path of
+  // execution we do not need the CSR spills either: setjmp stores all CSRs
+  // it was called with into the jmp_buf, which longjmp then restores.
+  if (MF.getFunction().hasFnAttribute(Attribute::NoReturn) &&
+        MF.getFunction().hasFnAttribute(Attribute::NoUnwind) &&
+        !MF.getFunction().hasFnAttribute(Attribute::UWTable) &&
+        enableCalleeSaveSkip(MF))
+    return;
+
+  // Functions which call __builtin_unwind_init get all their registers saved.
+  if (MF.callsUnwindInit()) {
+    SavedRegs.set();
+    return;
+  }
   if (hasFP(MF)) {
     SavedRegs.set(RISCV::X1);
     SavedRegs.set(RISCV::X8);
@@ -1041,6 +1080,18 @@ void RISCVFrameLowering::determineCalleeSaves(MachineFunction &MF,
     SavedRegs.set(RISCVABI::getBPReg());
 }
 
+void RISCVFrameLowering::determineCalleeSaves(MachineFunction &MF,
+                                              BitVector &SavedRegs,
+                                              RegScavenger *RS) const {
+  const auto &ST = MF.getSubtarget<RISCVSubtarget>();
+  const Function &F = MF.getFunction();
+  determineMustCalleeSaves(MF, SavedRegs);
+  if (ST.doCSRSavesInRA() && F.doesNotThrow())
+    return;
+
+  TargetFrameLowering::determineCalleeSaves(MF, SavedRegs, RS);
+}
+
 std::pair<int64_t, Align>
 RISCVFrameLowering::assignRVVStackObjectOffsets(MachineFunction &MF) const {
   MachineFrameInfo &MFI = MF.getFrameInfo();
diff --git a/llvm/lib/Target/RISCV/RISCVFrameLowering.h b/llvm/lib/Target/RISCV/RISCVFrameLowering.h
index 28ab4aff3b9d51..d7b9df8bd68515 100644
--- a/llvm/lib/Target/RISCV/RISCVFrameLowering.h
+++ b/llvm/lib/Target/RISCV/RISCVFrameLowering.h
@@ -31,6 +31,7 @@ class RISCVFrameLowering : public TargetFrameLowering {
   StackOffset getFrameIndexReference(const MachineFunction &MF, int FI,
                                      Register &FrameReg) const override;
 
+  void determineMustCalleeSaves(MachineFunction &MF, BitVector &SavedRegs) const;
   void determineCalleeSaves(MachineFunction &MF, BitVector &SavedRegs,
                             RegScavenger *RS) const override;
 
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 19ef1f2f18ec1a..7978dac4aa7944 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -21314,6 +21314,83 @@ unsigned RISCVTargetLowering::getCustomCtpopCost(EVT VT,
   return isCtpopFast(VT) ? 0 : 1;
 }
 
+void RISCVTargetLowering::finalizeLowering(MachineFunction &MF) const {
+  const Function &F = MF.getFunction();
+  if (!Subtarget.doCSRSavesInRA() || !F.doesNotThrow()) { 
+    TargetLoweringBase::finalizeLowering(MF);
+    return;
+  }
+
+  MachineRegisterInfo &MRI = MF.getRegInfo();
+  const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
+  const RISCVRegisterInfo &TRI = *Subtarget.getRegisterInfo();
+  const RISCVFrameLowering &TFI = *Subtarget.getFrameLowering();
+
+  SmallVector<MachineBasicBlock *, 4> RestoreMBBs;
+  SmallVector<MachineBasicBlock *, 4> SaveMBBs;
+  SaveMBBs.push_back(&MF.front());
+  for (MachineBasicBlock &MBB : MF) {
+    if (MBB.isReturnBlock())
+      RestoreMBBs.push_back(&MBB);
+  }
+
+  BitVector MustCalleeSavedRegs;
+  TFI.determineMustCalleeSaves(MF, MustCalleeSavedRegs);
+  const MCPhysReg * CSRegs = MF.getRegInfo().getCalleeSavedRegs();
+  SmallVector<MCPhysReg, 4> EligibleRegs;
+  for (int i = 0; CSRegs[i]; ++i) {
+    if (!MustCalleeSavedRegs[i])
+      EligibleRegs.push_back(CSRegs[i]);
+  }
+
+  dbgs() << "EligibleRegs: " << EligibleRegs.size() << "\n";
+  SmallVector<Register, 4> VRegs;
+  for (MachineBasicBlock *SaveMBB : SaveMBBs) {
+    for (MCPhysReg Reg : EligibleRegs) {
+      SaveMBB->addLiveIn(Reg);
+      // TODO: should we use Maximal register class instead?
+      Register VReg = MRI.createVirtualRegister(TRI.getMinimalPhysRegClass(Reg));
+      VRegs.push_back(VReg);
+      BuildMI(
+        *SaveMBB,
+        SaveMBB->begin(),
+        SaveMBB->findDebugLoc(SaveMBB->begin()),
+        TII.get(TargetOpcode::COPY),
+        VReg
+      )
+      .addReg(Reg);
+    }
+  }
+
+  for (MachineBasicBlock *RestoreMBB : RestoreMBBs) {
+    MachineInstr &ReturnMI = RestoreMBB->back();
+    assert(ReturnMI.isReturn() && "Expected return instruction!");
+    auto VRegI = VRegs.begin();
+    for (MCPhysReg Reg : EligibleRegs) {
+      Register VReg = *VRegI;
+      BuildMI(
+        *RestoreMBB,
+        ReturnMI.getIterator(),
+        ReturnMI.getDebugLoc(),
+        TII.get(TargetOpcode::COPY),
+        Reg
+      )
+      .addReg(VReg);
+      ReturnMI.addOperand(
+        MF,
+        MachineOperand::CreateReg(
+          Reg,
+          /*isDef=*/false,
+          /*isImplicit=*/true
+        )
+      );
+      VRegI++;
+    }
+  }
+
+  TargetLoweringBase::finalizeLowering(MF);
+}
+
 bool RISCVTargetLowering::fallBackToDAGISel(const Instruction &Inst) const {
 
   // GISel support is in progress or complete for these opcodes.
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.h b/llvm/lib/Target/RISCV/RISCVISelLowering.h
index 78f99e70c083a7..ea1079af2ead05 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.h
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.h
@@ -853,6 +853,8 @@ class RISCVTargetLowering : public TargetLowering {
 
   bool fallBackToDAGISel(const Instruction &Inst) const override;
 
+  void finalizeLowering(MachineFunction &MF) const override;
+
   bool lowerInterleavedLoad(LoadInst *LI,
                             ArrayRef<ShuffleVectorInst *> Shuffles,
                             ArrayRef<unsigned> Indices,
diff --git a/llvm/lib/Target/RISCV/RISCVSubtarget.cpp b/llvm/lib/Target/RISCV/RISCVSubtarget.cpp
index d3236bb07d56d5..15476fc2d3c583 100644
--- a/llvm/lib/Target/RISCV/RISCVSubtarget.cpp
+++ b/llvm/lib/Target/RISCV/RISCVSubtarget.cpp
@@ -65,6 +65,11 @@ static cl::opt<unsigned> RISCVMinimumJumpTableEntries(
     "riscv-min-jump-table-entries", cl::Hidden,
     cl::desc("Set minimum number of entries to use a jump table on RISCV"));
 
+static cl::opt<bool> RISCVEnableSaveCSRByRA(
+    "riscv-enable-save-csr-in-ra",
+    cl::desc("Let register alloctor do csr saves/restores"),
+    cl::init(false), cl::Hidden);
+
 void RISCVSubtarget::anchor() {}
 
 RISCVSubtarget &
@@ -130,6 +135,10 @@ bool RISCVSubtarget::useConstantPoolForLargeInts() const {
   return !RISCVDisableUsingConstantPoolForLargeInts;
 }
 
+bool RISCVSubtarget::doCSRSavesInRA() const {
+  return RISCVEnableSaveCSRByRA;
+}
+
 unsigned RISCVSubtarget::getMaxBuildIntsCost() const {
   // Loading integer from constant pool needs two instructions (the reason why
   // the minimum cost is 2): an address calculation instruction and a load
diff --git a/llvm/lib/Target/RISCV/RISCVSubtarget.h b/llvm/lib/Target/RISCV/RISCVSubtarget.h
index c880c9e921e0ea..f3d8a70c0df14e 100644
--- a/llvm/lib/Target/RISCV/RISCVSubtarget.h
+++ b/llvm/lib/Target/RISCV/RISCVSubtarget.h
@@ -270,6 +270,8 @@ class RISCVSubtarget : public RISCVGenSubtargetInfo {
 
   bool useConstantPoolForLargeInts() const;
 
+  bool doCSRSavesInRA() const;
+
   // Maximum cost used for building integers, integers will be put into constant
   // pool if exceeded.
   unsigned getMaxBuildIntsCost() const;
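
The split between registers that must be saved in the prologue and registers left to the register allocator can be sketched without LLVM. This is a minimal illustration under stated assumptions: `eligibleRegs` is a hypothetical helper, registers are plain 16-bit numbers, and the register numbers in the usage note are made up. Because `determineMustCalleeSaves` in the diff resizes `SavedRegs` to `TRI.getNumRegs()` and sets bits such as `RISCV::X1`, the sketch looks up each bit by register number rather than by position in the callee-saved list.

```cpp
#include <cstdint>
#include <vector>

// CSRegs is a zero-terminated list of callee-saved register numbers, in the
// style of the MCPhysReg list used in the diff. MustSave holds one bit per
// register number; a set bit means the register is saved unconditionally.
// Returns the callee-saved registers that are NOT forced to be saved, in
// list order.
std::vector<std::uint16_t> eligibleRegs(const std::uint16_t *CSRegs,
                                        const std::vector<bool> &MustSave) {
  std::vector<std::uint16_t> Eligible;
  if (!CSRegs)
    return Eligible;
  for (unsigned I = 0; CSRegs[I]; ++I) {
    std::uint16_t Reg = CSRegs[I];
    // Index the bit vector with the register number, not with I.
    if (Reg < MustSave.size() && MustSave[Reg])
      continue;
    Eligible.push_back(Reg);
  }
  return Eligible;
}
```

For example, with a hypothetical list `{1, 8, 9, 18, 0}` and bits 1 and 8 set in `MustSave` (mirroring the frame-pointer case, where the diff sets `X1` and `X8`), the function returns `{9, 18}`.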

github-actions · 2024-05-02T05:11:46Z

✅ With the latest revision this PR passed the C/C++ code formatter.

mgudim · 2024-05-02T05:24:43Z

Currently, shrink-wrapping saves and restores all registers at the same point. We noticed that this is not optimal in several places in SPEC. GCC can do this on a per-register basis, and there is an option to turn this off. By using this flag in GCC we determined that this can make up to a 2% difference in dynamic instruction count on the gcc benchmark, with smaller impacts on other benchmarks.

Instead of trying to improve shrink-wrapping in LLVM, I decided to try this (crazy) idea, since it was pretty easy to implement this prototype. I disabled it for functions that can unwind, because this is just an experiment to see if the approach is viable. Thus all C++ benchmarks that use exceptions were not affected. I got a 1.7% improvement on gcc and a 1.7% degradation on xz in dynamic instruction count.

Do you think it's worth pursuing this idea further? To fix the degradations, we need to improve the register allocator / coalescing, which may have more impact than just improving shrink-wrapping.
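
To make the per-register argument concrete, here is a small cost model. It is a sketch under stated assumptions, not a measurement: `CSRUse`, `perRegisterCost`, and `sharedPointCostLowerBound` are hypothetical names, and the execution counts in the usage note are invented. Each register carries the execution count of the cheapest point that would be a valid save point for that register alone. A single shared save point has to be valid for every register, so it executes at least as often as the largest of those counts, which gives a lower bound on the cost of the shared strategy.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

struct CSRUse {
  std::string Reg;
  // How often the cheapest valid save point for this register alone executes.
  std::uint64_t BestPointCount;
};

// Each register is saved and restored at its own best point:
// one save plus one restore per execution of that point.
std::uint64_t perRegisterCost(const std::vector<CSRUse> &Uses) {
  std::uint64_t Total = 0;
  for (const CSRUse &U : Uses)
    Total += 2 * U.BestPointCount;
  return Total;
}

// All registers share one save point. That point is valid for every register,
// so it runs at least as often as the most frequent individual best point.
// The result is therefore a lower bound on the shared-point cost.
std::uint64_t sharedPointCostLowerBound(const std::vector<CSRUse> &Uses) {
  std::uint64_t MaxCount = 0;
  for (const CSRUse &U : Uses)
    MaxCount = std::max(MaxCount, U.BestPointCount);
  return 2 * static_cast<std::uint64_t>(Uses.size()) * MaxCount;
}
```

With two registers whose best save points run 1000 and 10 times, the per-register cost is 2020 dynamic save/restore instructions, while the shared point costs at least 4000. The model ignores code size and unwinding, both of which come up later in this thread.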

topperc · 2024-05-07T22:12:23Z

Will this still work with -msave-restore and cm.push/pop?

topperc · 2024-05-07T22:20:07Z

I found an old post about improving shrink wrapping: https://lists.llvm.org/pipermail/llvm-dev/2017-August/116137.html (not sure how much of it was implemented).

CC @francisvm

mshockwave · 2024-05-07T23:01:34Z

llvm/lib/Target/RISCV/RISCVISelLowering.cpp

+      EligibleRegs.push_back(CSRegs[i]);
+  }
+
+  dbgs() << "EligibleRegs: " << EligibleRegs.size() << "\n";


Please wrap this in `LLVM_DEBUG`.

mshockwave · 2024-05-07T23:06:14Z

Does this update any existing tests?

topperc · 2024-05-07T23:07:37Z

> Does this update any existing tests?

It's gated by a command line flag that isn't enabled by default.

francisvm · 2024-05-08T00:00:31Z

> I found an old post about improving shrink wrapping: https://lists.llvm.org/pipermail/llvm-dev/2017-August/116137.html (not sure how much of it was implemented).
>
> CC @francisvm

Not much of it was implemented in the end. It turned out that for us, the compile-time impact, the unwinding impact and the various regressions were not worth the speedup in the few benchmarks. I wish I had the old data to justify this, but that was a while ago. The unwinder impact was more significant than I initially thought at that time, since Apple platforms rely heavily on compact unwinding, therefore missing some frames and falling back to eh/debug_frame CFI is a huge penalty, and in some memory/permission-constrained environments, very very hard, or impossible for those that only look at prologues/epilogues to quickly unwind (some crash reporters, for example). That made enabling it by default a hard task, and we never pursued an attribute/flag-based approach.

mgudim · 2024-05-09T14:34:46Z

> Will this still work with -msave-restore and cm.push/pop?

I think so. All it does is turn the save of a callee-saved register into an ordinary spill, so this should work. I'll run SPEC with these options to check. If we need to optimize for size, then we can just turn this off with a flag.

mgudim · 2024-05-09T14:38:37Z

At this point I just want to understand whether I should continue with this approach or not. If "yes", I'll investigate the degradation on xz, figure out what goes wrong if the function unwinds, and add tests.

topperc · 2024-05-21T19:59:55Z

> At this point I just want to understand whether I should continue with this approach or not. If "yes", I'll investigate the degradation on xz, figure out what goes wrong if the function unwinds, and add tests.

I think you should continue.

I'm also curious how this interacts with -fno-omit-frame-pointer, which makes the X8 register reserved even though it's still in the callee-saved list.

wangpc-pp · 2024-05-22T05:33:59Z

I support continuing to investigate how far this approach can take us. I think it can be a good workaround to enhance "shrink wrapping". Thanks for the great idea; I will do some evaluations too.

mgudim requested review from asb, preames, mshockwave, topperc, pcwang-thead and wangpc-pp May 2, 2024 05:08

llvmbot added the backend:RISC-V label May 2, 2024

mgudim force-pushed the save_csr_in_ra branch from d83e517 to de5e1cd Compare May 2, 2024 05:26

mshockwave reviewed May 7, 2024

wangpc-pp removed the request for review from pcwang-thead May 20, 2024 04:15
