Conversation

@slinder1 (Contributor)

These spills need special CFI anyway, so implementing them directly
where CFI is emitted avoids the need to invent a mechanism to track them
from ISel.

Co-authored-by: Scott Linder <scott.linder@amd.com>
Co-authored-by: Venkata Ramanaiah Nalamothu <VenkataRamanaiah.Nalamothu@amd.com>

slinder1 (Contributor, Author) commented Oct 22, 2025

llvmbot (Member) commented Oct 22, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Scott Linder (slinder1)

Changes

These spills need special CFI anyway, so implementing them directly
where CFI is emitted avoids the need to invent a mechanism to track them
from ISel.

Co-authored-by: Scott Linder <scott.linder@amd.com>
Co-authored-by: Venkata Ramanaiah Nalamothu <VenkataRamanaiah.Nalamothu@amd.com>


Patch is 153.06 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/164725.diff

8 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIFrameLowering.cpp (+35-10)
  • (modified) llvm/lib/Target/AMDGPU/SIFrameLowering.h (+7)
  • (modified) llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp (+1-1)
  • (modified) llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp (+2-1)
  • (modified) llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h (+11-2)
  • (modified) llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp (+9)
  • (modified) llvm/lib/Target/AMDGPU/SIRegisterInfo.h (+2)
  • (added) llvm/test/CodeGen/AMDGPU/amdgpu-spill-cfi-saved-regs.ll (+2556)
diff --git a/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp b/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
index bbde3c49f64c6..0c5d1445d36e8 100644
--- a/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
@@ -671,12 +671,21 @@ void SIFrameLowering::emitEntryFunctionFlatScratchInit(
 }
 
 // Note SGPRSpill stack IDs should only be used for SGPR spilling to VGPRs, not
-// memory. They should have been removed by now.
-static bool allStackObjectsAreDead(const MachineFrameInfo &MFI) {
+// memory. They should have been removed by now, except CFI Saved Reg spills.
+static bool allStackObjectsAreDead(const MachineFunction &MF) {
+  const MachineFrameInfo &MFI = MF.getFrameInfo();
+  const SIMachineFunctionInfo *FuncInfo = MF.getInfo<SIMachineFunctionInfo>();
   for (int I = MFI.getObjectIndexBegin(), E = MFI.getObjectIndexEnd();
        I != E; ++I) {
-    if (!MFI.isDeadObjectIndex(I))
+    if (!MFI.isDeadObjectIndex(I)) {
+      // determineCalleeSaves() might have added the SGPRSpill stack IDs for
+      // CFI saves into scratch VGPR, ignore them
+      if (MFI.getStackID(I) == TargetStackID::SGPRSpill &&
+          FuncInfo->checkIndexInPrologEpilogSGPRSpills(I)) {
+        continue;
+      }
       return false;
+    }
   }
 
   return true;
@@ -696,8 +705,8 @@ Register SIFrameLowering::getEntryFunctionReservedScratchRsrcReg(
 
   Register ScratchRsrcReg = MFI->getScratchRSrcReg();
 
-  if (!ScratchRsrcReg || (!MRI.isPhysRegUsed(ScratchRsrcReg) &&
-                          allStackObjectsAreDead(MF.getFrameInfo())))
+  if (!ScratchRsrcReg ||
+      (!MRI.isPhysRegUsed(ScratchRsrcReg) && allStackObjectsAreDead(MF)))
     return Register();
 
   if (ST.hasSGPRInitBug() ||
@@ -925,7 +934,7 @@ void SIFrameLowering::emitEntryFunctionPrologue(MachineFunction &MF,
   bool NeedsFlatScratchInit =
       MFI->getUserSGPRInfo().hasFlatScratchInit() &&
       (MRI.isPhysRegUsed(AMDGPU::FLAT_SCR) || FrameInfo.hasCalls() ||
-       (!allStackObjectsAreDead(FrameInfo) && ST.enableFlatScratch()));
+       (!allStackObjectsAreDead(MF) && ST.enableFlatScratch()));
 
   if ((NeedsFlatScratchInit || ScratchRsrcReg) &&
       PreloadedScratchWaveOffsetReg && !ST.flatScratchIsArchitected()) {
@@ -1306,6 +1315,11 @@ void SIFrameLowering::emitCSRSpillStores(MachineFunction &MF,
         LiveUnits.addReg(Reg);
     }
   }
+
+  // Remove the spill entry created for EXEC. It is needed only for CFISaves in
+  // the prologue.
+  if (TRI.isCFISavedRegsSpillEnabled())
+    FuncInfo->removePrologEpilogSGPRSpillEntry(TRI.getExec());
 }
 
 void SIFrameLowering::emitCSRSpillRestores(
@@ -1789,14 +1803,14 @@ void SIFrameLowering::processFunctionBeforeFrameFinalized(
   // can. Any remaining SGPR spills will go to memory, so move them back to the
   // default stack.
   bool HaveSGPRToVMemSpill =
-      FuncInfo->removeDeadFrameIndices(MFI, /*ResetSGPRSpillStackIDs*/ true);
+      FuncInfo->removeDeadFrameIndices(MF, /*ResetSGPRSpillStackIDs*/ true);
   assert(allSGPRSpillsAreDead(MF) &&
          "SGPR spill should have been removed in SILowerSGPRSpills");
 
   // FIXME: The other checks should be redundant with allStackObjectsAreDead,
   // but currently hasNonSpillStackObjects is set only from source
   // allocas. Stack temps produced from legalization are not counted currently.
-  if (!allStackObjectsAreDead(MFI)) {
+  if (!allStackObjectsAreDead(MF)) {
     assert(RS && "RegScavenger required if spilling");
 
     // Add an emergency spill slot
@@ -1896,6 +1910,18 @@ void SIFrameLowering::determinePrologEpilogSGPRSaves(
     MFI->setSGPRForEXECCopy(AMDGPU::NoRegister);
   }
 
+  if (TRI->isCFISavedRegsSpillEnabled()) {
+    Register Exec = TRI->getExec();
+    assert(!MFI->hasPrologEpilogSGPRSpillEntry(Exec) &&
+           "Re-reserving spill slot for EXEC");
+    // FIXME: Machine Copy Propagation currently optimizes away the EXEC copy to
+    // the scratch as we emit it only in the prolog. This optimization should
+    // not happen for frame related instructions. Until this is fixed ignore
+    // copy to scratch SGPR.
+    getVGPRSpillLaneOrTempRegister(MF, LiveUnits, Exec, RC,
+                                   /*IncludeScratchCopy=*/false);
+  }
+
   // hasFP only knows about stack objects that already exist. We're now
   // determining the stack slots that will be created, so we have to predict
   // them. Stack objects force FP usage with calls.
@@ -1905,8 +1931,7 @@ void SIFrameLowering::determinePrologEpilogSGPRSaves(
   //
   // FIXME: Is this really hasReservedCallFrame?
   const bool WillHaveFP =
-      FrameInfo.hasCalls() &&
-      (SavedVGPRs.any() || !allStackObjectsAreDead(FrameInfo));
+      FrameInfo.hasCalls() && (SavedVGPRs.any() || !allStackObjectsAreDead(MF));
 
   if (WillHaveFP || hasFP(MF)) {
     Register FramePtrReg = MFI->getFrameOffsetReg();
diff --git a/llvm/lib/Target/AMDGPU/SIFrameLowering.h b/llvm/lib/Target/AMDGPU/SIFrameLowering.h
index 2b716db0b7a22..526404eb83b4f 100644
--- a/llvm/lib/Target/AMDGPU/SIFrameLowering.h
+++ b/llvm/lib/Target/AMDGPU/SIFrameLowering.h
@@ -114,6 +114,13 @@ class SIFrameLowering final : public AMDGPUFrameLowering {
 public:
   bool requiresStackPointerReference(const MachineFunction &MF) const;
 
+  /// If '-amdgpu-spill-cfi-saved-regs' is enabled, emit RA/EXEC spills to
+  /// a free VGPR (lanes) or memory and corresponding CFI rules.
+  void emitCFISavedRegSpills(MachineFunction &MF, MachineBasicBlock &MBB,
+                             MachineBasicBlock::iterator MBBI,
+                             LiveRegUnits &LiveRegs,
+                             bool emitSpillsToMem) const;
+
   /// Create a CFI index for CFIInst and build a MachineInstr around it.
   MachineInstr *
   buildCFI(MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI,
diff --git a/llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp b/llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
index 62386da94d854..57ff52334a470 100644
--- a/llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
+++ b/llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
@@ -531,7 +531,7 @@ bool SILowerSGPRSpills::run(MachineFunction &MF) {
     // free frame index ids by the later pass(es) like "stack slot coloring"
     // which in turn could mess-up with the book keeping of "frame index to VGPR
     // lane".
-    FuncInfo->removeDeadFrameIndices(MFI, /*ResetSGPRSpillStackIDs*/ false);
+    FuncInfo->removeDeadFrameIndices(MF, /*ResetSGPRSpillStackIDs*/ false);
 
     MadeChange = true;
   }
diff --git a/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp b/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
index b398db4f7caff..2c275a85440d9 100644
--- a/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
@@ -566,7 +566,8 @@ bool SIMachineFunctionInfo::allocateVGPRSpillToAGPR(MachineFunction &MF,
 }
 
 bool SIMachineFunctionInfo::removeDeadFrameIndices(
-    MachineFrameInfo &MFI, bool ResetSGPRSpillStackIDs) {
+    MachineFunction &MF, bool ResetSGPRSpillStackIDs) {
+  MachineFrameInfo &MFI = MF.getFrameInfo();
   // Remove dead frame indices from function frame, however keep FP & BP since
   // spills for them haven't been inserted yet. And also make sure to remove the
   // frame indices from `SGPRSpillsToVirtualVGPRLanes` data structure,
diff --git a/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h b/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
index 2c1a13c345aac..8ca34d10de0ef 100644
--- a/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
+++ b/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
@@ -752,6 +752,16 @@ class SIMachineFunctionInfo final : public AMDGPUMachineFunction,
                    }) != PrologEpilogSGPRSpills.end();
   }
 
+  // Remove if an entry created for \p Reg.
+  void removePrologEpilogSGPRSpillEntry(Register Reg) {
+    auto I = find_if(PrologEpilogSGPRSpills,
+                     [&Reg](const auto &Spill) { return Spill.first == Reg; });
+    if (I == PrologEpilogSGPRSpills.end())
+      return;
+
+    PrologEpilogSGPRSpills.erase(I);
+  }
+
   const PrologEpilogSGPRSaveRestoreInfo &
   getPrologEpilogSGPRSaveRestoreInfo(Register Reg) const {
     const auto *I = find_if(PrologEpilogSGPRSpills, [&Reg](const auto &Spill) {
@@ -830,8 +840,7 @@ class SIMachineFunctionInfo final : public AMDGPUMachineFunction,
 
   /// If \p ResetSGPRSpillStackIDs is true, reset the stack ID from sgpr-spill
   /// to the default stack.
-  bool removeDeadFrameIndices(MachineFrameInfo &MFI,
-                              bool ResetSGPRSpillStackIDs);
+  bool removeDeadFrameIndices(MachineFunction &MF, bool ResetSGPRSpillStackIDs);
 
   int getScavengeFI(MachineFrameInfo &MFI, const SIRegisterInfo &TRI);
   std::optional<int> getOptionalScavengeFI() const { return ScavengeFI; }
diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
index 77608a4cfc751..9677c6cd7806c 100644
--- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
@@ -35,6 +35,11 @@ static cl::opt<bool> EnableSpillSGPRToVGPR(
   cl::ReallyHidden,
   cl::init(true));
 
+static cl::opt<bool> EnableSpillCFISavedRegs(
+    "amdgpu-spill-cfi-saved-regs",
+    cl::desc("Enable spilling the registers required for CFI emission"),
+    cl::ReallyHidden, cl::init(false), cl::ZeroOrMore);
+
 std::array<std::vector<int16_t>, 32> SIRegisterInfo::RegSplitParts;
 std::array<std::array<uint16_t, 32>, 9> SIRegisterInfo::SubRegFromChannelTable;
 
@@ -559,6 +564,10 @@ unsigned SIRegisterInfo::getSubRegFromChannel(unsigned Channel,
   return SubRegFromChannelTable[NumRegIndex - 1][Channel];
 }
 
+bool SIRegisterInfo::isCFISavedRegsSpillEnabled() const {
+  return EnableSpillCFISavedRegs;
+}
+
 MCRegister
 SIRegisterInfo::getAlignedHighSGPRForRC(const MachineFunction &MF,
                                         const unsigned Align,
diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.h b/llvm/lib/Target/AMDGPU/SIRegisterInfo.h
index 2dae5f0eb1c69..749583722e3b0 100644
--- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.h
+++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.h
@@ -80,6 +80,8 @@ class SIRegisterInfo final : public AMDGPUGenRegisterInfo {
     return SpillSGPRToVGPR;
   }
 
+  bool isCFISavedRegsSpillEnabled() const;
+
   /// Return the largest available SGPR aligned to \p Align for the register
   /// class \p RC.
   MCRegister getAlignedHighSGPRForRC(const MachineFunction &MF,
diff --git a/llvm/test/CodeGen/AMDGPU/amdgpu-spill-cfi-saved-regs.ll b/llvm/test/CodeGen/AMDGPU/amdgpu-spill-cfi-saved-regs.ll
new file mode 100644
index 0000000000000..c804c75ae7d2c
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/amdgpu-spill-cfi-saved-regs.ll
@@ -0,0 +1,2556 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-spill-cfi-saved-regs -verify-machineinstrs -o - %s | FileCheck --check-prefixes=CHECK,WAVE64 %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -mattr=+wavefrontsize32,-wavefrontsize64 -amdgpu-spill-cfi-saved-regs -verify-machineinstrs -o - %s | FileCheck --check-prefixes=CHECK,WAVE32 %s
+
+define protected amdgpu_kernel void @kern() #0 {
+; CHECK-LABEL: kern:
+; CHECK:       .Lfunc_begin0:
+; CHECK-NEXT:    .cfi_sections .debug_frame
+; CHECK-NEXT:    .cfi_startproc
+; CHECK-NEXT:  ; %bb.0: ; %entry
+; CHECK-NEXT:    .cfi_escape 0x0f, 0x04, 0x30, 0x36, 0xe9, 0x02 ;
+; CHECK-NEXT:    .cfi_undefined 16
+; CHECK-NEXT:    s_endpgm
+entry:
+  ret void
+}
+
+define hidden void @func_saved_in_clobbered_vgpr() #0 {
+; WAVE64-LABEL: func_saved_in_clobbered_vgpr:
+; WAVE64:       .Lfunc_begin1:
+; WAVE64-NEXT:    .cfi_startproc
+; WAVE64-NEXT:  ; %bb.0: ; %entry
+; WAVE64-NEXT:    .cfi_llvm_def_aspace_cfa 64, 0, 6
+; WAVE64-NEXT:    .cfi_llvm_register_pair 16, 62, 32, 63, 32
+; WAVE64-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; WAVE64-NEXT:    s_xor_saveexec_b64 s[4:5], -1
+; WAVE64-NEXT:    buffer_store_dword v0, off, s[0:3], s32 ; 4-byte Folded Spill
+; WAVE64-NEXT:    .cfi_offset 2560, 0
+; WAVE64-NEXT:    s_mov_b64 exec, s[4:5]
+; WAVE64-NEXT:    v_writelane_b32 v0, exec_lo, 0
+; WAVE64-NEXT:    v_writelane_b32 v0, exec_hi, 1
+; WAVE64-NEXT:    .cfi_llvm_vector_registers 17, 2560, 0, 32, 2560, 1, 32
+; WAVE64-NEXT:    s_xor_saveexec_b64 s[4:5], -1
+; WAVE64-NEXT:    buffer_load_dword v0, off, s[0:3], s32 ; 4-byte Folded Reload
+; WAVE64-NEXT:    s_mov_b64 exec, s[4:5]
+; WAVE64-NEXT:    s_waitcnt vmcnt(0)
+; WAVE64-NEXT:    s_setpc_b64 s[30:31]
+;
+; WAVE32-LABEL: func_saved_in_clobbered_vgpr:
+; WAVE32:       .Lfunc_begin1:
+; WAVE32-NEXT:    .cfi_startproc
+; WAVE32-NEXT:  ; %bb.0: ; %entry
+; WAVE32-NEXT:    .cfi_llvm_def_aspace_cfa 64, 0, 6
+; WAVE32-NEXT:    .cfi_llvm_register_pair 16, 62, 32, 63, 32
+; WAVE32-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; WAVE32-NEXT:    s_xor_saveexec_b32 s4, -1
+; WAVE32-NEXT:    buffer_store_dword v0, off, s[0:3], s32 ; 4-byte Folded Spill
+; WAVE32-NEXT:    .cfi_offset 1536, 0
+; WAVE32-NEXT:    s_waitcnt_depctr 0xffe3
+; WAVE32-NEXT:    s_mov_b32 exec_lo, s4
+; WAVE32-NEXT:    v_writelane_b32 v0, exec_lo, 0
+; WAVE32-NEXT:    .cfi_llvm_vector_registers 1, 1536, 0, 32
+; WAVE32-NEXT:    s_xor_saveexec_b32 s4, -1
+; WAVE32-NEXT:    buffer_load_dword v0, off, s[0:3], s32 ; 4-byte Folded Reload
+; WAVE32-NEXT:    s_waitcnt_depctr 0xffe3
+; WAVE32-NEXT:    s_mov_b32 exec_lo, s4
+; WAVE32-NEXT:    s_waitcnt vmcnt(0)
+; WAVE32-NEXT:    s_setpc_b64 s[30:31]
+entry:
+  ret void
+}
+
+; Check that the option causes a CSR VGPR to spill when needed.
+define hidden void @func_saved_in_preserved_vgpr() #0 {
+; WAVE64-LABEL: func_saved_in_preserved_vgpr:
+; WAVE64:       .Lfunc_begin2:
+; WAVE64-NEXT:    .cfi_startproc
+; WAVE64-NEXT:  ; %bb.0: ; %entry
+; WAVE64-NEXT:    .cfi_llvm_def_aspace_cfa 64, 0, 6
+; WAVE64-NEXT:    .cfi_llvm_register_pair 16, 62, 32, 63, 32
+; WAVE64-NEXT:    .cfi_undefined 2560
+; WAVE64-NEXT:    .cfi_undefined 2561
+; WAVE64-NEXT:    .cfi_undefined 2562
+; WAVE64-NEXT:    .cfi_undefined 2563
+; WAVE64-NEXT:    .cfi_undefined 2564
+; WAVE64-NEXT:    .cfi_undefined 2565
+; WAVE64-NEXT:    .cfi_undefined 2566
+; WAVE64-NEXT:    .cfi_undefined 2567
+; WAVE64-NEXT:    .cfi_undefined 2568
+; WAVE64-NEXT:    .cfi_undefined 2569
+; WAVE64-NEXT:    .cfi_undefined 2570
+; WAVE64-NEXT:    .cfi_undefined 2571
+; WAVE64-NEXT:    .cfi_undefined 2572
+; WAVE64-NEXT:    .cfi_undefined 2573
+; WAVE64-NEXT:    .cfi_undefined 2574
+; WAVE64-NEXT:    .cfi_undefined 2575
+; WAVE64-NEXT:    .cfi_undefined 2576
+; WAVE64-NEXT:    .cfi_undefined 2577
+; WAVE64-NEXT:    .cfi_undefined 2578
+; WAVE64-NEXT:    .cfi_undefined 2579
+; WAVE64-NEXT:    .cfi_undefined 2580
+; WAVE64-NEXT:    .cfi_undefined 2581
+; WAVE64-NEXT:    .cfi_undefined 2582
+; WAVE64-NEXT:    .cfi_undefined 2583
+; WAVE64-NEXT:    .cfi_undefined 2584
+; WAVE64-NEXT:    .cfi_undefined 2585
+; WAVE64-NEXT:    .cfi_undefined 2586
+; WAVE64-NEXT:    .cfi_undefined 2587
+; WAVE64-NEXT:    .cfi_undefined 2588
+; WAVE64-NEXT:    .cfi_undefined 2589
+; WAVE64-NEXT:    .cfi_undefined 2590
+; WAVE64-NEXT:    .cfi_undefined 2591
+; WAVE64-NEXT:    .cfi_undefined 2592
+; WAVE64-NEXT:    .cfi_undefined 2593
+; WAVE64-NEXT:    .cfi_undefined 2594
+; WAVE64-NEXT:    .cfi_undefined 2595
+; WAVE64-NEXT:    .cfi_undefined 2596
+; WAVE64-NEXT:    .cfi_undefined 2597
+; WAVE64-NEXT:    .cfi_undefined 2598
+; WAVE64-NEXT:    .cfi_undefined 2599
+; WAVE64-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; WAVE64-NEXT:    s_or_saveexec_b64 s[4:5], -1
+; WAVE64-NEXT:    buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
+; WAVE64-NEXT:    .cfi_offset 2600, 0
+; WAVE64-NEXT:    s_mov_b64 exec, s[4:5]
+; WAVE64-NEXT:    v_writelane_b32 v40, exec_lo, 0
+; WAVE64-NEXT:    v_writelane_b32 v40, exec_hi, 1
+; WAVE64-NEXT:    .cfi_llvm_vector_registers 17, 2600, 0, 32, 2600, 1, 32
+; WAVE64-NEXT:    ;;#ASMSTART
+; WAVE64-NEXT:    ; clobber nonpreserved VGPRs
+; WAVE64-NEXT:    ;;#ASMEND
+; WAVE64-NEXT:    s_or_saveexec_b64 s[4:5], -1
+; WAVE64-NEXT:    buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
+; WAVE64-NEXT:    s_mov_b64 exec, s[4:5]
+; WAVE64-NEXT:    s_waitcnt vmcnt(0)
+; WAVE64-NEXT:    s_setpc_b64 s[30:31]
+;
+; WAVE32-LABEL: func_saved_in_preserved_vgpr:
+; WAVE32:       .Lfunc_begin2:
+; WAVE32-NEXT:    .cfi_startproc
+; WAVE32-NEXT:  ; %bb.0: ; %entry
+; WAVE32-NEXT:    .cfi_llvm_def_aspace_cfa 64, 0, 6
+; WAVE32-NEXT:    .cfi_llvm_register_pair 16, 62, 32, 63, 32
+; WAVE32-NEXT:    .cfi_undefined 1536
+; WAVE32-NEXT:    .cfi_undefined 1537
+; WAVE32-NEXT:    .cfi_undefined 1538
+; WAVE32-NEXT:    .cfi_undefined 1539
+; WAVE32-NEXT:    .cfi_undefined 1540
+; WAVE32-NEXT:    .cfi_undefined 1541
+; WAVE32-NEXT:    .cfi_undefined 1542
+; WAVE32-NEXT:    .cfi_undefined 1543
+; WAVE32-NEXT:    .cfi_undefined 1544
+; WAVE32-NEXT:    .cfi_undefined 1545
+; WAVE32-NEXT:    .cfi_undefined 1546
+; WAVE32-NEXT:    .cfi_undefined 1547
+; WAVE32-NEXT:    .cfi_undefined 1548
+; WAVE32-NEXT:    .cfi_undefined 1549
+; WAVE32-NEXT:    .cfi_undefined 1550
+; WAVE32-NEXT:    .cfi_undefined 1551
+; WAVE32-NEXT:    .cfi_undefined 1552
+; WAVE32-NEXT:    .cfi_undefined 1553
+; WAVE32-NEXT:    .cfi_undefined 1554
+; WAVE32-NEXT:    .cfi_undefined 1555
+; WAVE32-NEXT:    .cfi_undefined 1556
+; WAVE32-NEXT:    .cfi_undefined 1557
+; WAVE32-NEXT:    .cfi_undefined 1558
+; WAVE32-NEXT:    .cfi_undefined 1559
+; WAVE32-NEXT:    .cfi_undefined 1560
+; WAVE32-NEXT:    .cfi_undefined 1561
+; WAVE32-NEXT:    .cfi_undefined 1562
+; WAVE32-NEXT:    .cfi_undefined 1563
+; WAVE32-NEXT:    .cfi_undefined 1564
+; WAVE32-NEXT:    .cfi_undefined 1565
+; WAVE32-NEXT:    .cfi_undefined 1566
+; WAVE32-NEXT:    .cfi_undefined 1567
+; WAVE32-NEXT:    .cfi_undefined 1568
+; WAVE32-NEXT:    .cfi_undefined 1569
+; WAVE32-NEXT:    .cfi_undefined 1570
+; WAVE32-NEXT:    .cfi_undefined 1571
+; WAVE32-NEXT:    .cfi_undefined 1572
+; WAVE32-NEXT:    .cfi_undefined 1573
+; WAVE32-NEXT:    .cfi_undefined 1574
+; WAVE32-NEXT:    .cfi_undefined 1575
+; WAVE32-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; WAVE32-NEXT:    s_or_saveexec_b32 s4, -1
+; WAVE32-NEXT:    buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
+; WAVE32-NEXT:    .cfi_offset 1576, 0
+; WAVE32-NEXT:    s_waitcnt_depctr 0xffe3
+; WAVE32-NEXT:    s_mov_b32 exec_lo, s4
+; WAVE32-NEXT:    v_writelane_b32 v40, exec_lo, 0
+; WAVE32-NEXT:    .cfi_llvm_vector_registers 1, 1576, 0, 32
+; WAVE32-NEXT:    ;;#ASMSTART
+; WAVE32-NEXT:    ; clobber nonpreserved VGPRs
+; WAVE32-NEXT:    ;;#ASMEND
+; WAVE32-NEXT:    s_or_saveexec_b32 s4, -1
+; WAVE32-NEXT:    buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
+; WAVE32-NEXT:    s_waitcnt_depctr 0xffe3
+; WAVE32-NEXT:    s_mov_b32 exec_lo, s4
+; WAVE32-NEXT:    s_waitcnt vmcnt(0)
+; WAVE32-NEXT:    s_setpc_b64 s[30:31]
+entry:
+  call void asm sideeffect "; clobber nonpreserved VGPRs",
+    "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7},~{v8},~{v9}
+    ,~{v10},~{v11},~{v12},~{v13},~{v14},~{v15},~{v16},~{v17},~{v18},~{v19}
+    ,~{v20},~{v21},~{v22},~{v23},~{v24},~{v25},~{v26},~{v27},~{v28},~{v29}
+    ,~{v30},~{v31},~{v32},~{v33},~{v34},~{v35},~{v36},~{v37},~{v38},~{v39}"()
+  ret void
+}
+
+; There's no return here, so the return address live in was deleted.
+define void @empty_func() {
+; WAVE64-LABEL: empty_func:
+; WAVE64:       .Lfunc_begin3:
+; WAVE64-NEXT:    .cfi_startproc
+; WAVE64-NEXT:  ; %bb.0:
+; WAVE64-NEXT:    .cfi_llvm_def_aspace_cfa 64, 0, 6
+; WAVE64-NEXT:    .cfi_llvm_register_pair 16, 62, 32, 63, 32
+; WAVE64-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; WAVE64-NEXT:    s_xor_saveexec_b64 s[4:5], -1
+; WAVE64-NEXT:    buffer_store_dword v0, off, s[0:3], s32 ; 4-byte Folded Spill
+; WAVE64-NEXT:    .cfi_offset 2560, 0
+; WAVE64-NEXT:    s_mov_b64 exec, s[4:5]
+; WAVE64-NEXT:    v_writelane_b32 v0, exec_lo, 0
+; WAVE64-NEXT:    v_writelane_b32 v0, exec_hi, 1
+;
+; WAVE32-LABEL: empty_func:
+; WAVE32:       .Lfunc_begin3:
+; WAVE32-NEXT:    .cfi_startproc
+; WAVE32-NEXT:  ; %bb.0:
+; WAVE32-NEXT:    .cfi_llvm_def_aspace_cfa 64, 0, 6
+; WAVE32-NEXT:    .cfi_llvm_register_pair 16, 62, 32, 63, 32
+; WAVE32-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; WAVE32-NEXT:    s_xor_saveexec_b32 s4, -1
+; WAVE32-NEXT:    buffer_store_dword v0, off, s[0:3], s32 ; 4-byte Folded Spill
+; WAVE32-NEXT:  ...
[truncated]
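An aside on the DWARF register numbers in the CHECK lines above: the AMDGPU numbering visible in this test maps wave64 VGPRn to 2560 + n and wave32 VGPRn to 1536 + n, which is why the v40 spill is recorded as `.cfi_offset 2600` in the WAVE64 output and `.cfi_offset 1576` in the WAVE32 output. A minimal sketch of that mapping (the helper name is hypothetical, not an LLVM API):

```python
def vgpr_dwarf_num(n: int, wave32: bool = False) -> int:
    """DWARF register number for VGPRn, per the numbering seen in the
    CHECK lines above: wave64 VGPRs start at 2560, wave32 VGPRs at 1536."""
    base = 1536 if wave32 else 2560
    return base + n
```

For example, `vgpr_dwarf_num(40)` gives 2600, matching the `.cfi_offset 2600` emitted for the v40 spill in wave64, and the runs of `.cfi_undefined 2560`–`2599` (wave64) and `1536`–`1575` (wave32) cover the clobbered v0–v39.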

Comment on lines +2 to +3
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-spill-cfi-saved-regs -verify-machineinstrs -o - %s | FileCheck --check-prefixes=CHECK,WAVE64 %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -mattr=+wavefrontsize32,-wavefrontsize64 -amdgpu-spill-cfi-saved-regs -verify-machineinstrs -o - %s | FileCheck --check-prefixes=CHECK,WAVE32 %s
Contributor
Suggested change
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-spill-cfi-saved-regs -verify-machineinstrs -o - %s | FileCheck --check-prefixes=CHECK,WAVE64 %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -mattr=+wavefrontsize32,-wavefrontsize64 -amdgpu-spill-cfi-saved-regs -verify-machineinstrs -o - %s | FileCheck --check-prefixes=CHECK,WAVE32 %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-spill-cfi-saved-regs -o - %s | FileCheck --check-prefixes=CHECK,WAVE64 %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -amdgpu-spill-cfi-saved-regs -o - %s | FileCheck --check-prefixes=CHECK,WAVE32 %s


attributes #0 = { nounwind }
attributes #1 = { nounwind "amdgpu-waves-per-eu"="10,10" }
attributes #2 = { nounwind "frame-pointer"="all" "amdgpu-waves-per-eu"="12,12" }
Contributor
12? I thought the limit was 10, at least for gfx9

10 participants
10 participants