Conversation

@slinder1 (Contributor)

These spills need special CFI anyway, so implementing them directly
where CFI is emitted avoids the need to invent a mechanism to track them
from ISel.

Co-authored-by: Scott Linder <scott.linder@amd.com>
Co-authored-by: Venkata Ramanaiah Nalamothu <VenkataRamanaiah.Nalamothu@amd.com>

slinder1 (Contributor, Author) commented Oct 22, 2025

llvmbot (Member) commented Oct 22, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Scott Linder (slinder1)

Changes

These spills need special CFI anyway, so implementing them directly
where CFI is emitted avoids the need to invent a mechanism to track them
from ISel.

Co-authored-by: Scott Linder <scott.linder@amd.com>
Co-authored-by: Venkata Ramanaiah Nalamothu <VenkataRamanaiah.Nalamothu@amd.com>


Patch is 153.06 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/164725.diff

8 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIFrameLowering.cpp (+35-10)
  • (modified) llvm/lib/Target/AMDGPU/SIFrameLowering.h (+7)
  • (modified) llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp (+1-1)
  • (modified) llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp (+2-1)
  • (modified) llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h (+11-2)
  • (modified) llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp (+9)
  • (modified) llvm/lib/Target/AMDGPU/SIRegisterInfo.h (+2)
  • (added) llvm/test/CodeGen/AMDGPU/amdgpu-spill-cfi-saved-regs.ll (+2556)
diff --git a/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp b/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
index bbde3c49f64c6..0c5d1445d36e8 100644
--- a/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
@@ -671,12 +671,21 @@ void SIFrameLowering::emitEntryFunctionFlatScratchInit(
 }
 
 // Note SGPRSpill stack IDs should only be used for SGPR spilling to VGPRs, not
-// memory. They should have been removed by now.
-static bool allStackObjectsAreDead(const MachineFrameInfo &MFI) {
+// memory. They should have been removed by now, except CFI Saved Reg spills.
+static bool allStackObjectsAreDead(const MachineFunction &MF) {
+  const MachineFrameInfo &MFI = MF.getFrameInfo();
+  const SIMachineFunctionInfo *FuncInfo = MF.getInfo<SIMachineFunctionInfo>();
   for (int I = MFI.getObjectIndexBegin(), E = MFI.getObjectIndexEnd();
        I != E; ++I) {
-    if (!MFI.isDeadObjectIndex(I))
+    if (!MFI.isDeadObjectIndex(I)) {
+      // determineCalleeSaves() might have added the SGPRSpill stack IDs for
+      // CFI saves into scratch VGPR, ignore them
+      if (MFI.getStackID(I) == TargetStackID::SGPRSpill &&
+          FuncInfo->checkIndexInPrologEpilogSGPRSpills(I)) {
+        continue;
+      }
       return false;
+    }
   }
 
   return true;
@@ -696,8 +705,8 @@ Register SIFrameLowering::getEntryFunctionReservedScratchRsrcReg(
 
   Register ScratchRsrcReg = MFI->getScratchRSrcReg();
 
-  if (!ScratchRsrcReg || (!MRI.isPhysRegUsed(ScratchRsrcReg) &&
-                          allStackObjectsAreDead(MF.getFrameInfo())))
+  if (!ScratchRsrcReg ||
+      (!MRI.isPhysRegUsed(ScratchRsrcReg) && allStackObjectsAreDead(MF)))
     return Register();
 
   if (ST.hasSGPRInitBug() ||
@@ -925,7 +934,7 @@ void SIFrameLowering::emitEntryFunctionPrologue(MachineFunction &MF,
   bool NeedsFlatScratchInit =
       MFI->getUserSGPRInfo().hasFlatScratchInit() &&
       (MRI.isPhysRegUsed(AMDGPU::FLAT_SCR) || FrameInfo.hasCalls() ||
-       (!allStackObjectsAreDead(FrameInfo) && ST.enableFlatScratch()));
+       (!allStackObjectsAreDead(MF) && ST.enableFlatScratch()));
 
   if ((NeedsFlatScratchInit || ScratchRsrcReg) &&
       PreloadedScratchWaveOffsetReg && !ST.flatScratchIsArchitected()) {
@@ -1306,6 +1315,11 @@ void SIFrameLowering::emitCSRSpillStores(MachineFunction &MF,
         LiveUnits.addReg(Reg);
     }
   }
+
+  // Remove the spill entry created for EXEC. It is needed only for CFISaves in
+  // the prologue.
+  if (TRI.isCFISavedRegsSpillEnabled())
+    FuncInfo->removePrologEpilogSGPRSpillEntry(TRI.getExec());
 }
 
 void SIFrameLowering::emitCSRSpillRestores(
@@ -1789,14 +1803,14 @@ void SIFrameLowering::processFunctionBeforeFrameFinalized(
   // can. Any remaining SGPR spills will go to memory, so move them back to the
   // default stack.
   bool HaveSGPRToVMemSpill =
-      FuncInfo->removeDeadFrameIndices(MFI, /*ResetSGPRSpillStackIDs*/ true);
+      FuncInfo->removeDeadFrameIndices(MF, /*ResetSGPRSpillStackIDs*/ true);
   assert(allSGPRSpillsAreDead(MF) &&
          "SGPR spill should have been removed in SILowerSGPRSpills");
 
   // FIXME: The other checks should be redundant with allStackObjectsAreDead,
   // but currently hasNonSpillStackObjects is set only from source
   // allocas. Stack temps produced from legalization are not counted currently.
-  if (!allStackObjectsAreDead(MFI)) {
+  if (!allStackObjectsAreDead(MF)) {
     assert(RS && "RegScavenger required if spilling");
 
     // Add an emergency spill slot
@@ -1896,6 +1910,18 @@ void SIFrameLowering::determinePrologEpilogSGPRSaves(
     MFI->setSGPRForEXECCopy(AMDGPU::NoRegister);
   }
 
+  if (TRI->isCFISavedRegsSpillEnabled()) {
+    Register Exec = TRI->getExec();
+    assert(!MFI->hasPrologEpilogSGPRSpillEntry(Exec) &&
+           "Re-reserving spill slot for EXEC");
+    // FIXME: Machine Copy Propagation currently optimizes away the EXEC copy to
+    // the scratch as we emit it only in the prolog. This optimization should
+    // not happen for frame related instructions. Until this is fixed ignore
+    // copy to scratch SGPR.
+    getVGPRSpillLaneOrTempRegister(MF, LiveUnits, Exec, RC,
+                                   /*IncludeScratchCopy=*/false);
+  }
+
   // hasFP only knows about stack objects that already exist. We're now
   // determining the stack slots that will be created, so we have to predict
   // them. Stack objects force FP usage with calls.
@@ -1905,8 +1931,7 @@ void SIFrameLowering::determinePrologEpilogSGPRSaves(
   //
   // FIXME: Is this really hasReservedCallFrame?
   const bool WillHaveFP =
-      FrameInfo.hasCalls() &&
-      (SavedVGPRs.any() || !allStackObjectsAreDead(FrameInfo));
+      FrameInfo.hasCalls() && (SavedVGPRs.any() || !allStackObjectsAreDead(MF));
 
   if (WillHaveFP || hasFP(MF)) {
     Register FramePtrReg = MFI->getFrameOffsetReg();
diff --git a/llvm/lib/Target/AMDGPU/SIFrameLowering.h b/llvm/lib/Target/AMDGPU/SIFrameLowering.h
index 2b716db0b7a22..526404eb83b4f 100644
--- a/llvm/lib/Target/AMDGPU/SIFrameLowering.h
+++ b/llvm/lib/Target/AMDGPU/SIFrameLowering.h
@@ -114,6 +114,13 @@ class SIFrameLowering final : public AMDGPUFrameLowering {
 public:
   bool requiresStackPointerReference(const MachineFunction &MF) const;
 
+  /// If '-amdgpu-spill-cfi-saved-regs' is enabled, emit RA/EXEC spills to
+  /// a free VGPR (lanes) or memory and corresponding CFI rules.
+  void emitCFISavedRegSpills(MachineFunction &MF, MachineBasicBlock &MBB,
+                             MachineBasicBlock::iterator MBBI,
+                             LiveRegUnits &LiveRegs,
+                             bool emitSpillsToMem) const;
+
   /// Create a CFI index for CFIInst and build a MachineInstr around it.
   MachineInstr *
   buildCFI(MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI,
diff --git a/llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp b/llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
index 62386da94d854..57ff52334a470 100644
--- a/llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
+++ b/llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
@@ -531,7 +531,7 @@ bool SILowerSGPRSpills::run(MachineFunction &MF) {
     // free frame index ids by the later pass(es) like "stack slot coloring"
     // which in turn could mess-up with the book keeping of "frame index to VGPR
     // lane".
-    FuncInfo->removeDeadFrameIndices(MFI, /*ResetSGPRSpillStackIDs*/ false);
+    FuncInfo->removeDeadFrameIndices(MF, /*ResetSGPRSpillStackIDs*/ false);
 
     MadeChange = true;
   }
diff --git a/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp b/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
index b398db4f7caff..2c275a85440d9 100644
--- a/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
@@ -566,7 +566,8 @@ bool SIMachineFunctionInfo::allocateVGPRSpillToAGPR(MachineFunction &MF,
 }
 
 bool SIMachineFunctionInfo::removeDeadFrameIndices(
-    MachineFrameInfo &MFI, bool ResetSGPRSpillStackIDs) {
+    MachineFunction &MF, bool ResetSGPRSpillStackIDs) {
+  MachineFrameInfo &MFI = MF.getFrameInfo();
   // Remove dead frame indices from function frame, however keep FP & BP since
   // spills for them haven't been inserted yet. And also make sure to remove the
   // frame indices from `SGPRSpillsToVirtualVGPRLanes` data structure,
diff --git a/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h b/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
index 2c1a13c345aac..8ca34d10de0ef 100644
--- a/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
+++ b/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
@@ -752,6 +752,16 @@ class SIMachineFunctionInfo final : public AMDGPUMachineFunction,
                    }) != PrologEpilogSGPRSpills.end();
   }
 
+  // Remove if an entry created for \p Reg.
+  void removePrologEpilogSGPRSpillEntry(Register Reg) {
+    auto I = find_if(PrologEpilogSGPRSpills,
+                     [&Reg](const auto &Spill) { return Spill.first == Reg; });
+    if (I == PrologEpilogSGPRSpills.end())
+      return;
+
+    PrologEpilogSGPRSpills.erase(I);
+  }
+
   const PrologEpilogSGPRSaveRestoreInfo &
   getPrologEpilogSGPRSaveRestoreInfo(Register Reg) const {
     const auto *I = find_if(PrologEpilogSGPRSpills, [&Reg](const auto &Spill) {
@@ -830,8 +840,7 @@ class SIMachineFunctionInfo final : public AMDGPUMachineFunction,
 
   /// If \p ResetSGPRSpillStackIDs is true, reset the stack ID from sgpr-spill
   /// to the default stack.
-  bool removeDeadFrameIndices(MachineFrameInfo &MFI,
-                              bool ResetSGPRSpillStackIDs);
+  bool removeDeadFrameIndices(MachineFunction &MF, bool ResetSGPRSpillStackIDs);
 
   int getScavengeFI(MachineFrameInfo &MFI, const SIRegisterInfo &TRI);
   std::optional<int> getOptionalScavengeFI() const { return ScavengeFI; }
diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
index 77608a4cfc751..9677c6cd7806c 100644
--- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
@@ -35,6 +35,11 @@ static cl::opt<bool> EnableSpillSGPRToVGPR(
   cl::ReallyHidden,
   cl::init(true));
 
+static cl::opt<bool> EnableSpillCFISavedRegs(
+    "amdgpu-spill-cfi-saved-regs",
+    cl::desc("Enable spilling the registers required for CFI emission"),
+    cl::ReallyHidden, cl::init(false), cl::ZeroOrMore);
+
 std::array<std::vector<int16_t>, 32> SIRegisterInfo::RegSplitParts;
 std::array<std::array<uint16_t, 32>, 9> SIRegisterInfo::SubRegFromChannelTable;
 
@@ -559,6 +564,10 @@ unsigned SIRegisterInfo::getSubRegFromChannel(unsigned Channel,
   return SubRegFromChannelTable[NumRegIndex - 1][Channel];
 }
 
+bool SIRegisterInfo::isCFISavedRegsSpillEnabled() const {
+  return EnableSpillCFISavedRegs;
+}
+
 MCRegister
 SIRegisterInfo::getAlignedHighSGPRForRC(const MachineFunction &MF,
                                         const unsigned Align,
diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.h b/llvm/lib/Target/AMDGPU/SIRegisterInfo.h
index 2dae5f0eb1c69..749583722e3b0 100644
--- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.h
+++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.h
@@ -80,6 +80,8 @@ class SIRegisterInfo final : public AMDGPUGenRegisterInfo {
     return SpillSGPRToVGPR;
   }
 
+  bool isCFISavedRegsSpillEnabled() const;
+
   /// Return the largest available SGPR aligned to \p Align for the register
   /// class \p RC.
   MCRegister getAlignedHighSGPRForRC(const MachineFunction &MF,
diff --git a/llvm/test/CodeGen/AMDGPU/amdgpu-spill-cfi-saved-regs.ll b/llvm/test/CodeGen/AMDGPU/amdgpu-spill-cfi-saved-regs.ll
new file mode 100644
index 0000000000000..c804c75ae7d2c
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/amdgpu-spill-cfi-saved-regs.ll
@@ -0,0 +1,2556 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-spill-cfi-saved-regs -verify-machineinstrs -o - %s | FileCheck --check-prefixes=CHECK,WAVE64 %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -mattr=+wavefrontsize32,-wavefrontsize64 -amdgpu-spill-cfi-saved-regs -verify-machineinstrs -o - %s | FileCheck --check-prefixes=CHECK,WAVE32 %s
+
+define protected amdgpu_kernel void @kern() #0 {
+; CHECK-LABEL: kern:
+; CHECK:       .Lfunc_begin0:
+; CHECK-NEXT:    .cfi_sections .debug_frame
+; CHECK-NEXT:    .cfi_startproc
+; CHECK-NEXT:  ; %bb.0: ; %entry
+; CHECK-NEXT:    .cfi_escape 0x0f, 0x04, 0x30, 0x36, 0xe9, 0x02 ;
+; CHECK-NEXT:    .cfi_undefined 16
+; CHECK-NEXT:    s_endpgm
+entry:
+  ret void
+}
+
+define hidden void @func_saved_in_clobbered_vgpr() #0 {
+; WAVE64-LABEL: func_saved_in_clobbered_vgpr:
+; WAVE64:       .Lfunc_begin1:
+; WAVE64-NEXT:    .cfi_startproc
+; WAVE64-NEXT:  ; %bb.0: ; %entry
+; WAVE64-NEXT:    .cfi_llvm_def_aspace_cfa 64, 0, 6
+; WAVE64-NEXT:    .cfi_llvm_register_pair 16, 62, 32, 63, 32
+; WAVE64-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; WAVE64-NEXT:    s_xor_saveexec_b64 s[4:5], -1
+; WAVE64-NEXT:    buffer_store_dword v0, off, s[0:3], s32 ; 4-byte Folded Spill
+; WAVE64-NEXT:    .cfi_offset 2560, 0
+; WAVE64-NEXT:    s_mov_b64 exec, s[4:5]
+; WAVE64-NEXT:    v_writelane_b32 v0, exec_lo, 0
+; WAVE64-NEXT:    v_writelane_b32 v0, exec_hi, 1
+; WAVE64-NEXT:    .cfi_llvm_vector_registers 17, 2560, 0, 32, 2560, 1, 32
+; WAVE64-NEXT:    s_xor_saveexec_b64 s[4:5], -1
+; WAVE64-NEXT:    buffer_load_dword v0, off, s[0:3], s32 ; 4-byte Folded Reload
+; WAVE64-NEXT:    s_mov_b64 exec, s[4:5]
+; WAVE64-NEXT:    s_waitcnt vmcnt(0)
+; WAVE64-NEXT:    s_setpc_b64 s[30:31]
+;
+; WAVE32-LABEL: func_saved_in_clobbered_vgpr:
+; WAVE32:       .Lfunc_begin1:
+; WAVE32-NEXT:    .cfi_startproc
+; WAVE32-NEXT:  ; %bb.0: ; %entry
+; WAVE32-NEXT:    .cfi_llvm_def_aspace_cfa 64, 0, 6
+; WAVE32-NEXT:    .cfi_llvm_register_pair 16, 62, 32, 63, 32
+; WAVE32-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; WAVE32-NEXT:    s_xor_saveexec_b32 s4, -1
+; WAVE32-NEXT:    buffer_store_dword v0, off, s[0:3], s32 ; 4-byte Folded Spill
+; WAVE32-NEXT:    .cfi_offset 1536, 0
+; WAVE32-NEXT:    s_waitcnt_depctr 0xffe3
+; WAVE32-NEXT:    s_mov_b32 exec_lo, s4
+; WAVE32-NEXT:    v_writelane_b32 v0, exec_lo, 0
+; WAVE32-NEXT:    .cfi_llvm_vector_registers 1, 1536, 0, 32
+; WAVE32-NEXT:    s_xor_saveexec_b32 s4, -1
+; WAVE32-NEXT:    buffer_load_dword v0, off, s[0:3], s32 ; 4-byte Folded Reload
+; WAVE32-NEXT:    s_waitcnt_depctr 0xffe3
+; WAVE32-NEXT:    s_mov_b32 exec_lo, s4
+; WAVE32-NEXT:    s_waitcnt vmcnt(0)
+; WAVE32-NEXT:    s_setpc_b64 s[30:31]
+entry:
+  ret void
+}
+
+; Check that the option causes a CSR VGPR to spill when needed.
+define hidden void @func_saved_in_preserved_vgpr() #0 {
+; WAVE64-LABEL: func_saved_in_preserved_vgpr:
+; WAVE64:       .Lfunc_begin2:
+; WAVE64-NEXT:    .cfi_startproc
+; WAVE64-NEXT:  ; %bb.0: ; %entry
+; WAVE64-NEXT:    .cfi_llvm_def_aspace_cfa 64, 0, 6
+; WAVE64-NEXT:    .cfi_llvm_register_pair 16, 62, 32, 63, 32
+; WAVE64-NEXT:    .cfi_undefined 2560
+; WAVE64-NEXT:    .cfi_undefined 2561
+; WAVE64-NEXT:    .cfi_undefined 2562
+; WAVE64-NEXT:    .cfi_undefined 2563
+; WAVE64-NEXT:    .cfi_undefined 2564
+; WAVE64-NEXT:    .cfi_undefined 2565
+; WAVE64-NEXT:    .cfi_undefined 2566
+; WAVE64-NEXT:    .cfi_undefined 2567
+; WAVE64-NEXT:    .cfi_undefined 2568
+; WAVE64-NEXT:    .cfi_undefined 2569
+; WAVE64-NEXT:    .cfi_undefined 2570
+; WAVE64-NEXT:    .cfi_undefined 2571
+; WAVE64-NEXT:    .cfi_undefined 2572
+; WAVE64-NEXT:    .cfi_undefined 2573
+; WAVE64-NEXT:    .cfi_undefined 2574
+; WAVE64-NEXT:    .cfi_undefined 2575
+; WAVE64-NEXT:    .cfi_undefined 2576
+; WAVE64-NEXT:    .cfi_undefined 2577
+; WAVE64-NEXT:    .cfi_undefined 2578
+; WAVE64-NEXT:    .cfi_undefined 2579
+; WAVE64-NEXT:    .cfi_undefined 2580
+; WAVE64-NEXT:    .cfi_undefined 2581
+; WAVE64-NEXT:    .cfi_undefined 2582
+; WAVE64-NEXT:    .cfi_undefined 2583
+; WAVE64-NEXT:    .cfi_undefined 2584
+; WAVE64-NEXT:    .cfi_undefined 2585
+; WAVE64-NEXT:    .cfi_undefined 2586
+; WAVE64-NEXT:    .cfi_undefined 2587
+; WAVE64-NEXT:    .cfi_undefined 2588
+; WAVE64-NEXT:    .cfi_undefined 2589
+; WAVE64-NEXT:    .cfi_undefined 2590
+; WAVE64-NEXT:    .cfi_undefined 2591
+; WAVE64-NEXT:    .cfi_undefined 2592
+; WAVE64-NEXT:    .cfi_undefined 2593
+; WAVE64-NEXT:    .cfi_undefined 2594
+; WAVE64-NEXT:    .cfi_undefined 2595
+; WAVE64-NEXT:    .cfi_undefined 2596
+; WAVE64-NEXT:    .cfi_undefined 2597
+; WAVE64-NEXT:    .cfi_undefined 2598
+; WAVE64-NEXT:    .cfi_undefined 2599
+; WAVE64-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; WAVE64-NEXT:    s_or_saveexec_b64 s[4:5], -1
+; WAVE64-NEXT:    buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
+; WAVE64-NEXT:    .cfi_offset 2600, 0
+; WAVE64-NEXT:    s_mov_b64 exec, s[4:5]
+; WAVE64-NEXT:    v_writelane_b32 v40, exec_lo, 0
+; WAVE64-NEXT:    v_writelane_b32 v40, exec_hi, 1
+; WAVE64-NEXT:    .cfi_llvm_vector_registers 17, 2600, 0, 32, 2600, 1, 32
+; WAVE64-NEXT:    ;;#ASMSTART
+; WAVE64-NEXT:    ; clobber nonpreserved VGPRs
+; WAVE64-NEXT:    ;;#ASMEND
+; WAVE64-NEXT:    s_or_saveexec_b64 s[4:5], -1
+; WAVE64-NEXT:    buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
+; WAVE64-NEXT:    s_mov_b64 exec, s[4:5]
+; WAVE64-NEXT:    s_waitcnt vmcnt(0)
+; WAVE64-NEXT:    s_setpc_b64 s[30:31]
+;
+; WAVE32-LABEL: func_saved_in_preserved_vgpr:
+; WAVE32:       .Lfunc_begin2:
+; WAVE32-NEXT:    .cfi_startproc
+; WAVE32-NEXT:  ; %bb.0: ; %entry
+; WAVE32-NEXT:    .cfi_llvm_def_aspace_cfa 64, 0, 6
+; WAVE32-NEXT:    .cfi_llvm_register_pair 16, 62, 32, 63, 32
+; WAVE32-NEXT:    .cfi_undefined 1536
+; WAVE32-NEXT:    .cfi_undefined 1537
+; WAVE32-NEXT:    .cfi_undefined 1538
+; WAVE32-NEXT:    .cfi_undefined 1539
+; WAVE32-NEXT:    .cfi_undefined 1540
+; WAVE32-NEXT:    .cfi_undefined 1541
+; WAVE32-NEXT:    .cfi_undefined 1542
+; WAVE32-NEXT:    .cfi_undefined 1543
+; WAVE32-NEXT:    .cfi_undefined 1544
+; WAVE32-NEXT:    .cfi_undefined 1545
+; WAVE32-NEXT:    .cfi_undefined 1546
+; WAVE32-NEXT:    .cfi_undefined 1547
+; WAVE32-NEXT:    .cfi_undefined 1548
+; WAVE32-NEXT:    .cfi_undefined 1549
+; WAVE32-NEXT:    .cfi_undefined 1550
+; WAVE32-NEXT:    .cfi_undefined 1551
+; WAVE32-NEXT:    .cfi_undefined 1552
+; WAVE32-NEXT:    .cfi_undefined 1553
+; WAVE32-NEXT:    .cfi_undefined 1554
+; WAVE32-NEXT:    .cfi_undefined 1555
+; WAVE32-NEXT:    .cfi_undefined 1556
+; WAVE32-NEXT:    .cfi_undefined 1557
+; WAVE32-NEXT:    .cfi_undefined 1558
+; WAVE32-NEXT:    .cfi_undefined 1559
+; WAVE32-NEXT:    .cfi_undefined 1560
+; WAVE32-NEXT:    .cfi_undefined 1561
+; WAVE32-NEXT:    .cfi_undefined 1562
+; WAVE32-NEXT:    .cfi_undefined 1563
+; WAVE32-NEXT:    .cfi_undefined 1564
+; WAVE32-NEXT:    .cfi_undefined 1565
+; WAVE32-NEXT:    .cfi_undefined 1566
+; WAVE32-NEXT:    .cfi_undefined 1567
+; WAVE32-NEXT:    .cfi_undefined 1568
+; WAVE32-NEXT:    .cfi_undefined 1569
+; WAVE32-NEXT:    .cfi_undefined 1570
+; WAVE32-NEXT:    .cfi_undefined 1571
+; WAVE32-NEXT:    .cfi_undefined 1572
+; WAVE32-NEXT:    .cfi_undefined 1573
+; WAVE32-NEXT:    .cfi_undefined 1574
+; WAVE32-NEXT:    .cfi_undefined 1575
+; WAVE32-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; WAVE32-NEXT:    s_or_saveexec_b32 s4, -1
+; WAVE32-NEXT:    buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
+; WAVE32-NEXT:    .cfi_offset 1576, 0
+; WAVE32-NEXT:    s_waitcnt_depctr 0xffe3
+; WAVE32-NEXT:    s_mov_b32 exec_lo, s4
+; WAVE32-NEXT:    v_writelane_b32 v40, exec_lo, 0
+; WAVE32-NEXT:    .cfi_llvm_vector_registers 1, 1576, 0, 32
+; WAVE32-NEXT:    ;;#ASMSTART
+; WAVE32-NEXT:    ; clobber nonpreserved VGPRs
+; WAVE32-NEXT:    ;;#ASMEND
+; WAVE32-NEXT:    s_or_saveexec_b32 s4, -1
+; WAVE32-NEXT:    buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
+; WAVE32-NEXT:    s_waitcnt_depctr 0xffe3
+; WAVE32-NEXT:    s_mov_b32 exec_lo, s4
+; WAVE32-NEXT:    s_waitcnt vmcnt(0)
+; WAVE32-NEXT:    s_setpc_b64 s[30:31]
+entry:
+  call void asm sideeffect "; clobber nonpreserved VGPRs",
+    "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7},~{v8},~{v9}
+    ,~{v10},~{v11},~{v12},~{v13},~{v14},~{v15},~{v16},~{v17},~{v18},~{v19}
+    ,~{v20},~{v21},~{v22},~{v23},~{v24},~{v25},~{v26},~{v27},~{v28},~{v29}
+    ,~{v30},~{v31},~{v32},~{v33},~{v34},~{v35},~{v36},~{v37},~{v38},~{v39}"()
+  ret void
+}
+
+; There's no return here, so the return address live in was deleted.
+define void @empty_func() {
+; WAVE64-LABEL: empty_func:
+; WAVE64:       .Lfunc_begin3:
+; WAVE64-NEXT:    .cfi_startproc
+; WAVE64-NEXT:  ; %bb.0:
+; WAVE64-NEXT:    .cfi_llvm_def_aspace_cfa 64, 0, 6
+; WAVE64-NEXT:    .cfi_llvm_register_pair 16, 62, 32, 63, 32
+; WAVE64-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; WAVE64-NEXT:    s_xor_saveexec_b64 s[4:5], -1
+; WAVE64-NEXT:    buffer_store_dword v0, off, s[0:3], s32 ; 4-byte Folded Spill
+; WAVE64-NEXT:    .cfi_offset 2560, 0
+; WAVE64-NEXT:    s_mov_b64 exec, s[4:5]
+; WAVE64-NEXT:    v_writelane_b32 v0, exec_lo, 0
+; WAVE64-NEXT:    v_writelane_b32 v0, exec_hi, 1
+;
+; WAVE32-LABEL: empty_func:
+; WAVE32:       .Lfunc_begin3:
+; WAVE32-NEXT:    .cfi_startproc
+; WAVE32-NEXT:  ; %bb.0:
+; WAVE32-NEXT:    .cfi_llvm_def_aspace_cfa 64, 0, 6
+; WAVE32-NEXT:    .cfi_llvm_register_pair 16, 62, 32, 63, 32
+; WAVE32-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; WAVE32-NEXT:    s_xor_saveexec_b32 s4, -1
+; WAVE32-NEXT:    buffer_store_dword v0, off, s[0:3], s32 ; 4-byte Folded Spill
+; WAVE32-NEXT:  ...
[truncated]
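An aside on the DWARF register numbers in the CHECK lines above: the AMDGPU numbering visible in this test maps wave64 VGPRn to 2560 + n and wave32 VGPRn to 1536 + n, which is why the v40 spill is recorded as `.cfi_offset 2600` in the WAVE64 output and `.cfi_offset 1576` in the WAVE32 output. A minimal sketch of that mapping (the helper name is hypothetical, not an LLVM API):

```python
def vgpr_dwarf_num(n: int, wave32: bool = False) -> int:
    """DWARF register number for VGPRn, per the numbering seen in the
    CHECK lines above: wave64 VGPRs start at 2560, wave32 VGPRs at 1536."""
    base = 1536 if wave32 else 2560
    return base + n
```

For example, `vgpr_dwarf_num(40)` gives 2600, matching the `.cfi_offset 2600` emitted for the v40 spill in wave64, and the runs of `.cfi_undefined 2560`–`2599` (wave64) and `1536`–`1575` (wave32) cover the clobbered v0–v39.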

Comment on lines +2 to +3
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-spill-cfi-saved-regs -verify-machineinstrs -o - %s | FileCheck --check-prefixes=CHECK,WAVE64 %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -mattr=+wavefrontsize32,-wavefrontsize64 -amdgpu-spill-cfi-saved-regs -verify-machineinstrs -o - %s | FileCheck --check-prefixes=CHECK,WAVE32 %s
Contributor
Suggested change
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-spill-cfi-saved-regs -verify-machineinstrs -o - %s | FileCheck --check-prefixes=CHECK,WAVE64 %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -mattr=+wavefrontsize32,-wavefrontsize64 -amdgpu-spill-cfi-saved-regs -verify-machineinstrs -o - %s | FileCheck --check-prefixes=CHECK,WAVE32 %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-spill-cfi-saved-regs -o - %s | FileCheck --check-prefixes=CHECK,WAVE64 %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -amdgpu-spill-cfi-saved-regs -o - %s | FileCheck --check-prefixes=CHECK,WAVE32 %s


attributes #0 = { nounwind }
attributes #1 = { nounwind "amdgpu-waves-per-eu"="10,10" }
attributes #2 = { nounwind "frame-pointer"="all" "amdgpu-waves-per-eu"="12,12" }
Contributor
12? I thought the limit was 10, at least for gfx9

10 participants
10 participants