
Conversation

MacDue
Member

@MacDue MacDue commented Aug 6, 2025

This patch reworks how VG is handled around streaming mode changes.

Previously, for functions with streaming mode changes, we would:

  • Save the incoming VG in the prologue
  • Emit .cfi_offset vg, <offset> and .cfi_restore vg around streaming mode changes

Additionally, for locally streaming functions, we would:

  • Also save the streaming VG in the prologue
  • Emit .cfi_offset vg, <incoming VG offset> in the prologue
  • Emit .cfi_offset vg, <streaming VG offset> and .cfi_restore vg around streaming mode changes

In both cases, this ends up doing more than necessary and would be hard for an unwinder to parse, as using .cfi_offset in this way does not follow the semantics of the underlying DWARF CFI opcodes.

So the new scheme in this patch is to:

In functions with streaming mode changes (including locally-streaming functions):

  • Save the incoming VG in the prologue
  • Emit .cfi_offset vg, <offset> in the prologue (not at streaming mode changes)
  • Emit .cfi_restore vg after the saved VG has been deallocated
    • This will be in the function epilogue, where VG is always the same as the entry VG
  • Explicitly reference the incoming VG in the location expressions for SVE callee-saves in functions with streaming mode changes
  • Ensure the CFA is not described in terms of VG in functions with streaming mode changes

A more in-depth discussion of this scheme is available in: https://gist.github.com/MacDue/b7a5c45d131d2440858165bfc903e97b

But the TLDR is that, following this scheme, SME unwinding can be implemented with minimal changes to existing unwinders. All an unwinder needs to do is initialize VG to CNTD at the start of unwinding; everything else is handled by standard opcodes (which need no changes to handle VG).
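As a rough illustration of what this scheme buys an unwinder, here is a toy stack-machine evaluation of the kind of location expression emitted for SVE callee-saves (dup the CFA, load the saved incoming VG from a CFA-relative slot, scale it, add it back to the CFA). This is a hand-written sketch, not LLVM code; the offsets, scale, and memory contents are invented for the example.

```python
# Toy sketch (not LLVM code): how an unwinder evaluates a location
# expression of the shape emitted for SVE callee-saves under this scheme:
#   DW_OP_dup; DW_OP_consts <vg slot offset>; DW_OP_plus; DW_OP_deref;
#   DW_OP_consts <scale>; DW_OP_mul; DW_OP_plus
# The CFA is pushed before evaluation. All concrete numbers are invented.

def eval_expr(ops, cfa, memory):
    """Minimal DWARF expression stack machine."""
    stack = [cfa]  # CFA is on top of the stack at the start
    for op, *args in ops:
        if op == "const":
            stack.append(args[0])
        elif op == "dup":      # DW_OP_dup
            stack.append(stack[-1])
        elif op == "plus":     # DW_OP_plus
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":      # DW_OP_mul
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "deref":    # DW_OP_deref (loads a 64-bit value)
            stack.append(memory[stack.pop()])
        else:
            raise ValueError(f"unknown op {op}")
    return stack[-1]

VG_SLOT_OFFSET = -32  # invented: saved incoming VG lives at CFA - 32
SCALE = -8            # invented: "NumVGScaledBytes" for this callee-save

# dup the CFA, add the slot offset, load the saved incoming VG,
# scale it, then add the result to the CFA still on the stack.
expr = [("dup",), ("const", VG_SLOT_OFFSET), ("plus",), ("deref",),
        ("const", SCALE), ("mul",), ("plus",)]

cfa = 0x1000
memory = {cfa + VG_SLOT_OFFSET: 8}  # incoming VG = 8 (512-bit vectors)
addr = eval_expr(expr, cfa, memory)
print(hex(addr))  # 0x1000 + 8 * -8 = 0xfc0
```

The point is that nothing here is VG-specific from the unwinder's perspective: once VG is seeded (from CNTD, or read from its save slot as above), ordinary DW_OP evaluation does the rest.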

@MacDue
Member Author

MacDue commented Aug 6, 2025

See sme-streaming-mode-changes-unwindinfo.ll for a new test that shows the unwind information in a variety of cases.

@MacDue MacDue marked this pull request as ready for review August 6, 2025 09:51
@MacDue MacDue requested a review from sdesmalen-arm August 6, 2025 09:51
@llvmbot
Member

llvmbot commented Aug 6, 2025

@llvm/pr-subscribers-backend-aarch64

Author: Benjamin Maxwell (MacDue)

Changes


Patch is 382.23 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/152283.diff

28 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64FrameLowering.cpp (+55-117)
  • (modified) llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (-13)
  • (modified) llvm/lib/Target/AArch64/AArch64InstrInfo.cpp (+24-5)
  • (modified) llvm/lib/Target/AArch64/AArch64InstrInfo.h (+4-2)
  • (modified) llvm/lib/Target/AArch64/AArch64MachineFunctionInfo.h (-10)
  • (modified) llvm/lib/Target/AArch64/AArch64SMEInstrInfo.td (-16)
  • (modified) llvm/lib/Target/AArch64/SMEPeepholeOpt.cpp (+2-19)
  • (modified) llvm/test/CodeGen/AArch64/outlining-with-streaming-mode-changes.ll (+3-9)
  • (modified) llvm/test/CodeGen/AArch64/sme-agnostic-za.ll (+10-18)
  • (modified) llvm/test/CodeGen/AArch64/sme-call-streaming-compatible-to-normal-fn-wihout-sme-attr.ll (+8-14)
  • (modified) llvm/test/CodeGen/AArch64/sme-callee-save-restore-pairs.ll (+16-32)
  • (modified) llvm/test/CodeGen/AArch64/sme-darwin-sve-vg.ll (+18-20)
  • (modified) llvm/test/CodeGen/AArch64/sme-disable-gisel-fisel.ll (+14-37)
  • (modified) llvm/test/CodeGen/AArch64/sme-lazy-save-call.ll (+4-6)
  • (modified) llvm/test/CodeGen/AArch64/sme-must-save-lr-for-vg.ll (+5-6)
  • (modified) llvm/test/CodeGen/AArch64/sme-peephole-opts.ll (+126-59)
  • (modified) llvm/test/CodeGen/AArch64/sme-pstate-sm-changing-call-disable-coalescing.ll (+198-284)
  • (modified) llvm/test/CodeGen/AArch64/sme-streaming-body-streaming-compatible-interface.ll (+13-28)
  • (modified) llvm/test/CodeGen/AArch64/sme-streaming-body.ll (+46-84)
  • (modified) llvm/test/CodeGen/AArch64/sme-streaming-compatible-interface.ll (+40-60)
  • (modified) llvm/test/CodeGen/AArch64/sme-streaming-interface.ll (+17-30)
  • (added) llvm/test/CodeGen/AArch64/sme-streaming-mode-changes-unwindinfo.ll (+308)
  • (modified) llvm/test/CodeGen/AArch64/sme-streaming-mode-changing-call-disable-stackslot-scavenging.ll (+13-15)
  • (modified) llvm/test/CodeGen/AArch64/sme-vg-to-stack.ll (+162-169)
  • (modified) llvm/test/CodeGen/AArch64/ssve-stack-hazard-remarks.ll (+8-8)
  • (modified) llvm/test/CodeGen/AArch64/stack-hazard.ll (+349-344)
  • (modified) llvm/test/CodeGen/AArch64/streaming-compatible-memory-ops.ll (+25-45)
  • (modified) llvm/test/CodeGen/AArch64/sve-stack-frame-layout.ll (+60-57)
diff --git a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
index 885f2a94f85f5..de9d865465901 100644
--- a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
@@ -338,9 +338,11 @@ static bool requiresSaveVG(const MachineFunction &MF);
 // Conservatively, returns true if the function is likely to have an SVE vectors
 // on the stack. This function is safe to be called before callee-saves or
 // object offsets have been determined.
-static bool isLikelyToHaveSVEStack(MachineFunction &MF) {
+static bool isLikelyToHaveSVEStack(const MachineFunction &MF) {
   auto *AFI = MF.getInfo<AArch64FunctionInfo>();
-  if (AFI->isSVECC())
+  if (MF.getFunction().getCallingConv() ==
+          CallingConv::AArch64_SVE_VectorCall ||
+      AFI->isSVECC())
     return true;
 
   if (AFI->hasCalculatedStackSizeSVE())
@@ -532,6 +534,7 @@ bool AArch64FrameLowering::canUseRedZone(const MachineFunction &MF) const {
 bool AArch64FrameLowering::hasFPImpl(const MachineFunction &MF) const {
   const MachineFrameInfo &MFI = MF.getFrameInfo();
   const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();
+  const AArch64FunctionInfo &AFI = *MF.getInfo<AArch64FunctionInfo>();
 
   // Win64 EH requires a frame pointer if funclets are present, as the locals
   // are accessed off the frame pointer in both the parent function and the
@@ -545,6 +548,16 @@ bool AArch64FrameLowering::hasFPImpl(const MachineFunction &MF) const {
       MFI.hasStackMap() || MFI.hasPatchPoint() ||
       RegInfo->hasStackRealignment(MF))
     return true;
+  // If we have streaming mode changes and SVE registers on the stack we need a
+  // FP. This is as the stack size may depend on the VG at entry to the
+  // function, which is saved before the SVE area (so unrecoverable without a
+  // FP). Similar for locally streaming functions, but it is because we use
+  // ADDSVL to setup the SVE stack (which might not match VG, even without
+  // streaming-mode changes).
+  if (AFI.needsDwarfUnwindInfo(MF) &&
+      ((requiresSaveVG(MF) || AFI.getSMEFnAttrs().hasStreamingBody()) &&
+       (!AFI.hasCalculatedStackSizeSVE() || AFI.getStackSizeSVE() > 0)))
+    return true;
   // With large callframes around we may need to use FP to access the scavenging
   // emergency spillslot.
   //
@@ -663,10 +676,6 @@ void AArch64FrameLowering::emitCalleeSavedGPRLocations(
     MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI) const {
   MachineFunction &MF = *MBB.getParent();
   MachineFrameInfo &MFI = MF.getFrameInfo();
-  AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
-  SMEAttrs Attrs = AFI->getSMEFnAttrs();
-  bool LocallyStreaming =
-      Attrs.hasStreamingBody() && !Attrs.hasStreamingInterface();
 
   const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
   if (CSI.empty())
@@ -680,14 +689,6 @@ void AArch64FrameLowering::emitCalleeSavedGPRLocations(
 
     assert(!Info.isSpilledToReg() && "Spilling to registers not implemented");
     int64_t Offset = MFI.getObjectOffset(FrameIdx) - getOffsetOfLocalArea();
-
-    // The location of VG will be emitted before each streaming-mode change in
-    // the function. Only locally-streaming functions require emitting the
-    // non-streaming VG location here.
-    if ((LocallyStreaming && FrameIdx == AFI->getStreamingVGIdx()) ||
-        (!LocallyStreaming && Info.getReg() == AArch64::VG))
-      continue;
-
     CFIBuilder.buildOffset(Info.getReg(), Offset);
   }
 }
@@ -707,8 +708,16 @@ void AArch64FrameLowering::emitCalleeSavedSVELocations(
   AArch64FunctionInfo &AFI = *MF.getInfo<AArch64FunctionInfo>();
   CFIInstBuilder CFIBuilder(MBB, MBBI, MachineInstr::FrameSetup);
 
+  std::optional<int64_t> IncomingVGOffsetFromDefCFA;
+  if (requiresSaveVG(MF)) {
+    auto IncomingVG = *find_if(
+        reverse(CSI), [](auto &Info) { return Info.getReg() == AArch64::VG; });
+    IncomingVGOffsetFromDefCFA =
+        MFI.getObjectOffset(IncomingVG.getFrameIdx()) - getOffsetOfLocalArea();
+  }
+
   for (const auto &Info : CSI) {
-    if (!(MFI.getStackID(Info.getFrameIdx()) == TargetStackID::ScalableVector))
+    if (MFI.getStackID(Info.getFrameIdx()) != TargetStackID::ScalableVector)
       continue;
 
     // Not all unwinders may know about SVE registers, so assume the lowest
@@ -722,7 +731,8 @@ void AArch64FrameLowering::emitCalleeSavedSVELocations(
         StackOffset::getScalable(MFI.getObjectOffset(Info.getFrameIdx())) -
         StackOffset::getFixed(AFI.getCalleeSavedStackSize(MFI));
 
-    CFIBuilder.insertCFIInst(createCFAOffset(TRI, Reg, Offset));
+    CFIBuilder.insertCFIInst(
+        createCFAOffset(TRI, Reg, Offset, IncomingVGOffsetFromDefCFA));
   }
 }
 
@@ -1465,10 +1475,10 @@ bool requiresGetVGCall(MachineFunction &MF) {
 
 static bool requiresSaveVG(const MachineFunction &MF) {
   const AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
+  if (!AFI->needsDwarfUnwindInfo(MF) || !AFI->hasStreamingModeChanges())
+    return false;
   // For Darwin platforms we don't save VG for non-SVE functions, even if SME
   // is enabled with streaming mode changes.
-  if (!AFI->hasStreamingModeChanges())
-    return false;
   auto &ST = MF.getSubtarget<AArch64Subtarget>();
   if (ST.isTargetDarwin())
     return ST.hasSVE();
@@ -1477,8 +1487,7 @@ static bool requiresSaveVG(const MachineFunction &MF) {
 
 bool isVGInstruction(MachineBasicBlock::iterator MBBI) {
   unsigned Opc = MBBI->getOpcode();
-  if (Opc == AArch64::CNTD_XPiI || Opc == AArch64::RDSVLI_XI ||
-      Opc == AArch64::UBFMXri)
+  if (Opc == AArch64::CNTD_XPiI)
     return true;
 
   if (requiresGetVGCall(*MBBI->getMF())) {
@@ -1507,9 +1516,8 @@ static MachineBasicBlock::iterator convertCalleeSaveRestoreToSPPrePostIncDec(
   unsigned NewOpc;
 
   // If the function contains streaming mode changes, we expect instructions
-  // to calculate the value of VG before spilling. For locally-streaming
-  // functions, we need to do this for both the streaming and non-streaming
-  // vector length. Move past these instructions if necessary.
+  // to calculate the value of VG before spilling. Move past these instructions
+  // if necessary.
   MachineFunction &MF = *MBB.getParent();
   if (requiresSaveVG(MF))
     while (isVGInstruction(MBBI))
@@ -3469,7 +3477,6 @@ bool AArch64FrameLowering::spillCalleeSavedRegisters(
     ArrayRef<CalleeSavedInfo> CSI, const TargetRegisterInfo *TRI) const {
   MachineFunction &MF = *MBB.getParent();
   const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
-  AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
   bool NeedsWinCFI = needsWinCFI(MF);
   DebugLoc DL;
   SmallVector<RegPairInfo, 8> RegPairs;
@@ -3538,40 +3545,31 @@ bool AArch64FrameLowering::spillCalleeSavedRegisters(
     }
 
     unsigned X0Scratch = AArch64::NoRegister;
+    auto RestoreX0 = make_scope_exit([&] {
+      if (X0Scratch != AArch64::NoRegister)
+        BuildMI(MBB, MI, DL, TII.get(AArch64::ORRXrr), AArch64::X0)
+            .addReg(AArch64::XZR)
+            .addReg(X0Scratch, RegState::Undef)
+            .addReg(X0Scratch, RegState::Implicit)
+            .setMIFlag(MachineInstr::FrameSetup);
+    });
+
     if (Reg1 == AArch64::VG) {
       // Find an available register to store value of VG to.
       Reg1 = findScratchNonCalleeSaveRegister(&MBB, true);
       assert(Reg1 != AArch64::NoRegister);
-      SMEAttrs Attrs = AFI->getSMEFnAttrs();
-
-      if (Attrs.hasStreamingBody() && !Attrs.hasStreamingInterface() &&
-          AFI->getStreamingVGIdx() == std::numeric_limits<int>::max()) {
-        // For locally-streaming functions, we need to store both the streaming
-        // & non-streaming VG. Spill the streaming value first.
-        BuildMI(MBB, MI, DL, TII.get(AArch64::RDSVLI_XI), Reg1)
-            .addImm(1)
-            .setMIFlag(MachineInstr::FrameSetup);
-        BuildMI(MBB, MI, DL, TII.get(AArch64::UBFMXri), Reg1)
-            .addReg(Reg1)
-            .addImm(3)
-            .addImm(63)
-            .setMIFlag(MachineInstr::FrameSetup);
-
-        AFI->setStreamingVGIdx(RPI.FrameIdx);
-      } else if (MF.getSubtarget<AArch64Subtarget>().hasSVE()) {
+      if (MF.getSubtarget<AArch64Subtarget>().hasSVE()) {
         BuildMI(MBB, MI, DL, TII.get(AArch64::CNTD_XPiI), Reg1)
             .addImm(31)
             .addImm(1)
             .setMIFlag(MachineInstr::FrameSetup);
-        AFI->setVGIdx(RPI.FrameIdx);
       } else {
         const AArch64Subtarget &STI = MF.getSubtarget<AArch64Subtarget>();
-        if (llvm::any_of(
-                MBB.liveins(),
-                [&STI](const MachineBasicBlock::RegisterMaskPair &LiveIn) {
-                  return STI.getRegisterInfo()->isSuperOrSubRegisterEq(
-                      AArch64::X0, LiveIn.PhysReg);
-                }))
+        if (any_of(MBB.liveins(),
+                   [&STI](const MachineBasicBlock::RegisterMaskPair &LiveIn) {
+                     return STI.getRegisterInfo()->isSuperOrSubRegisterEq(
+                         AArch64::X0, LiveIn.PhysReg);
+                   }))
           X0Scratch = Reg1;
 
         if (X0Scratch != AArch64::NoRegister)
@@ -3590,7 +3588,6 @@ bool AArch64FrameLowering::spillCalleeSavedRegisters(
             .addReg(AArch64::X0, RegState::ImplicitDefine)
             .setMIFlag(MachineInstr::FrameSetup);
         Reg1 = AArch64::X0;
-        AFI->setVGIdx(RPI.FrameIdx);
       }
     }
 
@@ -3685,13 +3682,6 @@ bool AArch64FrameLowering::spillCalleeSavedRegisters(
       if (RPI.isPaired())
         MFI.setStackID(FrameIdxReg2, TargetStackID::ScalableVector);
     }
-
-    if (X0Scratch != AArch64::NoRegister)
-      BuildMI(MBB, MI, DL, TII.get(AArch64::ORRXrr), AArch64::X0)
-          .addReg(AArch64::XZR)
-          .addReg(X0Scratch, RegState::Undef)
-          .addReg(X0Scratch, RegState::Implicit)
-          .setMIFlag(MachineInstr::FrameSetup);
   }
   return true;
 }
@@ -4070,15 +4060,8 @@ void AArch64FrameLowering::determineCalleeSaves(MachineFunction &MF,
 
   // Increase the callee-saved stack size if the function has streaming mode
   // changes, as we will need to spill the value of the VG register.
-  // For locally streaming functions, we spill both the streaming and
-  // non-streaming VG value.
-  SMEAttrs Attrs = AFI->getSMEFnAttrs();
-  if (requiresSaveVG(MF)) {
-    if (Attrs.hasStreamingBody() && !Attrs.hasStreamingInterface())
-      CSStackSize += 16;
-    else
-      CSStackSize += 8;
-  }
+  if (requiresSaveVG(MF))
+    CSStackSize += 8;
 
   // Determine if a Hazard slot should be used, and increase the CSStackSize by
   // StackHazardSize if so.
@@ -4229,29 +4212,19 @@ bool AArch64FrameLowering::assignCalleeSavedSpillSlots(
 
   // Insert VG into the list of CSRs, immediately before LR if saved.
   if (requiresSaveVG(MF)) {
-    std::vector<CalleeSavedInfo> VGSaves;
-    SMEAttrs Attrs = AFI->getSMEFnAttrs();
-
-    auto VGInfo = CalleeSavedInfo(AArch64::VG);
+    CalleeSavedInfo VGInfo(AArch64::VG);
     VGInfo.setRestored(false);
-    VGSaves.push_back(VGInfo);
-
-    // Add VG again if the function is locally-streaming, as we will spill two
-    // values.
-    if (Attrs.hasStreamingBody() && !Attrs.hasStreamingInterface())
-      VGSaves.push_back(VGInfo);
-
-    bool InsertBeforeLR = false;
 
+    bool InsertedBeforeLR = false;
     for (unsigned I = 0; I < CSI.size(); I++)
       if (CSI[I].getReg() == AArch64::LR) {
-        InsertBeforeLR = true;
-        CSI.insert(CSI.begin() + I, VGSaves.begin(), VGSaves.end());
+        InsertedBeforeLR = true;
+        CSI.insert(CSI.begin() + I, VGInfo);
         break;
       }
 
-    if (!InsertBeforeLR)
-      llvm::append_range(CSI, VGSaves);
+    if (!InsertedBeforeLR)
+      CSI.push_back(VGInfo);
   }
 
   Register LastReg = 0;
@@ -5254,46 +5227,11 @@ MachineBasicBlock::iterator tryMergeAdjacentSTG(MachineBasicBlock::iterator II,
 }
 } // namespace
 
-static void emitVGSaveRestore(MachineBasicBlock::iterator II,
-                              const AArch64FrameLowering *TFI) {
-  MachineInstr &MI = *II;
-  MachineBasicBlock *MBB = MI.getParent();
-  MachineFunction *MF = MBB->getParent();
-
-  if (MI.getOpcode() != AArch64::VGSavePseudo &&
-      MI.getOpcode() != AArch64::VGRestorePseudo)
-    return;
-
-  auto *AFI = MF->getInfo<AArch64FunctionInfo>();
-  SMEAttrs FuncAttrs = AFI->getSMEFnAttrs();
-  bool LocallyStreaming =
-      FuncAttrs.hasStreamingBody() && !FuncAttrs.hasStreamingInterface();
-
-  int64_t VGFrameIdx =
-      LocallyStreaming ? AFI->getStreamingVGIdx() : AFI->getVGIdx();
-  assert(VGFrameIdx != std::numeric_limits<int>::max() &&
-         "Expected FrameIdx for VG");
-
-  CFIInstBuilder CFIBuilder(*MBB, II, MachineInstr::NoFlags);
-  if (MI.getOpcode() == AArch64::VGSavePseudo) {
-    const MachineFrameInfo &MFI = MF->getFrameInfo();
-    int64_t Offset =
-        MFI.getObjectOffset(VGFrameIdx) - TFI->getOffsetOfLocalArea();
-    CFIBuilder.buildOffset(AArch64::VG, Offset);
-  } else {
-    CFIBuilder.buildRestore(AArch64::VG);
-  }
-
-  MI.eraseFromParent();
-}
-
 void AArch64FrameLowering::processFunctionBeforeFrameIndicesReplaced(
     MachineFunction &MF, RegScavenger *RS = nullptr) const {
   for (auto &BB : MF)
     for (MachineBasicBlock::iterator II = BB.begin(); II != BB.end();) {
-      if (requiresSaveVG(MF))
-        emitVGSaveRestore(II++, this);
-      else if (StackTaggingMergeSetTag)
+      if (StackTaggingMergeSetTag)
         II = tryMergeAdjacentSTG(II, this, RS);
     }
 
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 018c16d61b12d..bf85e887df907 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -9441,12 +9441,6 @@ AArch64TargetLowering::LowerCall(CallLoweringInfo &CLI,
 
   SDValue InGlue;
   if (RequiresSMChange) {
-    if (!Subtarget->isTargetDarwin() || Subtarget->hasSVE()) {
-      Chain = DAG.getNode(AArch64ISD::VG_SAVE, DL,
-                          DAG.getVTList(MVT::Other, MVT::Glue), Chain);
-      InGlue = Chain.getValue(1);
-    }
-
     SDValue NewChain = changeStreamingMode(
         DAG, DL, CallAttrs.callee().hasStreamingInterface(), Chain, InGlue,
         getSMToggleCondition(CallAttrs), PStateSM);
@@ -9637,13 +9631,6 @@ AArch64TargetLowering::LowerCall(CallLoweringInfo &CLI,
     Result = changeStreamingMode(
         DAG, DL, !CallAttrs.callee().hasStreamingInterface(), Result, InGlue,
         getSMToggleCondition(CallAttrs), PStateSM);
-
-    if (!Subtarget->isTargetDarwin() || Subtarget->hasSVE()) {
-      InGlue = Result.getValue(1);
-      Result =
-          DAG.getNode(AArch64ISD::VG_RESTORE, DL,
-                      DAG.getVTList(MVT::Other, MVT::Glue), {Result, InGlue});
-    }
   }
 
   if (CallAttrs.requiresEnablingZAAfterCall())
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
index 98ebd512b0b75..d602ccf145b3b 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
@@ -5888,6 +5888,18 @@ static void appendReadRegExpr(SmallVectorImpl<char> &Expr, unsigned RegNum) {
   Expr.push_back(0);
 }
 
+// Convenience function to create a DWARF expression for loading a register from
+// a CFA offset.
+static void appendLoadRegExpr(SmallVectorImpl<char> &Expr,
+                              int64_t OffsetFromDefCFA) {
+  // This assumes the top of the DWARF stack contains the CFA.
+  Expr.push_back(dwarf::DW_OP_dup);
+  // Add the offset to the register.
+  appendConstantExpr(Expr, OffsetFromDefCFA, dwarf::DW_OP_plus);
+  // Dereference the address (loads a 64 bit value)..
+  Expr.push_back(dwarf::DW_OP_deref);
+}
+
 // Convenience function to create a comment for
 //  (+/-) NumBytes (* RegScale)?
 static void appendOffsetComment(int NumBytes, llvm::raw_string_ostream &Comment,
@@ -5956,9 +5968,10 @@ MCCFIInstruction llvm::createDefCFA(const TargetRegisterInfo &TRI,
   return MCCFIInstruction::cfiDefCfa(nullptr, DwarfReg, (int)Offset.getFixed());
 }
 
-MCCFIInstruction llvm::createCFAOffset(const TargetRegisterInfo &TRI,
-                                       unsigned Reg,
-                                       const StackOffset &OffsetFromDefCFA) {
+MCCFIInstruction
+llvm::createCFAOffset(const TargetRegisterInfo &TRI, unsigned Reg,
+                      const StackOffset &OffsetFromDefCFA,
+                      std::optional<int64_t> IncomingVGOffsetFromDefCFA) {
   int64_t NumBytes, NumVGScaledBytes;
   AArch64InstrInfo::decomposeStackOffsetForDwarfOffsets(
       OffsetFromDefCFA, NumBytes, NumVGScaledBytes);
@@ -5977,9 +5990,15 @@ MCCFIInstruction llvm::createCFAOffset(const TargetRegisterInfo &TRI,
   assert(NumVGScaledBytes && "Expected scalable offset");
   SmallString<64> OffsetExpr;
   // + VG * NumVGScaledBytes
-  appendOffsetComment(NumVGScaledBytes, Comment, "* VG");
-  appendReadRegExpr(OffsetExpr, TRI.getDwarfRegNum(AArch64::VG, true));
+  StringRef VGRegScale("* VG");
+  if (IncomingVGOffsetFromDefCFA) {
+    appendLoadRegExpr(OffsetExpr, *IncomingVGOffsetFromDefCFA);
+    VGRegScale = "* IncomingVG";
+  } else {
+    appendReadRegExpr(OffsetExpr, TRI.getDwarfRegNum(AArch64::VG, true));
+  }
   appendConstantExpr(OffsetExpr, NumVGScaledBytes, dwarf::DW_OP_mul);
+  appendOffsetComment(NumVGScaledBytes, Comment, VGRegScale);
   OffsetExpr.push_back(dwarf::DW_OP_plus);
   if (NumBytes) {
     // + NumBytes
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.h b/llvm/lib/Target/AArch64/AArch64InstrInfo.h
index 7c255da333e4b..6abd18fd2e52f 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.h
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.h
@@ -642,8 +642,10 @@ bool isNZCVTouchedInInstructionRange(const MachineInstr &DefMI,
 MCCFIInstruction createDefCFA(const TargetRegisterInfo &TRI, unsigned FrameReg,
                               unsigned Reg, const StackOffset &Offset,
                               bool LastAdjustmentWasScalable = true);
-MCCFIInstruction createCFAOffset(const TargetRegisterInfo &MRI, unsigned Reg,
-                                 const StackOffset &OffsetFromDefCFA);
+MCCFIInstruction
+createCFAOffset(const TargetRegisterInfo &MRI, unsigned Reg,
+                const StackOffset &OffsetFromDefCFA,
+                std::optional<int64_t> IncomingVGOffsetFromDefCFA);
 
 /// emitFrameOffset - Emit instructions as needed to set DestReg to SrcReg
 /// plus Offset.  This is intended to be used from within the prolog/epilog
diff --git a/llvm/lib/Target/AArch64/AArch64MachineFunctionInfo.h b/llvm/lib/Target/AArch64/AArch64MachineFunctionInfo.h
index 800787cc0b4f5..0f04b740dbe22 100644
--- a/llvm/lib/Target/AArch64/AArch64MachineFunctionInfo.h
+++ b/llvm/lib/Target/AArch64/AArch64MachineFunctionInfo.h
@@ -243,10 +243,6 @@ class AArch64FunctionInfo final : public MachineFunctionInfo {
   // The PTRUE is used for the LD/ST of ZReg pairs in save and restore.
   unsigned PredicateRegForFillSpill = 0;
 
-  // The stack slots where VG values are stored to.
-  int64_t VGIdx = std::numeric_limits<int>::max();
-  int64_t StreamingVGIdx = std::numeric_limits<int>::max();
-
   // Holds the SME function attributes (streaming mode, ZA/ZT0 state).
   SMEAttrs SMEFnAttrs;
 
@@ -274,12 +270,6 @@ class AArch64FunctionInfo final : public MachineFunctionInfo {
   Register getPStateSMReg() const { return PStateSMReg; };
   void setPStateSMReg(Register Reg) { PStateSMReg = Reg; };
 
-  int64_t getVGIdx() const { return VGIdx; };
-  void setVGIdx(unsigned Idx) { VGIdx = Idx; };
-
-  int64_t getStreamingVGIdx() const { return StreamingVGIdx; };
-  void setStreamingVGIdx(unsigned FrameIdx) { StreamingVGIdx = FrameIdx; };
-
   bool isSVECC() const { return IsSVECC; };
   void setIsSVECC(bool s) { IsSVECC = s; };
 
diff --git a/llvm/lib/Target/AArch64/AArch64SMEInstrInfo.td b/llvm/lib/Target/AArch64/AArch64SMEInstrInfo.td
index db27ca978980f..86bdc8f6e2966 100644
--- a/llvm/lib/Target/AArch64/AArch64SMEInstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64SMEInstrInfo.td
@@ -39,12 +39,6 @@ def AArch64_save_zt : SDNode<"AArch64ISD::SAVE_ZT", SDTypeProfile<0, 2,
 def AArch64CoalescerBarrier
     : SDNode<"AArch64ISD::COALESCER_BARRIER", SDTypeProfile<1, 1, []>, [SDNPOptInGlue, SDNPOutGlue]>;
 
-def AArch64VGSave : SDNode<"AArch64ISD::VG_SAVE", SDTypeProfi...
[truncated]

@MacDue MacDue requested a review from gbossu August 6, 2025 09:52
MacDue added a commit to MacDue/llvm-project that referenced this pull request Aug 6, 2025
It is possible for the SMEPeepholeOpt pass to optimize out the
streaming-mode changes, which for a locally_streaming function
essentially removes the streaming body. This means the ADDSVL is
unnecessary (and can cause complications for unwindinfo, see
llvm#152283 (comment))
@efriedma-quic
Collaborator

Never emit .cfi_restore vg (this is not meaningful for unwinding)

Generally, you want to cfi_restore all registers before you free the stack, so a debugger doesn't reference a clobbered stack slot. I guess cfi_restore isn't really important for exception unwinding.

@MacDue
Member Author

MacDue commented Aug 7, 2025

Never emit .cfi_restore vg (this is not meaningful for unwinding)

Generally, you want to cfi_restore all registers before you free the stack, so a debugger doesn't reference a clobbered stack slot. I guess cfi_restore isn't really important for exception unwinding.

I guess emitting a cfi_restore in the epilogue (alongside other GPRs) would be fine, as no matter the function type VG should match the entry value at that point.
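One way to picture the epilogue cfi_restore being discussed is to track the unwind rule for vg across the function. The following is a toy model with invented directives and offsets, not real CFI machinery:

```python
# Toy model of the vg unwind rule across a function under this scheme.
# Directives and offsets are invented for illustration.

def apply_cfi(directives):
    """Track register rules row by row; 'same_value' means the entry value."""
    rules = {"vg": "same_value"}
    timeline = []
    for label, d in directives:
        op = d[0]
        if op == "offset":      # .cfi_offset reg, off
            rules[d[1]] = ("at_cfa", d[2])
        elif op == "restore":   # .cfi_restore reg
            rules[d[1]] = "same_value"
        timeline.append((label, rules["vg"]))
    return timeline

timeline = apply_cfi([
    ("prologue", ("offset", "vg", -16)),  # saved once, in the prologue
    ("smstop",   ("nop",)),               # no VG CFI at mode changes
    ("smstart",  ("nop",)),
    ("epilogue", ("restore", "vg")),      # slot deallocated; back to entry VG
])
for label, rule in timeline:
    print(label, rule)
```

Between the prologue and the epilogue the rule never changes, which is what keeps the scheme easy on unwinders: mode changes in the body emit no VG CFI at all.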

@MacDue MacDue force-pushed the vg_rework_cfi branch 2 times, most recently from e704fa0 to 3d8bfd9 Compare August 12, 2025 09:12

github-actions bot commented Aug 12, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Collaborator

@sdesmalen-arm sdesmalen-arm left a comment


just left a few nits, but overall it looks good to me.

MacDue added a commit to MacDue/llvm-project that referenced this pull request Aug 14, 2025
The vector granule (AArch64 DWARF register 46) is a pseudo-register that
contains the available size in bits of SVE vector registers in the
current call frame, divided by 64. The vector granule can be used in
DWARF expressions to describe SVE/SME stack frame layouts (e.g., the
location of SVE callee-saves).

The first time VG is evaluated (if not already set), it is initialized
to the result of evaluating a "CNTD" instruction (this assumes SVE is
available).

To support SME, the value of VG can change per call frame; this is
currently handled like any other callee-save and is intended to support
the unwind information implemented in llvm#152283. This limits how VG is
used in the CFI information of functions with "streaming-mode changes"
(mode changes that change the SVE vector length), to make the
unwinder's job easier.
MacDue added 4 commits August 20, 2025 10:46
At this point, for all function types, VG should match the entry VG, and
the saved VG's stack slot has been deallocated, so it may no longer
contain a valid value.
@MacDue MacDue enabled auto-merge (squash) August 20, 2025 10:56
@MacDue MacDue disabled auto-merge August 20, 2025 13:05
@MacDue MacDue merged commit 478b4b0 into llvm:main Aug 20, 2025
5 checks passed
@MacDue MacDue deleted the vg_rework_cfi branch August 20, 2025 13:06
MacDue added a commit that referenced this pull request Aug 21, 2025
#153565)
