[AArch64][SME] Rework VG CFI information for streaming-mode changes #152283
Conversation
See sme-streaming-mode-changes-unwindinfo.ll for a new test that shows the unwind information in a variety of cases.
@llvm/pr-subscribers-backend-aarch64

Author: Benjamin Maxwell (MacDue)

Changes

This patch reworks how VG is handled around streaming mode changes. Previously, for functions with streaming mode changes, we would:

- Save the incoming VG in the prologue
- Emit `.cfi_offset vg, <offset>` and `.cfi_restore vg` around streaming mode changes

Additionally, for locally streaming functions, we would:

- Also save the streaming VG in the prologue
- Emit `.cfi_offset vg, <incoming VG offset>` in the prologue
- Emit `.cfi_offset vg, <streaming VG offset>` and `.cfi_restore vg` around streaming mode changes

In both cases, this ends up doing more than necessary and would be hard for an unwinder to parse, as using `.cfi_offset` in this way does not follow the semantics of the underlying DWARF CFI opcodes.

So the new scheme in this patch is, in functions with streaming mode changes (including locally streaming functions), to:

- Save the incoming VG in the prologue
- Emit `.cfi_offset vg, <offset>` in the prologue (not at streaming mode changes)
- Emit `.cfi_restore vg` only after the saved VG has been deallocated
- Explicitly reference the incoming VG in the location expressions for SVE callee-saves
- Ensure the CFA is not described in terms of VG

A more in-depth discussion of this scheme is available in: https://gist.github.com/MacDue/b7a5c45d131d2440858165bfc903e97b

But the TLDR is that, following this scheme, SME unwinding can be implemented with minimal changes to existing unwinders. All unwinders need to do is initialize VG to `CNTD` at the start of unwinding; everything else is handled by standard opcodes (which don't need changes to handle VG).

Patch is 382.23 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/152283.diff

28 Files Affected:
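As a rough illustration of how an "incoming VG" expression evaluates (a sketch, not the exact bytes LLVM emits — the frame offsets, memory contents, and the `evaluate` helper are made up for this example), the location of an SVE callee-save is computed by first loading the saved VG from its fixed-offset slot relative to the CFA (`DW_OP_dup; DW_OP_consts; DW_OP_plus; DW_OP_deref`), then scaling and adding:

```python
# Toy DWARF expression evaluator. Opcode values are from the DWARF 5 spec;
# the ('const', n) tuples stand in for DW_OP_consts with an SLEB128 operand.
DW_OP_dup, DW_OP_deref, DW_OP_plus, DW_OP_mul = 0x12, 0x06, 0x22, 0x1E

def evaluate(ops, cfa, memory):
    """Run a stack-machine program with the CFA pre-pushed (as DWARF does
    for register rules) and return the value left on top of the stack."""
    stack = [cfa]
    for op in ops:
        if isinstance(op, tuple):              # ('const', n) ~ DW_OP_consts n
            stack.append(op[1])
        elif op == DW_OP_dup:                  # duplicate top of stack
            stack.append(stack[-1])
        elif op == DW_OP_deref:                # load from the popped address
            stack.append(memory[stack.pop()])
        elif op == DW_OP_plus:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == DW_OP_mul:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
    return stack[-1]

# Hypothetical frame: CFA at 0x1000, incoming VG (= 2, i.e. 128-bit vectors)
# saved at CFA - 16.
cfa = 0x1000
memory = {cfa - 16: 2}

# Location of an SVE callee-save at CFA + IncomingVG * -32:
#   DW_OP_dup; DW_OP_consts -16; DW_OP_plus; DW_OP_deref   (load incoming VG)
#   DW_OP_consts -32; DW_OP_mul; DW_OP_plus                (scale and add)
expr = [DW_OP_dup, ('const', -16), DW_OP_plus, DW_OP_deref,
        ('const', -32), DW_OP_mul, DW_OP_plus]
addr = evaluate(expr, cfa, memory)
print(hex(addr))  # 0x1000 + 2 * -32 = 0xfc0
```

Because the saved VG is loaded from its memory slot rather than read from the live VG register, the expression stays correct even while the function is temporarily in a different streaming mode.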
diff --git a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
index 885f2a94f85f5..de9d865465901 100644
--- a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
@@ -338,9 +338,11 @@ static bool requiresSaveVG(const MachineFunction &MF);
// Conservatively, returns true if the function is likely to have an SVE vectors
// on the stack. This function is safe to be called before callee-saves or
// object offsets have been determined.
-static bool isLikelyToHaveSVEStack(MachineFunction &MF) {
+static bool isLikelyToHaveSVEStack(const MachineFunction &MF) {
auto *AFI = MF.getInfo<AArch64FunctionInfo>();
- if (AFI->isSVECC())
+ if (MF.getFunction().getCallingConv() ==
+ CallingConv::AArch64_SVE_VectorCall ||
+ AFI->isSVECC())
return true;
if (AFI->hasCalculatedStackSizeSVE())
@@ -532,6 +534,7 @@ bool AArch64FrameLowering::canUseRedZone(const MachineFunction &MF) const {
bool AArch64FrameLowering::hasFPImpl(const MachineFunction &MF) const {
const MachineFrameInfo &MFI = MF.getFrameInfo();
const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();
+ const AArch64FunctionInfo &AFI = *MF.getInfo<AArch64FunctionInfo>();
// Win64 EH requires a frame pointer if funclets are present, as the locals
// are accessed off the frame pointer in both the parent function and the
@@ -545,6 +548,16 @@ bool AArch64FrameLowering::hasFPImpl(const MachineFunction &MF) const {
MFI.hasStackMap() || MFI.hasPatchPoint() ||
RegInfo->hasStackRealignment(MF))
return true;
+ // If we have streaming mode changes and SVE registers on the stack we need a
+ // FP. This is as the stack size may depend on the VG at entry to the
+ // function, which is saved before the SVE area (so unrecoverable without a
+ // FP). Similar for locally streaming functions, but it is because we use
+ // ADDSVL to setup the SVE stack (which might not match VG, even without
+ // streaming-mode changes).
+ if (AFI.needsDwarfUnwindInfo(MF) &&
+ ((requiresSaveVG(MF) || AFI.getSMEFnAttrs().hasStreamingBody()) &&
+ (!AFI.hasCalculatedStackSizeSVE() || AFI.getStackSizeSVE() > 0)))
+ return true;
// With large callframes around we may need to use FP to access the scavenging
// emergency spillslot.
//
@@ -663,10 +676,6 @@ void AArch64FrameLowering::emitCalleeSavedGPRLocations(
MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI) const {
MachineFunction &MF = *MBB.getParent();
MachineFrameInfo &MFI = MF.getFrameInfo();
- AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
- SMEAttrs Attrs = AFI->getSMEFnAttrs();
- bool LocallyStreaming =
- Attrs.hasStreamingBody() && !Attrs.hasStreamingInterface();
const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
if (CSI.empty())
@@ -680,14 +689,6 @@ void AArch64FrameLowering::emitCalleeSavedGPRLocations(
assert(!Info.isSpilledToReg() && "Spilling to registers not implemented");
int64_t Offset = MFI.getObjectOffset(FrameIdx) - getOffsetOfLocalArea();
-
- // The location of VG will be emitted before each streaming-mode change in
- // the function. Only locally-streaming functions require emitting the
- // non-streaming VG location here.
- if ((LocallyStreaming && FrameIdx == AFI->getStreamingVGIdx()) ||
- (!LocallyStreaming && Info.getReg() == AArch64::VG))
- continue;
-
CFIBuilder.buildOffset(Info.getReg(), Offset);
}
}
@@ -707,8 +708,16 @@ void AArch64FrameLowering::emitCalleeSavedSVELocations(
AArch64FunctionInfo &AFI = *MF.getInfo<AArch64FunctionInfo>();
CFIInstBuilder CFIBuilder(MBB, MBBI, MachineInstr::FrameSetup);
+ std::optional<int64_t> IncomingVGOffsetFromDefCFA;
+ if (requiresSaveVG(MF)) {
+ auto IncomingVG = *find_if(
+ reverse(CSI), [](auto &Info) { return Info.getReg() == AArch64::VG; });
+ IncomingVGOffsetFromDefCFA =
+ MFI.getObjectOffset(IncomingVG.getFrameIdx()) - getOffsetOfLocalArea();
+ }
+
for (const auto &Info : CSI) {
- if (!(MFI.getStackID(Info.getFrameIdx()) == TargetStackID::ScalableVector))
+ if (MFI.getStackID(Info.getFrameIdx()) != TargetStackID::ScalableVector)
continue;
// Not all unwinders may know about SVE registers, so assume the lowest
@@ -722,7 +731,8 @@ void AArch64FrameLowering::emitCalleeSavedSVELocations(
StackOffset::getScalable(MFI.getObjectOffset(Info.getFrameIdx())) -
StackOffset::getFixed(AFI.getCalleeSavedStackSize(MFI));
- CFIBuilder.insertCFIInst(createCFAOffset(TRI, Reg, Offset));
+ CFIBuilder.insertCFIInst(
+ createCFAOffset(TRI, Reg, Offset, IncomingVGOffsetFromDefCFA));
}
}
@@ -1465,10 +1475,10 @@ bool requiresGetVGCall(MachineFunction &MF) {
static bool requiresSaveVG(const MachineFunction &MF) {
const AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
+ if (!AFI->needsDwarfUnwindInfo(MF) || !AFI->hasStreamingModeChanges())
+ return false;
// For Darwin platforms we don't save VG for non-SVE functions, even if SME
// is enabled with streaming mode changes.
- if (!AFI->hasStreamingModeChanges())
- return false;
auto &ST = MF.getSubtarget<AArch64Subtarget>();
if (ST.isTargetDarwin())
return ST.hasSVE();
@@ -1477,8 +1487,7 @@ static bool requiresSaveVG(const MachineFunction &MF) {
bool isVGInstruction(MachineBasicBlock::iterator MBBI) {
unsigned Opc = MBBI->getOpcode();
- if (Opc == AArch64::CNTD_XPiI || Opc == AArch64::RDSVLI_XI ||
- Opc == AArch64::UBFMXri)
+ if (Opc == AArch64::CNTD_XPiI)
return true;
if (requiresGetVGCall(*MBBI->getMF())) {
@@ -1507,9 +1516,8 @@ static MachineBasicBlock::iterator convertCalleeSaveRestoreToSPPrePostIncDec(
unsigned NewOpc;
// If the function contains streaming mode changes, we expect instructions
- // to calculate the value of VG before spilling. For locally-streaming
- // functions, we need to do this for both the streaming and non-streaming
- // vector length. Move past these instructions if necessary.
+ // to calculate the value of VG before spilling. Move past these instructions
+ // if necessary.
MachineFunction &MF = *MBB.getParent();
if (requiresSaveVG(MF))
while (isVGInstruction(MBBI))
@@ -3469,7 +3477,6 @@ bool AArch64FrameLowering::spillCalleeSavedRegisters(
ArrayRef<CalleeSavedInfo> CSI, const TargetRegisterInfo *TRI) const {
MachineFunction &MF = *MBB.getParent();
const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
- AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
bool NeedsWinCFI = needsWinCFI(MF);
DebugLoc DL;
SmallVector<RegPairInfo, 8> RegPairs;
@@ -3538,40 +3545,31 @@ bool AArch64FrameLowering::spillCalleeSavedRegisters(
}
unsigned X0Scratch = AArch64::NoRegister;
+ auto RestoreX0 = make_scope_exit([&] {
+ if (X0Scratch != AArch64::NoRegister)
+ BuildMI(MBB, MI, DL, TII.get(AArch64::ORRXrr), AArch64::X0)
+ .addReg(AArch64::XZR)
+ .addReg(X0Scratch, RegState::Undef)
+ .addReg(X0Scratch, RegState::Implicit)
+ .setMIFlag(MachineInstr::FrameSetup);
+ });
+
if (Reg1 == AArch64::VG) {
// Find an available register to store value of VG to.
Reg1 = findScratchNonCalleeSaveRegister(&MBB, true);
assert(Reg1 != AArch64::NoRegister);
- SMEAttrs Attrs = AFI->getSMEFnAttrs();
-
- if (Attrs.hasStreamingBody() && !Attrs.hasStreamingInterface() &&
- AFI->getStreamingVGIdx() == std::numeric_limits<int>::max()) {
- // For locally-streaming functions, we need to store both the streaming
- // & non-streaming VG. Spill the streaming value first.
- BuildMI(MBB, MI, DL, TII.get(AArch64::RDSVLI_XI), Reg1)
- .addImm(1)
- .setMIFlag(MachineInstr::FrameSetup);
- BuildMI(MBB, MI, DL, TII.get(AArch64::UBFMXri), Reg1)
- .addReg(Reg1)
- .addImm(3)
- .addImm(63)
- .setMIFlag(MachineInstr::FrameSetup);
-
- AFI->setStreamingVGIdx(RPI.FrameIdx);
- } else if (MF.getSubtarget<AArch64Subtarget>().hasSVE()) {
+ if (MF.getSubtarget<AArch64Subtarget>().hasSVE()) {
BuildMI(MBB, MI, DL, TII.get(AArch64::CNTD_XPiI), Reg1)
.addImm(31)
.addImm(1)
.setMIFlag(MachineInstr::FrameSetup);
- AFI->setVGIdx(RPI.FrameIdx);
} else {
const AArch64Subtarget &STI = MF.getSubtarget<AArch64Subtarget>();
- if (llvm::any_of(
- MBB.liveins(),
- [&STI](const MachineBasicBlock::RegisterMaskPair &LiveIn) {
- return STI.getRegisterInfo()->isSuperOrSubRegisterEq(
- AArch64::X0, LiveIn.PhysReg);
- }))
+ if (any_of(MBB.liveins(),
+ [&STI](const MachineBasicBlock::RegisterMaskPair &LiveIn) {
+ return STI.getRegisterInfo()->isSuperOrSubRegisterEq(
+ AArch64::X0, LiveIn.PhysReg);
+ }))
X0Scratch = Reg1;
if (X0Scratch != AArch64::NoRegister)
@@ -3590,7 +3588,6 @@ bool AArch64FrameLowering::spillCalleeSavedRegisters(
.addReg(AArch64::X0, RegState::ImplicitDefine)
.setMIFlag(MachineInstr::FrameSetup);
Reg1 = AArch64::X0;
- AFI->setVGIdx(RPI.FrameIdx);
}
}
@@ -3685,13 +3682,6 @@ bool AArch64FrameLowering::spillCalleeSavedRegisters(
if (RPI.isPaired())
MFI.setStackID(FrameIdxReg2, TargetStackID::ScalableVector);
}
-
- if (X0Scratch != AArch64::NoRegister)
- BuildMI(MBB, MI, DL, TII.get(AArch64::ORRXrr), AArch64::X0)
- .addReg(AArch64::XZR)
- .addReg(X0Scratch, RegState::Undef)
- .addReg(X0Scratch, RegState::Implicit)
- .setMIFlag(MachineInstr::FrameSetup);
}
return true;
}
@@ -4070,15 +4060,8 @@ void AArch64FrameLowering::determineCalleeSaves(MachineFunction &MF,
// Increase the callee-saved stack size if the function has streaming mode
// changes, as we will need to spill the value of the VG register.
- // For locally streaming functions, we spill both the streaming and
- // non-streaming VG value.
- SMEAttrs Attrs = AFI->getSMEFnAttrs();
- if (requiresSaveVG(MF)) {
- if (Attrs.hasStreamingBody() && !Attrs.hasStreamingInterface())
- CSStackSize += 16;
- else
- CSStackSize += 8;
- }
+ if (requiresSaveVG(MF))
+ CSStackSize += 8;
// Determine if a Hazard slot should be used, and increase the CSStackSize by
// StackHazardSize if so.
@@ -4229,29 +4212,19 @@ bool AArch64FrameLowering::assignCalleeSavedSpillSlots(
// Insert VG into the list of CSRs, immediately before LR if saved.
if (requiresSaveVG(MF)) {
- std::vector<CalleeSavedInfo> VGSaves;
- SMEAttrs Attrs = AFI->getSMEFnAttrs();
-
- auto VGInfo = CalleeSavedInfo(AArch64::VG);
+ CalleeSavedInfo VGInfo(AArch64::VG);
VGInfo.setRestored(false);
- VGSaves.push_back(VGInfo);
-
- // Add VG again if the function is locally-streaming, as we will spill two
- // values.
- if (Attrs.hasStreamingBody() && !Attrs.hasStreamingInterface())
- VGSaves.push_back(VGInfo);
-
- bool InsertBeforeLR = false;
+ bool InsertedBeforeLR = false;
for (unsigned I = 0; I < CSI.size(); I++)
if (CSI[I].getReg() == AArch64::LR) {
- InsertBeforeLR = true;
- CSI.insert(CSI.begin() + I, VGSaves.begin(), VGSaves.end());
+ InsertedBeforeLR = true;
+ CSI.insert(CSI.begin() + I, VGInfo);
break;
}
- if (!InsertBeforeLR)
- llvm::append_range(CSI, VGSaves);
+ if (!InsertedBeforeLR)
+ CSI.push_back(VGInfo);
}
Register LastReg = 0;
@@ -5254,46 +5227,11 @@ MachineBasicBlock::iterator tryMergeAdjacentSTG(MachineBasicBlock::iterator II,
}
} // namespace
-static void emitVGSaveRestore(MachineBasicBlock::iterator II,
- const AArch64FrameLowering *TFI) {
- MachineInstr &MI = *II;
- MachineBasicBlock *MBB = MI.getParent();
- MachineFunction *MF = MBB->getParent();
-
- if (MI.getOpcode() != AArch64::VGSavePseudo &&
- MI.getOpcode() != AArch64::VGRestorePseudo)
- return;
-
- auto *AFI = MF->getInfo<AArch64FunctionInfo>();
- SMEAttrs FuncAttrs = AFI->getSMEFnAttrs();
- bool LocallyStreaming =
- FuncAttrs.hasStreamingBody() && !FuncAttrs.hasStreamingInterface();
-
- int64_t VGFrameIdx =
- LocallyStreaming ? AFI->getStreamingVGIdx() : AFI->getVGIdx();
- assert(VGFrameIdx != std::numeric_limits<int>::max() &&
- "Expected FrameIdx for VG");
-
- CFIInstBuilder CFIBuilder(*MBB, II, MachineInstr::NoFlags);
- if (MI.getOpcode() == AArch64::VGSavePseudo) {
- const MachineFrameInfo &MFI = MF->getFrameInfo();
- int64_t Offset =
- MFI.getObjectOffset(VGFrameIdx) - TFI->getOffsetOfLocalArea();
- CFIBuilder.buildOffset(AArch64::VG, Offset);
- } else {
- CFIBuilder.buildRestore(AArch64::VG);
- }
-
- MI.eraseFromParent();
-}
-
void AArch64FrameLowering::processFunctionBeforeFrameIndicesReplaced(
MachineFunction &MF, RegScavenger *RS = nullptr) const {
for (auto &BB : MF)
for (MachineBasicBlock::iterator II = BB.begin(); II != BB.end();) {
- if (requiresSaveVG(MF))
- emitVGSaveRestore(II++, this);
- else if (StackTaggingMergeSetTag)
+ if (StackTaggingMergeSetTag)
II = tryMergeAdjacentSTG(II, this, RS);
}
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 018c16d61b12d..bf85e887df907 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -9441,12 +9441,6 @@ AArch64TargetLowering::LowerCall(CallLoweringInfo &CLI,
SDValue InGlue;
if (RequiresSMChange) {
- if (!Subtarget->isTargetDarwin() || Subtarget->hasSVE()) {
- Chain = DAG.getNode(AArch64ISD::VG_SAVE, DL,
- DAG.getVTList(MVT::Other, MVT::Glue), Chain);
- InGlue = Chain.getValue(1);
- }
-
SDValue NewChain = changeStreamingMode(
DAG, DL, CallAttrs.callee().hasStreamingInterface(), Chain, InGlue,
getSMToggleCondition(CallAttrs), PStateSM);
@@ -9637,13 +9631,6 @@ AArch64TargetLowering::LowerCall(CallLoweringInfo &CLI,
Result = changeStreamingMode(
DAG, DL, !CallAttrs.callee().hasStreamingInterface(), Result, InGlue,
getSMToggleCondition(CallAttrs), PStateSM);
-
- if (!Subtarget->isTargetDarwin() || Subtarget->hasSVE()) {
- InGlue = Result.getValue(1);
- Result =
- DAG.getNode(AArch64ISD::VG_RESTORE, DL,
- DAG.getVTList(MVT::Other, MVT::Glue), {Result, InGlue});
- }
}
if (CallAttrs.requiresEnablingZAAfterCall())
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
index 98ebd512b0b75..d602ccf145b3b 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
@@ -5888,6 +5888,18 @@ static void appendReadRegExpr(SmallVectorImpl<char> &Expr, unsigned RegNum) {
Expr.push_back(0);
}
+// Convenience function to create a DWARF expression for loading a register from
+// a CFA offset.
+static void appendLoadRegExpr(SmallVectorImpl<char> &Expr,
+ int64_t OffsetFromDefCFA) {
+ // This assumes the top of the DWARF stack contains the CFA.
+ Expr.push_back(dwarf::DW_OP_dup);
+ // Add the offset to the register.
+ appendConstantExpr(Expr, OffsetFromDefCFA, dwarf::DW_OP_plus);
+ // Dereference the address (loads a 64 bit value)..
+ Expr.push_back(dwarf::DW_OP_deref);
+}
+
// Convenience function to create a comment for
// (+/-) NumBytes (* RegScale)?
static void appendOffsetComment(int NumBytes, llvm::raw_string_ostream &Comment,
@@ -5956,9 +5968,10 @@ MCCFIInstruction llvm::createDefCFA(const TargetRegisterInfo &TRI,
return MCCFIInstruction::cfiDefCfa(nullptr, DwarfReg, (int)Offset.getFixed());
}
-MCCFIInstruction llvm::createCFAOffset(const TargetRegisterInfo &TRI,
- unsigned Reg,
- const StackOffset &OffsetFromDefCFA) {
+MCCFIInstruction
+llvm::createCFAOffset(const TargetRegisterInfo &TRI, unsigned Reg,
+ const StackOffset &OffsetFromDefCFA,
+ std::optional<int64_t> IncomingVGOffsetFromDefCFA) {
int64_t NumBytes, NumVGScaledBytes;
AArch64InstrInfo::decomposeStackOffsetForDwarfOffsets(
OffsetFromDefCFA, NumBytes, NumVGScaledBytes);
@@ -5977,9 +5990,15 @@ MCCFIInstruction llvm::createCFAOffset(const TargetRegisterInfo &TRI,
assert(NumVGScaledBytes && "Expected scalable offset");
SmallString<64> OffsetExpr;
// + VG * NumVGScaledBytes
- appendOffsetComment(NumVGScaledBytes, Comment, "* VG");
- appendReadRegExpr(OffsetExpr, TRI.getDwarfRegNum(AArch64::VG, true));
+ StringRef VGRegScale("* VG");
+ if (IncomingVGOffsetFromDefCFA) {
+ appendLoadRegExpr(OffsetExpr, *IncomingVGOffsetFromDefCFA);
+ VGRegScale = "* IncomingVG";
+ } else {
+ appendReadRegExpr(OffsetExpr, TRI.getDwarfRegNum(AArch64::VG, true));
+ }
appendConstantExpr(OffsetExpr, NumVGScaledBytes, dwarf::DW_OP_mul);
+ appendOffsetComment(NumVGScaledBytes, Comment, VGRegScale);
OffsetExpr.push_back(dwarf::DW_OP_plus);
if (NumBytes) {
// + NumBytes
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.h b/llvm/lib/Target/AArch64/AArch64InstrInfo.h
index 7c255da333e4b..6abd18fd2e52f 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.h
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.h
@@ -642,8 +642,10 @@ bool isNZCVTouchedInInstructionRange(const MachineInstr &DefMI,
MCCFIInstruction createDefCFA(const TargetRegisterInfo &TRI, unsigned FrameReg,
unsigned Reg, const StackOffset &Offset,
bool LastAdjustmentWasScalable = true);
-MCCFIInstruction createCFAOffset(const TargetRegisterInfo &MRI, unsigned Reg,
- const StackOffset &OffsetFromDefCFA);
+MCCFIInstruction
+createCFAOffset(const TargetRegisterInfo &MRI, unsigned Reg,
+ const StackOffset &OffsetFromDefCFA,
+ std::optional<int64_t> IncomingVGOffsetFromDefCFA);
/// emitFrameOffset - Emit instructions as needed to set DestReg to SrcReg
/// plus Offset. This is intended to be used from within the prolog/epilog
diff --git a/llvm/lib/Target/AArch64/AArch64MachineFunctionInfo.h b/llvm/lib/Target/AArch64/AArch64MachineFunctionInfo.h
index 800787cc0b4f5..0f04b740dbe22 100644
--- a/llvm/lib/Target/AArch64/AArch64MachineFunctionInfo.h
+++ b/llvm/lib/Target/AArch64/AArch64MachineFunctionInfo.h
@@ -243,10 +243,6 @@ class AArch64FunctionInfo final : public MachineFunctionInfo {
// The PTRUE is used for the LD/ST of ZReg pairs in save and restore.
unsigned PredicateRegForFillSpill = 0;
- // The stack slots where VG values are stored to.
- int64_t VGIdx = std::numeric_limits<int>::max();
- int64_t StreamingVGIdx = std::numeric_limits<int>::max();
-
// Holds the SME function attributes (streaming mode, ZA/ZT0 state).
SMEAttrs SMEFnAttrs;
@@ -274,12 +270,6 @@ class AArch64FunctionInfo final : public MachineFunctionInfo {
Register getPStateSMReg() const { return PStateSMReg; };
void setPStateSMReg(Register Reg) { PStateSMReg = Reg; };
- int64_t getVGIdx() const { return VGIdx; };
- void setVGIdx(unsigned Idx) { VGIdx = Idx; };
-
- int64_t getStreamingVGIdx() const { return StreamingVGIdx; };
- void setStreamingVGIdx(unsigned FrameIdx) { StreamingVGIdx = FrameIdx; };
-
bool isSVECC() const { return IsSVECC; };
void setIsSVECC(bool s) { IsSVECC = s; };
diff --git a/llvm/lib/Target/AArch64/AArch64SMEInstrInfo.td b/llvm/lib/Target/AArch64/AArch64SMEInstrInfo.td
index db27ca978980f..86bdc8f6e2966 100644
--- a/llvm/lib/Target/AArch64/AArch64SMEInstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64SMEInstrInfo.td
@@ -39,12 +39,6 @@ def AArch64_save_zt : SDNode<"AArch64ISD::SAVE_ZT", SDTypeProfile<0, 2,
def AArch64CoalescerBarrier
: SDNode<"AArch64ISD::COALESCER_BARRIER", SDTypeProfile<1, 1, []>, [SDNPOptInGlue, SDNPOutGlue]>;
-def AArch64VGSave : SDNode<"AArch64ISD::VG_SAVE", SDTypeProfi...
[truncated]
It is possible for the SMEPeepholeOpt pass to optimize out the streaming-mode changes, which for a locally_streaming function essentially removes the streaming body. This means the ADDSVL is unnecessary (and can cause complications for unwind info, see llvm#152283 (comment))
Generally, you want to cfi_restore all registers before you free the stack, so a debugger doesn't reference a clobbered stack slot. I guess cfi_restore isn't really important for exception unwinding.
I guess emitting a cfi_restore in the epilogue (alongside other GPRs) would be fine, as no matter the function type VG should match the entry value at that point.
Force-pushed from e704fa0 to 3d8bfd9.
✅ With the latest revision this PR passed the C/C++ code formatter.
just left a few nits, but overall it looks good to me.
The vector granule (AArch64 DWARF register 46) is a pseudo-register that contains the available size in bits of SVE vector registers in the current call frame, divided by 64. The vector granule can be used in DWARF expressions to describe SVE/SME stack frame layouts (e.g., the location of SVE callee-saves). The first time VG is evaluated (if not already set), it is initialized to the result of evaluating a "CNTD" instruction (this assumes SVE is available). To support SME, the value of VG can change per call frame; this is currently handled like any other callee-save and is intended to support the unwind information implemented in llvm#152283. This limits how VG is used in the CFI information of functions with "streaming-mode changes" (mode changes that change the SVE vector length), to make the unwinder's job easier.
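The VG arithmetic described above is easy to sketch (a toy illustration; the function name and the vector lengths checked are my own, not part of the patch). VG is the SVE vector length in bits divided by 64 — equivalently, the number of 64-bit granules per vector, which is what `CNTD` returns:

```python
def vg_from_vector_length(vl_bits: int) -> int:
    """Return VG for a given SVE vector length in bits.

    Non-streaming SVE vector lengths are multiples of 128 bits,
    up to 2048 bits; VG is that length divided by 64.
    """
    assert vl_bits % 128 == 0 and 128 <= vl_bits <= 2048, "not a legal SVE VL"
    return vl_bits // 64

for vl in (128, 256, 512, 2048):
    print(vl, "->", vg_from_vector_length(vl))
# 128-bit vectors give VG = 2; 2048-bit vectors give VG = 32.
```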
At this point, for all function types, VG should match the entry VG, and the saved VG has been deallocated on the stack, so its slot may not contain a valid value.