Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ARM] R11 not pushed adjacent to link register with PAC-M and AAPCS frame chain fix #82801

Merged
merged 4 commits into from
Mar 4, 2024

Conversation

jwestwood921
Copy link
Contributor

When code for M class architecture was compiled with AAPCS and PAC enabled, the frame pointer, r11, was not pushed to the stack adjacent to the link register. Due to PAC being enabled, r12 was placed between r11 and lr. This patch fixes this by adding an extra case to the already existing code that splits the GPR push in two when R11 is the frame pointer and certain paremeters are met. The differential revision for this previous change can be found here: https://reviews.llvm.org/D125649. This now ensures that r11 and lr are pushed in a separate push instruction to the other GPRs when PAC and AAPCS are enabled, meaning the frame pointer and link register are now pushed onto the stack adjacent to each other.

When code for M class architecture was compiled with AAPCS and PAC enabled, the frame pointer, r11, was not pushed to the stack adjacent
to the link register. Due to PAC being enabled, r12 was placed between r11 and lr. This patch fixes this by adding an extra case to the
already existing code that splits the GPR push in two when R11 is the frame pointer and certain paremeters are met. The differential
revision for this previous change can be found here: https://reviews.llvm.org/D125649. This now ensures that r11 and lr are pushed in a
separate push instruction to the other GPRs when PAC and AAPCS are enabled, meaning the frame pointer and link register are now pushed
onto the stack adjacent to each other.
@llvmbot
Copy link
Collaborator

llvmbot commented Feb 23, 2024

@llvm/pr-subscribers-backend-arm

Author: James Westwood (jwestwood921)

Changes

When code for M class architecture was compiled with AAPCS and PAC enabled, the frame pointer, r11, was not pushed to the stack adjacent to the link register. Due to PAC being enabled, r12 was placed between r11 and lr. This patch fixes this by adding an extra case to the already existing code that splits the GPR push in two when R11 is the frame pointer and certain paremeters are met. The differential revision for this previous change can be found here: https://reviews.llvm.org/D125649. This now ensures that r11 and lr are pushed in a separate push instruction to the other GPRs when PAC and AAPCS are enabled, meaning the frame pointer and link register are now pushed onto the stack adjacent to each other.


Patch is 21.29 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/82801.diff

6 Files Affected:

  • (modified) llvm/lib/Target/ARM/ARMBaseRegisterInfo.cpp (+2-2)
  • (modified) llvm/lib/Target/ARM/ARMFrameLowering.cpp (+124-62)
  • (modified) llvm/lib/Target/ARM/ARMSubtarget.cpp (+37-1)
  • (modified) llvm/lib/Target/ARM/ARMSubtarget.h (+16-6)
  • (modified) llvm/lib/Target/ARM/Thumb1FrameLowering.cpp (+2-2)
  • (added) llvm/test/CodeGen/Thumb2/pacbti-m-frame-chain.ll (+82)
diff --git a/llvm/lib/Target/ARM/ARMBaseRegisterInfo.cpp b/llvm/lib/Target/ARM/ARMBaseRegisterInfo.cpp
index 9adf758b46c481..42e2b89260e16a 100644
--- a/llvm/lib/Target/ARM/ARMBaseRegisterInfo.cpp
+++ b/llvm/lib/Target/ARM/ARMBaseRegisterInfo.cpp
@@ -62,14 +62,14 @@ ARMBaseRegisterInfo::ARMBaseRegisterInfo()
 const MCPhysReg*
 ARMBaseRegisterInfo::getCalleeSavedRegs(const MachineFunction *MF) const {
   const ARMSubtarget &STI = MF->getSubtarget<ARMSubtarget>();
-  bool UseSplitPush = STI.splitFramePushPop(*MF);
+  bool UseSplitPush = STI.splitFramePushPopR7(*MF);
   const Function &F = MF->getFunction();
 
   if (F.getCallingConv() == CallingConv::GHC) {
     // GHC set of callee saved regs is empty as all those regs are
     // used for passing STG regs around
     return CSR_NoRegs_SaveList;
-  } else if (STI.splitFramePointerPush(*MF)) {
+  } else if (STI.framePointerRequiredForSEHUnwind(*MF)) {
     return CSR_Win_SplitFP_SaveList;
   } else if (F.getCallingConv() == CallingConv::CFGuard_Check) {
     return CSR_Win_AAPCS_CFGuard_Check_SaveList;
diff --git a/llvm/lib/Target/ARM/ARMFrameLowering.cpp b/llvm/lib/Target/ARM/ARMFrameLowering.cpp
index eeb7f64aa5810e..f288e8be7d3816 100644
--- a/llvm/lib/Target/ARM/ARMFrameLowering.cpp
+++ b/llvm/lib/Target/ARM/ARMFrameLowering.cpp
@@ -718,9 +718,13 @@ static int getMaxFPOffset(const ARMSubtarget &STI, const ARMFunctionInfo &AFI,
   // This is a conservative estimation: Assume the frame pointer being r7 and
   // pc("r15") up to r8 getting spilled before (= 8 registers).
   int MaxRegBytes = 8 * 4;
-  if (STI.splitFramePointerPush(MF)) {
+  if (STI.r11AndLRNotAdjacent(MF) &&
+      STI.getRegisterInfo()->getFrameRegister(MF) == ARM::R11)
     // Here, r11 can be stored below all of r4-r15 (3 registers more than
-    // above), plus d8-d15.
+    // above).
+    MaxRegBytes = 11 * 4;
+  if (STI.framePointerRequiredForSEHUnwind(MF)) {
+    // Here, r11 can be stored below all of r4-r15 plus d8-d15.
     MaxRegBytes = 11 * 4 + 8 * 8;
   }
   int FPCXTSaveSize =
@@ -788,7 +792,7 @@ void ARMFrameLowering::emitPrologue(MachineFunction &MF,
   }
 
   // Determine spill area sizes.
-  if (STI.splitFramePointerPush(MF)) {
+  if (STI.splitFramePushPopR11(MF)) {
     for (const CalleeSavedInfo &I : CSI) {
       Register Reg = I.getReg();
       int FI = I.getFrameIdx();
@@ -834,7 +838,7 @@ void ARMFrameLowering::emitPrologue(MachineFunction &MF,
       case ARM::R10:
       case ARM::R11:
       case ARM::R12:
-        if (STI.splitFramePushPop(MF)) {
+        if (STI.splitFramePushPopR7(MF)) {
           GPRCS2Size += 4;
           break;
         }
@@ -897,13 +901,13 @@ void ARMFrameLowering::emitPrologue(MachineFunction &MF,
   unsigned GPRCS2Offset = GPRCS1Offset - GPRCS2Size;
   Align DPRAlign = DPRCSSize ? std::min(Align(8), Alignment) : Align(4);
   unsigned DPRGapSize = GPRCS1Size + FPCXTSaveSize + ArgRegsSaveSize;
-  if (!STI.splitFramePointerPush(MF)) {
+  if (!STI.framePointerRequiredForSEHUnwind(MF)) {
     DPRGapSize += GPRCS2Size;
   }
   DPRGapSize %= DPRAlign.value();
 
   unsigned DPRCSOffset;
-  if (STI.splitFramePointerPush(MF)) {
+  if (STI.framePointerRequiredForSEHUnwind(MF)) {
     DPRCSOffset = GPRCS1Offset - DPRGapSize - DPRCSSize;
     GPRCS2Offset = DPRCSOffset - GPRCS2Size;
   } else {
@@ -922,8 +926,8 @@ void ARMFrameLowering::emitPrologue(MachineFunction &MF,
   AFI->setGPRCalleeSavedArea2Offset(GPRCS2Offset);
   AFI->setDPRCalleeSavedAreaOffset(DPRCSOffset);
 
-  // Move past area 2.
-  if (GPRCS2Size > 0 && !STI.splitFramePointerPush(MF)) {
+  // Move past area 2, unless following Win_AAPCS_CFGuard calling convention.
+  if (GPRCS2Size > 0 && !STI.framePointerRequiredForSEHUnwind(MF)) {
     GPRCS2Push = LastPush = MBBI++;
     DefCFAOffsetCandidates.addInst(LastPush, GPRCS2Size);
   }
@@ -963,13 +967,14 @@ void ARMFrameLowering::emitPrologue(MachineFunction &MF,
   } else
     NumBytes = DPRCSOffset;
 
-  if (GPRCS2Size > 0 && STI.splitFramePointerPush(MF)) {
+  // Move past area 2 if following Win_AAPCS_CFGuard calling convention.
+  if (GPRCS2Size > 0 && STI.framePointerRequiredForSEHUnwind(MF)) {
     GPRCS2Push = LastPush = MBBI++;
     DefCFAOffsetCandidates.addInst(LastPush, GPRCS2Size);
   }
 
   bool NeedsWinCFIStackAlloc = NeedsWinCFI;
-  if (STI.splitFramePointerPush(MF) && HasFP)
+  if (STI.framePointerRequiredForSEHUnwind(MF) && HasFP)
     NeedsWinCFIStackAlloc = false;
 
   if (STI.isTargetWindows() && WindowsRequiresStackProbe(MF, NumBytes)) {
@@ -1074,7 +1079,7 @@ void ARMFrameLowering::emitPrologue(MachineFunction &MF,
     AfterPush = std::next(GPRCS1Push);
     unsigned PushSize = sizeOfSPAdjustment(*GPRCS1Push);
     int FPOffset = PushSize + FramePtrOffsetInPush;
-    if (STI.splitFramePointerPush(MF)) {
+    if (STI.splitFramePushPopR11(MF)) {
       AfterPush = std::next(GPRCS2Push);
       emitRegPlusImmediate(!AFI->isThumbFunction(), MBB, AfterPush, dl, TII,
                            FramePtr, ARM::SP, 0, MachineInstr::FrameSetup);
@@ -1106,7 +1111,7 @@ void ARMFrameLowering::emitPrologue(MachineFunction &MF,
   // instructions below don't need to be replayed to unwind the stack.
   if (NeedsWinCFI && MBBI != MBB.begin()) {
     MachineBasicBlock::iterator End = MBBI;
-    if (HasFP && STI.splitFramePointerPush(MF))
+    if (HasFP && STI.framePointerRequiredForSEHUnwind(MF))
       End = AfterPush;
     insertSEHRange(MBB, {}, End, TII, MachineInstr::FrameSetup);
     BuildMI(MBB, End, dl, TII.get(ARM::SEH_PrologEnd))
@@ -1118,61 +1123,114 @@ void ARMFrameLowering::emitPrologue(MachineFunction &MF,
   // the necessary DWARF cf instructions to describe the situation. Start by
   // recording where each register ended up:
   if (GPRCS1Size > 0 && !NeedsWinCFI) {
-    MachineBasicBlock::iterator Pos = std::next(GPRCS1Push);
-    int CFIIndex;
-    for (const auto &Entry : CSI) {
-      Register Reg = Entry.getReg();
-      int FI = Entry.getFrameIdx();
-      switch (Reg) {
-      case ARM::R8:
-      case ARM::R9:
-      case ARM::R10:
-      case ARM::R11:
-      case ARM::R12:
-        if (STI.splitFramePushPop(MF))
+    if (STI.r11AndLRNotAdjacent(MF) &&
+        RegInfo->getFrameRegister(MF) == ARM::R11) {
+      MachineBasicBlock::iterator Pos = std::next(GPRCS1Push);
+      int CFIIndex;
+      for (const auto &Entry : CSI) {
+        Register Reg = Entry.getReg();
+        int FI = Entry.getFrameIdx();
+        switch (Reg) {
+        case ARM::R0:
+        case ARM::R1:
+        case ARM::R2:
+        case ARM::R3:
+        case ARM::R4:
+        case ARM::R5:
+        case ARM::R6:
+        case ARM::R7:
+        case ARM::R8:
+        case ARM::R9:
+        case ARM::R10:
+        case ARM::R12:
+          CFIIndex = MF.addFrameInst(MCCFIInstruction::createOffset(
+              nullptr, MRI->getDwarfRegNum(Reg, true),
+              MFI.getObjectOffset(FI)));
+          BuildMI(MBB, Pos, dl, TII.get(TargetOpcode::CFI_INSTRUCTION))
+              .addCFIIndex(CFIIndex)
+              .setMIFlags(MachineInstr::FrameSetup);
           break;
-        [[fallthrough]];
-      case ARM::R0:
-      case ARM::R1:
-      case ARM::R2:
-      case ARM::R3:
-      case ARM::R4:
-      case ARM::R5:
-      case ARM::R6:
-      case ARM::R7:
-      case ARM::LR:
-        CFIIndex = MF.addFrameInst(MCCFIInstruction::createOffset(
-            nullptr, MRI->getDwarfRegNum(Reg, true), MFI.getObjectOffset(FI)));
-        BuildMI(MBB, Pos, dl, TII.get(TargetOpcode::CFI_INSTRUCTION))
-            .addCFIIndex(CFIIndex)
-            .setMIFlags(MachineInstr::FrameSetup);
-        break;
+        }
+      }
+    } else {
+      MachineBasicBlock::iterator Pos = std::next(GPRCS1Push);
+      int CFIIndex;
+      for (const auto &Entry : CSI) {
+        Register Reg = Entry.getReg();
+        int FI = Entry.getFrameIdx();
+        switch (Reg) {
+        case ARM::R8:
+        case ARM::R9:
+        case ARM::R10:
+        case ARM::R11:
+        case ARM::R12:
+          if (STI.splitFramePushPopR7(MF))
+            break;
+          [[fallthrough]];
+        case ARM::R0:
+        case ARM::R1:
+        case ARM::R2:
+        case ARM::R3:
+        case ARM::R4:
+        case ARM::R5:
+        case ARM::R6:
+        case ARM::R7:
+        case ARM::LR:
+          CFIIndex = MF.addFrameInst(MCCFIInstruction::createOffset(
+              nullptr, MRI->getDwarfRegNum(Reg, true),
+              MFI.getObjectOffset(FI)));
+          BuildMI(MBB, Pos, dl, TII.get(TargetOpcode::CFI_INSTRUCTION))
+              .addCFIIndex(CFIIndex)
+              .setMIFlags(MachineInstr::FrameSetup);
+          break;
+        }
       }
     }
   }
 
   if (GPRCS2Size > 0 && !NeedsWinCFI) {
     MachineBasicBlock::iterator Pos = std::next(GPRCS2Push);
-    for (const auto &Entry : CSI) {
-      Register Reg = Entry.getReg();
-      int FI = Entry.getFrameIdx();
-      switch (Reg) {
-      case ARM::R8:
-      case ARM::R9:
-      case ARM::R10:
-      case ARM::R11:
-      case ARM::R12:
-        if (STI.splitFramePushPop(MF)) {
-          unsigned DwarfReg = MRI->getDwarfRegNum(
-              Reg == ARM::R12 ? ARM::RA_AUTH_CODE : Reg, true);
+    if (STI.r11AndLRNotAdjacent(MF) &&
+        RegInfo->getFrameRegister(MF) == ARM::R11) {
+      for (const auto &Entry : CSI) {
+        Register Reg = Entry.getReg();
+        int FI = Entry.getFrameIdx();
+        switch (Reg) {
+        case ARM::R11:
+        case ARM::LR:
+          unsigned DwarfReg = MRI->getDwarfRegNum(Reg, true);
           unsigned Offset = MFI.getObjectOffset(FI);
           unsigned CFIIndex = MF.addFrameInst(
               MCCFIInstruction::createOffset(nullptr, DwarfReg, Offset));
           BuildMI(MBB, Pos, dl, TII.get(TargetOpcode::CFI_INSTRUCTION))
               .addCFIIndex(CFIIndex)
               .setMIFlags(MachineInstr::FrameSetup);
+          break;
+        }
+      }
+    } else {
+      MachineBasicBlock::iterator Pos = std::next(GPRCS2Push);
+      for (const auto &Entry : CSI) {
+        Register Reg = Entry.getReg();
+        int FI = Entry.getFrameIdx();
+        switch (Reg) {
+        case ARM::R8:
+        case ARM::R9:
+        case ARM::R10:
+        case ARM::R11:
+        case ARM::R12:
+          if (STI.splitFramePushPopR7(MF)) {
+            unsigned DwarfReg = MRI->getDwarfRegNum(
+                Reg == ARM::R12 ? ARM::RA_AUTH_CODE : Reg, true);
+            unsigned Offset = MFI.getObjectOffset(FI);
+            unsigned CFIIndex = MF.addFrameInst(
+                MCCFIInstruction::createOffset(nullptr, DwarfReg, Offset));
+            BuildMI(MBB, Pos, dl, TII.get(TargetOpcode::CFI_INSTRUCTION))
+                .addCFIIndex(CFIIndex)
+                .setMIFlags(MachineInstr::FrameSetup);
+          }
+          break;
         }
-        break;
       }
     }
   }
@@ -1382,7 +1440,8 @@ void ARMFrameLowering::emitEpilogue(MachineFunction &MF,
                    MachineInstr::FrameDestroy);
 
     // Increment past our save areas.
-    if (AFI->getGPRCalleeSavedArea2Size() && STI.splitFramePointerPush(MF))
+    if (AFI->getGPRCalleeSavedArea2Size() &&
+        STI.framePointerRequiredForSEHUnwind(MF))
       MBBI++;
 
     if (MBBI != MBB.end() && AFI->getDPRCalleeSavedAreaSize()) {
@@ -1399,7 +1458,8 @@ void ARMFrameLowering::emitEpilogue(MachineFunction &MF,
                    MachineInstr::FrameDestroy);
     }
 
-    if (AFI->getGPRCalleeSavedArea2Size() && !STI.splitFramePointerPush(MF))
+    if (AFI->getGPRCalleeSavedArea2Size() &&
+        !STI.framePointerRequiredForSEHUnwind(MF))
       MBBI++;
     if (AFI->getGPRCalleeSavedArea1Size()) MBBI++;
 
@@ -1539,7 +1599,8 @@ void ARMFrameLowering::emitPushInst(MachineBasicBlock &MBB,
     unsigned LastReg = 0;
     for (; i != 0; --i) {
       Register Reg = CSI[i-1].getReg();
-      if (!(Func)(Reg, STI.splitFramePushPop(MF))) continue;
+      if (!(Func)(Reg, STI.splitFramePushPopR7(MF)))
+        continue;
 
       // D-registers in the aligned area DPRCS2 are NOT spilled here.
       if (Reg >= ARM::D8 && Reg < ARM::D8 + NumAlignedDPRCS2Regs)
@@ -1632,7 +1693,8 @@ void ARMFrameLowering::emitPopInst(MachineBasicBlock &MBB,
     for (; i != 0; --i) {
       CalleeSavedInfo &Info = CSI[i-1];
       Register Reg = Info.getReg();
-      if (!(Func)(Reg, STI.splitFramePushPop(MF))) continue;
+      if (!(Func)(Reg, STI.splitFramePushPopR7(MF)))
+        continue;
 
       // The aligned reloads from area DPRCS2 are not inserted here.
       if (Reg >= ARM::D8 && Reg < ARM::D8 + NumAlignedDPRCS2Regs)
@@ -1640,7 +1702,7 @@ void ARMFrameLowering::emitPopInst(MachineBasicBlock &MBB,
       if (Reg == ARM::LR && !isTailCall && !isVarArg && !isInterrupt &&
           !isCmseEntry && !isTrap && AFI->getArgumentStackToRestore() == 0 &&
           STI.hasV5TOps() && MBB.succ_empty() && !hasPAC &&
-          !STI.splitFramePointerPush(MF)) {
+          !STI.splitFramePushPopR11(MF)) {
         Reg = ARM::PC;
         // Fold the return instruction into the LDM.
         DeleteRet = true;
@@ -2001,7 +2063,7 @@ bool ARMFrameLowering::spillCalleeSavedRegisters(
         .addImm(-4)
         .add(predOps(ARMCC::AL));
   }
-  if (STI.splitFramePointerPush(MF)) {
+  if (STI.splitFramePushPopR11(MF)) {
     emitPushInst(MBB, MI, CSI, PushOpc, PushOneOpc, false,
                  &isSplitFPArea1Register, 0, MachineInstr::FrameSetup);
     emitPushInst(MBB, MI, CSI, FltOpc, 0, true, &isARMArea3Register,
@@ -2046,7 +2108,7 @@ bool ARMFrameLowering::restoreCalleeSavedRegisters(
   unsigned LdrOpc =
       AFI->isThumbFunction() ? ARM::t2LDR_POST : ARM::LDR_POST_IMM;
   unsigned FltOpc = ARM::VLDMDIA_UPD;
-  if (STI.splitFramePointerPush(MF)) {
+  if (STI.splitFramePushPopR11(MF)) {
     emitPopInst(MBB, MI, CSI, PopOpc, LdrOpc, isVarArg, false,
                 &isSplitFPArea2Register, 0);
     emitPopInst(MBB, MI, CSI, FltOpc, 0, isVarArg, true, &isARMArea3Register,
@@ -2362,7 +2424,7 @@ void ARMFrameLowering::determineCalleeSaves(MachineFunction &MF,
     if (Spilled) {
       NumGPRSpills++;
 
-      if (!STI.splitFramePushPop(MF)) {
+      if (!STI.splitFramePushPopR7(MF)) {
         if (Reg == ARM::LR)
           LRSpilled = true;
         CS1Spilled = true;
@@ -2384,7 +2446,7 @@ void ARMFrameLowering::determineCalleeSaves(MachineFunction &MF,
         break;
       }
     } else {
-      if (!STI.splitFramePushPop(MF)) {
+      if (!STI.splitFramePushPopR7(MF)) {
         UnspilledCS1GPRs.push_back(Reg);
         continue;
       }
diff --git a/llvm/lib/Target/ARM/ARMSubtarget.cpp b/llvm/lib/Target/ARM/ARMSubtarget.cpp
index 922fa93226f298..cc86afb835d0c4 100644
--- a/llvm/lib/Target/ARM/ARMSubtarget.cpp
+++ b/llvm/lib/Target/ARM/ARMSubtarget.cpp
@@ -494,11 +494,47 @@ bool ARMSubtarget::ignoreCSRForAllocationOrder(const MachineFunction &MF,
          ARM::GPRRegClass.contains(PhysReg);
 }
 
-bool ARMSubtarget::splitFramePointerPush(const MachineFunction &MF) const {
+bool ARMSubtarget::r11AndLRNotAdjacent(const MachineFunction &MF) const {
+  const std::vector<CalleeSavedInfo> CSI =
+      MF.getFrameInfo().getCalleeSavedInfo();
+
+  if (CSI.size() > 1 &&
+      MF.getInfo<ARMFunctionInfo>()->shouldSignReturnAddress()) {
+
+    bool r11InCSI = false;
+    bool lrInCSI = false;
+    unsigned long r11Idx = 0;
+    unsigned long lrIdx = 0;
+
+    for (unsigned long i = 0; i < CSI.size(); i++) {
+      if (CSI[i].getReg() == ARM::LR) {
+        lrIdx = i;
+        lrInCSI = true;
+      } else if (CSI[i].getReg() == ARM::R11) {
+        r11Idx = i;
+        r11InCSI = true;
+      }
+    }
+    if (lrIdx + 1 != r11Idx && r11InCSI && lrInCSI)
+      return true;
+  }
+  return false;
+}
+
+bool ARMSubtarget::framePointerRequiredForSEHUnwind(
+    const MachineFunction &MF) const {
   const Function &F = MF.getFunction();
+  const std::vector<CalleeSavedInfo> CSI =
+      MF.getFrameInfo().getCalleeSavedInfo();
+
   if (!MF.getTarget().getMCAsmInfo()->usesWindowsCFI() ||
       !F.needsUnwindTableEntry())
     return false;
   const MachineFrameInfo &MFI = MF.getFrameInfo();
   return MFI.hasVarSizedObjects() || getRegisterInfo()->hasStackRealignment(MF);
 }
+
+bool ARMSubtarget::splitFramePushPopR11(const MachineFunction &MF) const {
+  return (r11AndLRNotAdjacent(MF) && getFramePointerReg() == ARM::R11) ||
+         framePointerRequiredForSEHUnwind(MF);
+}
diff --git a/llvm/lib/Target/ARM/ARMSubtarget.h b/llvm/lib/Target/ARM/ARMSubtarget.h
index 91f3978b041a3a..210d8fdb1440ee 100644
--- a/llvm/lib/Target/ARM/ARMSubtarget.h
+++ b/llvm/lib/Target/ARM/ARMSubtarget.h
@@ -441,18 +441,28 @@ class ARMSubtarget : public ARMGenSubtargetInfo {
   }
 
   /// Returns true if the frame setup is split into two separate pushes (first
-  /// r0-r7,lr then r8-r11), principally so that the frame pointer is adjacent
-  /// to lr. This is always required on Thumb1-only targets, as the push and
-  /// pop instructions can't access the high registers.
-  bool splitFramePushPop(const MachineFunction &MF) const {
-    if (MF.getInfo<ARMFunctionInfo>()->shouldSignReturnAddress())
+  /// r0-r7,lr then r8-r11), principally so that the frame pointer r7 is
+  /// adjacent to lr. This is always required on Thumb1-only targets, as the
+  /// push and pop instructions can't access the high registers.
+  bool splitFramePushPopR7(const MachineFunction &MF) const {
+    if (MF.getInfo<ARMFunctionInfo>()->shouldSignReturnAddress() &&
+        !createAAPCSFrameChain())
       return true;
     return (getFramePointerReg() == ARM::R7 &&
             MF.getTarget().Options.DisableFramePointerElim(MF)) ||
            isThumb1Only();
   }
 
-  bool splitFramePointerPush(const MachineFunction &MF) const;
+  bool framePointerRequiredForSEHUnwind(const MachineFunction &MF) const;
+
+  // Returns true if R11 and lr are not adjacent to each other in the list of
+  // callee saved registers in a frame.
+  bool r11AndLRNotAdjacent(const MachineFunction &MF) const;
+
+  // Returns true if the frame setup is split into two separate pushes (first
+  // r0-r10,r12 then r11,lr), principally so that the frame pointer r11 is
+  // adjacent to lr.
+  bool splitFramePushPopR11(const MachineFunction &MF) const;
 
   bool useStride4VFPs() const;
 
diff --git a/llvm/lib/Target/ARM/Thumb1FrameLowering.cpp b/llvm/lib/Target/ARM/Thumb1FrameLowering.cpp
index 0f4ece64bff532..0ad7b96453d1c5 100644
--- a/llvm/lib/Target/ARM/Thumb1FrameLowering.cpp
+++ b/llvm/lib/Target/ARM/Thumb1FrameLowering.cpp
@@ -222,7 +222,7 @@ void Thumb1FrameLowering::emitPrologue(MachineFunction &MF,
     case ARM::R8:
     case ARM::R9:
     case ARM::R10:
-      if (STI.splitFramePushPop(MF)) {
+      if (STI.splitFramePushPopR7(MF)) {
         GPRCS2Size += 4;
         break;
       }
@@ -366,7 +366,7 @@ void Thumb1FrameLowering::emitPrologue(MachineFunction &MF,
       case ARM::R10:
       case ARM::R11:
       case ARM::R12:
-        if (STI.splitFramePushPop(MF))
+        if (STI.splitFramePushPopR7(MF))
           break;
         [[fallthrough]];
       case ARM::R0:
diff --git a/llvm/test/CodeGen/Thumb2/pacbti-m-frame-chain.ll b/llvm/test/CodeGen/Thumb2/pacbti-m-frame-chain.ll
new file mode 100644
index 00000000000000..77759355e576a2
--- /dev/null
+++ b/llvm/test/CodeGen/Thumb2/pacbti-m-frame-chain.ll
@@ -0,0 +1,82 @@
+; RUN: llc -filetype asm -o - %s --frame-pointer=all -mattr=+aapcs-frame-chain -mattr=+aapcs-frame-chain-leaf -force-dwarf-frame-section | FileCheck %s
+target triple = "thumbv8m.main-none-none-eabi"
+
+; int f() {
+;     return 0;
+; }
+;
+; int x(int, char *);
+; int y(int n) {
+; char a[n];
+; return 1 + x(n, a);
+; }
+
+define hidden i32 @f() local_unnamed_addr {
+entry:
+    ret i32 0;
+}
+
+define hidden i32 @x(i32 noundef %n) local_unnamed_addr {
+entry:
+  %vla = alloca i8, i32 %n, align 1
+  %call = call i32 @y(i32 noundef %n, ptr noundef nonnull %vla)
+  %add = add nsw i32 %call, 1
+  ret i32 %add
+}
+
+declare dso_local i32 @y(i32 noundef, ptr noundef) local_unnamed_addr
+
+; CHECK-LABEL: f:
+; CHECK:       pac     r12, lr, sp
+; CHECK-NEXT:  .save   {ra_auth_code}
+; CHECK-NEXT:  str     r12, [sp, #-4]!
+; CHECK-NEXT:  .cfi_def_cfa_offset 4
+; CHECK-NEXT:  .cfi_offset r12, -8
+; CHECK-NEXT:  .save   {r11, lr}
+; CHECK-NEXT:  push.w  {r11, lr}
+; CHECK-NEXT:  .cfi_offset lr, -4
+; CHECK-NEXT:  .cfi_offset r11, -12
+; CHECK-NEXT:  .setfp  r11, sp
+; CHECK-NEXT:  mov     r11, sp
+; CHECK-NEXT:  .cfi_def_cfa r11, 12
+; CHECK-NEXT:  m...
[truncated]

// Returns true if the frame setup is split into two separate pushes (first
// r0-r10,r12 then r11,lr), principally so that the frame pointer r11 is
// adjacent to lr.
bool splitFramePushPopR11(const MachineFunction &MF) const;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to introduce an enum of some sort here? Having a bunch of separate boolean methods representing different variations is confusing.

As far as I can tell, the following variations exist:

  • The normal push: push r4-r11+lr all at once
  • The Thumb1 split push: push r4-r7+lr, then r8-r11
  • The Thumb1 r11 frame chain split push: push lr, then r11, then r0-r7, then r8-r11.
  • The branch signing unsplit push: push r4-r12+lr all at once.
  • The branch signing split push: push r4-r7+lr, then r8-r12
  • The branch signing r11 frame chain split push: push r4-r10+r12, then r11+lr
  • The Windows SEH split push: push r4-r10, then r11+lr

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that using an enum is a better option. I think having a separate enum value for each push variation may be overkill, as many push variations are implemented in the same way by the frame lowering files. For example, the normal push and branch signing unsplit push are handled the same way, there is just an extra register in the latter variation. It might therefore be a better option to have an enum with values based around which register, if any, causes the register push to be split, instead of a value for each specific push variation.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds fine.


// Returns true if R11 and lr are not adjacent to each other in the list of
// callee saved registers in a frame.
bool r11AndLRNotAdjacent(const MachineFunction &MF) const;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method doesn't seem to do what the description says... it unconditionally returns false when signing is disabled, but there are other reasons they might not be adjacent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method has been deleted, but functionality moved to the getPushPopSpiltVariation method. The description for this functionality has been updated to be more accurate.

return true;
return (getFramePointerReg() == ARM::R7 &&
MF.getTarget().Options.DisableFramePointerElim(MF)) ||
isThumb1Only();
}

bool splitFramePointerPush(const MachineFunction &MF) const;
bool framePointerRequiredForSEHUnwind(const MachineFunction &MF) const;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"framePointerRequiredForSEHUnwind" doesn't describe what the actual effect is here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method name has now been deleted.

/// r4-r11+lr (+r12 if necessary) can be pushed in a single instruction.
NoSplit,
/// The registers need to be split into a push of r4-r7+lr and another
/// containing r8-r11 (+r12 if necessary).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe worth noting the interaction between R7Split and the Thumb1 AAPCS split push (HasFrameRecordArea).

if ((MF.getInfo<ARMFunctionInfo>()->shouldSignReturnAddress() &&
!createAAPCSFrameChain()) ||
((getFramePointerReg() == ARM::R7 &&
MF.getTarget().Options.DisableFramePointerElim(MF)) ||
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
MF.getTarget().Options.DisableFramePointerElim(MF)) ||
MF.getTarget().Options.DisableFramePointerElim(MF))) ||

@@ -222,7 +222,8 @@ void Thumb1FrameLowering::emitPrologue(MachineFunction &MF,
case ARM::R8:
case ARM::R9:
case ARM::R10:
if (STI.splitFramePushPop(MF)) {
if (STI.getPushPopSplitVariation(MF) ==
ARMSubtarget::PushPopSplitVariation::R7Split) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given this is Thumb1FrameLowering... isn't this always true?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like Thumb1FrameLowering is initialized only when isThumb1Only() is true, so the variation will always be R7Split. Therefore the if statement is unnecessary.

@@ -963,13 +976,18 @@ void ARMFrameLowering::emitPrologue(MachineFunction &MF,
} else
NumBytes = DPRCSOffset;

if (GPRCS2Size > 0 && STI.splitFramePointerPush(MF)) {
// Move past area 2 if following Win_AAPCS_CFGuard calling convention.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment doesn't look right? Functions defined with the CFGuard_Check should be extremely rare, if they show up at all; it's used to model the behavior of runtime routines.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's a mistake on my end, thanks for catching it. The comment should refer to the CSR_Win_SplitFP calling convention.

// Returns R11SplitAAPCSBranchSigning if R11 and lr are not adjacent to each
// other in the list of callee saved registers in a frame, and branch
// signing is enabled.
if (CSI.size() > 1 &&
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we simplify this code? It seems sort of awkward to dig through the callee-saves here. And given the normal way this works, the only register that can show up between R11 and LR (R14), is R12.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it can probably be simplified. All that needs to be checked is if R11 is the frame pointer and R12 is in the callee saved register list. These are both already checked in the if statement, so I think the rest of the code is unnecessary. The rest of the code was there as if R11 and LR were next to each other, only one push instruction is needed. However, with the way the callee saved register list is constructed when R12 is required, I don't think this is possible.

case ARM::R10:
case ARM::R11:
case ARM::R12:
if (STI.getPushPopSplitVariation(MF) ==
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it actually possible for this condition to be false? We know GPRCS2Size is non-zero, so it's not unsplit. We're not on Windows, so it's not the SEH split. And the new code explicitly checks for the branch signing split. So by process of elimination, this must be R7Split.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that reasoning is correct. I'll remove the if statement.

Copy link
Collaborator

@efriedma-quic efriedma-quic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jwestwood921 jwestwood921 merged commit 00e4a41 into llvm:main Mar 4, 2024
4 checks passed
@jwestwood921 jwestwood921 deleted the r11-preserved-with-pac-aapcs branch March 4, 2024 12:00
jwestwood921 added a commit to jwestwood921/llvm-project that referenced this pull request Mar 5, 2024
… AAPCS frame chain fix (llvm#82801)"

This reverts commit 00e4a41. This patch was found to cause
miscompilations and compilation failures.
jwestwood921 added a commit that referenced this pull request Mar 5, 2024
#84019)

… AAPCS frame chain fix (#82801)"

This reverts commit 00e4a41. This patch
was found to cause miscompilations and compilation failures.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants