Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix stack layout for frames larger than 2gb #84114

Merged
merged 11 commits into from
Mar 27, 2024

Conversation

wesleywiser
Copy link
Contributor

For very large stack frames, the offset from the stack pointer to a
local can be more than 2^31 which overflows various int offsets
in the frame lowering code.

This patch updates the frame lowering code to calculate the offsets as
64-bit values and resolves the overflows, resulting in the correct
codegen for very large frames.

Fixes #48911

@llvmbot
Copy link
Collaborator

llvmbot commented Mar 6, 2024

@llvm/pr-subscribers-backend-aarch64
@llvm/pr-subscribers-mc
@llvm/pr-subscribers-backend-powerpc
@llvm/pr-subscribers-backend-arm
@llvm/pr-subscribers-backend-x86
@llvm/pr-subscribers-backend-msp430

@llvm/pr-subscribers-debuginfo

Author: Wesley Wiser (wesleywiser)

Changes

For very large stack frames, the offset from the stack pointer to a
local can be more than 2^31 which overflows various int offsets
in the frame lowering code.

This patch updates the frame lowering code to calculate the offsets as
64-bit values and resolves the overflows, resulting in the correct
codegen for very large frames.

Fixes #48911


Patch is 35.90 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/84114.diff

21 Files Affected:

  • (modified) llvm/include/llvm/CodeGen/MachineFrameInfo.h (+7-9)
  • (modified) llvm/include/llvm/CodeGen/TargetFrameLowering.h (+2-2)
  • (modified) llvm/include/llvm/MC/MCAsmBackend.h (+1-1)
  • (modified) llvm/include/llvm/MC/MCDwarf.h (+19-19)
  • (modified) llvm/lib/CodeGen/CFIInstrInserter.cpp (+5-5)
  • (modified) llvm/lib/CodeGen/MachineFrameInfo.cpp (+1-1)
  • (modified) llvm/lib/CodeGen/PrologEpilogInserter.cpp (+2-2)
  • (modified) llvm/lib/MC/MCDwarf.cpp (+3-3)
  • (modified) llvm/lib/Target/AArch64/MCTargetDesc/AArch64AsmBackend.cpp (+5-5)
  • (modified) llvm/lib/Target/ARM/ARMFrameLowering.cpp (+2-2)
  • (modified) llvm/lib/Target/ARM/MCTargetDesc/ARMAsmBackend.cpp (+1-1)
  • (modified) llvm/lib/Target/ARM/MCTargetDesc/ARMAsmBackendDarwin.h (+1-1)
  • (modified) llvm/lib/Target/Hexagon/HexagonFrameLowering.cpp (+2-2)
  • (modified) llvm/lib/Target/MSP430/MSP430FrameLowering.cpp (+1-1)
  • (modified) llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp (+6-6)
  • (modified) llvm/lib/Target/X86/MCTargetDesc/X86MCCodeEmitter.cpp (+7-6)
  • (modified) llvm/lib/Target/X86/X86FrameLowering.cpp (+14-14)
  • (modified) llvm/lib/Target/X86/X86FrameLowering.h (+2-2)
  • (modified) llvm/lib/Target/X86/X86RegisterInfo.cpp (+6-4)
  • (modified) llvm/test/CodeGen/PowerPC/huge-frame-size.ll (+1-1)
  • (added) llvm/test/CodeGen/X86/huge-stack.ll (+22)
diff --git a/llvm/include/llvm/CodeGen/MachineFrameInfo.h b/llvm/include/llvm/CodeGen/MachineFrameInfo.h
index 7d11d63d4066f4..243b124d89ff40 100644
--- a/llvm/include/llvm/CodeGen/MachineFrameInfo.h
+++ b/llvm/include/llvm/CodeGen/MachineFrameInfo.h
@@ -251,7 +251,7 @@ class MachineFrameInfo {
   /// targets, this value is only used when generating debug info (via
   /// TargetRegisterInfo::getFrameIndexReference); when generating code, the
   /// corresponding adjustments are performed directly.
-  int OffsetAdjustment = 0;
+  int64_t OffsetAdjustment = 0;
 
   /// The prolog/epilog code inserter may process objects that require greater
   /// alignment than the default alignment the target provides.
@@ -280,7 +280,7 @@ class MachineFrameInfo {
   /// setup/destroy pseudo instructions (as defined in the TargetFrameInfo
   /// class).  This information is important for frame pointer elimination.
   /// It is only valid during and after prolog/epilog code insertion.
-  unsigned MaxCallFrameSize = ~0u;
+  unsigned long MaxCallFrameSize = ~0ul;
 
   /// The number of bytes of callee saved registers that the target wants to
   /// report for the current function in the CodeView S_FRAMEPROC record.
@@ -591,10 +591,10 @@ class MachineFrameInfo {
   uint64_t estimateStackSize(const MachineFunction &MF) const;
 
   /// Return the correction for frame offsets.
-  int getOffsetAdjustment() const { return OffsetAdjustment; }
+  int64_t getOffsetAdjustment() const { return OffsetAdjustment; }
 
   /// Set the correction for frame offsets.
-  void setOffsetAdjustment(int Adj) { OffsetAdjustment = Adj; }
+  void setOffsetAdjustment(int64_t Adj) { OffsetAdjustment = Adj; }
 
   /// Return the alignment in bytes that this function must be aligned to,
   /// which is greater than the default stack alignment provided by the target.
@@ -651,17 +651,15 @@ class MachineFrameInfo {
   /// CallFrameSetup/Destroy pseudo instructions are used by the target, and
   /// then only during or after prolog/epilog code insertion.
   ///
-  unsigned getMaxCallFrameSize() const {
+  unsigned long getMaxCallFrameSize() const {
     // TODO: Enable this assert when targets are fixed.
     //assert(isMaxCallFrameSizeComputed() && "MaxCallFrameSize not computed yet");
     if (!isMaxCallFrameSizeComputed())
       return 0;
     return MaxCallFrameSize;
   }
-  bool isMaxCallFrameSizeComputed() const {
-    return MaxCallFrameSize != ~0u;
-  }
-  void setMaxCallFrameSize(unsigned S) { MaxCallFrameSize = S; }
+  bool isMaxCallFrameSizeComputed() const { return MaxCallFrameSize != ~0ul; }
+  void setMaxCallFrameSize(unsigned long S) { MaxCallFrameSize = S; }
 
   /// Returns how many bytes of callee-saved registers the target pushed in the
   /// prologue. Only used for debug info.
diff --git a/llvm/include/llvm/CodeGen/TargetFrameLowering.h b/llvm/include/llvm/CodeGen/TargetFrameLowering.h
index 0b9cacecc7cbe1..a256574d9d2bee 100644
--- a/llvm/include/llvm/CodeGen/TargetFrameLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetFrameLowering.h
@@ -51,7 +51,7 @@ class TargetFrameLowering {
   // Maps a callee saved register to a stack slot with a fixed offset.
   struct SpillSlot {
     unsigned Reg;
-    int Offset; // Offset relative to stack pointer on function entry.
+    long Offset; // Offset relative to stack pointer on function entry.
   };
 
   struct DwarfFrameBase {
@@ -66,7 +66,7 @@ class TargetFrameLowering {
       // Used with FrameBaseKind::Register.
       unsigned Reg;
       // Used with FrameBaseKind::CFA.
-      int Offset;
+      long Offset;
       struct WasmFrameBase WasmLoc;
     } Location;
   };
diff --git a/llvm/include/llvm/MC/MCAsmBackend.h b/llvm/include/llvm/MC/MCAsmBackend.h
index 01a64fb425a94f..689e3cd5dbf206 100644
--- a/llvm/include/llvm/MC/MCAsmBackend.h
+++ b/llvm/include/llvm/MC/MCAsmBackend.h
@@ -232,7 +232,7 @@ class MCAsmBackend {
   virtual void handleAssemblerFlag(MCAssemblerFlag Flag) {}
 
   /// Generate the compact unwind encoding for the CFI instructions.
-  virtual uint32_t generateCompactUnwindEncoding(const MCDwarfFrameInfo *FI,
+  virtual uint64_t generateCompactUnwindEncoding(const MCDwarfFrameInfo *FI,
                                                  const MCContext *Ctxt) const {
     return 0;
   }
diff --git a/llvm/include/llvm/MC/MCDwarf.h b/llvm/include/llvm/MC/MCDwarf.h
index 18056c5fdf816a..e39c44f38a342a 100644
--- a/llvm/include/llvm/MC/MCDwarf.h
+++ b/llvm/include/llvm/MC/MCDwarf.h
@@ -508,7 +508,7 @@ class MCCFIInstruction {
   MCSymbol *Label;
   unsigned Register;
   union {
-    int Offset;
+    long Offset;
     unsigned Register2;
   };
   unsigned AddressSpace = ~0u;
@@ -516,7 +516,7 @@ class MCCFIInstruction {
   std::vector<char> Values;
   std::string Comment;
 
-  MCCFIInstruction(OpType Op, MCSymbol *L, unsigned R, int O, SMLoc Loc,
+  MCCFIInstruction(OpType Op, MCSymbol *L, unsigned R, long O, SMLoc Loc,
                    StringRef V = "", StringRef Comment = "")
       : Operation(Op), Label(L), Register(R), Offset(O), Loc(Loc),
         Values(V.begin(), V.end()), Comment(Comment) {
@@ -528,7 +528,7 @@ class MCCFIInstruction {
     assert(Op == OpRegister);
   }
 
-  MCCFIInstruction(OpType Op, MCSymbol *L, unsigned R, int O, unsigned AS,
+  MCCFIInstruction(OpType Op, MCSymbol *L, unsigned R, long O, unsigned AS,
                    SMLoc Loc)
       : Operation(Op), Label(L), Register(R), Offset(O), AddressSpace(AS),
         Loc(Loc) {
@@ -538,7 +538,7 @@ class MCCFIInstruction {
 public:
   /// .cfi_def_cfa defines a rule for computing CFA as: take address from
   /// Register and add Offset to it.
-  static MCCFIInstruction cfiDefCfa(MCSymbol *L, unsigned Register, int Offset,
+  static MCCFIInstruction cfiDefCfa(MCSymbol *L, unsigned Register, long Offset,
                                     SMLoc Loc = {}) {
     return MCCFIInstruction(OpDefCfa, L, Register, Offset, Loc);
   }
@@ -547,13 +547,13 @@ class MCCFIInstruction {
   /// on Register will be used instead of the old one. Offset remains the same.
   static MCCFIInstruction createDefCfaRegister(MCSymbol *L, unsigned Register,
                                                SMLoc Loc = {}) {
-    return MCCFIInstruction(OpDefCfaRegister, L, Register, 0, Loc);
+    return MCCFIInstruction(OpDefCfaRegister, L, Register, 0L, Loc);
   }
 
   /// .cfi_def_cfa_offset modifies a rule for computing CFA. Register
   /// remains the same, but offset is new. Note that it is the absolute offset
   /// that will be added to a defined register to the compute CFA address.
-  static MCCFIInstruction cfiDefCfaOffset(MCSymbol *L, int Offset,
+  static MCCFIInstruction cfiDefCfaOffset(MCSymbol *L, long Offset,
                                           SMLoc Loc = {}) {
     return MCCFIInstruction(OpDefCfaOffset, L, 0, Offset, Loc);
   }
@@ -561,7 +561,7 @@ class MCCFIInstruction {
   /// .cfi_adjust_cfa_offset Same as .cfi_def_cfa_offset, but
   /// Offset is a relative value that is added/subtracted from the previous
   /// offset.
-  static MCCFIInstruction createAdjustCfaOffset(MCSymbol *L, int Adjustment,
+  static MCCFIInstruction createAdjustCfaOffset(MCSymbol *L, long Adjustment,
                                                 SMLoc Loc = {}) {
     return MCCFIInstruction(OpAdjustCfaOffset, L, 0, Adjustment, Loc);
   }
@@ -581,7 +581,7 @@ class MCCFIInstruction {
   /// .cfi_offset Previous value of Register is saved at offset Offset
   /// from CFA.
   static MCCFIInstruction createOffset(MCSymbol *L, unsigned Register,
-                                       int Offset, SMLoc Loc = {}) {
+                                       long Offset, SMLoc Loc = {}) {
     return MCCFIInstruction(OpOffset, L, Register, Offset, Loc);
   }
 
@@ -589,7 +589,7 @@ class MCCFIInstruction {
   /// Offset from the current CFA register. This is transformed to .cfi_offset
   /// using the known displacement of the CFA register from the CFA.
   static MCCFIInstruction createRelOffset(MCSymbol *L, unsigned Register,
-                                          int Offset, SMLoc Loc = {}) {
+                                          long Offset, SMLoc Loc = {}) {
     return MCCFIInstruction(OpRelOffset, L, Register, Offset, Loc);
   }
 
@@ -602,12 +602,12 @@ class MCCFIInstruction {
 
   /// .cfi_window_save SPARC register window is saved.
   static MCCFIInstruction createWindowSave(MCSymbol *L, SMLoc Loc = {}) {
-    return MCCFIInstruction(OpWindowSave, L, 0, 0, Loc);
+    return MCCFIInstruction(OpWindowSave, L, 0, 0L, Loc);
   }
 
   /// .cfi_negate_ra_state AArch64 negate RA state.
   static MCCFIInstruction createNegateRAState(MCSymbol *L, SMLoc Loc = {}) {
-    return MCCFIInstruction(OpNegateRAState, L, 0, 0, Loc);
+    return MCCFIInstruction(OpNegateRAState, L, 0, 0L, Loc);
   }
 
   /// .cfi_restore says that the rule for Register is now the same as it
@@ -615,31 +615,31 @@ class MCCFIInstruction {
   /// by .cfi_startproc were executed.
   static MCCFIInstruction createRestore(MCSymbol *L, unsigned Register,
                                         SMLoc Loc = {}) {
-    return MCCFIInstruction(OpRestore, L, Register, 0, Loc);
+    return MCCFIInstruction(OpRestore, L, Register, 0L, Loc);
   }
 
   /// .cfi_undefined From now on the previous value of Register can't be
   /// restored anymore.
   static MCCFIInstruction createUndefined(MCSymbol *L, unsigned Register,
                                           SMLoc Loc = {}) {
-    return MCCFIInstruction(OpUndefined, L, Register, 0, Loc);
+    return MCCFIInstruction(OpUndefined, L, Register, 0L, Loc);
   }
 
   /// .cfi_same_value Current value of Register is the same as in the
   /// previous frame. I.e., no restoration is needed.
   static MCCFIInstruction createSameValue(MCSymbol *L, unsigned Register,
                                           SMLoc Loc = {}) {
-    return MCCFIInstruction(OpSameValue, L, Register, 0, Loc);
+    return MCCFIInstruction(OpSameValue, L, Register, 0L, Loc);
   }
 
   /// .cfi_remember_state Save all current rules for all registers.
   static MCCFIInstruction createRememberState(MCSymbol *L, SMLoc Loc = {}) {
-    return MCCFIInstruction(OpRememberState, L, 0, 0, Loc);
+    return MCCFIInstruction(OpRememberState, L, 0, 0L, Loc);
   }
 
   /// .cfi_restore_state Restore the previously saved state.
   static MCCFIInstruction createRestoreState(MCSymbol *L, SMLoc Loc = {}) {
-    return MCCFIInstruction(OpRestoreState, L, 0, 0, Loc);
+    return MCCFIInstruction(OpRestoreState, L, 0, 0L, Loc);
   }
 
   /// .cfi_escape Allows the user to add arbitrary bytes to the unwind
@@ -650,7 +650,7 @@ class MCCFIInstruction {
   }
 
   /// A special wrapper for .cfi_escape that indicates GNU_ARGS_SIZE
-  static MCCFIInstruction createGnuArgsSize(MCSymbol *L, int Size,
+  static MCCFIInstruction createGnuArgsSize(MCSymbol *L, long Size,
                                             SMLoc Loc = {}) {
     return MCCFIInstruction(OpGnuArgsSize, L, 0, Size, Loc);
   }
@@ -677,7 +677,7 @@ class MCCFIInstruction {
     return AddressSpace;
   }
 
-  int getOffset() const {
+  long getOffset() const {
     assert(Operation == OpDefCfa || Operation == OpOffset ||
            Operation == OpRelOffset || Operation == OpDefCfaOffset ||
            Operation == OpAdjustCfaOffset || Operation == OpGnuArgsSize ||
@@ -705,7 +705,7 @@ struct MCDwarfFrameInfo {
   unsigned CurrentCfaRegister = 0;
   unsigned PersonalityEncoding = 0;
   unsigned LsdaEncoding = 0;
-  uint32_t CompactUnwindEncoding = 0;
+  uint64_t CompactUnwindEncoding = 0;
   bool IsSignalFrame = false;
   bool IsSimple = false;
   unsigned RAReg = static_cast<unsigned>(INT_MAX);
diff --git a/llvm/lib/CodeGen/CFIInstrInserter.cpp b/llvm/lib/CodeGen/CFIInstrInserter.cpp
index 87b062a16df1d2..06c4686caba96d 100644
--- a/llvm/lib/CodeGen/CFIInstrInserter.cpp
+++ b/llvm/lib/CodeGen/CFIInstrInserter.cpp
@@ -68,9 +68,9 @@ class CFIInstrInserter : public MachineFunctionPass {
   struct MBBCFAInfo {
     MachineBasicBlock *MBB;
     /// Value of cfa offset valid at basic block entry.
-    int IncomingCFAOffset = -1;
+    long IncomingCFAOffset = -1;
     /// Value of cfa offset valid at basic block exit.
-    int OutgoingCFAOffset = -1;
+    long OutgoingCFAOffset = -1;
     /// Value of cfa register valid at basic block entry.
     unsigned IncomingCFARegister = 0;
     /// Value of cfa register valid at basic block exit.
@@ -120,7 +120,7 @@ class CFIInstrInserter : public MachineFunctionPass {
   /// Return the cfa offset value that should be set at the beginning of a MBB
   /// if needed. The negated value is needed when creating CFI instructions that
   /// set absolute offset.
-  int getCorrectCFAOffset(MachineBasicBlock *MBB) {
+  long getCorrectCFAOffset(MachineBasicBlock *MBB) {
     return MBBVector[MBB->getNumber()].IncomingCFAOffset;
   }
 
@@ -175,7 +175,7 @@ void CFIInstrInserter::calculateCFAInfo(MachineFunction &MF) {
 
 void CFIInstrInserter::calculateOutgoingCFAInfo(MBBCFAInfo &MBBInfo) {
   // Outgoing cfa offset set by the block.
-  int SetOffset = MBBInfo.IncomingCFAOffset;
+  long SetOffset = MBBInfo.IncomingCFAOffset;
   // Outgoing cfa register set by the block.
   unsigned SetRegister = MBBInfo.IncomingCFARegister;
   MachineFunction *MF = MBBInfo.MBB->getParent();
@@ -188,7 +188,7 @@ void CFIInstrInserter::calculateOutgoingCFAInfo(MBBCFAInfo &MBBInfo) {
   for (MachineInstr &MI : *MBBInfo.MBB) {
     if (MI.isCFIInstruction()) {
       std::optional<unsigned> CSRReg;
-      std::optional<int> CSROffset;
+      std::optional<long> CSROffset;
       unsigned CFIIndex = MI.getOperand(0).getCFIIndex();
       const MCCFIInstruction &CFI = Instrs[CFIIndex];
       switch (CFI.getOperation()) {
diff --git a/llvm/lib/CodeGen/MachineFrameInfo.cpp b/llvm/lib/CodeGen/MachineFrameInfo.cpp
index 280d3a6a41edc9..d7dc2a926f56e2 100644
--- a/llvm/lib/CodeGen/MachineFrameInfo.cpp
+++ b/llvm/lib/CodeGen/MachineFrameInfo.cpp
@@ -196,7 +196,7 @@ void MachineFrameInfo::computeMaxCallFrameSize(const MachineFunction &MF) {
     for (const MachineInstr &MI : MBB) {
       unsigned Opcode = MI.getOpcode();
       if (Opcode == FrameSetupOpcode || Opcode == FrameDestroyOpcode) {
-        unsigned Size = TII.getFrameSize(MI);
+        unsigned long Size = TII.getFrameSize(MI);
         MaxCallFrameSize = std::max(MaxCallFrameSize, Size);
         AdjustsStack = true;
       } else if (MI.isInlineAsm()) {
diff --git a/llvm/lib/CodeGen/PrologEpilogInserter.cpp b/llvm/lib/CodeGen/PrologEpilogInserter.cpp
index 8af17e63e25c75..1193a79ee0b308 100644
--- a/llvm/lib/CodeGen/PrologEpilogInserter.cpp
+++ b/llvm/lib/CodeGen/PrologEpilogInserter.cpp
@@ -358,7 +358,7 @@ void PEI::calculateCallFrameInfo(MachineFunction &MF) {
   const TargetFrameLowering *TFI = MF.getSubtarget().getFrameLowering();
   MachineFrameInfo &MFI = MF.getFrameInfo();
 
-  unsigned MaxCallFrameSize = 0;
+  unsigned long MaxCallFrameSize = 0;
   bool AdjustsStack = MFI.adjustsStack();
 
   // Get the function call frame set-up and tear-down instruction opcode
@@ -374,7 +374,7 @@ void PEI::calculateCallFrameInfo(MachineFunction &MF) {
   for (MachineBasicBlock &BB : MF)
     for (MachineBasicBlock::iterator I = BB.begin(); I != BB.end(); ++I)
       if (TII.isFrameInstr(*I)) {
-        unsigned Size = TII.getFrameSize(*I);
+        unsigned long Size = TII.getFrameSize(*I);
         if (Size > MaxCallFrameSize) MaxCallFrameSize = Size;
         AdjustsStack = true;
         FrameSDOps.push_back(I);
diff --git a/llvm/lib/MC/MCDwarf.cpp b/llvm/lib/MC/MCDwarf.cpp
index d0face9140de66..9cb26596e82250 100644
--- a/llvm/lib/MC/MCDwarf.cpp
+++ b/llvm/lib/MC/MCDwarf.cpp
@@ -1293,8 +1293,8 @@ static void EmitPersonality(MCStreamer &streamer, const MCSymbol &symbol,
 namespace {
 
 class FrameEmitterImpl {
-  int CFAOffset = 0;
-  int InitialCFAOffset = 0;
+  long CFAOffset = 0;
+  long InitialCFAOffset = 0;
   bool IsEH;
   MCObjectStreamer &Streamer;
 
@@ -1408,7 +1408,7 @@ void FrameEmitterImpl::emitCFIInstruction(const MCCFIInstruction &Instr) {
     if (!IsEH)
       Reg = MRI->getDwarfRegNumFromDwarfEHRegNum(Reg);
 
-    int Offset = Instr.getOffset();
+    long Offset = Instr.getOffset();
     if (IsRelative)
       Offset -= CFAOffset;
     Offset = Offset / dataAlignmentFactor;
diff --git a/llvm/lib/Target/AArch64/MCTargetDesc/AArch64AsmBackend.cpp b/llvm/lib/Target/AArch64/MCTargetDesc/AArch64AsmBackend.cpp
index 30ef3680ae79c9..cdb4ddf471d418 100644
--- a/llvm/lib/Target/AArch64/MCTargetDesc/AArch64AsmBackend.cpp
+++ b/llvm/lib/Target/AArch64/MCTargetDesc/AArch64AsmBackend.cpp
@@ -584,7 +584,7 @@ class DarwinAArch64AsmBackend : public AArch64AsmBackend {
   /// Encode compact unwind stack adjustment for frameless functions.
   /// See UNWIND_ARM64_FRAMELESS_STACK_SIZE_MASK in compact_unwind_encoding.h.
   /// The stack size always needs to be 16 byte aligned.
-  uint32_t encodeStackAdjustment(uint32_t StackSize) const {
+  uint64_t encodeStackAdjustment(uint64_t StackSize) const {
     return (StackSize / 16) << 12;
   }
 
@@ -602,7 +602,7 @@ class DarwinAArch64AsmBackend : public AArch64AsmBackend {
   }
 
   /// Generate the compact unwind encoding from the CFI directives.
-  uint32_t generateCompactUnwindEncoding(const MCDwarfFrameInfo *FI,
+  uint64_t generateCompactUnwindEncoding(const MCDwarfFrameInfo *FI,
                                          const MCContext *Ctxt) const override {
     ArrayRef<MCCFIInstruction> Instrs = FI->Instructions;
     if (Instrs.empty())
@@ -612,10 +612,10 @@ class DarwinAArch64AsmBackend : public AArch64AsmBackend {
       return CU::UNWIND_ARM64_MODE_DWARF;
 
     bool HasFP = false;
-    unsigned StackSize = 0;
+    unsigned long StackSize = 0;
 
-    uint32_t CompactUnwindEncoding = 0;
-    int CurOffset = 0;
+    uint64_t CompactUnwindEncoding = 0;
+    long CurOffset = 0;
     for (size_t i = 0, e = Instrs.size(); i != e; ++i) {
       const MCCFIInstruction &Inst = Instrs[i];
 
diff --git a/llvm/lib/Target/ARM/ARMFrameLowering.cpp b/llvm/lib/Target/ARM/ARMFrameLowering.cpp
index 9b54dd4e4e618d..a1012f3996e76b 100644
--- a/llvm/lib/Target/ARM/ARMFrameLowering.cpp
+++ b/llvm/lib/Target/ARM/ARMFrameLowering.cpp
@@ -1165,7 +1165,7 @@ void ARMFrameLowering::emitPrologue(MachineFunction &MF,
         if (STI.splitFramePushPop(MF)) {
           unsigned DwarfReg = MRI->getDwarfRegNum(
               Reg == ARM::R12 ? ARM::RA_AUTH_CODE : Reg, true);
-          unsigned Offset = MFI.getObjectOffset(FI);
+          uint64_t Offset = MFI.getObjectOffset(FI);
           unsigned CFIIndex = MF.addFrameInst(
               MCCFIInstruction::createOffset(nullptr, DwarfReg, Offset));
           BuildMI(MBB, Pos, dl, TII.get(TargetOpcode::CFI_INSTRUCTION))
@@ -1187,7 +1187,7 @@ void ARMFrameLowering::emitPrologue(MachineFunction &MF,
       if ((Reg >= ARM::D0 && Reg <= ARM::D31) &&
           (Reg < ARM::D8 || Reg >= ARM::D8 + AFI->getNumAlignedDPRCS2Regs())) {
         unsigned DwarfReg = MRI->getDwarfRegNum(Reg, true);
-        unsigned Offset = MFI.getObjectOffset(FI);
+        uint64_t Offset = MFI.getObjectOffset(FI);
         unsigned CFIIndex = MF.addFrameInst(
             MCCFIInstruction::createOffset(nullptr, DwarfReg, Offset));
         BuildMI(MBB, Pos, dl, TII.get(TargetOpcode::CFI_INSTRUCTION))
diff --git a/llvm/lib/Target/ARM/MCTargetDesc/ARMAsmBackend.cpp b/llvm/lib/Target/ARM/MCTargetDesc/ARMAsmBackend.cpp
index 6cd4badb7704b7..9671f69bfd2268 100644
--- a/llvm/lib/Target/ARM/MCTargetDesc/ARMAsmBackend.cpp
+++ b/llvm/lib/Target/ARM/MCTargetDesc/ARMAsmBackend.cpp
@@ -1148,7 +1148,7 @@ enum CompactUnwindEncodings {
 /// instructions. If the CFI instructions describe a frame that cannot be
 /// encoded in compact unwind, the method returns UNWIND_ARM_MODE_DWARF which
 /// tells the runtime to fallback and unwind using dwarf.
-uint32_t ARMAsmBackendDarwin::generateCompactUnwindEncoding(
+uint64_t ARMAsmBackendDarwin::generateCompactUnwindEncoding(
     const MCDwarfFrameInfo *FI, const MCContext *Ctxt) const {
   DEBUG_WITH_TYPE("compact-unwind", llvm::dbgs() << "generateCU()\n");
   // Only armv7k uses CFI based unwinding.
diff --git a/llvm/lib/Target/ARM/MCTargetDesc/ARMAsmBackendDarwin.h b/llvm/lib/Target/ARM/MCTargetDesc/ARMAsmBackendDarwin.h
index ac0c9b101cae13..9c958003ca756a 100644
--- a/llvm/lib/Target/ARM/MCTargetDesc/ARMAsmBackendDarwin.h
+++ b/llvm/lib/Target/ARM/MCTargetDe...
[truncated]

@wesleywiser
Copy link
Contributor Author

The Windows failure appears to be unrelated to my change. If I checkout the commit prior to my first change, I still get the same failure in python/dialects/arith_dialect.py.

For very large stack frames, the offset from the stack pointer to a
local can be more than 2^31 which overflows various `int` offsets
in the frame lowering code.

This patch updates the frame lowering code to calculate the offsets as
64-bit values and resolves the overflows, resulting in the correct
codegen for very large frames.
@wesleywiser
Copy link
Contributor Author

I rebased to resolve merge conflicts. Please let me know if you'd like me to squash the history!

Copy link
Collaborator

@RKSimon RKSimon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - @MaskRay any more comments?

@@ -18,7 +18,7 @@ define void @foo(i8 %x) {
; CHECK-LE-NEXT: oris 0, 0, 65535
; CHECK-LE-NEXT: ori 0, 0, 65504
; CHECK-LE-NEXT: stdux 1, 1, 0
; CHECK-LE-NEXT: .cfi_def_cfa_offset 32
; CHECK-LE-NEXT: .cfi_def_cfa_offset 4294967328
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks correct now on PPC. stack size is 4294967296 + linkagesize(32) = 4294967328 (0x100000020). Without the change, it is truncated to 0x20.

Copy link
Collaborator

@RKSimon RKSimon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@RKSimon
Copy link
Collaborator

RKSimon commented Mar 25, 2024

Any more comments before I merge this for @wesleywiser ?

return MCCFIInstruction(OpDefCfa, L, Register, Offset, Loc);
}

/// .cfi_def_cfa_register modifies a rule for computing CFA. From now
/// on Register will be used instead of the old one. Offset remains the same.
static MCCFIInstruction createDefCfaRegister(MCSymbol *L, unsigned Register,
SMLoc Loc = {}) {
return MCCFIInstruction(OpDefCfaRegister, L, Register, 0, Loc);
return MCCFIInstruction(OpDefCfaRegister, L, Register, INT64_C(0), Loc);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

INT64_C(0) is not necessary. 0 is fairly common for int64_t.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wesleywiser Please can you fix those cases before I commit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without the explicit cast, gcc complains the call is ambiguous:

/home/wesley/llvm-project2/llvm/include/llvm/MC/MCDwarf.h:550:66: error: call of overloaded ‘MCCFIInstruction(llvm::MCCFIInstruction::OpType, llvm::MCSymbol*&, unsigned int&, int, llvm::SMLoc&)’ is ambiguous
  550 |     return MCCFIInstruction(OpDefCfaRegister, L, Register, 0, Loc);
      |                                                                  ^
/home/wesley/llvm-project2/llvm/include/llvm/MC/MCDwarf.h:526:3: note: candidate: ‘llvm::MCCFIInstruction::MCCFIInstruction(llvm::MCCFIInstruction::OpType, llvm::MCSymbol*, unsigned int, unsigned int, llvm::SMLoc)’
  526 |   MCCFIInstruction(OpType Op, MCSymbol *L, unsigned R1, unsigned R2, SMLoc Loc)
      |   ^~~~~~~~~~~~~~~~
/home/wesley/llvm-project2/llvm/include/llvm/MC/MCDwarf.h:519:3: note: candidate: ‘llvm::MCCFIInstruction::MCCFIInstruction(llvm::MCCFIInstruction::OpType, llvm::MCSymbol*, unsigned int, int64_t, llvm::SMLoc, llvm::StringRef, llvm::StringRef)’
  519 |   MCCFIInstruction(OpType Op, MCSymbol *L, unsigned R, int64_t O, SMLoc Loc,
      |   ^~~~~~~~~~~~~~~~

Is there a different way I should solve that?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, we can try to cleanup later

@RKSimon RKSimon merged commit 58de1e2 into llvm:main Mar 27, 2024
4 checks passed
@RKSimon
Copy link
Collaborator

RKSimon commented Mar 27, 2024

@wesleywiser we're seeing some EXPENSIVE_CHECKS builds failures - I'm going to revert the patch and add -verify-machineinstrs to the problem tests (they should have had this anyhow)

RKSimon added a commit that referenced this pull request Mar 27, 2024
…or frames larger than 2gb (#84114)"

This is failing on some EXPENSIVE_CHECKS buildbots
@wesleywiser
Copy link
Contributor Author

@RKSimon thanks for the heads up! Is that something I need to look into? (Sorry, still new to contributing to LLVM)

RKSimon added a commit that referenced this pull request Mar 27, 2024
Help identify EXPENSIVE_CHECKS regressions identified in #84114
@RKSimon
Copy link
Collaborator

RKSimon commented Mar 27, 2024

If you take a look at the tests I've updated in f92fa7e - these should now fail (I'd recommend adding -verify-machineinstrs to X86/huge-stack.ll test as well)

@RKSimon
Copy link
Collaborator

RKSimon commented Apr 3, 2024

@wesleywiser Any progress on the -verify-machineinstrs regressions?

@wesleywiser
Copy link
Contributor Author

@RKSimon thanks for the ping! I've been away last week and haven't had a ton of time to look deeply at it. The crux of the issue is that with my changes, we correctly calculate the offset to various stack locals as being larger than will fit in a 32-bit displacement:

MOV8mi $rsp, 1, $noreg, 4294967167, $noreg, 42 :: (store (s8) into %ir.3)

If my understanding of x86_64 displacements is correct, then the verification failure is entirely valid as this can't be represented by the displacement. I'm not sure what the right way to fix this is. I think I could change the Prologue/Epilogue Inserter to insert an lea to calculate the right address and then perform the mov.

Is there an existing way to handle this? It seems like this kind of situation could occur in many different parts of x86_64 codegen.

@RKSimon
Copy link
Collaborator

RKSimon commented Apr 5, 2024

@phoebewang any preference to how we fix this?

@phoebewang
Copy link
Contributor

@phoebewang any preference to how we fix this?

Do you mean #87154? I took a quick look at it. The problem happens in fastISel only. It looks to me DAGISel can select to lea or mov with different offset. I didn't look deep into it, but guess we may learn how DAGISel handle it or bail out to DAGISel otherwise?

@efriedma-quic
Copy link
Collaborator

On non-x86 targets, frame indexes that use an addressing mode that ends up being illegal are not rare. Rare enough that it doesn't really matter for performance, but not so rare it doesn't come up. In that case, we scavenge a register and generate equivalent math. It should be possible to do the same on x86.

There are various tricks you can use to make scavenging happen less frequently; for example, using LocalStackSlotAllocation to lower frame indexes for large allocations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Computation of locals' offsets is wrong when stack size exceeds 2G
7 participants