-
Notifications
You must be signed in to change notification settings - Fork 12k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Spill/restore FP/BP around instructions in which they are clobbered #81048
Conversation
@llvm/pr-subscribers-backend-x86 Author: None (weiguozhi) ChangesThis patch fixes #17204. If a base pointer is used in a function, and it is clobbered by an instruction (typically an inline asm), current register allocator can't handle this situation, so BP becomes garbage after those instructions. It can also occur to FP in theory. We can spill and reload FP/BP registers around those instructions. But normal spill/reload instructions also use FP/BP, so we can't spill them into normal spill slots, instead we spill them into the top of stack by using SP register. Patch is 20.58 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/81048.diff 10 Files Affected:
diff --git a/llvm/include/llvm/CodeGen/MachineFrameInfo.h b/llvm/include/llvm/CodeGen/MachineFrameInfo.h
index 7d11d63d4066f..f4578a6ca6a11 100644
--- a/llvm/include/llvm/CodeGen/MachineFrameInfo.h
+++ b/llvm/include/llvm/CodeGen/MachineFrameInfo.h
@@ -599,6 +599,7 @@ class MachineFrameInfo {
/// Return the alignment in bytes that this function must be aligned to,
/// which is greater than the default stack alignment provided by the target.
Align getMaxAlign() const { return MaxAlignment; }
+ Align getStackAlignment() const { return StackAlignment; }
/// Make sure the function is at least Align bytes aligned.
void ensureMaxAlignment(Align Alignment);
diff --git a/llvm/include/llvm/CodeGen/TargetFrameLowering.h b/llvm/include/llvm/CodeGen/TargetFrameLowering.h
index 0b9cacecc7cbe..e8ff1d69b4eb1 100644
--- a/llvm/include/llvm/CodeGen/TargetFrameLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetFrameLowering.h
@@ -466,6 +466,31 @@ class TargetFrameLowering {
/// Return the frame base information to be encoded in the DWARF subprogram
/// debug info.
virtual DwarfFrameBase getDwarfFrameBase(const MachineFunction &MF) const;
+
+ /// Issue instructions to allocate stack space and spill FP and/or BP to stack
+ /// using SP register.
+ virtual void spillFPBPUsingSP(MachineFunction &MF,
+ const MachineBasicBlock::iterator BeforeMI,
+ bool SpillFP, bool SpillBP) const {
+ report_fatal_error("spillFPBPUsingSP is not implemented for this target!");
+ }
+
+ /// Issue instructions to restore FP and/or BP from stack using SP register,
+ /// and free stack space.
+ virtual void restoreFPBPUsingSP(MachineFunction &MF,
+ const MachineBasicBlock::iterator AfterMI,
+ bool SpillFP, bool SpillBP) const {
+ report_fatal_error("restoreFPBPUsingSP is not implemented for this "
+ "target!");
+ }
+
+ // If MI uses fp/bp, but target can handle it, and doesn't want PEI to spill
+ // it, this function should return true, and update MI so PEI will not check
+ // any instructions from it and later.
+ virtual bool skipSpillFPBP(MachineFunction &MF,
+ MachineBasicBlock::reverse_iterator &MI) const {
+ return false;
+ }
};
} // End llvm namespace
diff --git a/llvm/include/llvm/CodeGen/TargetRegisterInfo.h b/llvm/include/llvm/CodeGen/TargetRegisterInfo.h
index 5098fc68df3b2..f6de9c40335e8 100644
--- a/llvm/include/llvm/CodeGen/TargetRegisterInfo.h
+++ b/llvm/include/llvm/CodeGen/TargetRegisterInfo.h
@@ -963,6 +963,16 @@ class TargetRegisterInfo : public MCRegisterInfo {
return false;
}
+ /// Returns true if the target needs a base register.
+ virtual bool hasBaseRegister(const MachineFunction &MF) const {
+ return false;
+ }
+
+ /// Returns the physical base register used to access stack object.
+ virtual Register getBaseRegister(const MachineFunction &MF) const {
+ return 0;
+ }
+
/// Return true if target has reserved a spill slot in the stack frame of
/// the given function for the specified register. e.g. On x86, if the frame
/// register is required, the first fixed stack object is reserved as its
diff --git a/llvm/lib/CodeGen/PrologEpilogInserter.cpp b/llvm/lib/CodeGen/PrologEpilogInserter.cpp
index 8af17e63e25c7..040a3f4f95186 100644
--- a/llvm/lib/CodeGen/PrologEpilogInserter.cpp
+++ b/llvm/lib/CodeGen/PrologEpilogInserter.cpp
@@ -77,6 +77,15 @@ using MBBVector = SmallVector<MachineBasicBlock *, 4>;
STATISTIC(NumLeafFuncWithSpills, "Number of leaf functions with CSRs");
STATISTIC(NumFuncSeen, "Number of functions seen in PEI");
+static cl::opt<bool> SpillClobberedFP(
+ "spill-clobbered-fp",
+ cl::desc("Spill clobbered fp register to stack."),
+ cl::init(false), cl::Hidden);
+
+static cl::opt<bool> SpillClobberedBP(
+ "spill-clobbered-bp",
+ cl::desc("Spill clobbered bp register to stack."),
+ cl::init(true), cl::Hidden);
namespace {
@@ -122,6 +131,7 @@ class PEI : public MachineFunctionPass {
void calculateCallFrameInfo(MachineFunction &MF);
void calculateSaveRestoreBlocks(MachineFunction &MF);
void spillCalleeSavedRegs(MachineFunction &MF);
+ void spillFrameBasePointer(MachineFunction &MF);
void calculateFrameObjectOffsets(MachineFunction &MF);
void replaceFrameIndices(MachineFunction &MF);
@@ -228,6 +238,9 @@ bool PEI::runOnMachineFunction(MachineFunction &MF) {
FrameIndexVirtualScavenging = TRI->requiresFrameIndexScavenging(MF);
ORE = &getAnalysis<MachineOptimizationRemarkEmitterPass>().getORE();
+ // Handle FP and BP spilling and restoring, for instructions that need it.
+ spillFrameBasePointer(MF);
+
// Calculate the MaxCallFrameSize and AdjustsStack variables for the
// function's frame information. Also eliminates call frame pseudo
// instructions.
@@ -1587,3 +1600,114 @@ void PEI::replaceFrameIndices(MachineBasicBlock *BB, MachineFunction &MF,
++I;
}
}
+
+static bool accessFrameBasePointer(const MachineInstr &MI, Register FP,
+ Register BP, bool &AccessFP, bool &AccessBP,
+ const TargetRegisterInfo *TRI) {
+ AccessFP = AccessBP = false;
+ if (FP) {
+ if (MI.findRegisterUseOperandIdx(FP, false, TRI) != -1 ||
+ MI.findRegisterDefOperandIdx(FP, false, true, TRI) != -1)
+ AccessFP = true;
+ }
+ if (BP) {
+ if (MI.findRegisterUseOperandIdx(BP, false, TRI) != -1 ||
+ MI.findRegisterDefOperandIdx(BP, false, true, TRI) != -1)
+ AccessBP = true;
+ }
+ return AccessFP || AccessBP;
+}
+
+/// If a function uses base pointer and the base pointer is clobbered by inline
+/// asm, RA doesn't detect this case, and after the inline asm, the base pointer
+/// cotains garbage value.
+/// For example if a 32b x86 function uses base pointer esi, and esi is
+/// clobbered by following inline asm
+/// asm("rep movsb" : "+D"(ptr), "+S"(x), "+c"(c)::"memory");
+/// We need to save esi before the asm and restore it after the asm.
+///
+/// The problem can also occur to frame pointer if there is a function call, and
+/// the callee uses a different calling convention and clobbers the fp.
+///
+/// Because normal frame objects (spill slots) are accessed through fp/bp
+/// register, so we can't spill fp/bp to normal spill slots.
+///
+/// FIXME: There are 2 possible enhancements:
+/// 1. In many cases there are different physical registers not clobbered by
+/// inline asm, we can use one of them as base pointer. Or use a virtual
+/// register as base pointer and let RA allocate a physical register to it.
+/// 2. If there is no other instructions access stack with fp/bp from the
+/// inline asm to the epilog, we can skip the save and restore operations.
+void PEI::spillFrameBasePointer(MachineFunction &MF) {
+ Register FP, BP;
+ const TargetFrameLowering &TFI = *MF.getSubtarget().getFrameLowering();
+ const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
+ if (TFI.hasFP(MF) && SpillClobberedFP)
+ FP = TRI->getFrameRegister(MF);
+ if (TRI->hasBaseRegister(MF) && SpillClobberedBP)
+ BP = TRI->getBaseRegister(MF);
+
+ for (MachineBasicBlock &MBB : MF) {
+ auto MI = MBB.rbegin(), ME = MBB.rend();
+ while (MI != ME) {
+ // Skip frame setup/destroy instructions.
+ // Skip instructions handled by target.
+ if (MI->getFlag(MachineInstr::MIFlag::FrameSetup) ||
+ MI->getFlag(MachineInstr::MIFlag::FrameDestroy) ||
+ TFI.skipSpillFPBP(MF, MI)) {
+ ++MI;
+ continue;
+ }
+
+ bool AccessFP, AccessBP;
+ // Check if fp or bp is used in MI.
+ if (!accessFrameBasePointer(*MI, FP, BP, AccessFP, AccessBP, TRI)) {
+ ++MI;
+ continue;
+ }
+
+ // Look for the range [Start, Stop] in which fp or bp is defined and used.
+ bool FPLive = false, BPLive = false;
+ bool SpillFP = false, SpillBP = false;
+ auto Start = MI, Stop = MI;
+ do {
+ SpillFP |= AccessFP;
+ SpillBP |= AccessBP;
+
+ // Maintain FPLive and BPLive.
+ if (FPLive && MI->findRegisterDefOperandIdx(FP, false, true, TRI) != -1)
+ FPLive = false;
+ if (FP && MI->findRegisterUseOperandIdx(FP, false, TRI) != -1)
+ FPLive = true;
+ if (BPLive && MI->findRegisterDefOperandIdx(BP, false, true, TRI) != -1)
+ BPLive = false;
+ if (BP && MI->findRegisterUseOperandIdx(BP, false, TRI) != -1)
+ BPLive = true;
+
+ Start = MI++;
+ } while ((MI != ME) && (FPLive || BPLive ||
+ accessFrameBasePointer(*MI, FP, BP, AccessFP, AccessBP, TRI)));
+
+ // If the bp is clobbered by a call, we should save and restore outside of
+ // the frame setup instructions.
+ if (Stop->isCall()) {
+ const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
+ auto FrameSetup = std::next(Start);
+ while (FrameSetup != ME && !TII.isFrameSetup(*FrameSetup) &&
+ !FrameSetup->isCall())
+ ++FrameSetup;
+ if (FrameSetup != ME && TII.isFrameSetup(*FrameSetup)) {
+ while (!TII.isFrameInstr(*Stop))
+ --Stop;
+ Start = FrameSetup;
+ MI = Start;
+ ++MI;
+ }
+ }
+
+ // Call target functions to spill and restore FP and BP registers.
+ TFI.spillFPBPUsingSP(MF, &(*Start), SpillFP, SpillBP);
+ TFI.restoreFPBPUsingSP(MF, &(*Stop), SpillFP, SpillBP);
+ }
+ }
+}
diff --git a/llvm/lib/Target/X86/X86FrameLowering.cpp b/llvm/lib/Target/X86/X86FrameLowering.cpp
index be416fb0db069..1fe874623bede 100644
--- a/llvm/lib/Target/X86/X86FrameLowering.cpp
+++ b/llvm/lib/Target/X86/X86FrameLowering.cpp
@@ -4178,3 +4178,114 @@ void X86FrameLowering::restoreWinEHStackPointersInParent(
/*RestoreSP=*/IsSEH);
}
}
+
+static int computeSPAdjust4SpillFPBP(MachineFunction &MF,
+ const TargetRegisterClass *RC,
+ unsigned SpillRegNum) {
+ const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
+ unsigned AllocSize = TRI->getSpillSize(*RC) * SpillRegNum;
+ Align StackAlign = MF.getFrameInfo().getStackAlignment();
+ unsigned AlignedSize = alignTo(AllocSize, StackAlign);
+ return AlignedSize - AllocSize;
+}
+
+void X86FrameLowering::spillFPBPUsingSP(
+ MachineFunction &MF, MachineBasicBlock::iterator BeforeMI,
+ bool SpillFP, bool SpillBP) const {
+ const TargetRegisterClass *RC;
+ unsigned RegNum = 0;
+ MachineBasicBlock *MBB = BeforeMI->getParent();
+ DebugLoc DL = BeforeMI->getDebugLoc();
+
+ // Spill FP.
+ if (SpillFP) {
+ Register FP = TRI->getFrameRegister(MF);
+ if (STI.isTarget64BitILP32())
+ FP = Register(getX86SubSuperRegister(FP, 64));
+ RC = TRI->getMinimalPhysRegClass(FP);
+ ++RegNum;
+
+ BuildMI(*MBB, BeforeMI, DL,
+ TII.get(getPUSHOpcode(MF.getSubtarget<X86Subtarget>())))
+ .addReg(FP);
+ }
+
+ // Spill BP.
+ if (SpillBP) {
+ Register BP = TRI->getBaseRegister(MF);
+ if (STI.isTarget64BitILP32())
+ BP = Register(getX86SubSuperRegister(BP, 64));
+ RC = TRI->getMinimalPhysRegClass(BP);
+ ++RegNum;
+
+ BuildMI(*MBB, BeforeMI, DL,
+ TII.get(getPUSHOpcode(MF.getSubtarget<X86Subtarget>())))
+ .addReg(BP);
+ }
+
+ // Make sure SP is aligned.
+ int SPAdjust = computeSPAdjust4SpillFPBP(MF, RC, RegNum);
+ if (SPAdjust)
+ emitSPUpdate(*MBB, BeforeMI, DL, -SPAdjust, false);
+}
+
+void X86FrameLowering::restoreFPBPUsingSP(
+ MachineFunction &MF, MachineBasicBlock::iterator AfterMI,
+ bool SpillFP, bool SpillBP) const {
+ Register FP, BP;
+ const TargetRegisterClass *RC;
+ unsigned RegNum = 0;
+ if (SpillFP) {
+ FP = TRI->getFrameRegister(MF);
+ if (STI.isTarget64BitILP32())
+ FP = Register(getX86SubSuperRegister(FP, 64));
+ RC = TRI->getMinimalPhysRegClass(FP);
+ ++RegNum;
+ }
+ if (SpillBP) {
+ BP = TRI->getBaseRegister(MF);
+ if (STI.isTarget64BitILP32())
+ BP = Register(getX86SubSuperRegister(BP, 64));
+ RC = TRI->getMinimalPhysRegClass(BP);
+ ++RegNum;
+ }
+
+ // Adjust SP so it points to spilled FP or BP.
+ MachineBasicBlock *MBB = AfterMI->getParent();
+ MachineBasicBlock::iterator Pos = std::next(AfterMI);
+ DebugLoc DL = AfterMI->getDebugLoc();
+ int SPAdjust = computeSPAdjust4SpillFPBP(MF, RC, RegNum);
+ if (SPAdjust)
+ emitSPUpdate(*MBB, Pos, DL, SPAdjust, false);
+
+ // Restore BP.
+ if (SpillBP) {
+ BuildMI(*MBB, Pos, DL,
+ TII.get(getPOPOpcode(MF.getSubtarget<X86Subtarget>())), BP);
+ }
+
+ // Restore FP.
+ if (SpillFP) {
+ BuildMI(*MBB, Pos, DL,
+ TII.get(getPOPOpcode(MF.getSubtarget<X86Subtarget>())), FP);
+ }
+}
+
+bool X86FrameLowering::skipSpillFPBP(
+ MachineFunction &MF, MachineBasicBlock::reverse_iterator &MI) const {
+ if (MI->getOpcode() == X86::LCMPXCHG16B_SAVE_RBX) {
+ // The pseudo instruction LCMPXCHG16B_SAVE_RBX is generated in the form
+ // SaveRbx = COPY RBX
+ // SaveRbx = LCMPXCHG16B_SAVE_RBX ..., SaveRbx, implicit-def rbx
+ // And later LCMPXCHG16B_SAVE_RBX is expanded to restore RBX from SaveRbx.
+ // We should skip this instruction sequence.
+ int FI;
+ unsigned Reg;
+ while (!(MI->getOpcode() == TargetOpcode::COPY &&
+ MI->getOperand(1).getReg() == X86::RBX) &&
+ !((Reg = TII.isStoreToStackSlot(*MI, FI)) && Reg == X86::RBX))
+ ++MI;
+ return true;
+ }
+ return false;
+}
diff --git a/llvm/lib/Target/X86/X86FrameLowering.h b/llvm/lib/Target/X86/X86FrameLowering.h
index 2dc9ecc6109d7..0d77933363ce4 100644
--- a/llvm/lib/Target/X86/X86FrameLowering.h
+++ b/llvm/lib/Target/X86/X86FrameLowering.h
@@ -103,6 +103,20 @@ class X86FrameLowering : public TargetFrameLowering {
MutableArrayRef<CalleeSavedInfo> CSI,
const TargetRegisterInfo *TRI) const override;
+ void
+ spillFPBPUsingSP(MachineFunction &MF,
+ const MachineBasicBlock::iterator BeforeMI,
+ bool SpillFP, bool SpillBP) const override;
+
+ void
+ restoreFPBPUsingSP(MachineFunction &MF,
+ const MachineBasicBlock::iterator AfterMI,
+ bool SpillFP, bool SpillBP) const override;
+
+ bool
+ skipSpillFPBP(MachineFunction &MF,
+ MachineBasicBlock::reverse_iterator &MI) const override;
+
bool hasFP(const MachineFunction &MF) const override;
bool hasReservedCallFrame(const MachineFunction &MF) const override;
bool canSimplifyCallFramePseudos(const MachineFunction &MF) const override;
diff --git a/llvm/lib/Target/X86/X86RegisterInfo.h b/llvm/lib/Target/X86/X86RegisterInfo.h
index 7296a5f021e4a..cccb731d24354 100644
--- a/llvm/lib/Target/X86/X86RegisterInfo.h
+++ b/llvm/lib/Target/X86/X86RegisterInfo.h
@@ -133,6 +133,9 @@ class X86RegisterInfo final : public X86GenRegisterInfo {
void adjustStackMapLiveOutMask(uint32_t *Mask) const override;
bool hasBasePointer(const MachineFunction &MF) const;
+ bool hasBaseRegister(const MachineFunction &MF) const override {
+ return hasBasePointer(MF);
+ }
bool canRealignStack(const MachineFunction &MF) const override;
@@ -164,6 +167,9 @@ class X86RegisterInfo final : public X86GenRegisterInfo {
unsigned getPtrSizedStackRegister(const MachineFunction &MF) const;
Register getStackRegister() const { return StackPtr; }
Register getBaseRegister() const { return BasePtr; }
+ Register getBaseRegister(const MachineFunction &MF) const override {
+ return BasePtr;
+ }
/// Returns physical register used as frame pointer.
/// This will always returns the frame pointer register, contrary to
/// getFrameRegister() which returns the "base pointer" in situations
diff --git a/llvm/test/CodeGen/X86/clobber_base_ptr.ll b/llvm/test/CodeGen/X86/clobber_base_ptr.ll
new file mode 100644
index 0000000000000..29894b6d38906
--- /dev/null
+++ b/llvm/test/CodeGen/X86/clobber_base_ptr.ll
@@ -0,0 +1,59 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
+; RUN: llc < %s | FileCheck %s
+
+target datalayout = "e-m:x-p:32:32-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:32-n8:16:32-a:0:32-S32"
+target triple = "i386-pc-windows-gnu"
+
+; This function uses esi as base pointer, the inline asm clobbers esi, so we
+; should save esi using esp before the inline asm, and restore esi after the
+; inline asm.
+
+define i32 @foo() {
+; CHECK-LABEL: foo:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: pushl %ebp
+; CHECK-NEXT: .cfi_def_cfa_offset 8
+; CHECK-NEXT: .cfi_offset %ebp, -8
+; CHECK-NEXT: movl %esp, %ebp
+; CHECK-NEXT: .cfi_def_cfa_register %ebp
+; CHECK-NEXT: pushl %edi
+; CHECK-NEXT: pushl %esi
+; CHECK-NEXT: andl $-16, %esp
+; CHECK-NEXT: subl $16, %esp
+; CHECK-NEXT: movl %esp, %esi
+; CHECK-NEXT: .cfi_offset %esi, -16
+; CHECK-NEXT: .cfi_offset %edi, -12
+; CHECK-NEXT: movl $4, 12(%esi)
+; CHECK-NEXT: movl 12(%esi), %eax
+; CHECK-NEXT: addl $3, %eax
+; CHECK-NEXT: andl $-4, %eax
+; CHECK-NEXT: calll __alloca
+; CHECK-NEXT: movl %esp, %eax
+; CHECK-NEXT: andl $-16, %eax
+; CHECK-NEXT: movl %eax, %esp
+; CHECK-NEXT: movl $1, (%eax)
+; CHECK-NEXT: leal 8(%esi), %edi
+; CHECK-NEXT: movl $4, %ecx
+; CHECK-NEXT: pushl %esi
+; CHECK-NEXT: movl %eax, %esi
+; CHECK-NEXT: #APP
+; CHECK-NEXT: rep movsb (%esi), %es:(%edi)
+; CHECK-NEXT: #NO_APP
+; CHECK-NEXT: popl %esi
+; CHECK-NEXT: movl 8(%esi), %eax
+; CHECK-NEXT: leal -8(%ebp), %esp
+; CHECK-NEXT: popl %esi
+; CHECK-NEXT: popl %edi
+; CHECK-NEXT: popl %ebp
+; CHECK-NEXT: retl
+entry:
+ %size = alloca i32, align 4
+ %g = alloca i32, align 4
+ store volatile i32 4, ptr %size, align 4
+ %len = load volatile i32, ptr %size, align 4
+ %var_array = alloca i8, i32 %len, align 16
+ store i32 1, ptr %var_array, align 16
+ %nil = call { ptr, ptr, i32 } asm "rep movsb", "={di},={si},={cx},0,1,2,~{memory},~{dirflag},~{fpsr},~{flags}"(ptr %g, ptr %var_array, i32 4)
+ %retval = load i32, ptr %g, align 4
+ ret i32 %retval
+}
diff --git a/llvm/test/CodeGen/X86/i386-baseptr.ll b/llvm/test/CodeGen/X86/i386-baseptr.ll
index 08e4bde7353a4..777eb838b84cc 100644
--- a/llvm/test/CodeGen/X86/i386-baseptr.ll
+++ b/llvm/test/CodeGen/X86/i386-baseptr.ll
@@ -109,10 +109,14 @@ define x86_regcallcc void @clobber_baseptr_argptr(i32 %param1, i32 %param2, i32
; CHECK-NEXT: subl %eax, %edx
; CHECK-NEXT: movl %edx, %esp
; CHECK-NEXT: negl %eax
+; CHECK-NEXT: pushl %esi
+; CHECK-NEXT: subl $28, %esp
; CHECK-NEXT: movl $405, %esi # imm = 0x195
; CHECK-NEXT: #APP
; CHECK-NEXT: nop
; CHECK-NEXT: #NO_APP
+; CHECK-NEXT: addl $28, %esp
+; CHECK-NEXT: popl %esi
; CHECK-NEXT: movl $405, %ebx # imm = 0x195
; CHECK-NEXT: #APP
; CHECK-NEXT: nop
diff --git a/llvm/test/CodeGen/X86/x86-64-baseptr.ll b/llvm/test/CodeGen/X86/x86-64-baseptr.ll
index 8cda4ba2814bb..020004def6e7a 100644
--- a/llvm/test/CodeGen/X86/x86-64-baseptr.ll
+++ b/llvm/test/CodeGen/X86/x86-64-baseptr.ll
@@ -136,10 +136,14 @@ define void @clobber_base() #0 {
; X32ABI-NEXT: subl %eax, %edx
; X32ABI-NEXT: negl %eax
; X32ABI-NEXT: movl %edx, %esp
+; X32ABI-NEXT: pushq %rbx
+; X32ABI-NEXT: subl $24, %esp
; X32ABI-NEXT: movl $405, %ebx # imm = 0x195
; X32ABI-NEXT: #APP
; X32ABI-NEXT: nop
; X32ABI-NEXT: #NO_APP
+; X32ABI-NEXT: addl $24, %esp
+; X32ABI-NEXT: popq %rbx
; X32ABI-NEXT: movl $8, %edx
; X32ABI-NEXT: #APP
; X32ABI-NEXT: movl %edx, (%ebx)
@@ -268,6 +272,8 @@ define x86_regcallcc void @clobber_baseptr_argptr(i32 %param1, i32 %param2, i32
; X32ABI-NEXT: subl %eax, %edx
; X32ABI-NEXT: negl %eax
; X32ABI-NEXT: movl %edx, %esp
+; X32ABI-NEXT: pushq %rbx
+; X32ABI-NEXT: subl $24, %esp
; X32ABI-NEXT: movl $405, %ebx # imm = 0x195
; X32ABI-NEXT: #APP
; X32ABI-NEXT: nop
@@ -275,6 +281,8 @@ define x86_regcallcc void @clobber_baseptr_argptr(i32 %param1, i32 %param2, i32
; X32ABI-NEXT: #APP
; X32ABI-NEXT: nop
; X32ABI-NEXT: #NO_APP
+; X32ABI-NEXT: addl $24, %esp
+; X32ABI-NEXT: popq %rbx
; X32ABI-NEXT: movl...
[truncated]
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
This apparently doesn't catch all cases. Example: ghccc uses rbp as argument register, but rbp is used after the function call despite being clobbered before. Edit: should've read the code more carefully before commenting, sorry. Why is this hidden behind a default-off option? This feels like we should do this always (or at least fail before generating invalid code?). Plus: at least on x86-64 the CFI info is wrong once the frame pointer is clobbered.
Output (w/o directives for brevity):
|
Thanks for the test case. I added it to this patch. This is exactly what I had in my mind, see the comments for function spillFrameBasePointer. But I failed to find a real test case. So I disabled checking fp by default. It can be enabled with -spill-clobbered-fp=true, and then I got
|
ping |
Still, the CFI info looks wrong once the frame pointer is clobbered (updates to rsp and rbp should be accompanied by matching CFI directives) and I have concerns that this will break unwinding (not just asynchronous unwinding but also exceptions thrown from the called function). Could you test that unwinding through such a function call works as expected? (fwiw, I'm not a maintainer and I'd personally like to see support for this use case, but I have concerns that this will break other things.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this! This issue is over a decade old.
@@ -963,6 +963,16 @@ class TargetRegisterInfo : public MCRegisterInfo { | |||
return false; | |||
} | |||
|
|||
/// Returns true if the target needs a base register. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed that AArch64, PPC, ARM, and RISCV all use virtual base registers (requiresVirtualBaseRegisters returns true in most cases) instead of physical base registers. Should we look into that in addition to this change? Perhaps we can add an x86-specific cl::opt that uses a virtual base register, try that on some corpus of code, and see if that is a) correct and b) has acceptable performance.
Regardless, we still need support for spilling the FP around FP-clobbers, so I think we need your change, and we should go forward as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simply implements requiresVirtualBaseRegisters and returns true for X86 doesn't work, it still generates wrong code for i386-baseptr.ll. It needs more changes in different places.
/// register as base pointer and let RA allocate a physical register to it. | ||
/// 2. If there is no other instructions access stack with fp/bp from the | ||
/// inline asm to the epilog, we can skip the save and restore operations. | ||
void PEI::spillFrameBasePointer(MachineFunction &MF) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the physical base pointer concept is essentially specific to x86. Is there a hook we can use to move this entire pass into X86FrameLowering? That would avoid the need to add as many hooks to TargetFrameLowering and TargetRegisterInfo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are also many targets don't implement requiresVirtualBaseRegisters, I don't know if they are also using physical base pointer. The frame pointer part should impact almost all targets.
I'm very ignorant in DWARF. After some study, I think we need the following additional CFI instructions
And the cfi_def_cfa_expression should be constructed similar to function X86FrameLowering::emitCalleeSavedFrameMoves. Is my understanding right?
Because the frame pointer is clobbered, so the unwinding information is already broken. The current patch doesn't make it worse. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm finding myself short on time, so maybe I can hand the review off to @zmodem , who has done various backend changes to switch lowering and conditional tail calls and so on.
The current behavior is pretty bad, so even if there are some edge cases that this solution doesn't handle such as inaccurate unwind info, this will be an improvement, and we should try to move forward.
I added unwinding information when FP is saved/restored. |
Hans, can you please review this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies for the delay. I'll do my best to review this.
Since this is currently only handling x86, should we keep it in the x86 target for now, and try to generalize later (maybe with some input from an aarch64 owner)?
@@ -599,6 +599,7 @@ class MachineFrameInfo { | |||
/// Return the alignment in bytes that this function must be aligned to, | |||
/// which is greater than the default stack alignment provided by the target. | |||
Align getMaxAlign() const { return MaxAlignment; } | |||
Align getStackAlignment() const { return StackAlignment; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's surprising that we didn't need to expose this before now. Do we really need it? And if we do, should it have its own comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can get it from TFL, so it is not needed now.
/// Issue instructions to allocate stack space and spill FP and/or BP to stack | ||
/// using SP register. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Consider spelling out the abbreviations (FP, BP, SP).
} | ||
|
||
/// Returns the physical base register used to access stack object. | ||
virtual Register getBaseRegister(const MachineFunction &MF) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it make sense to assert(hasBaseRegister(MF))
here?
static cl::opt<bool> | ||
SpillClobberedFP("spill-clobbered-fp", | ||
cl::desc("Spill clobbered fp register to stack."), | ||
cl::init(false), cl::Hidden); | ||
|
||
static cl::opt<bool> | ||
SpillClobberedBP("spill-clobbered-bp", | ||
cl::desc("Spill clobbered bp register to stack."), | ||
cl::init(true), cl::Hidden); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see why we'd need flags for this?
@@ -122,6 +131,7 @@ class PEI : public MachineFunctionPass { | |||
void calculateCallFrameInfo(MachineFunction &MF); | |||
void calculateSaveRestoreBlocks(MachineFunction &MF); | |||
void spillCalleeSavedRegs(MachineFunction &MF); | |||
void spillFrameBasePointer(MachineFunction &MF); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In TargetFrameLowering.h you use "FPBP" in the function names, here it's "FrameBase". Let's pick one convention for consistency.
I think "FPBP" makes it more clear that it concerns two registers, and not the "frame base pointer".
TII.get(getPUSHOpcode(MF.getSubtarget<X86Subtarget>()))) | ||
.addReg(BP); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function probably expects at least one of SpillFP and SpillBP to be set. Assert that at the top?
A compiler might reasonably warn that RC could be uninitialized here.
// Spill FP. | ||
if (SpillFP) { | ||
Register FP = TRI->getFrameRegister(MF); | ||
if (STI.isTarget64BitILP32()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we have a test case covering X32?
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4 | ||
; RUN: llc -mtriple=x86_64-pc-linux -stackrealign -spill-clobbered-fp=true -verify-machineinstrs < %s | FileCheck %s | ||
|
||
; Calling convention ghccc uses ebp to pass parameter, so calling a function |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also have a test which covers the case when both the frame and base pointer need to be spilled/restored?
; CHECK-NEXT: pushq %rbp | ||
; CHECK-NEXT: pushq %rax | ||
; CHECK-NEXT: .cfi_remember_state | ||
; CHECK-NEXT: .cfi_escape 0x0f, 0x06, 0x77, 0x08, 0x06, 0x11, 0x10, 0x22 # |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there's any stack manipulation after this (e.g. to pass more arguments to the call), I think the CFI would be incorrect? I'm not sure how we'd handle that though?
BuildMI(*MBB, BeforeMI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION)) | ||
.addCFIIndex(CFIIndex); | ||
|
||
// Setup new CFA value with DW_CFA_def_cfa_expression: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rnk what about Windows unwinding. Do we need to do something for that here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're spilling the base pointer, you shouldn't need to touch the unwind: we normally restore the stack pointer from fp, anyway.
If you're spilling the frame pointer, you'd have to do something complicated to get correct unwind; probably represent it as a separate function (.seh_endproc/.seh_startproc/etc.). You can't specify unwind opcodes outside the prolog.
I will move it to x86 backend for this initial version. Then many of the callbacks can be removed. |
7017717
to
2905372
Compare
This patch fixes llvm#17204. If a base pointer is used in a function, and it is clobbered by an instruction (typically an inline asm), current register allocator can't handle this situation, so BP becomes garbage after those instructions. It can also occur to FP in theory. We can spill and reload FP/BP registers around those instructions. But normal spill/reload instructions also use FP/BP, so we can't spill them into normal spill slots, instead we spill them into the top of stack by using SP register.
Add a new test case to demostrate save/restore of FP register.
I have moved the implementation to x86 backend. Please review it again. |
Any more review comments for this patch? The underlying problem is blocking the preserve_none CC from being widely used. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hans is out on summer break, but I went ahead and took a look
@@ -4231,3 +4231,325 @@ void X86FrameLowering::restoreWinEHStackPointersInParent( | |||
/*RestoreSP=*/IsSEH); | |||
} | |||
} | |||
|
|||
static int computeSPAdjust4SpillFPBP(MachineFunction &MF, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use a more descriptive name and add more comments about what this is computing. I came up with something like computeFPBPAlignmentGap
, and then would describe the alignment gap as the difference between the current SP and the next properly aligned stack offset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
FP = TRI->getFrameRegister(MF); | ||
if (STI.isTarget64BitILP32()) | ||
FP = Register(getX86SubSuperRegister(FP, 64)); | ||
RC = TRI->getMinimalPhysRegClass(FP); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This register class calculation is the same between spill and restore FPBP.
I suggest you can restructure this to have one method, saveAndRestoreFPBPUsingSP
, which takes BeforeMI
and AfterMI
, calculates this shared state once, and then calls spill/restoreFPBPUsingSP
, and maybe just pass in the FP/BP
Register variables. They are nullable, so you can check for isValid
before pushing or popping.
SPAdjust
will also be the same and only needs to be calculated once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
@@ -228,6 +228,8 @@ bool PEI::runOnMachineFunction(MachineFunction &MF) { | |||
FrameIndexVirtualScavenging = TRI->requiresFrameIndexScavenging(MF); | |||
ORE = &getAnalysis<MachineOptimizationRemarkEmitterPass>().getORE(); | |||
|
|||
TFI->spillFPBP(MF); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PEI process is very complicated, please add comments elaborating on what this does and why it comes before call frame instruction elimination.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
|
||
// Invoke instruction has been lowered to normal function call. We try to figure | ||
// out if MI comes from Invoke. | ||
// Do we have any better method? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels like an incomplete solution. If we have an invoke that actually clobbers RBX and we need to spill around it, we ought to reliably fail with an error instead of silently doing nothing as we currently do. That's part of what I really want to get out of this. We can't solve all problems, but the compiler should emit an error if it gets into an impossible situation that it can't handle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I will do detection work in a separate patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can live with that.
; CHECK-NEXT: popq %rbp | ||
; CHECK-NEXT: .cfi_def_cfa %rsp, 8 | ||
; CHECK-NEXT: retq | ||
%x = call ghccc i32 @external(i32 %0, i32 %1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any other calling convention that:
- clobbers RBX
- accepts stack arguments
I'd like to exercise the test case where:
- we need to pass stack arguments (num regparm args + 4)
- we have to spill just FP (RBP, not RBX)
I'm worried that the stores or pushes to set up the stack arguments will be disturbed by the RBP push and pops.
Regardless of whether the problem exists or not, please add a test case that exercises spilling around a function that clobbers FP/BP with stack arguments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only calling convention that clobbers RBP but no RBX is HiPE. I added a new test case for it. And this patch does correct work.
But the generated code is still wrong because of a different bug.
%6:gr64 = MOV64rm %fixed-stack.1, 1, $noreg, 0, $noreg :: (load (s64) from %fixed-stack.1)
%7:gr64 = MOV64rm %fixed-stack.0, 1, $noreg, 0, $noreg :: (load (s64) from %fixed-stack.0)
ADJCALLSTACKDOWN64 16, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
%8:gr64 = COPY $rsp
MOV64mr %8:gr64, 1, $noreg, 8, $noreg, killed %7:gr64 :: (store (s64) into stack + 8)
MOV64mr %8:gr64, 1, $noreg, 0, $noreg, killed %6:gr64 :: (store (s64) into stack)
$r15 = COPY %0:gr64
$rbp = COPY %1:gr64
$rsi = COPY %2:gr64
$rdx = COPY %3:gr64
$rcx = COPY %4:gr64
$r8 = COPY %5:gr64
CALL64pcrel32 target-flags(x86-plt) @external2, <regmask>, ...
x86-cf-opt transforms it to
ADJCALLSTACKDOWN64 16, 0, 16, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
$r15 = COPY %0:gr64
$rbp = COPY %1:gr64
$rsi = COPY %2:gr64
$rdx = COPY %3:gr64
$rcx = COPY %4:gr64
$r8 = COPY %5:gr64
PUSH64rmm %fixed-stack.0, 1, $noreg, 0, $noreg, implicit-def $rsp, implicit $rsp :: (load (s64) from %fixed-stack.0), (store (s64) into stack + 8)
PUSH64rmm %fixed-stack.1, 1, $noreg, 0, $noreg, implicit-def $rsp, implicit $rsp :: (load (s64) from %fixed-stack.1), (store (s64) into stack)
CALL64pcrel32 target-flags(x86-plt) @external2, <regmask>, ...
The access to %fixed-stack.0 needs rbp, but it is already assigned function argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, that's a good find. :) Thanks for adding the tests, we don't need to fix this issue before landing.
Move the duplicated computation of physical register, register class and stack adjustment to new function saveAndRestoreFPBPUsingSP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, let's merge it!
; CHECK-NEXT: popq %rbp | ||
; CHECK-NEXT: .cfi_def_cfa %rsp, 8 | ||
; CHECK-NEXT: retq | ||
%x = call ghccc i32 @external(i32 %0, i32 %1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, that's a good find. :) Thanks for adding the tests, we don't need to fix this issue before landing.
|
||
// Invoke instruction has been lowered to normal function call. We try to figure | ||
// out if MI comes from Invoke. | ||
// Do we have any better method? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can live with that.
// the same BB, so it will not impact outgoing CFA. | ||
++RememberState; | ||
if (RememberState != 1) | ||
report_fatal_error( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is existing code, but I generally try to recommend MCContext::reportError
over report_fatal_error
. We do have actual non-fatal error handling facilities during CodeGen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again!
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/133/builds/2719 Here is the relevant piece of the build log for the reference:
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/72/builds/1966 Here is the relevant piece of the build log for the reference:
|
This incurs a substantial compile-time regression at O0. http://llvm-compile-time-tracker.com/compare.php?from=ce2a3d9042c95630f12b790bf201c4daf8941afb&to=0d471b3f64d3116bd57c79d872f7384fff80daa5&stat=instructions:u Can we come up with a no-overhead solution for this, e.g. check during call lowering whether there's a call that clobbers FP/BP and then selectively look at these calls instead of iterating over the entire IR? (IR iteration is expensive. And >99% of the code doesn't need this.) If this takes longer, we might want to temporarily revert this. |
…lvm#81048) This patch fixes llvm#17204. If a base pointer is used in a function, and it is clobbered by an instruction (typically an inline asm), current register allocator can't handle this situation, so BP becomes garbage after those instructions. It can also occur to FP in theory. We can spill and reload FP/BP registers around those instructions. But normal spill/reload instructions also use FP/BP, so we can't spill them into normal spill slots, instead we spill them into the top of stack by using SP register.
I think what you suggest is feasible. I would try to skip this pass under the following conditions:
The vast majority of functions do not do both realign the stack in the prologue and adjust SP (dynamic alloca) after the prologue, necessitating a base pointer. -O0 functions do tend to have FPs, so it's worth adding an extra flag to track FP clobbering so we can early-exit in those cases. Note that the non-O0 compile time are not affected, presumably they use |
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/16/builds/3017 Here is the relevant piece of the build log for the reference:
|
…81048) This patch fixes #17204. If a base pointer is used in a function, and it is clobbered by an instruction (typically an inline asm), current register allocator can't handle this situation, so BP becomes garbage after those instructions. It can also occur to FP in theory. We can spill and reload FP/BP registers around those instructions. But normal spill/reload instructions also use FP/BP, so we can't spill them into normal spill slots, instead we spill them into the top of stack by using SP register.
…lvm#81048) This patch fixes llvm#17204. If a base pointer is used in a function, and it is clobbered by an instruction (typically an inline asm), current register allocator can't handle this situation, so BP becomes garbage after those instructions. It can also occur to FP in theory. We can spill and reload FP/BP registers around those instructions. But normal spill/reload instructions also use FP/BP, so we can't spill them into normal spill slots, instead we spill them into the top of stack by using SP register.
Is this intended to impact attribute(naked)) functions? With msvc x86 this attr causes asm to be emitted exactly as specified and clang diverges here. |
No, that's a bug; please open an issue. |
This patch fixes #17204.
If a base pointer is used in a function, and it is clobbered by an instruction (typically an inline asm), current register allocator can't handle this situation, so BP becomes garbage after those instructions. It can also occur to FP in theory.
We can spill and reload FP/BP registers around those instructions. But normal spill/reload instructions also use FP/BP, so we can't spill them into normal spill slots, instead we spill them into the top of stack by using SP register.