[ARM] Don't reserve R12 on Thumb1 as an emergency spill slot.
The current implementation of ThumbRegisterInfo::saveScavengerRegister
is bad for two reasons: one, it's buggy, and two, it blocks using R12
for other optimizations.  So this patch gets rid of it, and adds the
necessary support for using an ordinary emergency spill slot on Thumb1.

(Specifically, I think saveScavengerRegister was broken by r305625, and
nobody noticed for two years because the codepath is almost never used.
The new code will also probably not be used much, but it now has better
tests, and if we fail to emit a necessary emergency spill slot we get a
reasonable error message instead of a miscompile.)

A rough outline of the changes in the patch:

1. Gets rid of ThumbRegisterInfo::saveScavengerRegister.
2. Modifies ARMFrameLowering::determineCalleeSaves to allocate an
   emergency spill slot for Thumb1.
3. Implements useFPForScavengingIndex, so the emergency spill slot isn't
   placed at a negative offset from FP on Thumb1.
4. Modifies the heuristics for allocating an emergency spill slot to
   support Thumb1.  This includes fixing ExtraCSSpill so we don't try to
   use "lr" as a substitute for allocating an emergency spill slot.
5. Allocates a base pointer in more cases, so the emergency spill slot
   is always accessible.
6. Modifies ARMFrameLowering::ResolveFrameIndexReference to compute the
   right offset in the new cases where we're forcing a base pointer.
7. Ensures we never generate a load or store with an offset outside of
   its frame object.  This makes the heuristics more straightforward.
8. Changes Thumb1 prologue and epilogue emission so it never uses
   register scavenging.

Some of the changes to the emergency spill slot heuristics in
determineCalleeSaves affect ARM/Thumb2; they should allow the compiler
to avoid allocating an emergency spill slot in cases where it isn't
necessary. The rest of the changes should only affect Thumb1.

Differential Revision: https://reviews.llvm.org/D63677

llvm-svn: 364490
efriedma-quic committed Jun 26, 2019
1 parent d7999cb commit ab1d73e
Showing 13 changed files with 626 additions and 211 deletions.
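
Item 4 of the outline (the emergency spill slot heuristics) can be read as a small decision function. The following is a minimal standalone sketch of the post-patch logic as it appears in the determineCalleeSaves and getMaxFPOffset hunks of this commit. It is not LLVM code: `FrameFacts`, `maxFPOffset`, `bigFrameOffsets` and every field name are hypothetical stand-ins for the queries the real code makes, and `nonThumb1Limit` merely stands in for the result of estimateRSStackSizeLimit, which is not modelled here.

```cpp
#include <cassert>

// Hypothetical summary of the facts the heuristic consults. None of these
// names are LLVM API; they only mirror the queries visible in the diff.
struct FrameFacts {
  bool isThumb1Only;
  bool hasBasePointer;
  bool hasFP;
  bool hasVarSizedObjects;
  bool adjustsStackUnsimplified; // adjustsStack() && !canSimplifyCallFramePseudos()
  unsigned estimatedStackSize;   // bytes, padding included
  int maxFixedOffset;            // largest positive offset of a fixed object
  int argRegsSaveSize;           // bytes
  unsigned nonThumb1Limit;       // stand-in for estimateRSStackSizeLimit()
};

// Mirrors getMaxFPOffset: Thumb1 assumes only r7 and lr are pushed first
// (2 registers); other subtargets conservatively assume 8 registers.
int maxFPOffset(const FrameFacts &F) {
  int RegsPushedFirst = F.isThumb1Only ? 2 : 8;
  return -F.argRegsSaveSize - RegsPushedFirst * 4;
}

// Mirrors the BigFrameOffsets computation after the patch.
bool bigFrameOffsets(const FrameFacts &F) {
  unsigned StackLimit, FixedLimit;
  if (F.isThumb1Only) {
    // 5-bit immediate for bp-/fp-relative stores, 8-bit for sp-relative
    // stores, scaled by 4 exactly as in the diff.
    StackLimit = F.hasBasePointer ? (1U << 5) * 4 : (1U << 8) * 4;
    FixedLimit = (1U << 5) * 4;
  } else {
    StackLimit = F.nonThumb1Limit;
    FixedLimit = F.nonThumb1Limit;
  }

  bool HasLargeStack = F.estimatedStackSize > StackLimit;
  bool HasMovingSP = F.hasVarSizedObjects || F.adjustsStackUnsimplified;
  bool HasBPOrFixedSP = F.hasBasePointer || !HasMovingSP;
  bool HasLargeArgumentList =
      F.hasFP && (F.maxFixedOffset - maxFPOffset(F)) > (int)FixedLimit;

  return HasLargeStack || !HasBPOrFixedSP || HasLargeArgumentList;
}
```

Under these assumptions a Thumb1 function with a fixed sp and no base pointer stays below the threshold up to an estimated 1024 bytes of stack, while the same function with a base pointer crosses it above 128 bytes. In the diff, the resulting flag is what forces `AFI->setHasStackFrame(true)`.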
34 changes: 20 additions & 14 deletions llvm/lib/Target/ARM/ARMBaseRegisterInfo.cpp
@@ -370,29 +370,35 @@ bool ARMBaseRegisterInfo::hasBasePointer(const MachineFunction &MF) const {
   const ARMFunctionInfo *AFI = MF.getInfo<ARMFunctionInfo>();
   const ARMFrameLowering *TFI = getFrameLowering(MF);

-  // When outgoing call frames are so large that we adjust the stack pointer
-  // around the call, we can no longer use the stack pointer to reach the
-  // emergency spill slot.
+  // If we have stack realignment and VLAs, we have no pointer to use to
+  // access the stack. If we have stack realignment, and a large call frame,
+  // we have no place to allocate the emergency spill slot.
   if (needsStackRealignment(MF) && !TFI->hasReservedCallFrame(MF))
     return true;

   // Thumb has trouble with negative offsets from the FP. Thumb2 has a limited
   // negative range for ldr/str (255), and thumb1 is positive offsets only.
   //
   // It's going to be better to use the SP or Base Pointer instead. When there
   // are variable sized objects, we can't reference off of the SP, so we
   // reserve a Base Pointer.
-  if (AFI->isThumbFunction() && MFI.hasVarSizedObjects()) {
-    // Conservatively estimate whether the negative offset from the frame
-    // pointer will be sufficient to reach. If a function has a smallish
-    // frame, it's less likely to have lots of spills and callee saved
-    // space, so it's all more likely to be within range of the frame pointer.
-    // If it's wrong, the scavenger will still enable access to work, it just
-    // won't be optimal.
-    if (AFI->isThumb2Function() && MFI.getLocalFrameSize() < 128)
-      return false;
+  //
+  // For Thumb2, estimate whether a negative offset from the frame pointer
+  // will be sufficient to reach the whole stack frame. If a function has a
+  // smallish frame, it's less likely to have lots of spills and callee saved
+  // space, so it's all more likely to be within range of the frame pointer.
+  // If it's wrong, the scavenger will still enable access to work, it just
+  // won't be optimal. (We should always be able to reach the emergency
+  // spill slot from the frame pointer.)
+  if (AFI->isThumb2Function() && MFI.hasVarSizedObjects() &&
+      MFI.getLocalFrameSize() >= 128)
     return true;
+  // For Thumb1, if sp moves, nothing is in range, so force a base pointer.
+  // This is necessary for correctness in cases where we need an emergency
+  // spill slot. (In Thumb1, we can't use a negative offset from the frame
+  // pointer.)
+  if (AFI->isThumb1OnlyFunction() && !TFI->hasReservedCallFrame(MF))
+    return true;
-  }

   return false;
 }
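
The hasBasePointer change above can be summarised as three independent conditions. Here is a minimal sketch, assuming the changed lines of the hunk are read as removals followed by additions; `FnFacts` and `wantsBasePointer` are hypothetical names, and each field is a plain boolean standing in for the corresponding LLVM query rather than a real API.

```cpp
#include <cassert>

// Hypothetical snapshot of the queries hasBasePointer makes after the patch.
struct FnFacts {
  bool needsStackRealignment;
  bool hasReservedCallFrame;
  bool isThumb2;
  bool isThumb1Only;
  bool hasVarSizedObjects;
  unsigned localFrameSize; // bytes
};

// Sketch of the post-patch decision, in the order the hunk tests it.
bool wantsBasePointer(const FnFacts &F) {
  // Realigned stack without a reserved call frame: neither sp nor fp can
  // reliably reach the locals or the emergency spill slot.
  if (F.needsStackRealignment && !F.hasReservedCallFrame)
    return true;

  // Thumb2 with VLAs: small frames are assumed to be reachable through
  // negative fp offsets, larger ones get a base pointer.
  if (F.isThumb2 && F.hasVarSizedObjects && F.localFrameSize >= 128)
    return true;

  // Thumb1: whenever sp moves, force a base pointer, because negative
  // fp offsets cannot be encoded.
  if (F.isThumb1Only && !F.hasReservedCallFrame)
    return true;

  return false;
}
```

The notable difference from the removed code is that the Thumb1 case no longer depends on the function having variable-sized objects: any Thumb1 function without a reserved call frame gets a base pointer, which is what item 5 of the commit message describes.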

98 changes: 81 additions & 17 deletions llvm/lib/Target/ARM/ARMFrameLowering.cpp
@@ -344,6 +344,10 @@ static void emitAligningInstructions(MachineFunction &MF, ARMFunctionInfo *AFI,
 /// as assignCalleeSavedSpillSlots() hasn't run at this point. Instead we use
 /// this to produce a conservative estimate that we check in an assert() later.
 static int getMaxFPOffset(const Function &F, const ARMFunctionInfo &AFI) {
+  // For Thumb1, push.w isn't available, so the first push will always push
+  // r7 and lr onto the stack first.
+  if (AFI.isThumb1OnlyFunction())
+    return -AFI.getArgRegsSaveSize() - (2 * 4);
   // This is a conservative estimation: Assume the frame pointer being r7 and
   // pc("r15") up to r8 getting spilled before (= 8 registers).
   return -AFI.getArgRegsSaveSize() - (8 * 4);
@@ -954,8 +958,12 @@ ARMFrameLowering::ResolveFrameIndexReference(const MachineFunction &MF,
     }
   }
   // Use the base pointer if we have one.
-  if (RegInfo->hasBasePointer(MF))
+  // FIXME: Maybe prefer sp on Thumb1 if it's legal and the offset is cheaper?
+  // That can happen if we forced a base pointer for a large call frame.
+  if (RegInfo->hasBasePointer(MF)) {
     FrameReg = RegInfo->getBaseRegister();
+    Offset -= SPAdj;
+  }
   return Offset;
 }

@@ -1775,13 +1783,59 @@ void ARMFrameLowering::determineCalleeSaves(MachineFunction &MF,
   }
   EstimatedStackSize += 16; // For possible paddings.

-  unsigned EstimatedRSStackSizeLimit = estimateRSStackSizeLimit(MF, this);
+  unsigned EstimatedRSStackSizeLimit, EstimatedRSFixedSizeLimit;
+  if (AFI->isThumb1OnlyFunction()) {
+    // For Thumb1, don't bother to iterate over the function. The only
+    // instruction that requires an emergency spill slot is a store to a
+    // frame index.
+    //
+    // tSTRspi, which is used for sp-relative accesses, has an 8-bit unsigned
+    // immediate. tSTRi, which is used for bp- and fp-relative accesses, has
+    // a 5-bit unsigned immediate.
+    //
+    // We could try to check if the function actually contains a tSTRspi
+    // that might need the spill slot, but it's not really important.
+    // Functions with VLAs or extremely large call frames are rare, and
+    // if a function is allocating more than 1KB of stack, an extra 4-byte
+    // slot probably isn't relevant.
+    if (RegInfo->hasBasePointer(MF))
+      EstimatedRSStackSizeLimit = (1U << 5) * 4;
+    else
+      EstimatedRSStackSizeLimit = (1U << 8) * 4;
+    EstimatedRSFixedSizeLimit = (1U << 5) * 4;
+  } else {
+    EstimatedRSStackSizeLimit = estimateRSStackSizeLimit(MF, this);
+    EstimatedRSFixedSizeLimit = EstimatedRSStackSizeLimit;
+  }
+  // Final estimate of whether sp or bp-relative accesses might require
+  // scavenging.
+  bool HasLargeStack = EstimatedStackSize > EstimatedRSStackSizeLimit;
+
+  // If the stack pointer moves and we don't have a base pointer, the
+  // estimate logic doesn't work. The actual offsets might be larger when
+  // we're constructing a call frame, or we might need to use negative
+  // offsets from fp.
+  bool HasMovingSP = MFI.hasVarSizedObjects() ||
+    (MFI.adjustsStack() && !canSimplifyCallFramePseudos(MF));
+  bool HasBPOrFixedSP = RegInfo->hasBasePointer(MF) || !HasMovingSP;
+
+  // If we have a frame pointer, we assume arguments will be accessed
+  // relative to the frame pointer. Check whether fp-relative accesses to
+  // arguments require scavenging.
+  //
+  // We could do slightly better on Thumb1; in some cases, an sp-relative
+  // offset would be legal even though an fp-relative offset is not.
   int MaxFPOffset = getMaxFPOffset(MF.getFunction(), *AFI);
-  bool BigFrameOffsets = EstimatedStackSize >= EstimatedRSStackSizeLimit ||
-    MFI.hasVarSizedObjects() ||
-    (MFI.adjustsStack() && !canSimplifyCallFramePseudos(MF)) ||
-    // For large argument stacks fp relative addressed may overflow.
-    (HasFP && (MaxFixedOffset - MaxFPOffset) >= (int)EstimatedRSStackSizeLimit);
+  bool HasLargeArgumentList =
+      HasFP && (MaxFixedOffset - MaxFPOffset) > (int)EstimatedRSFixedSizeLimit;
+
+  bool BigFrameOffsets = HasLargeStack || !HasBPOrFixedSP ||
+                         HasLargeArgumentList;
+  LLVM_DEBUG(dbgs() << "EstimatedLimit: " << EstimatedRSStackSizeLimit
+                    << "; EstimatedStack" << EstimatedStackSize
+                    << "; EstimatedFPStack" << MaxFixedOffset - MaxFPOffset
+                    << "; BigFrameOffsets: " << BigFrameOffsets
+                    << "\n");
   if (BigFrameOffsets ||
       !CanEliminateFrame || RegInfo->cannotEliminateFrame(MF)) {
     AFI->setHasStackFrame(true);
@@ -1806,8 +1860,17 @@ void ARMFrameLowering::determineCalleeSaves(MachineFunction &MF,
       CS1Spilled = true;
   }

-  // This is true when we inserted a spill for an unused register that can now
-  // be used for register scavenging.
+  // This is true when we inserted a spill for a callee-save GPR which is
+  // not otherwise used by the function. This guarantees it is possible
+  // to scavenge a register to hold the address of a stack slot. On Thumb1,
+  // the register must be a valid operand to tSTRi, i.e. r4-r7. For other
+  // subtargets, this is any GPR, i.e. r4-r11 or lr.
+  //
+  // If we don't insert a spill, we instead allocate an emergency spill
+  // slot, which can be used by scavenging to spill an arbitrary register.
+  //
+  // We currently don't try to figure out whether any specific instruction
+  // requires scavenging an additional register.
   bool ExtraCSSpill = false;

   if (AFI->isThumb1OnlyFunction()) {
@@ -1916,7 +1979,7 @@ void ARMFrameLowering::determineCalleeSaves(MachineFunction &MF,
       NumGPRSpills++;
       CS1Spilled = true;
       assert(!MRI.isReserved(Reg) && "Should not be reserved");
-      if (!MRI.isPhysRegUsed(Reg))
+      if (Reg != ARM::LR && !MRI.isPhysRegUsed(Reg))
         ExtraCSSpill = true;
       UnspilledCS1GPRs.erase(llvm::find(UnspilledCS1GPRs, Reg));
       if (Reg == ARM::LR)
@@ -1941,7 +2004,8 @@ void ARMFrameLowering::determineCalleeSaves(MachineFunction &MF,
       UnspilledCS1GPRs.erase(LRPos);

     ForceLRSpill = false;
-    if (!MRI.isReserved(ARM::LR) && !MRI.isPhysRegUsed(ARM::LR))
+    if (!MRI.isReserved(ARM::LR) && !MRI.isPhysRegUsed(ARM::LR) &&
+        !AFI->isThumb1OnlyFunction())
       ExtraCSSpill = true;
   }

@@ -1963,7 +2027,8 @@ void ARMFrameLowering::determineCalleeSaves(MachineFunction &MF,
             SavedRegs.set(Reg);
             LLVM_DEBUG(dbgs() << "Spilling " << printReg(Reg, TRI)
                               << " to make up alignment\n");
-            if (!MRI.isReserved(Reg) && !MRI.isPhysRegUsed(Reg))
+            if (!MRI.isReserved(Reg) && !MRI.isPhysRegUsed(Reg) &&
+                !(Reg == ARM::LR && AFI->isThumb1OnlyFunction()))
               ExtraCSSpill = true;
             break;
           }
@@ -1992,8 +2057,7 @@ void ARMFrameLowering::determineCalleeSaves(MachineFunction &MF,
         unsigned Reg = UnspilledCS1GPRs.back();
         UnspilledCS1GPRs.pop_back();
         if (!MRI.isReserved(Reg) &&
-            (!AFI->isThumb1OnlyFunction() || isARMLowRegister(Reg) ||
-             Reg == ARM::LR)) {
+            (!AFI->isThumb1OnlyFunction() || isARMLowRegister(Reg))) {
           Extras.push_back(Reg);
           NumExtras--;
         }
@@ -2016,10 +2080,10 @@ void ARMFrameLowering::determineCalleeSaves(MachineFunction &MF,
           ExtraCSSpill = true;
         }
       }
-      if (!ExtraCSSpill && !AFI->isThumb1OnlyFunction()) {
-        // note: Thumb1 functions spill to R12, not the stack. Reserve a slot
-        // closest to SP or frame pointer.
+      if (!ExtraCSSpill) {
+        // Reserve a slot closest to SP or frame pointer.
         assert(RS && "Register scavenging not provided");
+        LLVM_DEBUG(dbgs() << "Reserving emergency spill slot\n");
         const TargetRegisterClass &RC = ARM::GPRRegClass;
         unsigned Size = TRI->getSpillSize(RC);
         unsigned Align = TRI->getSpillAlignment(RC);
23 changes: 15 additions & 8 deletions llvm/lib/Target/ARM/ARMISelDAGToDAG.cpp
@@ -1150,15 +1150,22 @@ bool ARMDAGToDAGISel::SelectThumbAddrModeSP(SDValue N,
       if (isScaledConstantInRange(N.getOperand(1), /*Scale=*/4, 0, 256, RHSC)) {
         Base = N.getOperand(0);
         int FI = cast<FrameIndexSDNode>(Base)->getIndex();
-        // For LHS+RHS to result in an offset that's a multiple of 4 the object
-        // indexed by the LHS must be 4-byte aligned.
+        // Make sure the offset is inside the object, or we might fail to
+        // allocate an emergency spill slot. (An out-of-range access is UB, but
+        // it could show up anyway.)
         MachineFrameInfo &MFI = MF->getFrameInfo();
-        if (MFI.getObjectAlignment(FI) < 4)
-          MFI.setObjectAlignment(FI, 4);
-        Base = CurDAG->getTargetFrameIndex(
-            FI, TLI->getPointerTy(CurDAG->getDataLayout()));
-        OffImm = CurDAG->getTargetConstant(RHSC, SDLoc(N), MVT::i32);
-        return true;
+        if (RHSC * 4 < MFI.getObjectSize(FI)) {
+          // For LHS+RHS to result in an offset that's a multiple of 4 the object
+          // indexed by the LHS must be 4-byte aligned.
+          if (!MFI.isFixedObjectIndex(FI) && MFI.getObjectAlignment(FI) < 4)
+            MFI.setObjectAlignment(FI, 4);
+          if (MFI.getObjectAlignment(FI) >= 4) {
+            Base = CurDAG->getTargetFrameIndex(
+                FI, TLI->getPointerTy(CurDAG->getDataLayout()));
+            OffImm = CurDAG->getTargetConstant(RHSC, SDLoc(N), MVT::i32);
+            return true;
+          }
+        }
       }
     }
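
The SelectThumbAddrModeSP change corresponds to item 7 of the commit message: a frame-index access is folded into the sp-relative form only when the offset stays inside the frame object and the object is, or can be made, 4-byte aligned. Below is a minimal sketch of that rule on a toy frame object. `FrameObject` and `canSelectSPRelative` are hypothetical names, and the multiple-of-4 and `[0, 256)` checks reflect one reading of the `isScaledConstantInRange(..., /*Scale=*/4, 0, 256, RHSC)` call in the hunk, not a verified description of that helper.

```cpp
#include <cassert>

// Hypothetical stand-in for the MachineFrameInfo data the hunk reads.
struct FrameObject {
  long size;       // object size in bytes
  unsigned align;  // current alignment in bytes
  bool fixed;      // fixed objects cannot have their alignment raised
};

// Returns true when the sp-relative form may be selected for an access at
// byte offset `offsetBytes` into `Obj`. May raise the alignment of a
// non-fixed object to 4, as the patched code does.
bool canSelectSPRelative(FrameObject &Obj, int offsetBytes) {
  // Assumed behaviour of the scaled-constant check: the offset must be a
  // multiple of 4 and the scaled value must fit an 8-bit unsigned immediate.
  if (offsetBytes < 0 || offsetBytes % 4 != 0)
    return false;
  int RHSC = offsetBytes / 4;
  if (RHSC >= 256)
    return false;

  // New in this patch: refuse offsets that fall outside the object.
  if (!(static_cast<long>(RHSC) * 4 < Obj.size))
    return false;

  // Only non-fixed objects may be realigned.
  if (!Obj.fixed && Obj.align < 4)
    Obj.align = 4;

  return Obj.align >= 4;
}
```

With this rule, an access at offset 16 into a 16-byte object is rejected, and an under-aligned fixed object is rejected instead of being realigned. That keeps every selected offset within its frame object, which the commit message says makes the spill-slot heuristics more straightforward.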

3 changes: 2 additions & 1 deletion llvm/lib/Target/ARM/ARMISelLowering.cpp
@@ -3652,7 +3652,8 @@ void ARMTargetLowering::VarArgStyleRegisters(CCState &CCInfo, SelectionDAG &DAG,
     // argument passed via stack.
     int FrameIndex = StoreByValRegs(CCInfo, DAG, dl, Chain, nullptr,
                                     CCInfo.getInRegsParamsCount(),
-                                    CCInfo.getNextStackOffset(), 4);
+                                    CCInfo.getNextStackOffset(),
+                                    std::max(4U, TotalArgRegsSaveSize));
     AFI->setVarArgsFrameIndex(FrameIndex);
   }
