Merge memtag instructions with adjacent stack slots.
Summary:
Detect a run of memory tagging instructions for adjacent stack frame slots,
and replace them with a shorter instruction sequence:
* replace STG + STG with ST2G
* replace STGloop + STGloop with a single STGloop
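A minimal sketch of the merge step, under assumptions of our own: each tag store has already been resolved to a byte range `[Offset, Offset + Size)` in the frame, the run is sorted by offset, and sizes are multiples of the 16-byte tag granule. `TagStore` and `mergeTagStores` are hypothetical names, and the selection rule (one granule gives STG, two give ST2G, anything larger gives STGloop) is a simplification for illustration; the real pass may handle larger runs differently.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model of one memory-tagging store: tags [Offset, Offset + Size).
struct TagStore {
  int64_t Offset; // byte offset of the slot within the frame
  int64_t Size;   // byte size, a multiple of the 16-byte tag granule
};

// Coalesce a run of tag stores (assumed sorted by Offset) into contiguous
// regions, then name the instruction chosen for each region.
std::vector<std::string> mergeTagStores(const std::vector<TagStore> &Run) {
  std::vector<TagStore> Regions;
  for (const TagStore &S : Run) {
    assert(S.Size > 0 && S.Size % 16 == 0);
    if (!Regions.empty() &&
        Regions.back().Offset + Regions.back().Size == S.Offset)
      Regions.back().Size += S.Size; // adjacent slot: extend current region
    else
      Regions.push_back(S); // gap: start a new region
  }

  std::vector<std::string> Out;
  for (const TagStore &R : Regions) {
    if (R.Size == 16)
      Out.push_back("STG");
    else if (R.Size == 32)
      Out.push_back("ST2G");
    else
      Out.push_back("STGloop size=" + std::to_string(R.Size));
  }
  return Out;
}
```

For example, two 16-byte slots at offsets 0 and 16 collapse into one ST2G, while slots at 0 and 48 stay as two STGs because of the gap between them.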

This code needs to run when stack slot offsets are already known, but before
FrameIndex operands in STG instructions are eliminated; that is the
reason for the new hook in PrologEpilogInserter
(TargetFrameLowering::processFunctionBeforeFrameIndicesReplaced).
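The adjacency test itself shows why the offsets must be final. The sketch below assumes a hypothetical `FrameObject` table keyed by frame index, loosely mirroring what `MachineFrameInfo::getObjectOffset` and `getObjectSize` provide in LLVM; the helper name `slotsAdjacent` is ours.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical per-slot layout record, loosely mirroring the information
// MachineFrameInfo::getObjectOffset/getObjectSize expose in LLVM.
struct FrameObject {
  int64_t Offset; // final offset of the slot; known only after frame layout
  int64_t Size;   // slot size in bytes
};

// Two tag stores can merge only if their slots touch, in either order.
// The frame index identifies the slot; the final offsets decide adjacency.
bool slotsAdjacent(const std::map<int, FrameObject> &Frame, int FIA, int FIB) {
  assert(FIA != FIB && "a slot is not adjacent to itself");
  const FrameObject &A = Frame.at(FIA);
  const FrameObject &B = Frame.at(FIB);
  return A.Offset + A.Size == B.Offset || B.Offset + B.Size == A.Offset;
}
```

Before frame layout the offsets do not exist yet; after frame-index elimination the operand holds only a base register and an immediate, so the slot identity is no longer directly available. Running in between avoids both problems.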

This change modifies the STGloop and STZGloop pseudos to take the size as an
immediate integer operand, and the base address as a frame index (FI) operand
when possible. This simplifies recognizing, after register allocation, that an
STGloop instruction operates on a stack slot.
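A simplified model of the reworked pseudo. The diff below reads the scratch size register from operand 0, the address register from operand 1, and the byte count as the immediate operand 2; treating operand 3 as the base address (a frame index when possible, otherwise a register) is our assumption, as are all type names. It illustrates why an immediate size plus an FI base turns "does this loop tag exactly this slot?" into a local operand comparison.

```cpp
#include <cassert>
#include <cstdint>
#include <variant>

// Hypothetical operand kinds: a frame index (before elimination) or a register.
struct FrameIndexOp { int Index; };
struct RegOp { unsigned Reg; };
using AddrOp = std::variant<FrameIndexOp, RegOp>;

// Simplified model of the reworked STGloop/STZGloop operand list.
struct STGloopModel {
  unsigned SizeReg;    // operand 0: scratch register used as the loop counter
  unsigned AddressReg; // operand 1: address register advanced by the loop
  uint64_t SizeImm;    // operand 2: byte count as an immediate, multiple of 16
  AddrOp Base;         // operand 3 (assumed): frame index when possible
};

// With an immediate size and a frame-index base, the question is answered
// from the operands alone, without tracing which instruction set a size
// register after register allocation.
bool tagsWholeSlot(const STGloopModel &MI, int FI, uint64_t SlotSize) {
  assert(MI.SizeImm % 16 == 0);
  const FrameIndexOp *F = std::get_if<FrameIndexOp>(&MI.Base);
  return F != nullptr && F->Index == FI && MI.SizeImm == SlotSize;
}
```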

This reduces the code size of memtag-instrumented code by ~0.25%. An additional
~0.1% appears possible by rearranging the stack frame so that consecutive STG
instructions reference adjacent slots (patch pending).

Reviewers: pcc, ostannard

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70286
eugenis committed Jan 8, 2020
1 parent ba181d0 commit b675a76
Showing 13 changed files with 808 additions and 43 deletions.
llvm/include/llvm/CodeGen/TargetFrameLowering.h (7 additions, 0 deletions)
@@ -309,6 +309,13 @@ class TargetFrameLowering {
                                              RegScavenger *RS = nullptr) const {
   }
 
+  /// processFunctionBeforeFrameIndicesReplaced - This method is called
+  /// immediately before MO_FrameIndex operands are eliminated, but after the
+  /// frame is finalized. This method is optional.
+  virtual void
+  processFunctionBeforeFrameIndicesReplaced(MachineFunction &MF,
+                                            RegScavenger *RS = nullptr) const {}
+
   virtual unsigned getWinEHParentFrameOffset(const MachineFunction &MF) const {
     report_fatal_error("WinEH not implemented for this target");
   }
llvm/lib/CodeGen/PrologEpilogInserter.cpp (4 additions, 0 deletions)
@@ -259,6 +259,10 @@ bool PEI::runOnMachineFunction(MachineFunction &MF) {
   for (auto &I : EntryDbgValues)
     I.first->insert(I.first->begin(), I.second.begin(), I.second.end());
 
+  // Allow the target machine to make final modifications to the function
+  // before the frame layout is finalized.
+  TFI->processFunctionBeforeFrameIndicesReplaced(MF, RS);
+
   // Replace all MO_FrameIndex operands with physical register references
   // and actual offsets.
   //
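A toy model of where the new call sits in PEI, with hypothetical names throughout (`MiniInstr`, `MiniFrameLowering`, `runMiniPEI`). Following the header comment in TargetFrameLowering.h, the hook runs after the frame is finalized and immediately before MO_FrameIndex operands are eliminated, so a target override sees both final slot offsets and still-symbolic frame-index operands; as in the real header, the base-class default does nothing.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical miniature of the PEI ordering; none of these names are LLVM's.
// A tag store refers to its slot by frame index until indices are replaced.
struct MiniInstr {
  std::string Opcode;
  int FrameIndex = -1; // >= 0 while the operand is still a frame index
  long SPOffset = 0;   // meaningful only after replacement
};

// Mirrors the optional-hook pattern: the base class provides a no-op.
struct MiniFrameLowering {
  virtual ~MiniFrameLowering() = default;
  virtual void beforeFrameIndicesReplaced(std::vector<MiniInstr> &) const {}
};

// A target override that records whether frame indices were still present.
struct RecordingFrameLowering : MiniFrameLowering {
  mutable bool SawFrameIndices = false;
  void beforeFrameIndicesReplaced(std::vector<MiniInstr> &Code) const override {
    SawFrameIndices = !Code.empty();
    for (const MiniInstr &I : Code)
      SawFrameIndices = SawFrameIndices && I.FrameIndex >= 0;
  }
};

void runMiniPEI(std::vector<MiniInstr> &Code, const std::map<int, long> &Offsets,
                const MiniFrameLowering &TFI) {
  // Step 1 (assumed done): every slot already has its final offset in Offsets.
  // Step 2: the target hook sees final offsets and symbolic frame indices.
  TFI.beforeFrameIndicesReplaced(Code);
  // Step 3: rewrite frame-index operands into concrete SP-relative offsets.
  for (MiniInstr &I : Code) {
    if (I.FrameIndex < 0)
      continue;
    assert(Offsets.count(I.FrameIndex) && "slot must have a final offset");
    I.SPOffset = Offsets.at(I.FrameIndex);
    I.FrameIndex = -1;
  }
}
```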
llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp (20 additions, 4 deletions)
@@ -349,22 +349,38 @@ bool AArch64ExpandPseudo::expandSetTagLoop(
     MachineBasicBlock::iterator &NextMBBI) {
   MachineInstr &MI = *MBBI;
   DebugLoc DL = MI.getDebugLoc();
-  Register SizeReg = MI.getOperand(2).getReg();
-  Register AddressReg = MI.getOperand(3).getReg();
+  Register SizeReg = MI.getOperand(0).getReg();
+  Register AddressReg = MI.getOperand(1).getReg();
 
   MachineFunction *MF = MBB.getParent();
 
   bool ZeroData = MI.getOpcode() == AArch64::STZGloop;
-  const unsigned OpCode =
+  const unsigned OpCode1 =
+      ZeroData ? AArch64::STZGPostIndex : AArch64::STGPostIndex;
+  const unsigned OpCode2 =
       ZeroData ? AArch64::STZ2GPostIndex : AArch64::ST2GPostIndex;
 
+  unsigned Size = MI.getOperand(2).getImm();
+  assert(Size > 0 && Size % 16 == 0);
+  if (Size % (16 * 2) != 0) {
+    BuildMI(MBB, MBBI, DL, TII->get(OpCode1), AddressReg)
+        .addReg(AddressReg)
+        .addReg(AddressReg)
+        .addImm(1);
+    Size -= 16;
+  }
+  MachineBasicBlock::iterator I =
+      BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVi64imm), SizeReg)
+          .addImm(Size);
+  expandMOVImm(MBB, I, 64);
+
   auto LoopBB = MF->CreateMachineBasicBlock(MBB.getBasicBlock());
   auto DoneBB = MF->CreateMachineBasicBlock(MBB.getBasicBlock());
 
   MF->insert(++MBB.getIterator(), LoopBB);
   MF->insert(++LoopBB->getIterator(), DoneBB);
 
-  BuildMI(LoopBB, DL, TII->get(OpCode))
+  BuildMI(LoopBB, DL, TII->get(OpCode2))
       .addDef(AddressReg)
       .addReg(AddressReg)
       .addReg(AddressReg)
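A sketch of the size arithmetic in the expansion above, leaving out the actual MachineInstr building. `TagLoopPlan` and `planTagLoop` are hypothetical. The size is a multiple of 16; for an odd number of granules, one STG (or STZG) post-index store peels off 16 bytes, and the remainder is materialized into SizeReg for a loop of 32-byte ST2G (or STZ2G) stores. The loop tail is cut off in this view, so the requirement that at least one full iteration remains after peeling is our assumption.

```cpp
#include <cassert>

// Hypothetical summary of what expandSetTagLoop emits for a given byte size.
struct TagLoopPlan {
  bool PeeledSTG;       // one 16-byte STG/STZG post-index store before the loop
  unsigned LoopCounter; // value materialized into SizeReg via MOVi64imm
  unsigned Iterations;  // 32-byte ST2G/STZ2G post-index stores in the loop
};

TagLoopPlan planTagLoop(unsigned Size) {
  assert(Size > 0 && Size % 16 == 0);
  TagLoopPlan Plan{false, 0, 0};
  if (Size % (16 * 2) != 0) {
    // Odd number of granules: tag one up front (post-index imm 1 = 16 bytes).
    Plan.PeeledSTG = true;
    Size -= 16;
  }
  // Assumption: the loop body runs at least once, so 32 bytes must remain.
  assert(Size >= 32 && "expects at least one ST2G iteration");
  Plan.LoopCounter = Size;
  Plan.Iterations = Size / 32;
  return Plan;
}
```

With a 48-byte slot, for instance, the plan is one peeled STG followed by a single ST2G iteration.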