[AArch64] Allow LDR merge with same destination register by renaming #71908
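@llvm/pr-subscribers-backend-aarch64

Author: Zhaoxuan Jiang (nocchijiang)

Changes

The patch is based on a reverted patch: https://reviews.llvm.org/D103597. That patch tried to rename registers before the alias check, which is not safe and caused miscompiles. This patch does 2 things:

1. Do the renaming only after the necessary checks have passed, including the alias check.
2. Rename the register for the instructions between the pair and combine the second load into the first. By doing so we only need to check renamability between the pair and avoid scanning an unknown number of instructions before/after the pair.

Necessary refactoring has been made in order to reuse as much code as possible with STR renaming.

Patch is 61.47 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/71908.diff

10 Files Affected: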
diff --git a/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp b/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
index 299ea43a539edb7..3827cef566f12ee 100644
--- a/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
+++ b/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
@@ -816,12 +816,11 @@ AArch64LoadStoreOpt::mergeNarrowZeroStores(MachineBasicBlock::iterator I,
return NextI;
}
-// Apply Fn to all instructions between MI and the beginning of the block, until
-// a def for DefReg is reached. Returns true, iff Fn returns true for all
-// visited instructions. Stop after visiting Limit iterations.
-static bool forAllMIsUntilDef(MachineInstr &MI, MCPhysReg DefReg,
- const TargetRegisterInfo *TRI, unsigned Limit,
- std::function<bool(MachineInstr &, bool)> &Fn) {
+static bool forAllMIsUntil(MachineInstr &MI,
+ std::function<bool(MachineInstr &MI, bool)> Until,
+ MCPhysReg DefReg, const TargetRegisterInfo *TRI,
+ unsigned Limit,
+ std::function<bool(MachineInstr &, bool)> &Fn) {
auto MBB = MI.getParent();
for (MachineInstr &I :
instructionsWithoutDebug(MI.getReverseIterator(), MBB->instr_rend())) {
@@ -841,6 +840,17 @@ static bool forAllMIsUntilDef(MachineInstr &MI, MCPhysReg DefReg,
return true;
}
+// Apply Fn to all instructions between MI and the beginning of the block, until
+// a def for DefReg is reached. Returns true, iff Fn returns true for all
+// visited instructions. Stop after visiting Limit iterations.
+static bool forAllMIsUntilDef(MachineInstr &MI, MCPhysReg DefReg,
+ const TargetRegisterInfo *TRI, unsigned Limit,
+ std::function<bool(MachineInstr &, bool)> &Fn) {
+ return forAllMIsUntil(
+ MI, [](MachineInstr &, bool IsDef) { return IsDef; }, DefReg, TRI, Limit,
+ Fn);
+}
+
static void updateDefinedRegisters(MachineInstr &MI, LiveRegUnits &Units,
const TargetRegisterInfo *TRI) {
@@ -875,7 +885,7 @@ AArch64LoadStoreOpt::mergePairedInsns(MachineBasicBlock::iterator I,
bool MergeForward = Flags.getMergeForward();
std::optional<MCPhysReg> RenameReg = Flags.getRenameReg();
- if (MergeForward && RenameReg) {
+ if (RenameReg) {
MCRegister RegToRename = getLdStRegOp(*I).getReg();
DefinedInBB.addReg(*RenameReg);
@@ -892,7 +902,8 @@ AArch64LoadStoreOpt::mergePairedInsns(MachineBasicBlock::iterator I,
};
std::function<bool(MachineInstr &, bool)> UpdateMIs =
- [this, RegToRename, GetMatchingSubReg](MachineInstr &MI, bool IsDef) {
+ [this, RegToRename, GetMatchingSubReg, MergeForward](MachineInstr &MI,
+ bool IsDef) {
if (IsDef) {
bool SeenDef = false;
for (unsigned OpIdx = 0; OpIdx < MI.getNumOperands(); ++OpIdx) {
@@ -900,7 +911,8 @@ AArch64LoadStoreOpt::mergePairedInsns(MachineBasicBlock::iterator I,
// Rename the first explicit definition and all implicit
// definitions matching RegToRename.
if (MOP.isReg() && !MOP.isDebug() && MOP.getReg() &&
- (!SeenDef || (MOP.isDef() && MOP.isImplicit())) &&
+ (!MergeForward || !SeenDef ||
+ (MOP.isDef() && MOP.isImplicit())) &&
TRI->regsOverlap(MOP.getReg(), RegToRename)) {
assert((MOP.isImplicit() ||
(MOP.isRenamable() && !MOP.isEarlyClobber())) &&
@@ -938,20 +950,35 @@ AArch64LoadStoreOpt::mergePairedInsns(MachineBasicBlock::iterator I,
LLVM_DEBUG(dbgs() << "Renamed " << MI << "\n");
return true;
};
- forAllMIsUntilDef(*I, RegToRename, TRI, LdStLimit, UpdateMIs);
+ if (MergeForward)
+ forAllMIsUntilDef(*I, RegToRename, TRI, LdStLimit, UpdateMIs);
+ else
+ forAllMIsUntil(
+ *std::prev(Paired),
+ [&I](MachineInstr &MI, bool) { return &MI == &*I; }, RegToRename, TRI,
+ LdStLimit, UpdateMIs);
#if !defined(NDEBUG)
- // Make sure the register used for renaming is not used between the paired
- // instructions. That would trash the content before the new paired
- // instruction.
+ // For forward merging store:
+ // Make sure the register used for renaming is not used between the
+ // paired instructions. That would trash the content before the new
+ // paired instruction.
+ MCPhysReg RegToCheck = *RenameReg;
+ // For backward merging load:
+ // Make sure the register being renamed is not used between the
+ // paired instructions. That would trash the content after the new
+ // paired instruction.
+ if (!MergeForward)
+ RegToCheck = RegToRename;
for (auto &MI :
iterator_range<MachineInstrBundleIterator<llvm::MachineInstr>>(
- std::next(I), std::next(Paired)))
+ MergeForward ? std::next(I) : I,
+ MergeForward ? std::next(Paired) : Paired))
assert(all_of(MI.operands(),
- [this, &RenameReg](const MachineOperand &MOP) {
+ [this, RegToCheck](const MachineOperand &MOP) {
return !MOP.isReg() || MOP.isDebug() || !MOP.getReg() ||
MOP.isUndef() ||
- !TRI->regsOverlap(MOP.getReg(), *RenameReg);
+ !TRI->regsOverlap(MOP.getReg(), RegToCheck);
}) &&
"Rename register used between paired instruction, trashing the "
"content");
@@ -1397,6 +1424,38 @@ static bool areCandidatesToMergeOrPair(MachineInstr &FirstMI, MachineInstr &MI,
// FIXME: Can we also match a mixed sext/zext unscaled/scaled pair?
}
+static bool canRenameMOP(const MachineOperand &MOP,
+ const TargetRegisterInfo *TRI) {
+ if (MOP.isReg()) {
+ auto *RegClass = TRI->getMinimalPhysRegClass(MOP.getReg());
+ // Renaming registers with multiple disjunct sub-registers (e.g. the
+ // result of a LD3) means that all sub-registers are renamed, potentially
+ // impacting other instructions we did not check. Bail out.
+ // Note that this relies on the structure of the AArch64 register file. In
+ // particular, a subregister cannot be written without overwriting the
+ // whole register.
+ if (RegClass->HasDisjunctSubRegs) {
+ LLVM_DEBUG(
+ dbgs()
+ << " Cannot rename operands with multiple disjunct subregisters ("
+ << MOP << ")\n");
+ return false;
+ }
+
+ // We cannot rename arbitrary implicit-defs, the specific rule to rewrite
+ // them must be known. For example, in ORRWrs the implicit-def
+ // corresponds to the result register.
+ if (MOP.isImplicit() && MOP.isDef()) {
+ if (!isRewritableImplicitDef(MOP.getParent()->getOpcode()))
+ return false;
+ return TRI->isSuperOrSubRegisterEq(
+ MOP.getParent()->getOperand(0).getReg(), MOP.getReg());
+ }
+ }
+ return MOP.isImplicit() ||
+ (MOP.isRenamable() && !MOP.isEarlyClobber() && !MOP.isTied());
+}
+
static bool
canRenameUpToDef(MachineInstr &FirstMI, LiveRegUnits &UsedInBetween,
SmallPtrSetImpl<const TargetRegisterClass *> &RequiredClasses,
@@ -1406,10 +1465,6 @@ canRenameUpToDef(MachineInstr &FirstMI, LiveRegUnits &UsedInBetween,
// Check if we can find an unused register which we can use to rename
// the register used by the first load/store.
- auto *RegClass = TRI->getMinimalPhysRegClass(getLdStRegOp(FirstMI).getReg());
- MachineFunction &MF = *FirstMI.getParent()->getParent();
- if (!RegClass || !MF.getRegInfo().tracksLiveness())
- return false;
auto RegToRename = getLdStRegOp(FirstMI).getReg();
// For now, we only rename if the store operand gets killed at the store.
@@ -1423,36 +1478,6 @@ canRenameUpToDef(MachineInstr &FirstMI, LiveRegUnits &UsedInBetween,
LLVM_DEBUG(dbgs() << " Operand not killed at " << FirstMI << "\n");
return false;
}
- auto canRenameMOP = [TRI](const MachineOperand &MOP) {
- if (MOP.isReg()) {
- auto *RegClass = TRI->getMinimalPhysRegClass(MOP.getReg());
- // Renaming registers with multiple disjunct sub-registers (e.g. the
- // result of a LD3) means that all sub-registers are renamed, potentially
- // impacting other instructions we did not check. Bail out.
- // Note that this relies on the structure of the AArch64 register file. In
- // particular, a subregister cannot be written without overwriting the
- // whole register.
- if (RegClass->HasDisjunctSubRegs) {
- LLVM_DEBUG(
- dbgs()
- << " Cannot rename operands with multiple disjunct subregisters ("
- << MOP << ")\n");
- return false;
- }
-
- // We cannot rename arbitrary implicit-defs, the specific rule to rewrite
- // them must be known. For example, in ORRWrs the implicit-def
- // corresponds to the result register.
- if (MOP.isImplicit() && MOP.isDef()) {
- if (!isRewritableImplicitDef(MOP.getParent()->getOpcode()))
- return false;
- return TRI->isSuperOrSubRegisterEq(
- MOP.getParent()->getOperand(0).getReg(), MOP.getReg());
- }
- }
- return MOP.isImplicit() ||
- (MOP.isRenamable() && !MOP.isEarlyClobber() && !MOP.isTied());
- };
bool FoundDef = false;
@@ -1495,7 +1520,7 @@ canRenameUpToDef(MachineInstr &FirstMI, LiveRegUnits &UsedInBetween,
if (!MOP.isReg() || !MOP.isDef() || MOP.isDebug() || !MOP.getReg() ||
!TRI->regsOverlap(MOP.getReg(), RegToRename))
continue;
- if (!canRenameMOP(MOP)) {
+ if (!canRenameMOP(MOP, TRI)) {
LLVM_DEBUG(dbgs()
<< " Cannot rename " << MOP << " in " << MI << "\n");
return false;
@@ -1509,7 +1534,7 @@ canRenameUpToDef(MachineInstr &FirstMI, LiveRegUnits &UsedInBetween,
!TRI->regsOverlap(MOP.getReg(), RegToRename))
continue;
- if (!canRenameMOP(MOP)) {
+ if (!canRenameMOP(MOP, TRI)) {
LLVM_DEBUG(dbgs()
<< " Cannot rename " << MOP << " in " << MI << "\n");
return false;
@@ -1530,6 +1555,56 @@ canRenameUpToDef(MachineInstr &FirstMI, LiveRegUnits &UsedInBetween,
return true;
}
+// We want to merge the second load into the first by rewriting the usages of
+// the same reg between first (incl.) and second (excl.). We don't need to care
+// about any insns before FirstLoad or after SecondLoad.
+// 1. The second load writes new value into the same reg.
+// - The renaming is impossible to impact later use of the reg.
+// - The second load always trash the value written by the first load which
+// means the reg must be killed before the second load.
+// 2. The first load must be a def for the same reg so we don't need to look
+// into anything before it.
+static bool canRenameUntilSecondLoad(
+ MachineInstr &FirstLoad, MachineInstr &SecondLoad,
+ LiveRegUnits &UsedInBetween,
+ SmallPtrSetImpl<const TargetRegisterClass *> &RequiredClasses,
+ const TargetRegisterInfo *TRI) {
+ if (FirstLoad.isPseudo())
+ return false;
+
+ UsedInBetween.accumulate(FirstLoad);
+ auto RegToRename = getLdStRegOp(FirstLoad).getReg();
+ bool Success = std::all_of(
+ FirstLoad.getIterator(), SecondLoad.getIterator(),
+ [&](MachineBasicBlock::iterator MBBI) {
+ MachineInstr &MI = *MBBI;
+
+ LLVM_DEBUG(dbgs() << "Checking " << MI << "\n");
+ // Currently we do not try to rename across frame-setup instructions.
+ if (MI.getFlag(MachineInstr::FrameSetup)) {
+ LLVM_DEBUG(dbgs()
+ << " Cannot rename framesetup instructions currently ("
+ << MI << ")\n");
+ return false;
+ }
+
+ for (auto &MOP : MI.operands()) {
+ if (!MOP.isReg() || MOP.isDebug() || !MOP.getReg() ||
+ !TRI->regsOverlap(MOP.getReg(), RegToRename))
+ continue;
+ if (!canRenameMOP(MOP, TRI)) {
+ LLVM_DEBUG(dbgs()
+ << " Cannot rename " << MOP << " in " << MI << "\n");
+ return false;
+ }
+ RequiredClasses.insert(TRI->getMinimalPhysRegClass(MOP.getReg()));
+ }
+
+ return true;
+ });
+ return Success;
+}
+
// Check if we can find a physical register for renaming \p Reg. This register
// must:
// * not be defined already in \p DefinedInBB; DefinedInBB must contain all
@@ -1578,6 +1653,41 @@ static std::optional<MCPhysReg> tryToFindRegisterToRename(
return std::nullopt;
}
+// For store pairs: returns a register from FirstMI to the beginning of the
+// block that can be renamed.
+// For load pairs: returns a register from FirstMI to MI that can be renamed.
+static std::optional<MCPhysReg> findRenameRegForSameLdStRegPair(
+ std::optional<bool> MaybeCanRename, MachineInstr &FirstMI, MachineInstr &MI,
+ Register Reg, LiveRegUnits &DefinedInBB, LiveRegUnits &UsedInBetween,
+ SmallPtrSetImpl<const TargetRegisterClass *> &RequiredClasses,
+ const TargetRegisterInfo *TRI) {
+ std::optional<MCPhysReg> RenameReg;
+ if (!DebugCounter::shouldExecute(RegRenamingCounter))
+ return RenameReg;
+
+ auto *RegClass = TRI->getMinimalPhysRegClass(getLdStRegOp(FirstMI).getReg());
+ MachineFunction &MF = *FirstMI.getParent()->getParent();
+ if (!RegClass || !MF.getRegInfo().tracksLiveness())
+ return RenameReg;
+
+ const bool IsLoad = FirstMI.mayLoad();
+
+ if (!MaybeCanRename) {
+ if (IsLoad)
+ MaybeCanRename = {canRenameUntilSecondLoad(FirstMI, MI, UsedInBetween,
+ RequiredClasses, TRI)};
+ else
+ MaybeCanRename = {
+ canRenameUpToDef(FirstMI, UsedInBetween, RequiredClasses, TRI)};
+ }
+
+ if (*MaybeCanRename) {
+ RenameReg = tryToFindRegisterToRename(MF, Reg, DefinedInBB, UsedInBetween,
+ RequiredClasses, TRI);
+ }
+ return RenameReg;
+}
+
/// Scan the instructions looking for a load/store that can be combined with the
/// current instruction into a wider equivalent or a load/store pair.
MachineBasicBlock::iterator
@@ -1730,17 +1840,6 @@ AArch64LoadStoreOpt::findMatchingInsn(MachineBasicBlock::iterator I,
continue;
}
}
- // If the destination register of one load is the same register or a
- // sub/super register of the other load, bail and keep looking. A
- // load-pair instruction with both destination registers the same is
- // UNPREDICTABLE and will result in an exception.
- if (MayLoad &&
- TRI->isSuperOrSubRegisterEq(Reg, getLdStRegOp(MI).getReg())) {
- LiveRegUnits::accumulateUsedDefed(MI, ModifiedRegUnits, UsedRegUnits,
- TRI);
- MemInsns.push_back(&MI);
- continue;
- }
// If the BaseReg has been modified, then we cannot do the optimization.
// For example, in the following pattern
@@ -1751,17 +1850,37 @@ AArch64LoadStoreOpt::findMatchingInsn(MachineBasicBlock::iterator I,
if (!ModifiedRegUnits.available(BaseReg))
return E;
+ const bool SameLoadReg = MayLoad && TRI->isSuperOrSubRegisterEq(
+ Reg, getLdStRegOp(MI).getReg());
+
// If the Rt of the second instruction was not modified or used between
// the two instructions and none of the instructions between the second
// and first alias with the second, we can combine the second into the
// first.
if (ModifiedRegUnits.available(getLdStRegOp(MI).getReg()) &&
- !(MI.mayLoad() &&
+ !(MI.mayLoad() && !SameLoadReg &&
!UsedRegUnits.available(getLdStRegOp(MI).getReg())) &&
!mayAlias(MI, MemInsns, AA)) {
+ // For pairs loading into the same reg, try to find a renaming
+ // opportunity to allow the renaming of Reg between FirstMI and MI
+ // and combine MI into FirstMI; otherwise bail and keep looking.
+ if (SameLoadReg) {
+ std::optional<MCPhysReg> RenameReg =
+ findRenameRegForSameLdStRegPair(MaybeCanRename, FirstMI, MI,
+ Reg, DefinedInBB, UsedInBetween,
+ RequiredClasses, TRI);
+ if (!RenameReg) {
+ LiveRegUnits::accumulateUsedDefed(MI, ModifiedRegUnits,
+ UsedRegUnits, TRI);
+ MemInsns.push_back(&MI);
+ continue;
+ }
+ Flags.setRenameReg(*RenameReg);
+ }
Flags.setMergeForward(false);
- Flags.clearRenameReg();
+ if (!SameLoadReg)
+ Flags.clearRenameReg();
return MBBI;
}
@@ -1779,22 +1898,13 @@ AArch64LoadStoreOpt::findMatchingInsn(MachineBasicBlock::iterator I,
return MBBI;
}
- if (DebugCounter::shouldExecute(RegRenamingCounter)) {
- if (!MaybeCanRename)
- MaybeCanRename = {canRenameUpToDef(FirstMI, UsedInBetween,
- RequiredClasses, TRI)};
-
- if (*MaybeCanRename) {
- std::optional<MCPhysReg> MaybeRenameReg =
- tryToFindRegisterToRename(*FirstMI.getParent()->getParent(),
- Reg, DefinedInBB, UsedInBetween,
- RequiredClasses, TRI);
- if (MaybeRenameReg) {
- Flags.setRenameReg(*MaybeRenameReg);
- Flags.setMergeForward(true);
- MBBIWithRenameReg = MBBI;
- }
- }
+ std::optional<MCPhysReg> RenameReg = findRenameRegForSameLdStRegPair(
+ MaybeCanRename, FirstMI, MI, Reg, DefinedInBB, UsedInBetween,
+ RequiredClasses, TRI);
+ if (RenameReg) {
+ Flags.setMergeForward(true);
+ Flags.setRenameReg(*RenameReg);
+ MBBIWithRenameReg = MBBI;
}
}
// Unable to combine these instructions due to interference in between.
diff --git a/llvm/test/CodeGen/AArch64/arm64-variadic-aapcs.ll b/llvm/test/CodeGen/AArch64/arm64-variadic-aapcs.ll
index de07ed1b5d7ec96..e688af7fdeca38a 100644
--- a/llvm/test/CodeGen/AArch64/arm64-variadic-aapcs.ll
+++ b/llvm/test/CodeGen/AArch64/arm64-variadic-aapcs.ll
@@ -125,9 +125,8 @@ define dso_local void @test_va_copy() {
; CHECK: add x[[SRC:[0-9]+]], {{x[0-9]+}}, :lo12:var
-; CHECK: ldr [[BLOCKB:q[0-9]+]], [x[[SRC]], #16]
+; CHECK: ldp [[BLOCKA:q[0-9]+]], [[BLOCKB:q[0-9]+]], [x[[SRC]]]
; CHECK: add x[[DST:[0-9]+]], {{x[0-9]+}}, :lo12:second_list
-; CHECK: ldr [[BLOCKA:q[0-9]+]], [x[[SRC]]]
; CHECK: stp [[BLOCKA]], [[BLOCKB]], [x[[DST]]]
ret void
; CHECK: ret
diff --git a/llvm/test/CodeGen/AArch64/fexplog.ll b/llvm/test/CodeGen/AArch64/fexplog.ll
index be6040faa2836bf..26c0b68307b32b4 100644
--- a/llvm/test/CodeGen/AArch64/fexplog.ll
+++ b/llvm/test/CodeGen/AArch64/fexplog.ll
@@ -713,14 +713,12 @@ define <7 x half> @exp_v7f16(<7 x half> %a) {
; CHECK-GI-NEXT: ldr x30, [sp, #144] // 8-byte Folded Reload
; CHECK-GI-NEXT: ldp d11, d10, [sp, #112] // 16-byte Folded Reload
; CHECK-GI-NEXT: mov v1.h[1], v2.h[0]
-; CHECK-GI-NEXT: ldr q2, [sp, #48] // 16-byte Folded Reload
+; CHECK-GI-NEXT: ldp q2, q3, [sp, #32] // 32-byte Folded Reload
; CHECK-GI-NEXT: ldp d13, d12, [sp, #96] // 16-byte Folded Reload
-; CHECK-GI-NEXT: mov v1.h[2], v2.h[0]
-; CHECK-GI-NEXT: ldr q2, [sp, #32] // 16-byte Folded Reload
+; CHECK-GI-NEXT: mov v1.h[2], v3.h[0]
; CHECK-GI-NEXT: mov v1.h[3], v2.h[0]
-; CHECK-GI-NEXT: ldr q2, [sp, #16] // 16-byte Folded Reload
-; CHECK-GI-NEXT: mov v1.h[4], v2.h[0]
-; CHECK-GI-NEXT: ldr q2, [sp] // 16-byte Folded Reload
+; CHECK-GI-NEXT: ldp q2, q3, [sp] // 32-byte Folded Reload
+; CHECK-GI-NEXT: mov v1.h[4], v3.h[0]
; CHECK-GI-NEXT: mov v1.h[5], v2.h[0]
; CHECK-GI-NEXT: mov v1.h[6]...
[truncated]
@mstorsjo @sjoerdmeijer @fhahn @MeeraN7 Since I don't have commit access, please help add reviewers for me and take a look at the PR when you are free. Thanks!
Hi. This looks like an interesting patch. Have you tried running a bootstrap or the llvm-test-suite, to make sure it behaves OK? Thanks
I tested the patch by running llvm-test-suite before rebasing. I'm not familiar with bootstrapping. I'm working on an Apple Silicon Mac, so I followed this guide to build Apple clang, but had no luck.
Update: I just tried the simplest bootstrapping configuration described in the guide above, and the bootstrapping process finished successfully. Please let me know if I should test anything else.
d95829d to 126c69f
Rebased onto latest
Thanks. Whenever we try this kind of thing there often seems to be a high chance of problems occurring. Please make sure you have tested thoroughly, but from what I can tell this LGTM.
The patch is based on a reverted patch: https://reviews.llvm.org/D103597. It was trying to rename registers before alias check, which is not safe and causes miscompiles. This patch does 2 things: 1. Do the renaming with necessary checks passed, including alias check. 2. Rename the register for the instructions between the pairs and combine the second load into the first. By doing so we can just check the renamability between the pairs and avoid scanning unknown amount of instructions before/after the pairs. Necessary refactoring has been made in order to reuse as much code as possible with STR renaming.
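The second point of the commit message can be illustrated with a toy model. This is a minimal sketch, not LLVM code: `ToyInst`, `mentions` and `renameBetweenLoads` are hypothetical names, registers are plain strings, and the free-register test is far cruder than `tryToFindRegisterToRename`. It only shows the shape of the idea: rename the shared destination register in the first load and in every instruction up to, but not including, the second load.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Toy instruction: one optional destination register and a list of source
// registers. Hypothetical stand-in for MachineInstr, not the LLVM API.
struct ToyInst {
  std::string Opcode;
  std::string Def;               // empty if the instruction defines nothing
  std::vector<std::string> Uses; // registers read by the instruction
};

// Returns true if Reg is read or written anywhere in Block.
static bool mentions(const std::vector<ToyInst> &Block,
                     const std::string &Reg) {
  for (const ToyInst &I : Block) {
    if (I.Def == Reg)
      return true;
    for (const std::string &U : I.Uses)
      if (U == Reg)
        return true;
  }
  return false;
}

// Rename the destination register of the first load in [First, Second), i.e.
// in the first load and everything up to, but not including, the second load.
// Returns the chosen register, or std::nullopt if no candidate is free.
static std::optional<std::string>
renameBetweenLoads(std::vector<ToyInst> &Block, std::size_t First,
                   std::size_t Second,
                   const std::vector<std::string> &Candidates) {
  assert(First < Second && Second < Block.size());
  const std::string Reg = Block[First].Def;
  assert(Reg == Block[Second].Def && "both loads must write the same register");
  for (const std::string &New : Candidates) {
    // Assumption of this sketch: a candidate is only usable if it does not
    // appear anywhere in the block. The real pass tracks liveness instead.
    if (mentions(Block, New))
      continue;
    for (std::size_t Idx = First; Idx < Second; ++Idx) {
      if (Block[Idx].Def == Reg)
        Block[Idx].Def = New;
      for (std::string &U : Block[Idx].Uses)
        if (U == Reg)
          U = New;
    }
    return New;
  }
  return std::nullopt;
}
```

For a block of the form `ldr q2; mov (reads q2); ldr q2; mov (reads q2)`, calling `renameBetweenLoads(Block, 0, 2, {"q2", "q3"})` picks `q3` and rewrites only the first load and the first `mov`. That is the same pattern as the `ldp q2, q3, ...` lines in the `fexplog.ll` change in the diff.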
b99641a to 1dca0af
Rebased, squashed all commits and addressed comments. @davemgreen Since I don't have commit access yet, could you please merge the PR for me? Before pushing the update, I tested the patch by compiling Chromium with the bootstrapped Clang; all of Chromium's unit tests pass on an Apple Silicon Mac.
Thanks. Sounds good.
This includes a couple of fixes after #71908 for bundles and some cleanup for the debug output. One was an iterator type that asserted on bundles; the second was a rather subtle issue where forAllMIsUntilDef would hit the LdStLimit when renaming registers, meaning the last instruction was not updated, leaving an invalid `ldp x6, x6` instruction.
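The limit problem described in this commit message can be reproduced in miniature. The following is a sketch under stated assumptions, not the LLVM implementation and not the actual fix: `walkBackUntil` is a hypothetical helper loosely shaped like `forAllMIsUntil` from the diff, operating on a vector of register names instead of machine instructions. It shows how a step budget can end a backward renaming walk before the stop point is reached, so the earliest element keeps the old register.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Walk backwards from Start, applying Fn to each element, and stop after the
// element for which Until returns true. Returns true only if that element was
// reached within Limit steps. Hypothetical helper for illustration only.
static bool walkBackUntil(std::vector<std::string> &Regs, std::size_t Start,
                          unsigned Limit,
                          const std::function<bool(std::size_t)> &Until,
                          const std::function<void(std::string &)> &Fn) {
  assert(Start < Regs.size());
  for (std::size_t Idx = Start + 1; Idx-- > 0;) {
    if (Limit == 0)
      return false; // budget exhausted before the stop point was visited
    --Limit;
    Fn(Regs[Idx]);
    if (Until(Idx))
      return true;
  }
  return false; // ran off the beginning without meeting the stop condition
}
```

With four elements that all name `x6`, a walk from index 3 toward index 0 with a limit of 3 renames indices 3, 2 and 1 and then gives up, leaving index 0 (the first load in this analogy) untouched. A caller that ignores the return value would then build a pair from a stale register, which is the kind of situation that produces `ldp x6, x6`. Returning a flag, as this sketch does, is one possible way to let the caller detect the incomplete walk.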
Target hook `canPairLdStOpc` is missing quite a few opcodes for which LDPs/STPs can be created. I was hoping that it would not be necessary to add these missing opcodes here and that the attached motivating test case would be handled by the LoadStoreOptimiser (especially after llvm#71908), but it's not. The problem is that after register allocation some things are a lot harder to do. Consider this motivating example:

[1] renamable $q1 = LDURQi renamable $x9, -16 :: (load (s128) from %ir.r51, align 8, !tbaa !0)
[2] renamable $q2 = LDURQi renamable $x0, -16 :: (load (s128) from %ir.r53, align 8, !tbaa !4)
[3] renamable $q1 = nnan ninf nsz arcp contract afn reassoc nofpexcept FMLSv2f64 killed renamable $q1(tied-def 0), killed renamable $q2, renamable $q0, implicit $fpcr
[4] STURQi killed renamable $q1, renamable $x9, -16 :: (store (s128) into %ir.r51, align 1, !tbaa !0)
[5] renamable $q1 = LDRQui renamable $x9, 0 :: (load (s128) from %ir.r.G0001_609.0, align 8, !tbaa !0)

We can't combine the load in line [5] into the load on [1]: register q1 is used in between. And we can't combine [1] into [5]: it is aliasing with the STR on line [4]. That's why I thought that adding this opcode pair here to the MI scheduler was a good compromise.
Target hook `canPairLdStOpc` is missing quite a few opcodes for which LDPs/STPs can be created. I was hoping that it would not be necessary to add these missing opcodes here and that the attached motivating test case would be handled by the LoadStoreOptimiser (especially after #71908), but it's not. The problem is that after register allocation some things are a lot harder to do. Consider this for the motivating example:

```
[1] renamable $q1 = LDURQi renamable $x9, -16 :: (load (s128) from %ir.r51, align 8, !tbaa !0)
[2] renamable $q2 = LDURQi renamable $x0, -16 :: (load (s128) from %ir.r53, align 8, !tbaa !4)
[3] renamable $q1 = nnan ninf nsz arcp contract afn reassoc nofpexcept FMLSv2f64 killed renamable $q1(tied-def 0), killed renamable $q2, renamable $q0, implicit $fpcr
[4] STURQi killed renamable $q1, renamable $x9, -16 :: (store (s128) into %ir.r51, align 1, !tbaa !0)
[5] renamable $q1 = LDRQui renamable $x9, 0 :: (load (s128) from %ir.r.G0001_609.0, align 8, !tbaa !0)
```

We can't combine the load in line [5] into the load on [1]: register q1 is used in between. And we can't combine [1] into [5]: it is aliasing with the STR on line [4]. So, adding some missing opcodes here seems the best/easiest approach. I will follow up to add some more missing cases here.
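The two blockers named in this example can be checked mechanically on a toy encoding of the five instructions. This is a rough sketch, not the pass's logic: `ToyInst`, `canHoistSecond` and `canSinkFirst` are hypothetical names, the alias test is a plain byte-range overlap on the same base register (different bases are conservatively treated as overlapping), and register renaming and base-register updates are ignored entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy instruction record. Hypothetical stand-in, not the LLVM MachineInstr.
struct ToyInst {
  enum Kind { Load, Store, Other } K;
  std::string Base;              // base address register (memory ops only)
  int Offset;                    // byte offset from Base
  int Size;                      // access size in bytes
  std::vector<std::string> Regs; // registers read or written; for memory ops
                                 // Regs[0] is the transferred register
};

static bool touches(const ToyInst &I, const std::string &Reg) {
  for (const std::string &R : I.Regs)
    if (R == Reg)
      return true;
  return false;
}

// Crude alias test: unknown bases may overlap; same base compares byte ranges.
static bool mayOverlap(const ToyInst &A, const ToyInst &B) {
  if (A.Base != B.Base)
    return true;
  return A.Offset < B.Offset + B.Size && B.Offset < A.Offset + A.Size;
}

// Moving the second load up to the first is blocked if anything in between
// reads or writes the second load's destination register.
static bool canHoistSecond(const std::vector<ToyInst> &Block,
                           std::size_t First, std::size_t Second) {
  assert(First < Second && Second < Block.size());
  const std::string &Dst = Block[Second].Regs.at(0);
  for (std::size_t I = First + 1; I < Second; ++I)
    if (touches(Block[I], Dst))
      return false;
  return true;
}

// Moving the first load down to the second is blocked if a store in between
// may write the bytes the first load reads.
static bool canSinkFirst(const std::vector<ToyInst> &Block, std::size_t First,
                         std::size_t Second) {
  assert(First < Second && Second < Block.size());
  for (std::size_t I = First + 1; I < Second; ++I)
    if (Block[I].K == ToyInst::Store && mayOverlap(Block[I], Block[First]))
      return false;
  return true;
}
```

Encoding instructions [1] to [5] as five `ToyInst` records (loads and the store on `x9` with 16-byte accesses at offsets -16 and 0, the FMLS as `Other` touching `q1`, `q2`, `q0`) makes both checks fail for the pair ([1], [5]), matching the two reasons given in the text.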