-
Notifications
You must be signed in to change notification settings - Fork 15.7k
[AArch64] Add new pass after VirtRegRewriter to add implicit-defs #174188
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: users/sdesmalen-arm/srlt-mitigation-ldst-opt-renamable
Are you sure you want to change the base?
[AArch64] Add new pass after VirtRegRewriter to add implicit-defs #174188
Conversation
When SubRegister Liveness Tracking (SRLT) is enabled, this pass adds extra implicit-def's to instructions that define the low N bits of a GPR/FPR register to represent that the top bits are written, because all AArch64 instructions that write the low bits of a GPR/FPR also implicitly zero the top bits. These semantics are originally represented in the MIR using `SUBREG_TO_REG`, but during register coalescing this information is lost and when rewriting virtual -> physical registers the implicit-defs are not added to represent the the top bits are written. There have been several attempts to fix this in the coalescer (#168353), but each iteration has exposed new bugs and the patch had to be reverted. Additionally, the concept of adding 'implicit-def' of a virtual register during the register allocation process is particularly fragile and many places don't expect it (for example in `X86::commuteInstructionImpl` the code only looks at specific operands and does not consider implicit-defs. Similar in `SplitEditor::addDeadDef` where it traverses operand 'defs' rather than 'all_defs'). We want a temporary solution that doesn't impact other targets and is simpler and less intrusive than the patch proposed for the register coalescer so that we can enable SRLT to make better use of SVE/SME multi-vector instructions while we work on a more permanent solution that requires rewriting a large part of the AArch64 instructions (32-bit and NEON).
|
@llvm/pr-subscribers-backend-aarch64 Author: Sander de Smalen (sdesmalen-arm) ChangesWhen SubRegister Liveness Tracking (SRLT) is enabled, this pass adds extra implicit-def's to instructions that define the low N bits of a GPR/FPR register to represent that the top bits are written, because all AArch64 instructions that write the low bits of a GPR/FPR also implicitly zero the top bits. These semantics are originally represented in the MIR using There have been several attempts to fix this in the coalescer (#168353), but each iteration has exposed new bugs and the patch had to be reverted. Additionally, the concept of adding 'implicit-def' of a virtual register during the register allocation process is particularly fragile and many places don't expect it (for example in We want a temporary solution that doesn't impact other targets and is simpler and less intrusive than the patch proposed for the register coalescer so that we can enable SRLT to make better use of SVE/SME multi-vector instructions while we work on a more permanent solution that requires rewriting a large part of the AArch64 instructions (32-bit and NEON). Patch is 40.74 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/174188.diff 13 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64.h b/llvm/lib/Target/AArch64/AArch64.h
index a8e15c338352a..40983714ddf1d 100644
--- a/llvm/lib/Target/AArch64/AArch64.h
+++ b/llvm/lib/Target/AArch64/AArch64.h
@@ -64,6 +64,7 @@ FunctionPass *createAArch64CollectLOHPass();
FunctionPass *createSMEABIPass();
FunctionPass *createSMEPeepholeOptPass();
FunctionPass *createMachineSMEABIPass(CodeGenOptLevel);
+FunctionPass *createAArch64SRLTDefineSuperRegsPass();
ModulePass *createSVEIntrinsicOptsPass();
InstructionSelector *
createAArch64InstructionSelector(const AArch64TargetMachine &,
@@ -117,6 +118,7 @@ void initializeLDTLSCleanupPass(PassRegistry&);
void initializeSMEABIPass(PassRegistry &);
void initializeSMEPeepholeOptPass(PassRegistry &);
void initializeMachineSMEABIPass(PassRegistry &);
+void initializeAArch64SRLTDefineSuperRegsPass(PassRegistry &);
void initializeSVEIntrinsicOptsPass(PassRegistry &);
void initializeAArch64Arm64ECCallLoweringPass(PassRegistry &);
} // end namespace llvm
diff --git a/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp b/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
index a90cad4f19ef7..b8951adf27c05 100644
--- a/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
@@ -1383,9 +1383,8 @@ bool AArch64RegisterInfo::shouldCoalesce(
MachineFunction &MF = *MI->getMF();
MachineRegisterInfo &MRI = MF.getRegInfo();
- // Coalescing of SUBREG_TO_REG is broken when using subreg liveness tracking,
- // we must disable it for now.
- if (MI->isSubregToReg() && MRI.subRegLivenessEnabled())
+ if (MI->isSubregToReg() && MRI.subRegLivenessEnabled() &&
+ !MF.getSubtarget<AArch64Subtarget>().enableSRLTSubregToRegMitigation())
return false;
if (MI->isCopy() &&
diff --git a/llvm/lib/Target/AArch64/AArch64SRLTDefineSuperRegs.cpp b/llvm/lib/Target/AArch64/AArch64SRLTDefineSuperRegs.cpp
new file mode 100644
index 0000000000000..695155abc5b15
--- /dev/null
+++ b/llvm/lib/Target/AArch64/AArch64SRLTDefineSuperRegs.cpp
@@ -0,0 +1,265 @@
+//===- AArch64SRLTDefineSuperRegs.cpp -------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// When SubRegister Liveness Tracking (SRLT) is enabled, this pass adds
+// extra implicit-def's to instructions that define the low N bits of
+// a GPR/FPR register to also define the top bits, because all AArch64
+// instructions that write the low bits of a GPR/FPR also implicitly zero
+// the top bits. For example, 'mov w0, w1' writes zeroes to the top 32-bits of
+// x0, so this pass adds a `implicit-def $x0` after register allocation.
+//
+// These semantics are originally represented in the MIR using `SUBREG_TO_REG`
+// which expresses that the top bits have been defined by the preceding
+// instructions, but during register coalescing this information is lost and in
+// contrast to when SRTL is disabled, when rewriting virtual -> physical
+// registers the implicit-defs are not added to the instruction.
+//
+// There have been several attempts to fix this in the coalescer [1], but each
+// iteration has exposed new bugs and the patch had to be reverted.
+// Additionally, the concept of adding 'implicit-def' of a virtual register is
+// particularly fragile and many places don't expect it (for example in
+// `X86::commuteInstructionImpl` the code only looks at specific operands and
+// does not consider implicit-defs. Similar in `SplitEditor::addDeadDef` where
+// it traverses operand 'defs' rather than 'all_defs').
+//
+// We want a temporary solution that doesn't impact other targets and is simpler
+// and less intrusive than the patch proposed for the register coalescer [1], so
+// that we can enable SRLT for AArch64.
+//
+// The approach here is to just add the 'implicit-def' manually after rewriting
+// virtual regs -> phsyical regs. This still means that during the register
+// allocation process the dependences are not accurately represented in the MIR
+// and LiveIntervals, but there are several reasons why we believe this isn't a
+// problem in practice:
+// (A) The register allocator only spills entire virtual registers.
+// This is additionally guarded by code in
+// AArch64InstrInfo::storeRegToStackSlot/loadRegFromStackSlot
+// where it checks if a register matches the expected register class.
+// (B) Rematerialization only happens when the instruction writes the full
+// register.
+// (C) The high bits of the AArch64 register cannot be written independently.
+// (D) Instructions that write only part of a register always take that same
+// register as a tied input operand, to indicate it's a merging operation.
+//
+// (A) means that for two virtual registers of regclass GPR32 and GPR64, if the
+// GPR32 register is coalesced into the GPR64 vreg then the full GPR64 would
+// be spilled/filled even if only the low 32-bits would be required for the
+// given liverange. (B) means that the top bits of a GPR64 would never be
+// overwritten by rematerialising a GPR32 sub-register for a given liverange.
+// (C-D) means that we can assume that the MIR as input to the register
+// allocator correctly expresses the instruction behaviour and dependences
+// between values, so unless the register allocator would violate (A) or (B),
+// the MIR is otherwise sound.
+//
+// Alternative approaches have also been considered, such as:
+// (1) Changing the AArch64 instruction definitions to write all bits and
+// extract the low N bits for the result.
+// (2) Disabling coalescing of SUBREG_TO_REG and using regalloc hints to tell
+// the register allocator to favour the same register for the input/output.
+// (3) Adding a new coalescer guard node with a tied-operand constraint, such
+// that when the SUBREG_TO_REG is removed, something still represents that
+// the top bits are defined. The node would get removed before rewriting
+// virtregs.
+// (4) Using an explicit INSERT_SUBREG into a zero value and try to optimize
+// away the INSERT_SUBREG (this is a more explicit variant of (2) and (3))
+// (5) Adding a new MachineOperand flag that represents the top bits would be
+// defined, but are not read nor undef.
+//
+// (1) would be the best approach but would be a significant effort as it
+// requires rewriting most/all instruction definitions and fixing MIR passes
+// that rely on the current definitions, whereas (2-4) result in sub-optimal
+// code that can't really be avoided because the explicit nodes would stop
+// rematerialization. (5) might be a way to mitigate the
+// fragility of implicit-def's of virtual registers if we want to pursue
+// landing [1], but then we'd rather choose approach (1) to avoid using
+// SUBREG_TO_REG entirely.
+//
+// [1] https://github.com/llvm/llvm-project/pull/168353
+//===----------------------------------------------------------------------===//
+
+#include "AArch64InstrInfo.h"
+#include "AArch64MachineFunctionInfo.h"
+#include "AArch64Subtarget.h"
+#include "MCTargetDesc/AArch64AddressingModes.h"
+#include "llvm/ADT/BitVector.h"
+#include "llvm/ADT/SmallSet.h"
+#include "llvm/CodeGen/MachineBasicBlock.h"
+#include "llvm/CodeGen/MachineFunctionPass.h"
+#include "llvm/CodeGen/MachineRegisterInfo.h"
+#include "llvm/CodeGen/TargetRegisterInfo.h"
+#include "llvm/Support/Debug.h"
+
+using namespace llvm;
+
+#define DEBUG_TYPE "aarch64-srlt-define-superregs"
+#define PASS_NAME "AArch64 SRLT Define Super-Regs Pass"
+
+namespace {
+
+struct AArch64SRLTDefineSuperRegs : public MachineFunctionPass {
+ inline static char ID = 0;
+
+ AArch64SRLTDefineSuperRegs() : MachineFunctionPass(ID) {}
+
+ bool runOnMachineFunction(MachineFunction &MF) override;
+
+ void collectWidestSuperReg(Register R, const BitVector &RequiredBaseRegUnits,
+ const BitVector &QHiRegUnits,
+ SmallSet<Register, 8> &SuperRegs);
+
+ StringRef getPassName() const override { return PASS_NAME; }
+
+ void getAnalysisUsage(AnalysisUsage &AU) const override {
+ AU.setPreservesCFG();
+ AU.addPreservedID(MachineLoopInfoID);
+ AU.addPreservedID(MachineDominatorsID);
+ MachineFunctionPass::getAnalysisUsage(AU);
+ }
+
+private:
+ MachineFunction *MF = nullptr;
+ const AArch64Subtarget *Subtarget = nullptr;
+ const AArch64RegisterInfo *TRI = nullptr;
+ const AArch64FunctionInfo *AFI = nullptr;
+ const TargetInstrInfo *TII = nullptr;
+ MachineRegisterInfo *MRI = nullptr;
+};
+
+} // end anonymous namespace
+
+INITIALIZE_PASS(AArch64SRLTDefineSuperRegs, DEBUG_TYPE, PASS_NAME, false,
+ false)
+
+SmallVector<MCRegister> SupportedAliasRegs = {
+ AArch64::X0, AArch64::X1, AArch64::X2, AArch64::X3, AArch64::X4,
+ AArch64::X5, AArch64::X6, AArch64::X7, AArch64::X8, AArch64::X9,
+ AArch64::X10, AArch64::X11, AArch64::X12, AArch64::X13, AArch64::X14,
+ AArch64::X15, AArch64::X16, AArch64::X17, AArch64::X18, AArch64::X19,
+ AArch64::X20, AArch64::X21, AArch64::X22, AArch64::X23, AArch64::X24,
+ AArch64::X25, AArch64::X26, AArch64::X27, AArch64::X28, AArch64::FP,
+ AArch64::LR, AArch64::SP, AArch64::Z0, AArch64::Z1, AArch64::Z2,
+ AArch64::Z3, AArch64::Z4, AArch64::Z5, AArch64::Z6, AArch64::Z7,
+ AArch64::Z8, AArch64::Z9, AArch64::Z10, AArch64::Z11, AArch64::Z12,
+ AArch64::Z13, AArch64::Z14, AArch64::Z15, AArch64::Z16, AArch64::Z17,
+ AArch64::Z18, AArch64::Z19, AArch64::Z20, AArch64::Z21, AArch64::Z22,
+ AArch64::Z23, AArch64::Z24, AArch64::Z25, AArch64::Z26, AArch64::Z27,
+ AArch64::Z28, AArch64::Z29, AArch64::Z30, AArch64::Z31};
+
+SmallVector<MCRegister> QHiRegs = {
+ AArch64::Q0_HI, AArch64::Q1_HI, AArch64::Q2_HI, AArch64::Q3_HI,
+ AArch64::Q4_HI, AArch64::Q5_HI, AArch64::Q6_HI, AArch64::Q7_HI,
+ AArch64::Q8_HI, AArch64::Q9_HI, AArch64::Q10_HI, AArch64::Q11_HI,
+ AArch64::Q12_HI, AArch64::Q13_HI, AArch64::Q14_HI, AArch64::Q15_HI,
+ AArch64::Q16_HI, AArch64::Q17_HI, AArch64::Q18_HI, AArch64::Q19_HI,
+ AArch64::Q20_HI, AArch64::Q21_HI, AArch64::Q22_HI, AArch64::Q23_HI,
+ AArch64::Q24_HI, AArch64::Q25_HI, AArch64::Q26_HI, AArch64::Q27_HI,
+ AArch64::Q28_HI, AArch64::Q29_HI, AArch64::Q30_HI, AArch64::Q31_HI};
+
+// Find the widest super-reg for a given reg and adds it to `SuperRegs`.
+// For example:
+// W0 -> X0
+// B1 -> Q1 (without SVE)
+// -> Z1 (with SVE)
+// W1_W2 -> X1_X2
+// D0_D1 -> Q0_Q1 (without SVE)
+// -> Z0_Z1 (with SVE)
+void AArch64SRLTDefineSuperRegs::collectWidestSuperReg(
+ Register R, const BitVector &RequiredBaseRegUnits,
+ const BitVector &QHiRegUnits, SmallSet<Register, 8> &SuperRegs) {
+ assert(R.isPhysical() &&
+ "Expected to be run straight after virtregrewriter!");
+
+ BitVector Units(TRI->getNumRegUnits());
+ for (MCRegUnit U : TRI->regunits(R))
+ Units.set((unsigned)U);
+
+ auto IsSuitableSuperReg = [&](Register SR) {
+ for (MCRegUnit U : TRI->regunits(SR)) {
+ // Avoid choosing z1 as super-reg of d1 if SVE is not available.
+ if (QHiRegUnits.test((unsigned)U) &&
+ !Subtarget->isSVEorStreamingSVEAvailable())
+ return false;
+ // We consider reg A as a suitable super-reg of B when any of the reg units:
+ // * is shared (Q0 and B0, both have B0 as sub-register)
+ // * is artificial (Q0 = B0 + (B0_HI + H0_HI + S0_HI))
+ // This avoids considering e.g. Q0_Q1 as a super reg of D0 or D1.
+ if (!TRI->isArtificialRegUnit(U) &&
+ (!Units.test((unsigned)U) || !RequiredBaseRegUnits.test((unsigned)U)))
+ return false;
+ }
+ return true;
+ };
+
+ Register LargestSuperReg = AArch64::NoRegister;
+ for (Register SR : TRI->superregs(R))
+ if (IsSuitableSuperReg(SR) && (LargestSuperReg == AArch64::NoRegister ||
+ TRI->isSuperRegister(LargestSuperReg, SR)))
+ LargestSuperReg = SR;
+
+ if (LargestSuperReg != AArch64::NoRegister)
+ SuperRegs.insert(LargestSuperReg);
+}
+
+bool AArch64SRLTDefineSuperRegs::runOnMachineFunction(MachineFunction &MF) {
+ this->MF = &MF;
+ Subtarget = &MF.getSubtarget<AArch64Subtarget>();
+ TII = Subtarget->getInstrInfo();
+ TRI = Subtarget->getRegisterInfo();
+ MRI = &MF.getRegInfo();
+
+ if (!MRI->subRegLivenessEnabled())
+ return false;
+
+ assert(!MRI->isSSA() && "Expected to be run after breaking down SSA form!");
+
+ BitVector RequiredBaseRegUnits(TRI->getNumRegUnits());
+ for (Register R : SupportedAliasRegs)
+ for (MCRegUnit U : TRI->regunits(R))
+ RequiredBaseRegUnits.set((unsigned)U);
+
+ BitVector QHiRegUnits(TRI->getNumRegUnits());
+ for (Register R : QHiRegs)
+ for (MCRegUnit U : TRI->regunits(R))
+ QHiRegUnits.set((unsigned)U);
+
+ bool Changed = false;
+ for (MachineBasicBlock &MBB : MF) {
+ for (MachineInstr &MI : MBB) {
+ // For each partial register write (e.g. w0), also add implicit-def for
+ // top bits of register (x0).
+ SmallSet<Register, 8> SuperRegs;
+ for (const MachineOperand &DefOp : MI.defs())
+ collectWidestSuperReg(DefOp.getReg(), RequiredBaseRegUnits, QHiRegUnits,
+ SuperRegs);
+
+ if (!SuperRegs.size())
+ continue;
+
+ LLVM_DEBUG(dbgs() << "Adding implicit-defs to: " << MI);
+ for (Register R : SuperRegs) {
+ LLVM_DEBUG(dbgs() << " " << printReg(R, TRI) << "\n");
+ bool IsRenamable = any_of(MI.defs(), [&](const MachineOperand &MO) {
+ return TRI->regsOverlap(MO.getReg(), R) && MO.isRenamable();
+ });
+ MachineOperand DefOp = MachineOperand::CreateReg(
+ R, /*isDef=*/true, /*isImp=*/true, /*isKill=*/false,
+ /*isDead=*/false, /*isUndef=*/false, /*isEarlyClobber=*/false,
+ /*SubReg=*/0, /*isDebug=*/false, /*isInternalRead=*/false,
+ /*isRenamable=*/IsRenamable);
+ MI.addOperand(DefOp);
+ }
+ Changed = true;
+ }
+ }
+
+ return Changed;
+}
+
+FunctionPass *llvm::createAArch64SRLTDefineSuperRegsPass() {
+ return new AArch64SRLTDefineSuperRegs();
+}
diff --git a/llvm/lib/Target/AArch64/AArch64Subtarget.cpp b/llvm/lib/Target/AArch64/AArch64Subtarget.cpp
index 911eedbe36185..1737a0c1529b4 100644
--- a/llvm/lib/Target/AArch64/AArch64Subtarget.cpp
+++ b/llvm/lib/Target/AArch64/AArch64Subtarget.cpp
@@ -369,7 +369,8 @@ AArch64Subtarget::AArch64Subtarget(const Triple &TT, StringRef CPU,
unsigned MinSVEVectorSizeInBitsOverride,
unsigned MaxSVEVectorSizeInBitsOverride,
bool IsStreaming, bool IsStreamingCompatible,
- bool HasMinSize)
+ bool HasMinSize,
+ bool EnableSRLTSubregToRegMitigation)
: AArch64GenSubtargetInfo(TT, CPU, TuneCPU, FS),
ReserveXRegister(AArch64::GPR64commonRegClass.getNumRegs()),
ReserveXRegisterForRA(AArch64::GPR64commonRegClass.getNumRegs()),
@@ -381,7 +382,9 @@ AArch64Subtarget::AArch64Subtarget(const Triple &TT, StringRef CPU,
? std::optional<unsigned>(AArch64StreamingHazardSize)
: std::nullopt),
MinSVEVectorSizeInBits(MinSVEVectorSizeInBitsOverride),
- MaxSVEVectorSizeInBits(MaxSVEVectorSizeInBitsOverride), TargetTriple(TT),
+ MaxSVEVectorSizeInBits(MaxSVEVectorSizeInBitsOverride),
+ EnableSRLTSubregToRegMitigation(EnableSRLTSubregToRegMitigation),
+ TargetTriple(TT),
InstrInfo(initializeSubtargetDependencies(FS, CPU, TuneCPU, HasMinSize)),
TLInfo(TM, *this) {
if (AArch64::isX18ReservedByDefault(TT))
diff --git a/llvm/lib/Target/AArch64/AArch64Subtarget.h b/llvm/lib/Target/AArch64/AArch64Subtarget.h
index bd8a2d5234f2d..248e140b3101c 100644
--- a/llvm/lib/Target/AArch64/AArch64Subtarget.h
+++ b/llvm/lib/Target/AArch64/AArch64Subtarget.h
@@ -88,6 +88,7 @@ class AArch64Subtarget final : public AArch64GenSubtargetInfo {
std::optional<unsigned> StreamingHazardSize;
unsigned MinSVEVectorSizeInBits;
unsigned MaxSVEVectorSizeInBits;
+ bool EnableSRLTSubregToRegMitigation;
unsigned VScaleForTuning = 1;
TailFoldingOpts DefaultSVETFOpts = TailFoldingOpts::Disabled;
@@ -128,7 +129,8 @@ class AArch64Subtarget final : public AArch64GenSubtargetInfo {
unsigned MinSVEVectorSizeInBitsOverride = 0,
unsigned MaxSVEVectorSizeInBitsOverride = 0,
bool IsStreaming = false, bool IsStreamingCompatible = false,
- bool HasMinSize = false);
+ bool HasMinSize = false,
+ bool EnableSRLTSubregToRegMitigation = false);
// Getters for SubtargetFeatures defined in tablegen
#define GET_SUBTARGETINFO_MACRO(ATTRIBUTE, DEFAULT, GETTER) \
@@ -467,6 +469,10 @@ class AArch64Subtarget final : public AArch64GenSubtargetInfo {
/// add + cnt instructions.
bool useScalarIncVL() const;
+ bool enableSRLTSubregToRegMitigation() const {
+ return EnableSRLTSubregToRegMitigation;
+ }
+
/// Choose a method of checking LR before performing a tail call.
AArch64PAuth::AuthCheckMethod
getAuthenticatedLRCheckMethod(const MachineFunction &MF) const;
diff --git a/llvm/lib/Target/AArch64/AArch64TargetMachine.cpp b/llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
index 1ec5a20cc0ce0..3aba866458830 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
@@ -227,6 +227,12 @@ static cl::opt<bool>
cl::desc("Enable new lowering for the SME ABI"),
cl::init(true), cl::Hidden);
+static cl::opt<bool> EnableSRLTSubregToRegMitigation(
+ "aarch64-srlt-mitigate-sr2r",
+ cl::desc("Enable SUBREG_TO_REG mitigation by adding 'implicit-def' for "
+ "super-regs when using Subreg Liveness Tracking"),
+ cl::init(true), cl::Hidden);
+
extern "C" LLVM_ABI LLVM_EXTERNAL_VISIBILITY void
LLVMInitializeAArch64Target() {
// Register the target.
@@ -268,6 +274,7 @@ LLVMInitializeAArch64Target() {
initializeKCFIPass(PR);
initializeSMEABIPass(PR);
initializeMachineSMEABIPass(PR);
+ initializeAArch64SRLTDefineSuperRegsPass(PR);
initializeSMEPeepholeOptPass(PR);
initializeSVEIntrinsicOptsPass(PR);
initializeAArch64SpeculationHardeningPass(PR);
@@ -462,7 +469,8 @@ AArch64TargetMachine::getSubtargetImpl(const Function &F) const {
resetTargetOptions(F);
I = std::make_unique<AArch64Subtarget>(
TargetTriple, CPU, TuneCPU, FS, *this, isLittle, MinSVEVectorSize,
- MaxSVEVectorSize, IsStreaming, IsStreamingCompatible, HasMinSize);
+ MaxSVEVectorSize, IsStreaming, IsStreamingCompatible, HasMinSize,
+ EnableSRLTSubregToRegMitigation);
}
if (IsStreaming && !I->hasSME())
@@ -550,6 +558,7 @@ class AArch64PassConfig : public TargetPassConfig {
void addMachineSSAOptimization() override;
bool addILPOpts() override;
void addPreRegAlloc() override;
+ void addPostRewrite() override;
void addPostRegAlloc() override;
void addPreSched2() override;
void addPreEmitPass() override;
@@ -815,6 +824,11 @@ void AArch64PassConfig::addPreRegAlloc() {
addPass(&MachinePipelinerID);
}
+void AArch64PassConfig::addPostRewrite() {
+ if (EnableSRLTSubregToRegMitigation)
+ addPass(createAArch64SRLTDefineSuperRegsPass());
+}
+
void AArch64PassConfig::addPostRegAlloc() {
// Remove redundant copy instructions.
if (TM->getOptLevel() != CodeGenOptLevel::None &&
diff --git a/llvm/lib/Target/AArch64/CMakeLists.txt b/llvm/lib/Target/AArch64/CMakeLists.txt
index 3334b3689e03f..2fe554217c1ba 100644
--- a/llvm/lib/Target/AArch64/CMakeLists.txt
+++ b/llvm/lib/Target/AArch64/CMakeLists.txt
@@ -92,6 +92,7 @@ add_llvm_target(AArch64CodeGen
SMEPeepholeOpt.cpp
SVEIntrinsicOpts.cpp
MachineSMEABIPass.cpp
+ AArch64SRLTDefineSuperRegs.cpp
AArch64SIMDInstrOpt.cpp
AArch64PrologueEpilogue.cpp
diff --git a/llvm/test/CodeGen/AArch64/O3-pipeline.ll b/llvm/test/CodeGen/AArch64/O3-pipeline.ll
index e4249fe4fb1c8..d137b8c9ac1e0 100644
--- a/llvm/test/CodeGen/AArch64/O3-pipeline.ll
...
[truncated]
|
You can test this locally with the following command:git-clang-format --diff origin/main HEAD --extensions h,cpp -- llvm/lib/Target/AArch64/AArch64SRLTDefineSuperRegs.cpp llvm/lib/Target/AArch64/AArch64.h llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp llvm/lib/Target/AArch64/AArch64Subtarget.cpp llvm/lib/Target/AArch64/AArch64Subtarget.h llvm/lib/Target/AArch64/AArch64TargetMachine.cpp --diff_from_common_commit
View the diff from clang-format here.diff --git a/llvm/lib/Target/AArch64/AArch64SRLTDefineSuperRegs.cpp b/llvm/lib/Target/AArch64/AArch64SRLTDefineSuperRegs.cpp
index 695155abc..98ad74791 100644
--- a/llvm/lib/Target/AArch64/AArch64SRLTDefineSuperRegs.cpp
+++ b/llvm/lib/Target/AArch64/AArch64SRLTDefineSuperRegs.cpp
@@ -109,8 +109,8 @@ struct AArch64SRLTDefineSuperRegs : public MachineFunctionPass {
bool runOnMachineFunction(MachineFunction &MF) override;
void collectWidestSuperReg(Register R, const BitVector &RequiredBaseRegUnits,
- const BitVector &QHiRegUnits,
- SmallSet<Register, 8> &SuperRegs);
+ const BitVector &QHiRegUnits,
+ SmallSet<Register, 8> &SuperRegs);
StringRef getPassName() const override { return PASS_NAME; }
@@ -132,8 +132,7 @@ private:
} // end anonymous namespace
-INITIALIZE_PASS(AArch64SRLTDefineSuperRegs, DEBUG_TYPE, PASS_NAME, false,
- false)
+INITIALIZE_PASS(AArch64SRLTDefineSuperRegs, DEBUG_TYPE, PASS_NAME, false, false)
SmallVector<MCRegister> SupportedAliasRegs = {
AArch64::X0, AArch64::X1, AArch64::X2, AArch64::X3, AArch64::X4,
@@ -184,7 +183,8 @@ void AArch64SRLTDefineSuperRegs::collectWidestSuperReg(
if (QHiRegUnits.test((unsigned)U) &&
!Subtarget->isSVEorStreamingSVEAvailable())
return false;
- // We consider reg A as a suitable super-reg of B when any of the reg units:
+ // We consider reg A as a suitable super-reg of B when any of the reg
+ // units:
// * is shared (Q0 and B0, both have B0 as sub-register)
// * is artificial (Q0 = B0 + (B0_HI + H0_HI + S0_HI))
// This avoids considering e.g. Q0_Q1 as a super reg of D0 or D1.
|
|
This is a stacked PR. See other PRs below: |
🐧 Linux x64 Test Results
All executed tests passed, but another part of the build failed. Click on a failure below to see the details. lib/Target/AArch64/CMakeFiles/LLVMAArch64CodeGen.dir/AArch64SRLTDefineSuperRegs.cpp.oIf these failures are unrelated to your changes (for example tests are broken or flaky at HEAD), please open an issue at https://github.com/llvm/llvm-project/issues and add the |
MacDue
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pass mostly makes sense. Just a few comments:
| INITIALIZE_PASS(AArch64SRLTDefineSuperRegs, DEBUG_TYPE, PASS_NAME, false, | ||
| false) | ||
|
|
||
| SmallVector<MCRegister> SupportedAliasRegs = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not need to be a (mutable) vector:
| SmallVector<MCRegister> SupportedAliasRegs = { | |
| constexpr MCRegister SupportedAliasRegs[] = { |
| AArch64::Z23, AArch64::Z24, AArch64::Z25, AArch64::Z26, AArch64::Z27, | ||
| AArch64::Z28, AArch64::Z29, AArch64::Z30, AArch64::Z31}; | ||
|
|
||
| SmallVector<MCRegister> QHiRegs = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| SmallVector<MCRegister> QHiRegs = { | |
| constexpr MCRegister QHiRegs[] = { |
| BitVector RequiredBaseRegUnits(TRI->getNumRegUnits()); | ||
| for (Register R : SupportedAliasRegs) | ||
| for (MCRegUnit U : TRI->regunits(R)) | ||
| RequiredBaseRegUnits.set((unsigned)U); | ||
|
|
||
| BitVector QHiRegUnits(TRI->getNumRegUnits()); | ||
| for (Register R : QHiRegs) | ||
| for (MCRegUnit U : TRI->regunits(R)) | ||
| QHiRegUnits.set((unsigned)U); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Register classes are contiguous (we already have checks that depend on this). So I think this could be:
| BitVector RequiredBaseRegUnits(TRI->getNumRegUnits()); | |
| for (Register R : SupportedAliasRegs) | |
| for (MCRegUnit U : TRI->regunits(R)) | |
| RequiredBaseRegUnits.set((unsigned)U); | |
| BitVector QHiRegUnits(TRI->getNumRegUnits()); | |
| for (Register R : QHiRegs) | |
| for (MCRegUnit U : TRI->regunits(R)) | |
| QHiRegUnits.set((unsigned)U); | |
| auto XRegs = seq_inclusive<unsigned>(AArch64::X0, AArch64::X28); | |
| auto ZRegs = seq_inclusive<unsigned>(AArch64::Z0, AArch64::Z31); | |
| auto QHiRegs = seq_inclusive<unsigned>(AArch64::Q0_HI, AArch64::Q31_HI); | |
| constexpr unsigned FixedRegs[] = { AArch64::FP, AArch64::LR, AArch64::SP }; | |
| auto SupportedAliasRegs = concat<unsigned>(XRegs, ZRegs, FixedRegs); | |
| for (MCRegister R : SupportedAliasRegs) | |
| for (MCRegUnit U : TRI->regunits(R)) | |
| RequiredBaseRegUnits.set((unsigned)U); | |
| BitVector QHiRegUnits(TRI->getNumRegUnits()); | |
| for (MCRegister R : QHiRegs) | |
| for (MCRegUnit U : TRI->regunits(R)) | |
| QHiRegUnits.set((unsigned)U); |
Which I find a little easier to verify covers all registers than hardcoded tables.
| private: | ||
| MachineFunction *MF = nullptr; | ||
| const AArch64Subtarget *Subtarget = nullptr; | ||
| const AArch64RegisterInfo *TRI = nullptr; | ||
| const AArch64FunctionInfo *AFI = nullptr; | ||
| const TargetInstrInfo *TII = nullptr; | ||
| MachineRegisterInfo *MRI = nullptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of these are unused. I think the only two that are necessary are:
| private: | |
| MachineFunction *MF = nullptr; | |
| const AArch64Subtarget *Subtarget = nullptr; | |
| const AArch64RegisterInfo *TRI = nullptr; | |
| const AArch64FunctionInfo *AFI = nullptr; | |
| const TargetInstrInfo *TII = nullptr; | |
| MachineRegisterInfo *MRI = nullptr; | |
| private: | |
| const AArch64Subtarget *Subtarget = nullptr; | |
| const AArch64RegisterInfo *TRI = nullptr; |
| ; CHECK-NEXT: Virtual Register Rewriter | ||
| ; CHECK-NEXT: Register Allocation Pass Scoring | ||
| ; CHECK-NEXT: Stack Slot Coloring | ||
| ; CHECK-NEXT: AArch64 SRLT Define Super-Regs Pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this intentionally only added to the optimized pipeline?
| // For each partial register write (e.g. w0), also add implicit-def for | ||
| // top bits of register (x0). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
| // For each partial register write (e.g. w0), also add implicit-def for | |
| // top bits of register (x0). | |
| // For each partial register write, also add an implicit-def for top bits | |
| // of the register (e.g. for w0 add a def of x0). |
|
|
||
| auto IsSuitableSuperReg = [&](Register SR) { | ||
| for (MCRegUnit U : TRI->regunits(SR)) { | ||
| // Avoid choosing z1 as super-reg of d1 if SVE is not available. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Maybe note Q*_HI are only set for SVE vectors (I associate Q with Neon)
| if (QHiRegUnits.test((unsigned)U) && | ||
| !Subtarget->isSVEorStreamingSVEAvailable()) | ||
| return false; | ||
| // We consider reg A as a suitable super-reg of B when any of the reg units: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // We consider reg A as a suitable super-reg of B when any of the reg units: | |
| // We consider reg A as a suitable super-reg of B when all of the reg units: |
?
Also maybe phrasing this as "we consider a register as an unsuitable super register if any reg unit is non-artificial and not shared. This implies U is a unit for a different register, which means the candidate super-register is likely a register tuple".
When SubRegister Liveness Tracking (SRLT) is enabled, this pass adds extra implicit-def's to instructions that define the low N bits of a GPR/FPR register to represent that the top bits are written, because all AArch64 instructions that write the low bits of a GPR/FPR also implicitly zero the top bits.
These semantics are originally represented in the MIR using
SUBREG_TO_REG, but during register coalescing this information is lost and when rewriting virtual -> physical registers the implicit-defs are not added to represent the the top bits are written.There have been several attempts to fix this in the coalescer (#168353), but each iteration has exposed new bugs and the patch had to be reverted. Additionally, the concept of adding 'implicit-def' of a virtual register during the register allocation process is particularly fragile and many places don't expect it (for example in
X86::commuteInstructionImplthe code only looks at specific operands and does not consider implicit-defs. Similar inSplitEditor::addDeadDefwhere it traverses operand 'defs' rather than 'all_defs').We want a temporary solution that doesn't impact other targets and is simpler and less intrusive than the patch proposed for the register coalescer so that we can enable SRLT to make better use of SVE/SME multi-vector instructions while we work on a more permanent solution that requires rewriting a large part of the AArch64 instructions (32-bit and NEON).