[AArch64] Add new pass after VirtRegRewriter to add implicit-defs #174188

sdesmalen-arm · 2026-01-02T08:39:38Z

When SubRegister Liveness Tracking (SRLT) is enabled, this pass adds extra implicit-def's to instructions that define the low N bits of a GPR/FPR register to represent that the top bits are written, because all AArch64 instructions that write the low bits of a GPR/FPR also implicitly zero the top bits.

These semantics are originally represented in the MIR using SUBREG_TO_REG, but during register coalescing this information is lost and when rewriting virtual -> physical registers the implicit-defs are not added to represent the the top bits are written.

There have been several attempts to fix this in the coalescer (#168353), but each iteration has exposed new bugs and the patch had to be reverted. Additionally, the concept of adding 'implicit-def' of a virtual register during the register allocation process is particularly fragile and many places don't expect it (for example in X86::commuteInstructionImpl the code only looks at specific operands and does not consider implicit-defs. Similar in SplitEditor::addDeadDef where it traverses operand 'defs' rather than 'all_defs').

We want a temporary solution that doesn't impact other targets and is simpler and less intrusive than the patch proposed for the register coalescer so that we can enable SRLT to make better use of SVE/SME multi-vector instructions while we work on a more permanent solution that requires rewriting a large part of the AArch64 instructions (32-bit and NEON).

When SubRegister Liveness Tracking (SRLT) is enabled, this pass adds extra implicit-def's to instructions that define the low N bits of a GPR/FPR register to represent that the top bits are written, because all AArch64 instructions that write the low bits of a GPR/FPR also implicitly zero the top bits. These semantics are originally represented in the MIR using `SUBREG_TO_REG`, but during register coalescing this information is lost and when rewriting virtual -> physical registers the implicit-defs are not added to represent the the top bits are written. There have been several attempts to fix this in the coalescer (#168353), but each iteration has exposed new bugs and the patch had to be reverted. Additionally, the concept of adding 'implicit-def' of a virtual register during the register allocation process is particularly fragile and many places don't expect it (for example in `X86::commuteInstructionImpl` the code only looks at specific operands and does not consider implicit-defs. Similar in `SplitEditor::addDeadDef` where it traverses operand 'defs' rather than 'all_defs'). We want a temporary solution that doesn't impact other targets and is simpler and less intrusive than the patch proposed for the register coalescer so that we can enable SRLT to make better use of SVE/SME multi-vector instructions while we work on a more permanent solution that requires rewriting a large part of the AArch64 instructions (32-bit and NEON).

llvmbot · 2026-01-02T08:40:07Z

@llvm/pr-subscribers-backend-aarch64

Author: Sander de Smalen (sdesmalen-arm)

Changes

When SubRegister Liveness Tracking (SRLT) is enabled, this pass adds extra implicit-def's to instructions that define the low N bits of a GPR/FPR register to represent that the top bits are written, because all AArch64 instructions that write the low bits of a GPR/FPR also implicitly zero the top bits.

These semantics are originally represented in the MIR using SUBREG_TO_REG, but during register coalescing this information is lost and when rewriting virtual -> physical registers the implicit-defs are not added to represent the the top bits are written.

There have been several attempts to fix this in the coalescer (#168353), but each iteration has exposed new bugs and the patch had to be reverted. Additionally, the concept of adding 'implicit-def' of a virtual register during the register allocation process is particularly fragile and many places don't expect it (for example in X86::commuteInstructionImpl the code only looks at specific operands and does not consider implicit-defs. Similar in SplitEditor::addDeadDef where it traverses operand 'defs' rather than 'all_defs').

We want a temporary solution that doesn't impact other targets and is simpler and less intrusive than the patch proposed for the register coalescer so that we can enable SRLT to make better use of SVE/SME multi-vector instructions while we work on a more permanent solution that requires rewriting a large part of the AArch64 instructions (32-bit and NEON).

Patch is 40.74 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/174188.diff

13 Files Affected:

(modified) llvm/lib/Target/AArch64/AArch64.h (+2)
(modified) llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp (+2-3)
(added) llvm/lib/Target/AArch64/AArch64SRLTDefineSuperRegs.cpp (+265)
(modified) llvm/lib/Target/AArch64/AArch64Subtarget.cpp (+5-2)
(modified) llvm/lib/Target/AArch64/AArch64Subtarget.h (+7-1)
(modified) llvm/lib/Target/AArch64/AArch64TargetMachine.cpp (+15-1)
(modified) llvm/lib/Target/AArch64/CMakeLists.txt (+1)
(modified) llvm/test/CodeGen/AArch64/O3-pipeline.ll (+1)
(modified) llvm/test/CodeGen/AArch64/arm64-addrmode.ll (+40-90)
(modified) llvm/test/CodeGen/AArch64/preserve_nonecc_varargs_darwin.ll (+5-10)
(modified) llvm/test/CodeGen/AArch64/register-coalesce-update-subranges-remat.mir (-1)
(added) llvm/test/CodeGen/AArch64/subreg-liveness-fix-subreg-to-reg-implicit-def.mir (+88)
(modified) llvm/test/CodeGen/AArch64/subreg_to_reg_coalescing_issue.mir (+1-2)

diff --git a/llvm/lib/Target/AArch64/AArch64.h b/llvm/lib/Target/AArch64/AArch64.h
index a8e15c338352a..40983714ddf1d 100644
--- a/llvm/lib/Target/AArch64/AArch64.h
+++ b/llvm/lib/Target/AArch64/AArch64.h
@@ -64,6 +64,7 @@ FunctionPass *createAArch64CollectLOHPass();
 FunctionPass *createSMEABIPass();
 FunctionPass *createSMEPeepholeOptPass();
 FunctionPass *createMachineSMEABIPass(CodeGenOptLevel);
+FunctionPass *createAArch64SRLTDefineSuperRegsPass();
 ModulePass *createSVEIntrinsicOptsPass();
 InstructionSelector *
 createAArch64InstructionSelector(const AArch64TargetMachine &,
@@ -117,6 +118,7 @@ void initializeLDTLSCleanupPass(PassRegistry&);
 void initializeSMEABIPass(PassRegistry &);
 void initializeSMEPeepholeOptPass(PassRegistry &);
 void initializeMachineSMEABIPass(PassRegistry &);
+void initializeAArch64SRLTDefineSuperRegsPass(PassRegistry &);
 void initializeSVEIntrinsicOptsPass(PassRegistry &);
 void initializeAArch64Arm64ECCallLoweringPass(PassRegistry &);
 } // end namespace llvm
diff --git a/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp b/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
index a90cad4f19ef7..b8951adf27c05 100644
--- a/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
@@ -1383,9 +1383,8 @@ bool AArch64RegisterInfo::shouldCoalesce(
   MachineFunction &MF = *MI->getMF();
   MachineRegisterInfo &MRI = MF.getRegInfo();
 
-  // Coalescing of SUBREG_TO_REG is broken when using subreg liveness tracking,
-  // we must disable it for now.
-  if (MI->isSubregToReg() && MRI.subRegLivenessEnabled())
+  if (MI->isSubregToReg() && MRI.subRegLivenessEnabled() &&
+      !MF.getSubtarget<AArch64Subtarget>().enableSRLTSubregToRegMitigation())
     return false;
 
   if (MI->isCopy() &&
diff --git a/llvm/lib/Target/AArch64/AArch64SRLTDefineSuperRegs.cpp b/llvm/lib/Target/AArch64/AArch64SRLTDefineSuperRegs.cpp
new file mode 100644
index 0000000000000..695155abc5b15
--- /dev/null
+++ b/llvm/lib/Target/AArch64/AArch64SRLTDefineSuperRegs.cpp
@@ -0,0 +1,265 @@
+//===- AArch64SRLTDefineSuperRegs.cpp -------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// When SubRegister Liveness Tracking (SRLT) is enabled, this pass adds
+// extra implicit-def's to instructions that define the low N bits of
+// a GPR/FPR register to also define the top bits, because all AArch64
+// instructions that write the low bits of a GPR/FPR also implicitly zero
+// the top bits.  For example, 'mov w0, w1' writes zeroes to the top 32-bits of
+// x0, so this pass adds a `implicit-def $x0` after register allocation.
+//
+// These semantics are originally represented in the MIR using `SUBREG_TO_REG`
+// which expresses that the top bits have been defined by the preceding
+// instructions, but during register coalescing this information is lost and in
+// contrast to when SRTL is disabled, when rewriting virtual -> physical
+// registers the implicit-defs are not added to the instruction.
+//
+// There have been several attempts to fix this in the coalescer [1], but each
+// iteration has exposed new bugs and the patch had to be reverted.
+// Additionally, the concept of adding 'implicit-def' of a virtual register is
+// particularly fragile and many places don't expect it (for example in
+// `X86::commuteInstructionImpl` the  code only looks at specific operands and
+// does not consider implicit-defs. Similar in `SplitEditor::addDeadDef` where
+// it traverses operand 'defs' rather than 'all_defs').
+//
+// We want a temporary solution that doesn't impact other targets and is simpler
+// and less intrusive than the patch proposed for the register coalescer [1], so
+// that we can enable SRLT for AArch64.
+//
+// The approach here is to just add the 'implicit-def' manually after rewriting
+// virtual regs -> phsyical regs. This still means that during the register
+// allocation process the dependences are not accurately represented in the MIR
+// and LiveIntervals, but there are several reasons why we believe this isn't a
+// problem in practice:
+// (A) The register allocator only spills entire virtual registers.
+//     This is additionally guarded by code in
+//     AArch64InstrInfo::storeRegToStackSlot/loadRegFromStackSlot
+//     where it checks if a register matches the expected register class.
+// (B) Rematerialization only happens when the instruction writes the full
+//     register.
+// (C) The high bits of the AArch64 register cannot be written independently.
+// (D) Instructions that write only part of a register always take that same
+//     register as a tied input operand, to indicate it's a merging operation.
+//
+// (A) means that for two virtual registers of regclass GPR32 and GPR64, if the
+// GPR32 register is coalesced into the GPR64 vreg then the full GPR64 would
+// be spilled/filled even if only the low 32-bits would be required for the
+// given liverange. (B) means that the top bits of a GPR64 would never be
+// overwritten by rematerialising a GPR32 sub-register for a given liverange.
+// (C-D) means that we can assume that the MIR as input to the register
+// allocator correctly expresses the instruction behaviour and dependences
+// between values, so unless the register allocator would violate (A) or (B),
+// the MIR is otherwise sound.
+//
+// Alternative approaches have also been considered, such as:
+// (1) Changing the AArch64 instruction definitions to write all bits and
+//     extract the low N bits for the result.
+// (2) Disabling coalescing of SUBREG_TO_REG and using regalloc hints to tell
+//     the register allocator to favour the same register for the input/output.
+// (3) Adding a new coalescer guard node with a tied-operand constraint, such
+//     that when the SUBREG_TO_REG is removed, something still represents that
+//     the top bits are defined. The node would get removed before rewriting
+//     virtregs.
+// (4) Using an explicit INSERT_SUBREG into a zero value and try to optimize
+//     away the INSERT_SUBREG (this is a more explicit variant of (2) and (3))
+// (5) Adding a new MachineOperand flag that represents the top bits would be
+//     defined, but are not read nor undef.
+//
+// (1) would be the best approach but would be a significant effort as it
+// requires rewriting most/all instruction definitions and fixing MIR passes
+// that rely on the current definitions, whereas (2-4) result in sub-optimal
+// code that can't really be avoided because the explicit nodes would stop
+// rematerialization. (5) might be a way to mitigate the
+// fragility of implicit-def's of virtual registers if we want to pursue
+// landing [1], but then we'd rather choose approach (1) to avoid using
+// SUBREG_TO_REG entirely.
+//
+// [1] https://github.com/llvm/llvm-project/pull/168353
+//===----------------------------------------------------------------------===//
+
+#include "AArch64InstrInfo.h"
+#include "AArch64MachineFunctionInfo.h"
+#include "AArch64Subtarget.h"
+#include "MCTargetDesc/AArch64AddressingModes.h"
+#include "llvm/ADT/BitVector.h"
+#include "llvm/ADT/SmallSet.h"
+#include "llvm/CodeGen/MachineBasicBlock.h"
+#include "llvm/CodeGen/MachineFunctionPass.h"
+#include "llvm/CodeGen/MachineRegisterInfo.h"
+#include "llvm/CodeGen/TargetRegisterInfo.h"
+#include "llvm/Support/Debug.h"
+
+using namespace llvm;
+
+#define DEBUG_TYPE "aarch64-srlt-define-superregs"
+#define PASS_NAME "AArch64 SRLT Define Super-Regs Pass"
+
+namespace {
+
+struct AArch64SRLTDefineSuperRegs : public MachineFunctionPass {
+  inline static char ID = 0;
+
+  AArch64SRLTDefineSuperRegs() : MachineFunctionPass(ID) {}
+
+  bool runOnMachineFunction(MachineFunction &MF) override;
+
+  void collectWidestSuperReg(Register R, const BitVector &RequiredBaseRegUnits,
+                 const BitVector &QHiRegUnits,
+                 SmallSet<Register, 8> &SuperRegs);
+
+  StringRef getPassName() const override { return PASS_NAME; }
+
+  void getAnalysisUsage(AnalysisUsage &AU) const override {
+    AU.setPreservesCFG();
+    AU.addPreservedID(MachineLoopInfoID);
+    AU.addPreservedID(MachineDominatorsID);
+    MachineFunctionPass::getAnalysisUsage(AU);
+  }
+
+private:
+  MachineFunction *MF = nullptr;
+  const AArch64Subtarget *Subtarget = nullptr;
+  const AArch64RegisterInfo *TRI = nullptr;
+  const AArch64FunctionInfo *AFI = nullptr;
+  const TargetInstrInfo *TII = nullptr;
+  MachineRegisterInfo *MRI = nullptr;
+};
+
+} // end anonymous namespace
+
+INITIALIZE_PASS(AArch64SRLTDefineSuperRegs, DEBUG_TYPE, PASS_NAME, false,
+                false)
+
+SmallVector<MCRegister> SupportedAliasRegs = {
+    AArch64::X0,  AArch64::X1,  AArch64::X2,  AArch64::X3,  AArch64::X4,
+    AArch64::X5,  AArch64::X6,  AArch64::X7,  AArch64::X8,  AArch64::X9,
+    AArch64::X10, AArch64::X11, AArch64::X12, AArch64::X13, AArch64::X14,
+    AArch64::X15, AArch64::X16, AArch64::X17, AArch64::X18, AArch64::X19,
+    AArch64::X20, AArch64::X21, AArch64::X22, AArch64::X23, AArch64::X24,
+    AArch64::X25, AArch64::X26, AArch64::X27, AArch64::X28, AArch64::FP,
+    AArch64::LR,  AArch64::SP,  AArch64::Z0,  AArch64::Z1,  AArch64::Z2,
+    AArch64::Z3,  AArch64::Z4,  AArch64::Z5,  AArch64::Z6,  AArch64::Z7,
+    AArch64::Z8,  AArch64::Z9,  AArch64::Z10, AArch64::Z11, AArch64::Z12,
+    AArch64::Z13, AArch64::Z14, AArch64::Z15, AArch64::Z16, AArch64::Z17,
+    AArch64::Z18, AArch64::Z19, AArch64::Z20, AArch64::Z21, AArch64::Z22,
+    AArch64::Z23, AArch64::Z24, AArch64::Z25, AArch64::Z26, AArch64::Z27,
+    AArch64::Z28, AArch64::Z29, AArch64::Z30, AArch64::Z31};
+
+SmallVector<MCRegister> QHiRegs = {
+    AArch64::Q0_HI,  AArch64::Q1_HI,  AArch64::Q2_HI,  AArch64::Q3_HI,
+    AArch64::Q4_HI,  AArch64::Q5_HI,  AArch64::Q6_HI,  AArch64::Q7_HI,
+    AArch64::Q8_HI,  AArch64::Q9_HI,  AArch64::Q10_HI, AArch64::Q11_HI,
+    AArch64::Q12_HI, AArch64::Q13_HI, AArch64::Q14_HI, AArch64::Q15_HI,
+    AArch64::Q16_HI, AArch64::Q17_HI, AArch64::Q18_HI, AArch64::Q19_HI,
+    AArch64::Q20_HI, AArch64::Q21_HI, AArch64::Q22_HI, AArch64::Q23_HI,
+    AArch64::Q24_HI, AArch64::Q25_HI, AArch64::Q26_HI, AArch64::Q27_HI,
+    AArch64::Q28_HI, AArch64::Q29_HI, AArch64::Q30_HI, AArch64::Q31_HI};
+
+// Find the widest super-reg for a given reg and adds it to `SuperRegs`.
+// For example:
+//  W0    -> X0
+//  B1    -> Q1 (without SVE)
+//        -> Z1 (with SVE)
+//  W1_W2 -> X1_X2
+//  D0_D1 -> Q0_Q1 (without SVE)
+//        -> Z0_Z1 (with SVE)
+void AArch64SRLTDefineSuperRegs::collectWidestSuperReg(
+    Register R, const BitVector &RequiredBaseRegUnits,
+    const BitVector &QHiRegUnits, SmallSet<Register, 8> &SuperRegs) {
+  assert(R.isPhysical() &&
+         "Expected to be run straight after virtregrewriter!");
+
+  BitVector Units(TRI->getNumRegUnits());
+  for (MCRegUnit U : TRI->regunits(R))
+    Units.set((unsigned)U);
+
+  auto IsSuitableSuperReg = [&](Register SR) {
+    for (MCRegUnit U : TRI->regunits(SR)) {
+      // Avoid choosing z1 as super-reg of d1 if SVE is not available.
+      if (QHiRegUnits.test((unsigned)U) &&
+          !Subtarget->isSVEorStreamingSVEAvailable())
+        return false;
+      // We consider reg A as a suitable super-reg of B when any of the reg units:
+      // * is shared (Q0 and B0, both have B0 as sub-register)
+      // * is artificial (Q0 = B0 + (B0_HI + H0_HI + S0_HI))
+      // This avoids considering e.g. Q0_Q1 as a super reg of D0 or D1.
+      if (!TRI->isArtificialRegUnit(U) &&
+          (!Units.test((unsigned)U) || !RequiredBaseRegUnits.test((unsigned)U)))
+        return false;
+    }
+    return true;
+  };
+
+  Register LargestSuperReg = AArch64::NoRegister;
+  for (Register SR : TRI->superregs(R))
+    if (IsSuitableSuperReg(SR) && (LargestSuperReg == AArch64::NoRegister ||
+                                   TRI->isSuperRegister(LargestSuperReg, SR)))
+      LargestSuperReg = SR;
+
+  if (LargestSuperReg != AArch64::NoRegister)
+    SuperRegs.insert(LargestSuperReg);
+}
+
+bool AArch64SRLTDefineSuperRegs::runOnMachineFunction(MachineFunction &MF) {
+  this->MF = &MF;
+  Subtarget = &MF.getSubtarget<AArch64Subtarget>();
+  TII = Subtarget->getInstrInfo();
+  TRI = Subtarget->getRegisterInfo();
+  MRI = &MF.getRegInfo();
+
+  if (!MRI->subRegLivenessEnabled())
+    return false;
+
+  assert(!MRI->isSSA() && "Expected to be run after breaking down SSA form!");
+
+  BitVector RequiredBaseRegUnits(TRI->getNumRegUnits());
+  for (Register R : SupportedAliasRegs)
+    for (MCRegUnit U : TRI->regunits(R))
+      RequiredBaseRegUnits.set((unsigned)U);
+
+  BitVector QHiRegUnits(TRI->getNumRegUnits());
+  for (Register R : QHiRegs)
+    for (MCRegUnit U : TRI->regunits(R))
+      QHiRegUnits.set((unsigned)U);
+
+  bool Changed = false;
+  for (MachineBasicBlock &MBB : MF) {
+    for (MachineInstr &MI : MBB) {
+      // For each partial register write (e.g. w0), also add implicit-def for
+      // top bits of register (x0).
+      SmallSet<Register, 8> SuperRegs;
+      for (const MachineOperand &DefOp : MI.defs())
+        collectWidestSuperReg(DefOp.getReg(), RequiredBaseRegUnits, QHiRegUnits,
+                              SuperRegs);
+
+      if (!SuperRegs.size())
+        continue;
+
+      LLVM_DEBUG(dbgs() << "Adding implicit-defs to: " << MI);
+      for (Register R : SuperRegs) {
+        LLVM_DEBUG(dbgs() << "  " << printReg(R, TRI) << "\n");
+        bool IsRenamable = any_of(MI.defs(), [&](const MachineOperand &MO) {
+          return TRI->regsOverlap(MO.getReg(), R) && MO.isRenamable();
+        });
+        MachineOperand DefOp = MachineOperand::CreateReg(
+            R, /*isDef=*/true, /*isImp=*/true, /*isKill=*/false,
+            /*isDead=*/false, /*isUndef=*/false, /*isEarlyClobber=*/false,
+            /*SubReg=*/0, /*isDebug=*/false, /*isInternalRead=*/false,
+            /*isRenamable=*/IsRenamable);
+        MI.addOperand(DefOp);
+      }
+      Changed = true;
+    }
+  }
+
+  return Changed;
+}
+
+FunctionPass *llvm::createAArch64SRLTDefineSuperRegsPass() {
+  return new AArch64SRLTDefineSuperRegs();
+}
diff --git a/llvm/lib/Target/AArch64/AArch64Subtarget.cpp b/llvm/lib/Target/AArch64/AArch64Subtarget.cpp
index 911eedbe36185..1737a0c1529b4 100644
--- a/llvm/lib/Target/AArch64/AArch64Subtarget.cpp
+++ b/llvm/lib/Target/AArch64/AArch64Subtarget.cpp
@@ -369,7 +369,8 @@ AArch64Subtarget::AArch64Subtarget(const Triple &TT, StringRef CPU,
                                    unsigned MinSVEVectorSizeInBitsOverride,
                                    unsigned MaxSVEVectorSizeInBitsOverride,
                                    bool IsStreaming, bool IsStreamingCompatible,
-                                   bool HasMinSize)
+                                   bool HasMinSize,
+                                   bool EnableSRLTSubregToRegMitigation)
     : AArch64GenSubtargetInfo(TT, CPU, TuneCPU, FS),
       ReserveXRegister(AArch64::GPR64commonRegClass.getNumRegs()),
       ReserveXRegisterForRA(AArch64::GPR64commonRegClass.getNumRegs()),
@@ -381,7 +382,9 @@ AArch64Subtarget::AArch64Subtarget(const Triple &TT, StringRef CPU,
               ? std::optional<unsigned>(AArch64StreamingHazardSize)
               : std::nullopt),
       MinSVEVectorSizeInBits(MinSVEVectorSizeInBitsOverride),
-      MaxSVEVectorSizeInBits(MaxSVEVectorSizeInBitsOverride), TargetTriple(TT),
+      MaxSVEVectorSizeInBits(MaxSVEVectorSizeInBitsOverride),
+      EnableSRLTSubregToRegMitigation(EnableSRLTSubregToRegMitigation),
+      TargetTriple(TT),
       InstrInfo(initializeSubtargetDependencies(FS, CPU, TuneCPU, HasMinSize)),
       TLInfo(TM, *this) {
   if (AArch64::isX18ReservedByDefault(TT))
diff --git a/llvm/lib/Target/AArch64/AArch64Subtarget.h b/llvm/lib/Target/AArch64/AArch64Subtarget.h
index bd8a2d5234f2d..248e140b3101c 100644
--- a/llvm/lib/Target/AArch64/AArch64Subtarget.h
+++ b/llvm/lib/Target/AArch64/AArch64Subtarget.h
@@ -88,6 +88,7 @@ class AArch64Subtarget final : public AArch64GenSubtargetInfo {
   std::optional<unsigned> StreamingHazardSize;
   unsigned MinSVEVectorSizeInBits;
   unsigned MaxSVEVectorSizeInBits;
+  bool EnableSRLTSubregToRegMitigation;
   unsigned VScaleForTuning = 1;
   TailFoldingOpts DefaultSVETFOpts = TailFoldingOpts::Disabled;
 
@@ -128,7 +129,8 @@ class AArch64Subtarget final : public AArch64GenSubtargetInfo {
                    unsigned MinSVEVectorSizeInBitsOverride = 0,
                    unsigned MaxSVEVectorSizeInBitsOverride = 0,
                    bool IsStreaming = false, bool IsStreamingCompatible = false,
-                   bool HasMinSize = false);
+                   bool HasMinSize = false,
+                   bool EnableSRLTSubregToRegMitigation = false);
 
 // Getters for SubtargetFeatures defined in tablegen
 #define GET_SUBTARGETINFO_MACRO(ATTRIBUTE, DEFAULT, GETTER)                    \
@@ -467,6 +469,10 @@ class AArch64Subtarget final : public AArch64GenSubtargetInfo {
   /// add + cnt instructions.
   bool useScalarIncVL() const;
 
+  bool enableSRLTSubregToRegMitigation() const {
+    return EnableSRLTSubregToRegMitigation;
+  }
+
   /// Choose a method of checking LR before performing a tail call.
   AArch64PAuth::AuthCheckMethod
   getAuthenticatedLRCheckMethod(const MachineFunction &MF) const;
diff --git a/llvm/lib/Target/AArch64/AArch64TargetMachine.cpp b/llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
index 1ec5a20cc0ce0..3aba866458830 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
@@ -227,6 +227,12 @@ static cl::opt<bool>
                             cl::desc("Enable new lowering for the SME ABI"),
                             cl::init(true), cl::Hidden);
 
+static cl::opt<bool> EnableSRLTSubregToRegMitigation(
+    "aarch64-srlt-mitigate-sr2r",
+    cl::desc("Enable SUBREG_TO_REG mitigation by adding 'implicit-def' for "
+             "super-regs when using Subreg Liveness Tracking"),
+    cl::init(true), cl::Hidden);
+
 extern "C" LLVM_ABI LLVM_EXTERNAL_VISIBILITY void
 LLVMInitializeAArch64Target() {
   // Register the target.
@@ -268,6 +274,7 @@ LLVMInitializeAArch64Target() {
   initializeKCFIPass(PR);
   initializeSMEABIPass(PR);
   initializeMachineSMEABIPass(PR);
+  initializeAArch64SRLTDefineSuperRegsPass(PR);
   initializeSMEPeepholeOptPass(PR);
   initializeSVEIntrinsicOptsPass(PR);
   initializeAArch64SpeculationHardeningPass(PR);
@@ -462,7 +469,8 @@ AArch64TargetMachine::getSubtargetImpl(const Function &F) const {
     resetTargetOptions(F);
     I = std::make_unique<AArch64Subtarget>(
         TargetTriple, CPU, TuneCPU, FS, *this, isLittle, MinSVEVectorSize,
-        MaxSVEVectorSize, IsStreaming, IsStreamingCompatible, HasMinSize);
+        MaxSVEVectorSize, IsStreaming, IsStreamingCompatible, HasMinSize,
+        EnableSRLTSubregToRegMitigation);
   }
 
   if (IsStreaming && !I->hasSME())
@@ -550,6 +558,7 @@ class AArch64PassConfig : public TargetPassConfig {
   void addMachineSSAOptimization() override;
   bool addILPOpts() override;
   void addPreRegAlloc() override;
+  void addPostRewrite() override;
   void addPostRegAlloc() override;
   void addPreSched2() override;
   void addPreEmitPass() override;
@@ -815,6 +824,11 @@ void AArch64PassConfig::addPreRegAlloc() {
     addPass(&MachinePipelinerID);
 }
 
+void AArch64PassConfig::addPostRewrite() {
+  if (EnableSRLTSubregToRegMitigation)
+    addPass(createAArch64SRLTDefineSuperRegsPass());
+}
+
 void AArch64PassConfig::addPostRegAlloc() {
   // Remove redundant copy instructions.
   if (TM->getOptLevel() != CodeGenOptLevel::None &&
diff --git a/llvm/lib/Target/AArch64/CMakeLists.txt b/llvm/lib/Target/AArch64/CMakeLists.txt
index 3334b3689e03f..2fe554217c1ba 100644
--- a/llvm/lib/Target/AArch64/CMakeLists.txt
+++ b/llvm/lib/Target/AArch64/CMakeLists.txt
@@ -92,6 +92,7 @@ add_llvm_target(AArch64CodeGen
   SMEPeepholeOpt.cpp
   SVEIntrinsicOpts.cpp
   MachineSMEABIPass.cpp
+  AArch64SRLTDefineSuperRegs.cpp
   AArch64SIMDInstrOpt.cpp
   AArch64PrologueEpilogue.cpp
 
diff --git a/llvm/test/CodeGen/AArch64/O3-pipeline.ll b/llvm/test/CodeGen/AArch64/O3-pipeline.ll
index e4249fe4fb1c8..d137b8c9ac1e0 100644
--- a/llvm/test/CodeGen/AArch64/O3-pipeline.ll
...
[truncated]

github-actions · 2026-01-02T08:41:16Z

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:

git-clang-format --diff origin/main HEAD --extensions h,cpp -- llvm/lib/Target/AArch64/AArch64SRLTDefineSuperRegs.cpp llvm/lib/Target/AArch64/AArch64.h llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp llvm/lib/Target/AArch64/AArch64Subtarget.cpp llvm/lib/Target/AArch64/AArch64Subtarget.h llvm/lib/Target/AArch64/AArch64TargetMachine.cpp --diff_from_common_commit

⚠️
The reproduction instructions above might return results for more than one PR
in a stack if you are using a stacked PR workflow. You can limit the results by
changing origin/main to the base branch/commit you want to compare against.
⚠️

View the diff from clang-format here.

diff --git a/llvm/lib/Target/AArch64/AArch64SRLTDefineSuperRegs.cpp b/llvm/lib/Target/AArch64/AArch64SRLTDefineSuperRegs.cpp
index 695155abc..98ad74791 100644
--- a/llvm/lib/Target/AArch64/AArch64SRLTDefineSuperRegs.cpp
+++ b/llvm/lib/Target/AArch64/AArch64SRLTDefineSuperRegs.cpp
@@ -109,8 +109,8 @@ struct AArch64SRLTDefineSuperRegs : public MachineFunctionPass {
   bool runOnMachineFunction(MachineFunction &MF) override;
 
   void collectWidestSuperReg(Register R, const BitVector &RequiredBaseRegUnits,
-                 const BitVector &QHiRegUnits,
-                 SmallSet<Register, 8> &SuperRegs);
+                             const BitVector &QHiRegUnits,
+                             SmallSet<Register, 8> &SuperRegs);
 
   StringRef getPassName() const override { return PASS_NAME; }
 
@@ -132,8 +132,7 @@ private:
 
 } // end anonymous namespace
 
-INITIALIZE_PASS(AArch64SRLTDefineSuperRegs, DEBUG_TYPE, PASS_NAME, false,
-                false)
+INITIALIZE_PASS(AArch64SRLTDefineSuperRegs, DEBUG_TYPE, PASS_NAME, false, false)
 
 SmallVector<MCRegister> SupportedAliasRegs = {
     AArch64::X0,  AArch64::X1,  AArch64::X2,  AArch64::X3,  AArch64::X4,
@@ -184,7 +183,8 @@ void AArch64SRLTDefineSuperRegs::collectWidestSuperReg(
       if (QHiRegUnits.test((unsigned)U) &&
           !Subtarget->isSVEorStreamingSVEAvailable())
         return false;
-      // We consider reg A as a suitable super-reg of B when any of the reg units:
+      // We consider reg A as a suitable super-reg of B when any of the reg
+      // units:
       // * is shared (Q0 and B0, both have B0 as sub-register)
       // * is artificial (Q0 = B0 + (B0_HI + H0_HI + S0_HI))
       // This avoids considering e.g. Q0_Q1 as a super reg of D0 or D1.

sdesmalen-arm · 2026-01-02T08:50:09Z

This is a stacked PR. See other PRs below:

github-actions · 2026-01-02T09:09:34Z

🐧 Linux x64 Test Results

3071 tests passed
7 tests skipped

All executed tests passed, but another part of the build failed. Click on a failure below to see the details.

lib/Target/AArch64/CMakeFiles/LLVMAArch64CodeGen.dir/AArch64SRLTDefineSuperRegs.cpp.o

FAILED: lib/Target/AArch64/CMakeFiles/LLVMAArch64CodeGen.dir/AArch64SRLTDefineSuperRegs.cpp.o
sccache /opt/llvm/bin/clang++ -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GLIBCXX_USE_CXX11_ABI=1 -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/gha/actions-runner/_work/llvm-project/llvm-project/build/lib/Target/AArch64 -I/home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/lib/Target/AArch64 -I/home/gha/actions-runner/_work/llvm-project/llvm-project/build/include -I/home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/include -gmlt -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wno-pass-failed -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17 -fvisibility=hidden -UNDEBUG -fno-exceptions -funwind-tables -fno-rtti -MD -MT lib/Target/AArch64/CMakeFiles/LLVMAArch64CodeGen.dir/AArch64SRLTDefineSuperRegs.cpp.o -MF lib/Target/AArch64/CMakeFiles/LLVMAArch64CodeGen.dir/AArch64SRLTDefineSuperRegs.cpp.o.d -o lib/Target/AArch64/CMakeFiles/LLVMAArch64CodeGen.dir/AArch64SRLTDefineSuperRegs.cpp.o -c /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/lib/Target/AArch64/AArch64SRLTDefineSuperRegs.cpp
/home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/lib/Target/AArch64/AArch64SRLTDefineSuperRegs.cpp:128:30: error: private field 'AFI' is not used [-Werror,-Wunused-private-field]
128 |   const AArch64FunctionInfo *AFI = nullptr;
|                              ^
1 error generated.

If these failures are unrelated to your changes (for example tests are broken or flaky at HEAD), please open an issue at https://github.com/llvm/llvm-project/issues and add the infrastructure label.

MacDue

The pass mostly makes sense. Just a few comments:

MacDue · 2026-01-05T11:36:32Z

llvm/lib/Target/AArch64/AArch64SRLTDefineSuperRegs.cpp

+INITIALIZE_PASS(AArch64SRLTDefineSuperRegs, DEBUG_TYPE, PASS_NAME, false,
+                false)
+
+SmallVector<MCRegister> SupportedAliasRegs = {


This does not need to be a (mutable) vector:

Suggested change

SmallVector<MCRegister> SupportedAliasRegs = {

constexpr MCRegister SupportedAliasRegs[] = {

MacDue · 2026-01-05T11:37:35Z

llvm/lib/Target/AArch64/AArch64SRLTDefineSuperRegs.cpp

+    AArch64::Z23, AArch64::Z24, AArch64::Z25, AArch64::Z26, AArch64::Z27,
+    AArch64::Z28, AArch64::Z29, AArch64::Z30, AArch64::Z31};
+
+SmallVector<MCRegister> QHiRegs = {


Suggested change

SmallVector<MCRegister> QHiRegs = {

constexpr MCRegister QHiRegs[] = {

MacDue · 2026-01-05T12:02:59Z

llvm/lib/Target/AArch64/AArch64SRLTDefineSuperRegs.cpp

+  BitVector RequiredBaseRegUnits(TRI->getNumRegUnits());
+  for (Register R : SupportedAliasRegs)
+    for (MCRegUnit U : TRI->regunits(R))
+      RequiredBaseRegUnits.set((unsigned)U);
+
+  BitVector QHiRegUnits(TRI->getNumRegUnits());
+  for (Register R : QHiRegs)
+    for (MCRegUnit U : TRI->regunits(R))
+      QHiRegUnits.set((unsigned)U);


Register classes are contiguous (we already have checks that depend on this). So I think this could be:

Suggested change

BitVector RequiredBaseRegUnits(TRI->getNumRegUnits());

for (Register R : SupportedAliasRegs)

for (MCRegUnit U : TRI->regunits(R))

RequiredBaseRegUnits.set((unsigned)U);

BitVector QHiRegUnits(TRI->getNumRegUnits());

for (Register R : QHiRegs)

for (MCRegUnit U : TRI->regunits(R))

QHiRegUnits.set((unsigned)U);

auto XRegs = seq_inclusive<unsigned>(AArch64::X0, AArch64::X28);

auto ZRegs = seq_inclusive<unsigned>(AArch64::Z0, AArch64::Z31);

auto QHiRegs = seq_inclusive<unsigned>(AArch64::Q0_HI, AArch64::Q31_HI);

constexpr unsigned FixedRegs[] = { AArch64::FP, AArch64::LR, AArch64::SP };

auto SupportedAliasRegs = concat<unsigned>(XRegs, ZRegs, FixedRegs);

for (MCRegister R : SupportedAliasRegs)

for (MCRegUnit U : TRI->regunits(R))

RequiredBaseRegUnits.set((unsigned)U);

BitVector QHiRegUnits(TRI->getNumRegUnits());

for (MCRegister R : QHiRegs)

for (MCRegUnit U : TRI->regunits(R))

QHiRegUnits.set((unsigned)U);

Which I find a little easier to verify covers all registers than hardcoded tables.

MacDue · 2026-01-05T12:59:47Z

llvm/lib/Target/AArch64/AArch64SRLTDefineSuperRegs.cpp

+private:
+  MachineFunction *MF = nullptr;
+  const AArch64Subtarget *Subtarget = nullptr;
+  const AArch64RegisterInfo *TRI = nullptr;
+  const AArch64FunctionInfo *AFI = nullptr;
+  const TargetInstrInfo *TII = nullptr;
+  MachineRegisterInfo *MRI = nullptr;


Most of these are unused. I think the only two that are necessary are:

Suggested change

private:

MachineFunction *MF = nullptr;

const AArch64Subtarget *Subtarget = nullptr;

const AArch64RegisterInfo *TRI = nullptr;

const AArch64FunctionInfo *AFI = nullptr;

const TargetInstrInfo *TII = nullptr;

MachineRegisterInfo *MRI = nullptr;

private:

const AArch64Subtarget *Subtarget = nullptr;

const AArch64RegisterInfo *TRI = nullptr;

MacDue · 2026-01-05T13:08:06Z

llvm/test/CodeGen/AArch64/O3-pipeline.ll

 ; CHECK-NEXT:       Virtual Register Rewriter
 ; CHECK-NEXT:       Register Allocation Pass Scoring
 ; CHECK-NEXT:       Stack Slot Coloring
+; CHECK-NEXT:       AArch64 SRLT Define Super-Regs Pass


Is this intentionally only added to the optimized pipeline?

MacDue · 2026-01-05T13:37:16Z

llvm/lib/Target/AArch64/AArch64SRLTDefineSuperRegs.cpp

+      // For each partial register write (e.g. w0), also add implicit-def for
+      // top bits of register (x0).


nit:

Suggested change

// For each partial register write (e.g. w0), also add implicit-def for

// top bits of register (x0).

// For each partial register write, also add an implicit-def for top bits

// of the register (e.g. for w0 add a def of x0).

MacDue · 2026-01-05T13:44:49Z

llvm/lib/Target/AArch64/AArch64SRLTDefineSuperRegs.cpp

+
+  auto IsSuitableSuperReg = [&](Register SR) {
+    for (MCRegUnit U : TRI->regunits(SR)) {
+      // Avoid choosing z1 as super-reg of d1 if SVE is not available.


nit: Maybe note Q*_HI are only set for SVE vectors (I associate Q with Neon)

MacDue · 2026-01-05T13:58:45Z

llvm/lib/Target/AArch64/AArch64SRLTDefineSuperRegs.cpp

+      if (QHiRegUnits.test((unsigned)U) &&
+          !Subtarget->isSVEorStreamingSVEAvailable())
+        return false;
+      // We consider reg A as a suitable super-reg of B when any of the reg units:


Suggested change

// We consider reg A as a suitable super-reg of B when any of the reg units:

// We consider reg A as a suitable super-reg of B when all of the reg units:

?

Also maybe phrasing this as "we consider a register as an unsuitable super register if any reg unit is non-artificial and not shared. This implies U is a unit for a different register, which means the candidate super-register is likely a register tuple".

sdesmalen-arm requested review from MacDue, davemgreen and gbossu January 2, 2026 08:39

llvmbot added the backend:AArch64 label Jan 2, 2026

sdesmalen-arm mentioned this pull request Jan 2, 2026

[AArch64] Enable subreg liveness tracking for streaming functions. #174189

Open

MacDue reviewed Jan 5, 2026

View reviewed changes

	SmallVector<MCRegister> SupportedAliasRegs = {
	constexpr MCRegister SupportedAliasRegs[] = {

	SmallVector<MCRegister> QHiRegs = {
	constexpr MCRegister QHiRegs[] = {

		// For each partial register write (e.g. w0), also add implicit-def for
		// top bits of register (x0).

	// We consider reg A as a suitable super-reg of B when any of the reg units:
	// We consider reg A as a suitable super-reg of B when all of the reg units:

[AArch64] Add new pass after VirtRegRewriter to add implicit-defs #174188

Are you sure you want to change the base?

[AArch64] Add new pass after VirtRegRewriter to add implicit-defs #174188

Conversation

sdesmalen-arm commented Jan 2, 2026

Uh oh!

llvmbot commented Jan 2, 2026

Uh oh!

github-actions bot commented Jan 2, 2026

Uh oh!

sdesmalen-arm commented Jan 2, 2026

Uh oh!

github-actions bot commented Jan 2, 2026

🐧 Linux x64 Test Results

Uh oh!

MacDue left a comment

Choose a reason for hiding this comment

Uh oh!

MacDue Jan 5, 2026

Choose a reason for hiding this comment

Uh oh!

MacDue Jan 5, 2026

Choose a reason for hiding this comment

Uh oh!

MacDue Jan 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

MacDue Jan 5, 2026

Choose a reason for hiding this comment

Uh oh!

MacDue Jan 5, 2026

Choose a reason for hiding this comment

Uh oh!

MacDue Jan 5, 2026

Choose a reason for hiding this comment

Uh oh!

MacDue Jan 5, 2026

Choose a reason for hiding this comment

Uh oh!

MacDue Jan 5, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

MacDue Jan 5, 2026 •

edited

Loading