Skip to content

Conversation

@sdesmalen-arm
Copy link
Collaborator

@sdesmalen-arm sdesmalen-arm commented Nov 17, 2025

A SUBREG_TO_REG instruction expresses that the top bits of the result register are set to a certain value (e.g. 0).

The example below expresses that the result of %1 will have the top 32 bits zeroed and the lower 32bits being equal to the result of INSTR.

    %0:gpr32 = INSTR
    %1:gpr64 = SUBREG_TO_REG 0, %0, sub32

When the RegisterCoalescer tries to remove SUBREG_TO_REG instructions by coalescing %0 into %1, it must keep the same semantics. Currently however, the RegisterCoalescer would emit:

    %1.sub32:gpr64 = INSTR

which no longer expresses that the top 32-bits of the register are defined (zeroed) by INSTR.

This may cause issues with e.g. machine copy propagation where the pass may think it can remove a COPY-like instruction because the MIR says only the bottom 32-bits are defined/used, even though other uses of the register rely on the top 32-bits being zeroed by the COPY-like instruction.

This PR changes the RegisterCoalescer to instead emit:

    undef %1.sub32:gpr64 = MOVimm32 42, implicit-def %1

to express that the entire contents of %1:gpr64 are defined by the instruction.

This tries to reland #134408 which had to be reverted to a few reported failures.

…register when coalescing SUBREG_TO_REG" (#134408)""

This reverts commit ed5bd23.
The register coalescer ran into some asserts:

* The newly added code tried to get the LiveInterval for a physical
  register (unguarded path)

* The assert 'assert(SubregToRegSrcInsts.empty() && "can this happen?");'
  could happen when using SUBREG_TO_REG to say that the top bits
  of the second register in a {128, 128} register tuple are zero, e.g.

  %8.qsub1:qq = MOVIv2d_ns 0
  %4:zpr = SUBREG_TO_REG 0, %8.qsub1:qq, %subreg.zsub
There's no point in doing it for something like this:

    %1:gpr64sp = ORRXri %0, 8128
    %3:gpr64 = SUBREG_TO_REG 0, %1.sub_32, %subreg.sub_32

where ORRXri already writes to a gpr64 register class (although one that
also includes the sp).
otherwise it results in a machine verifier failure when
splitting the liverange and rematerialising a MOVi32imm
instruction.

This is only an issue when enabling subreg liveness tracking.
@llvmbot
Copy link
Member

llvmbot commented Nov 17, 2025

@llvm/pr-subscribers-backend-powerpc
@llvm/pr-subscribers-backend-x86
@llvm/pr-subscribers-backend-loongarch

@llvm/pr-subscribers-backend-aarch64

Author: Sander de Smalen (sdesmalen-arm)

Changes

Try to reland #134408, which had to be reverted to a few reported failures.

The patches 7228ceb and 9478968 each fix issues that were reported (with corresponding reproducers/tests). aebc086 was found when enabling subreg liveness tracking for AArch64 (still disabled by default).

Given the small changes to the original PR, I wasn't sure whether I could just push it, but I figured I'd play it safe instead.


Patch is 191.89 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/168353.diff

33 Files Affected:

  • (modified) llvm/lib/CodeGen/RegisterCoalescer.cpp (+160-14)
  • (modified) llvm/lib/CodeGen/SplitKit.cpp (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/GlobalISel/arm64-pcsections.ll (+32-32)
  • (modified) llvm/test/CodeGen/AArch64/arm64-neon-copy.ll (+10-10)
  • (modified) llvm/test/CodeGen/AArch64/implicit-def-subreg-to-reg-regression.ll (+2-2)
  • (added) llvm/test/CodeGen/AArch64/pr151592.mir (+168)
  • (added) llvm/test/CodeGen/AArch64/pr151888.mir (+17)
  • (added) llvm/test/CodeGen/AArch64/pr164181-reduced.ll (+183)
  • (modified) llvm/test/CodeGen/AArch64/pr164181.ll (+63-60)
  • (modified) llvm/test/CodeGen/AArch64/preserve_nonecc_varargs_darwin.ll (+5-5)
  • (added) llvm/test/CodeGen/AArch64/register-coalesce-implicit-def-subreg-to-reg.mir (+45)
  • (modified) llvm/test/CodeGen/AArch64/register-coalesce-update-subranges-remat.mir (+158-3)
  • (modified) llvm/test/CodeGen/LoongArch/lasx/build-vector.ll (+7-7)
  • (modified) llvm/test/CodeGen/LoongArch/lasx/fpowi.ll (+8-8)
  • (modified) llvm/test/CodeGen/LoongArch/lasx/scalar-to-vector.ll (+2-2)
  • (modified) llvm/test/CodeGen/PowerPC/aix-vec_insert_elt.ll (+4)
  • (modified) llvm/test/CodeGen/PowerPC/build-vector-tests.ll (+48)
  • (modified) llvm/test/CodeGen/PowerPC/canonical-merge-shuffles.ll (+6)
  • (modified) llvm/test/CodeGen/PowerPC/combine-fneg.ll (+1)
  • (modified) llvm/test/CodeGen/PowerPC/fp-strict-round.ll (+6)
  • (modified) llvm/test/CodeGen/PowerPC/frem.ll (+3)
  • (modified) llvm/test/CodeGen/PowerPC/froundeven-legalization.ll (+8)
  • (modified) llvm/test/CodeGen/PowerPC/half.ll (+1)
  • (modified) llvm/test/CodeGen/PowerPC/ldexp.ll (+2)
  • (modified) llvm/test/CodeGen/PowerPC/llvm.modf.ll (+1)
  • (modified) llvm/test/CodeGen/PowerPC/vec_insert_elt.ll (+4)
  • (modified) llvm/test/CodeGen/PowerPC/vector-constrained-fp-intrinsics.ll (+176)
  • (added) llvm/test/CodeGen/X86/coalescer-breaks-subreg-to-reg-liveness.ll (+185)
  • (modified) llvm/test/CodeGen/X86/coalescer-implicit-def-regression-imp-operand-assert.mir (+3-3)
  • (added) llvm/test/CodeGen/X86/coalescing-subreg-to-reg-requires-subrange-update.mir (+44)
  • (added) llvm/test/CodeGen/X86/pr76416.ll (+79)
  • (modified) llvm/test/CodeGen/X86/subreg-fail.mir (+2-2)
  • (added) llvm/test/CodeGen/X86/subreg-to-reg-coalescing.mir (+451)
diff --git a/llvm/lib/CodeGen/RegisterCoalescer.cpp b/llvm/lib/CodeGen/RegisterCoalescer.cpp
index 25c4375a73ce0..47921da9fc8fc 100644
--- a/llvm/lib/CodeGen/RegisterCoalescer.cpp
+++ b/llvm/lib/CodeGen/RegisterCoalescer.cpp
@@ -307,7 +307,12 @@ class RegisterCoalescer : private LiveRangeEdit::Delegate {
   /// number if it is not zero. If DstReg is a physical register and the
   /// existing subregister number of the def / use being updated is not zero,
   /// make sure to set it to the correct physical subregister.
-  void updateRegDefsUses(Register SrcReg, Register DstReg, unsigned SubIdx);
+  ///
+  /// If \p SubregToRegSrcInst is not empty, we are coalescing a
+  /// `DstReg = SUBREG_TO_REG SrcReg`, which should introduce an
+  /// implicit-def of DstReg on instructions that define SrcReg.
+  void updateRegDefsUses(Register SrcReg, Register DstReg, unsigned SubIdx,
+                         ArrayRef<MachineInstr *> SubregToRegSrcInst = {});
 
   /// If the given machine operand reads only undefined lanes add an undef
   /// flag.
@@ -1444,6 +1449,7 @@ bool RegisterCoalescer::reMaterializeDef(const CoalescerPair &CP,
 
   // CopyMI may have implicit operands, save them so that we can transfer them
   // over to the newly materialized instruction after CopyMI is removed.
+  LaneBitmask NewMIImplicitOpsMask;
   SmallVector<MachineOperand, 4> ImplicitOps;
   ImplicitOps.reserve(CopyMI->getNumOperands() -
                       CopyMI->getDesc().getNumOperands());
@@ -1458,6 +1464,9 @@ bool RegisterCoalescer::reMaterializeDef(const CoalescerPair &CP,
               (MO.getSubReg() == 0 && MO.getReg() == DstOperand.getReg())) &&
              "unexpected implicit virtual register def");
       ImplicitOps.push_back(MO);
+      if (MO.isDef() && MO.getReg().isVirtual() &&
+          MRI->shouldTrackSubRegLiveness(DstReg))
+        NewMIImplicitOpsMask |= MRI->getMaxLaneMaskForVReg(MO.getReg());
     }
   }
 
@@ -1494,14 +1503,11 @@ bool RegisterCoalescer::reMaterializeDef(const CoalescerPair &CP,
       } else {
         assert(MO.getReg() == NewMI.getOperand(0).getReg());
 
-        // We're only expecting another def of the main output, so the range
-        // should get updated with the regular output range.
-        //
-        // FIXME: The range updating below probably needs updating to look at
-        // the super register if subranges are tracked.
-        assert(!MRI->shouldTrackSubRegLiveness(DstReg) &&
-               "subrange update for implicit-def of super register may not be "
-               "properly handled");
+        // If lanemasks need to be tracked, compile the lanemask of the NewMI
+        // implicit def operands to avoid subranges for the super-regs from
+        // being removed by code later on in this function.
+        if (MRI->shouldTrackSubRegLiveness(MO.getReg()))
+          NewMIImplicitOpsMask |= MRI->getMaxLaneMaskForVReg(MO.getReg());
       }
     }
   }
@@ -1617,7 +1623,8 @@ bool RegisterCoalescer::reMaterializeDef(const CoalescerPair &CP,
           *LIS->getSlotIndexes(), *TRI);
 
       for (LiveInterval::SubRange &SR : DstInt.subranges()) {
-        if ((SR.LaneMask & DstMask).none()) {
+        if ((SR.LaneMask & DstMask).none() &&
+            (SR.LaneMask & NewMIImplicitOpsMask).none()) {
           LLVM_DEBUG(dbgs()
                      << "Removing undefined SubRange "
                      << PrintLaneMask(SR.LaneMask) << " : " << SR << "\n");
@@ -1891,11 +1898,14 @@ void RegisterCoalescer::addUndefFlag(const LiveInterval &Int, SlotIndex UseIdx,
   }
 }
 
-void RegisterCoalescer::updateRegDefsUses(Register SrcReg, Register DstReg,
-                                          unsigned SubIdx) {
+void RegisterCoalescer::updateRegDefsUses(
+    Register SrcReg, Register DstReg, unsigned SubIdx,
+    ArrayRef<MachineInstr *> SubregToRegSrcInsts) {
   bool DstIsPhys = DstReg.isPhysical();
   LiveInterval *DstInt = DstIsPhys ? nullptr : &LIS->getInterval(DstReg);
 
+  // Coalescing a COPY may expose reads of 'undef' subregisters.
+  // If so, then explicitly propagate 'undef' to those operands.
   if (DstInt && DstInt->hasSubRanges() && DstReg != SrcReg) {
     for (MachineOperand &MO : MRI->reg_operands(DstReg)) {
       if (MO.isUndef())
@@ -1912,6 +1922,15 @@ void RegisterCoalescer::updateRegDefsUses(Register SrcReg, Register DstReg,
     }
   }
 
+  // If DstInt already has a subrange for the unused lanes, then we shouldn't
+  // create duplicate subranges when we update the interval for unused lanes.
+  LaneBitmask DstIntLaneMask;
+  if (DstInt && MRI->shouldTrackSubRegLiveness(DstReg)) {
+    for (LiveInterval::SubRange &SR : DstInt->subranges())
+      DstIntLaneMask |= SR.LaneMask;
+  }
+
+  // Go through all instructions to replace uses of 'SrcReg' by 'DstReg'.
   SmallPtrSet<MachineInstr *, 8> Visited;
   for (MachineRegisterInfo::reg_instr_iterator I = MRI->reg_instr_begin(SrcReg),
                                                E = MRI->reg_instr_end();
@@ -1935,6 +1954,79 @@ void RegisterCoalescer::updateRegDefsUses(Register SrcReg, Register DstReg,
     if (DstInt && !Reads && SubIdx && !UseMI->isDebugInstr())
       Reads = DstInt->liveAt(LIS->getInstructionIndex(*UseMI));
 
+    bool RequiresImplicitRedef = false;
+    if (!SubregToRegSrcInsts.empty()) {
+      // We can only add an implicit-def and undef if the sub registers match,
+      // e.g.
+      //  %0:gr32      = INSTX
+      //  %0.sub8:gr32 = INSTY           // top 24 bits of %0 still defined
+      //  %1:gr64      = SUBREG_TO_REG 0, %0, %subreg.sub32
+      //
+      // This cannot be transformed into:
+      //  %1.sub32:gr64      = INSTX
+      //  undef %1.sub8:gr64 = INSTY , implicit-def %1
+      //
+      // Because that would thrash the top 24 bits of %1.sub32.
+      if (is_contained(SubregToRegSrcInsts, UseMI) &&
+          all_of(UseMI->defs(), [&SubIdx](const MachineOperand &MO) -> bool {
+            if (MO.isUndef())
+              return true;
+            return SubIdx && (!MO.getSubReg() || SubIdx == MO.getSubReg());
+          })) {
+        // Add implicit-def of super-register to express that the whole
+        // register is defined by the instruction.
+        MachineInstrBuilder MIB(*MF, UseMI);
+        MIB.addReg(DstReg, RegState::ImplicitDefine);
+        RequiresImplicitRedef = true;
+      }
+
+      // If the coalesed instruction doesn't fully define the register, we need
+      // to preserve the original super register liveness for SUBREG_TO_REG.
+      //
+      // We pretended SUBREG_TO_REG was a regular copy for coalescing purposes,
+      // but it introduces liveness for other subregisters. Downstream users may
+      // have been relying on those bits, so we need to ensure their liveness is
+      // captured with a def of other lanes.
+      if (DstInt && MRI->shouldTrackSubRegLiveness(DstReg)) {
+        // First check if there is sufficient granularity in terms of subranges.
+        LaneBitmask DstMask = MRI->getMaxLaneMaskForVReg(DstInt->reg());
+        LaneBitmask UsedLanes = TRI->getSubRegIndexLaneMask(SubIdx);
+        LaneBitmask UnusedLanes = DstMask & ~UsedLanes;
+        if ((UnusedLanes & ~DstIntLaneMask).any()) {
+          BumpPtrAllocator &Allocator = LIS->getVNInfoAllocator();
+          DstInt->createSubRangeFrom(Allocator, UnusedLanes, *DstInt);
+          DstIntLaneMask |= UnusedLanes;
+        }
+
+        // After duplicating the live ranges for the low/hi bits, we
+        // need to update the subranges of the DstReg interval such that
+        // for a case like this:
+        //
+        //       entry:
+        //  16B    %1:gpr32 = INSTRUCTION    (<=> UseMI)
+        //            :
+        //       if.then:
+        //  32B    %1:gpr32 = MOVIMM32 ..
+        //  48B    %0:gpr64 = SUBREG_TO_REG 0, %1, sub32
+        //
+        //  Only the MOVIMM32 require a def of the top lanes and any intervals
+        //  for the top 32-bits of the def at 16B should be removed.
+        for (LiveInterval::SubRange &SR : DstInt->subranges()) {
+          if (!Writes || RequiresImplicitRedef ||
+              (SR.LaneMask & UnusedLanes).none())
+            continue;
+
+          assert((SR.LaneMask & UnusedLanes) == SR.LaneMask &&
+                 "Unexpected lanemask. Subrange needs finer granularity");
+
+          SlotIndex UseIdx = LIS->getInstructionIndex(*UseMI).getRegSlot(false);
+          auto SegmentI = SR.find(UseIdx);
+          if (SegmentI != SR.end())
+            SR.removeSegment(SegmentI, true);
+        }
+      }
+    }
+
     // Replace SrcReg with DstReg in all UseMI operands.
     for (unsigned Op : Ops) {
       MachineOperand &MO = UseMI->getOperand(Op);
@@ -1943,7 +2035,7 @@ void RegisterCoalescer::updateRegDefsUses(Register SrcReg, Register DstReg,
       // turn a full def into a read-modify-write sub-register def and vice
       // versa.
       if (SubIdx && MO.isDef())
-        MO.setIsUndef(!Reads);
+        MO.setIsUndef(!Reads || RequiresImplicitRedef);
 
       // A subreg use of a partially undef (super) register may be a complete
       // undef use now and then has to be marked that way.
@@ -2046,6 +2138,30 @@ void RegisterCoalescer::setUndefOnPrunedSubRegUses(LiveInterval &LI,
   LIS->shrinkToUses(&LI);
 }
 
+/// For a given use of value \p Idx, it returns the def in the current block,
+/// or otherwise all possible defs in preceding blocks.
+static bool FindDefInBlock(SmallPtrSetImpl<MachineBasicBlock *> &VisitedBlocks,
+                           SmallVector<MachineInstr *> &Instrs,
+                           LiveIntervals *LIS, LiveInterval &SrcInt,
+                           MachineBasicBlock *MBB, VNInfo *Idx) {
+  if (!Idx->isPHIDef()) {
+    MachineInstr *Def = LIS->getInstructionFromIndex(Idx->def);
+    assert(Def && "Unable to find a def for SUBREG_TO_REG source operand");
+    Instrs.push_back(Def);
+    return true;
+  }
+
+  bool Any = false;
+  if (VisitedBlocks.count(MBB))
+    return false;
+  VisitedBlocks.insert(MBB);
+  for (MachineBasicBlock *Pred : MBB->predecessors()) {
+    Any |= FindDefInBlock(VisitedBlocks, Instrs, LIS, SrcInt, Pred,
+                          SrcInt.getVNInfoBefore(LIS->getMBBEndIdx(Pred)));
+  }
+  return Any;
+}
+
 bool RegisterCoalescer::joinCopy(
     MachineInstr *CopyMI, bool &Again,
     SmallPtrSetImpl<MachineInstr *> &CurrentErasedInstrs) {
@@ -2183,6 +2299,35 @@ bool RegisterCoalescer::joinCopy(
     });
   }
 
+  SmallVector<MachineInstr *> SubregToRegSrcInsts;
+  Register SrcReg = CP.isFlipped() ? CP.getDstReg() : CP.getSrcReg();
+  if (CopyMI->isSubregToReg() && !SrcReg.isPhysical()) {
+    // For the case where the copy instruction is a SUBREG_TO_REG, e.g.
+    //
+    //   %0:gpr32 = movimm32 ..
+    //   %1:gpr64 = SUBREG_TO_REG 0, %0, sub32
+    //   ...
+    //   %0:gpr32 = COPY <something>
+    //
+    // After joining liveranges, the original `movimm32` will need an
+    // implicit-def to make it explicit that the entire register is written,
+    // i.e.
+    //
+    //   undef %0.sub32:gpr64 = movimm32 ..., implicit-def %0
+    //   ...
+    //   undef %0.sub32:gpr64 = COPY <something>  // Note that this does not
+    //                                            // require an implicit-def,
+    //                                            // because it has nothing to
+    //                                            // do with the SUBREG_TO_REG.
+    LiveInterval &SrcInt = LIS->getInterval(SrcReg);
+    SlotIndex SubregToRegSlotIdx = LIS->getInstructionIndex(*CopyMI);
+    SmallPtrSet<MachineBasicBlock *, 8> VisitedBlocks;
+    if (!FindDefInBlock(VisitedBlocks, SubregToRegSrcInsts, LIS, SrcInt,
+                        CopyMI->getParent(),
+                        SrcInt.Query(SubregToRegSlotIdx).valueIn()))
+      llvm_unreachable("SUBREG_TO_REG src requires a def");
+  }
+
   ShrinkMask = LaneBitmask::getNone();
   ShrinkMainRange = false;
 
@@ -2253,7 +2398,8 @@ bool RegisterCoalescer::joinCopy(
   // Also update DstReg operands to include DstIdx if it is set.
   if (CP.getDstIdx())
     updateRegDefsUses(CP.getDstReg(), CP.getDstReg(), CP.getDstIdx());
-  updateRegDefsUses(CP.getSrcReg(), CP.getDstReg(), CP.getSrcIdx());
+  updateRegDefsUses(CP.getSrcReg(), CP.getDstReg(), CP.getSrcIdx(),
+                    SubregToRegSrcInsts);
 
   // Shrink subregister ranges if necessary.
   if (ShrinkMask.any()) {
diff --git a/llvm/lib/CodeGen/SplitKit.cpp b/llvm/lib/CodeGen/SplitKit.cpp
index 8ec4bfbb5a330..cf064b90a7d34 100644
--- a/llvm/lib/CodeGen/SplitKit.cpp
+++ b/llvm/lib/CodeGen/SplitKit.cpp
@@ -448,7 +448,7 @@ void SplitEditor::addDeadDef(LiveInterval &LI, VNInfo *VNI, bool Original) {
     const MachineInstr *DefMI = LIS.getInstructionFromIndex(Def);
     assert(DefMI != nullptr);
     LaneBitmask LM;
-    for (const MachineOperand &DefOp : DefMI->defs()) {
+    for (const MachineOperand &DefOp : DefMI->all_defs()) {
       Register R = DefOp.getReg();
       if (R != LI.reg())
         continue;
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-pcsections.ll b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-pcsections.ll
index 57481724936a3..cab2741be9929 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-pcsections.ll
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-pcsections.ll
@@ -12,7 +12,7 @@ define i32 @val_compare_and_swap(ptr %p, i32 %cmp, i32 %new) {
   ; CHECK-NEXT:   successors: %bb.2(0x7ffff800), %bb.3(0x00000800)
   ; CHECK-NEXT:   liveins: $w1, $w2, $x0
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT:   renamable $w8 = LDAXRW renamable $x0, implicit-def $x8, pcsections !0 :: (volatile load (s32) from %ir.p)
+  ; CHECK-NEXT:   renamable $w8 = LDAXRW renamable $x0, implicit-def renamable $x8, pcsections !0 :: (volatile load (s32) from %ir.p)
   ; CHECK-NEXT:   $wzr = SUBSWrs renamable $w8, renamable $w1, 0, implicit-def $nzcv, pcsections !0
   ; CHECK-NEXT:   Bcc 1, %bb.3, implicit killed $nzcv, pcsections !0
   ; CHECK-NEXT: {{  $}}
@@ -46,13 +46,13 @@ define i32 @val_compare_and_swap_from_load(ptr %p, i32 %cmp, ptr %pnew) {
   ; CHECK-NEXT:   successors: %bb.1(0x80000000)
   ; CHECK-NEXT:   liveins: $w1, $x0, $x2
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT:   renamable $w9 = LDRWui killed renamable $x2, 0, implicit-def $x9, pcsections !0 :: (load (s32) from %ir.pnew)
+  ; CHECK-NEXT:   renamable $w9 = LDRWui killed renamable $x2, 0, implicit-def renamable $x9, pcsections !0 :: (load (s32) from %ir.pnew)
   ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1.cmpxchg.start:
   ; CHECK-NEXT:   successors: %bb.2(0x7ffff800), %bb.3(0x00000800)
   ; CHECK-NEXT:   liveins: $w1, $x0, $x9
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT:   renamable $w8 = LDAXRW renamable $x0, implicit-def $x8, pcsections !0 :: (volatile load (s32) from %ir.p)
+  ; CHECK-NEXT:   renamable $w8 = LDAXRW renamable $x0, implicit-def renamable $x8, pcsections !0 :: (volatile load (s32) from %ir.p)
   ; CHECK-NEXT:   $wzr = SUBSWrs renamable $w8, renamable $w1, 0, implicit-def $nzcv, pcsections !0
   ; CHECK-NEXT:   Bcc 1, %bb.3, implicit killed $nzcv, pcsections !0
   ; CHECK-NEXT: {{  $}}
@@ -91,7 +91,7 @@ define i32 @val_compare_and_swap_rel(ptr %p, i32 %cmp, i32 %new) {
   ; CHECK-NEXT:   successors: %bb.2(0x7ffff800), %bb.3(0x00000800)
   ; CHECK-NEXT:   liveins: $w1, $w2, $x0
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT:   renamable $w8 = LDAXRW renamable $x0, implicit-def $x8, pcsections !0 :: (volatile load (s32) from %ir.p)
+  ; CHECK-NEXT:   renamable $w8 = LDAXRW renamable $x0, implicit-def renamable $x8, pcsections !0 :: (volatile load (s32) from %ir.p)
   ; CHECK-NEXT:   $wzr = SUBSWrs renamable $w8, renamable $w1, 0, implicit-def $nzcv, pcsections !0
   ; CHECK-NEXT:   Bcc 1, %bb.3, implicit killed $nzcv, pcsections !0
   ; CHECK-NEXT: {{  $}}
@@ -243,7 +243,7 @@ define i32 @fetch_and_nand(ptr %p) {
   ; CHECK-NEXT:  successors: %bb.1(0x7c000000), %bb.2(0x04000000)
   ; CHECK-NEXT:   liveins: $x0
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT:   renamable $w8 = LDXRW renamable $x0, implicit-def $x8, pcsections !0 :: (volatile load (s32) from %ir.p)
+  ; CHECK-NEXT:   renamable $w8 = LDXRW renamable $x0, implicit-def renamable $x8, pcsections !0 :: (volatile load (s32) from %ir.p)
   ; CHECK-NEXT:   renamable $w9 = ANDWri renamable $w8, 2, pcsections !0
   ; CHECK-NEXT:   $w9 = ORNWrs $wzr, killed renamable $w9, 0, pcsections !0
   ; CHECK-NEXT:   early-clobber renamable $w10 = STLXRW killed renamable $w9, renamable $x0, pcsections !0 :: (volatile store (s32) into %ir.p)
@@ -295,7 +295,7 @@ define i32 @fetch_and_or(ptr %p) {
   ; CHECK-NEXT:  successors: %bb.1(0x7c000000), %bb.2(0x04000000)
   ; CHECK-NEXT:   liveins: $w9, $x0
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT:   renamable $w8 = LDAXRW renamable $x0, implicit-def $x8, pcsections !0 :: (volatile load (s32) from %ir.p)
+  ; CHECK-NEXT:   renamable $w8 = LDAXRW renamable $x0, implicit-def renamable $x8, pcsections !0 :: (volatile load (s32) from %ir.p)
   ; CHECK-NEXT:   $w10 = ORRWrs renamable $w8, renamable $w9, 0, pcsections !0
   ; CHECK-NEXT:   early-clobber renamable $w11 = STLXRW killed renamable $w10, renamable $x0, pcsections !0 :: (volatile store (s32) into %ir.p)
   ; CHECK-NEXT:   CBNZW killed renamable $w11, %bb.1, pcsections !0
@@ -726,7 +726,7 @@ define i8 @atomicrmw_add_i8(ptr %ptr, i8 %rhs) {
   ; CHECK-NEXT:  successors: %bb.1(0x7c000000), %bb.2(0x04000000)
   ; CHECK-NEXT:   liveins: $w1, $x0
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT:   renamable $w8 = LDAXRB renamable $x0, implicit-def $x8, pcsections !0 :: (volatile load (s8) from %ir.ptr)
+  ; CHECK-NEXT:   renamable $w8 = LDAXRB renamable $x0, implicit-def renamable $x8, pcsections !0 :: (volatile load (s8) from %ir.ptr)
   ; CHECK-NEXT:   $w9 = ADDWrs renamable $w8, renamable $w1, 0, pcsections !0
   ; CHECK-NEXT:   early-clobber renamable $w10 = STLXRB killed renamable $w9, renamable $x0, pcsections !0 :: (volatile store (s8) into %ir.ptr)
   ; CHECK-NEXT:   CBNZW killed renamable $w10, %bb.1, pcsections !0
@@ -750,7 +750,7 @@ define i8 @atomicrmw_xchg_i8(ptr %ptr, i8 %rhs) {
   ; CHECK-NEXT:  successors: %bb.1(0x7c000000), %bb.2(0x04000000)
   ; CHECK-NEXT:   liveins: $w1, $x0
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT:   renamable $w8 = LDXRB renamable $x0, implicit-def $x8, pcsections !0 :: (volatile load (s8) from %ir.ptr)
+  ; CHECK-NEXT:   renamable $w8 = LDXRB renamable $x0, implicit-def renamable $x8, pcsections !0 :: (volatile load (s8) from %ir.ptr)
   ; CHECK-NEXT:   early-clobber renamable $w9 = STXRB renamable $w1, renamable $x0, pcsections !0 :: (volatile store (s8) into %ir.ptr)
   ; CHECK-NEXT:   CBNZW killed renamable $w9, %bb.1, pcsections !0
   ; CHECK-NEXT: {{  $}}
@@ -773,7 +773,7 @@ define i8 @atomicrmw_sub_i8(ptr %ptr, i8 %rhs) {
   ; CHECK-NEXT:  successors: %bb.1(0x7c000000), %bb.2(0x04000000)
   ; CHECK-NEXT:   liveins: $w1, $x0
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT:   renamable $w8 = LDAXRB renamable $x0, implicit-def $x8, pcsections !0 :: (volatile load (s8) from %ir.ptr)
+  ; CHECK-NEXT:   renamable $w8 = LDAXRB renamable $x0, implicit-def renamable $x8, pcsections !0 :: (volatile load (s8) from %ir.ptr)
   ; CHECK-NEXT:   $w9 = SUBWrs renamable $w8, renamable $w1, 0, pcsections !0
   ; CHECK-NEXT:   early-clobber renamable $w10 = STXRB killed renamable $w9, renamable $x0, pcsections !0 :: (volatile store (s8) into %ir.ptr)
   ; CHECK-NEXT:   CBNZW killed renamable $w10, %bb.1, pcsections !0
@@ -797,7 +797,7 @@ define i8 @atomicrmw_and_i8(ptr %ptr, i8 %rhs) {
   ; CHECK-NEXT:  successors: %bb.1(0x7c000000), %bb.2(0x04000000)
   ; CHECK-NEXT:   liveins: $w1, $x0
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT:   renamable $w8 = LDXRB renamable $x0, implicit-def $x8, pcsections !0 :: (volatile load (s8) from %ir.ptr)
+  ; CHECK-NEXT:   renamable $w8 = LDXRB renamable $x0, implicit-def renamable $x8, pcsections !0 :: (volatile load (s8) from %ir.ptr)
   ; CHECK-NEXT:   $w9 = ANDWrs renamable $w8, renamable $w1, 0, pcsections !0
   ; CHECK-NEXT:   early-clobber renamable $w10 = STLXRB killed renamable $w9, renamable $x0, pcsections !0 :: (volatile store (s8) into %ir.ptr)
   ; CHECK-NEXT:   CBNZW killed renamable $w10, %bb.1, pcsections !0
@@ -821,7 +821,7 @@ define i8 @atomicrmw_or_i8(ptr %ptr, i8 %rhs) {
   ; CHECK-NEXT:  successors: %bb.1(0x7c000000), %bb.2(0x04000000)
   ; CHECK-NEXT:   liveins: $w1, $x0
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT:   renamable $w8 = LDAXRB renamabl...
[truncated]

Copy link
Member

@MacDue MacDue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the new fixes look good 👍

I just have some general questions/nits about the initial patch (as I didn't review it before, and it's a fairly tricky change). If @arsenm is happy with the change as-is, you can skip my suggestions.

For the PR description, I think it'd be good to include the initial description, since that'd help explain the change in a git blame. Also, not sure if the commit references should be included, since they are not commits in the LLVM repo. Maybe just explain the changes from the initial PR directly in the description.

//
// This cannot be transformed into:
// %1.sub32:gr64 = INSTX
// undef %1.sub8:gr64 = INSTY , implicit-def %1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it specifically the undef that's a problem here? Could you note that undef here means the subreg write won't read other parts (it's not something I can find well documented, other than as a footnote on some slides: https://llvm.org/devmtg/2017-10/slides/Braun-Welcome%20to%20the%20Back%20End.pdf).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The undef means that the 64-bits of %1 are considered undef before the assigning the .sub8 result, and the implicit-def %1 would mean that the entire contents (not just the low 8bits) would be written. So as a consequence of either/both, the other bits of %1 are not read. Either way, I've updated the wording.

Comment on lines +2005 to +2010
// entry:
// 16B %1:gpr32 = INSTRUCTION (<=> UseMI)
// :
// if.then:
// 32B %1:gpr32 = MOVIMM32 ..
// 48B %0:gpr64 = SUBREG_TO_REG 0, %1, sub32
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming in this example %1 is being coalesced to directly write to %0? However, I'm not sure what the relevance of the entry/if.then basic blocks is? Would this be the same if everything was in one block.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's right. The example of the values being defined in different blocks came from a reproducer, but the same might also happen in the one block.

@jayfoad
Copy link
Contributor

jayfoad commented Nov 18, 2025

Try to reland #134408, which had to be reverted to a few reported failures.

The patches 7228ceb and 9478968 each fix issues that were reported (with corresponding reproducers/tests). aebc086 was found when enabling subreg liveness tracking for AArch64 (still disabled by default).

Given the small changes to the original PR, I wasn't sure whether I could just push it, but I figured I'd play it safe instead.

Just a high level comment: since this is now the 8th attempt to land this patch over more than 2 years, would it make sense to restate what this patch does and why? Otherwise you have to follow a very long trail of breadcrumbs to find any kind of explanation.

@github-actions
Copy link

github-actions bot commented Nov 18, 2025

🐧 Linux x64 Test Results

  • 186316 tests passed
  • 4853 tests skipped

It should have started the worklist with the predecessors of MBB,
not MBB itself.
Copy link
Member

@MacDue MacDue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new fixes make sense to me 👍
Just one more question about the initial patch:

//
// because the undef means that none of the bits of %1 are read, thus
// thrashing the top 24 bits of %1.sub32.
if (is_contained(SubregToRegSrcInsts, UseMI) &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to use a SmallPtrSet<MachineInstr *> for SubregToRegSrcInsts given it's main use is to check is_contained? That may scale better (given this is checked for each UseMI).

; CHECK-GI: // %bb.0: // %entry
; CHECK-GI-NEXT: mov v2.16b, v1.16b
; CHECK-GI-NEXT: // kill: def $d0 killed $d0 def $q0
; CHECK-GI-NEXT: // kill: def $d0 killed $d0 def $q0 def $q0 def $q0 def $q0 def $q0 def $q0 def $q0 def $q0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can addRegisterDefined() help with these multiple identical defs?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants