[AArch64] Run optimizeTerminators earlier too. #170907

davemgreen · 2025-12-05T19:15:49Z

Running optimizeTerminators prior to other optimizations like branch layout can lead to more folding and better codegen, but is not on its own able to capture all cases. There is benefit to running it in both places. This adds the existing code from #161508 into the AArch64RedundantCopyElimination pass, which sounds like a sensible enough place for it.

llvmbot · 2025-12-05T19:16:30Z

@llvm/pr-subscribers-backend-aarch64

Author: David Green (davemgreen)

Changes

Running optimizeTerminators prior to other optimizations like branch layout can lead to more folding and better codegen, but is not on its own able to capture all cases. There is benefit to running it in both places. This adds the existing code from #161508 into the AArch64RedundantCopyElimination pass, which sounds like a sensible enough place for it.

Patch is 28.87 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/170907.diff

11 Files Affected:

(modified) llvm/lib/Target/AArch64/AArch64InstrInfo.cpp (+43)
(modified) llvm/lib/Target/AArch64/AArch64InstrInfo.h (+2)
(modified) llvm/lib/Target/AArch64/AArch64RedundantCondBranchPass.cpp (+1-45)
(modified) llvm/lib/Target/AArch64/AArch64RedundantCopyElimination.cpp (+5-1)
(modified) llvm/test/CodeGen/AArch64/arm64-shrink-wrapping.ll (+22-52)
(modified) llvm/test/CodeGen/AArch64/block-placement-optimize-branches.ll (+10-22)
(modified) llvm/test/CodeGen/AArch64/lr-reserved-for-ra-live-in.ll (-2)
(modified) llvm/test/CodeGen/AArch64/pr164181.ll (+29-37)
(modified) llvm/test/CodeGen/AArch64/pr166870.ll (+10-11)
(modified) llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/aarch64_generated_funcs.ll.generated.expected (+8-15)
(modified) llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/aarch64_generated_funcs.ll.nogenerated.expected (+8-14)

diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
index 904577b8233d5..84c8141d5e351 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
@@ -11587,6 +11587,49 @@ AArch64InstrInfo::analyzeLoopForPipelining(MachineBasicBlock *LoopBB) const {
       Init, IsUpdatePriorComp, Cond);
 }
 
+bool llvm::optimizeTerminators(MachineBasicBlock *MBB,
+                               const TargetInstrInfo &TII) {
+  for (MachineInstr &MI : MBB->terminators()) {
+    unsigned Opc = MI.getOpcode();
+    switch (Opc) {
+    case AArch64::CBZW:
+    case AArch64::CBZX:
+    case AArch64::TBZW:
+    case AArch64::TBZX:
+      // CBZ/TBZ with WZR/XZR -> unconditional B
+      if (MI.getOperand(0).getReg() == AArch64::WZR ||
+          MI.getOperand(0).getReg() == AArch64::XZR) {
+        MachineBasicBlock *Target = TII.getBranchDestBlock(MI);
+        SmallVector<MachineBasicBlock *> Succs(MBB->successors());
+        for (auto *S : Succs)
+          if (S != Target)
+            MBB->removeSuccessor(S);
+        DebugLoc DL = MI.getDebugLoc();
+        while (MBB->rbegin() != &MI)
+          MBB->rbegin()->eraseFromParent();
+        MI.eraseFromParent();
+        BuildMI(MBB, DL, TII.get(AArch64::B)).addMBB(Target);
+        return true;
+      }
+      break;
+    case AArch64::CBNZW:
+    case AArch64::CBNZX:
+    case AArch64::TBNZW:
+    case AArch64::TBNZX:
+      // CBNZ/TBNZ with WZR/XZR -> never taken, remove branch and successor
+      if (MI.getOperand(0).getReg() == AArch64::WZR ||
+          MI.getOperand(0).getReg() == AArch64::XZR) {
+        MachineBasicBlock *Target = TII.getBranchDestBlock(MI);
+        MI.getParent()->removeSuccessor(Target);
+        MI.eraseFromParent();
+        return true;
+      }
+      break;
+    }
+  }
+  return false;
+}
+
 /// verifyInstruction - Perform target specific instruction verification.
 bool AArch64InstrInfo::verifyInstruction(const MachineInstr &MI,
                                          StringRef &ErrInfo) const {
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.h b/llvm/lib/Target/AArch64/AArch64InstrInfo.h
index 2de2e0d73901f..d237721450644 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.h
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.h
@@ -705,6 +705,8 @@ int isAArch64FrameOffsetLegal(const MachineInstr &MI, StackOffset &Offset,
                               unsigned *OutUnscaledOp = nullptr,
                               int64_t *EmittableOffset = nullptr);
 
+bool optimizeTerminators(MachineBasicBlock *MBB, const TargetInstrInfo &TII);
+
 static inline bool isUncondBranchOpcode(int Opc) { return Opc == AArch64::B; }
 
 static inline bool isCondBranchOpcode(int Opc) {
diff --git a/llvm/lib/Target/AArch64/AArch64RedundantCondBranchPass.cpp b/llvm/lib/Target/AArch64/AArch64RedundantCondBranchPass.cpp
index 1b990796ec9da..1a5a9f0a6018b 100644
--- a/llvm/lib/Target/AArch64/AArch64RedundantCondBranchPass.cpp
+++ b/llvm/lib/Target/AArch64/AArch64RedundantCondBranchPass.cpp
@@ -14,6 +14,7 @@
 //===----------------------------------------------------------------------===//
 
 #include "AArch64.h"
+#include "AArch64InstrInfo.h"
 #include "llvm/CodeGen/MachineFunctionPass.h"
 #include "llvm/CodeGen/MachineInstrBuilder.h"
 #include "llvm/CodeGen/TargetInstrInfo.h"
@@ -45,51 +46,6 @@ INITIALIZE_PASS(AArch64RedundantCondBranch, "aarch64-redundantcondbranch",
                 "AArch64 Redundant Conditional Branch Elimination pass", false,
                 false)
 
-static bool optimizeTerminators(MachineBasicBlock *MBB,
-                                const TargetInstrInfo &TII) {
-  for (MachineInstr &MI : make_early_inc_range(MBB->terminators())) {
-    unsigned Opc = MI.getOpcode();
-    switch (Opc) {
-    case AArch64::CBZW:
-    case AArch64::CBZX:
-    case AArch64::TBZW:
-    case AArch64::TBZX:
-      // CBZ/TBZ with WZR/XZR -> unconditional B
-      if (MI.getOperand(0).getReg() == AArch64::WZR ||
-          MI.getOperand(0).getReg() == AArch64::XZR) {
-        LLVM_DEBUG(dbgs() << "Removing redundant branch: " << MI);
-        MachineBasicBlock *Target = TII.getBranchDestBlock(MI);
-        SmallVector<MachineBasicBlock *> Succs(MBB->successors());
-        for (auto *S : Succs)
-          if (S != Target)
-            MBB->removeSuccessor(S);
-        DebugLoc DL = MI.getDebugLoc();
-        while (MBB->rbegin() != &MI)
-          MBB->rbegin()->eraseFromParent();
-        MI.eraseFromParent();
-        BuildMI(MBB, DL, TII.get(AArch64::B)).addMBB(Target);
-        return true;
-      }
-      break;
-    case AArch64::CBNZW:
-    case AArch64::CBNZX:
-    case AArch64::TBNZW:
-    case AArch64::TBNZX:
-      // CBNZ/TBNZ with WZR/XZR -> never taken, remove branch and successor
-      if (MI.getOperand(0).getReg() == AArch64::WZR ||
-          MI.getOperand(0).getReg() == AArch64::XZR) {
-        LLVM_DEBUG(dbgs() << "Removing redundant branch: " << MI);
-        MachineBasicBlock *Target = TII.getBranchDestBlock(MI);
-        MI.getParent()->removeSuccessor(Target);
-        MI.eraseFromParent();
-        return true;
-      }
-      break;
-    }
-  }
-  return false;
-}
-
 bool AArch64RedundantCondBranch::runOnMachineFunction(MachineFunction &MF) {
   if (skipFunction(MF.getFunction()))
     return false;
diff --git a/llvm/lib/Target/AArch64/AArch64RedundantCopyElimination.cpp b/llvm/lib/Target/AArch64/AArch64RedundantCopyElimination.cpp
index 84015e5061768..9dc721eb7315d 100644
--- a/llvm/lib/Target/AArch64/AArch64RedundantCopyElimination.cpp
+++ b/llvm/lib/Target/AArch64/AArch64RedundantCopyElimination.cpp
@@ -50,6 +50,7 @@
 //        to use WZR/XZR directly in some cases.
 //===----------------------------------------------------------------------===//
 #include "AArch64.h"
+#include "AArch64InstrInfo.h"
 #include "llvm/ADT/SetVector.h"
 #include "llvm/ADT/Statistic.h"
 #include "llvm/ADT/iterator_range.h"
@@ -475,6 +476,7 @@ bool AArch64RedundantCopyElimination::runOnMachineFunction(
     return false;
   TRI = MF.getSubtarget().getRegisterInfo();
   MRI = &MF.getRegInfo();
+  const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
 
   // Resize the clobbered and used register unit trackers.  We do this once per
   // function.
@@ -484,8 +486,10 @@ bool AArch64RedundantCopyElimination::runOnMachineFunction(
   OptBBUsedRegs.init(*TRI);
 
   bool Changed = false;
-  for (MachineBasicBlock &MBB : MF)
+  for (MachineBasicBlock &MBB : MF) {
+    Changed |= optimizeTerminators(&MBB, TII);
     Changed |= optimizeBlock(&MBB);
+  }
   return Changed;
 }
 
diff --git a/llvm/test/CodeGen/AArch64/arm64-shrink-wrapping.ll b/llvm/test/CodeGen/AArch64/arm64-shrink-wrapping.ll
index 724c8b3fc9170..b837a361bd287 100644
--- a/llvm/test/CodeGen/AArch64/arm64-shrink-wrapping.ll
+++ b/llvm/test/CodeGen/AArch64/arm64-shrink-wrapping.ll
@@ -735,21 +735,15 @@ define void @infiniteloop() {
 ; ENABLE-NEXT:    .cfi_offset w29, -16
 ; ENABLE-NEXT:    .cfi_offset w19, -24
 ; ENABLE-NEXT:    .cfi_offset w20, -32
-; ENABLE-NEXT:  ; %bb.1: ; %if.then
 ; ENABLE-NEXT:    sub x19, sp, #16
 ; ENABLE-NEXT:    mov sp, x19
 ; ENABLE-NEXT:    mov w20, wzr
-; ENABLE-NEXT:  LBB10_2: ; %for.body
+; ENABLE-NEXT:  LBB10_1: ; %for.body
 ; ENABLE-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; ENABLE-NEXT:    bl _something
 ; ENABLE-NEXT:    add w20, w0, w20
 ; ENABLE-NEXT:    str w20, [x19]
-; ENABLE-NEXT:    b LBB10_2
-; ENABLE-NEXT:  ; %bb.3: ; %if.end
-; ENABLE-NEXT:    sub sp, x29, #16
-; ENABLE-NEXT:    ldp x29, x30, [sp, #16] ; 16-byte Folded Reload
-; ENABLE-NEXT:    ldp x20, x19, [sp], #32 ; 16-byte Folded Reload
-; ENABLE-NEXT:    ret
+; ENABLE-NEXT:    b LBB10_1
 ;
 ; DISABLE-LABEL: infiniteloop:
 ; DISABLE:       ; %bb.0: ; %entry
@@ -761,21 +755,15 @@ define void @infiniteloop() {
 ; DISABLE-NEXT:    .cfi_offset w29, -16
 ; DISABLE-NEXT:    .cfi_offset w19, -24
 ; DISABLE-NEXT:    .cfi_offset w20, -32
-; DISABLE-NEXT:  ; %bb.1: ; %if.then
 ; DISABLE-NEXT:    sub x19, sp, #16
 ; DISABLE-NEXT:    mov sp, x19
 ; DISABLE-NEXT:    mov w20, wzr
-; DISABLE-NEXT:  LBB10_2: ; %for.body
+; DISABLE-NEXT:  LBB10_1: ; %for.body
 ; DISABLE-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; DISABLE-NEXT:    bl _something
 ; DISABLE-NEXT:    add w20, w0, w20
 ; DISABLE-NEXT:    str w20, [x19]
-; DISABLE-NEXT:    b LBB10_2
-; DISABLE-NEXT:  ; %bb.3: ; %if.end
-; DISABLE-NEXT:    sub sp, x29, #16
-; DISABLE-NEXT:    ldp x29, x30, [sp, #16] ; 16-byte Folded Reload
-; DISABLE-NEXT:    ldp x20, x19, [sp], #32 ; 16-byte Folded Reload
-; DISABLE-NEXT:    ret
+; DISABLE-NEXT:    b LBB10_1
 entry:
   br i1 undef, label %if.then, label %if.end
 
@@ -806,11 +794,10 @@ define void @infiniteloop2() {
 ; ENABLE-NEXT:    .cfi_offset w29, -16
 ; ENABLE-NEXT:    .cfi_offset w19, -24
 ; ENABLE-NEXT:    .cfi_offset w20, -32
-; ENABLE-NEXT:  ; %bb.1: ; %if.then
 ; ENABLE-NEXT:    sub x8, sp, #16
 ; ENABLE-NEXT:    mov sp, x8
 ; ENABLE-NEXT:    mov w9, wzr
-; ENABLE-NEXT:  LBB11_2: ; %for.body
+; ENABLE-NEXT:  LBB11_1: ; %for.body
 ; ENABLE-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; ENABLE-NEXT:    ; InlineAsm Start
 ; ENABLE-NEXT:    mov x10, #0 ; =0x0
@@ -821,12 +808,7 @@ define void @infiniteloop2() {
 ; ENABLE-NEXT:    ; InlineAsm Start
 ; ENABLE-NEXT:    nop
 ; ENABLE-NEXT:    ; InlineAsm End
-; ENABLE-NEXT:    b LBB11_2
-; ENABLE-NEXT:  ; %bb.3: ; %if.end
-; ENABLE-NEXT:    sub sp, x29, #16
-; ENABLE-NEXT:    ldp x29, x30, [sp, #16] ; 16-byte Folded Reload
-; ENABLE-NEXT:    ldp x20, x19, [sp], #32 ; 16-byte Folded Reload
-; ENABLE-NEXT:    ret
+; ENABLE-NEXT:    b LBB11_1
 ;
 ; DISABLE-LABEL: infiniteloop2:
 ; DISABLE:       ; %bb.0: ; %entry
@@ -838,11 +820,10 @@ define void @infiniteloop2() {
 ; DISABLE-NEXT:    .cfi_offset w29, -16
 ; DISABLE-NEXT:    .cfi_offset w19, -24
 ; DISABLE-NEXT:    .cfi_offset w20, -32
-; DISABLE-NEXT:  ; %bb.1: ; %if.then
 ; DISABLE-NEXT:    sub x8, sp, #16
 ; DISABLE-NEXT:    mov sp, x8
 ; DISABLE-NEXT:    mov w9, wzr
-; DISABLE-NEXT:  LBB11_2: ; %for.body
+; DISABLE-NEXT:  LBB11_1: ; %for.body
 ; DISABLE-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; DISABLE-NEXT:    ; InlineAsm Start
 ; DISABLE-NEXT:    mov x10, #0 ; =0x0
@@ -853,12 +834,7 @@ define void @infiniteloop2() {
 ; DISABLE-NEXT:    ; InlineAsm Start
 ; DISABLE-NEXT:    nop
 ; DISABLE-NEXT:    ; InlineAsm End
-; DISABLE-NEXT:    b LBB11_2
-; DISABLE-NEXT:  ; %bb.3: ; %if.end
-; DISABLE-NEXT:    sub sp, x29, #16
-; DISABLE-NEXT:    ldp x29, x30, [sp, #16] ; 16-byte Folded Reload
-; DISABLE-NEXT:    ldp x20, x19, [sp], #32 ; 16-byte Folded Reload
-; DISABLE-NEXT:    ret
+; DISABLE-NEXT:    b LBB11_1
 entry:
   br i1 undef, label %if.then, label %if.end
 
@@ -889,49 +865,43 @@ if.end:
 define void @infiniteloop3() {
 ; ENABLE-LABEL: infiniteloop3:
 ; ENABLE:       ; %bb.0: ; %entry
-; ENABLE-NEXT:  ; %bb.1: ; %loop2a.preheader
 ; ENABLE-NEXT:    mov x8, xzr
 ; ENABLE-NEXT:    mov x9, xzr
 ; ENABLE-NEXT:    mov x11, xzr
-; ENABLE-NEXT:    b LBB12_3
-; ENABLE-NEXT:  LBB12_2: ; %loop2b
-; ENABLE-NEXT:    ; in Loop: Header=BB12_3 Depth=1
+; ENABLE-NEXT:    b LBB12_2
+; ENABLE-NEXT:  LBB12_1: ; %loop2b
+; ENABLE-NEXT:    ; in Loop: Header=BB12_2 Depth=1
 ; ENABLE-NEXT:    str x10, [x11]
 ; ENABLE-NEXT:    mov x11, x10
-; ENABLE-NEXT:  LBB12_3: ; %loop1
+; ENABLE-NEXT:  LBB12_2: ; %loop1
 ; ENABLE-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; ENABLE-NEXT:    mov x10, x9
 ; ENABLE-NEXT:    ldr x9, [x8]
-; ENABLE-NEXT:    cbnz x8, LBB12_2
-; ENABLE-NEXT:  ; %bb.4: ; in Loop: Header=BB12_3 Depth=1
+; ENABLE-NEXT:    cbnz x8, LBB12_1
+; ENABLE-NEXT:  ; %bb.3: ; in Loop: Header=BB12_2 Depth=1
 ; ENABLE-NEXT:    mov x8, x10
 ; ENABLE-NEXT:    mov x11, x10
-; ENABLE-NEXT:    b LBB12_3
-; ENABLE-NEXT:  ; %bb.5: ; %end
-; ENABLE-NEXT:    ret
+; ENABLE-NEXT:    b LBB12_2
 ;
 ; DISABLE-LABEL: infiniteloop3:
 ; DISABLE:       ; %bb.0: ; %entry
-; DISABLE-NEXT:  ; %bb.1: ; %loop2a.preheader
 ; DISABLE-NEXT:    mov x8, xzr
 ; DISABLE-NEXT:    mov x9, xzr
 ; DISABLE-NEXT:    mov x11, xzr
-; DISABLE-NEXT:    b LBB12_3
-; DISABLE-NEXT:  LBB12_2: ; %loop2b
-; DISABLE-NEXT:    ; in Loop: Header=BB12_3 Depth=1
+; DISABLE-NEXT:    b LBB12_2
+; DISABLE-NEXT:  LBB12_1: ; %loop2b
+; DISABLE-NEXT:    ; in Loop: Header=BB12_2 Depth=1
 ; DISABLE-NEXT:    str x10, [x11]
 ; DISABLE-NEXT:    mov x11, x10
-; DISABLE-NEXT:  LBB12_3: ; %loop1
+; DISABLE-NEXT:  LBB12_2: ; %loop1
 ; DISABLE-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; DISABLE-NEXT:    mov x10, x9
 ; DISABLE-NEXT:    ldr x9, [x8]
-; DISABLE-NEXT:    cbnz x8, LBB12_2
-; DISABLE-NEXT:  ; %bb.4: ; in Loop: Header=BB12_3 Depth=1
+; DISABLE-NEXT:    cbnz x8, LBB12_1
+; DISABLE-NEXT:  ; %bb.3: ; in Loop: Header=BB12_2 Depth=1
 ; DISABLE-NEXT:    mov x8, x10
 ; DISABLE-NEXT:    mov x11, x10
-; DISABLE-NEXT:    b LBB12_3
-; DISABLE-NEXT:  ; %bb.5: ; %end
-; DISABLE-NEXT:    ret
+; DISABLE-NEXT:    b LBB12_2
 entry:
   br i1 undef, label %loop2a, label %body
 
diff --git a/llvm/test/CodeGen/AArch64/block-placement-optimize-branches.ll b/llvm/test/CodeGen/AArch64/block-placement-optimize-branches.ll
index 7c3a567d1b336..6e6fb6f367867 100644
--- a/llvm/test/CodeGen/AArch64/block-placement-optimize-branches.ll
+++ b/llvm/test/CodeGen/AArch64/block-placement-optimize-branches.ll
@@ -8,20 +8,14 @@
 define i8 @foo_optsize(i32 %v4) optsize {
 ; CHECK-LABEL: foo_optsize:
 ; CHECK:       // %bb.0: // %entry
-; CHECK-NEXT:    b .LBB0_2
-; CHECK-NEXT:  .LBB0_1:
-; CHECK-NEXT:    mov w0, wzr
-; CHECK-NEXT:    ret
-; CHECK-NEXT:  .LBB0_2: // %b1
-; CHECK-NEXT:    cbnz w0, .LBB0_4
-; CHECK-NEXT:  // %bb.3: // %b2
+; CHECK-NEXT:    cbnz w0, .LBB0_2
+; CHECK-NEXT:  // %bb.1: // %b2
 ; CHECK-NEXT:    mov w0, #1 // =0x1
 ; CHECK-NEXT:    ret
-; CHECK-NEXT:  .LBB0_4: // %b1
+; CHECK-NEXT:  .LBB0_2: // %b1
 ; CHECK-NEXT:    cmp w0, #1
-; CHECK-NEXT:    b.ne .LBB0_1
-; CHECK-NEXT:  // %bb.5: // %b3
-; CHECK-NEXT:    b .LBB0_1
+; CHECK-NEXT:    mov w0, wzr
+; CHECK-NEXT:    ret
 entry:
   %v2 = icmp eq i32 0, 0
   br i1 %v2, label %b1, label %b4
@@ -47,20 +41,14 @@ b4:
 define i8 @foo_optspeed(i32 %v4) {
 ; CHECK-LABEL: foo_optspeed:
 ; CHECK:       // %bb.0: // %entry
-; CHECK-NEXT:    b .LBB1_2
-; CHECK-NEXT:  .LBB1_1:
-; CHECK-NEXT:    mov w0, wzr
-; CHECK-NEXT:    ret
-; CHECK-NEXT:  .LBB1_2: // %b1
-; CHECK-NEXT:    cbnz w0, .LBB1_4
-; CHECK-NEXT:  // %bb.3: // %b2
+; CHECK-NEXT:    cbnz w0, .LBB1_2
+; CHECK-NEXT:  // %bb.1: // %b2
 ; CHECK-NEXT:    mov w0, #1 // =0x1
 ; CHECK-NEXT:    ret
-; CHECK-NEXT:  .LBB1_4: // %b1
+; CHECK-NEXT:  .LBB1_2: // %b1
 ; CHECK-NEXT:    cmp w0, #1
-; CHECK-NEXT:    b.ne .LBB1_1
-; CHECK-NEXT:  // %bb.5: // %b3
-; CHECK-NEXT:    b .LBB1_1
+; CHECK-NEXT:    mov w0, wzr
+; CHECK-NEXT:    ret
 entry:
   %v2 = icmp eq i32 0, 0
   br i1 %v2, label %b1, label %b4
diff --git a/llvm/test/CodeGen/AArch64/lr-reserved-for-ra-live-in.ll b/llvm/test/CodeGen/AArch64/lr-reserved-for-ra-live-in.ll
index 29427146e8a43..708ba621c26d8 100644
--- a/llvm/test/CodeGen/AArch64/lr-reserved-for-ra-live-in.ll
+++ b/llvm/test/CodeGen/AArch64/lr-reserved-for-ra-live-in.ll
@@ -21,10 +21,8 @@ define i32 @check_lr_liveness(ptr %arg) #1 {
   ; CHECK-NEXT:   B %bb.3
   ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1.bb:
-  ; CHECK-NEXT:   successors: %bb.3(0x2aaaaaab), %bb.2(0x55555555)
   ; CHECK-NEXT:   liveins: $w0, $lr
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT:   CBNZW $wzr, %bb.3
   ; CHECK-NEXT:   B %bb.2
   ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.2.bb1:
diff --git a/llvm/test/CodeGen/AArch64/pr164181.ll b/llvm/test/CodeGen/AArch64/pr164181.ll
index abb090ccbcaed..72b2a77e51c06 100644
--- a/llvm/test/CodeGen/AArch64/pr164181.ll
+++ b/llvm/test/CodeGen/AArch64/pr164181.ll
@@ -29,11 +29,11 @@ define void @f(i1 %var_0, i16 %var_1, i64 %var_2, i8 %var_3, i16 %var_4, i1 %var
 ; CHECK-NEXT:    str w4, [sp, #72] // 4-byte Spill
 ; CHECK-NEXT:    str w3, [sp, #112] // 4-byte Spill
 ; CHECK-NEXT:    str w5, [sp, #36] // 4-byte Spill
-; CHECK-NEXT:    tbz w5, #0, .LBB0_43
+; CHECK-NEXT:    tbz w5, #0, .LBB0_40
 ; CHECK-NEXT:  // %bb.1: // %for.body41.lr.ph
 ; CHECK-NEXT:    ldr x4, [sp, #312]
 ; CHECK-NEXT:    ldr x14, [sp, #280]
-; CHECK-NEXT:    tbz w0, #0, .LBB0_42
+; CHECK-NEXT:    tbz w0, #0, .LBB0_39
 ; CHECK-NEXT:  // %bb.2: // %for.body41.us.preheader
 ; CHECK-NEXT:    ldrb w8, [sp, #368]
 ; CHECK-NEXT:    ldrb w12, [sp, #256]
@@ -92,7 +92,7 @@ define void @f(i1 %var_0, i16 %var_1, i64 %var_2, i8 %var_3, i16 %var_4, i1 %var
 ; CHECK-NEXT:    // Child Loop BB0_10 Depth 4
 ; CHECK-NEXT:    // Child Loop BB0_11 Depth 5
 ; CHECK-NEXT:    // Child Loop BB0_28 Depth 5
-; CHECK-NEXT:    // Child Loop BB0_39 Depth 5
+; CHECK-NEXT:    // Child Loop BB0_36 Depth 5
 ; CHECK-NEXT:    ldr w8, [sp, #20] // 4-byte Reload
 ; CHECK-NEXT:    mov x12, x24
 ; CHECK-NEXT:    str x24, [sp, #48] // 8-byte Spill
@@ -117,7 +117,7 @@ define void @f(i1 %var_0, i16 %var_1, i64 %var_2, i8 %var_3, i16 %var_4, i1 %var
 ; CHECK-NEXT:    // Child Loop BB0_10 Depth 4
 ; CHECK-NEXT:    // Child Loop BB0_11 Depth 5
 ; CHECK-NEXT:    // Child Loop BB0_28 Depth 5
-; CHECK-NEXT:    // Child Loop BB0_39 Depth 5
+; CHECK-NEXT:    // Child Loop BB0_36 Depth 5
 ; CHECK-NEXT:    str x12, [sp, #40] // 8-byte Spill
 ; CHECK-NEXT:    cmn x24, #30
 ; CHECK-NEXT:    mov x12, #-30 // =0xffffffffffffffe2
@@ -142,7 +142,7 @@ define void @f(i1 %var_0, i16 %var_1, i64 %var_2, i8 %var_3, i16 %var_4, i1 %var
 ; CHECK-NEXT:    // Child Loop BB0_10 Depth 4
 ; CHECK-NEXT:    // Child Loop BB0_11 Depth 5
 ; CHECK-NEXT:    // Child Loop BB0_28 Depth 5
-; CHECK-NEXT:    // Child Loop BB0_39 Depth 5
+; CHECK-NEXT:    // Child Loop BB0_36 Depth 5
 ; CHECK-NEXT:    ldr x8, [sp, #64] // 8-byte Reload
 ; CHECK-NEXT:    mov w14, #1152 // =0x480
 ; CHECK-NEXT:    mov w24, #1 // =0x1
@@ -176,7 +176,7 @@ define void @f(i1 %var_0, i16 %var_1, i64 %var_2, i8 %var_3, i16 %var_4, i1 %var
 ; CHECK-NEXT:    // => This Loop Header: Depth=4
 ; CHECK-NEXT:    // Child Loop BB0_11 Depth 5
 ; CHECK-NEXT:    // Child Loop BB0_28 Depth 5
-; CHECK-NEXT:    // Child Loop BB0_39 Depth 5
+; CHECK-NEXT:    // Child Loop BB0_36 Depth 5
 ; CHECK-NEXT:    ldr w8, [sp, #116] // 4-byte Reload
 ; CHECK-NEXT:    and w8, w8, w8, asr #31
 ; CHECK-NEXT:    str w8, [sp, #128] // 4-byte Spill
@@ -281,31 +281,23 @@ define void @f(i1 %var_0, i16 %var_1, i64 %var_2, i8 %var_3, i16 %var_4, i1 %var
 ; CHECK-NEXT:    mov x24, xzr
 ; CHECK-NEXT:    mul w12, w12, w22
 ; CHECK-NEXT:    mov x22, x5
-; CHECK-NEXT:    tbz w0, #0, .LBB0_36
-; CHECK-NEXT:  .LBB0_28: // %for.body194.us
+; CHECK-NEXT:    tbz w0, #0, .LBB0_33
+; CHECK-NEXT:  .LBB0_28: // %if.then222.us
 ; CHECK-NEXT:    // Parent Loop BB0_4 Depth=1
 ; CHECK-NEXT:    // Parent Loop BB0_6 Depth=2
 ; CHECK-NEXT:    // Parent Loop BB0_8 Depth=3
 ; CHECK-NEXT:    // Parent Loop BB0_10 Depth=4
 ; CHECK-NEXT:    // => This Inner Loop Header: Depth=5
-; CHECK-NEXT:  // %bb.29: // %if.then222.us
-; CHECK-NEXT:    // in Loop: Header=BB0_28 Depth=5
 ; CHECK-NEXT:    adrp x27, :got:var_32
 ; CHECK-NEXT:    ldur w8, [x19, #-12]
 ; CHECK-NEXT:    ldr x27, [x27, :got_lo12:var_32]
 ; CHECK-NEXT:    strh w8, [x27]
 ; CHECK-NEXT:    sxtb w8, w25
-; CHECK-NEXT:    bic w25, w8, w8, asr #31
-; CHECK-NEXT:    b .LBB0_31
-; CHECK-NEXT:    .p2align 5, , 16
-; CHECK-NEXT:  // %bb.30:
-; CHECK-NEXT:    mov w25, wzr
-; CHECK-NEXT:  .LBB0_31: // %if.end239.us
-; CHECK-NEXT:    // in Loop: Header=BB0_28 Depth=5
 ; CHECK-NEXT:    strb w3, [x16]
+; CHECK-NEXT:    bic w25, w8, w8, asr #31
 ; CHECK-NEXT:    tst w13, #0xff
-; CHECK-NEXT:    b.eq .LBB0_33
-; CHECK-NEXT:  // %bb.32: // %if.then254.us
+; CHECK-NEXT:    b.eq .LBB0_30
+; CHECK-NEXT:  // %bb.29: // %if.then254.us
 ; CHECK-NEXT:    // in Loop: Header=BB0_28 Depth=5
 ; CHECK-NEXT:    ldrh w8, [x26, x14, lsl #1]
 ; CHECK-NEXT:    adrp x27, :got:var_35
@@ -314,7 +306,7 @@ define void @f(i1 %var_0, i16 %var_1, i64 %var_2, i8 %var_3, i16 %var_4, i1 %var
 ; CHECK-NEXT:    csel x8, xzr, x7, eq
 ; CHECK-NEXT:    str x8, [x27]
 ; CHECK-NEXT:    strh w1, [x17]
-; CHECK-NEXT:  .LBB0_33: // %if.end282.us
+; CHECK-NEXT:  .LBB0_30: // %if.end282.us
 ; CHECK-NEXT:    // in Loop: Header=BB0_28 Depth=5
 ; CHECK-NEXT:    orr x27, x24, x4
 ; CHECK-NEXT:    adrp x8, :got:var_39
@@ -325,14 +317,14 @@ define void @f(i1 %var_0, i16 %var_1, i64 %var_2, i8 %var_3, i16 %var_4, i1 %var
 ; CHECK-NEXT:    str x8...
[truncated]

hassnaaHamdi

What about cases like this:

mov w10, wzr
mul x8, x0, x2
stp x8, x9, [x4]
cbz w10, .LBB2_2

Will it be supported in near future ? I think it’s very similar to the case you are handling, right?

hassnaaHamdi · 2025-12-07T00:01:02Z

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp

      Init, IsUpdatePriorComp, Cond);
 }

+bool llvm::optimizeTerminators(MachineBasicBlock *MBB,


Could you move the function to the end of the file? because it's between 2 class(AArch64InstrInfo) member functions while it's not related.

I think this was just here because I wanted verifyInstruction to remain at the end. I've moved this up next to some of the other branch folding functions.

hassnaaHamdi · 2025-12-07T00:22:19Z

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp

+          MI.getOperand(0).getReg() == AArch64::XZR) {
+        MachineBasicBlock *Target = TII.getBranchDestBlock(MI);
+        MI.getParent()->removeSuccessor(Target);
+        MI.eraseFromParent();


Could you add back the 2 dbg logs which where in the old function :’D ? I think they are useful so that we can easily know if a change happens, especially that it’s not easy to notice the change in the machine code when we do print-before/print-after the pass.

Yeah - they were useful. The new file doesn't have a DEBUG_TYPE, so cant as easily use LLVM_DEBUG. I've added DEBUG_WITH_TYPE to keep the debug messages.

davemgreen

What about cases like this:

mov w10, wzr
mul x8, x0, x2
stp x8, x9, [x4]
cbz w10, .LBB2_2

Do you have an example where you see that come up? Ideally we would propagate the wzr to the cbz, and then this optimization should kick-in.

davemgreen · 2025-12-08T08:34:10Z

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp

+          MI.getOperand(0).getReg() == AArch64::XZR) {
+        MachineBasicBlock *Target = TII.getBranchDestBlock(MI);
+        MI.getParent()->removeSuccessor(Target);
+        MI.eraseFromParent();


Yeah - they were useful. The new file doesn't have a DEBUG_TYPE, so cant as easily use LLVM_DEBUG. I've added DEBUG_WITH_TYPE to keep the debug messages.

davemgreen · 2025-12-08T08:35:14Z

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp

      Init, IsUpdatePriorComp, Cond);
 }

+bool llvm::optimizeTerminators(MachineBasicBlock *MBB,


I think this was just here because I wanted verifyInstruction to remain at the end. I've moved this up next to some of the other branch folding functions.

Running optimizeTerminators prior to other optimizations like branch layout can lead to more folding and better codegen, but is not on its own able to capture all cases. There is benefit to running it in both places. This adds the existing code from llvm#161508 into the AArch64RedundantCopyElimination pass, which sounds like a sensible enough place for it.

hassnaaHamdi · 2025-12-08T15:36:27Z

What about cases like this:
mov w10, wzr
mul x8, x0, x2
stp x8, x9, [x4]
cbz w10, .LBB2_2

Do you have an example where you see that come up? Ideally we would propagate the wzr to the cbz, and then this optimization should kick-in.

It's generated out of the mul-overflow test, specifically the BB of %overflow.no:

declare i32 @error()

define i128 @test3(i128 noundef %x, i128 noundef %y, ptr %out) {
entry:
  %0 = tail call { i128, i1 } @llvm.umul.with.overflow.i128(i128 %x, i128 %y)
  %1 = extractvalue { i128, i1 } %0, 0
  store i128 %1, ptr %out
  %2 = extractvalue { i128, i1 } %0, 1
  br i1 %2, label %if.then, label %cleanup

if.then:
  %call = tail call i32 @error()
  %conv1 = sext i32 %call to i128
  br label %cleanup

cleanup:
  %retval.0 = phi i128 [ %conv1, %if.then ], [ 1, %entry ]
  ret i128 %retval.0
}

davemgreen · 2025-12-08T15:42:53Z

I hadn't checked that. We run more copy-propagation at -O3, so it might already do OK there now, after the previous patch:
https://godbolt.org/z/b4Yjc1Gq5

hassnaaHamdi · 2025-12-08T15:53:56Z

Okay, great. Sorry I didn't notice missing the -O3.

usha1830

LGTM, Thanks.

dtellenbach

LGTM, thanks!

aeubanks · 2025-12-09T21:01:56Z

hi, this is causing crashes:

$ cat /tmp/a.ll
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128-Fn32"
target triple = "aarch64-unknown-linux-android29"

declare ptr @llvm.ptr.annotation.p0.p0(ptr, ptr, ptr, i32, ptr) #0

define i1 @foo(i1 %arg) {
bb:
  br i1 %arg, label %bb1, label %bb2

bb1:                                              ; preds = %bb
  ret i1 false

bb2:                                              ; preds = %bb
  %call = call ptr @llvm.ptr.annotation.p0.p0(ptr poison, ptr null, ptr null, i32 0, ptr null)
  store ptr null, ptr %call, align 8
  %call3 = call ptr @llvm.ptr.annotation.p0.p0(ptr poison, ptr null, ptr null, i32 0, ptr null)
  %load = load ptr, ptr %call3, align 8
  %icmp = icmp eq ptr %load, null
  br i1 %icmp, label %bb6, label %bb4

bb4:                                              ; preds = %bb2
  %call5 = call ptr @llvm.ptr.annotation.p0.p0(ptr null, ptr null, ptr null, i32 0, ptr null)
  br label %bb6

bb6:                                              ; preds = %bb4, %bb2
  %icmp7 = icmp eq ptr null, null
  ret i1 %icmp7
}
$ llc -o /dev/null /tmp/a.ll
llc: ../../llvm/include/llvm/Support/GenericDomTree.h:535: NodeT *llvm::DominatorTreeBase<llvm::MachineBasicBlock, false>::findNearestCommonDominator(NodeT *, NodeT *) const [NodeT = llvm::MachineBasicBlock, IsPostDom = false]: Assertion `NodeB && "B must be in the tree"' failed.

This reverts commit ca927e5.

Reverts #170907 Causes crashes, see #170907 (comment)

…171505) Reverts llvm/llvm-project#170907 Causes crashes, see llvm/llvm-project#170907 (comment)

aeubanks · 2025-12-10T03:22:35Z

I've reverted in #171505

davemgreen requested review from dtellenbach, hassnaaHamdi, momchil-velikov, nasherm and usha1830 December 5, 2025 19:15

llvmbot added the backend:AArch64 label Dec 5, 2025

hassnaaHamdi reviewed Dec 7, 2025

View reviewed changes

davemgreen force-pushed the gh-a64-cbzwzrearly branch from d02860f to 816ec85 Compare December 8, 2025 08:55

davemgreen commented Dec 8, 2025

View reviewed changes

davemgreen force-pushed the gh-a64-cbzwzrearly branch from 816ec85 to 8dcef94 Compare December 8, 2025 09:13

llvmbot added the llvm:codegen label Dec 8, 2025

hassnaaHamdi approved these changes Dec 8, 2025

View reviewed changes

usha1830 approved these changes Dec 8, 2025

View reviewed changes

dtellenbach approved these changes Dec 8, 2025

View reviewed changes

davemgreen merged commit ca927e5 into llvm:main Dec 9, 2025
10 of 11 checks passed

davemgreen deleted the gh-a64-cbzwzrearly branch December 9, 2025 09:11

aeubanks added a commit that referenced this pull request Dec 9, 2025

Revert "[AArch64] Run optimizeTerminators earlier too. (#170907)"

47153a5

This reverts commit ca927e5.

aeubanks mentioned this pull request Dec 9, 2025

Revert "[AArch64] Run optimizeTerminators earlier too." #171505

Merged

aeubanks added a commit that referenced this pull request Dec 10, 2025

Revert "[AArch64] Run optimizeTerminators earlier too." (#171505)

53cd4ab

Reverts #170907 Causes crashes, see #170907 (comment)

llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Dec 10, 2025

Automerge: Revert "[AArch64] Run optimizeTerminators earlier too." (#…

3ecf3f8

…171505) Reverts llvm/llvm-project#170907 Causes crashes, see llvm/llvm-project#170907 (comment)

[AArch64] Run optimizeTerminators earlier too. #170907

[AArch64] Run optimizeTerminators earlier too. #170907

Conversation

davemgreen commented Dec 5, 2025

Uh oh!

llvmbot commented Dec 5, 2025

Uh oh!

hassnaaHamdi left a comment

Choose a reason for hiding this comment

Uh oh!

hassnaaHamdi Dec 7, 2025

Choose a reason for hiding this comment

Uh oh!

davemgreen Dec 8, 2025

Choose a reason for hiding this comment

Uh oh!

hassnaaHamdi Dec 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

davemgreen Dec 8, 2025

Choose a reason for hiding this comment

Uh oh!

davemgreen left a comment

Choose a reason for hiding this comment

Uh oh!

davemgreen Dec 8, 2025

Choose a reason for hiding this comment

Uh oh!

davemgreen Dec 8, 2025

Choose a reason for hiding this comment

Uh oh!

hassnaaHamdi commented Dec 8, 2025

Uh oh!

davemgreen commented Dec 8, 2025

Uh oh!

hassnaaHamdi commented Dec 8, 2025

Uh oh!

usha1830 left a comment

Choose a reason for hiding this comment

Uh oh!

dtellenbach left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

aeubanks commented Dec 9, 2025

Uh oh!

aeubanks commented Dec 10, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

hassnaaHamdi Dec 7, 2025 •

edited

Loading