Conversation

@dtellenbach
Member

This patch adds codegen for CBB and CBH, the CB variants operating on bytes and half-words, allowing sign- and zero-extensions to be folded.

Since if-conversion needs to be able to undo conditional branches, we remember possibly folded sign- and zero-extensions as additional arguments of the CBB and CBH pseudos during codegen.
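
For illustration, a hedged sketch of the resulting pseudo's operand layout (MIR-like notation; the operand names follow the patch, the concrete values are made up):

CBBPrr <cc>, $w0, $w1, %bb.target, <ext-rt>, <ext-rm>
; <cc>               4-bit Arm condition code
; $w0, $w1           the two (possibly unextended) GPR32 operands
; <ext-rt>/<ext-rm>  encoded extend folded for each operand (e.g. SXTB/UXTB),
;                    or an invalid-shift-extend sentinel if none was folded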
@llvmbot
Member

llvmbot commented Oct 23, 2025

@llvm/pr-subscribers-backend-aarch64

Author: David Tellenbach (dtellenbach)

Changes

This patch adds codegen for CBB and CBH, the CB variants operating on bytes and half-words, allowing sign- and zero-extensions to be folded.

Since if-conversion needs to be able to undo conditional branches, we remember possibly folded sign- and zero-extensions as additional arguments of the CBB and CBH pseudos during codegen.


Patch is 72.97 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/164899.diff

9 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp (+36-25)
  • (modified) llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp (+28)
  • (modified) llvm/lib/Target/AArch64/AArch64InstrFormats.td (+20-2)
  • (modified) llvm/lib/Target/AArch64/AArch64InstrInfo.cpp (+116-13)
  • (modified) llvm/lib/Target/AArch64/AArch64InstrInfo.h (+2)
  • (modified) llvm/lib/Target/AArch64/AArch64InstrInfo.td (+31-17)
  • (modified) llvm/test/CodeGen/AArch64/cmpbr-early-ifcvt.mir (+433-2)
  • (modified) llvm/test/CodeGen/AArch64/cmpbr-reg-reg.ll (+681-80)
  • (added) llvm/test/CodeGen/AArch64/cmpbr-zext-sext.ll (+230)
diff --git a/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp b/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
index c31a090bba77f..ca8ac4c21e4e5 100644
--- a/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
+++ b/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
@@ -2677,23 +2677,32 @@ AArch64AsmPrinter::lowerBlockAddressConstant(const BlockAddress &BA) {
 
 void AArch64AsmPrinter::emitCBPseudoExpansion(const MachineInstr *MI) {
   bool IsImm = false;
-  bool Is32Bit = false;
+  unsigned Width = 0;
 
   switch (MI->getOpcode()) {
   default:
     llvm_unreachable("This is not a CB pseudo instruction");
+  case AArch64::CBBPrr:
+    IsImm = false;
+    Width = 8;
+    break;
+  case AArch64::CBHPrr:
+    IsImm = false;
+    Width = 16;
+    break;
   case AArch64::CBWPrr:
-    Is32Bit = true;
+    Width = 32;
     break;
   case AArch64::CBXPrr:
-    Is32Bit = false;
+    Width = 64;
     break;
   case AArch64::CBWPri:
     IsImm = true;
-    Is32Bit = true;
+    Width = 32;
     break;
   case AArch64::CBXPri:
     IsImm = true;
+    Width = 64;
     break;
   }
 
@@ -2703,61 +2712,61 @@ void AArch64AsmPrinter::emitCBPseudoExpansion(const MachineInstr *MI) {
   bool NeedsImmDec = false;
   bool NeedsImmInc = false;
 
+#define GET_CB_OPC(IsImm, Width, ImmCond, RegCond)                             \
+  (IsImm                                                                       \
+       ? (Width == 32 ? AArch64::CB##ImmCond##Wri : AArch64::CB##ImmCond##Xri) \
+       : (Width == 8                                                           \
+              ? AArch64::CBB##RegCond##Wrr                                     \
+              : (Width == 16 ? AArch64::CBH##RegCond##Wrr                      \
+                             : (Width == 32 ? AArch64::CB##RegCond##Wrr        \
+                                            : AArch64::CB##RegCond##Xrr))))
+  unsigned MCOpC;
+
   // Decide if we need to either swap register operands or increment/decrement
   // immediate operands
-  unsigned MCOpC;
   switch (CC) {
   default:
     llvm_unreachable("Invalid CB condition code");
   case AArch64CC::EQ:
-    MCOpC = IsImm ? (Is32Bit ? AArch64::CBEQWri : AArch64::CBEQXri)
-                  : (Is32Bit ? AArch64::CBEQWrr : AArch64::CBEQXrr);
+    MCOpC = GET_CB_OPC(IsImm, Width, /* Reg-Imm */ EQ, /* Reg-Reg */ EQ);
     break;
   case AArch64CC::NE:
-    MCOpC = IsImm ? (Is32Bit ? AArch64::CBNEWri : AArch64::CBNEXri)
-                  : (Is32Bit ? AArch64::CBNEWrr : AArch64::CBNEXrr);
+    MCOpC = GET_CB_OPC(IsImm, Width, /* Reg-Imm */ NE, /* Reg-Reg */ NE);
     break;
   case AArch64CC::HS:
-    MCOpC = IsImm ? (Is32Bit ? AArch64::CBHIWri : AArch64::CBHIXri)
-                  : (Is32Bit ? AArch64::CBHSWrr : AArch64::CBHSXrr);
+    MCOpC = GET_CB_OPC(IsImm, Width, /* Reg-Imm */ HI, /* Reg-Reg */ HS);
     NeedsImmDec = IsImm;
     break;
   case AArch64CC::LO:
-    MCOpC = IsImm ? (Is32Bit ? AArch64::CBLOWri : AArch64::CBLOXri)
-                  : (Is32Bit ? AArch64::CBHIWrr : AArch64::CBHIXrr);
+    MCOpC = GET_CB_OPC(IsImm, Width, /* Reg-Imm */ LO, /* Reg-Reg */ HI);
     NeedsRegSwap = !IsImm;
     break;
   case AArch64CC::HI:
-    MCOpC = IsImm ? (Is32Bit ? AArch64::CBHIWri : AArch64::CBHIXri)
-                  : (Is32Bit ? AArch64::CBHIWrr : AArch64::CBHIXrr);
+    MCOpC = GET_CB_OPC(IsImm, Width, /* Reg-Imm */ HI, /* Reg-Reg */ HI);
     break;
   case AArch64CC::LS:
-    MCOpC = IsImm ? (Is32Bit ? AArch64::CBLOWri : AArch64::CBLOXri)
-                  : (Is32Bit ? AArch64::CBHSWrr : AArch64::CBHSXrr);
+    MCOpC = GET_CB_OPC(IsImm, Width, /* Reg-Imm */ LO, /* Reg-Reg */ HS);
     NeedsRegSwap = !IsImm;
     NeedsImmInc = IsImm;
     break;
   case AArch64CC::GE:
-    MCOpC = IsImm ? (Is32Bit ? AArch64::CBGTWri : AArch64::CBGTXri)
-                  : (Is32Bit ? AArch64::CBGEWrr : AArch64::CBGEXrr);
+    MCOpC = GET_CB_OPC(IsImm, Width, /* Reg-Imm */ GT, /* Reg-Reg */ GE);
     NeedsImmDec = IsImm;
     break;
   case AArch64CC::LT:
-    MCOpC = IsImm ? (Is32Bit ? AArch64::CBLTWri : AArch64::CBLTXri)
-                  : (Is32Bit ? AArch64::CBGTWrr : AArch64::CBGTXrr);
+    MCOpC = GET_CB_OPC(IsImm, Width, /* Reg-Imm */ LT, /* Reg-Reg */ GT);
     NeedsRegSwap = !IsImm;
     break;
   case AArch64CC::GT:
-    MCOpC = IsImm ? (Is32Bit ? AArch64::CBGTWri : AArch64::CBGTXri)
-                  : (Is32Bit ? AArch64::CBGTWrr : AArch64::CBGTXrr);
+    MCOpC = GET_CB_OPC(IsImm, Width, /* Reg-Imm */ GT, /* Reg-Reg */ GT);
     break;
   case AArch64CC::LE:
-    MCOpC = IsImm ? (Is32Bit ? AArch64::CBLTWri : AArch64::CBLTXri)
-                  : (Is32Bit ? AArch64::CBGEWrr : AArch64::CBGEXrr);
+    MCOpC = GET_CB_OPC(IsImm, Width, /* Reg-Imm */ LT, /* Reg-Reg */ GE);
     NeedsRegSwap = !IsImm;
     NeedsImmInc = IsImm;
     break;
   }
+#undef GET_CB_OPC
 
   MCInst Inst;
   Inst.setOpcode(MCOpC);
@@ -3422,6 +3431,8 @@ void AArch64AsmPrinter::emitInstruction(const MachineInstr *MI) {
   }
   case AArch64::CBWPri:
   case AArch64::CBXPri:
+  case AArch64::CBBPrr:
+  case AArch64::CBHPrr:
   case AArch64::CBWPrr:
   case AArch64::CBXPrr:
     emitCBPseudoExpansion(MI);
diff --git a/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp b/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
index e7b2d20e2a6cb..56ab4fa7364b8 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
@@ -513,6 +513,9 @@ class AArch64DAGToDAGISel : public SelectionDAGISel {
   bool SelectAnyPredicate(SDValue N);
 
   bool SelectCmpBranchUImm6Operand(SDNode *P, SDValue N, SDValue &Imm);
+
+  template <bool MatchCBB>
+  bool SelectCmpBranchExtOperand(SDValue N, SDValue &Reg, SDValue &ExtType);
 };
 
 class AArch64DAGToDAGISelLegacy : public SelectionDAGISelLegacy {
@@ -7697,3 +7700,28 @@ bool AArch64DAGToDAGISel::SelectCmpBranchUImm6Operand(SDNode *P, SDValue N,
 
   return false;
 }
+
+template <bool MatchCBB>
+bool AArch64DAGToDAGISel::SelectCmpBranchExtOperand(SDValue N, SDValue &Reg,
+                                                    SDValue &ExtType) {
+
+  // Use an invalid shift-extend value to indicate we don't need to extend later
+  if (N.getOpcode() == ISD::AssertZext || N.getOpcode() == ISD::AssertSext) {
+    Reg = N.getOperand(0);
+    ExtType = CurDAG->getSignedTargetConstant(AArch64_AM::InvalidShiftExtend,
+                                              SDLoc(N), MVT::i32);
+    return true;
+  }
+
+  AArch64_AM::ShiftExtendType ET = getExtendTypeForNode(N);
+
+  if ((MatchCBB && (ET == AArch64_AM::UXTB || ET == AArch64_AM::SXTB)) ||
+      (!MatchCBB && (ET == AArch64_AM::UXTH || ET == AArch64_AM::SXTH))) {
+    Reg = N.getOperand(0);
+    ExtType =
+        CurDAG->getTargetConstant(getExtendEncoding(ET), SDLoc(N), MVT::i32);
+    return true;
+  }
+
+  return false;
+}
diff --git a/llvm/lib/Target/AArch64/AArch64InstrFormats.td b/llvm/lib/Target/AArch64/AArch64InstrFormats.td
index 09ce713ea862c..941df069ec3e1 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrFormats.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrFormats.td
@@ -415,6 +415,12 @@ def CmpBranchUImm6Operand_64b
   let WantsParent = true;
 }
 
+def CmpBranchBExtOperand
+    : ComplexPattern<i32, 2, "SelectCmpBranchExtOperand<true>", []> {}
+
+def CmpBranchHExtOperand
+    : ComplexPattern<i32, 2, "SelectCmpBranchExtOperand<false>", []> {}
+
 def UImm6Plus1Operand : AsmOperandClass {
   let Name = "UImm6P1";
   let DiagnosticType = "InvalidImm1_64";
@@ -13118,8 +13124,20 @@ multiclass CmpBranchRegisterAlias<string mnemonic, string insn> {
 }
 
 class CmpBranchRegisterPseudo<RegisterClass regtype>
-  : Pseudo<(outs), (ins ccode:$Cond, regtype:$Rt, regtype:$Rm, am_brcmpcond:$Target), []>,
-    Sched<[WriteBr]> {
+    : Pseudo<(outs),
+             (ins ccode:$Cond, regtype:$Rt, regtype:$Rm, am_brcmpcond:$Target),
+             []>,
+      Sched<[WriteBr]> {
+  let isBranch = 1;
+  let isTerminator = 1;
+}
+
+class CmpBranchExtRegisterPseudo
+    : Pseudo<(outs),
+             (ins ccode:$Cond, GPR32:$Rt, GPR32:$Rm, am_brcmpcond:$Target,
+                 simm8_32b:$ExtRt, simm8_32b:$ExtRm),
+             []>,
+      Sched<[WriteBr]> {
   let isBranch = 1;
   let isTerminator = 1;
 }
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
index d5117da524231..5cd66098826fa 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
@@ -241,6 +241,17 @@ static void parseCondBranch(MachineInstr *LastInst, MachineBasicBlock *&Target,
     Cond.push_back(LastInst->getOperand(1));
     Cond.push_back(LastInst->getOperand(2));
     break;
+  case AArch64::CBBPrr:
+  case AArch64::CBHPrr:
+    Target = LastInst->getOperand(3).getMBB();
+    Cond.push_back(MachineOperand::CreateImm(-1));                    // -1
+    Cond.push_back(MachineOperand::CreateImm(LastInst->getOpcode())); // Opc
+    Cond.push_back(LastInst->getOperand(0));                          // Cond
+    Cond.push_back(LastInst->getOperand(1));                          // Op0
+    Cond.push_back(LastInst->getOperand(2));                          // Op1
+    Cond.push_back(LastInst->getOperand(4));                          // Ext0
+    Cond.push_back(LastInst->getOperand(5));                          // Ext1
+    break;
   }
 }
 
@@ -264,6 +275,8 @@ static unsigned getBranchDisplacementBits(unsigned Opc) {
     return BCCDisplacementBits;
   case AArch64::CBWPri:
   case AArch64::CBXPri:
+  case AArch64::CBBPrr:
+  case AArch64::CBHPrr:
   case AArch64::CBWPrr:
   case AArch64::CBXPrr:
     return CBDisplacementBits;
@@ -298,6 +311,8 @@ AArch64InstrInfo::getBranchDestBlock(const MachineInstr &MI) const {
     return MI.getOperand(1).getMBB();
   case AArch64::CBWPri:
   case AArch64::CBXPri:
+  case AArch64::CBBPrr:
+  case AArch64::CBHPrr:
   case AArch64::CBWPrr:
   case AArch64::CBXPrr:
     return MI.getOperand(3).getMBB();
@@ -580,9 +595,11 @@ bool AArch64InstrInfo::reverseBranchCondition(
       Cond[1].setImm(AArch64::TBZX);
       break;
 
-    // Cond is { -1, Opcode, CC, Op0, Op1 }
+    // Cond is { -1, Opcode, CC, Op0, Op1, ... }
     case AArch64::CBWPri:
     case AArch64::CBXPri:
+    case AArch64::CBBPrr:
+    case AArch64::CBHPrr:
     case AArch64::CBWPrr:
     case AArch64::CBXPrr: {
       // Pseudos using standard 4bit Arm condition codes
@@ -654,6 +671,12 @@ void AArch64InstrInfo::instantiateCondBranch(
       MIB.add(Cond[4]);
 
     MIB.addMBB(TBB);
+
+    // cb[b,h]
+    if (Cond.size() > 5) {
+      MIB.addImm(Cond[5].getImm());
+      MIB.addImm(Cond[6].getImm());
+    }
   }
 }
 
@@ -931,44 +954,122 @@ void AArch64InstrInfo::insertSelect(MachineBasicBlock &MBB,
     // We must insert a cmp, that is a subs
     //            0       1   2    3    4
     // Cond is { -1, Opcode, CC, Op0, Op1 }
-    unsigned SUBSOpC, SUBSDestReg;
+
+    unsigned SubsOpc, SubsDestReg;
     bool IsImm = false;
     CC = static_cast<AArch64CC::CondCode>(Cond[2].getImm());
     switch (Cond[1].getImm()) {
     default:
       llvm_unreachable("Unknown branch opcode in Cond");
     case AArch64::CBWPri:
-      SUBSOpC = AArch64::SUBSWri;
-      SUBSDestReg = AArch64::WZR;
+      SubsOpc = AArch64::SUBSWri;
+      SubsDestReg = AArch64::WZR;
       IsImm = true;
       break;
     case AArch64::CBXPri:
-      SUBSOpC = AArch64::SUBSXri;
-      SUBSDestReg = AArch64::XZR;
+      SubsOpc = AArch64::SUBSXri;
+      SubsDestReg = AArch64::XZR;
       IsImm = true;
       break;
     case AArch64::CBWPrr:
-      SUBSOpC = AArch64::SUBSWrr;
-      SUBSDestReg = AArch64::WZR;
+      SubsOpc = AArch64::SUBSWrr;
+      SubsDestReg = AArch64::WZR;
       IsImm = false;
       break;
     case AArch64::CBXPrr:
-      SUBSOpC = AArch64::SUBSXrr;
-      SUBSDestReg = AArch64::XZR;
+      SubsOpc = AArch64::SUBSXrr;
+      SubsDestReg = AArch64::XZR;
       IsImm = false;
       break;
     }
 
     if (IsImm)
-      BuildMI(MBB, I, DL, get(SUBSOpC), SUBSDestReg)
+      BuildMI(MBB, I, DL, get(SubsOpc), SubsDestReg)
           .addReg(Cond[3].getReg())
           .addImm(Cond[4].getImm())
           .addImm(0);
     else
-      BuildMI(MBB, I, DL, get(SUBSOpC), SUBSDestReg)
+      BuildMI(MBB, I, DL, get(SubsOpc), SubsDestReg)
           .addReg(Cond[3].getReg())
           .addReg(Cond[4].getReg());
-  }
+  } break;
+  case 7: { // cb[b,h]
+    // We must insert a cmp, that is a subs, but also zero- or sign-extensions
+    // that have been folded. For the first operand we codegen an explicit
+    // extension, for the second operand we fold the extension into cmp.
+    //            0       1   2    3    4    5      6
+    // Cond is { -1, Opcode, CC, Op0, Op1, Ext0, Ext1 }
+
+    // We need a new register for the now explicitly extended register
+    Register Reg = Cond[4].getReg();
+    if (Cond[5].getImm() != AArch64_AM::InvalidShiftExtend) {
+      unsigned ExtOpc;
+      unsigned ExtBits;
+      AArch64_AM::ShiftExtendType ExtendType =
+          AArch64_AM::getExtendType(Cond[5].getImm());
+      switch (ExtendType) {
+      default:
+        llvm_unreachable("Unkown shift-extend for CB instruction");
+      case AArch64_AM::SXTB:
+        assert(
+            Cond[1].getImm() == AArch64::CBBPrr &&
+            "Unexpected compare-and-branch instruction for SXTB shift-extend");
+        ExtOpc = AArch64::SBFMWri;
+        ExtBits = AArch64_AM::encodeLogicalImmediate(0xff, 32);
+        break;
+      case AArch64_AM::SXTH:
+        assert(
+            Cond[1].getImm() == AArch64::CBHPrr &&
+            "Unexpected compare-and-branch instruction for SXTH shift-extend");
+        ExtOpc = AArch64::SBFMWri;
+        ExtBits = AArch64_AM::encodeLogicalImmediate(0xffff, 32);
+        break;
+      case AArch64_AM::UXTB:
+        assert(
+            Cond[1].getImm() == AArch64::CBBPrr &&
+            "Unexpected compare-and-branch instruction for UXTB shift-extend");
+        ExtOpc = AArch64::ANDWri;
+        ExtBits = AArch64_AM::encodeLogicalImmediate(0xff, 32);
+        break;
+      case AArch64_AM::UXTH:
+        assert(
+            Cond[1].getImm() == AArch64::CBHPrr &&
+            "Unexpected compare-and-branch instruction for UXTH shift-extend");
+        ExtOpc = AArch64::ANDWri;
+        ExtBits = AArch64_AM::encodeLogicalImmediate(0xffff, 32);
+        break;
+      }
+
+      // Build the explicit extension of the first operand
+      Reg = MRI.createVirtualRegister(&AArch64::GPR32spRegClass);
+      MachineInstrBuilder MBBI =
+          BuildMI(MBB, I, DL, get(ExtOpc), Reg).addReg(Cond[4].getReg());
+      if (ExtOpc != AArch64::ANDWri)
+        MBBI.addImm(0);
+      MBBI.addImm(ExtBits);
+    }
+
+    // Now, subs with an extended second operand
+    if (Cond[6].getImm() != AArch64_AM::InvalidShiftExtend) {
+      AArch64_AM::ShiftExtendType ExtendType =
+          AArch64_AM::getExtendType(Cond[6].getImm());
+      MRI.constrainRegClass(Reg, MRI.getRegClass(Cond[3].getReg()));
+      MRI.constrainRegClass(Cond[3].getReg(), &AArch64::GPR32spRegClass);
+      BuildMI(MBB, I, DL, get(AArch64::SUBSWrx), AArch64::WZR)
+          .addReg(Cond[3].getReg())
+          .addReg(Reg)
+          .addImm(ExtendType);
+    } // If no extension is needed, just a regular subs
+    else {
+      MRI.constrainRegClass(Reg, MRI.getRegClass(Cond[3].getReg()));
+      MRI.constrainRegClass(Cond[3].getReg(), &AArch64::GPR32spRegClass);
+      BuildMI(MBB, I, DL, get(AArch64::SUBSWrr), AArch64::WZR)
+          .addReg(Cond[3].getReg())
+          .addReg(Reg);
+    }
+
+    CC = static_cast<AArch64CC::CondCode>(Cond[2].getImm());
+  } break;
   }
 
   unsigned Opc = 0;
@@ -9251,6 +9352,8 @@ bool AArch64InstrInfo::optimizeCondBranch(MachineInstr &MI) const {
   case AArch64::Bcc:
   case AArch64::CBWPri:
   case AArch64::CBXPri:
+  case AArch64::CBBPrr:
+  case AArch64::CBHPrr:
   case AArch64::CBWPrr:
   case AArch64::CBXPrr:
     return false;
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.h b/llvm/lib/Target/AArch64/AArch64InstrInfo.h
index 179574a73aa01..13d022a288a83 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.h
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.h
@@ -712,6 +712,8 @@ static inline bool isCondBranchOpcode(int Opc) {
   case AArch64::TBNZX:
   case AArch64::CBWPri:
   case AArch64::CBXPri:
+  case AArch64::CBBPrr:
+  case AArch64::CBHPrr:
   case AArch64::CBWPrr:
   case AArch64::CBXPrr:
     return true;
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.td b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
index 92f260f408674..7abf359a3ac8c 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
@@ -11152,23 +11152,37 @@ let Predicates = [HasCMPBR] in {
  defm : CmpBranchWRegisterAlias<"cbhlt", "CBHGT">;
 
   // Pseudos for codegen
-  def CBWPrr : CmpBranchRegisterPseudo<GPR32>;
-  def CBXPrr : CmpBranchRegisterPseudo<GPR64>;
-  def CBWPri : CmpBranchImmediatePseudo<GPR32, uimm6_32b>;
-  def CBXPri : CmpBranchImmediatePseudo<GPR64, uimm6_64b>;
-
-  def : Pat<(AArch64CB i32:$Cond, GPR32:$Rn, CmpBranchUImm6Operand_32b:$Imm,
-                bb:$Target),
-            (CBWPri i32:$Cond, GPR32:$Rn, uimm6_32b:$Imm,
-                am_brcmpcond:$Target)>;
-  def : Pat<(AArch64CB i32:$Cond, GPR64:$Rn, CmpBranchUImm6Operand_64b:$Imm,
-                bb:$Target),
-            (CBXPri i32:$Cond, GPR64:$Rn, uimm6_64b:$Imm,
-                am_brcmpcond:$Target)>;
-  def : Pat<(AArch64CB i32:$Cond, GPR32:$Rn, GPR32:$Rt, bb:$Target),
-            (CBWPrr ccode:$Cond, GPR32:$Rn, GPR32:$Rt, am_brcmpcond:$Target)>;
-  def : Pat<(AArch64CB i32:$Cond, GPR64:$Rn, GPR64:$Rt, bb:$Target),
-            (CBXPrr ccode:$Cond, GPR64:$Rn, GPR64:$Rt, am_brcmpcond:$Target)>;
+ def CBBPrr : CmpBranchExtRegisterPseudo;
+ def CBHPrr : CmpBranchExtRegisterPseudo;
+ def CBWPrr : CmpBranchRegisterPseudo<GPR32>;
+ def CBXPrr : CmpBranchRegisterPseudo<GPR64>;
+ def CBWPri : CmpBranchImmediatePseudo<GPR32, uimm6_32b>;
+ def CBXPri : CmpBranchImmediatePseudo<GPR64, uimm6_64b>;
+
+ def : Pat<(AArch64CB i32:$Cond, GPR32:$Rn, CmpBranchUImm6Operand_32b:$Imm,
+               bb:$Target),
+           (CBWPri i32:$Cond, GPR32:$Rn, uimm6_32b:$Imm, am_brcmpcond:$Target)>;
+ def : Pat<(AArch64CB i32:$Cond, GPR64:$Rn, CmpBranchUImm6Operand_64b:$Imm,
+               bb:$Target),
+           (CBXPri i32:$Cond, GPR64:$Rn, uimm6_64b:$Imm, am_brcmpcond:$Target)>;
+ def : Pat<(AArch64CB i32:$Cond, GPR32:$Rn, GPR32:$Rt, bb:$Target),
+           (CBWPrr ccode:$Cond, GPR32:$Rn, GPR32:$Rt, am_brcmpcond:$Target)>;
+ def : Pat<(AArch64CB i32:$Cond, GPR64:$Rn, GPR64:$Rt, bb:$Target),
+           (CBXPrr ccode:$Cond, GPR64:$Rn, GPR64:$Rt, am_brcmpcond:$Target)>;
+
+ def : Pat<(AArch64CB i32:$Cond,
+               (CmpBranchBExtOperand GPR32:$Rn, simm8_32b:$ExtTypeRn),
+               (CmpBranchBExtOperand GPR32:$Rt, simm8_32b:$ExtTypeRt),
+               bb:$Target),
+           (CBBPrr ccode:$Cond, GPR32:$Rn, GPR32:$Rt, bb:$Target,
+               simm8_32b:$ExtTypeRn, simm8_32b:$ExtTypeRt)>;
+
+ def : Pat<(AArch64CB i32:$Cond,
+               (CmpBranchHExtOperand GPR32:$Rn, simm8_32b:$ExtTypeRn),
+               (CmpBranchHExtOperand GPR32:$Rt, simm8_32b:$ExtTypeRt),
+               bb:$Target),
+           (CBHPrr ccode:$Cond, GPR32:$Rn, GPR32:$Rt, bb:$Target,
+               simm8_32b:$ExtTypeRn, simm8_32b:$ExtTypeRt)>;
 } // HasCMPBR
 
 
diff --git a/llvm/test/CodeGen/AArch64/cmpbr-early-ifcvt.mir b/llvm/test/CodeGen/AArch64/cmpbr-early-ifcvt.mir
index c3377164f357e..fc6a2a55d4bfd 100644
--- a/llvm/test/CodeGen/AArch64/cmpbr-early-ifcvt.mir
+++ b/llvm/test/CodeGen/AArch64/cmpbr-early-ifcvt.mir
@@ -1,6 +1,5 @@
 # NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
-# RUN: llc -mtriple=arm64-apple-ios -mattr +cmpbr -run-pass=early-ifcvt -simplify-mir -o - %s | FileCheck %s
-# CHECK: cb_diamond
+# RUN: llc -mtriple=arm64-apple-ios -mattr +cmpbr -run-pass=early-ifcvt -verify-machineinstrs -simplify-mir -o - %s | FileCheck %s
 ---
 name:            cb_diamond
 alignment:       4
@@ -114,3 +113,435 @@ body:             |
     $x0 = COPY %4
     RET_ReallyLR implicit $x0
 ...
+---
+name:            cbb_diamond_no_ext
+alignment:       4
+tracksRegLiveness: true
+noPhis:          false
+isSSA:           true
+noVRegs:         false
+hasFakeUses:     false
+registers:
+  - { id: 0, class: gpr32 }
...
[truncated]

@efriedma-quic
Collaborator

Since if-conversion needs to be able to undo conditional branches, we remember possibly folded sign- and zero-extensions as additional arguments of the CBB and CBH pseudos during codegen.

Can you not just use condition code for this? Signed conditions need to be sign-extended, unsigned conditions need to be zero-extended.
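
As an illustration of this suggestion (hypothetical mapping, not taken from the patch):

; icmp sgt i8 %a, %b   ->  sign-extend operands  ->  cmp w8, w9, sxtb ; b.gt
; icmp ugt i8 %a, %b   ->  zero-extend operands  ->  cmp w8, w9, uxtb ; b.hi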

@dtellenbach
Member Author

Since if-conversion needs to be able to undo conditional branches, we remember possibly folded sign- and zero-extensions as additional arguments of the CBB and CBH pseudos during codegen.

Can you not just use condition code for this? Signed conditions need to be sign-extended, unsigned conditions need to be zero-extended.

How about assert[s,z]ext?

@efriedma-quic
Collaborator

You want to add a pseudo-instruction which is CBB, except it requires the inputs to be zero/sign-extended? We can do that, I guess.

@dtellenbach
Member Author

You want to add a pseudo-instruction which is CBB, except it requires the inputs to be zero/sign-extended? We can do that, I guess.

Not entirely sure what you are asking. CBB operates on the low byte only; if you currently have the codegen

sxtb w8, w0
cmp w8, w1, sxtb
b.gt LBB0_2

the CBB codegen can just be

cbbgt w0, w1, LBB0_2

In insertSelect you might need to undo that folding and emit the original sequence again. However, if the original input was an assertzext and not an actual extension, we don't have to emit one in insertSelect either.

Are you saying you would prefer to have separate pseudos for these cases? I think you would need a couple more then, e.g. for the case where only one of the operands needs an extension while the other one doesn't, and so on.
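
A minimal sketch of that undo, mirroring the insertSelect changes in this patch (registers are illustrative):

; undoing cbbgt w0, w1, <target> where SXTB was folded for both operands:
sxtb	w8, w0           ; explicit extension of the first operand (SBFMWri)
cmp	w8, w1, sxtb     ; second extension folded into the compare (SUBSWrx)
; ...followed by the csel that early if-conversion inserts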

@efriedma-quic
Collaborator

What I really don't want is a situation where the semantics are inconsistent; one bit of code tries to optimize based on the instruction only accessing the low bits, then another bit of code optimizes based on assertions that aren't valid anymore.

Along those lines, the name of the pseudo-instruction should call out that it's asserting zero/sign-extension. I guess I don't care whether we represent the exact form of extension using an operand, or a separate opcode.

@dtellenbach
Member Author

What I really don't want is a situation where the semantics are inconsistent; one bit of code tries to optimize based on the instruction only accessing the low bits, then another bit of code optimizes based on assertions that aren't valid anymore.

Agreed, that would be a bad spot to be in.

Maybe I am still misunderstanding you, or did a bad job explaining what's going on, but the new CB[B,H] pseudos neither consume the assertions nor perform the extension; they simply ignore them. If there are other users of the registers that need extension or that carry assert[z,s]ext, those will stick around. Here is an example:

int bar(unsigned char);

unsigned char foo(unsigned char x, unsigned char y) {
  if (x > y)
    __builtin_trap();
  return bar(x + y);
}

On targets that require the caller to zero-/sign-extend (e.g. Darwin), we codegen zeroext:

define zeroext i8 @foo(i8 noundef zeroext %x, i8 noundef zeroext %y) local_unnamed_addr #0 {
entry:
  %cmp = icmp ugt i8 %x, %y
  br i1 %cmp, label %if.then, label %if.end

if.then:                                          ; preds = %entry
  tail call void @llvm.trap()
  unreachable

if.end:                                           ; preds = %entry
  %add = add i8 %y, %x
  %call = tail call i32 @bar(i8 noundef zeroext %add) #3
  %conv6 = trunc i32 %call to i8
  ret i8 %conv6
}

With and without cmpbr, the inputs to %cmp don't need zero-extension (because of assertzext during ISel), but %add needs zero-extension because the ABI requires the caller to extend the argument. Now, crucially, the zero-extension for %add sticks even with cmpbr, and the final codegen is

_foo:                                   ; @foo
; %bb.0:                                ; %entry
	cbbhi	w0, w1, LBB0_2
; %bb.1:                                ; %if.end
	stp	x29, x30, [sp, #-16]!           ; 16-byte Folded Spill
	mov	x29, sp
	add	w8, w1, w0
	and	w0, w8, #0xff
	bl	_bar
	and	w0, w0, #0xff
	ldp	x29, x30, [sp], #16             ; 16-byte Folded Reload
	ret
LBB0_2:                                 ; %if.then
	brk	#0x1

For targets that require the callee to zero- or sign-extend (I think that's all targets using aapcs64), the inputs to the comparison need zero-extension:

define i8 @foo(i8 noundef %x, i8 noundef %y)  {
entry:
  %cmp = icmp ugt i8 %x, %y
  br i1 %cmp, label %if.then, label %if.end

if.then:                                          ; preds = %entry
  tail call void @llvm.trap()
  unreachable

if.end:                                           ; preds = %entry
  %add = add i8 %y, %x
  %call = tail call i32 @bar(i8 noundef %add) #3
  %conv6 = trunc i32 %call to i8
  ret i8 %conv6
}

Codegen without cmpbr:

foo:                                    // @foo
// %bb.0:                               // %entry
	and	w8, w0, #0xff
	cmp	w8, w1, uxtb
	b.hi	.LBB0_2
// %bb.1:                               // %if.end
	stp	x29, x30, [sp, #-16]!           // 16-byte Folded Spill
	add	w0, w1, w0
	mov	x29, sp
	bl	bar
	ldp	x29, x30, [sp], #16             // 16-byte Folded Reload
	ret
.LBB0_2:                                // %if.then
	brk	#0x1

Codegen with cmpbr:

foo:                                    // @foo
// %bb.0:                               // %entry
	cbbhi	w0, w1, .LBB0_2
// %bb.1:                               // %if.end
	stp	x29, x30, [sp, #-16]!           // 16-byte Folded Spill
	add	w0, w1, w0
	mov	x29, sp
	bl	bar
	ldp	x29, x30, [sp], #16             // 16-byte Folded Reload
	ret
.LBB0_2:                                // %if.then
	brk	#0x1

Maybe I'm missing a case that could go wrong?

@efriedma-quic
Collaborator

The case where it goes wrong is something like:

  • We MachineCombine away a "no-op" extension instruction
  • We do IfConversion to lower a CBB, and the expansion depends on the extension instruction we eliminated.

Obviously, we just need to make sure MachineCombine doesn't do that; it either needs to strip the extension assertion, or skip the optimization. But I want the opcode name to indicate we have an assertion stapled to the underlying CBB, so someone implementing that MachineCombine knows to look for it.
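
A hedged sketch of that hazard (the MIR sequence is hypothetical; mask encodings elided):

; the pseudo was selected through an AssertZext, so it recorded "no extend
; needed" for its operands:
;   %z = ANDWri %x, <0xff>                                ; justifies the assert
;   CBBPrr hi, %z, %y, %bb.t, <invalid-ext>, <invalid-ext>
; if a combine deletes the ANDWri as a "no-op" (CBB only reads the low byte),
; the if-conversion expansion re-materializes a plain
;   SUBSWrr $wzr, %z, %y   ; + csel on HI
; and now compares high bits that are no longer guaranteed to be zero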

@aemerson requested a review from Copilot October 26, 2025 04:50
Contributor

Copilot AI left a comment


Pull Request Overview

This PR implements codegen support for Armv9.6-A's CBB (compare-and-branch byte) and CBH (compare-and-branch halfword) instructions, which allow folding sign- and zero-extensions into compare-and-branch operations for improved code efficiency.

Key changes:

  • Added CBBPrr and CBHPrr pseudo-instructions to support byte and halfword compare-and-branch operations with extension folding
  • Implemented pattern matching to recognize and fold sign/zero extensions into CB instructions
  • Extended if-conversion support to properly reverse and materialize these extended compare-and-branch operations

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

Summary per file:

  • llvm/test/CodeGen/AArch64/cmpbr-zext-sext.ll: New test file validating CBB/CBH code generation with sign/zero-extended operands
  • llvm/test/CodeGen/AArch64/cmpbr-reg-reg.ll: Extended existing tests with i8 and i16 compare-and-branch cases
  • llvm/test/CodeGen/AArch64/cmpbr-early-ifcvt.mir: Added if-conversion tests for CBB/CBH with various extension scenarios
  • llvm/lib/Target/AArch64/AArch64InstrInfo.td: Added CBBPrr/CBHPrr pseudo-instruction definitions and patterns
  • llvm/lib/Target/AArch64/AArch64InstrInfo.h: Updated conditional branch opcode check to include new pseudos
  • llvm/lib/Target/AArch64/AArch64InstrInfo.cpp: Implemented branch parsing, condition reversal, and select insertion for extended CB instructions
  • llvm/lib/Target/AArch64/AArch64InstrFormats.td: Added complex patterns for byte/halfword extension operand selection
  • llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp: Implemented SelectCmpBranchExtOperand pattern matching for extensions
  • llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp: Extended CB pseudo expansion to handle byte/halfword widths

define void @cbh_assertsext_slt(i16 signext %a, i16 signext %b) {
; CHECK-CMPBR-LABEL: cbh_assertsext_slt:
; CHECK-CMPBR: ; %bb.0:
; CHECK-CMPBR-NEXT: cbbgt w1, w0, LBB2_2
Copilot AI Oct 26, 2025

The CHECK comment expects 'cbbgt' but the test function name and description indicate this should be testing 'slt' (signed less than) with 'cbh' (halfword). The condition is inverted (testing 'slt' by swapping operands to 'sgt'), and the test is for i16 (halfword), but the instruction generated is 'cbbgt' (byte greater than) instead of 'cbhgt' (halfword greater than).

Suggested change
; CHECK-CMPBR-NEXT: cbbgt w1, w0, LBB2_2
; CHECK-CMPBR-NEXT: cbhgt w1, w0, LBB2_2

define void @cbh_assertzext_ule(i16 zeroext %a, i16 zeroext %b) {
; CHECK-CMPBR-LABEL: cbh_assertzext_ule:
; CHECK-CMPBR: ; %bb.0:
; CHECK-CMPBR-NEXT: cbbhs w1, w0, LBB5_2
Copilot AI Oct 26, 2025

The CHECK comment expects 'cbbhs' but the test function name indicates this should be testing 'ule' (unsigned less than or equal) on i16 (halfword). The instruction generated is 'cbbhs' (byte higher or same) instead of 'cbhhs' (halfword higher or same).

Suggested change
; CHECK-CMPBR-NEXT: cbbhs w1, w0, LBB5_2
; CHECK-CMPBR-NEXT: cbhhs w1, w0, LBB5_2

AArch64_AM::getExtendType(Cond[5].getImm());
switch (ExtendType) {
default:
llvm_unreachable("Unkown shift-extend for CB instruction");
Copilot AI Oct 26, 2025

Typo: 'Unkown' should be 'Unknown'.

Suggested change
llvm_unreachable("Unkown shift-extend for CB instruction");
llvm_unreachable("Unknown shift-extend for CB instruction");
