[RISCV] Move strength reduction of mul X, 3/5/9*2^N to combine #89966

Merged
4 commits merged into llvm:main on May 8, 2024

Conversation

@preames (Collaborator) commented Apr 24, 2024

This moves our last major category of tablegen-driven multiply strength reduction into the post-legalize combine framework. The one slightly tricky bit is making sure that we use a leading shl if we can form a slli.uw, and a trailing shl otherwise. Having the trailing shl is critical for shNadd matching and for folding any following sext.w.

As can be seen in the TD deltas, this allows us to kill off both the actual multiply patterns and the explicit add (mul X, C), Y patterns. The latter are now handled by the generic shNadd matching code, with the exception of the THead-only C=200 case, because we don't (yet) have a multiply expansion with two shNadd plus a shift.
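For intuition, here is an illustrative check (not part of the patch) of the identity the combine exploits: a multiply by (3|5|9) * 2^N decomposes into one shift-add plus one shift.

#include <cassert>
#include <cstdint>

// 40 == 5 << 3, so mul x, 40 lowers to sh2add x, x followed by slli by 3.
uint64_t mul40_strength_reduced(uint64_t x) {
  uint64_t x5 = (x << 2) + x; // sh2add x, x  ==  x * 5
  return x5 << 3;             // slli ..., 3  ==  (x * 5) * 8  ==  x * 40
}

int main() {
  for (uint64_t x : {uint64_t(0), uint64_t(1), uint64_t(7), uint64_t(0xdeadbeef)})
    assert(mul40_strength_reduced(x) == x * 40);
}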

@llvmbot (Collaborator) commented Apr 24, 2024

@llvm/pr-subscribers-backend-risc-v

Author: Philip Reames (preames)


Full diff: https://github.com/llvm/llvm-project/pull/89966.diff

6 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (+23-4)
  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfoXTHead.td (-29)
  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfoZb.td (-74)
  • (modified) llvm/test/CodeGen/RISCV/addimm-mulimm.ll (+5-4)
  • (modified) llvm/test/CodeGen/RISCV/rv64-legal-i32/rv64zba.ll (+3-6)
  • (modified) llvm/test/CodeGen/RISCV/rv64zba.ll (+4-7)
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 9c66f09a0cbc85..6710785e8d4da8 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -13420,10 +13420,29 @@ static SDValue expandMul(SDNode *N, SelectionDAG &DAG,
     if (MulAmt % Divisor != 0)
       continue;
     uint64_t MulAmt2 = MulAmt / Divisor;
-    // 3/5/9 * 2^N -> shXadd (sll X, C), (sll X, C)
-    // Matched in tablegen, avoid perturbing patterns.
-    if (isPowerOf2_64(MulAmt2))
-      return SDValue();
+    // 3/5/9 * 2^N ->  shl (shXadd X, X), N
+    if (isPowerOf2_64(MulAmt2)) {
+      SDLoc DL(N);
+      SDValue X = N->getOperand(0);
+      // Put the shift first if we can fold a zext into the
+      // shift forming a slli.uw.
+      if (X.getOpcode() == ISD::AND && isa<ConstantSDNode>(X.getOperand(1)) &&
+          X.getConstantOperandVal(1) == UINT64_C(0xffffffff)) {
+        SDValue Shl = DAG.getNode(ISD::SHL, DL, VT, X,
+                                  DAG.getConstant(Log2_64(MulAmt2), DL, VT));
+        Shl = DAG.getFreeze(Shl);
+        return DAG.getNode(RISCVISD::SHL_ADD, DL, VT, Shl,
+                           DAG.getConstant(Log2_64(Divisor - 1), DL, VT), Shl);
+      }
+      // Otherwise, put the shl second so that it can fold with following
+      // instructions (e.g. sext or add).
+      X = DAG.getFreeze(X);
+      SDValue Mul359 =
+          DAG.getNode(RISCVISD::SHL_ADD, DL, VT, X,
+                      DAG.getConstant(Log2_64(Divisor - 1), DL, VT), X);
+      return DAG.getNode(ISD::SHL, DL, VT, Mul359,
+                         DAG.getConstant(Log2_64(MulAmt2), DL, VT));
+    }
 
     // 3/5/9 * 3/5/9 -> shXadd (shYadd X, X), (shYadd X, X)
     if (MulAmt2 == 3 || MulAmt2 == 5 || MulAmt2 == 9) {
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfoXTHead.td b/llvm/lib/Target/RISCV/RISCVInstrInfoXTHead.td
index b398c5e7fec2ec..bc14f165d9622d 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfoXTHead.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfoXTHead.td
@@ -549,40 +549,11 @@ def : Pat<(add_non_imm12 sh2add_op:$rs1, (XLenVT GPR:$rs2)),
 def : Pat<(add_non_imm12 sh3add_op:$rs1, (XLenVT GPR:$rs2)),
           (TH_ADDSL GPR:$rs2, sh3add_op:$rs1, 3)>;
 
-def : Pat<(add (mul_oneuse GPR:$rs1, (XLenVT 6)), GPR:$rs2),
-          (TH_ADDSL GPR:$rs2, (XLenVT (TH_ADDSL GPR:$rs1, GPR:$rs1, 1)), 1)>;
-def : Pat<(add (mul_oneuse GPR:$rs1, (XLenVT 10)), GPR:$rs2),
-          (TH_ADDSL GPR:$rs2, (XLenVT (TH_ADDSL GPR:$rs1, GPR:$rs1, 2)), 1)>;
-def : Pat<(add (mul_oneuse GPR:$rs1, (XLenVT 18)), GPR:$rs2),
-          (TH_ADDSL GPR:$rs2, (XLenVT (TH_ADDSL GPR:$rs1, GPR:$rs1, 3)), 1)>;
-def : Pat<(add (mul_oneuse GPR:$rs1, (XLenVT 12)), GPR:$rs2),
-          (TH_ADDSL GPR:$rs2, (XLenVT (TH_ADDSL GPR:$rs1, GPR:$rs1, 1)), 2)>;
-def : Pat<(add (mul_oneuse GPR:$rs1, (XLenVT 20)), GPR:$rs2),
-          (TH_ADDSL GPR:$rs2, (XLenVT (TH_ADDSL GPR:$rs1, GPR:$rs1, 2)), 2)>;
-def : Pat<(add (mul_oneuse GPR:$rs1, (XLenVT 36)), GPR:$rs2),
-          (TH_ADDSL GPR:$rs2, (XLenVT (TH_ADDSL GPR:$rs1, GPR:$rs1, 3)), 2)>;
-def : Pat<(add (mul_oneuse GPR:$rs1, (XLenVT 24)), GPR:$rs2),
-          (TH_ADDSL GPR:$rs2, (XLenVT (TH_ADDSL GPR:$rs1, GPR:$rs1, 1)), 3)>;
-def : Pat<(add (mul_oneuse GPR:$rs1, (XLenVT 40)), GPR:$rs2),
-          (TH_ADDSL GPR:$rs2, (XLenVT (TH_ADDSL GPR:$rs1, GPR:$rs1, 2)), 3)>;
-def : Pat<(add (mul_oneuse GPR:$rs1, (XLenVT 72)), GPR:$rs2),
-          (TH_ADDSL GPR:$rs2, (XLenVT (TH_ADDSL GPR:$rs1, GPR:$rs1, 3)), 3)>;
-
 def : Pat<(add (XLenVT GPR:$r), CSImm12MulBy4:$i),
           (TH_ADDSL GPR:$r, (XLenVT (ADDI (XLenVT X0), (SimmShiftRightBy2XForm CSImm12MulBy4:$i))), 2)>;
 def : Pat<(add (XLenVT GPR:$r), CSImm12MulBy8:$i),
           (TH_ADDSL GPR:$r, (XLenVT (ADDI (XLenVT X0), (SimmShiftRightBy3XForm CSImm12MulBy8:$i))), 3)>;
 
-def : Pat<(mul (XLenVT GPR:$r), C3LeftShift:$i),
-          (SLLI (XLenVT (TH_ADDSL GPR:$r, GPR:$r, 1)),
-                (TrailingZeros C3LeftShift:$i))>;
-def : Pat<(mul (XLenVT GPR:$r), C5LeftShift:$i),
-          (SLLI (XLenVT (TH_ADDSL GPR:$r, GPR:$r, 2)),
-                (TrailingZeros C5LeftShift:$i))>;
-def : Pat<(mul (XLenVT GPR:$r), C9LeftShift:$i),
-          (SLLI (XLenVT (TH_ADDSL GPR:$r, GPR:$r, 3)),
-                (TrailingZeros C9LeftShift:$i))>;
-
 def : Pat<(mul_const_oneuse GPR:$r, (XLenVT 200)),
           (SLLI (XLenVT (TH_ADDSL (XLenVT (TH_ADDSL GPR:$r, GPR:$r, 2)),
                                   (XLenVT (TH_ADDSL GPR:$r, GPR:$r, 2)), 2)), 3)>;
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfoZb.td b/llvm/lib/Target/RISCV/RISCVInstrInfoZb.td
index ffe2b7e2712084..8a0bbf6abd334d 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfoZb.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfoZb.td
@@ -173,42 +173,6 @@ def BCLRIANDIMaskLow : SDNodeXForm<imm, [{
                                    SDLoc(N), N->getValueType(0));
 }]>;
 
-def C3LeftShift : PatLeaf<(imm), [{
-  uint64_t C = N->getZExtValue();
-  return C > 3 && (C >> llvm::countr_zero(C)) == 3;
-}]>;
-
-def C5LeftShift : PatLeaf<(imm), [{
-  uint64_t C = N->getZExtValue();
-  return C > 5 && (C >> llvm::countr_zero(C)) == 5;
-}]>;
-
-def C9LeftShift : PatLeaf<(imm), [{
-  uint64_t C = N->getZExtValue();
-  return C > 9 && (C >> llvm::countr_zero(C)) == 9;
-}]>;
-
-// Constant of the form (3 << C) where C is less than 32.
-def C3LeftShiftUW : PatLeaf<(imm), [{
-  uint64_t C = N->getZExtValue();
-  unsigned Shift = llvm::countr_zero(C);
-  return 1 <= Shift && Shift < 32 && (C >> Shift) == 3;
-}]>;
-
-// Constant of the form (5 << C) where C is less than 32.
-def C5LeftShiftUW : PatLeaf<(imm), [{
-  uint64_t C = N->getZExtValue();
-  unsigned Shift = llvm::countr_zero(C);
-  return 1 <= Shift && Shift < 32 && (C >> Shift) == 5;
-}]>;
-
-// Constant of the form (9 << C) where C is less than 32.
-def C9LeftShiftUW : PatLeaf<(imm), [{
-  uint64_t C = N->getZExtValue();
-  unsigned Shift = llvm::countr_zero(C);
-  return 1 <= Shift && Shift < 32 && (C >> Shift) == 9;
-}]>;
-
 def CSImm12MulBy4 : PatLeaf<(imm), [{
   if (!N->hasOneUse())
     return false;
@@ -693,25 +657,6 @@ foreach i = {1,2,3} in {
             (shxadd pat:$rs1, GPR:$rs2)>;
 }
 
-def : Pat<(add_like (mul_oneuse GPR:$rs1, (XLenVT 6)), GPR:$rs2),
-          (SH1ADD (XLenVT (SH1ADD GPR:$rs1, GPR:$rs1)), GPR:$rs2)>;
-def : Pat<(add_like (mul_oneuse GPR:$rs1, (XLenVT 10)), GPR:$rs2),
-          (SH1ADD (XLenVT (SH2ADD GPR:$rs1, GPR:$rs1)), GPR:$rs2)>;
-def : Pat<(add_like (mul_oneuse GPR:$rs1, (XLenVT 18)), GPR:$rs2),
-          (SH1ADD (XLenVT (SH3ADD GPR:$rs1, GPR:$rs1)), GPR:$rs2)>;
-def : Pat<(add_like (mul_oneuse GPR:$rs1, (XLenVT 12)), GPR:$rs2),
-          (SH2ADD (XLenVT (SH1ADD GPR:$rs1, GPR:$rs1)), GPR:$rs2)>;
-def : Pat<(add_like (mul_oneuse GPR:$rs1, (XLenVT 20)), GPR:$rs2),
-          (SH2ADD (XLenVT (SH2ADD GPR:$rs1, GPR:$rs1)), GPR:$rs2)>;
-def : Pat<(add_like (mul_oneuse GPR:$rs1, (XLenVT 36)), GPR:$rs2),
-          (SH2ADD (XLenVT (SH3ADD GPR:$rs1, GPR:$rs1)), GPR:$rs2)>;
-def : Pat<(add_like (mul_oneuse GPR:$rs1, (XLenVT 24)), GPR:$rs2),
-          (SH3ADD (XLenVT (SH1ADD GPR:$rs1, GPR:$rs1)), GPR:$rs2)>;
-def : Pat<(add_like (mul_oneuse GPR:$rs1, (XLenVT 40)), GPR:$rs2),
-          (SH3ADD (XLenVT (SH2ADD GPR:$rs1, GPR:$rs1)), GPR:$rs2)>;
-def : Pat<(add_like (mul_oneuse GPR:$rs1, (XLenVT 72)), GPR:$rs2),
-          (SH3ADD (XLenVT (SH3ADD GPR:$rs1, GPR:$rs1)), GPR:$rs2)>;
-
 def : Pat<(add_like (XLenVT GPR:$r), CSImm12MulBy4:$i),
           (SH2ADD (XLenVT (ADDI (XLenVT X0), (SimmShiftRightBy2XForm CSImm12MulBy4:$i))),
                   GPR:$r)>;
@@ -719,16 +664,6 @@ def : Pat<(add_like (XLenVT GPR:$r), CSImm12MulBy8:$i),
           (SH3ADD (XLenVT (ADDI (XLenVT X0), (SimmShiftRightBy3XForm CSImm12MulBy8:$i))),
                   GPR:$r)>;
 
-def : Pat<(mul (XLenVT GPR:$r), C3LeftShift:$i),
-          (SLLI (XLenVT (SH1ADD GPR:$r, GPR:$r)),
-                (TrailingZeros C3LeftShift:$i))>;
-def : Pat<(mul (XLenVT GPR:$r), C5LeftShift:$i),
-          (SLLI (XLenVT (SH2ADD GPR:$r, GPR:$r)),
-                (TrailingZeros C5LeftShift:$i))>;
-def : Pat<(mul (XLenVT GPR:$r), C9LeftShift:$i),
-          (SLLI (XLenVT (SH3ADD GPR:$r, GPR:$r)),
-                (TrailingZeros C9LeftShift:$i))>;
-
 } // Predicates = [HasStdExtZba]
 
 let Predicates = [HasStdExtZba, IsRV64] in {
@@ -780,15 +715,6 @@ def : Pat<(i64 (add_like_non_imm12 (and GPR:$rs1, 0x3FFFFFFFC), (XLenVT GPR:$rs2
 def : Pat<(i64 (add_like_non_imm12 (and GPR:$rs1, 0x7FFFFFFF8), (XLenVT GPR:$rs2))),
           (SH3ADD_UW (XLenVT (SRLI GPR:$rs1, 3)), GPR:$rs2)>;
 
-def : Pat<(i64 (mul (and_oneuse GPR:$r, 0xFFFFFFFF), C3LeftShiftUW:$i)),
-          (SH1ADD (XLenVT (SLLI_UW GPR:$r, (TrailingZeros C3LeftShiftUW:$i))),
-                  (XLenVT (SLLI_UW GPR:$r, (TrailingZeros C3LeftShiftUW:$i))))>;
-def : Pat<(i64 (mul (and_oneuse GPR:$r, 0xFFFFFFFF), C5LeftShiftUW:$i)),
-          (SH2ADD (XLenVT (SLLI_UW GPR:$r, (TrailingZeros C5LeftShiftUW:$i))),
-                  (XLenVT (SLLI_UW GPR:$r, (TrailingZeros C5LeftShiftUW:$i))))>;
-def : Pat<(i64 (mul (and_oneuse GPR:$r, 0xFFFFFFFF), C9LeftShiftUW:$i)),
-          (SH3ADD (XLenVT (SLLI_UW GPR:$r, (TrailingZeros C9LeftShiftUW:$i))),
-                  (XLenVT (SLLI_UW GPR:$r, (TrailingZeros C9LeftShiftUW:$i))))>;
 } // Predicates = [HasStdExtZba, IsRV64]
 
 let Predicates = [HasStdExtZbcOrZbkc] in {
diff --git a/llvm/test/CodeGen/RISCV/addimm-mulimm.ll b/llvm/test/CodeGen/RISCV/addimm-mulimm.ll
index 736c8e7d55c75b..f40444dbb206a5 100644
--- a/llvm/test/CodeGen/RISCV/addimm-mulimm.ll
+++ b/llvm/test/CodeGen/RISCV/addimm-mulimm.ll
@@ -555,8 +555,9 @@ define i64 @add_mul_combine_infinite_loop(i64 %x) {
 ; RV32IMB-NEXT:    sh3add a1, a1, a2
 ; RV32IMB-NEXT:    sh1add a0, a0, a0
 ; RV32IMB-NEXT:    slli a2, a0, 3
-; RV32IMB-NEXT:    addi a0, a2, 2047
-; RV32IMB-NEXT:    addi a0, a0, 1
+; RV32IMB-NEXT:    li a3, 1
+; RV32IMB-NEXT:    slli a3, a3, 11
+; RV32IMB-NEXT:    sh3add a0, a0, a3
 ; RV32IMB-NEXT:    sltu a2, a0, a2
 ; RV32IMB-NEXT:    add a1, a1, a2
 ; RV32IMB-NEXT:    ret
@@ -565,8 +566,8 @@ define i64 @add_mul_combine_infinite_loop(i64 %x) {
 ; RV64IMB:       # %bb.0:
 ; RV64IMB-NEXT:    addi a0, a0, 86
 ; RV64IMB-NEXT:    sh1add a0, a0, a0
-; RV64IMB-NEXT:    li a1, -16
-; RV64IMB-NEXT:    sh3add a0, a0, a1
+; RV64IMB-NEXT:    slli a0, a0, 3
+; RV64IMB-NEXT:    addi a0, a0, -16
 ; RV64IMB-NEXT:    ret
   %tmp0 = mul i64 %x, 24
   %tmp1 = add i64 %tmp0, 2048
diff --git a/llvm/test/CodeGen/RISCV/rv64-legal-i32/rv64zba.ll b/llvm/test/CodeGen/RISCV/rv64-legal-i32/rv64zba.ll
index 9f06a9dd124cef..87343a07647a43 100644
--- a/llvm/test/CodeGen/RISCV/rv64-legal-i32/rv64zba.ll
+++ b/llvm/test/CodeGen/RISCV/rv64-legal-i32/rv64zba.ll
@@ -647,9 +647,8 @@ define i64 @zext_mul12884901888(i32 signext %a) {
 ;
 ; RV64ZBA-LABEL: zext_mul12884901888:
 ; RV64ZBA:       # %bb.0:
-; RV64ZBA-NEXT:    andi a0, a0, -1
-; RV64ZBA-NEXT:    sh1add a0, a0, a0
 ; RV64ZBA-NEXT:    slli a0, a0, 32
+; RV64ZBA-NEXT:    sh1add a0, a0, a0
 ; RV64ZBA-NEXT:    ret
   %b = zext i32 %a to i64
   %c = mul i64 %b, 12884901888
@@ -670,9 +669,8 @@ define i64 @zext_mul21474836480(i32 signext %a) {
 ;
 ; RV64ZBA-LABEL: zext_mul21474836480:
 ; RV64ZBA:       # %bb.0:
-; RV64ZBA-NEXT:    andi a0, a0, -1
-; RV64ZBA-NEXT:    sh2add a0, a0, a0
 ; RV64ZBA-NEXT:    slli a0, a0, 32
+; RV64ZBA-NEXT:    sh2add a0, a0, a0
 ; RV64ZBA-NEXT:    ret
   %b = zext i32 %a to i64
   %c = mul i64 %b, 21474836480
@@ -693,9 +691,8 @@ define i64 @zext_mul38654705664(i32 signext %a) {
 ;
 ; RV64ZBA-LABEL: zext_mul38654705664:
 ; RV64ZBA:       # %bb.0:
-; RV64ZBA-NEXT:    andi a0, a0, -1
-; RV64ZBA-NEXT:    sh3add a0, a0, a0
 ; RV64ZBA-NEXT:    slli a0, a0, 32
+; RV64ZBA-NEXT:    sh3add a0, a0, a0
 ; RV64ZBA-NEXT:    ret
   %b = zext i32 %a to i64
   %c = mul i64 %b, 38654705664
diff --git a/llvm/test/CodeGen/RISCV/rv64zba.ll b/llvm/test/CodeGen/RISCV/rv64zba.ll
index 4eb493d642e853..f4c7ff3a4a1bc6 100644
--- a/llvm/test/CodeGen/RISCV/rv64zba.ll
+++ b/llvm/test/CodeGen/RISCV/rv64zba.ll
@@ -843,9 +843,8 @@ define i64 @zext_mul12884901888(i32 signext %a) {
 ;
 ; RV64ZBA-LABEL: zext_mul12884901888:
 ; RV64ZBA:       # %bb.0:
-; RV64ZBA-NEXT:    andi a0, a0, -1
-; RV64ZBA-NEXT:    sh1add a0, a0, a0
 ; RV64ZBA-NEXT:    slli a0, a0, 32
+; RV64ZBA-NEXT:    sh1add a0, a0, a0
 ; RV64ZBA-NEXT:    ret
   %b = zext i32 %a to i64
   %c = mul i64 %b, 12884901888
@@ -866,9 +865,8 @@ define i64 @zext_mul21474836480(i32 signext %a) {
 ;
 ; RV64ZBA-LABEL: zext_mul21474836480:
 ; RV64ZBA:       # %bb.0:
-; RV64ZBA-NEXT:    andi a0, a0, -1
-; RV64ZBA-NEXT:    sh2add a0, a0, a0
 ; RV64ZBA-NEXT:    slli a0, a0, 32
+; RV64ZBA-NEXT:    sh2add a0, a0, a0
 ; RV64ZBA-NEXT:    ret
   %b = zext i32 %a to i64
   %c = mul i64 %b, 21474836480
@@ -889,9 +887,8 @@ define i64 @zext_mul38654705664(i32 signext %a) {
 ;
 ; RV64ZBA-LABEL: zext_mul38654705664:
 ; RV64ZBA:       # %bb.0:
-; RV64ZBA-NEXT:    andi a0, a0, -1
-; RV64ZBA-NEXT:    sh3add a0, a0, a0
 ; RV64ZBA-NEXT:    slli a0, a0, 32
+; RV64ZBA-NEXT:    sh3add a0, a0, a0
 ; RV64ZBA-NEXT:    ret
   %b = zext i32 %a to i64
   %c = mul i64 %b, 38654705664
@@ -2616,7 +2613,7 @@ define i64 @regression(i32 signext %x, i32 signext %y) {
 ;
 ; RV64ZBA-LABEL: regression:
 ; RV64ZBA:       # %bb.0:
-; RV64ZBA-NEXT:    subw a0, a0, a1
+; RV64ZBA-NEXT:    sub a0, a0, a1
 ; RV64ZBA-NEXT:    slli.uw a0, a0, 3
 ; RV64ZBA-NEXT:    sh1add a0, a0, a0
 ; RV64ZBA-NEXT:    ret

Review thread on llvm/test/CodeGen/RISCV/addimm-mulimm.ll (RV64IMB hunk):

-; RV64IMB-NEXT:    li a1, -16
-; RV64IMB-NEXT:    sh3add a0, a0, a1
+; RV64IMB-NEXT:    slli a0, a0, 3
+; RV64IMB-NEXT:    addi a0, a0, -16
preames (Collaborator, Author) commented on this hunk:
This case was previously picked up by the add_like (mul_oneuse X, 24), Y pattern, which didn't check whether Y is an immediate or not.

dtcxzyw added a commit to dtcxzyw/llvm-codegen-benchmark that referenced this pull request Apr 24, 2024
Review thread on the RV32IMB hunk in the same file:

-; RV32IMB-NEXT:    addi a0, a0, 1
+; RV32IMB-NEXT:    li a3, 1
+; RV32IMB-NEXT:    slli a3, a3, 11
+; RV32IMB-NEXT:    sh3add a0, a0, a3
preames (Collaborator, Author) commented on this hunk:

This case was previously not caught by the "add (mul X, 24), Y" pattern because there are two multiplies by 24, so it failed the one-use check. Instead, it went through the generic "mul X, 3 << 3" expansion, and thus ended up with the shift/add.

With the change, we hit the "(add_like_non_imm12 (shl GPR:$rs1, (XLenVT i)), GPR:$rs2)" pattern - which critically doesn't check whether the immediate could be split across two addi. We should probably adjust this, but it seems a) minor, and b) very, very separate. (And if we had Zbs, this would be a bseti anyways.) A minimal sketch of such a check follows.
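(For reference, addi takes a signed 12-bit immediate, [-2048, 2047], so 2048 is out of range for a single addi but reachable with two. A sketch of the split check — hypothetical helper name, not code from the patch:)

// Hypothetical helper: could Imm be materialized as two sequential addi?
// Each addi contributes a signed 12-bit immediate in [-2048, 2047].
bool canSplitAcrossTwoAddi(int64_t Imm) {
  return Imm >= -4096 && Imm <= 4094; // e.g. 2048 == 2047 + 1
}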

A reviewer (Member) replied, quoting preames' explanation:

This explanation makes sense to me.

@dtcxzyw (Member) commented Apr 24, 2024

This patch breaks the bext pattern:

; bin/llc -O3 -mtriple=riscv64 -mattr=+m,+zbs,+zba test.ll -o -
define ptr @test(ptr %0, i32 %1, i32 %2) {
entry:
  %3 = lshr i32 %1, %2
  %4 = and i32 %3, 1
  %5 = zext nneg i32 %4 to i64
  %6 = getelementptr inbounds [3 x float], ptr %0, i64 %5
  ret ptr %6
}

Before:

test:
        bext    a1, a1, a2
        sh1add  a1, a1, a1
        sh2add  a0, a1, a0
        ret

After:

test:
        srl     a1, a1, a2
        andi    a1, a1, 1
        sh1add  a1, a1, a1
        sh2add  a0, a1, a0
        ret

See dtcxzyw/llvm-codegen-benchmark#25 (comment).
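(For context, an illustrative model of the instruction involved, not taken from the thread: Zbs's bext extracts a single bit, i.e. exactly the srl + andi pair that no longer matches once a freeze lands between the two nodes.)

#include <cstdint>

// Zbs: bext rd, rs1, rs2  ==  (rs1 >> (rs2 & (XLEN-1))) & 1; RV64 shown.
uint64_t bext(uint64_t rs1, uint64_t rs2) {
  return (rs1 >> (rs2 & 63)) & 1;
}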

@dtcxzyw (Member) commented again on Apr 24, 2024, quoting the bext report above and adding the selection DAG dump:
Optimized legalized selection DAG: %bb.0 'test:entry'
SelectionDAG has 20 nodes:
  t0: ch,glue = EntryToken
        t4: i64,ch = CopyFromReg t0, Register:i64 %1
          t6: i64,ch = CopyFromReg t0, Register:i64 %2
        t31: i64 = and t6, Constant:i64<4294967295>
      t21: i64 = srl t4, t31
    t36: i64 = freeze t21
  t29: i64 = and t36, Constant:i64<1>
      t2: i64,ch = CopyFromReg t0, Register:i64 %0
        t33: i64 = RISCVISD::SHL_ADD t29, Constant:i64<1>, t29
      t35: i64 = shl t33, Constant:i64<2>
    t16: i64 = add t2, t35
  t18: ch,glue = CopyToReg t0, Register:i64 $x10, t16
  t19: ch = RISCVISD::RET_GLUE t18, Register:i64 $x10, t18:1

DAGCombiner pushes the freeze through the and node and breaks the srl + and pair :(

dtcxzyw added a commit that referenced this pull request Apr 25, 2024
Inspired by #89966, this patch
handles the special case `binop_allwusers<and> GPR:$rs1, 0xffffffff ->
copy $rs1` to avoid creating redundant `andi rd, rs1, -1` instructions.
@preames (Collaborator, Author) commented Apr 25, 2024

@dtcxzyw Thank you for finding and reporting this code gen difference. And also, yuck!

This is actually the same issue I hit in X86 when trying to use the generic SHL_ADD node, so I've already investigated it a bit. In short, it really doesn't look like we have any good answers here. There's existing discussion around this here: #84921 (comment)

My takeaway is that freeze in DAG is only half implemented, and there's really no clear good path forward.

I very reluctantly think it may be time to give up here. No other target is actually undef-correct for these cases, so maybe we shouldn't bother to be either. I've posted a patch for that here: #90097. If by chance anyone has a better idea, I'm definitely open to it.

dtcxzyw added a commit to dtcxzyw/llvm-codegen-benchmark that referenced this pull request Apr 25, 2024
@dtcxzyw (Member) commented Apr 25, 2024, quoting the bext exchange above:
It has been fixed :)

Another regression (extracted from qemu):

; bin/llc -mtriple=riscv64 test.ll -mattr=+m,+zba -o -
define ptr @test(ptr %0, i64 %1) {
entry:
  %2 = lshr exact i64 %1, 2
  %3 = and i64 %2, 4294967295
  %4 = getelementptr inbounds i8, ptr %0, i64 600
  %5 = getelementptr [80 x i8], ptr %4, i64 %3
  ret ptr %5
}

Before:

test:
        srli    a1, a1, 2
        slli.uw a1, a1, 4
        sh2add  a1, a1, a1
        add     a0, a0, a1
        addi    a0, a0, 600
        ret

After:

test:
        slli    a1, a1, 2
        srli    a1, a1, 4
        slli.uw a1, a1, 4
        sh2add  a1, a1, a1
        add     a0, a0, a1
        addi    a0, a0, 600
        ret

We may fix this by handling ISD::SHL_ADD in SimplifyDemandedBits after #88791 lands. :)
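(A rough sketch of what such a hook could look like — assuming the SHL_ADD operand order used in this patch (shift amount is operand 1, addend is operand 2) and the standard SimplifyDemandedBitsForTargetNode signature; the fix that actually landed may differ.)

bool RISCVTargetLowering::SimplifyDemandedBitsForTargetNode(
    SDValue Op, const APInt &DemandedBits, const APInt &DemandedElts,
    KnownBits &Known, TargetLoweringOpt &TLO, unsigned Depth) const {
  if (Op.getOpcode() == RISCVISD::SHL_ADD) {
    // SHL_ADD(A, C, B) == (A << C) + B. Addition only carries upward, so
    // operand bits above the highest demanded result bit are dead, and the
    // bits demanded from A additionally shift down by C.
    unsigned BitWidth = Op.getScalarValueSizeInBits();
    unsigned ShAmt = Op.getConstantOperandVal(1);
    unsigned Hi = DemandedBits.getActiveBits();
    APInt DemandedA = APInt::getLowBitsSet(BitWidth, Hi > ShAmt ? Hi - ShAmt : 0);
    APInt DemandedB = APInt::getLowBitsSet(BitWidth, Hi);
    KnownBits KnownA, KnownB;
    // Known is left conservative here; a real implementation would merge it.
    if (SimplifyDemandedBits(Op.getOperand(0), DemandedA, KnownA, TLO, Depth + 1) ||
        SimplifyDemandedBits(Op.getOperand(2), DemandedB, KnownB, TLO, Depth + 1))
      return true;
    return false;
  }
  return TargetLowering::SimplifyDemandedBitsForTargetNode(
      Op, DemandedBits, DemandedElts, Known, TLO, Depth);
}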

@dtcxzyw (Member) left a review: LGTM.

Resolved review thread (now outdated) on llvm/lib/Target/RISCV/RISCVISelLowering.cpp:
Comment on lines +13435 to +13436
if (X.getOpcode() == ISD::AND && isa<ConstantSDNode>(X.getOperand(1)) &&
X.getConstantOperandVal(1) == UINT64_C(0xffffffff)) {
dtcxzyw (Member) commented:
Use SDPatternMatch?

Suggested change:

-if (X.getOpcode() == ISD::AND && isa<ConstantSDNode>(X.getOperand(1)) &&
-    X.getConstantOperandVal(1) == UINT64_C(0xffffffff)) {
+if (sd_match(X, m_And(m_Value(), m_SpecificInt(0xffffffff)))) {

preames (Collaborator, Author) replied:
I'm going to leave this as is, and then do a single NFC change to update several usages.
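(For reference, the suggested matcher form assumes llvm/CodeGen/SDPatternMatch.h; a minimal sketch of the usage, with X an SDValue in scope:)

#include "llvm/CodeGen/SDPatternMatch.h"
using namespace llvm::SDPatternMatch;

// Matches X == (and <anything>, 0xffffffff) without hand-written
// opcode and constant-operand checks.
if (sd_match(X, m_And(m_Value(), m_SpecificInt(0xffffffff)))) {
  // ...
}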

Co-authored-by: Yingwei Zheng <dtcxzyw@qq.com>
@preames merged commit 4298fc5 into llvm:main on May 8, 2024 (3 of 4 checks passed)
@preames deleted the riscv-mul359-shl-via-combine branch May 8, 2024 17:13
@preames (Collaborator, Author) commented May 9, 2024

Regarding the qemu regression quoted above:

See #91626 for this one.
