[InstCombine] Tighten use constraint in factorization transforms #102943


Closed
kalxr wants to merge 2 commits into llvm:main from kalxr:instcombine-factorization

Conversation

kalxr
Contributor

@kalxr kalxr commented Aug 12, 2024

The transform in tryFactorization() used to require that both operations have one use, and thus both would be removable after the transform. Commit 0cfc651 loosened the factorization constraints so that only one of the operations must be removable after the transform. There are cases where this increases the number of uses of non-intermediate values (values that are not the result of a binop involved in the transform) with no discernible benefit (e.g. https://alive2.llvm.org/ce/z/8K4VuE). Likewise, factorizeMathWithShlOps() uses the same constraints and has the same issue (https://alive2.llvm.org/ce/z/EKK-Kp).

This patch fixes the issue by restricting the transforms with new checks that still permit the desirable cases motivating 0cfc651. tryFactorization() is restricted to cases where both operations are removable, with new support for treating identity operations as removable. factorizeMathWithShlOps() is restricted to cases where either both operations are removable, or one is removable and the created add/sub will be foldable.
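
As a concrete illustration of the new factorizeMathWithShlOps() constraint, here is a minimal sketch reusing the shapes of the shl-factor.ll tests in this patch:

; Left alone after this patch: %xs has a second use, so factoring
; (x << z) + (y << z) into (x + y) << z would keep %xs alive and add
; a use of %x without removing any instruction.
define i8 @extra_use(i8 %x, i8 %y, i8 %z) {
  %xs = shl i8 %x, %z
  call void @use8(i8 %xs)
  %ys = shl i8 %y, %z
  %sum = add i8 %xs, %ys
  ret i8 %sum
}

; Still transformed: both shifted values are constants, so the new
; inner add folds away (4 + 3 --> 7) and %sum becomes shl i8 7, %z.
define i8 @constants_extra_use(i8 %z) {
  %s1 = shl i8 4, %z
  call void @use8(i8 %s1)
  %s2 = shl i8 3, %z
  %sum = add i8 %s1, %s2
  ret i8 %sum
}

declare void @use8(i8)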

@kalxr kalxr self-assigned this Aug 12, 2024
@kalxr kalxr requested a review from nikic as a code owner August 12, 2024 17:52
@llvmbot
Member

llvmbot commented Aug 12, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Kevin McAfee (kalxr)

Changes



Patch is 21.74 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/102943.diff

7 Files Affected:

  • (modified) llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp (+6)
  • (modified) llvm/lib/Transforms/InstCombine/InstructionCombining.cpp (+10-4)
  • (modified) llvm/test/Transforms/InstCombine/and-or.ll (+8-8)
  • (modified) llvm/test/Transforms/InstCombine/ctpop.ll (+6-6)
  • (modified) llvm/test/Transforms/InstCombine/shl-factor.ll (+34-10)
  • (modified) llvm/test/Transforms/LoopVectorize/induction.ll (+31-31)
  • (modified) llvm/test/Transforms/LoopVectorize/invariant-store-vectorization.ll (+3-3)
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp b/llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
index 3bd086230cbec5..de2cdf093e1a01 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
@@ -1424,6 +1424,12 @@ static Instruction *factorizeMathWithShlOps(BinaryOperator &I,
       !match(Op1, m_Shl(m_Value(Y), m_Specific(ShAmt))))
     return nullptr;
 
+  // This transform is only profitable if both operations or one operation and
+  // the resulting add/sub can be eliminated/folded.
+  if (!(Op0->hasOneUse() && Op1->hasOneUse()) &&
+      !(isa<Constant>(X) && isa<Constant>(Y)))
+    return nullptr;
+
   // No-wrap propagates only when all ops have no-wrap.
   bool HasNSW = I.hasNoSignedWrap() && Op0->hasNoSignedWrap() &&
                 Op1->hasNoSignedWrap();
diff --git a/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp b/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
index 0fb8b639c97b95..1365b8ac8a3c78 100644
--- a/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
@@ -688,9 +688,12 @@ static Value *tryFactorization(BinaryOperator &I, const SimplifyQuery &SQ,
       // If "B op D" simplifies then it can be formed with no cost.
       V = simplifyBinOp(TopLevelOpcode, B, D, SQ.getWithInstruction(&I));
 
-      // If "B op D" doesn't simplify then only go on if one of the existing
+      // If "B op D" doesn't simplify then only go on if both of the existing
       // operations "A op' B" and "C op' D" will be zapped as no longer used.
-      if (!V && (LHS->hasOneUse() || RHS->hasOneUse()))
+      // Note that when an operation is equal to one of its operands, that
+      // operation is "zapped" by having never existed in the first place.
+      if (!V && (LHS->hasOneUse() || LHS == A || LHS == B) &&
+          (RHS->hasOneUse() || RHS == C || RHS == D))
         V = Builder.CreateBinOp(TopLevelOpcode, B, D, RHS->getName());
       if (V)
         RetVal = Builder.CreateBinOp(InnerOpcode, A, V);
@@ -708,9 +711,12 @@ static Value *tryFactorization(BinaryOperator &I, const SimplifyQuery &SQ,
       // If "A op C" simplifies then it can be formed with no cost.
       V = simplifyBinOp(TopLevelOpcode, A, C, SQ.getWithInstruction(&I));
 
-      // If "A op C" doesn't simplify then only go on if one of the existing
+      // If "A op C" doesn't simplify then only go on if both of the existing
       // operations "A op' B" and "C op' D" will be zapped as no longer used.
-      if (!V && (LHS->hasOneUse() || RHS->hasOneUse()))
+      // Note that when an operation is equal to one of its operands, that
+      // operation is "zapped" by having never existed in the first place.
+      if (!V && (LHS->hasOneUse() || LHS == A || LHS == B) &&
+          (RHS->hasOneUse() || RHS == C || RHS == D))
         V = Builder.CreateBinOp(TopLevelOpcode, A, C, LHS->getName());
       if (V)
         RetVal = Builder.CreateBinOp(InnerOpcode, V, B);
diff --git a/llvm/test/Transforms/InstCombine/and-or.ll b/llvm/test/Transforms/InstCombine/and-or.ll
index b4ef27607121d2..7e1c99c82afca9 100644
--- a/llvm/test/Transforms/InstCombine/and-or.ll
+++ b/llvm/test/Transforms/InstCombine/and-or.ll
@@ -685,11 +685,11 @@ define i32 @or_or_and_noOneUse_fail1(i32 %a, i32 %b) {
 ; CHECK-NEXT:    [[SHR:%.*]] = ashr i32 [[A:%.*]], 23
 ; CHECK-NEXT:    [[AND:%.*]] = and i32 [[SHR]], 157
 ; CHECK-NEXT:    call void @use2(i32 [[AND]])
-; CHECK-NEXT:    [[AND1:%.*]] = or i32 [[B:%.*]], 157
-; CHECK-NEXT:    [[OR:%.*]] = and i32 [[SHR]], [[AND1]]
-; CHECK-NEXT:    [[TMP1:%.*]] = lshr i32 [[B]], 23
-; CHECK-NEXT:    [[AND9:%.*]] = and i32 [[TMP1]], 157
-; CHECK-NEXT:    [[R:%.*]] = or i32 [[OR]], [[AND9]]
+; CHECK-NEXT:    [[AND3:%.*]] = and i32 [[SHR]], [[B:%.*]]
+; CHECK-NEXT:    [[SHR8:%.*]] = lshr i32 [[B]], 23
+; CHECK-NEXT:    [[AND9:%.*]] = and i32 [[SHR8]], 157
+; CHECK-NEXT:    [[TMP1:%.*]] = or i32 [[AND3]], [[AND9]]
+; CHECK-NEXT:    [[R:%.*]] = or i32 [[TMP1]], [[AND]]
 ; CHECK-NEXT:    ret i32 [[R]]
 ;
   %shr = ashr i32 %a, 23
@@ -714,9 +714,9 @@ define { i1, i1, i1, i1, i1 } @or_or_and_noOneUse_fail2(i1 %a_0, i1 %a_1, i1 %a_
 ; CHECK-NEXT:    [[TMP3:%.*]] = and i1 [[A_1:%.*]], [[B_1:%.*]]
 ; CHECK-NEXT:    [[TMP4:%.*]] = xor i1 [[TMP3]], true
 ; CHECK-NEXT:    [[TMP5:%.*]] = and i1 [[TMP0]], [[A_1]]
-; CHECK-NEXT:    [[TMP6:%.*]] = or i1 [[TMP2]], [[A_1]]
-; CHECK-NEXT:    [[TMP7:%.*]] = and i1 [[TMP6]], [[B_1]]
-; CHECK-NEXT:    [[D:%.*]] = or i1 [[TMP7]], [[TMP5]]
+; CHECK-NEXT:    [[TMP6:%.*]] = and i1 [[TMP2]], [[B_1]]
+; CHECK-NEXT:    [[TMP7:%.*]] = or i1 [[TMP6]], [[TMP5]]
+; CHECK-NEXT:    [[D:%.*]] = or i1 [[TMP7]], [[TMP3]]
 ; CHECK-NEXT:    [[DOTNOT1:%.*]] = or i1 [[TMP1]], [[TMP3]]
 ; CHECK-NEXT:    [[TMP8:%.*]] = insertvalue { i1, i1, i1, i1, i1 } zeroinitializer, i1 [[D]], 0
 ; CHECK-NEXT:    [[TMP9:%.*]] = insertvalue { i1, i1, i1, i1, i1 } [[TMP8]], i1 [[TMP4]], 1
diff --git a/llvm/test/Transforms/InstCombine/ctpop.ll b/llvm/test/Transforms/InstCombine/ctpop.ll
index 940bb868c4c615..3177625b5cfb68 100644
--- a/llvm/test/Transforms/InstCombine/ctpop.ll
+++ b/llvm/test/Transforms/InstCombine/ctpop.ll
@@ -442,9 +442,9 @@ define i32 @parity_xor_extra_use(i32 %arg, i32 %arg1) {
 ; CHECK-NEXT:    [[I:%.*]] = tail call range(i32 0, 33) i32 @llvm.ctpop.i32(i32 [[ARG:%.*]])
 ; CHECK-NEXT:    [[I2:%.*]] = and i32 [[I]], 1
 ; CHECK-NEXT:    tail call void @use(i32 [[I2]])
-; CHECK-NEXT:    [[TMP1:%.*]] = xor i32 [[ARG1:%.*]], [[ARG]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call range(i32 0, 33) i32 @llvm.ctpop.i32(i32 [[TMP1]])
-; CHECK-NEXT:    [[I5:%.*]] = and i32 [[TMP2]], 1
+; CHECK-NEXT:    [[I3:%.*]] = tail call range(i32 0, 33) i32 @llvm.ctpop.i32(i32 [[ARG1:%.*]])
+; CHECK-NEXT:    [[I4:%.*]] = and i32 [[I3]], 1
+; CHECK-NEXT:    [[I5:%.*]] = xor i32 [[I4]], [[I2]]
 ; CHECK-NEXT:    ret i32 [[I5]]
 ;
   %i = tail call i32 @llvm.ctpop.i32(i32 %arg)
@@ -461,9 +461,9 @@ define i32 @parity_xor_extra_use2(i32 %arg, i32 %arg1) {
 ; CHECK-NEXT:    [[I:%.*]] = tail call range(i32 0, 33) i32 @llvm.ctpop.i32(i32 [[ARG1:%.*]])
 ; CHECK-NEXT:    [[I2:%.*]] = and i32 [[I]], 1
 ; CHECK-NEXT:    tail call void @use(i32 [[I2]])
-; CHECK-NEXT:    [[TMP1:%.*]] = xor i32 [[ARG1]], [[ARG:%.*]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call range(i32 0, 33) i32 @llvm.ctpop.i32(i32 [[TMP1]])
-; CHECK-NEXT:    [[I5:%.*]] = and i32 [[TMP2]], 1
+; CHECK-NEXT:    [[I3:%.*]] = tail call range(i32 0, 33) i32 @llvm.ctpop.i32(i32 [[ARG:%.*]])
+; CHECK-NEXT:    [[I4:%.*]] = and i32 [[I3]], 1
+; CHECK-NEXT:    [[I5:%.*]] = xor i32 [[I2]], [[I4]]
 ; CHECK-NEXT:    ret i32 [[I5]]
 ;
   %i = tail call i32 @llvm.ctpop.i32(i32 %arg1)
diff --git a/llvm/test/Transforms/InstCombine/shl-factor.ll b/llvm/test/Transforms/InstCombine/shl-factor.ll
index 38eecaeff8e41e..dcdd5020f76645 100644
--- a/llvm/test/Transforms/InstCombine/shl-factor.ll
+++ b/llvm/test/Transforms/InstCombine/shl-factor.ll
@@ -43,8 +43,8 @@ define i8 @add_shl_same_amount_nsw_extra_use1(i8 %x, i8 %y, i8 %z) {
 ; CHECK-LABEL: @add_shl_same_amount_nsw_extra_use1(
 ; CHECK-NEXT:    [[XS:%.*]] = shl nuw nsw i8 [[X:%.*]], [[Z:%.*]]
 ; CHECK-NEXT:    call void @use8(i8 [[XS]])
-; CHECK-NEXT:    [[TMP1:%.*]] = add nsw i8 [[X]], [[Y:%.*]]
-; CHECK-NEXT:    [[DIFF:%.*]] = shl nsw i8 [[TMP1]], [[Z]]
+; CHECK-NEXT:    [[YS:%.*]] = shl nuw nsw i8 [[Y:%.*]], [[Z]]
+; CHECK-NEXT:    [[DIFF:%.*]] = add nsw i8 [[XS]], [[YS]]
 ; CHECK-NEXT:    ret i8 [[DIFF]]
 ;
   %xs = shl nsw nuw i8 %x, %z
@@ -56,10 +56,10 @@ define i8 @add_shl_same_amount_nsw_extra_use1(i8 %x, i8 %y, i8 %z) {
 
 define i8 @add_shl_same_amount_nuw_extra_use2(i8 %x, i8 %y, i8 %z) {
 ; CHECK-LABEL: @add_shl_same_amount_nuw_extra_use2(
-; CHECK-NEXT:    [[YS:%.*]] = shl nuw nsw i8 [[Y:%.*]], [[Z:%.*]]
+; CHECK-NEXT:    [[XS:%.*]] = shl nuw i8 [[X:%.*]], [[Z:%.*]]
+; CHECK-NEXT:    [[YS:%.*]] = shl nuw nsw i8 [[Y:%.*]], [[Z]]
 ; CHECK-NEXT:    call void @use8(i8 [[YS]])
-; CHECK-NEXT:    [[TMP1:%.*]] = add nuw i8 [[X:%.*]], [[Y]]
-; CHECK-NEXT:    [[DIFF:%.*]] = shl nuw i8 [[TMP1]], [[Z]]
+; CHECK-NEXT:    [[DIFF:%.*]] = add nuw nsw i8 [[XS]], [[YS]]
 ; CHECK-NEXT:    ret i8 [[DIFF]]
 ;
   %xs = shl nuw i8 %x, %z
@@ -174,8 +174,8 @@ define i8 @sub_shl_same_amount_nsw_extra_use1(i8 %x, i8 %y, i8 %z) {
 ; CHECK-LABEL: @sub_shl_same_amount_nsw_extra_use1(
 ; CHECK-NEXT:    [[XS:%.*]] = shl nuw nsw i8 [[X:%.*]], [[Z:%.*]]
 ; CHECK-NEXT:    call void @use8(i8 [[XS]])
-; CHECK-NEXT:    [[TMP1:%.*]] = sub nsw i8 [[X]], [[Y:%.*]]
-; CHECK-NEXT:    [[DIFF:%.*]] = shl nsw i8 [[TMP1]], [[Z]]
+; CHECK-NEXT:    [[YS:%.*]] = shl nuw nsw i8 [[Y:%.*]], [[Z]]
+; CHECK-NEXT:    [[DIFF:%.*]] = sub nsw i8 [[XS]], [[YS]]
 ; CHECK-NEXT:    ret i8 [[DIFF]]
 ;
   %xs = shl nsw nuw i8 %x, %z
@@ -187,10 +187,10 @@ define i8 @sub_shl_same_amount_nsw_extra_use1(i8 %x, i8 %y, i8 %z) {
 
 define i8 @sub_shl_same_amount_nuw_extra_use2(i8 %x, i8 %y, i8 %z) {
 ; CHECK-LABEL: @sub_shl_same_amount_nuw_extra_use2(
-; CHECK-NEXT:    [[YS:%.*]] = shl nuw nsw i8 [[Y:%.*]], [[Z:%.*]]
+; CHECK-NEXT:    [[XS:%.*]] = shl nuw i8 [[X:%.*]], [[Z:%.*]]
+; CHECK-NEXT:    [[YS:%.*]] = shl nuw nsw i8 [[Y:%.*]], [[Z]]
 ; CHECK-NEXT:    call void @use8(i8 [[YS]])
-; CHECK-NEXT:    [[TMP1:%.*]] = sub nuw i8 [[X:%.*]], [[Y]]
-; CHECK-NEXT:    [[DIFF:%.*]] = shl nuw i8 [[TMP1]], [[Z]]
+; CHECK-NEXT:    [[DIFF:%.*]] = sub nuw nsw i8 [[XS]], [[YS]]
 ; CHECK-NEXT:    ret i8 [[DIFF]]
 ;
   %xs = shl nuw i8 %x, %z
@@ -265,3 +265,27 @@ define i6 @sub_shl_same_amount_partial_nuw2(i6 %x, i6 %y, i6 %z) {
   ret i6 %diff
 }
 
+define i8 @add_shl_same_amount_constants(i8 %z) {
+; CHECK-LABEL: @add_shl_same_amount_constants(
+; CHECK-NEXT:    [[SUM:%.*]] = shl i8 7, [[Z:%.*]]
+; CHECK-NEXT:    ret i8 [[SUM]]
+;
+  %s1 = shl i8 4, %z
+  %s2 = shl i8 3, %z
+  %sum = add i8 %s1, %s2
+  ret i8 %sum
+}
+
+define i8 @add_shl_same_amount_constants_extra_use(i8 %z) {
+; CHECK-LABEL: @add_shl_same_amount_constants_extra_use(
+; CHECK-NEXT:    [[S1:%.*]] = shl i8 4, [[Z:%.*]]
+; CHECK-NEXT:    [[SUM:%.*]] = shl i8 7, [[Z]]
+; CHECK-NEXT:    call void @use8(i8 [[S1]])
+; CHECK-NEXT:    ret i8 [[SUM]]
+;
+  %s1 = shl i8 4, %z
+  %s2 = shl i8 3, %z
+  %sum = add i8 %s1, %s2
+  call void @use8(i8 %s1)
+  ret i8 %sum
+}
diff --git a/llvm/test/Transforms/LoopVectorize/induction.ll b/llvm/test/Transforms/LoopVectorize/induction.ll
index 45674acaae5387..0d627e9b6a8834 100644
--- a/llvm/test/Transforms/LoopVectorize/induction.ll
+++ b/llvm/test/Transforms/LoopVectorize/induction.ll
@@ -3910,8 +3910,8 @@ define void @wrappingindvars2(i8 %t, i32 %len, ptr %A) {
 ; IND-NEXT:    [[N_VEC:%.*]] = and i32 [[TMP0]], 510
 ; IND-NEXT:    [[DOTCAST:%.*]] = trunc i32 [[N_VEC]] to i8
 ; IND-NEXT:    [[IND_END:%.*]] = add i8 [[DOTCAST]], [[T]]
-; IND-NEXT:    [[EXT_MUL5:%.*]] = add nuw nsw i32 [[N_VEC]], [[EXT]]
-; IND-NEXT:    [[IND_END1:%.*]] = shl nuw nsw i32 [[EXT_MUL5]], 2
+; IND-NEXT:    [[TMP10:%.*]] = shl nuw nsw i32 [[N_VEC]], 2
+; IND-NEXT:    [[IND_END1:%.*]] = add nuw nsw i32 [[EXT_MUL]], [[TMP10]]
 ; IND-NEXT:    [[DOTSPLATINSERT:%.*]] = insertelement <2 x i32> poison, i32 [[EXT_MUL]], i64 0
 ; IND-NEXT:    [[DOTSPLAT:%.*]] = shufflevector <2 x i32> [[DOTSPLATINSERT]], <2 x i32> poison, <2 x i32> zeroinitializer
 ; IND-NEXT:    [[INDUCTION:%.*]] = add nuw nsw <2 x i32> [[DOTSPLAT]], <i32 0, i32 4>
@@ -3921,13 +3921,13 @@ define void @wrappingindvars2(i8 %t, i32 %len, ptr %A) {
 ; IND-NEXT:    [[VEC_IND:%.*]] = phi <2 x i32> [ [[INDUCTION]], [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
 ; IND-NEXT:    [[DOTCAST4:%.*]] = trunc i32 [[INDEX]] to i8
 ; IND-NEXT:    [[OFFSET_IDX:%.*]] = add i8 [[DOTCAST4]], [[T]]
-; IND-NEXT:    [[TMP10:%.*]] = sext i8 [[OFFSET_IDX]] to i64
-; IND-NEXT:    [[TMP11:%.*]] = getelementptr inbounds i32, ptr [[A:%.*]], i64 [[TMP10]]
-; IND-NEXT:    store <2 x i32> [[VEC_IND]], ptr [[TMP11]], align 4
+; IND-NEXT:    [[TMP11:%.*]] = sext i8 [[OFFSET_IDX]] to i64
+; IND-NEXT:    [[TMP12:%.*]] = getelementptr inbounds i32, ptr [[A:%.*]], i64 [[TMP11]]
+; IND-NEXT:    store <2 x i32> [[VEC_IND]], ptr [[TMP12]], align 4
 ; IND-NEXT:    [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 2
 ; IND-NEXT:    [[VEC_IND_NEXT]] = add <2 x i32> [[VEC_IND]], <i32 8, i32 8>
-; IND-NEXT:    [[TMP12:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
-; IND-NEXT:    br i1 [[TMP12]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP38:![0-9]+]]
+; IND-NEXT:    [[TMP13:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
+; IND-NEXT:    br i1 [[TMP13]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP38:![0-9]+]]
 ; IND:       middle.block:
 ; IND-NEXT:    [[CMP_N:%.*]] = icmp eq i32 [[TMP0]], [[N_VEC]]
 ; IND-NEXT:    br i1 [[CMP_N]], label [[EXIT_LOOPEXIT:%.*]], label [[SCALAR_PH]]
@@ -3940,8 +3940,8 @@ define void @wrappingindvars2(i8 %t, i32 %len, ptr %A) {
 ; IND-NEXT:    [[IDX:%.*]] = phi i8 [ [[IDX_INC:%.*]], [[LOOP]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
 ; IND-NEXT:    [[SPHI:%.*]] = phi i32 [ [[MUL:%.*]], [[LOOP]] ], [ [[BC_RESUME_VAL2]], [[SCALAR_PH]] ]
 ; IND-NEXT:    [[IDX_B:%.*]] = phi i32 [ [[IDX_B_INC:%.*]], [[LOOP]] ], [ [[BC_RESUME_VAL3]], [[SCALAR_PH]] ]
-; IND-NEXT:    [[TMP13:%.*]] = sext i8 [[IDX]] to i64
-; IND-NEXT:    [[PTR:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[TMP13]]
+; IND-NEXT:    [[TMP14:%.*]] = sext i8 [[IDX]] to i64
+; IND-NEXT:    [[PTR:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[TMP14]]
 ; IND-NEXT:    store i32 [[SPHI]], ptr [[PTR]], align 4
 ; IND-NEXT:    [[IDX_INC]] = add i8 [[IDX]], 1
 ; IND-NEXT:    [[IDX_INC_EXT:%.*]] = zext i8 [[IDX_INC]] to i32
@@ -3979,8 +3979,8 @@ define void @wrappingindvars2(i8 %t, i32 %len, ptr %A) {
 ; UNROLL-NEXT:    [[N_VEC:%.*]] = and i32 [[TMP0]], 508
 ; UNROLL-NEXT:    [[DOTCAST:%.*]] = trunc i32 [[N_VEC]] to i8
 ; UNROLL-NEXT:    [[IND_END:%.*]] = add i8 [[DOTCAST]], [[T]]
-; UNROLL-NEXT:    [[EXT_MUL6:%.*]] = add nuw nsw i32 [[N_VEC]], [[EXT]]
-; UNROLL-NEXT:    [[IND_END1:%.*]] = shl nuw nsw i32 [[EXT_MUL6]], 2
+; UNROLL-NEXT:    [[TMP10:%.*]] = shl nuw nsw i32 [[N_VEC]], 2
+; UNROLL-NEXT:    [[IND_END1:%.*]] = add nuw nsw i32 [[EXT_MUL]], [[TMP10]]
 ; UNROLL-NEXT:    [[DOTSPLATINSERT:%.*]] = insertelement <2 x i32> poison, i32 [[EXT_MUL]], i64 0
 ; UNROLL-NEXT:    [[DOTSPLAT:%.*]] = shufflevector <2 x i32> [[DOTSPLATINSERT]], <2 x i32> poison, <2 x i32> zeroinitializer
 ; UNROLL-NEXT:    [[INDUCTION:%.*]] = add nuw nsw <2 x i32> [[DOTSPLAT]], <i32 0, i32 4>
@@ -3991,15 +3991,15 @@ define void @wrappingindvars2(i8 %t, i32 %len, ptr %A) {
 ; UNROLL-NEXT:    [[STEP_ADD:%.*]] = add <2 x i32> [[VEC_IND]], <i32 8, i32 8>
 ; UNROLL-NEXT:    [[DOTCAST5:%.*]] = trunc i32 [[INDEX]] to i8
 ; UNROLL-NEXT:    [[OFFSET_IDX:%.*]] = add i8 [[DOTCAST5]], [[T]]
-; UNROLL-NEXT:    [[TMP10:%.*]] = sext i8 [[OFFSET_IDX]] to i64
-; UNROLL-NEXT:    [[TMP11:%.*]] = getelementptr inbounds i32, ptr [[A:%.*]], i64 [[TMP10]]
-; UNROLL-NEXT:    [[TMP12:%.*]] = getelementptr inbounds i8, ptr [[TMP11]], i64 8
-; UNROLL-NEXT:    store <2 x i32> [[VEC_IND]], ptr [[TMP11]], align 4
-; UNROLL-NEXT:    store <2 x i32> [[STEP_ADD]], ptr [[TMP12]], align 4
+; UNROLL-NEXT:    [[TMP11:%.*]] = sext i8 [[OFFSET_IDX]] to i64
+; UNROLL-NEXT:    [[TMP12:%.*]] = getelementptr inbounds i32, ptr [[A:%.*]], i64 [[TMP11]]
+; UNROLL-NEXT:    [[TMP13:%.*]] = getelementptr inbounds i8, ptr [[TMP12]], i64 8
+; UNROLL-NEXT:    store <2 x i32> [[VEC_IND]], ptr [[TMP12]], align 4
+; UNROLL-NEXT:    store <2 x i32> [[STEP_ADD]], ptr [[TMP13]], align 4
 ; UNROLL-NEXT:    [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 4
 ; UNROLL-NEXT:    [[VEC_IND_NEXT]] = add <2 x i32> [[VEC_IND]], <i32 16, i32 16>
-; UNROLL-NEXT:    [[TMP13:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
-; UNROLL-NEXT:    br i1 [[TMP13]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP38:![0-9]+]]
+; UNROLL-NEXT:    [[TMP14:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
+; UNROLL-NEXT:    br i1 [[TMP14]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP38:![0-9]+]]
 ; UNROLL:       middle.block:
 ; UNROLL-NEXT:    [[CMP_N:%.*]] = icmp eq i32 [[TMP0]], [[N_VEC]]
 ; UNROLL-NEXT:    br i1 [[CMP_N]], label [[EXIT_LOOPEXIT:%.*]], label [[SCALAR_PH]]
@@ -4012,8 +4012,8 @@ define void @wrappingindvars2(i8 %t, i32 %len, ptr %A) {
 ; UNROLL-NEXT:    [[IDX:%.*]] = phi i8 [ [[IDX_INC:%.*]], [[LOOP]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
 ; UNROLL-NEXT:    [[SPHI:%.*]] = phi i32 [ [[MUL:%.*]], [[LOOP]] ], [ [[BC_RESUME_VAL2]], [[SCALAR_PH]] ]
 ; UNROLL-NEXT:    [[IDX_B:%.*]] = phi i32 [ [[IDX_B_INC:%.*]], [[LOOP]] ], [ [[BC_RESUME_VAL3]], [[SCALAR_PH]] ]
-; UNROLL-NEXT:    [[TMP14:%.*]] = sext i8 [[IDX]] to i64
-; UNROLL-NEXT:    [[PTR:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[TMP14]]
+; UNROLL-NEXT:    [[TMP15:%.*]] = sext i8 [[IDX]] to i64
+; UNROLL-NEXT:    [[PTR:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[TMP15]]
 ; UNROLL-NEXT:    store i32 [[SPHI]], ptr [[PTR]], align 4
 ; UNROLL-NEXT:    [[IDX_INC]] = add i8 [[IDX]], 1
 ; UNROLL-NEXT:    [[IDX_INC_EXT:%.*]] = zext i8 [[IDX_INC]] to i32
@@ -4129,8 +4129,8 @@ define void @wrappingindvars2(i8 %t, i32 %len, ptr %A) {
 ; INTERLEAVE-NEXT:    [[N_VEC:%.*]] = and i32 [[TMP0]], 504
 ; INTERLEAVE-NEXT:    [[DOTCAST:%.*]] = trunc i32 [[N_VEC]] to i8
 ; INTERLEAVE-NEXT:    [[IND_END:%.*]] = add i8 [[DOTCAST]], [[T]]
-; INTERLEAVE-NEXT:    [[EXT_MUL6:%.*]] = add nuw nsw i32 [[N_VEC]], [[EXT]]
-; INTERLEAVE-NEXT:    [[IND_END1:%.*]] = shl nuw nsw i32 [[EXT_MUL6]], 2
+; INTERLEAVE-NEXT:    [[TMP10:%.*]] = shl nuw nsw i32 [[N_VEC]], 2
+; INTERLEAVE-NEXT:    [[IND_END1:%.*]] = add nuw nsw i32 [[EXT_MUL]], [[TMP10]]
 ; INTERLEAVE-NEXT:    [[DOTSPLATINSERT:%.*]] = insertelement <4 x i32> poison, i32 [[EXT_MUL]], i64 0
 ; INTERLEAVE-NEXT:    [[DOTSPLAT:%.*]] = shufflevector <4 x i32> [[DOTSPLATINSERT]], <4 x i32> poison, <4 x i32> zeroinitializer
 ; INTERLEAVE-NEXT:    [[INDUCTION:%.*]] = add nuw nsw <4 x i32> [[DOTSPLAT]], <i32 0, i32 4, i32 8, i32 12>
@@ -4141,15 +4141,15 @@ define void @wrappingindvars2(i8 %t, i32 %len, ptr %A) {
 ; INTERLEAVE-NEXT:    [[STEP_ADD:%.*]] = add <4 x i32> [[VEC_IND]], <i32 16, i32 16, i32 16, i32 16>
 ; INTERLEAVE-NEXT:    [[DOTCAST5:%.*]] = trunc i32 [[INDEX]] to i8
 ; INTERLEAVE-NEXT:    [[OFFSET_IDX:%.*]] = add i8 [[DOTCAST5]], [[T]]
-; INTERLEAVE-NEXT:    [[TMP10:%.*]] = sext i8 [[OFFSET_IDX]] to i64
-; INTERLEAVE-NEXT:    [[TMP11:%.*]] = getelementptr inbounds i32, ptr [[A:%.*]], i64 [[TMP10]]
-; INTERLEAVE-NEXT:    [[TMP12:%.*]] = getelementptr inbounds i8, ptr [[TMP11]], i64 16
-; INTERLEAVE-NEXT:    store <4 x i32> [[VEC_IND]], ptr [[TMP11]], align 4
-; INTERLEAVE-NEXT:    store <4 x i32> [[STEP_ADD]], ptr [[TMP12]], align 4
+; INTERLEAVE-NEXT:    [[TMP11:%.*]] = sext i8 [[OFFSET_IDX]] to i64
+; INTERLEAVE-NEXT:    [[TMP12:%.*]] = getelementptr inbounds i32, ptr [[A:%.*]], i64 [[TMP11]]
+; INTERLEAVE-NEXT:    [[TMP13:%.*]] = getelementptr inbounds i8, ptr [[TMP12]], i64 16
+; INTERLEAVE-NEXT:    store <4 x i32> [[VEC_IND]], ptr [[TMP12]], align 4
+; INTERLEAVE-NEXT:    store <4 x i32> [[STEP_ADD]], ptr [[TMP13]], align 4
 ; INTERLEAVE-NEXT:    [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 8
 ; INTERLEAVE-NEXT:    [[VEC_IND_NEXT]] = add <4 x i32> [[VEC_IND]], <i32 32, i32 32, i32 32, i32 32>
-; INTERLEAVE-NEXT:    [[TMP13:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
-; INTERLEAVE-NEXT:    br i1 [[TMP13]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP38:![0-9]+]]
+; INTERLEAVE-NEXT:    [[TMP14:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
+; INTERLEAVE-NEXT:    br i1 [[TMP14]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP38:![0-9]+]]
 ; INTERLEAVE:       middle.block:
 ; INTERLEAVE-NEXT:    [[CMP_N:%.*]] = icmp eq i32 [[TMP0]], [[N_VEC]]
 ; INTERLEAVE-NEXT:    br i1 [[CMP_N]], label [[EXIT_LOOPEXIT:%.*]], label [[SCALAR_PH]]
@@ -4162,8 +4162,8 @@ define void @wrappingindvars2(i8 %t, i32 %len, ptr %A) {
 ; INTERLEAVE-NEXT:    [[IDX:%.*]] = phi i8 [ [[IDX_INC:%.*]], [[LOOP]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
 ; INTERLEAVE-NEXT:    [[SPHI:%.*]] = phi i32 [ [[MUL:%.*]], [[LOOP]] ], [ [[BC_RESUME_VAL2]], [[SCALAR_PH]] ]
 ; INTERLEAVE-NEXT:    [[IDX_B:%.*]] = phi i32 [ [[IDX_B_INC:%.*]], [[LOOP]] ], [ [[BC_RESUME_VAL3]], [[SCALAR_PH]] ]
-;...
[truncated]

@@ -265,3 +265,27 @@ define i6 @sub_shl_same_amount_partial_nuw2(i6 %x, i6 %y, i6 %z) {
ret i6 %diff
}

define i8 @add_shl_same_amount_constants(i8 %z) {

Contributor Author

These new tests aren't actually affected by this patch, so there will be no diff. I added them because they cover a previously untested pattern that I want to show is preserved despite the restrictions added by this change. Should I still pre-commit them, or should they go in a separate NFC PR?

Member

Yeah, please file a separate PR for these additional tests.

Since this patch is not NFC, you should pre-commit some tests to demonstrate the issue.

Contributor Author

Removed the unaffected new tests and added a pre-commit test to demonstrate the issue with the tryFactorization() transform. The issue with factorizeMathWithShlOps() is already shown by the changes to shl-factor.ll.

@dtcxzyw dtcxzyw requested a review from goldsteinn August 13, 2024 02:17
@kalxr kalxr force-pushed the instcombine-factorization branch 2 times, most recently from 775c865 to 076b8a1 Compare August 14, 2024 16:06
Contributor

@nikic nikic left a comment

The transform in tryFactorization() used to require that both operations have one use, and thus both would be removable after the transform. Commit 0cfc651 loosened the factorization constraints so that only one of the operations must be removable after the transform. There are cases where this increases the number of uses of non-intermediate values (values that are not the result of a binop involved in the transform) with no discernible benefit (e.g. https://alive2.llvm.org/ce/z/8K4VuE). Likewise, factorizeMathWithShlOps() uses the same constraints and has the same issue (https://alive2.llvm.org/ce/z/EKK-Kp).

Could you explain in more detail why these transforms are undesirable (to the point that we must actively prevent them)?

In https://alive2.llvm.org/ce/z/8K4VuE %add after the transform only depends on one multiply, instead of two, which seems like a good change in principle?
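
For readers without the link, a sketch of the kind of pattern under discussion (names and exact ops are illustrative; the alive2 input may differ):

declare void @use(i32)

define i32 @before(i32 %a, i32 %b, i32 %c) {
  %mul1 = mul i32 %a, %c        ; kept alive by the extra use below
  call void @use(i32 %mul1)
  %mul2 = mul i32 %b, %c
  %add = add i32 %mul1, %mul2
  ret i32 %add
}

define i32 @after(i32 %a, i32 %b, i32 %c) {
  %mul1 = mul i32 %a, %c        ; still present: the extra use remains
  call void @use(i32 %mul1)
  %ab = add i32 %a, %b          ; new instruction, and a new use of %a
  %add = mul i32 %ab, %c        ; %add now depends on a single multiply
  ret i32 %add
}

The instruction counts are equal; the trade-off is that %add depends on one multiply instead of two, while %a gains a use and the value of %mul2 is no longer available for reuse.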

@kalxr kalxr force-pushed the instcombine-factorization branch from 076b8a1 to eb0c4a3 Compare August 15, 2024 17:45
@kalxr
Contributor Author

kalxr commented Aug 15, 2024

Could you explain in more detail why these transforms are undesirable (to the point that we must actively prevent them)?

In https://alive2.llvm.org/ce/z/8K4VuE %add after the transform only depends on one multiply, instead of two, which seems like a good change in principle?

I've added another pre-commit test that should help demonstrate an issue with the transform (mul_add_chain_common_factor_uses). It is a case where applying the transform eliminates no instructions while destroying redundancy that a later pass such as GVN could otherwise eliminate. https://alive2.llvm.org/ce/z/zeNCwD shows this: if you run InstCombine and then GVN, only 2 instructions are removed; if you skip InstCombine and run only GVN, 4 instructions are removed.
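
A minimal sketch of that mechanism, assuming a shape like the pre-committed test (the actual mul_add_chain_common_factor_uses may differ):

declare void @use(i32)

define i32 @mul_add_chain(i32 %a, i32 %b, i32 %c) {
  %m1 = mul i32 %a, %c
  call void @use(i32 %m1)      ; extra use: %m1 cannot be deleted
  %m2 = mul i32 %b, %c
  %add1 = add i32 %m1, %m2
  call void @use(i32 %add1)
  ; Factoring %add1 into (%a + %b) * %c deletes %m2 but creates an
  ; add, so no net instruction is saved; it also erases the values
  ; that make the instructions below redundant for GVN.
  %m3 = mul i32 %b, %c         ; same value as %m2
  %add2 = add i32 %m3, %m1     ; same value as %add1
  ret i32 %add2
}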

@kalxr
Contributor Author

kalxr commented Aug 22, 2024

Ping

@dtcxzyw dtcxzyw requested a review from nikic August 22, 2024 18:43
@goldsteinn
Contributor

This needs a rebase.

// This transform is only profitable if both operations or one operation and
// the resulting add/sub can be eliminated/folded.
if (!(Op0->hasOneUse() && Op1->hasOneUse()) &&
!(isa<Constant>(X) && isa<Constant>(Y)))
Contributor

Probably more precise is !simplifyBinOp(I.getOpcode(), X, Y)

Contributor Author

Good idea, added.
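
For reference, the updated guard would read roughly as follows (a sketch only; it assumes a SimplifyQuery is reachable from factorizeMathWithShlOps(), which the actual commit may thread differently):

// Bail unless both shifts will die, or the new inner "X op Y"
// simplifies away (e.g. both X and Y are constants).
if (!(Op0->hasOneUse() && Op1->hasOneUse()) &&
    !simplifyBinOp(I.getOpcode(), X, Y, SQ.getWithInstruction(&I)))
  return nullptr;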

@kalxr kalxr force-pushed the instcombine-factorization branch from eb0c4a3 to f9c2f0f Compare August 22, 2024 20:55
; CHECK-NEXT: [[SHR8:%.*]] = lshr i32 [[B]], 23
; CHECK-NEXT: [[AND9:%.*]] = and i32 [[SHR8]], 157
; CHECK-NEXT: [[R:%.*]] = or i32 [[OR]], [[AND9]]
; CHECK-NEXT: [[TMP1:%.*]] = or i32 [[AND3]], [[AND9]]
; CHECK-NEXT: [[R:%.*]] = or i32 [[TMP1]], [[AND]]
Contributor

This is a minor regression.

@goldsteinn
Contributor

@dtcxzyw can you test this patch?

dtcxzyw added a commit to dtcxzyw/llvm-opt-benchmark that referenced this pull request Aug 23, 2024
@dtcxzyw
Member

dtcxzyw commented Aug 23, 2024

Regression:

; bin/opt -passes=instcombine test.ll -S
define i64 @test(i64 %23, i64 %24) {
entry:
   %.idx14.i = shl nsw i64 %23, 3
   %.idx.i = shl nsw i64 %24, 3
   %gepdiff.i = sub nsw i64 %.idx.i, %.idx14.i
   %25 = ashr exact i64 %gepdiff.i, 3
   call void @use(i64 %.idx14.i)
   ret i64 %25
}

declare void @use(i64)

It can be folded into:

define i64 @test(i64 %0, i64 %1) {
entry:
  %.idx14.i = shl nsw i64 %0, 3
  %2 = sub nsw i64 %1, %0
  call void @use(i64 %.idx14.i)
  ret i64 %2
}

@kalxr kalxr closed this Dec 25, 2024