
[InstCombine] Convert or concat to fshl if opposite or concat exists #68502

Merged
merged 7 commits into llvm:main from the concat branch
Nov 20, 2023

Conversation

HaohaiWen
Contributor

If there are two 'or' instructions that concatenate variables in opposite order,
and the first 'or' dominates the second one, the second 'or' can be optimized
to an fshl that rotates the first 'or'. This can eliminate a shl and expose
more optimization opportunities for bswap/bitreverse.

The current implementation of matchFunnelShift only allows the opposite-shift
pattern. Refactor it to allow more patterns.
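To make the fold concrete, here is a minimal IR sketch distilled from the fshl_concat test added in this patch (the store of %xy is omitted):

%zext.x = zext i8 %x to i32
%slx = shl nuw i32 %zext.x, 24
%zext.y = zext i24 %y to i32
%xy = or i32 %slx, %zext.y       ; first concat:  [x, y]
%sly = shl nuw i32 %zext.y, 8
%yx = or i32 %zext.x, %sly       ; second concat: [y, x]
; %xy dominates %yx, so the second 'or' (and its shl) can be rewritten as
; %yx = call i32 @llvm.fshl.i32(i32 %xy, i32 %xy, i32 8)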
@llvmbot
Collaborator

llvmbot commented Oct 8, 2023

@llvm/pr-subscribers-llvm-transforms

Changes

If there are two 'or' instructions that concatenate variables in opposite order,
and the first 'or' dominates the second one, the second 'or' can be optimized
to an fshl that rotates the first 'or'. This can eliminate a shl and expose
more optimization opportunities for bswap/bitreverse.


Full diff: https://github.com/llvm/llvm-project/pull/68502.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp (+134-78)
  • (modified) llvm/test/Transforms/InstCombine/funnel.ll (+42)
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp b/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
index cbdab3e9c5fb91d..09017e46ee3bd4d 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
@@ -2727,105 +2727,161 @@ Instruction *InstCombinerImpl::matchBSwapOrBitReverse(Instruction &I,
 }
 
 /// Match UB-safe variants of the funnel shift intrinsic.
-static Instruction *matchFunnelShift(Instruction &Or, InstCombinerImpl &IC) {
+static Instruction *matchFunnelShift(Instruction &Or, InstCombinerImpl &IC,
+                                     const DominatorTree &DT) {
   // TODO: Can we reduce the code duplication between this and the related
   // rotate matching code under visitSelect and visitTrunc?
   unsigned Width = Or.getType()->getScalarSizeInBits();
 
-  // First, find an or'd pair of opposite shifts:
-  // or (lshr ShVal0, ShAmt0), (shl ShVal1, ShAmt1)
-  BinaryOperator *Or0, *Or1;
-  if (!match(Or.getOperand(0), m_BinOp(Or0)) ||
-      !match(Or.getOperand(1), m_BinOp(Or1)))
+  Instruction *Or0, *Or1;
+  if (!match(Or.getOperand(0), m_Instruction(Or0)) ||
+      !match(Or.getOperand(1), m_Instruction(Or1)))
     return nullptr;
 
-  Value *ShVal0, *ShVal1, *ShAmt0, *ShAmt1;
-  if (!match(Or0, m_OneUse(m_LogicalShift(m_Value(ShVal0), m_Value(ShAmt0)))) ||
-      !match(Or1, m_OneUse(m_LogicalShift(m_Value(ShVal1), m_Value(ShAmt1)))) ||
-      Or0->getOpcode() == Or1->getOpcode())
-    return nullptr;
+  bool IsFshl = true; // Sub on LSHR.
+  SmallVector<Value *, 3> FShiftArgs;
 
-  // Canonicalize to or(shl(ShVal0, ShAmt0), lshr(ShVal1, ShAmt1)).
-  if (Or0->getOpcode() == BinaryOperator::LShr) {
-    std::swap(Or0, Or1);
-    std::swap(ShVal0, ShVal1);
-    std::swap(ShAmt0, ShAmt1);
-  }
-  assert(Or0->getOpcode() == BinaryOperator::Shl &&
-         Or1->getOpcode() == BinaryOperator::LShr &&
-         "Illegal or(shift,shift) pair");
-
-  // Match the shift amount operands for a funnel shift pattern. This always
-  // matches a subtraction on the R operand.
-  auto matchShiftAmount = [&](Value *L, Value *R, unsigned Width) -> Value * {
-    // Check for constant shift amounts that sum to the bitwidth.
-    const APInt *LI, *RI;
-    if (match(L, m_APIntAllowUndef(LI)) && match(R, m_APIntAllowUndef(RI)))
-      if (LI->ult(Width) && RI->ult(Width) && (*LI + *RI) == Width)
-        return ConstantInt::get(L->getType(), *LI);
-
-    Constant *LC, *RC;
-    if (match(L, m_Constant(LC)) && match(R, m_Constant(RC)) &&
-        match(L, m_SpecificInt_ICMP(ICmpInst::ICMP_ULT, APInt(Width, Width))) &&
-        match(R, m_SpecificInt_ICMP(ICmpInst::ICMP_ULT, APInt(Width, Width))) &&
-        match(ConstantExpr::getAdd(LC, RC), m_SpecificIntAllowUndef(Width)))
-      return ConstantExpr::mergeUndefsWith(LC, RC);
-
-    // (shl ShVal, X) | (lshr ShVal, (Width - x)) iff X < Width.
-    // We limit this to X < Width in case the backend re-expands the intrinsic,
-    // and has to reintroduce a shift modulo operation (InstCombine might remove
-    // it after this fold). This still doesn't guarantee that the final codegen
-    // will match this original pattern.
-    if (match(R, m_OneUse(m_Sub(m_SpecificInt(Width), m_Specific(L))))) {
-      KnownBits KnownL = IC.computeKnownBits(L, /*Depth*/ 0, &Or);
-      return KnownL.getMaxValue().ult(Width) ? L : nullptr;
+  // First, find an or'd pair of opposite shifts:
+  // or (lshr ShVal0, ShAmt0), (shl ShVal1, ShAmt1)
+  if (isa<BinaryOperator>(Or0) && isa<BinaryOperator>(Or1)) {
+    Value *ShVal0, *ShVal1, *ShAmt0, *ShAmt1;
+    if (!match(Or0,
+               m_OneUse(m_LogicalShift(m_Value(ShVal0), m_Value(ShAmt0)))) ||
+        !match(Or1,
+               m_OneUse(m_LogicalShift(m_Value(ShVal1), m_Value(ShAmt1)))) ||
+        Or0->getOpcode() == Or1->getOpcode())
+      return nullptr;
+
+    // Canonicalize to or(shl(ShVal0, ShAmt0), lshr(ShVal1, ShAmt1)).
+    if (Or0->getOpcode() == BinaryOperator::LShr) {
+      std::swap(Or0, Or1);
+      std::swap(ShVal0, ShVal1);
+      std::swap(ShAmt0, ShAmt1);
     }
+    assert(Or0->getOpcode() == BinaryOperator::Shl &&
+           Or1->getOpcode() == BinaryOperator::LShr &&
+           "Illegal or(shift,shift) pair");
+
+    // Match the shift amount operands for a funnel shift pattern. This always
+    // matches a subtraction on the R operand.
+    auto matchShiftAmount = [&](Value *L, Value *R, unsigned Width) -> Value * {
+      // Check for constant shift amounts that sum to the bitwidth.
+      const APInt *LI, *RI;
+      if (match(L, m_APIntAllowUndef(LI)) && match(R, m_APIntAllowUndef(RI)))
+        if (LI->ult(Width) && RI->ult(Width) && (*LI + *RI) == Width)
+          return ConstantInt::get(L->getType(), *LI);
+
+      Constant *LC, *RC;
+      if (match(L, m_Constant(LC)) && match(R, m_Constant(RC)) &&
+          match(L,
+                m_SpecificInt_ICMP(ICmpInst::ICMP_ULT, APInt(Width, Width))) &&
+          match(R,
+                m_SpecificInt_ICMP(ICmpInst::ICMP_ULT, APInt(Width, Width))) &&
+          match(ConstantExpr::getAdd(LC, RC), m_SpecificIntAllowUndef(Width)))
+        return ConstantExpr::mergeUndefsWith(LC, RC);
+
+      // (shl ShVal, X) | (lshr ShVal, (Width - x)) iff X < Width.
+      // We limit this to X < Width in case the backend re-expands the
+      // intrinsic, and has to reintroduce a shift modulo operation (InstCombine
+      // might remove it after this fold). This still doesn't guarantee that the
+      // final codegen will match this original pattern.
+      if (match(R, m_OneUse(m_Sub(m_SpecificInt(Width), m_Specific(L))))) {
+        KnownBits KnownL = IC.computeKnownBits(L, /*Depth*/ 0, &Or);
+        return KnownL.getMaxValue().ult(Width) ? L : nullptr;
+      }
+
+      // For non-constant cases, the following patterns currently only work for
+      // rotation patterns.
+      // TODO: Add general funnel-shift compatible patterns.
+      if (ShVal0 != ShVal1)
+        return nullptr;
+
+      // For non-constant cases we don't support non-pow2 shift masks.
+      // TODO: Is it worth matching urem as well?
+      if (!isPowerOf2_32(Width))
+        return nullptr;
+
+      // The shift amount may be masked with negation:
+      // (shl ShVal, (X & (Width - 1))) | (lshr ShVal, ((-X) & (Width - 1)))
+      Value *X;
+      unsigned Mask = Width - 1;
+      if (match(L, m_And(m_Value(X), m_SpecificInt(Mask))) &&
+          match(R, m_And(m_Neg(m_Specific(X)), m_SpecificInt(Mask))))
+        return X;
+
+      // Similar to above, but the shift amount may be extended after masking,
+      // so return the extended value as the parameter for the intrinsic.
+      if (match(L, m_ZExt(m_And(m_Value(X), m_SpecificInt(Mask)))) &&
+          match(R,
+                m_And(m_Neg(m_ZExt(m_And(m_Specific(X), m_SpecificInt(Mask)))),
+                      m_SpecificInt(Mask))))
+        return L;
+
+      if (match(L, m_ZExt(m_And(m_Value(X), m_SpecificInt(Mask)))) &&
+          match(R, m_ZExt(m_And(m_Neg(m_Specific(X)), m_SpecificInt(Mask)))))
+        return L;
 
-    // For non-constant cases, the following patterns currently only work for
-    // rotation patterns.
-    // TODO: Add general funnel-shift compatible patterns.
-    if (ShVal0 != ShVal1)
       return nullptr;
+    };
 
-    // For non-constant cases we don't support non-pow2 shift masks.
-    // TODO: Is it worth matching urem as well?
-    if (!isPowerOf2_32(Width))
+    Value *ShAmt = matchShiftAmount(ShAmt0, ShAmt1, Width);
+    if (!ShAmt) {
+      ShAmt = matchShiftAmount(ShAmt1, ShAmt0, Width);
+      IsFshl = false; // Sub on SHL.
+    }
+    if (!ShAmt)
       return nullptr;
 
-    // The shift amount may be masked with negation:
-    // (shl ShVal, (X & (Width - 1))) | (lshr ShVal, ((-X) & (Width - 1)))
-    Value *X;
-    unsigned Mask = Width - 1;
-    if (match(L, m_And(m_Value(X), m_SpecificInt(Mask))) &&
-        match(R, m_And(m_Neg(m_Specific(X)), m_SpecificInt(Mask))))
-      return X;
+    FShiftArgs = {ShVal0, ShVal1, ShAmt};
 
-    // Similar to above, but the shift amount may be extended after masking,
-    // so return the extended value as the parameter for the intrinsic.
-    if (match(L, m_ZExt(m_And(m_Value(X), m_SpecificInt(Mask)))) &&
-        match(R, m_And(m_Neg(m_ZExt(m_And(m_Specific(X), m_SpecificInt(Mask)))),
-                       m_SpecificInt(Mask))))
-      return L;
+  } else if (isa<ZExtInst>(Or0) || isa<ZExtInst>(Or1)) {
+    // If there are two 'or' instructions concat variables in opposite order,
+    // the latter one can be safely convert to fshl.
+    //
+    // LowHigh = or (shl (zext Low), Width - ZextHighShlAmt), (zext High)
+    // HighLow = or (shl (zext High), ZextHighShlAmt), (zext Low)
+    // ->
+    // HighLow = fshl LowHigh, LowHigh, ZextHighShlAmt
+    if (!isa<ZExtInst>(Or1))
+      std::swap(Or0, Or1);
+
+    Value *High, *ZextHigh, *Low;
+    const APInt *ZextHighShlAmt;
+    if (!match(Or0,
+               m_OneUse(m_Shl(m_Value(ZextHigh), m_APInt(ZextHighShlAmt)))))
+      return nullptr;
 
-    if (match(L, m_ZExt(m_And(m_Value(X), m_SpecificInt(Mask)))) &&
-        match(R, m_ZExt(m_And(m_Neg(m_Specific(X)), m_SpecificInt(Mask)))))
-      return L;
+    if (!match(Or1, m_ZExt(m_Value(Low))) ||
+        !match(ZextHigh, m_ZExt(m_Value(High))))
+      return nullptr;
 
-    return nullptr;
-  };
+    unsigned HighSize = High->getType()->getScalarSizeInBits();
+    unsigned LowSize = Low->getType()->getScalarSizeInBits();
+    if (*ZextHighShlAmt != LowSize || HighSize + LowSize != Width)
+      return nullptr;
 
-  Value *ShAmt = matchShiftAmount(ShAmt0, ShAmt1, Width);
-  bool IsFshl = true; // Sub on LSHR.
-  if (!ShAmt) {
-    ShAmt = matchShiftAmount(ShAmt1, ShAmt0, Width);
-    IsFshl = false; // Sub on SHL.
+    for (User *U : ZextHigh->users()) {
+      Value *X, *Y;
+      if (!match(U, m_Or(m_Value(X), m_Value(Y))))
+        continue;
+
+      if (!isa<ZExtInst>(Y))
+        std::swap(X, Y);
+
+      if (match(X, m_Shl(m_Specific(Or1), m_SpecificInt(HighSize))) &&
+          match(Y, m_Specific(ZextHigh)) && DT.dominates(U, &Or)) {
+        FShiftArgs = {U, U, ConstantInt::get(Or0->getType(), *ZextHighShlAmt)};
+        break;
+      }
+    }
   }
-  if (!ShAmt)
+
+  if (FShiftArgs.empty())
     return nullptr;
 
   Intrinsic::ID IID = IsFshl ? Intrinsic::fshl : Intrinsic::fshr;
   Function *F = Intrinsic::getDeclaration(Or.getModule(), IID, Or.getType());
-  return CallInst::Create(F, {ShVal0, ShVal1, ShAmt});
+  return CallInst::Create(F, FShiftArgs);
 }
 
 /// Attempt to combine or(zext(x),shl(zext(y),bw/2) concat packing patterns.
@@ -3319,7 +3375,7 @@ Instruction *InstCombinerImpl::visitOr(BinaryOperator &I) {
                                                   /*MatchBitReversals*/ true))
     return BitOp;
 
-  if (Instruction *Funnel = matchFunnelShift(I, *this))
+  if (Instruction *Funnel = matchFunnelShift(I, *this, DT))
     return Funnel;
 
   if (Instruction *Concat = matchOrConcat(I, Builder))
diff --git a/llvm/test/Transforms/InstCombine/funnel.ll b/llvm/test/Transforms/InstCombine/funnel.ll
index 60ce49a1635623f..c5bd1aa7b4351bc 100644
--- a/llvm/test/Transforms/InstCombine/funnel.ll
+++ b/llvm/test/Transforms/InstCombine/funnel.ll
@@ -354,6 +354,48 @@ define <2 x i64> @fshl_select_vector(<2 x i64> %x, <2 x i64> %y, <2 x i64> %sham
   ret <2 x i64> %r
 }
 
+; Convert 'or concat' to fshl if opposite 'or concat' exists.
+
+define i32 @fshl_concat(i8 %x, i24 %y, ptr %addr) {
+; CHECK-LABEL: @fshl_concat(
+; CHECK-NEXT:    [[ZEXT_X:%.*]] = zext i8 [[X:%.*]] to i32
+; CHECK-NEXT:    [[SLX:%.*]] = shl nuw i32 [[ZEXT_X]], 24
+; CHECK-NEXT:    [[ZEXT_Y:%.*]] = zext i24 [[Y:%.*]] to i32
+; CHECK-NEXT:    [[XY:%.*]] = or i32 [[SLX]], [[ZEXT_Y]]
+; CHECK-NEXT:    store i32 [[XY]], ptr [[ADDR:%.*]], align 4
+; CHECK-NEXT:    [[YX:%.*]] = call i32 @llvm.fshl.i32(i32 [[XY]], i32 [[XY]], i32 8)
+; CHECK-NEXT:    ret i32 [[YX]]
+;
+  %zext.x = zext i8 %x to i32
+  %slx = shl nuw i32 %zext.x, 24
+  %zext.y = zext i24 %y to i32
+  %xy = or i32 %zext.y, %slx
+  store i32 %xy, ptr %addr, align 4
+  %sly = shl nuw i32 %zext.y, 8
+  %yx = or i32 %zext.x, %sly
+  ret i32 %yx
+}
+
+define <2 x i32> @fshl_concat_vector(<2 x i8> %x, <2 x i24> %y, ptr %addr) {
+; CHECK-LABEL: @fshl_concat_vector(
+; CHECK-NEXT:    [[ZEXT_X:%.*]] = zext <2 x i8> [[X:%.*]] to <2 x i32>
+; CHECK-NEXT:    [[SLX:%.*]] = shl nuw <2 x i32> [[ZEXT_X]], <i32 24, i32 24>
+; CHECK-NEXT:    [[ZEXT_Y:%.*]] = zext <2 x i24> [[Y:%.*]] to <2 x i32>
+; CHECK-NEXT:    [[XY:%.*]] = or <2 x i32> [[SLX]], [[ZEXT_Y]]
+; CHECK-NEXT:    store <2 x i32> [[XY]], ptr [[ADDR:%.*]], align 4
+; CHECK-NEXT:    [[YX:%.*]] = call <2 x i32> @llvm.fshl.v2i32(<2 x i32> [[XY]], <2 x i32> [[XY]], <2 x i32> <i32 8, i32 8>)
+; CHECK-NEXT:    ret <2 x i32> [[YX]]
+;
+  %zext.x = zext <2 x i8> %x to <2 x i32>
+  %slx = shl nuw <2 x i32> %zext.x, <i32 24, i32 24>
+  %zext.y = zext <2 x i24> %y to <2 x i32>
+  %xy = or <2 x i32> %slx, %zext.y
+  store <2 x i32> %xy, ptr %addr, align 4
+  %sly = shl nuw <2 x i32> %zext.y, <i32 8, i32 8>
+  %yx = or <2 x i32> %sly, %zext.x
+  ret <2 x i32> %yx
+}
+
 ; Negative test - an oversized shift in the narrow type would produce the wrong value.
 
 define i8 @unmasked_shlop_unmasked_shift_amount(i32 %x, i32 %y, i32 %shamt) {

@HaohaiWen
Contributor Author

Any comments?

@HaohaiWen
Contributor Author

This PR is based on #68474

@goldsteinn
Contributor

This PR is based on #68474

I commented on the NFC patch. Get that in, then rebase this.

return nullptr;

if (!match(Or1, m_ZExt(m_Value(Low))) ||
!match(ZextHigh, m_ZExt(m_Value(High))))
Contributor

Do you actually need the shifted value to be zext? I.e., if you have (shl (zext i24 to i32), 8), that is bitwise equivalent to (shl i32, 8).

Contributor Author

If we only match (shl i32, 8), we can't guarantee it's a reversed concat, since we don't know whether the most significant 8 bits of the i32 are zero.

%zext.x = zext i8 %x to i32 
%slx = shl nuw i32 %zext.x, 24 
%zext.y = zext i24 %y to i32 
%xy = or i32 %zext.y, %slx        #[x[7:0], y[23:0]]
%sly = shl nuw i32 %zext.y, 8 
%yx = or i32 %zext.x, %sly        #[y[23:0], x[7:0]]

If not match zext:

%zext.x = zext i8 %x to i32 
%slx = shl nuw i32 %zext.x, 24 
%xy = or i32 %y, %slx              #[unknown, y[23:0]]             y[31:24] may not be zero.
%sly = shl nuw i32 %y, 8           #[y[23:0], 0,0,0,0,0,0,0,0]
%yx = or i32 %zext.x, %sly         #[y[23:0], x[7:0]]

Contributor

Ah, I see, although we could handle a masking 'and'. Really it's just a knownbits check though.

Member

I think we don't need to check zext. An example without zext: https://alive2.llvm.org/ce/z/MytePb
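A hedged illustration of the point (an editor's sketch, not necessarily the example behind the alive2 link): if the low part comes from a mask rather than a zext, known-bits analysis can still prove its top bits are zero:

%y.masked = and i32 %y, 16777215        ; 0x00FFFFFF, so bits [31:24] are known zero
%slx = shl nuw i32 %zext.x, 24
%xy = or i32 %slx, %y.masked            ; still a clean [x, y[23:0]] concat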

unsigned LowSize = Low->getType()->getScalarSizeInBits();
// Make sure High does not overlap with Low and most significant bits of
// High aren't shifted out.
if (ZextHighShlAmt->ult(LowSize) || ZextHighShlAmt->ugt(Width - HighSize))
Contributor

Think you are missing a negative test for this.

Contributor Author

Done
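As a worked illustration of these bounds (an editor's example, not taken from the thread): with Width = 32, High an i8 (HighSize = 8) and Low an i24 (LowSize = 24), the shift applied to zext(High) must be at least LowSize = 24 so High does not land on top of Low, and at most Width - HighSize = 24 so none of High's bits are shifted out, leaving 24 as the only accepted amount. With a narrower Low such as i16, any amount from 16 to 24 would pass this check, and the corresponding check on the other concat (quoted below) plus the requirement that the two shift amounts sum to Width pins down the exact pair.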

// Make sure Low does not overlap with High and most significant bits of
// Low aren't shifted out and we can rotate shift LowHigh to HighLow.
if (ZextLowShlAmt->ult(HighSize) || ZextLowShlAmt->ugt(Width - LowSize) ||
*ZextLowShlAmt + *ZextHighShlAmt != Width)
Contributor

Think you are missing a negative test for this (also for non-dominating).

Contributor Author

In fact, the non-dominating case is already tested.
We check for:
yx = fshl xy, xy,
When InstCombine handles the first 'or' (combining for xy), it meets every check except dominance.
If we didn't check dominance, it would be converted to xy = fshl yx, yx, which is absolutely wrong.
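A small IR sketch of why the dominance check matters (an editor's illustration, reusing the names from the fshl_concat test):

%xy = or i32 %slx, %zext.y       ; first concat, defined earlier
%sly = shl nuw i32 %zext.y, 8
%yx = or i32 %zext.x, %sly       ; second concat
; valid:   %yx can become call i32 @llvm.fshl.i32(i32 %xy, i32 %xy, i32 8), because %xy dominates %yx
; invalid: rewriting %xy as an fshl of %yx would reference %yx before it is defined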

@goldsteinn
Contributor

See most recent comments about missing tests. That being said code looks functionally correct to me.
Still not 100% sure this is a desirable change. Will defer to @nikic about that.

@nikic
Contributor

nikic commented Oct 24, 2023

See most recent comments about missing tests. That being said code looks functionally correct to me. Still not 100% sure this is a desirable change. Will defer to @nikic about that.

I'm also skeptical about accepting this optimization. Looking at the motivating case in #68502 (comment), this seems like a bad approach to the problem: It means that in order to fold the pattern to bitreverse(%xy), we must just so happen to have the right %xy lying around in the IR, even though it doesn't have any relation to the main pattern (it's not used inside it, just injected via an extra use). It sounds to me like the better way to handle that case would be to support matching a variant of plain bitreverse in the form of bitreverse(rot(%yx)).

Replacing the rot(%yx) by %xy is then an extra optimization opportunity, but it's no longer a precondition to performing the transform.
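Roughly, the IR emitted under that approach would look something like the following (an editor's sketch; the exact rotate amount and operand order depend on the matched pattern):

%rot = call i32 @llvm.fshl.i32(i32 %yx, i32 %yx, i32 16)
%rev = call i32 @llvm.bitreverse.i32(i32 %rot)
; if an %xy equal to the rotate of %yx already exists, a later fold can replace %rot with it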

@HaohaiWen
Contributor Author

See most recent comments about missing tests. That being said code looks functionally correct to me. Still not 100% sure this is a desirable change. Will defer to @nikic about that.

I'm also skeptical about accepting this optimization. Looking at the motivating case in #68502 (comment), this seems like a bad approach to the problem: It means that in order to fold the pattern to bitreverse(%xy), we must just so happen to have the right %xy lying around in the IR, even though it doesn't have any relation to the main pattern (it's not used inside it, just injected via an extra use). It sounds to me like the better way to handle that case would be to support matching a variant of plain bitreverse in the form of bitreverse(rot(%yx)).

Replacing the rot(%yx) by %xy is then an extra optimization opportunity, but it's no longer a precondition to performing the transform.

In fact, this is a real case.
Currently, matchBSwapOrBitReverse has no chance to match the bitreverse pattern if we don't do this fshl optimization. Even if we teach it to recognize this pattern, that requires injecting a new rot(%yx) and then looking for %xy, and it still requires walking the user chain.
Even if there is no bitreverse pattern in the rest of the IR, it's still beneficial to do this fshl optimization: it removes an extra shl.

@HaohaiWen
Contributor Author

Any more comments? I'd like to merge it if there are no objections.

@goldsteinn
Contributor

Any more comments? I'd like to merge it if there are no objections.

I think there are outstanding objections, or at least blocking concerns from nikic.

@HaohaiWen
Contributor Author

@nikic, any more comments?

@HaohaiWen
Contributor Author

Any more comments? I'd like to merge it if there are no objections.

I think there are outstanding objections, or at least blocking concerns from nikic.

gentle ping @nikic

@HaohaiWen
Contributor Author

I'd like to merge it if there are no more comments.

@williamweixiao
Contributor

LGTM.

@nikic
Contributor

nikic commented Nov 11, 2023

Your original example does not verify: https://alive2.llvm.org/ce/z/Bkd89Z Can you please provide a correct example of what you're trying to do?

@HaohaiWen
Contributor Author

HaohaiWen commented Nov 12, 2023

Your original example does not verify: https://alive2.llvm.org/ce/z/Bkd89Z Can you please provide a correct example of what you're trying to do?

As I said in previous comments:

This requires first optimizing

%sly = shl nuw i32 %zext.y, 8
%yx = or i32 %zext.x, %sly

to

fshl %xy, %xy, 8

and then teaching InstCombine to optimize the rest to bitreverse.

This PR only teaches InstCombine to optimize shl+or to fshl.
Then I'll add another patch to teach InstCombine (matchBSwapOrBitReverse) to convert the example you mentioned to bitreverse (it relies on this PR). The original example will be a test in that patch.

@nikic
Contributor

nikic commented Nov 12, 2023

Yes, I understand that this transform is only a step towards handling the full pattern. I'm asking for a complete, working example of the original motivating case. The snippets posted in #68502 (comment) do not appear to be correct, or I failed to assemble them correctly. Please provide complete src and tgt functions that verify with alive2.

@HaohaiWen
Contributor Author

HaohaiWen commented Nov 12, 2023

Yes, I understand that this transform is only a step towards handling the full pattern. I'm asking for a complete, working example of the original motivating case. The snippets posted in #68502 (comment) do not appear to be correct, or I failed to assemble them correctly. Please provide complete src and tgt functions that verify with alive2.

Oh... sorry, I made a wrong IR example.
The real case looks like this: https://alive2.llvm.org/ce/z/-DXnJc
Both %x and %y are i16. The first or/fshl swaps the two halves of the i32; the rest swaps bytes, 4-bit groups, and 2-bit groups.
I extended the fshl transformation to apply to both symmetric and asymmetric combinations.
If %x and %y are not the same width, the first or/fshl is not a swap of equal halves, so the whole thing can't be converted to bitreverse.
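For orientation, one of those later stages looks roughly like this in IR (an editor's generic sketch of a byte-swap-within-halves stage, not the exact IR behind the alive2 link; the 4-bit and 2-bit stages have the same shape with masks 0x0F0F0F0F/0xF0F0F0F0 and 0x33333333/0xCCCCCCCC):

%hi = lshr i32 %v, 8
%hi.m = and i32 %hi, 16711935     ; 0x00FF00FF
%lo = shl i32 %v, 8
%lo.m = and i32 %lo, -16711936    ; 0xFF00FF00
%bytes = or i32 %hi.m, %lo.m      ; adjacent bytes swapped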

@nikic
Contributor

nikic commented Nov 12, 2023

Thanks for the updated example!

To explain what I meant in first comment using this example: We would perform the transform https://alive2.llvm.org/ce/z/nllcB_, which does not depend at all on how %yx is constructed, and whether there is any way to form the fshl separately. If the %yx is appropriately constructed, the fshl can be removed (https://alive2.llvm.org/ce/z/B_KOwv, another missing transform).

Is this not a viable approach? Is there a concern here that generating both fshl and bitreverse may be non-profitable for targets without bitreverse? Or maybe supporting this makes the matching too expensive?

@nikic nikic requested a review from dtcxzyw November 12, 2023 16:51
@HaohaiWen
Contributor Author

HaohaiWen commented Nov 13, 2023

Thanks for the updated example!

To explain what I meant in first comment using this example: We would perform the transform https://alive2.llvm.org/ce/z/nllcB_, which does not depend at all on how %yx is constructed, and whether there is any way to form the fshl separately. If the %yx is appropriately constructed, the fshl can be removed (https://alive2.llvm.org/ce/z/B_KOwv, another missing transform).

Is this not a viable approach? Is there a concern here that generating both fshl and bitreverse may be non-profitable for targets without bitreverse? Or maybe supporting this makes the matching too expensive?

It's definitely a feasible solution.


Solution 1:
First optimize to bitreverse, then eliminate the redundant fshl: https://alive2.llvm.org/ce/z/g_gWf3
This requires:
a) Teach collectBitParts to not only search until an unknown opcode, but also try to use the instruction itself as the root.
b) Teach recognizeBSwapOrBitReverseIdiom to recognize the bit pattern [n/2-1, ..., 1, 0, n-1, n-2, ..., n/2], then insert bitreverse and fshl.
c) Teach InstCombine to remove the redundant fshl if the opposite concat exists. This requires scanning def-user chains.

Advantage:
1) Even if we can't eliminate the fshl, we can still optimize a bunch of IR to fshl+bitreverse. I don't know whether that is profitable for most targets.


Solution 2:
First optimize the 'or' to fshl, then optimize to bitreverse: https://alive2.llvm.org/ce/z/WbzJVo
This requires:
a) What we did in this PR. This requires scanning def-user chains.
b) Same as step a) in Solution 1.

Advantage:
1) Can optimize more opposite-concat patterns to fshl. This is beneficial for targets with a rotate instruction (e.g. rol on x86).
2) Easier to implement; does not require step b) of Solution 1.


Both solutions require scanning def-user chains; I don't think this is an issue.
Both solutions can handle my cases, and Solution 2 is easier to implement. Any concerns about this PR?
I think b) of Solution 1 can be implemented in the future if we want the advantages of both solutions. InstCombine will always first try to match fshl, then bitreverse, so with Solution 2 plus b) of Solution 1 we don't need to implement c) of Solution 1 at all.

Ref for bitreverse optimization:

Instruction *InstCombinerImpl::matchBSwapOrBitReverse(Instruction &I,

@HaohaiWen
Contributor Author

HaohaiWen commented Nov 16, 2023

Any comments?

@HaohaiWen
Contributor Author

I'd like to merge it. Please let me know if you have more concerns.

@HaohaiWen HaohaiWen merged commit 95d584c into llvm:main Nov 20, 2023
3 checks passed
@HaohaiWen HaohaiWen deleted the concat branch November 20, 2023 05:12
@goldsteinn
Contributor

For my money this was merged prematurely. There are still outstanding concerns about whether this transform is desirable, as well as an outstanding comment about the implementation itself.
I'm fairly agnostic about this code getting in, but I think it should be reverted until some degree of consensus is reached.
It's painful to have comments/a PR languish without replies, but I don't think the answer is to just push.

sr-tream pushed a commit to sr-tream/llvm-project that referenced this pull request Nov 20, 2023
[InstCombine] Convert or concat to fshl if opposite or concat exists (llvm#68502)

If there are two 'or' instructions that concatenate variables in opposite order,
and the first 'or' dominates the second one, the second 'or' can be optimized
to an fshl that rotates the first 'or'. This can eliminate a shl and expose
more optimization opportunities for bswap/bitreverse.
zahiraam pushed a commit to zahiraam/llvm-project that referenced this pull request Nov 20, 2023
[InstCombine] Convert or concat to fshl if opposite or concat exists (llvm#68502)

If there are two 'or' instructions that concatenate variables in opposite order,
and the first 'or' dominates the second one, the second 'or' can be optimized
to an fshl that rotates the first 'or'. This can eliminate a shl and expose
more optimization opportunities for bswap/bitreverse.
@HaohaiWen
Contributor Author

For my money this was merged prematurely. There are still outstanding concerns about whether this transform is desirable, as well as an outstanding comment about the implementation itself. I'm fairly agnostic about this code getting in, but I think it should be reverted until some degree of consensus is reached. It's painful to have comments/a PR languish without replies, but I don't think the answer is to just push.

Thanks for the kind reminder. I think I've replied to the concerns and waited a long time... I thought there should be no problem, so I merged it...
Hi @nikic, any comments on the reply to your concerns? #68502 (comment)
If you have concerns that this patch is bad for some cases, or if you have more suggestions, please let us know and we can improve it.
