
[InstCombine] Do not simplify lshr/shl arg if it is part of fshl rotate pattern. #66115

Closed

@quic-eikansh
Contributor

When the first two arguments of an fshl/fshr are the same value, the intrinsic gets lowered to a target-specific rotate instruction. However, based on its uses, one of the arguments can get simplified, leaving two different arguments that perform an equivalent operation but no longer match the rotate pattern.

This patch prevents the simplification of the arguments of lshr/shl if they are part of an fshl rotate pattern.
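To illustrate (a minimal hypothetical example, not one of the tests added by this patch): both shifts below read the same value %v, so the final or is a rotate left by 8, i.e. fshl(%v, %v, 8).

define i32 @rot8(i32 %x) {
  %v   = and i32 %x, 2147483647   ; clear bit 31
  %shl = shl i32 %v, 8            ; demands only bits 0..23 of %v
  %shr = lshr i32 %v, 24          ; demands bits 24..31 of %v
  %r   = or i32 %shl, %shr        ; rotate left by 8 == fshl(%v, %v, 8)
  ret i32 %r
}

Because the shl never demands bit 31, demanded-bits simplification may rewrite that one shift to read %x directly, while the lshr, which does demand bit 31, keeps reading %v. The result of the or is unchanged, but the two shifts now read different values and the rotate fold no longer recognizes the pattern.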

@quic-eikansh
Contributor Author

More context for this patch is in issue #66185.

@llvmbot
Collaborator

llvmbot commented Sep 14, 2023

@llvm/pr-subscribers-llvm-transforms

Changes

When the first two arguments of an fshl/fshr are the same value, the intrinsic gets lowered to a target-specific rotate instruction. However, based on its uses, one of the arguments can get simplified, leaving two different arguments that perform an equivalent operation but no longer match the rotate pattern.

This patch prevents the simplification of the arguments of lshr/shl if they are part of an fshl rotate pattern.

Full diff: https://github.com/llvm/llvm-project/pull/66115.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp (+33)
  • (modified) llvm/test/Transforms/InstCombine/fsh.ll (+128)
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp b/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
index be005e61a8d2d89..c3adbaa44448fe4 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
@@ -611,6 +611,23 @@ Value *InstCombinerImpl::SimplifyDemandedUseBits(Value *V, APInt DemandedMask,
                                                     DemandedMask, Known))
             return R;
 
+      // Do not simplify if lshr is part of fshl rotate pattern
+      if (I->hasOneUser()) {
+        auto *Op = I->user_back();
+        if (Op->getOpcode() == BinaryOperator::Or) {
+          const APInt *LShrAmt;
+          Value *LShrVal;
+          auto *Operand =
+              Op->getOperand(0) == I ? Op->getOperand(1) : Op->getOperand(0);
+          if (match(Operand,
+                    m_OneUse(m_LShr(m_Value(LShrVal), m_APInt(LShrAmt)))))
+            if (I->getOperand(0) == LShrVal)
+              if (SA->ult(BitWidth) && LShrAmt->ult(BitWidth) &&
+                  (*SA + *LShrAmt) == BitWidth)
+                return nullptr;
+        }
+      }
+
       // TODO: If we only want bits that already match the signbit then we don't
       // need to shift.
 
@@ -669,6 +686,22 @@ Value *InstCombinerImpl::SimplifyDemandedUseBits(Value *V, APInt DemandedMask,
     if (match(I->getOperand(1), m_APInt(SA))) {
       uint64_t ShiftAmt = SA->getLimitedValue(BitWidth-1);
 
+      // Do not simplify if shl is part of fshl rotate pattern
+      if (I->hasOneUser()) {
+        auto *Op = I->user_back();
+        if (Op->getOpcode() == BinaryOperator::Or) {
+          const APInt *ShlAmt;
+          Value *ShlVal;
+          auto *Operand =
+              Op->getOperand(0) == I ? Op->getOperand(1) : Op->getOperand(0);
+          if (match(Operand, m_OneUse(m_Shl(m_Value(ShlVal), m_APInt(ShlAmt)))))
+            if (I->getOperand(0) == ShlVal)
+              if (SA->ult(BitWidth) && ShlAmt->ult(BitWidth) &&
+                  (*SA + *ShlAmt) == BitWidth)
+                return nullptr;
+        }
+      }
+
       // If we are just demanding the shifted sign bit and below, then this can
       // be treated as an ASHR in disguise.
       if (DemandedMask.countl_zero() >= ShiftAmt) {
diff --git a/llvm/test/Transforms/InstCombine/fsh.ll b/llvm/test/Transforms/InstCombine/fsh.ll
index 48bf296993f6ac2..34761623028b942 100644
--- a/llvm/test/Transforms/InstCombine/fsh.ll
+++ b/llvm/test/Transforms/InstCombine/fsh.ll
@@ -722,6 +722,134 @@ define i32 @fsh_orconst_rotate(i32 %a) {
   ret i32 %t2
 }
 
+define i32 @fsh_rotate_25(i8 %x, i32 %y) {
+; CHECK-LABEL: @fsh_rotate_25(
+; CHECK-NEXT:    [[T1:%.*]] = zext i8 [[X:%.*]] to i32
+; CHECK-NEXT:    [[OR1:%.*]] = or i32 [[T1]], [[Y:%.*]]
+; CHECK-NEXT:    [[OR2:%.*]] = call i32 @llvm.fshl.i32(i32 [[OR1]], i32 [[OR1]], i32 25)
+; CHECK-NEXT:    ret i32 [[OR2]]
+;
+
+  %t1 = zext i8 %x to i32
+  %or1 = or i32 %t1, %y
+  %shr = lshr i32 %or1, 7
+  %shl = shl i32 %or1, 25
+  %or2 = or i32 %shr, %shl
+  ret i32 %or2
+}
+
+define i32 @fsh_rotate_18(i8 %x, i32 %y) {
+; CHECK-LABEL: @fsh_rotate_18(
+; CHECK-NEXT:    [[T1:%.*]] = zext i8 [[X:%.*]] to i32
+; CHECK-NEXT:    [[OR1:%.*]] = or i32 [[T1]], [[Y:%.*]]
+; CHECK-NEXT:    [[OR2:%.*]] = call i32 @llvm.fshl.i32(i32 [[OR1]], i32 [[OR1]], i32 18)
+; CHECK-NEXT:    ret i32 [[OR2]]
+;
+
+  %t1 = zext i8 %x to i32
+  %or1 = or i32 %t1, %y
+  %shr = lshr i32 %or1, 14
+  %shl = shl i32 %or1, 18
+  %or2 = or i32 %shr, %shl
+  ret i32 %or2
+}
+
+define i32 @fsh_load_rotate_12(ptr %data) {
+; CHECK-LABEL: @fsh_load_rotate_12(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[TMP0:%.*]] = load i8, ptr [[DATA:%.*]], align 1
+; CHECK-NEXT:    [[CONV:%.*]] = zext i8 [[TMP0]] to i32
+; CHECK-NEXT:    [[SHL:%.*]] = shl nuw i32 [[CONV]], 24
+; CHECK-NEXT:    [[ARRAYIDX1:%.*]] = getelementptr inbounds i8, ptr [[DATA]], i64 1
+; CHECK-NEXT:    [[TMP1:%.*]] = load i8, ptr [[ARRAYIDX1]], align 1
+; CHECK-NEXT:    [[CONV2:%.*]] = zext i8 [[TMP1]] to i32
+; CHECK-NEXT:    [[SHL3:%.*]] = shl nuw nsw i32 [[CONV2]], 16
+; CHECK-NEXT:    [[OR:%.*]] = or i32 [[SHL3]], [[SHL]]
+; CHECK-NEXT:    [[ARRAYIDX4:%.*]] = getelementptr inbounds i8, ptr [[DATA]], i64 2
+; CHECK-NEXT:    [[TMP2:%.*]] = load i8, ptr [[ARRAYIDX4]], align 1
+; CHECK-NEXT:    [[CONV5:%.*]] = zext i8 [[TMP2]] to i32
+; CHECK-NEXT:    [[SHL6:%.*]] = shl nuw nsw i32 [[CONV5]], 8
+; CHECK-NEXT:    [[OR7:%.*]] = or i32 [[OR]], [[SHL6]]
+; CHECK-NEXT:    [[ARRAYIDX8:%.*]] = getelementptr inbounds i8, ptr [[DATA]], i64 3
+; CHECK-NEXT:    [[TMP3:%.*]] = load i8, ptr [[ARRAYIDX8]], align 1
+; CHECK-NEXT:    [[CONV9:%.*]] = zext i8 [[TMP3]] to i32
+; CHECK-NEXT:    [[OR10:%.*]] = or i32 [[OR7]], [[CONV9]]
+; CHECK-NEXT:    [[OR15:%.*]] = call i32 @llvm.fshl.i32(i32 [[OR10]], i32 [[OR10]], i32 12)
+; CHECK-NEXT:    ret i32 [[OR15]]
+;
+
+entry:
+  %0 = load i8, ptr %data
+  %conv = zext i8 %0 to i32
+  %shl = shl nuw i32 %conv, 24
+  %arrayidx1 = getelementptr inbounds i8, ptr %data, i64 1
+  %1 = load i8, ptr %arrayidx1
+  %conv2 = zext i8 %1 to i32
+  %shl3 = shl nuw nsw i32 %conv2, 16
+  %or = or i32 %shl3, %shl
+  %arrayidx4 = getelementptr inbounds i8, ptr %data, i64 2
+  %2 = load i8, ptr %arrayidx4
+  %conv5 = zext i8 %2 to i32
+  %shl6 = shl nuw nsw i32 %conv5, 8
+  %or7 = or i32 %or, %shl6
+  %arrayidx8 = getelementptr inbounds i8, ptr %data, i64 3
+  %3 = load i8, ptr %arrayidx8
+  %conv9 = zext i8 %3 to i32
+  %or10 = or i32 %or7, %conv9
+  %shr = lshr i32 %or10, 20
+  %shl7 = shl i32 %or10, 12
+  %or15 = or i32 %shr, %shl7
+  ret i32 %or15
+}
+
+define i32 @fsh_load_rotate_5(ptr %data) {
+; CHECK-LABEL: @fsh_load_rotate_5(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[TMP0:%.*]] = load i8, ptr [[DATA:%.*]], align 1
+; CHECK-NEXT:    [[CONV:%.*]] = zext i8 [[TMP0]] to i32
+; CHECK-NEXT:    [[SHL:%.*]] = shl nuw i32 [[CONV]], 24
+; CHECK-NEXT:    [[ARRAYIDX1:%.*]] = getelementptr inbounds i8, ptr [[DATA]], i64 1
+; CHECK-NEXT:    [[TMP1:%.*]] = load i8, ptr [[ARRAYIDX1]], align 1
+; CHECK-NEXT:    [[CONV2:%.*]] = zext i8 [[TMP1]] to i32
+; CHECK-NEXT:    [[SHL3:%.*]] = shl nuw nsw i32 [[CONV2]], 16
+; CHECK-NEXT:    [[OR:%.*]] = or i32 [[SHL3]], [[SHL]]
+; CHECK-NEXT:    [[ARRAYIDX4:%.*]] = getelementptr inbounds i8, ptr [[DATA]], i64 2
+; CHECK-NEXT:    [[TMP2:%.*]] = load i8, ptr [[ARRAYIDX4]], align 1
+; CHECK-NEXT:    [[CONV5:%.*]] = zext i8 [[TMP2]] to i32
+; CHECK-NEXT:    [[SHL6:%.*]] = shl nuw nsw i32 [[CONV5]], 8
+; CHECK-NEXT:    [[OR7:%.*]] = or i32 [[OR]], [[SHL6]]
+; CHECK-NEXT:    [[ARRAYIDX8:%.*]] = getelementptr inbounds i8, ptr [[DATA]], i64 3
+; CHECK-NEXT:    [[TMP3:%.*]] = load i8, ptr [[ARRAYIDX8]], align 1
+; CHECK-NEXT:    [[CONV9:%.*]] = zext i8 [[TMP3]] to i32
+; CHECK-NEXT:    [[OR10:%.*]] = or i32 [[OR7]], [[CONV9]]
+; CHECK-NEXT:    [[OR15:%.*]] = call i32 @llvm.fshl.i32(i32 [[OR10]], i32 [[OR10]], i32 5)
+; CHECK-NEXT:    ret i32 [[OR15]]
+;
+
+entry:
+  %0 = load i8, ptr %data
+  %conv = zext i8 %0 to i32
+  %shl = shl nuw i32 %conv, 24
+  %arrayidx1 = getelementptr inbounds i8, ptr %data, i64 1
+  %1 = load i8, ptr %arrayidx1
+  %conv2 = zext i8 %1 to i32
+  %shl3 = shl nuw nsw i32 %conv2, 16
+  %or = or i32 %shl3, %shl
+  %arrayidx4 = getelementptr inbounds i8, ptr %data, i64 2
+  %2 = load i8, ptr %arrayidx4
+  %conv5 = zext i8 %2 to i32
+  %shl6 = shl nuw nsw i32 %conv5, 8
+  %or7 = or i32 %or, %shl6
+  %arrayidx8 = getelementptr inbounds i8, ptr %data, i64 3
+  %3 = load i8, ptr %arrayidx8
+  %conv9 = zext i8 %3 to i32
+  %or10 = or i32 %or7, %conv9
+  %shr = lshr i32 %or10, 27
+  %shl7 = shl i32 %or10, 5
+  %or15 = or i32 %shr, %shl7
+  ret i32 %or15
+}
+
 define <2 x i31> @fshr_mask_args_same_vector(<2 x i31> %a) {
 ; CHECK-LABEL: @fshr_mask_args_same_vector(
 ; CHECK-NEXT:    [[T3:%.*]] = shl <2 x i31> [[A:%.*]], <i31 10, i31 10>

@quic-eikansh
Contributor Author

ping.

Collaborator

@momchil-velikov left a comment


LGTM, thanks.

@@ -722,6 +722,134 @@ define i32 @fsh_orconst_rotate(i32 %a) {
ret i32 %t2
}

define i32 @fsh_rotate_25(i8 %x, i32 %y) {
Collaborator


It would be nice to explicitly mark with a comment the tests where the simplification didn't trigger anyway.

Collaborator


+1 to indicate negative tests

Also, you can add the tests as a stacked first commit, so the second commit in the patch can show the test changes.

Collaborator

@RKSimon left a comment


A couple of minor points: try to use m_Specific instead of matching the operand values afterward.

auto *Operand =
    Op->getOperand(0) == I ? Op->getOperand(1) : Op->getOperand(0);
if (match(Operand,
          m_OneUse(m_LShr(m_Value(LShrVal), m_APInt(LShrAmt)))))
Collaborator


Use m_Specific(I->getOperand(0))?
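Applied to the lshr hunk above, the suggestion would look roughly like this (a sketch reusing the patch's surrounding variables, not the final committed code):

const APInt *LShrAmt;
auto *Operand =
    Op->getOperand(0) == I ? Op->getOperand(1) : Op->getOperand(0);
// Bind the sibling shift's operand directly to I's own shifted value;
// the separate LShrVal variable and the explicit
// I->getOperand(0) == LShrVal check are then unnecessary.
if (match(Operand, m_OneUse(m_LShr(m_Specific(I->getOperand(0)),
                                   m_APInt(LShrAmt)))))
  if (SA->ult(BitWidth) && LShrAmt->ult(BitWidth) &&
      (*SA + *LShrAmt) == BitWidth)
    return nullptr;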

Value *ShlVal;
auto *Operand =
    Op->getOperand(0) == I ? Op->getOperand(1) : Op->getOperand(0);
if (match(Operand, m_OneUse(m_Shl(m_Value(ShlVal), m_APInt(ShlAmt)))))
Collaborator


Use m_Specific(I->getOperand(0))?

@nikic requested review from goldsteinn and removed request for rotateright, September 22, 2023 13:29
          (*SA + *ShlAmt) == BitWidth)
        return nullptr;
    }
  }
Contributor


Somewhat opposed to this, because it's easy for the logic that detects "should we be able to do X" to de-sync from the logic that actually does X.

I would prefer adding a helper to InstCombine that can be used both by the rotate-creation folds and here, so the two stay in sync.

Something along the following lines:

std::optional<std::tuple<Intrinsic::ID, Value *, Value *>>
ConvertShlOrLShrToFShlOrFShr(BinaryOperator *I) {
  Value *X, *ShAmt1, *ShAmt2;
  unsigned BitWidth = I->getType()->getScalarSizeInBits();
  if (!match(I, m_c_Or(m_OneUse(m_Shl(m_Value(X), m_Value(ShAmt1))),
                       m_OneUse(m_LShr(m_Deferred(X), m_Value(ShAmt2))))))
    return std::nullopt;

  // Constant shift amounts: a rotate if the amounts sum to the bit width.
  const APInt *C1, *C2;
  if (match(ShAmt1, m_APInt(C1)) && match(ShAmt2, m_APInt(C2))) {
    if (C1->uge(BitWidth) || C2->uge(BitWidth) || (*C1 + *C2) != BitWidth)
      return std::nullopt;
    return std::make_tuple(Intrinsic::fshl, X, ShAmt1);
  }

  // Variable shift amounts: Amt1 == (-Amt2) & (BitWidth - 1).
  auto ShAmtsMatchForRotate = [BitWidth](Value *Amt1, Value *Amt2) {
    return match(Amt1,
                 m_And(m_Neg(m_Specific(Amt2)), m_SpecificInt(BitWidth - 1)));
  };

  if (ShAmtsMatchForRotate(ShAmt1, ShAmt2))
    return std::make_tuple(Intrinsic::fshl, X, ShAmt1);
  if (ShAmtsMatchForRotate(ShAmt2, ShAmt1))
    return std::make_tuple(Intrinsic::fshr, X, ShAmt2);
  return std::nullopt;
}

If here you do:

if (I->hasOneUser())
  if (auto *Op = dyn_cast<BinaryOperator>(I->user_back()))
    if (ConvertShlOrLShrToFShlOrFShr(Op).has_value())
      return nullptr;

And at the rotate fold, you switch to using ConvertShlOrLShrToFShlOrFShr to actually create the rotates.
Then, as pattern detection improves, both will stay aligned.

@RKSimon
Collaborator

RKSimon commented Mar 7, 2024

@quic-eikansh reverse-ping?

@nikic
Contributor

nikic commented Mar 7, 2024

This has landed in #73441.

@nikic closed this Mar 7, 2024