
[InstCombine] Improve shamt range calculation #72646

Closed
wants to merge 1 commit

Conversation

dtcxzyw
Member

dtcxzyw commented Nov 17, 2023

This patch improves the shift-amount range calculation by taking ConstantRange information into account, rather than relying on KnownBits alone.

Related patch: 2dd52b4

This missed optimization was discovered with the help of AliveToolkit/alive2#962.
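
To illustrate, here is the kind of case this catches, with the input IR reconstructed from the updated and-narrow.ll CHECK lines (so treat it as an approximation of the actual test):

```llvm
define <2 x i16> @zext_shl_vec_overshift(<2 x i8> %x) {
  %z = zext <2 x i8> %x to <2 x i16>
  %b = shl <2 x i16> %z, <i16 8, i16 2>
  %r = and <2 x i16> %b, %z
  ret <2 x i16> %r
}
; KnownBits intersects the lanes of <i16 8, i16 2>, leaving bits 1 and 3
; unknown, so its maximum shift amount is 0b1010 = 10 -- more than the 8
; leading zeros %z gets from the zext, so nuw cannot be proven. The
; per-element ConstantRange union is [2, 8], whose maximum of 8 fits
; exactly, so the shl can be marked nuw.
```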

@llvmbot
Collaborator

llvmbot commented Nov 17, 2023

@llvm/pr-subscribers-llvm-analysis

@llvm/pr-subscribers-llvm-transforms

Author: Yingwei Zheng (dtcxzyw)

Changes

This patch improves the shift-amount range calculation by taking ConstantRange information into account, rather than relying on KnownBits alone.

Related patch: 2dd52b4

This missed optimization was discovered with the help of AliveToolkit/alive2#962.


Full diff: https://github.com/llvm/llvm-project/pull/72646.diff

5 Files Affected:

  • (modified) llvm/include/llvm/Analysis/ValueTracking.h (+5)
  • (modified) llvm/lib/Analysis/ValueTracking.cpp (+4-4)
  • (modified) llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp (+4-7)
  • (modified) llvm/test/Transforms/InstCombine/and-narrow.ll (+1-1)
  • (modified) llvm/test/Transforms/InstCombine/shift-shift.ll (+2-2)
diff --git a/llvm/include/llvm/Analysis/ValueTracking.h b/llvm/include/llvm/Analysis/ValueTracking.h
index 01eb8532d1f56d2..ba7bfcab03482f6 100644
--- a/llvm/include/llvm/Analysis/ValueTracking.h
+++ b/llvm/include/llvm/Analysis/ValueTracking.h
@@ -888,6 +888,11 @@ ConstantRange computeConstantRange(const Value *V, bool ForSigned,
                                    const DominatorTree *DT = nullptr,
                                    unsigned Depth = 0);
 
+/// Combine constant ranges from computeConstantRange() and computeKnownBits().
+ConstantRange
+computeConstantRangeIncludingKnownBits(const WithCache<const Value *> &V,
+                                       bool ForSigned, const SimplifyQuery &SQ);
+
 /// Return true if this function can prove that the instruction I will
 /// always transfer execution to one of its successors (including the next
 /// instruction that follows within a basic block). E.g. this is not
diff --git a/llvm/lib/Analysis/ValueTracking.cpp b/llvm/lib/Analysis/ValueTracking.cpp
index 5f5d7e07cac1e46..b166806f0ec0e2a 100644
--- a/llvm/lib/Analysis/ValueTracking.cpp
+++ b/llvm/lib/Analysis/ValueTracking.cpp
@@ -6292,10 +6292,10 @@ static OverflowResult mapOverflowResult(ConstantRange::OverflowResult OR) {
 }
 
 /// Combine constant ranges from computeConstantRange() and computeKnownBits().
-static ConstantRange
-computeConstantRangeIncludingKnownBits(const WithCache<const Value *> &V,
-                                       bool ForSigned,
-                                       const SimplifyQuery &SQ) {
+ConstantRange
+llvm::computeConstantRangeIncludingKnownBits(const WithCache<const Value *> &V,
+                                             bool ForSigned,
+                                             const SimplifyQuery &SQ) {
   ConstantRange CR1 =
       ConstantRange::fromKnownBits(V.getKnownBits(SQ), ForSigned);
   ConstantRange CR2 = computeConstantRange(V, ForSigned, SQ.IIQ.UseInstrInfo);
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp b/llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
index 9d4a2cc08cca30c..419d2109b0b7680 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
@@ -12,6 +12,7 @@
 
 #include "InstCombineInternal.h"
 #include "llvm/Analysis/InstructionSimplify.h"
+#include "llvm/IR/ConstantRange.h"
 #include "llvm/IR/IntrinsicInst.h"
 #include "llvm/IR/PatternMatch.h"
 #include "llvm/Transforms/InstCombine/InstCombiner.h"
@@ -962,14 +963,10 @@ static bool setShiftFlags(BinaryOperator &I, const SimplifyQuery &Q) {
   }
 
   // Compute what we know about shift count.
-  KnownBits KnownCnt =
-      computeKnownBits(I.getOperand(1), Q.DL, /*Depth*/ 0, Q.AC, Q.CxtI, Q.DT);
-  // If we know nothing about shift count or its a poison shift, we won't be
-  // able to prove anything so return before computing shift amount.
-  if (KnownCnt.isUnknown())
-    return false;
+  ConstantRange KnownCnt = computeConstantRangeIncludingKnownBits(
+      I.getOperand(1), /* ForSigned */ false, Q);
   unsigned BitWidth = KnownCnt.getBitWidth();
-  APInt MaxCnt = KnownCnt.getMaxValue();
+  APInt MaxCnt = KnownCnt.getUnsignedMax();
   if (MaxCnt.uge(BitWidth))
     return false;
 
diff --git a/llvm/test/Transforms/InstCombine/and-narrow.ll b/llvm/test/Transforms/InstCombine/and-narrow.ll
index c8c720f5fbc5534..0cc74008144b738 100644
--- a/llvm/test/Transforms/InstCombine/and-narrow.ll
+++ b/llvm/test/Transforms/InstCombine/and-narrow.ll
@@ -190,7 +190,7 @@ define <2 x i16> @zext_lshr_vec_undef(<2 x i8> %x) {
 define <2 x i16> @zext_shl_vec_overshift(<2 x i8> %x) {
 ; CHECK-LABEL: @zext_shl_vec_overshift(
 ; CHECK-NEXT:    [[Z:%.*]] = zext <2 x i8> [[X:%.*]] to <2 x i16>
-; CHECK-NEXT:    [[B:%.*]] = shl <2 x i16> [[Z]], <i16 8, i16 2>
+; CHECK-NEXT:    [[B:%.*]] = shl nuw <2 x i16> [[Z]], <i16 8, i16 2>
 ; CHECK-NEXT:    [[R:%.*]] = and <2 x i16> [[B]], [[Z]]
 ; CHECK-NEXT:    ret <2 x i16> [[R]]
 ;
diff --git a/llvm/test/Transforms/InstCombine/shift-shift.ll b/llvm/test/Transforms/InstCombine/shift-shift.ll
index 4270a2c2ba205ed..ad25160d10c8a64 100644
--- a/llvm/test/Transforms/InstCombine/shift-shift.ll
+++ b/llvm/test/Transforms/InstCombine/shift-shift.ll
@@ -531,7 +531,7 @@ define <2 x i6> @shl_lshr_demand5_undef_right(<2 x i8> %x) {
 define <2 x i6> @shl_lshr_demand5_nonuniform_vec_left(<2 x i8> %x) {
 ; CHECK-LABEL: @shl_lshr_demand5_nonuniform_vec_left(
 ; CHECK-NEXT:    [[SHL:%.*]] = shl <2 x i8> <i8 -108, i8 -108>, [[X:%.*]]
-; CHECK-NEXT:    [[LSHR:%.*]] = lshr <2 x i8> [[SHL]], <i8 1, i8 2>
+; CHECK-NEXT:    [[LSHR:%.*]] = lshr exact <2 x i8> [[SHL]], <i8 1, i8 2>
 ; CHECK-NEXT:    [[R:%.*]] = trunc <2 x i8> [[LSHR]] to <2 x i6>
 ; CHECK-NEXT:    ret <2 x i6> [[R]]
 ;
@@ -694,7 +694,7 @@ define <2 x i8> @lshr_shl_demand5_undef_right(<2 x i8> %x) {
 define <2 x i8> @lshr_shl_demand5_nonuniform_vec_left(<2 x i8> %x) {
 ; CHECK-LABEL: @lshr_shl_demand5_nonuniform_vec_left(
 ; CHECK-NEXT:    [[SHR:%.*]] = lshr <2 x i8> <i8 45, i8 45>, [[X:%.*]]
-; CHECK-NEXT:    [[SHL:%.*]] = shl <2 x i8> [[SHR]], <i8 1, i8 2>
+; CHECK-NEXT:    [[SHL:%.*]] = shl nuw <2 x i8> [[SHR]], <i8 1, i8 2>
 ; CHECK-NEXT:    [[R:%.*]] = and <2 x i8> [[SHL]], <i8 108, i8 108>
 ; CHECK-NEXT:    ret <2 x i8> [[R]]
 ;
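
The shift-shift.ll changes follow the same pattern; here is a sketch of the first one, again with the input IR reconstructed from the CHECK lines:

```llvm
define <2 x i6> @shl_lshr_demand5_nonuniform_vec_left(<2 x i8> %x) {
  %shl = shl <2 x i8> <i8 -108, i8 -108>, %x
  %lshr = lshr <2 x i8> %shl, <i8 1, i8 2>
  %r = trunc <2 x i8> %lshr to <2 x i6>
  ret <2 x i6> %r
}
; -108 is 0b10010100, which has two trailing zeros, and shl can only add
; more, so %shl has at least two trailing zeros. KnownBits bounds the shift
; amount <i8 1, i8 2> by 0b11 = 3, which is too large to prove exactness;
; the ConstantRange union [1, 2] has a maximum of 2 <= 2, so the lshr can
; be marked exact.
```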

@dtcxzyw
Member Author

dtcxzyw commented Nov 28, 2023

Ping.

@nikic
Contributor

nikic commented Nov 28, 2023

It looks like this only affects tests using non-splat vectors. I'm usually not willing to accept improvements for non-splat vector support unless they are completely free in terms of complexity. Is there any particular reason why we would want this support?

(I'm not particularly strongly opposed though. If you think it has value I'm okay with it.)

@goldsteinn
Contributor

> It looks like this only affects tests using non-splat vectors. I'm usually not willing to accept improvements for non-splat vector support unless they are completely free in terms of complexity. Is there any particular reason why we would want this support?
>
> (I'm not particularly strongly opposed though. If you think it has value I'm okay with it.)

The impl doesn't seem to go out of its way to handle non-splats; they are just a byproduct. I think you could create the same result with (add (and X, ~Pow2), Pow2 * 3), where we could get the known bits but not the range.

@nikic
Contributor

nikic commented Nov 28, 2023

>> It looks like this only affects tests using non-splat vectors. I'm usually not willing to accept improvements for non-splat vector support unless they are completely free in terms of complexity. Is there any particular reason why we would want this support?
>>
>> (I'm not particularly strongly opposed though. If you think it has value I'm okay with it.)
>
> The impl doesn't seem to go out of its way to handle non-splats; they are just a byproduct.

Right, I get that the implementation is not specific to non-splat vectors. I'm mainly commenting on what I see in the test coverage.

Usually, we use computeConstantRangeIncludingKnownBits() in places where accurate handling of non-power-of-two ranges is important, e.g. if we care that a urem by 42 has an upper bound of 41, not 63.
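
A minimal IR sketch of that urem bound (the function name is illustrative, not from the thread):

```llvm
define i8 @urem_bound(i8 %x) {
  ; ConstantRange for a urem by 42 is [0, 41]. Known bits only prove the
  ; top two bits zero (the result is < 42 < 64), giving a looser bound of 63.
  %r = urem i8 %x, 42
  ret i8 %r
}
```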

> I think you could create the same result with (add (and X, ~Pow2), Pow2 * 3), where we could get the known bits but not the range.

I didn't understand this example.

@goldsteinn
Contributor

>>> It looks like this only affects tests using non-splat vectors. I'm usually not willing to accept improvements for non-splat vector support unless they are completely free in terms of complexity. Is there any particular reason why we would want this support?
>>>
>>> (I'm not particularly strongly opposed though. If you think it has value I'm okay with it.)
>>
>> The impl doesn't seem to go out of its way to handle non-splats; they are just a byproduct.
>
> Right, I get that the implementation is not specific to non-splat vectors. I'm mainly commenting on what I see in the test coverage.

Are you opposed to the impl, or just to the test coverage we have right now?

> Usually, we use computeConstantRangeIncludingKnownBits() in places where accurate handling of non-power-of-two ranges is important, e.g. if we care that a urem by 42 has an upper bound of 41, not 63.
>
>> I think you could create the same result with (add (and X, ~Pow2), Pow2 * 3), where we could get the known bits but not the range.
>
> I didn't understand this example.

Just a case where we get info from known bits but not from the constant range.
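
A minimal sketch of such a case, picking Pow2 = 4 (so ~Pow2 is -5 and Pow2 * 3 is 12; the function name is illustrative):

```llvm
define i8 @knownbits_not_range(i8 %x) {
  %a = and i8 %x, -5   ; bit 2 of %a is known zero
  %b = add i8 %a, 12   ; 12 = 0b1100; no carry can reach bit 2, so bit 2 of %b is known one
  ret i8 %b
}
; Known bits prove bit 2 of %b is set, but the add wraps, so the unsigned
; range of %b is effectively full and a pure range analysis learns nothing.
```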

@dtcxzyw
Member Author

dtcxzyw commented Nov 29, 2023

Closed, as it failed to improve the performance of existing benchmarks.

dtcxzyw closed this Nov 29, 2023