
[InstCombine] Improve shamt range calculation #72646

Closed
wants to merge 1 commit

Conversation

dtcxzyw
Member

dtcxzyw commented Nov 17, 2023

This patch improves the shift-amount range calculation by taking ConstantRange information into account, rather than relying on KnownBits alone.

Related patch: 2dd52b4

This missed optimization was discovered with the help of AliveToolkit/alive2#962.
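
To illustrate, here is the kind of case this catches, with the input IR reconstructed from the updated and-narrow.ll CHECK lines (so treat it as an approximation of the actual test):

```llvm
define <2 x i16> @zext_shl_vec_overshift(<2 x i8> %x) {
  %z = zext <2 x i8> %x to <2 x i16>
  %b = shl <2 x i16> %z, <i16 8, i16 2>
  %r = and <2 x i16> %b, %z
  ret <2 x i16> %r
}
; KnownBits intersects the lanes of <i16 8, i16 2>, leaving bits 1 and 3
; unknown, so its maximum shift amount is 0b1010 = 10 -- more than the 8
; leading zeros %z gets from the zext, so nuw cannot be proven. The
; per-element ConstantRange union is [2, 8], whose maximum of 8 fits
; exactly, so the shl can be marked nuw.
```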

@llvmbot
Collaborator

llvmbot commented Nov 17, 2023

@llvm/pr-subscribers-llvm-analysis

@llvm/pr-subscribers-llvm-transforms

Author: Yingwei Zheng (dtcxzyw)

Changes

This patch improves the shift-amount range calculation by taking ConstantRange information into account, rather than relying on KnownBits alone.

Related patch: 2dd52b4

This missed optimization was discovered with the help of AliveToolkit/alive2#962.


Full diff: https://github.com/llvm/llvm-project/pull/72646.diff

5 Files Affected:

  • (modified) llvm/include/llvm/Analysis/ValueTracking.h (+5)
  • (modified) llvm/lib/Analysis/ValueTracking.cpp (+4-4)
  • (modified) llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp (+4-7)
  • (modified) llvm/test/Transforms/InstCombine/and-narrow.ll (+1-1)
  • (modified) llvm/test/Transforms/InstCombine/shift-shift.ll (+2-2)
diff --git a/llvm/include/llvm/Analysis/ValueTracking.h b/llvm/include/llvm/Analysis/ValueTracking.h
index 01eb8532d1f56d2..ba7bfcab03482f6 100644
--- a/llvm/include/llvm/Analysis/ValueTracking.h
+++ b/llvm/include/llvm/Analysis/ValueTracking.h
@@ -888,6 +888,11 @@ ConstantRange computeConstantRange(const Value *V, bool ForSigned,
                                    const DominatorTree *DT = nullptr,
                                    unsigned Depth = 0);
 
+/// Combine constant ranges from computeConstantRange() and computeKnownBits().
+ConstantRange
+computeConstantRangeIncludingKnownBits(const WithCache<const Value *> &V,
+                                       bool ForSigned, const SimplifyQuery &SQ);
+
 /// Return true if this function can prove that the instruction I will
 /// always transfer execution to one of its successors (including the next
 /// instruction that follows within a basic block). E.g. this is not
diff --git a/llvm/lib/Analysis/ValueTracking.cpp b/llvm/lib/Analysis/ValueTracking.cpp
index 5f5d7e07cac1e46..b166806f0ec0e2a 100644
--- a/llvm/lib/Analysis/ValueTracking.cpp
+++ b/llvm/lib/Analysis/ValueTracking.cpp
@@ -6292,10 +6292,10 @@ static OverflowResult mapOverflowResult(ConstantRange::OverflowResult OR) {
 }
 
 /// Combine constant ranges from computeConstantRange() and computeKnownBits().
-static ConstantRange
-computeConstantRangeIncludingKnownBits(const WithCache<const Value *> &V,
-                                       bool ForSigned,
-                                       const SimplifyQuery &SQ) {
+ConstantRange
+llvm::computeConstantRangeIncludingKnownBits(const WithCache<const Value *> &V,
+                                             bool ForSigned,
+                                             const SimplifyQuery &SQ) {
   ConstantRange CR1 =
       ConstantRange::fromKnownBits(V.getKnownBits(SQ), ForSigned);
   ConstantRange CR2 = computeConstantRange(V, ForSigned, SQ.IIQ.UseInstrInfo);
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp b/llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
index 9d4a2cc08cca30c..419d2109b0b7680 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
@@ -12,6 +12,7 @@
 
 #include "InstCombineInternal.h"
 #include "llvm/Analysis/InstructionSimplify.h"
+#include "llvm/IR/ConstantRange.h"
 #include "llvm/IR/IntrinsicInst.h"
 #include "llvm/IR/PatternMatch.h"
 #include "llvm/Transforms/InstCombine/InstCombiner.h"
@@ -962,14 +963,10 @@ static bool setShiftFlags(BinaryOperator &I, const SimplifyQuery &Q) {
   }
 
   // Compute what we know about shift count.
-  KnownBits KnownCnt =
-      computeKnownBits(I.getOperand(1), Q.DL, /*Depth*/ 0, Q.AC, Q.CxtI, Q.DT);
-  // If we know nothing about shift count or its a poison shift, we won't be
-  // able to prove anything so return before computing shift amount.
-  if (KnownCnt.isUnknown())
-    return false;
+  ConstantRange KnownCnt = computeConstantRangeIncludingKnownBits(
+      I.getOperand(1), /* ForSigned */ false, Q);
   unsigned BitWidth = KnownCnt.getBitWidth();
-  APInt MaxCnt = KnownCnt.getMaxValue();
+  APInt MaxCnt = KnownCnt.getUnsignedMax();
   if (MaxCnt.uge(BitWidth))
     return false;
 
diff --git a/llvm/test/Transforms/InstCombine/and-narrow.ll b/llvm/test/Transforms/InstCombine/and-narrow.ll
index c8c720f5fbc5534..0cc74008144b738 100644
--- a/llvm/test/Transforms/InstCombine/and-narrow.ll
+++ b/llvm/test/Transforms/InstCombine/and-narrow.ll
@@ -190,7 +190,7 @@ define <2 x i16> @zext_lshr_vec_undef(<2 x i8> %x) {
 define <2 x i16> @zext_shl_vec_overshift(<2 x i8> %x) {
 ; CHECK-LABEL: @zext_shl_vec_overshift(
 ; CHECK-NEXT:    [[Z:%.*]] = zext <2 x i8> [[X:%.*]] to <2 x i16>
-; CHECK-NEXT:    [[B:%.*]] = shl <2 x i16> [[Z]], <i16 8, i16 2>
+; CHECK-NEXT:    [[B:%.*]] = shl nuw <2 x i16> [[Z]], <i16 8, i16 2>
 ; CHECK-NEXT:    [[R:%.*]] = and <2 x i16> [[B]], [[Z]]
 ; CHECK-NEXT:    ret <2 x i16> [[R]]
 ;
diff --git a/llvm/test/Transforms/InstCombine/shift-shift.ll b/llvm/test/Transforms/InstCombine/shift-shift.ll
index 4270a2c2ba205ed..ad25160d10c8a64 100644
--- a/llvm/test/Transforms/InstCombine/shift-shift.ll
+++ b/llvm/test/Transforms/InstCombine/shift-shift.ll
@@ -531,7 +531,7 @@ define <2 x i6> @shl_lshr_demand5_undef_right(<2 x i8> %x) {
 define <2 x i6> @shl_lshr_demand5_nonuniform_vec_left(<2 x i8> %x) {
 ; CHECK-LABEL: @shl_lshr_demand5_nonuniform_vec_left(
 ; CHECK-NEXT:    [[SHL:%.*]] = shl <2 x i8> <i8 -108, i8 -108>, [[X:%.*]]
-; CHECK-NEXT:    [[LSHR:%.*]] = lshr <2 x i8> [[SHL]], <i8 1, i8 2>
+; CHECK-NEXT:    [[LSHR:%.*]] = lshr exact <2 x i8> [[SHL]], <i8 1, i8 2>
 ; CHECK-NEXT:    [[R:%.*]] = trunc <2 x i8> [[LSHR]] to <2 x i6>
 ; CHECK-NEXT:    ret <2 x i6> [[R]]
 ;
@@ -694,7 +694,7 @@ define <2 x i8> @lshr_shl_demand5_undef_right(<2 x i8> %x) {
 define <2 x i8> @lshr_shl_demand5_nonuniform_vec_left(<2 x i8> %x) {
 ; CHECK-LABEL: @lshr_shl_demand5_nonuniform_vec_left(
 ; CHECK-NEXT:    [[SHR:%.*]] = lshr <2 x i8> <i8 45, i8 45>, [[X:%.*]]
-; CHECK-NEXT:    [[SHL:%.*]] = shl <2 x i8> [[SHR]], <i8 1, i8 2>
+; CHECK-NEXT:    [[SHL:%.*]] = shl nuw <2 x i8> [[SHR]], <i8 1, i8 2>
 ; CHECK-NEXT:    [[R:%.*]] = and <2 x i8> [[SHL]], <i8 108, i8 108>
 ; CHECK-NEXT:    ret <2 x i8> [[R]]
 ;
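
The shift-shift.ll changes follow the same pattern; here is a sketch of the first one, again with the input IR reconstructed from the CHECK lines:

```llvm
define <2 x i6> @shl_lshr_demand5_nonuniform_vec_left(<2 x i8> %x) {
  %shl = shl <2 x i8> <i8 -108, i8 -108>, %x
  %lshr = lshr <2 x i8> %shl, <i8 1, i8 2>
  %r = trunc <2 x i8> %lshr to <2 x i6>
  ret <2 x i6> %r
}
; -108 is 0b10010100, which has two trailing zeros, and shl can only add
; more, so %shl has at least two trailing zeros. KnownBits bounds the shift
; amount <i8 1, i8 2> by 0b11 = 3, which is too large to prove exactness;
; the ConstantRange union [1, 2] has a maximum of 2 <= 2, so the lshr can
; be marked exact.
```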

@dtcxzyw
Member Author

dtcxzyw commented Nov 28, 2023

Ping.

@nikic
Contributor

nikic commented Nov 28, 2023

It looks like this only affects tests using non-splat vectors. I'm usually not willing to accept improvements for non-splat vector support unless they are completely free in terms of complexity. Is there any particular reason why we would want this support?

(I'm not particularly strongly opposed though. If you think it has value I'm okay with it.)

@goldsteinn
Contributor

> It looks like this only affects tests using non-splat vectors. I'm usually not willing to accept improvements for non-splat vector support unless they are completely free in terms of complexity. Is there any particular reason why we would want this support?
>
> (I'm not particularly strongly opposed though. If you think it has value I'm okay with it.)

The impl doesn't seem to go out of its way to handle non-splats; they are just a byproduct. I think you could create the same result with (add (and X, ~Pow2), Pow2 * 3), where we could get the known bits but not the range.

@nikic
Contributor

nikic commented Nov 28, 2023

>> It looks like this only affects tests using non-splat vectors. I'm usually not willing to accept improvements for non-splat vector support unless they are completely free in terms of complexity. Is there any particular reason why we would want this support?
>>
>> (I'm not particularly strongly opposed though. If you think it has value I'm okay with it.)
>
> The impl doesn't seem to go out of its way to handle non-splats; they are just a byproduct.

Right, I get that the implementation is not specific to non-splat vectors. I'm mainly commenting on what I see in the test coverage.

Usually, we use computeConstantRangeIncludingKnownBits() in places where accurate handling of non-power-of-two ranges is important, e.g. if we care that a urem by 42 has an upper bound of 41, not 63.
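
A minimal IR sketch of that urem bound (the function name is illustrative, not from the thread):

```llvm
define i8 @urem_bound(i8 %x) {
  ; ConstantRange for a urem by 42 is [0, 41]. Known bits only prove the
  ; top two bits zero (the result is < 42 < 64), giving a looser bound of 63.
  %r = urem i8 %x, 42
  ret i8 %r
}
```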

> I think you could create the same result with (add (and X, ~Pow2), Pow2 * 3), where we could get the known bits but not the range.

I didn't understand this example.

@goldsteinn
Contributor

>>> It looks like this only affects tests using non-splat vectors. I'm usually not willing to accept improvements for non-splat vector support unless they are completely free in terms of complexity. Is there any particular reason why we would want this support?
>>>
>>> (I'm not particularly strongly opposed though. If you think it has value I'm okay with it.)
>>
>> The impl doesn't seem to go out of its way to handle non-splats; they are just a byproduct.
>
> Right, I get that the implementation is not specific to non-splat vectors. I'm mainly commenting on what I see in the test coverage.

Are you opposed to the impl, or just to the test coverage we have right now?

> Usually, we use computeConstantRangeIncludingKnownBits() in places where accurate handling of non-power-of-two ranges is important, e.g. if we care that a urem by 42 has an upper bound of 41, not 63.
>
>> I think you could create the same result with (add (and X, ~Pow2), Pow2 * 3), where we could get the known bits but not the range.
>
> I didn't understand this example.

Just a case where we get info from known bits but not from the constant range.
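
A minimal sketch of such a case, picking Pow2 = 4 (so ~Pow2 is -5 and Pow2 * 3 is 12; the function name is illustrative):

```llvm
define i8 @knownbits_not_range(i8 %x) {
  %a = and i8 %x, -5   ; bit 2 of %a is known zero
  %b = add i8 %a, 12   ; 12 = 0b1100; no carry can reach bit 2, so bit 2 of %b is known one
  ret i8 %b
}
; Known bits prove bit 2 of %b is set, but the add wraps, so the unsigned
; range of %b is effectively full and a pure range analysis learns nothing.
```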

@dtcxzyw
Member Author

dtcxzyw commented Nov 29, 2023

Closed, as it failed to improve the performance of existing benchmarks.

dtcxzyw closed this Nov 29, 2023