[InstCombine] Improve shamt range calculation #72646
Conversation
@llvm/pr-subscribers-llvm-analysis @llvm/pr-subscribers-llvm-transforms

Author: Yingwei Zheng (dtcxzyw)

Changes: This patch improves the shift amount range calculation by taking ConstantRange into account. Related patch: 2dd52b4. This missed optimization was discovered with the help of AliveToolkit/alive2#962.

Full diff: https://github.com/llvm/llvm-project/pull/72646.diff — 5 files affected:
diff --git a/llvm/include/llvm/Analysis/ValueTracking.h b/llvm/include/llvm/Analysis/ValueTracking.h
index 01eb8532d1f56d2..ba7bfcab03482f6 100644
--- a/llvm/include/llvm/Analysis/ValueTracking.h
+++ b/llvm/include/llvm/Analysis/ValueTracking.h
@@ -888,6 +888,11 @@ ConstantRange computeConstantRange(const Value *V, bool ForSigned,
const DominatorTree *DT = nullptr,
unsigned Depth = 0);
+/// Combine constant ranges from computeConstantRange() and computeKnownBits().
+ConstantRange
+computeConstantRangeIncludingKnownBits(const WithCache<const Value *> &V,
+ bool ForSigned, const SimplifyQuery &SQ);
+
/// Return true if this function can prove that the instruction I will
/// always transfer execution to one of its successors (including the next
/// instruction that follows within a basic block). E.g. this is not
diff --git a/llvm/lib/Analysis/ValueTracking.cpp b/llvm/lib/Analysis/ValueTracking.cpp
index 5f5d7e07cac1e46..b166806f0ec0e2a 100644
--- a/llvm/lib/Analysis/ValueTracking.cpp
+++ b/llvm/lib/Analysis/ValueTracking.cpp
@@ -6292,10 +6292,10 @@ static OverflowResult mapOverflowResult(ConstantRange::OverflowResult OR) {
}
/// Combine constant ranges from computeConstantRange() and computeKnownBits().
-static ConstantRange
-computeConstantRangeIncludingKnownBits(const WithCache<const Value *> &V,
- bool ForSigned,
- const SimplifyQuery &SQ) {
+ConstantRange
+llvm::computeConstantRangeIncludingKnownBits(const WithCache<const Value *> &V,
+ bool ForSigned,
+ const SimplifyQuery &SQ) {
ConstantRange CR1 =
ConstantRange::fromKnownBits(V.getKnownBits(SQ), ForSigned);
ConstantRange CR2 = computeConstantRange(V, ForSigned, SQ.IIQ.UseInstrInfo);
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp b/llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
index 9d4a2cc08cca30c..419d2109b0b7680 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
@@ -12,6 +12,7 @@
#include "InstCombineInternal.h"
#include "llvm/Analysis/InstructionSimplify.h"
+#include "llvm/IR/ConstantRange.h"
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/PatternMatch.h"
#include "llvm/Transforms/InstCombine/InstCombiner.h"
@@ -962,14 +963,10 @@ static bool setShiftFlags(BinaryOperator &I, const SimplifyQuery &Q) {
}
// Compute what we know about shift count.
- KnownBits KnownCnt =
- computeKnownBits(I.getOperand(1), Q.DL, /*Depth*/ 0, Q.AC, Q.CxtI, Q.DT);
- // If we know nothing about shift count or its a poison shift, we won't be
- // able to prove anything so return before computing shift amount.
- if (KnownCnt.isUnknown())
- return false;
+ ConstantRange KnownCnt = computeConstantRangeIncludingKnownBits(
+ I.getOperand(1), /* ForSigned */ false, Q);
unsigned BitWidth = KnownCnt.getBitWidth();
- APInt MaxCnt = KnownCnt.getMaxValue();
+ APInt MaxCnt = KnownCnt.getUnsignedMax();
if (MaxCnt.uge(BitWidth))
return false;
diff --git a/llvm/test/Transforms/InstCombine/and-narrow.ll b/llvm/test/Transforms/InstCombine/and-narrow.ll
index c8c720f5fbc5534..0cc74008144b738 100644
--- a/llvm/test/Transforms/InstCombine/and-narrow.ll
+++ b/llvm/test/Transforms/InstCombine/and-narrow.ll
@@ -190,7 +190,7 @@ define <2 x i16> @zext_lshr_vec_undef(<2 x i8> %x) {
define <2 x i16> @zext_shl_vec_overshift(<2 x i8> %x) {
; CHECK-LABEL: @zext_shl_vec_overshift(
; CHECK-NEXT: [[Z:%.*]] = zext <2 x i8> [[X:%.*]] to <2 x i16>
-; CHECK-NEXT: [[B:%.*]] = shl <2 x i16> [[Z]], <i16 8, i16 2>
+; CHECK-NEXT: [[B:%.*]] = shl nuw <2 x i16> [[Z]], <i16 8, i16 2>
; CHECK-NEXT: [[R:%.*]] = and <2 x i16> [[B]], [[Z]]
; CHECK-NEXT: ret <2 x i16> [[R]]
;
diff --git a/llvm/test/Transforms/InstCombine/shift-shift.ll b/llvm/test/Transforms/InstCombine/shift-shift.ll
index 4270a2c2ba205ed..ad25160d10c8a64 100644
--- a/llvm/test/Transforms/InstCombine/shift-shift.ll
+++ b/llvm/test/Transforms/InstCombine/shift-shift.ll
@@ -531,7 +531,7 @@ define <2 x i6> @shl_lshr_demand5_undef_right(<2 x i8> %x) {
define <2 x i6> @shl_lshr_demand5_nonuniform_vec_left(<2 x i8> %x) {
; CHECK-LABEL: @shl_lshr_demand5_nonuniform_vec_left(
; CHECK-NEXT: [[SHL:%.*]] = shl <2 x i8> <i8 -108, i8 -108>, [[X:%.*]]
-; CHECK-NEXT: [[LSHR:%.*]] = lshr <2 x i8> [[SHL]], <i8 1, i8 2>
+; CHECK-NEXT: [[LSHR:%.*]] = lshr exact <2 x i8> [[SHL]], <i8 1, i8 2>
; CHECK-NEXT: [[R:%.*]] = trunc <2 x i8> [[LSHR]] to <2 x i6>
; CHECK-NEXT: ret <2 x i6> [[R]]
;
@@ -694,7 +694,7 @@ define <2 x i8> @lshr_shl_demand5_undef_right(<2 x i8> %x) {
define <2 x i8> @lshr_shl_demand5_nonuniform_vec_left(<2 x i8> %x) {
; CHECK-LABEL: @lshr_shl_demand5_nonuniform_vec_left(
; CHECK-NEXT: [[SHR:%.*]] = lshr <2 x i8> <i8 45, i8 45>, [[X:%.*]]
-; CHECK-NEXT: [[SHL:%.*]] = shl <2 x i8> [[SHR]], <i8 1, i8 2>
+; CHECK-NEXT: [[SHL:%.*]] = shl nuw <2 x i8> [[SHR]], <i8 1, i8 2>
; CHECK-NEXT: [[R:%.*]] = and <2 x i8> [[SHL]], <i8 108, i8 108>
; CHECK-NEXT: ret <2 x i8> [[R]]
;
Force-pushed from 4a4cc7c to 94dcda5.
Ping.
It looks like this only affects tests using non-splat vectors. I'm usually not willing to accept improvements for non-splat vector support unless they are completely free in terms of complexity. Is there any particular reason why we would want this support? (I'm not particularly strongly opposed though. If you think it has value, I'm okay with it.)
The implementation doesn't seem to go out of its way to handle non-splats; they are just a byproduct. Think you could create the same result with
Right, I get that the implementation is not specific to non-splat vectors. I'm mainly commenting on what I see in the test coverage. Usually, we use getConstantRangeIncludingKnownBits() in places where accurate handling for non-power-of-two ranges is important, e.g. if we care that
I didn't understand this example.
Are you opposed to the impl, or just to the test coverage we have right now?
Just a case where we will get info from knownbits but not from constantrange.
Closed as it failed to improve the performance of existing benchmarks. |