-
Notifications
You must be signed in to change notification settings - Fork 10.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[InstCombine] Prefer to keep power-of-2 constants when combining ashr exact and slt/ult of a constant #86111
Conversation
@llvm/pr-subscribers-llvm-transforms Author: Alex Bradbury (asb) ChangesWe have flexibility in what constant to use when combining an Alive2: <https://alive2.llvm.org/ce/z/anqsyz> and Discussion: More test changes are obviously required if landing this, but I thought I'd put this out there to get feedback on the approach as I'm not sure there's any precedent on making a heuristic choice for constants in instcombine. If it were possible to do this later in the pipeline (e.g. riscvcodegenprepare or later) I would, but as noted above once you lose the The motivating example was something like:
You get this kind of code from pointer arithmetic like With this patch this can be compiled to:
But without it you get an awkward to generate constant:
I'm arguing that producing pow2 or close to pow2 constants if you can is a good default that shouldn't hurt and may help - but perhaps there are cases I haven't considered? Full diff: https://github.com/llvm/llvm-project/pull/86111.diff 2 Files Affected:
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp b/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
index db302d7e526844..016b16ec50f8ac 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
@@ -2482,10 +2482,21 @@ Instruction *InstCombinerImpl::foldICmpShrConstant(ICmpInst &Cmp,
// those conditions rather than checking them. This is difficult because of
// undef/poison (PR34838).
if (IsAShr && Shr->hasOneUse()) {
+ if (IsExact && (Pred == CmpInst::ICMP_SLT || Pred == CmpInst::ICMP_ULT) &&
+ !C.isMinSignedValue() && (C - 1).isPowerOf2()) {
+ // When C - 1 is a power of two and the transform can be legally
+ // performed, prefer this form so the produced constant is close to a
+ // power of two.
+ // icmp slt/ult (ashr exact X, ShAmtC), C
+ // --> icmp slt/ult (C - 1) << ShAmtC) -1
+ APInt ShiftedC = (C - 1).shl(ShAmtVal) + 1;
+ return new ICmpInst(Pred, X, ConstantInt::get(ShrTy, ShiftedC));
+ }
if (IsExact || Pred == CmpInst::ICMP_SLT || Pred == CmpInst::ICMP_ULT) {
// When ShAmtC can be shifted losslessly:
// icmp PRED (ashr exact X, ShAmtC), C --> icmp PRED X, (C << ShAmtC)
// icmp slt/ult (ashr X, ShAmtC), C --> icmp slt/ult X, (C << ShAmtC)
+
APInt ShiftedC = C.shl(ShAmtVal);
if (ShiftedC.ashr(ShAmtVal) == C)
return new ICmpInst(Pred, X, ConstantInt::get(ShrTy, ShiftedC));
diff --git a/llvm/test/Transforms/InstCombine/icmp-shr-lt-gt.ll b/llvm/test/Transforms/InstCombine/icmp-shr-lt-gt.ll
index 1b8efe4351c6dc..6cdf18d73c9e90 100644
--- a/llvm/test/Transforms/InstCombine/icmp-shr-lt-gt.ll
+++ b/llvm/test/Transforms/InstCombine/icmp-shr-lt-gt.ll
@@ -3379,7 +3379,7 @@ define i1 @ashrslt_01_01_exact(i4 %x) {
define i1 @ashrslt_01_02_exact(i4 %x) {
; CHECK-LABEL: @ashrslt_01_02_exact(
-; CHECK-NEXT: [[C:%.*]] = icmp slt i4 [[X:%.*]], 4
+; CHECK-NEXT: [[C:%.*]] = icmp slt i4 [[X:%.*]], 3
; CHECK-NEXT: ret i1 [[C]]
;
%s = ashr exact i4 %x, 1
@@ -3389,7 +3389,7 @@ define i1 @ashrslt_01_02_exact(i4 %x) {
define i1 @ashrslt_01_03_exact(i4 %x) {
; CHECK-LABEL: @ashrslt_01_03_exact(
-; CHECK-NEXT: [[C:%.*]] = icmp slt i4 [[X:%.*]], 6
+; CHECK-NEXT: [[C:%.*]] = icmp slt i4 [[X:%.*]], 5
; CHECK-NEXT: ret i1 [[C]]
;
%s = ashr exact i4 %x, 1
|
a0d036b
to
7935c85
Compare
The upcoming patch adds logic to prefer to use constants close to power-of-two in these ashr exact + slt/ult pattersn when it has a choice on which constant can be used.
I think all comments are now addressed - tests are added also. I've updated the PR description to link to the generalised proofs shared on a subthread here. |
The implementation LGTM. However, I don't see any performance improvement/regression according to the IR diff/instruction count. Can you provide some performance data for this patch? I guess it requires the zbs extension. |
I will re-test this patch later as I noticed llvm-diff doesn't handle flags/attrs :( |
Yes, I'd expect no real difference without zbs. I don't have performance figures for a full benchmark suite - I was analysing bedcov.c from https://github.com/attractivechaos/plb2 vs GCC and although it doesn't show up high on the profile, noted gcc managed to select |
The IR diff basically looks fine to me. I am investigating the regression in pbrt-v4: https://github.com/dtcxzyw/llvm-opt-benchmark/pull/589/files#r1592080470 |
Missing fold: https://alive2.llvm.org/ce/z/eUii9T |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Please wait for another approval.
The upcoming patch adds logic to prefer to use constants close to power-of-two in these ashr exact + slt/ult patterns when it has a choice on which constant can be used.
… exact and slt/ult of a constant We have flexibility in what constant to use when combining an `ashr exact` with a slt or ult of a constant, and it's not possible to revisit this decision later in the compilation pipeline after the `ashr exact` is removed. Keeping a constant close to power-of-2 (pow2val + 1) should be no worse than neutral, and in some cases may allow better codegen later on for targets that can more cheaply generate power of 2 or near power of 2 constants. Alive2 proofs: <https://alive2.llvm.org/ce/z/2BmPnq> and <https://alive2.llvm.org/ce/z/DtuhnR>
7935c85
to
8624cf8
Compare
…lConstant` (#92773) This patch extends the transform `(icmp pred iM (shl iM %v, N), C) -> (icmp pred i(M-N) (trunc %v iM to i(M-N)), (trunc (C>>N))` to handle icmps with the flipped strictness of predicate. See the following case: ``` icmp ult i64 (shl X, 32), 8589934593 -> icmp ule i64 (shl X, 32), 8589934592 -> icmp ule i32 (trunc X, i32), 2 -> icmp ult i32 (trunc X, i32), 3 ``` Fixes the regression introduced by #86111 (comment). Alive2 proofs: https://alive2.llvm.org/ce/z/-sp5n3 `nuw` cannot be propagated as we always use `ashr` here. I don't see the value of fixing this (see the test `test_icmp_shl_nuw`).
We have flexibility in what constant to use when combining an
ashr exact
with a slt or ult of a constant, and it's not possible to revisit this decision later in the compilation pipeline after theashr exact
is removed. Keeping a constant close to power-of-2 (pow2val + 1) should be no worse than neutral, and in some cases may allow better codegen later on for targets that can more cheaply generate power of 2 (which may be selectable if converting back to setle/setge) or near power of 2 constants.Alive2 proofs:
https://alive2.llvm.org/ce/z/2BmPnq and
https://alive2.llvm.org/ce/z/DtuhnR
Discussion:
More test changes are obviously required if landing this, but I thought I'd put this out there to get feedback on the approach as I'm not sure there's any precedent on making a heuristic choice for constants in instcombine. If it were possible to do this later in the pipeline (e.g. riscvcodegenprepare or later) I would, but as noted above once you lose the
ashr exact
you no longer know it's legal to change the constant. I don't have a benchmark this makes a huge difference on, but came across it when chasing something else (and intuitively, it seems plausible that making this kind of compare materialisable with a single instruction may reduce register pressure in the right cases).The motivating example was something like:
You get this kind of code from pointer arithmetic like
end - beg <= CONST
in C.With this patch this can be compiled to:
But without it you get an awkward to generate constant:
I'm arguing that producing pow2 or close to pow2 constants if you can is a good default that shouldn't hurt and may help - but perhaps there are cases I haven't considered?