
[InstCombine] Prefer to keep power-of-2 constants when combining ashr exact and slt/ult of a constant #86111

Merged
merged 1 commit into llvm:main
May 10, 2024

Conversation

asb
Contributor

@asb asb commented Mar 21, 2024

We have flexibility in which constant to use when combining an ashr exact with an slt or ult of a constant, and it's not possible to revisit this decision later in the compilation pipeline once the ashr exact is removed. Keeping a constant close to a power of two (pow2val + 1) should be no worse than neutral, and in some cases may allow better codegen later for targets that can more cheaply generate power-of-two constants (which may be selectable when converting back to setle/setge) or near-power-of-two constants.

Alive2 proofs:
https://alive2.llvm.org/ce/z/2BmPnq and
https://alive2.llvm.org/ce/z/DtuhnR
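The slt half of the equivalence can also be sanity-checked outside Alive2 with a small brute-force model (a sketch only, not a proof: `BITS`, `ashr_exact`, `transformed_const`, and `check_all` are hypothetical names, the model covers i8 rather than the general case, and only applies the rewrite where the shifted constant fits in the type):

```python
BITS = 8
LO, HI = -(1 << (BITS - 1)), 1 << (BITS - 1)   # signed i8 range: [-128, 128)

def ashr_exact(x, s):
    # LLVM's `ashr exact` requires that no non-zero bits are shifted out,
    # i.e. the low s bits of x are zero; Python's >> is already arithmetic.
    assert x % (1 << s) == 0
    return x >> s

def transformed_const(c, s):
    # icmp slt (ashr exact X, s), C  -->  icmp slt X, ((C - 1) << s) + 1
    return ((c - 1) << s) + 1

def check_all():
    for s in range(1, BITS):
        for c in range(LO + 1, HI):              # skip INT_MIN, as the patch does
            if c - 1 <= 0 or (c - 1) & (c - 2):  # require C - 1 to be a power of two
                continue
            k = transformed_const(c, s)
            if not LO <= k < HI:                 # shifted constant must fit in iBITS
                continue
            for x in range(LO, HI, 1 << s):      # all x for which the ashr is exact
                if (ashr_exact(x, s) < c) != (x < k):
                    return False
    return True
```

The point of `transformed_const` is that since X is a multiple of 2^s, `X < C << s` and `X < ((C - 1) << s) + 1` select exactly the same values of X, so the combine is free to pick the constant that is one above a power of two.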


Discussion:

More test changes are obviously required if landing this, but I thought I'd put this out there to get feedback on the approach, as I'm not sure there's any precedent for making a heuristic choice of constants in InstCombine. If it were possible to do this later in the pipeline (e.g. RISCVCodeGenPrepare or later) I would, but as noted above, once you lose the ashr exact you no longer know it's legal to change the constant. I don't have a benchmark this makes a huge difference on, but I came across it when chasing something else (and intuitively, it seems plausible that making this kind of compare materialisable with a single instruction may reduce register pressure in the right cases).

The motivating example was something like:

define dso_local signext i32 @foo(ptr noundef %0, ptr noundef %1) local_unnamed_addr #0 {
  %3 = ptrtoint ptr %1 to i64
  %4 = ptrtoint ptr %0 to i64
  %5 = sub i64 %3, %4
  %6 = sdiv exact i64 %5, 32
  %7 = icmp sle i64 %6, 64
  br i1 %7, label %8, label %9

8:                                                ; preds = %2
  call void @abort() #2
  unreachable

9:                                                ; preds = %2
  ret i32 0

}

You get this kind of code from pointer arithmetic like `end - beg <= CONST` in C.

With this patch this can be compiled to:

	sub	a1, a1, a0
	bseti	a0, zero, 11
	bge	a0, a1, .LBB0_2
...

But without it you get an awkward-to-generate constant:

	sub	a1, a1, a0
	lui	a0, 1
	addiw	a0, a0, -2017
	bge	a0, a1, .LBB0_2
...
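The constants in the two sequences line up with the transform as follows (a hypothetical Python arithmetic check; `bseti_val` and `lui_addiw_val` are made-up names for the values each instruction sequence materialises):

```python
SHAMT = 5                      # sdiv exact i64 %5, 32 becomes ashr exact by log2(32)
C = 65                         # icmp sle %6, 64 is canonicalized to icmp slt %6, 65

old_const = C << SHAMT                 # without the patch
new_const = ((C - 1) << SHAMT) + 1     # with the patch: close to a power of two

# `bge a0, a1` takes the branch when a1 <= a0, so each sequence materialises
# constant-minus-one: a single bit for bseti versus a lui/addiw pair.
bseti_val = 1 << 11                    # bseti a0, zero, 11
lui_addiw_val = (1 << 12) - 2017       # lui a0, 1 ; addiw a0, a0, -2017
```

So the patched form needs 2048, a single set bit, where the unpatched form needs 2079, which takes two instructions to build.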

I'm arguing that producing power-of-two or near-power-of-two constants when you can is a good default that shouldn't hurt and may help - but perhaps there are cases I haven't considered?

@llvmbot
Collaborator

llvmbot commented Mar 21, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Alex Bradbury (asb)


Full diff: https://github.com/llvm/llvm-project/pull/86111.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp (+11)
  • (modified) llvm/test/Transforms/InstCombine/icmp-shr-lt-gt.ll (+2-2)
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp b/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
index db302d7e526844..016b16ec50f8ac 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
@@ -2482,10 +2482,21 @@ Instruction *InstCombinerImpl::foldICmpShrConstant(ICmpInst &Cmp,
   // those conditions rather than checking them. This is difficult because of
   // undef/poison (PR34838).
   if (IsAShr && Shr->hasOneUse()) {
+    if (IsExact && (Pred == CmpInst::ICMP_SLT || Pred == CmpInst::ICMP_ULT) &&
+        !C.isMinSignedValue() && (C - 1).isPowerOf2()) {
+      // When C - 1 is a power of two and the transform can be legally
+      // performed, prefer this form so the produced constant is close to a
+      // power of two.
+      // icmp slt/ult (ashr exact X, ShAmtC), C
+      // --> icmp slt/ult X, ((C - 1) << ShAmtC) + 1
+      APInt ShiftedC = (C - 1).shl(ShAmtVal) + 1;
+      return new ICmpInst(Pred, X, ConstantInt::get(ShrTy, ShiftedC));
+    }
     if (IsExact || Pred == CmpInst::ICMP_SLT || Pred == CmpInst::ICMP_ULT) {
       // When ShAmtC can be shifted losslessly:
       // icmp PRED (ashr exact X, ShAmtC), C --> icmp PRED X, (C << ShAmtC)
       // icmp slt/ult (ashr X, ShAmtC), C --> icmp slt/ult X, (C << ShAmtC)
+
       APInt ShiftedC = C.shl(ShAmtVal);
       if (ShiftedC.ashr(ShAmtVal) == C)
         return new ICmpInst(Pred, X, ConstantInt::get(ShrTy, ShiftedC));
diff --git a/llvm/test/Transforms/InstCombine/icmp-shr-lt-gt.ll b/llvm/test/Transforms/InstCombine/icmp-shr-lt-gt.ll
index 1b8efe4351c6dc..6cdf18d73c9e90 100644
--- a/llvm/test/Transforms/InstCombine/icmp-shr-lt-gt.ll
+++ b/llvm/test/Transforms/InstCombine/icmp-shr-lt-gt.ll
@@ -3379,7 +3379,7 @@ define i1 @ashrslt_01_01_exact(i4 %x) {
 
 define i1 @ashrslt_01_02_exact(i4 %x) {
 ; CHECK-LABEL: @ashrslt_01_02_exact(
-; CHECK-NEXT:    [[C:%.*]] = icmp slt i4 [[X:%.*]], 4
+; CHECK-NEXT:    [[C:%.*]] = icmp slt i4 [[X:%.*]], 3
 ; CHECK-NEXT:    ret i1 [[C]]
 ;
   %s = ashr exact i4 %x, 1
@@ -3389,7 +3389,7 @@ define i1 @ashrslt_01_02_exact(i4 %x) {
 
 define i1 @ashrslt_01_03_exact(i4 %x) {
 ; CHECK-LABEL: @ashrslt_01_03_exact(
-; CHECK-NEXT:    [[C:%.*]] = icmp slt i4 [[X:%.*]], 6
+; CHECK-NEXT:    [[C:%.*]] = icmp slt i4 [[X:%.*]], 5
 ; CHECK-NEXT:    ret i1 [[C]]
 ;
   %s = ashr exact i4 %x, 1

@nikic nikic requested a review from dtcxzyw March 21, 2024 11:33
@asb asb force-pushed the 2024q1-instcombine-ashr-cmp-constant-pref branch from a0d036b to 7935c85 Compare May 6, 2024 06:38
asb added a commit to asb/llvm-project that referenced this pull request May 6, 2024
The upcoming patch adds logic to prefer to use constants close to
power-of-two in these ashr exact + slt/ult patterns when it has a choice
on which constant can be used.
@asb asb requested a review from dtcxzyw May 6, 2024 06:40
@asb
Contributor Author

asb commented May 6, 2024

I think all comments are now addressed - tests are added also. I've updated the PR description to link to the generalised proofs shared on a subthread here.

dtcxzyw added a commit to dtcxzyw/llvm-opt-benchmark that referenced this pull request May 6, 2024
@dtcxzyw
Member

dtcxzyw commented May 6, 2024

The implementation LGTM. However, I don't see any performance improvement/regression according to the IR diff/instruction count. Can you provide some performance data for this patch? I guess it requires the zbs extension.

@dtcxzyw
Member

dtcxzyw commented May 7, 2024

However, I don't see any performance improvement/regression according to the IR diff/instruction count.

I will re-test this patch later as I noticed llvm-diff doesn't handle flags/attrs :(

dtcxzyw added a commit to dtcxzyw/llvm-opt-benchmark that referenced this pull request May 7, 2024
@asb
Contributor Author

asb commented May 7, 2024

The implementation LGTM. However, I don't see any performance improvement/regression according to the IR diff/instruction count. Can you provide some performance data for this patch? I guess it requires the zbs extension.

Yes, I'd expect no real difference without zbs. I don't have performance figures for a full benchmark suite - I was analysing bedcov.c from https://github.com/attractivechaos/plb2 vs GCC and although it doesn't show up high on the profile, noted gcc managed to select bseti where we didn't. I thought this might be a simple missed isel pattern and of course ended up finding it was related to us choosing a harder to efficiently generate constant in InstCombine.

@dtcxzyw
Member

dtcxzyw commented May 7, 2024

The IR diff basically looks fine to me. I am investigating the regression in pbrt-v4: https://github.com/dtcxzyw/llvm-opt-benchmark/pull/589/files#r1592080470

@dtcxzyw
Member

dtcxzyw commented May 7, 2024

Missing fold: https://alive2.llvm.org/ce/z/eUii9T
I will post a patch later to fix this.

Member

@dtcxzyw dtcxzyw left a comment


LGTM. Please wait for another approval.

asb added a commit that referenced this pull request May 10, 2024
The upcoming patch adds logic to prefer to use constants close to
power-of-two in these ashr exact + slt/ult patterns when it has a choice
on which constant can be used.
… exact and slt/ult of a constant

We have flexibility in what constant to use when combining an `ashr
exact` with a slt or ult of a constant, and it's not possible to revisit
this decision later in the compilation pipeline after the `ashr exact`
is removed. Keeping a constant close to power-of-2 (pow2val + 1) should
be no worse than neutral, and in some cases may allow better codegen
later on for targets that can more cheaply generate power of 2 or near
power of 2 constants.

Alive2 proofs:
<https://alive2.llvm.org/ce/z/2BmPnq> and
<https://alive2.llvm.org/ce/z/DtuhnR>
@asb asb force-pushed the 2024q1-instcombine-ashr-cmp-constant-pref branch from 7935c85 to 8624cf8 Compare May 10, 2024 12:49
@asb asb merged commit 3be8e2c into llvm:main May 10, 2024
3 of 4 checks passed
dtcxzyw added a commit that referenced this pull request May 28, 2024
…lConstant` (#92773)

This patch extends the transform `(icmp pred iM (shl iM %v, N), C) ->
(icmp pred i(M-N) (trunc %v to i(M-N)), trunc(C >> N))` to handle
icmps with the flipped strictness of predicate.

See the following case:
```
icmp ult i64 (shl X, 32), 8589934593 ->
icmp ule i64 (shl X, 32), 8589934592 ->
icmp ule i32 (trunc X, i32), 2 ->
icmp ult i32 (trunc X, i32), 3
```
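That chain can be spot-checked with a small Python model (a sketch; `ult_shl`, `ult_trunc`, and `chain_holds` are hypothetical helpers, with the shl modelled as wrapping in iM so only the low 32 bits of X matter):

```python
M, N = 64, 32

def ult_shl(x, c):
    # icmp ult iM (shl iM %x, N), C -- the shl wraps modulo 2^M
    return ((x << N) & ((1 << M) - 1)) < c

def ult_trunc(x, c):
    # icmp ult i(M-N) (trunc %x to i(M-N)), C
    return (x & ((1 << (M - N)) - 1)) < c

def chain_holds():
    # 8589934593 == (1 << 33) + 1, so the shifted value is <= 1 << 33
    # exactly when the truncated value is 0, 1, or 2, i.e. ult 3.
    samples = [0, 1, 2, 3, 4, (1 << 32) - 1, (1 << 32) + 2, (1 << 63) + 5]
    return all(ult_shl(x, 8589934593) == ult_trunc(x, 3) for x in samples)
```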

Fixes the regression introduced by
#86111 (comment).

Alive2 proofs: https://alive2.llvm.org/ce/z/-sp5n3

`nuw` cannot be propagated as we always use `ashr` here. I don't see the
value of fixing this (see the test `test_icmp_shl_nuw`).