
[InstCombine] Prefer to keep power-of-2 constants when combining ashr exact and slt/ult of a constant #86111

Merged
merged 1 commit into llvm:main
May 10, 2024

Conversation

asb
Contributor

@asb asb commented Mar 21, 2024

We have flexibility in which constant to use when combining an ashr exact with an slt or ult of a constant, and it's not possible to revisit this decision later in the compilation pipeline once the ashr exact is removed. Keeping a constant close to a power of two (pow2val + 1) should be no worse than neutral, and in some cases may allow better codegen later for targets that can more cheaply generate power-of-two constants (which may be selectable when converting back to setle/setge) or near-power-of-two constants.

Alive2 proofs:
https://alive2.llvm.org/ce/z/2BmPnq and
https://alive2.llvm.org/ce/z/DtuhnR
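The slt half of the equivalence can also be sanity-checked outside Alive2 with a small brute-force model (a sketch only, not a proof: `BITS`, `ashr_exact`, `transformed_const`, and `check_all` are hypothetical names, the model covers i8 rather than the general case, and only applies the rewrite where the shifted constant fits in the type):

```python
BITS = 8
LO, HI = -(1 << (BITS - 1)), 1 << (BITS - 1)   # signed i8 range: [-128, 128)

def ashr_exact(x, s):
    # LLVM's `ashr exact` requires that no non-zero bits are shifted out,
    # i.e. the low s bits of x are zero; Python's >> is already arithmetic.
    assert x % (1 << s) == 0
    return x >> s

def transformed_const(c, s):
    # icmp slt (ashr exact X, s), C  -->  icmp slt X, ((C - 1) << s) + 1
    return ((c - 1) << s) + 1

def check_all():
    for s in range(1, BITS):
        for c in range(LO + 1, HI):              # skip INT_MIN, as the patch does
            if c - 1 <= 0 or (c - 1) & (c - 2):  # require C - 1 to be a power of two
                continue
            k = transformed_const(c, s)
            if not LO <= k < HI:                 # shifted constant must fit in iBITS
                continue
            for x in range(LO, HI, 1 << s):      # all x for which the ashr is exact
                if (ashr_exact(x, s) < c) != (x < k):
                    return False
    return True
```

The point of `transformed_const` is that since X is a multiple of 2^s, `X < C << s` and `X < ((C - 1) << s) + 1` select exactly the same values of X, so the combine is free to pick the constant that is one above a power of two.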


Discussion:

More test changes are obviously required if landing this, but I thought I'd put this out there to get feedback on the approach, as I'm not sure there's any precedent for making a heuristic choice of constants in InstCombine. If it were possible to do this later in the pipeline (e.g. RISCVCodeGenPrepare or later) I would, but as noted above, once you lose the ashr exact you no longer know it's legal to change the constant. I don't have a benchmark this makes a huge difference on, but I came across it when chasing something else (and intuitively, it seems plausible that making this kind of compare materialisable with a single instruction may reduce register pressure in the right cases).

The motivating example was something like:

define dso_local signext i32 @foo(ptr noundef %0, ptr noundef %1) local_unnamed_addr #0 {
  %3 = ptrtoint ptr %1 to i64
  %4 = ptrtoint ptr %0 to i64
  %5 = sub i64 %3, %4
  %6 = sdiv exact i64 %5, 32
  %7 = icmp sle i64 %6, 64
  br i1 %7, label %8, label %9

8:                                                ; preds = %2
  call void @abort() #2
  unreachable

9:                                                ; preds = %2
  ret i32 0

}

You get this kind of code from pointer arithmetic like `end - beg <= CONST` in C.

With this patch this can be compiled to:

	sub	a1, a1, a0
	bseti	a0, zero, 11
	bge	a0, a1, .LBB0_2
...

But without it you get an awkward-to-generate constant:

	sub	a1, a1, a0
	lui	a0, 1
	addiw	a0, a0, -2017
	bge	a0, a1, .LBB0_2
...
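The constants in the two sequences line up with the transform as follows (a hypothetical Python arithmetic check; `bseti_val` and `lui_addiw_val` are made-up names for the values each instruction sequence materialises):

```python
SHAMT = 5                      # sdiv exact i64 %5, 32 becomes ashr exact by log2(32)
C = 65                         # icmp sle %6, 64 is canonicalized to icmp slt %6, 65

old_const = C << SHAMT                 # without the patch
new_const = ((C - 1) << SHAMT) + 1     # with the patch: close to a power of two

# `bge a0, a1` takes the branch when a1 <= a0, so each sequence materialises
# constant-minus-one: a single bit for bseti versus a lui/addiw pair.
bseti_val = 1 << 11                    # bseti a0, zero, 11
lui_addiw_val = (1 << 12) - 2017       # lui a0, 1 ; addiw a0, a0, -2017
```

So the patched form needs 2048, a single set bit, where the unpatched form needs 2079, which takes two instructions to build.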

I'm arguing that producing power-of-two or near-power-of-two constants when you can is a good default that shouldn't hurt and may help - but perhaps there are cases I haven't considered?

@llvmbot
Collaborator

llvmbot commented Mar 21, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Alex Bradbury (asb)


Full diff: https://github.com/llvm/llvm-project/pull/86111.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp (+11)
  • (modified) llvm/test/Transforms/InstCombine/icmp-shr-lt-gt.ll (+2-2)
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp b/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
index db302d7e526844..016b16ec50f8ac 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
@@ -2482,10 +2482,21 @@ Instruction *InstCombinerImpl::foldICmpShrConstant(ICmpInst &Cmp,
   // those conditions rather than checking them. This is difficult because of
   // undef/poison (PR34838).
   if (IsAShr && Shr->hasOneUse()) {
+    if (IsExact && (Pred == CmpInst::ICMP_SLT || Pred == CmpInst::ICMP_ULT) &&
+        !C.isMinSignedValue() && (C - 1).isPowerOf2()) {
+      // When C - 1 is a power of two and the transform can be legally
+      // performed, prefer this form so the produced constant is close to a
+      // power of two.
+      // icmp slt/ult (ashr exact X, ShAmtC), C
+      // --> icmp slt/ult X, ((C - 1) << ShAmtC) + 1
+      APInt ShiftedC = (C - 1).shl(ShAmtVal) + 1;
+      return new ICmpInst(Pred, X, ConstantInt::get(ShrTy, ShiftedC));
+    }
     if (IsExact || Pred == CmpInst::ICMP_SLT || Pred == CmpInst::ICMP_ULT) {
       // When ShAmtC can be shifted losslessly:
       // icmp PRED (ashr exact X, ShAmtC), C --> icmp PRED X, (C << ShAmtC)
       // icmp slt/ult (ashr X, ShAmtC), C --> icmp slt/ult X, (C << ShAmtC)
+
       APInt ShiftedC = C.shl(ShAmtVal);
       if (ShiftedC.ashr(ShAmtVal) == C)
         return new ICmpInst(Pred, X, ConstantInt::get(ShrTy, ShiftedC));
diff --git a/llvm/test/Transforms/InstCombine/icmp-shr-lt-gt.ll b/llvm/test/Transforms/InstCombine/icmp-shr-lt-gt.ll
index 1b8efe4351c6dc..6cdf18d73c9e90 100644
--- a/llvm/test/Transforms/InstCombine/icmp-shr-lt-gt.ll
+++ b/llvm/test/Transforms/InstCombine/icmp-shr-lt-gt.ll
@@ -3379,7 +3379,7 @@ define i1 @ashrslt_01_01_exact(i4 %x) {
 
 define i1 @ashrslt_01_02_exact(i4 %x) {
 ; CHECK-LABEL: @ashrslt_01_02_exact(
-; CHECK-NEXT:    [[C:%.*]] = icmp slt i4 [[X:%.*]], 4
+; CHECK-NEXT:    [[C:%.*]] = icmp slt i4 [[X:%.*]], 3
 ; CHECK-NEXT:    ret i1 [[C]]
 ;
   %s = ashr exact i4 %x, 1
@@ -3389,7 +3389,7 @@ define i1 @ashrslt_01_02_exact(i4 %x) {
 
 define i1 @ashrslt_01_03_exact(i4 %x) {
 ; CHECK-LABEL: @ashrslt_01_03_exact(
-; CHECK-NEXT:    [[C:%.*]] = icmp slt i4 [[X:%.*]], 6
+; CHECK-NEXT:    [[C:%.*]] = icmp slt i4 [[X:%.*]], 5
 ; CHECK-NEXT:    ret i1 [[C]]
 ;
   %s = ashr exact i4 %x, 1

@nikic nikic requested a review from dtcxzyw March 21, 2024 11:33
@asb asb force-pushed the 2024q1-instcombine-ashr-cmp-constant-pref branch from a0d036b to 7935c85 Compare May 6, 2024 06:38
asb added a commit to asb/llvm-project that referenced this pull request May 6, 2024
The upcoming patch adds logic to prefer to use constants close to
power-of-two in these ashr exact + slt/ult patterns when it has a choice
on which constant can be used.
@asb asb requested a review from dtcxzyw May 6, 2024 06:40
@asb
Contributor Author

asb commented May 6, 2024

I think all comments are now addressed - tests are added also. I've updated the PR description to link to the generalised proofs shared on a subthread here.

dtcxzyw added a commit to dtcxzyw/llvm-opt-benchmark that referenced this pull request May 6, 2024
@dtcxzyw
Member

dtcxzyw commented May 6, 2024

The implementation LGTM. However, I don't see any performance improvement/regression according to the IR diff/instruction count. Can you provide some performance data for this patch? I guess it requires the zbs extension.

@dtcxzyw
Member

dtcxzyw commented May 7, 2024

However, I don't see any performance improvement/regression according to the IR diff/instruction count.

I will re-test this patch later as I noticed llvm-diff doesn't handle flags/attrs :(

dtcxzyw added a commit to dtcxzyw/llvm-opt-benchmark that referenced this pull request May 7, 2024
@asb
Contributor Author

asb commented May 7, 2024

The implementation LGTM. However, I don't see any performance improvement/regression according to the IR diff/instruction count. Can you provide some performance data for this patch? I guess it requires the zbs extension.

Yes, I'd expect no real difference without zbs. I don't have performance figures for a full benchmark suite - I was analysing bedcov.c from https://github.com/attractivechaos/plb2 vs GCC and although it doesn't show up high on the profile, noted gcc managed to select bseti where we didn't. I thought this might be a simple missed isel pattern and of course ended up finding it was related to us choosing a harder to efficiently generate constant in InstCombine.

@dtcxzyw
Member

dtcxzyw commented May 7, 2024

The IR diff basically looks fine to me. I am investigating the regression in pbrt-v4: https://github.com/dtcxzyw/llvm-opt-benchmark/pull/589/files#r1592080470

@dtcxzyw
Member

dtcxzyw commented May 7, 2024

Missing fold: https://alive2.llvm.org/ce/z/eUii9T
I will post a patch later to fix this.

Member

@dtcxzyw dtcxzyw left a comment


LGTM. Please wait for another approval.

asb added a commit that referenced this pull request May 10, 2024
The upcoming patch adds logic to prefer to use constants close to
power-of-two in these ashr exact + slt/ult patterns when it has a choice
on which constant can be used.
… exact and slt/ult of a constant

We have flexibility in what constant to use when combining an `ashr
exact` with a slt or ult of a constant, and it's not possible to revisit
this decision later in the compilation pipeline after the `ashr exact`
is removed. Keeping a constant close to power-of-2 (pow2val + 1) should
be no worse than neutral, and in some cases may allow better codegen
later on for targets that can more cheaply generate power of 2 or near
power of 2 constants.

Alive2 proofs:
<https://alive2.llvm.org/ce/z/2BmPnq> and
<https://alive2.llvm.org/ce/z/DtuhnR>
@asb asb force-pushed the 2024q1-instcombine-ashr-cmp-constant-pref branch from 7935c85 to 8624cf8 Compare May 10, 2024 12:49
@asb asb merged commit 3be8e2c into llvm:main May 10, 2024
3 of 4 checks passed
dtcxzyw added a commit that referenced this pull request May 28, 2024
…lConstant` (#92773)

This patch extends the transform `(icmp pred iM (shl iM %v, N), C) ->
(icmp pred i(M-N) (trunc %v to i(M-N)), trunc(C >> N))` to handle
icmps with the flipped strictness of predicate.

See the following case:
```
icmp ult i64 (shl X, 32), 8589934593 ->
icmp ule i64 (shl X, 32), 8589934592 ->
icmp ule i32 (trunc X, i32), 2 ->
icmp ult i32 (trunc X, i32), 3
```
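That chain can be spot-checked with a small Python model (a sketch; `ult_shl`, `ult_trunc`, and `chain_holds` are hypothetical helpers, with the shl modelled as wrapping in iM so only the low 32 bits of X matter):

```python
M, N = 64, 32

def ult_shl(x, c):
    # icmp ult iM (shl iM %x, N), C -- the shl wraps modulo 2^M
    return ((x << N) & ((1 << M) - 1)) < c

def ult_trunc(x, c):
    # icmp ult i(M-N) (trunc %x to i(M-N)), C
    return (x & ((1 << (M - N)) - 1)) < c

def chain_holds():
    # 8589934593 == (1 << 33) + 1, so the shifted value is <= 1 << 33
    # exactly when the truncated value is 0, 1, or 2, i.e. ult 3.
    samples = [0, 1, 2, 3, 4, (1 << 32) - 1, (1 << 32) + 2, (1 << 63) + 5]
    return all(ult_shl(x, 8589934593) == ult_trunc(x, 3) for x in samples)
```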

Fixes the regression introduced by
#86111 (comment).

Alive2 proofs: https://alive2.llvm.org/ce/z/-sp5n3

`nuw` cannot be propagated as we always use `ashr` here. I don't see the
value of fixing this (see the test `test_icmp_shl_nuw`).