Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[InstCombine] Fold (X / C) < X and (X >> C) < X into X > 0 #85555

Merged

Conversation

Poseydon42
Copy link
Contributor

@Poseydon42 Poseydon42 commented Mar 16, 2024

@Poseydon42 Poseydon42 requested a review from nikic as a code owner March 16, 2024 22:16
Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be
notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write
permissions for the repository. In which case you can instead tag reviewers by
name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review
by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate
is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Copy link
Collaborator

llvmbot commented Mar 16, 2024

@llvm/pr-subscribers-llvm-transforms

Author: None (Poseydon42)

Changes

Proofs:
https://alive2.llvm.org/ce/z/yG_wxp
https://alive2.llvm.org/ce/z/iNBZVy
https://alive2.llvm.org/ce/z/XA_HG5
This resolves #85313


Full diff: https://github.com/llvm/llvm-project/pull/85555.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp (+25)
  • (modified) llvm/test/Transforms/InstCombine/icmp-div-constant.ll (+76)
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp b/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
index 0dce0077bf1588..415ea61362776d 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
@@ -7406,6 +7406,31 @@ Instruction *InstCombinerImpl::visitICmpInst(ICmpInst &I) {
   if (Instruction *Res = foldReductionIdiom(I, Builder, DL))
     return Res;
 
+  // Folding (X / Y) < X => X > 0 for some constant Y other than 0 or 1
+  {
+    Constant *Divisor;
+    Value *Dividend;
+    if (match(Op0,
+              m_CombineOr(m_SDiv(m_Value(Dividend), m_ImmConstant(Divisor)),
+                          m_UDiv(m_Value(Dividend), m_ImmConstant(Divisor)))) &&
+        match(Op1, m_Deferred(Dividend)) &&
+        !(Divisor->isZeroValue() || Divisor->isOneValue())) {
+      return new ICmpInst(ICmpInst::ICMP_SGT, Dividend,
+                          Constant::getNullValue(Dividend->getType()));
+    }
+  }
+
+  // Another case of this fold is (X >> Y) < X => X > 0 if Y != 0
+  {
+    Constant *Shift;
+    Value *V;
+    if (match(Op0, m_LShr(m_Value(V), m_ImmConstant(Shift))) &&
+        match(Op1, m_Deferred(V)) && !Shift->isZeroValue()) {
+      return new ICmpInst(ICmpInst::ICMP_SGT, V,
+                          Constant::getNullValue(V->getType()));
+    }
+  }
+
   return Changed ? &I : nullptr;
 }
 
diff --git a/llvm/test/Transforms/InstCombine/icmp-div-constant.ll b/llvm/test/Transforms/InstCombine/icmp-div-constant.ll
index 8dcb96284685ff..0bdfa5daa547a5 100644
--- a/llvm/test/Transforms/InstCombine/icmp-div-constant.ll
+++ b/llvm/test/Transforms/InstCombine/icmp-div-constant.ll
@@ -375,3 +375,79 @@ define i1 @sdiv_eq_smin_use(i32 %x, i32 %y) {
   %r = icmp eq i32 %d, -2147483648
   ret i1 %r
 }
+
+; Folding (X / C) < X => X > 0 for C != 1
+
+define i1 @sdiv_x_by_const_cmp_x(i32 %x) {
+; CHECK-LABEL: @sdiv_x_by_const_cmp_x(
+; CHECK-NEXT:    [[TMP1:%.*]] = icmp sgt i32 [[X:%.*]], 0
+; CHECK-NEXT:    ret i1 [[TMP1]]
+;
+  %1 = sdiv i32 %x, 123
+  %2 = icmp slt i32 %1, %x
+  ret i1 %2
+}
+
+define i1 @udiv_x_by_const_cmp_x(i32 %x) {
+; CHECK-LABEL: @udiv_x_by_const_cmp_x(
+; CHECK-NEXT:    [[TMP1:%.*]] = icmp sgt i32 [[X:%.*]], 0
+; CHECK-NEXT:    ret i1 [[TMP1]]
+;
+  %1 = udiv i32 %x, 123
+  %2 = icmp slt i32 %1, %x
+  ret i1 %2
+}
+
+; Same as above but with right shift instead of division (C != 0)
+
+define i1 @lshr_x_by_const_cmp_x(i32 %x) {
+; CHECK-LABEL: @lshr_x_by_const_cmp_x(
+; CHECK-NEXT:    [[TMP1:%.*]] = icmp sgt i32 [[X:%.*]], 0
+; CHECK-NEXT:    ret i1 [[TMP1]]
+;
+  %1 = lshr i32 %x, 1
+  %2 = icmp slt i32 %1, %x
+  ret i1 %2
+}
+
+; Negative tests for the folds above
+
+define i1 @sdiv_x_by_1_cmp_x_negative(i32 %x) {
+; CHECK-LABEL: @sdiv_x_by_1_cmp_x_negative(
+; CHECK-NEXT:    ret i1 false
+;
+  %1 = sdiv i32 %x, 1
+  %2 = icmp slt i32 %1, %x
+  ret i1 %2
+}
+
+define i1 @sdiv_x_by_1_cmp_y(i32 %x, i32 %y) {
+; CHECK-LABEL: @sdiv_x_by_1_cmp_y(
+; CHECK-NEXT:    [[TMP1:%.*]] = sdiv i32 [[X:%.*]], 123
+; CHECK-NEXT:    [[TMP2:%.*]] = icmp slt i32 [[TMP1]], [[Y:%.*]]
+; CHECK-NEXT:    ret i1 [[TMP2]]
+;
+  %1 = sdiv i32 %x, 123
+  %2 = icmp slt i32 %1, %y
+  ret i1 %2
+}
+
+define i1 @lshr_x_by_0_cmp_x(i32 %x) {
+; CHECK-LABEL: @lshr_x_by_0_cmp_x(
+; CHECK-NEXT:    ret i1 false
+;
+  %1 = lshr i32 %x, 0
+  %2 = icmp slt i32 %1, %x
+  ret i1 %2
+}
+
+define i1 @lshr_x_by_const_cmp_y(i32 %x, i32 %y) {
+; CHECK-LABEL: @lshr_x_by_const_cmp_y(
+; CHECK-NEXT:    [[X:%.*]] = lshr i32 [[X1:%.*]], 1
+; CHECK-NEXT:    [[TMP1:%.*]] = icmp slt i32 [[X]], [[Y:%.*]]
+; CHECK-NEXT:    ret i1 [[TMP1]]
+;
+  %1 = lshr i32 %x, 1
+  %2 = icmp slt i32 %1, %y
+  ret i1 %2
+}

@goldsteinn
Copy link
Contributor

goldsteinn commented Mar 16, 2024

Just a note on the proofs.
You really only proved for the exact constant chosen i.e lshr x, 1 in the case of the lshr proofs. That doesn't necessarily transfer to all other constants (in fact 0 is incorrect).
See some general proofs: https://alive2.llvm.org/ce/z/4Q448E
I filled in the sdiv cases, can you please complete the udiv/lshr/ashr ones.

Edit: added proofs for eq/ne case.

@goldsteinn goldsteinn requested a review from dtcxzyw March 16, 2024 22:53
@goldsteinn
Copy link
Contributor

Also welcome and thank you for the commit. There a rough draft of a contribution guide for InstCombine here: #79007

You might find it useful.

@Poseydon42
Copy link
Contributor Author

I have modified the code to work with all predicates for udiv/lshr and unsigned predicates for sdiv, however I believe that ashr does not work with this type of fold. As you can see here, the only two predicates that it does work with are slt and sge. Should I include these two folds for ashr or should I ignore it altogether and only focus on the other 3 instructions?

@goldsteinn
Copy link
Contributor

goldsteinn commented Mar 17, 2024

I have modified the code to work with all predicates for udiv/lshr and unsigned predicates for sdiv, however I believe that ashr does not work with this type of fold. As you can see here, the only two predicates that it does work with are slt and sge. Should I include these two folds for ashr or should I ignore it altogether and only focus on the other 3 instructions?

Think made a mistake when merging your updates.
You are right about ashr. Up to you if you want to handle it in this patch.

Also please update your proofs with all cases you plan to handle.

@Poseydon42
Copy link
Contributor Author

Yeah, that merge was an accident, I'll remove it before pushing the most recent changes. It seems to me like handling ashr would just unnecessarily increase the complexity of the code without providing significant benefit, so I'll omit it. The only question I have now is whether I should add tests for all three instructions I'm working with and all 10 (or 6) comparison predicates, making it 26 tests in total, which, together with negative tests, will bring it closer to 30, or should I simply test 2-3 predicates for each of the three instructions?

@goldsteinn
Copy link
Contributor

goldsteinn commented Mar 17, 2024

Yeah, that merge was an accident, I'll remove it before pushing the most recent changes. It seems to me like handling ashr would just unnecessarily increase the complexity of the code without providing significant benefit, so I'll omit it. The only question I have now is whether I should add tests for all three instructions I'm working with and all 10 (or 6) comparison predicates, making it 26 tests in total, which, together with negative tests, will bring it closer to 30, or should I simply test 2-3 predicates for each of the three instructions?

So generally you should be testing different control flows in your code moreso than IR variants.

In this case you have all predicates have the same control flow for lshr/udiv, so maybe between the two test one equality pred, one signed, and one unsigned (i.e eq, sle, ugt). Then a negative test for each maybe with an out of bounds constant.

For sdiv maybe ne / sgt with two negative tests (out of bounds/unsigned pred).

Then you can pretty randomly make 1-2 of the tests use vec.

Please be sure to have a negative test with a vec where one vec element is invalid and the other is valid.

@Poseydon42 Poseydon42 force-pushed the icmp-div-by-constant-compare-with-dividend branch from 3bf2533 to 3f08ea9 Compare March 17, 2024 22:41
Copy link

github-actions bot commented Mar 17, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@Poseydon42 Poseydon42 force-pushed the icmp-div-by-constant-compare-with-dividend branch from 3f08ea9 to 65de414 Compare March 17, 2024 22:48
@Poseydon42
Copy link
Contributor Author

Updated proofs (26 in total) can be found here. I've fixed most of the issues mentioned here, the only two things I'm worried about are:
1.The current code does not work with vector constants that have different values in each component (e.g. take a look at test case lshr_by_const_cmp_sle_value). I believe this is caused by m_APInt not matching when a vector constant has different components, but I am unsure about that. Can something be done about this or should I leave this particular scenario unaffected by the optimization?
2. It appears that lshr already has a similar fold implemented, since test case lshr_by_const_cmp_ule_value was not affected by the changes that I've made (it folded to ret i1 true both before and after the changes to the code). Can someone confirm that this is the case, and if so, should I remove the lshr code path from my PR?

@Poseydon42 Poseydon42 force-pushed the icmp-div-by-constant-compare-with-dividend branch from 65de414 to 47cc91b Compare March 17, 2024 23:15
@goldsteinn
Copy link
Contributor

Updated proofs (26 in total) can be found here. I've fixed most of the issues mentioned here, the only two things I'm worried about are: 1.The current code does not work with vector constants that have different values in each component (e.g. take a look at test case lshr_by_const_cmp_sle_value). I believe this is caused by m_APInt not matching when a vector constant has different components, but I am unsure about that. Can something be done about this or should I leave this particular scenario unaffected by the optimization?

Yes that is one of the drawbacks of APInt, it only match if the vector is a splat (all elements that same).
You could use: m_SpecificInt_ICMP (see PatternMatch.h).
So for the div cases: m_SpecificInt_ICMP(ICmpInst::ICMP_UGT, APInt::getOneBitSet(Op0->getType()->getScalarSizeInBits(), 1)) (not the cleanest API to use).

  1. It appears that lshr already has a similar fold implemented, since test case lshr_by_const_cmp_ule_value was not affected by the changes that I've made (it folded to ret i1 true both before and after the changes to the code). Can someone confirm that this is the case, and if so, should I remove the lshr code path from my PR?

See: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Analysis/InstructionSimplify.cpp#L3172
No need to go out of your way to avoid that case. Its not extra code on your end to handle (and its correct). I wouldn't use it as a test, however, as its not going through your code path.

@Poseydon42 Poseydon42 force-pushed the icmp-div-by-constant-compare-with-dividend branch from 47cc91b to 634bbc3 Compare March 18, 2024 20:58
@Poseydon42
Copy link
Contributor Author

Right, after a painful hour of rewriting git history multiple times, it seems to me that I've covered everything that was mentioned here before. Please let me know if there's anything else for me to fix or improve.

@Poseydon42
Copy link
Contributor Author

Oops, looks like I forgot to run tests on my machine before pushing the changes. I've localised the source of one of many failing tests to this function in getelementptr.ll:

define i1 @test10(ptr %x, ptr %y) {
; CHECK-LABEL: @test10(
; CHECK-NEXT:    [[T4:%.*]] = icmp eq ptr [[X:%.*]], [[Y:%.*]]
; CHECK-NEXT:    ret i1 [[T4]]
;
  %t1 = getelementptr { i32, i32 }, ptr %x, i32 0, i32 1
  %t3 = getelementptr { i32, i32 }, ptr %y, i32 0, i32 1
  %t4 = icmp eq ptr %t1, %t3
  ret i1 %t4
}

The crash report indicates assertion failure in APInt.h within setBit(). It appears to be caused by the fact that for ptr types, bit size is reported as 0, so later in the matching code, a call to APInt::getOneBitSet essentially asks for a number that has 0 bits, with bit #0 set. I'm unsure if I should simply ignore cases like this (e.g. by adding if (Op0->getType()->getScalarSizeInBits() == 0) return nullptr), or if there's a nicer/more usual way of handling this?

@goldsteinn
Copy link
Contributor

Oops, looks like I forgot to run tests on my machine before pushing the changes. I've localised the source of one of many failing tests to this function in getelementptr.ll:

define i1 @test10(ptr %x, ptr %y) {
; CHECK-LABEL: @test10(
; CHECK-NEXT:    [[T4:%.*]] = icmp eq ptr [[X:%.*]], [[Y:%.*]]
; CHECK-NEXT:    ret i1 [[T4]]
;
  %t1 = getelementptr { i32, i32 }, ptr %x, i32 0, i32 1
  %t3 = getelementptr { i32, i32 }, ptr %y, i32 0, i32 1
  %t4 = icmp eq ptr %t1, %t3
  ret i1 %t4
}

The crash report indicates assertion failure in APInt.h within setBit(). It appears to be caused by the fact that for ptr types, bit size is reported as 0, so later in the matching code, a call to APInt::getOneBitSet essentially asks for a number that has 0 bits, with bit #0 set. I'm unsure if I should simply ignore cases like this (e.g. by adding if (Op0->getType()->getScalarSizeInBits() == 0) return nullptr), or if there's a nicer/more usual way of handling this?

I'd say just handle the splat vector case for now. If #85676 gets in before this, you could use that API.

@Poseydon42 Poseydon42 force-pushed the icmp-div-by-constant-compare-with-dividend branch from 634bbc3 to 92dd571 Compare March 19, 2024 19:30
@Poseydon42
Copy link
Contributor Author

Poseydon42 commented Mar 19, 2024

I'd say just handle the splat vector case for now. If #85676 gets in before this, you could use that API.

Done, should be passing tests now.

@Poseydon42 Poseydon42 force-pushed the icmp-div-by-constant-compare-with-dividend branch from 92dd571 to 0d36e34 Compare March 19, 2024 19:35
Copy link
Member

@dtcxzyw dtcxzyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Poseydon42
Copy link
Contributor Author

Updated proofs with ashr: https://alive2.llvm.org/ce/z/52droC

@Poseydon42 Poseydon42 force-pushed the icmp-div-by-constant-compare-with-dividend branch from 0d36e34 to 52da7a7 Compare March 19, 2024 22:24
Constant::getNullValue(Op1->getType()));
}

if (match(Op0, m_AShr(m_Specific(Op1), m_APInt(Shift))) && Shift->ugt(0) &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: !Shift->isZero()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not fixed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it is fixed now.

return new ICmpInst(ICmpInst::getSwappedPredicate(Pred), Op1,
Constant::getNullValue(Op1->getType()));
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: There is a bit of redundancy in that each of the return cases is identical, but at least imo its fine.

@goldsteinn
Copy link
Contributor

LGTM, wait on one more approval.

@Poseydon42 Poseydon42 requested a review from dtcxzyw March 23, 2024 19:03
@Poseydon42
Copy link
Contributor Author

Ping @nikic @dtcxzyw

dtcxzyw added a commit to dtcxzyw/llvm-opt-benchmark that referenced this pull request Mar 28, 2024
@Poseydon42 Poseydon42 force-pushed the icmp-div-by-constant-compare-with-dividend branch from 52da7a7 to 51d44fd Compare March 28, 2024 17:37
@Poseydon42 Poseydon42 requested a review from dtcxzyw March 28, 2024 21:42
Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Poseydon42 Poseydon42 force-pushed the icmp-div-by-constant-compare-with-dividend branch from 51d44fd to c203d1f Compare April 10, 2024 22:28
@Poseydon42
Copy link
Contributor Author

Thanks nikic, looks like it is now ready for another review.
Ping @dtcxzyw

@nikic nikic merged commit 462e102 into llvm:main Apr 11, 2024
4 checks passed
Copy link

@Poseydon42 Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested
by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as
the builds can include changes from many authors. It is not uncommon for your
change to be included in a build that fails due to someone else's changes, or
infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself.
This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

missed fold, x / c < x => x > 0
5 participants