Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[InstCombine] Handle ICMP_EQ when flooring by constant two #73706

Conversation

antoniofrighetto
Copy link
Contributor

@antoniofrighetto antoniofrighetto commented Nov 28, 2023

Support icmp eq when reducing signed divisions by power of two to arithmetic shift right, as icmp ugt might have been canonicalized into icmp eq by the time we fold additions into ashr.

Shouldn't be needed to check directly for constant -127, as we already check that *MaskC != (SMin | (*DivC - 1)).
Alive2: https://alive2.llvm.org/ce/z/d0q2px

Fixes: #73622.

@llvmbot
Copy link
Collaborator

llvmbot commented Nov 28, 2023

@llvm/pr-subscribers-llvm-transforms

Author: Antonio Frighetto (antoniofrighetto)

Changes

Support icmp eq when reducing signed divisions by power of two to arithmetic shift right, as icmp ugt might have been canonicalized into icmp eq by the time we fold additions into ashr.

Shouldn't be needed to check directly for constant -127, as we already check that *MaskC != (SMin | (*DivC - 1)).

Fixes: #73622.


Full diff: https://github.com/llvm/llvm-project/pull/73706.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp (+12-5)
  • (modified) llvm/test/Transforms/InstCombine/add.ll (+13)
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp b/llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
index 24a166906f1f46d..856fc388f855e7e 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
@@ -1234,14 +1234,21 @@ static Instruction *foldAddToAshr(BinaryOperator &Add) {
     return nullptr;
 
   // Rounding is done by adding -1 if the dividend (X) is negative and has any
-  // low bits set. The canonical pattern for that is an "ugt" compare with SMIN:
-  // sext (icmp ugt (X & (DivC - 1)), SMIN)
-  const APInt *MaskC;
+  // low bits set. It recognizes two canonical patterns:
+  // 1. For an 'ugt' cmp with the signed minimum value (SMIN), the
+  //    pattern is: sext (icmp ugt (X & (DivC - 1)), SMIN).
+  // 2. For an 'eq' cmp, the pattern is: sext (icmp eq (SMIN + 1), SMIN + 1).
+  // Note that, by the time we end up here, if possible, ugt has been
+  // canonicalized into eq.
+  const APInt *MaskC, *MaskCCmp;
   ICmpInst::Predicate Pred;
   if (!match(Add.getOperand(1),
              m_SExt(m_ICmp(Pred, m_And(m_Specific(X), m_APInt(MaskC)),
-                           m_SignMask()))) ||
-      Pred != ICmpInst::ICMP_UGT)
+                           m_APInt(MaskCCmp)))))
+    return nullptr;
+
+  if ((Pred != ICmpInst::ICMP_UGT || !MaskCCmp->isSignMask()) &&
+      (Pred != ICmpInst::ICMP_EQ || *MaskCCmp != *MaskC))
     return nullptr;
 
   APInt SMin = APInt::getSignedMinValue(Add.getType()->getScalarSizeInBits());
diff --git a/llvm/test/Transforms/InstCombine/add.ll b/llvm/test/Transforms/InstCombine/add.ll
index c35d2af42a4beae..eb986137c2cf9a9 100644
--- a/llvm/test/Transforms/InstCombine/add.ll
+++ b/llvm/test/Transforms/InstCombine/add.ll
@@ -2700,6 +2700,19 @@ define i32 @floor_sdiv(i32 %x) {
   ret i32 %r
 }
 
+define i8 @floor_sdiv_by_2(i8 %x) {
+; CHECK-LABEL: @floor_sdiv_by_2(
+; CHECK-NEXT:    [[RV:%.*]] = ashr i8 [[X:%.*]], 1
+; CHECK-NEXT:    ret i8 [[RV]]
+;
+  %div = sdiv i8 %x, 2
+  %and = and i8 %x, -127
+  %icmp = icmp eq i8 %and, -127
+  %sext = sext i1 %icmp to i8
+  %rv = add nsw i8 %div, %sext
+  ret i8 %rv
+}
+
 ; vectors work too and commute is handled by complexity-based canonicalization
 
 define <2 x i32> @floor_sdiv_vec_commute(<2 x i32> %x) {

@antoniofrighetto antoniofrighetto force-pushed the feature/instcombine_floor_div_by_two branch from d0c7665 to f34cb2d Compare November 29, 2023 08:07
@nikic
Copy link
Contributor

nikic commented Nov 29, 2023

Can you please add an alive2 proof?

Copy link
Member

@dtcxzyw dtcxzyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@nikic
Copy link
Contributor

nikic commented Nov 29, 2023

As far as I can tell, your optimization is not limited for DivC == 2, but I don't think it's valid for other cases? I've tried to transfer the generic pre-conditions in this proof, maybe I did something wrong: https://alive2.llvm.org/ce/z/XaWD7N

@antoniofrighetto
Copy link
Contributor Author

antoniofrighetto commented Nov 29, 2023

That was my concern too, but I couldn't find any instance that would violate this, so I believed restricting it to DivC == 2 was not necessary. Preconditions in the proof look correct to me.

@nikic
Copy link
Contributor

nikic commented Nov 29, 2023

Can you please also add a negative test with DivC != 2?

@antoniofrighetto antoniofrighetto force-pushed the feature/instcombine_floor_div_by_two branch from ad492c0 to 63b3b1e Compare November 29, 2023 14:10
@antoniofrighetto
Copy link
Contributor Author

Reviewed tests, thanks.

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Support `icmp eq` when reducing signed divisions by power of 2 to
arithmetic shift right, as `icmp ugt` may have been canonicalized
into `icmp eq` by the time additions are folded into `ashr`.

Fixes: llvm#73622.

Proof: https://alive2.llvm.org/ce/z/8-eUdb.
@antoniofrighetto antoniofrighetto force-pushed the feature/instcombine_floor_div_by_two branch from bd8f0a6 to 7d5f79f Compare November 30, 2023 10:57
@antoniofrighetto antoniofrighetto merged commit 7d5f79f into llvm:main Nov 30, 2023
1 of 3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[InstCombine] Missing optimization for div_euclid(2)
4 participants