Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[InstCombine] Treat umax as select(icmp eq x, 0), 1, x) in binop select fold. #65978

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

davemgreen
Copy link
Collaborator

@davemgreen davemgreen commented Sep 11, 2023

There is an existing instcombine for folding:
(A ? B : C) binop (A ? E : F) -> A ? (B binop E) : (C binop F).
However this will not combine if the select (x>=1 ? x : 1) has
been converted to a umax(x, 1). This adds code to treat the umax
as a select, allowing binops to fold into the select.

This patch handles the more special case where the predicate is the limit
extreme (such as x==0 ? x : 1), meaning it will use a ==0 predicate as
opposed to the >= predicate usually applied to a umax. According to alive this
is valid so long as the variables are not undef, are known to be non-equal
(https://alive2.llvm.org/ce/z/VAYw3J) or known to be the same value
(https://alive2.llvm.org/ce/z/4TSd7v).

Here are some proofs for different min/maxes:
umax: https://alive2.llvm.org/ce/z/kLw4J9
umin: https://alive2.llvm.org/ce/z/nvPa7p
smin: https://alive2.llvm.org/ce/z/HS8V7V
smax: https://alive2.llvm.org/ce/z/D2ocju

@llvmbot
Copy link
Collaborator

llvmbot commented Sep 11, 2023

@llvm/pr-subscribers-llvm-transforms

Changes

There is an existing instcombine in SimplifySelectsFeedingBinaryOp for folding (A ? B : C) binop (A ? E : F) -> A ? (B binop E) : (C binop F). However this will not combine if the select (x>=1 ? x : 1) has been converted to a umax(x, 1). This adds code to treat the umax as a select(x==0,1,x) if it matches the other operand, allowing binops to fold into the select/umax pair.

Full diff: https://github.com/llvm/llvm-project/pull/65978.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/InstCombine/InstructionCombining.cpp (+20)
  • (modified) llvm/test/Transforms/InstCombine/binop-select.ll (+58)
diff --git a/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp b/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
index ed8709ea4c051f7..dc551a8dba1900f 100644
--- a/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
@@ -1103,9 +1103,29 @@ Value *InstCombinerImpl::SimplifySelectsFeedingBinaryOp(BinaryOperator &I,
   Value *A, *B, *C, *D, *E, *F;
   bool LHSIsSelect = match(LHS, m_Select(m_Value(A), m_Value(B), m_Value(C)));
   bool RHSIsSelect = match(RHS, m_Select(m_Value(D), m_Value(E), m_Value(F)));
+
   if (!LHSIsSelect && !RHSIsSelect)
     return nullptr;
 
+  // Treat umax(x, 1) as select(icmp(eq, x, 0), 1, x), if it matches the other
+  // predicate.
+  auto TryMatchSelectFromUMax = [](bool LHSIsSelect, Value *RHS, bool &RHSIsSelect,
+                                   Value *A, Value *B, Value *C, Value *&D,
+                                   Value *&E, Value *&F) {
+    CmpInst::Predicate Pred;
+    Value *X;
+    if (LHSIsSelect && !RHSIsSelect &&
+        match(RHS, m_UMax(m_Value(X), m_One())) &&
+        match(A, m_c_ICmp(Pred, m_Specific(X), m_Zero())) &&
+        Pred == ICmpInst::ICMP_EQ) {
+      RHSIsSelect = true;
+      match(RHS, m_UMax(m_Value(F), m_Value(E)));
+      D = A;
+    }
+  };
+  TryMatchSelectFromUMax(LHSIsSelect, RHS, RHSIsSelect, A, B, C, D, E, F);
+  TryMatchSelectFromUMax(RHSIsSelect, LHS, LHSIsSelect, D, E, F, A, B, C);
+
   FastMathFlags FMF;
   BuilderTy::FastMathFlagGuard Guard(Builder);
   if (isa(&I)) {
diff --git a/llvm/test/Transforms/InstCombine/binop-select.ll b/llvm/test/Transforms/InstCombine/binop-select.ll
index a59e19897f061d1..b064b3447e8135b 100644
--- a/llvm/test/Transforms/InstCombine/binop-select.ll
+++ b/llvm/test/Transforms/InstCombine/binop-select.ll
@@ -403,3 +403,61 @@ define i32 @ashr_sel_op1_use(i1 %b) {
   %r = ashr i32 -2, %s
   ret i32 %r
 }
+
+
+define i32 @umax_as_select_sub(i32 %a) {
+; CHECK-LABEL: @umax_as_select_sub(
+; CHECK-NEXT:    [[C_NOT:%.*]] = icmp eq i32 [[A:%.*]], 0
+; CHECK-NEXT:    [[TMP1:%.*]] = sub i32 2, [[A]]
+; CHECK-NEXT:    [[B:%.*]] = select i1 [[C_NOT]], i32 -1, i32 [[TMP1]]
+; CHECK-NEXT:    ret i32 [[B]]
+;
+  %c = icmp ugt i32 %a, 0
+  %s = select i1 %c, i32 2, i32 0
+  %m = call i32 @llvm.umax.i32(i32 %a, i32 1)
+  %b = sub i32 %s, %m
+  ret i32 %b
+}
+
+define i32 @umax_as_select_sub_c(i32 %a) {
+; CHECK-LABEL: @umax_as_select_sub_c(
+; CHECK-NEXT:    [[C_NOT:%.*]] = icmp eq i32 [[A:%.*]], 0
+; CHECK-NEXT:    [[TMP1:%.*]] = add i32 [[A]], -2
+; CHECK-NEXT:    [[B:%.*]] = select i1 [[C_NOT]], i32 1, i32 [[TMP1]]
+; CHECK-NEXT:    ret i32 [[B]]
+;
+  %c = icmp ugt i32 %a, 0
+  %s = select i1 %c, i32 2, i32 0
+  %m = call i32 @llvm.umax.i32(i32 %a, i32 1)
+  %b = sub i32 %m, %s
+  ret i32 %b
+}
+
+define i32 @umax_as_select_add(i32 %a, i32 %x, i32 %y) {
+; CHECK-LABEL: @umax_as_select_add(
+; CHECK-NEXT:    [[C_NOT:%.*]] = icmp eq i32 [[A:%.*]], 0
+; CHECK-NEXT:    [[S:%.*]] = select i1 [[C_NOT]], i32 [[Y:%.*]], i32 [[X:%.*]]
+; CHECK-NEXT:    [[M:%.*]] = call i32 @llvm.umax.i32(i32 [[A]], i32 1)
+; CHECK-NEXT:    [[B:%.*]] = add i32 [[S]], [[M]]
+; CHECK-NEXT:    ret i32 [[B]]
+;
+  %c = icmp ugt i32 %a, 0
+  %s = select i1 %c, i32 %x, i32 %y
+  %m = call i32 @llvm.umax.i32(i32 %a, i32 1)
+  %b = add i32 %s, %m
+  ret i32 %b
+}
+
+define i32 @umax_as_select_mul(i32 %a) {
+; CHECK-LABEL: @umax_as_select_mul(
+; CHECK-NEXT:    [[TMP1:%.*]] = shl i32 [[A:%.*]], 1
+; CHECK-NEXT:    ret i32 [[TMP1]]
+;
+  %c = icmp ugt i32 %a, 0
+  %s = select i1 %c, i32 2, i32 0
+  %m = select i1 %c, i32 %a, i32 1
+  %b = mul i32 %s, %m
+  ret i32 %b
+}
+
+declare i32 @llvm.umax.i32(i32, i32)

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this specific to umin(1)? Doesn't the same work for, say, umin(2)? https://alive2.llvm.org/ce/z/gQqtpt

Or really any min/max intrinsic decomposed into select form.

@davemgreen
Copy link
Collaborator Author

Hi - thanks for taking a look and sorry this took a while to get back to. This was causing some regressions in my motivating case due to knock-on register allocation. That should hopefully be resolved now and so it leads to an improvement as you would expect.

I've been looking at trying to generalize it to more cases and deal with the undef. I'll update this patch with the more special case, which the general case can be added to but is more complex. I think I have it working, but have to check through some of the details.

Unfortunately it appears some cases are not valid with undef arguments. I've tried to limit it to valid cases (as opposed to adding freeze). I know that undef can be odd at times but I'm not sure it should be this odd.

…ct fold.

There is an existing instcombine for folding:
`(A ? B : C) binop (A ? E : F) -> A ? (B binop E) : (C binop F)`.
However this will not combine if the select `(x>=1 ? x : 1)` has
been converted to a `umax(x, 1)`. This adds code to treat the umax
as a select, allowing binops to fold into the select.

This patch handles the more special case where the predicate is the limit
extreme (such as `x==0 ? x : 1`), meaning it will use a ==0 predicate as
opposed to the >= predicate usually applied to a umax. According to alive this
is valid so long as the variables are not undef, are known to be non-equal
(https://alive2.llvm.org/ce/z/VAYw3J) or known to be the same value
(https://alive2.llvm.org/ce/z/4TSd7v).

Here are some proofs for different min/maxes:
umax: https://alive2.llvm.org/ce/z/kLw4J9
umin: https://alive2.llvm.org/ce/z/nvPa7p
smin: https://alive2.llvm.org/ce/z/HS8V7V
smax: https://alive2.llvm.org/ce/z/D2ocju
davemgreen added a commit to davemgreen/llvm-project that referenced this pull request Nov 16, 2023
This is a generalization to the code added in llvm#65978, attempting to handle more
with predicates and min/max that match. It was originally trying to use
matchDecomposedSelectPattern but I replaced that with just checking the
predicates and SPF match. The opposite conditions are also handled, which just
inverts the order of the E/F variables.

TODO: Add some proofs.
@davemgreen
Copy link
Collaborator Author

The more general cases will hopefully be something like davemgreen@7ba90fb.

@nikic
Copy link
Contributor

nikic commented Nov 16, 2023

I'm not a fan of the special undef handling here. We should either insert the necessary freeze, or, if that covers your motivating case, we can start with guarding the transform under isGuaranteedNotToBeUndefOrPoison().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants