[VPlan] Merge `fcmp uno` feeding AnyOf. #166823

fhahn · 2025-11-06T18:40:04Z

Fold
any-of (fcmp uno %A, %A), (fcmp uno %B, %B), ... ->
any-of (fcmp uno %A, %B), ...

This pattern is generated to check if any vector lane is NaN, and combining multiple compares is beneficial on architectures that have dedicated instructions.

Alive2 Proof: https://alive2.llvm.org/ce/z/vA_aoM

Combine suggested as part of #161735

llvmbot · 2025-11-06T18:40:38Z

@llvm/pr-subscribers-vectorizers

@llvm/pr-subscribers-llvm-transforms

Author: Florian Hahn (fhahn)

Changes

Fold
any-of (fcmp uno %A, %A), (fcmp uno %B, %B), ... ->
any-of (fcmp uno %A, %B), ...

This pattern is generated to check if any vector lane is NaN, and combining multiple compares is beneficial on architectures that have dedicated instructions.

Alive2 Proof: https://alive2.llvm.org/ce/z/vA_aoM

Combine suggested as part of #161735

Full diff: https://github.com/llvm/llvm-project/pull/166823.diff

5 Files Affected:

(modified) llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h (+4)
(modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+23)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/fmax-without-fast-math-flags.ll (+3-6)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/fmin-without-fast-math-flags.ll (+3-6)
(modified) llvm/test/Transforms/LoopVectorize/fmax-without-fast-math-flags-interleave.ll (+3-6)

diff --git a/llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h b/llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h
index b57c44872c1b6..8b2931637113f 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h
@@ -417,6 +417,10 @@ m_BranchOnCount(const Op0_t &Op0, const Op1_t &Op1) {
   return m_VPInstruction<VPInstruction::BranchOnCount>(Op0, Op1);
 }
 
+inline VPInstruction_match<VPInstruction::AnyOf> m_AnyOf() {
+  return m_VPInstruction<VPInstruction::AnyOf>();
+}
+
 template <typename Op0_t>
 inline VPInstruction_match<VPInstruction::AnyOf, Op0_t>
 m_AnyOf(const Op0_t &Op0) {
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index 48bd697397f41..937c420dcd6ee 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -1221,6 +1221,29 @@ static void simplifyRecipe(VPSingleDefRecipe *Def, VPTypeAnalysis &TypeInfo) {
     }
   }
 
+  // Fold any-of (fcmp uno %A, %A), (fcmp uno %B, %B), ... ->
+  //      any-of (fcmp uno %A, %B), ...
+  if (match(Def, m_AnyOf()) && Def->getNumOperands() % 2 == 0) {
+    SmallVector<VPValue *, 4> NewOps;
+    unsigned NumOps = Def->getNumOperands();
+    for (unsigned I = 0; I < NumOps; I += 2) {
+      VPValue *A, *B;
+      if (!match(
+              Def->getOperand(I),
+              m_SpecificCmp(CmpInst::FCMP_UNO, m_VPValue(A), m_Deferred(A))) ||
+          !match(Def->getOperand(I + 1),
+                 m_SpecificCmp(CmpInst::FCMP_UNO, m_VPValue(B), m_Deferred(B))))
+        break;
+
+      NewOps.push_back(Builder.createFCmp(CmpInst::FCMP_UNO, A, B));
+    }
+
+    if (NewOps.size() == NumOps / 2) {
+      VPValue *NewAnyOf = Builder.createNaryOp(VPInstruction::AnyOf, NewOps);
+      return Def->replaceAllUsesWith(NewAnyOf);
+    }
+  }
+
   // Remove redundant DerviedIVs, that is 0 + A * 1 -> A and 0 + 0 * x -> 0.
   if ((match(Def, m_DerivedIV(m_ZeroInt(), m_VPValue(A), m_One())) ||
        match(Def, m_DerivedIV(m_ZeroInt(), m_ZeroInt(), m_VPValue()))) &&
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/fmax-without-fast-math-flags.ll b/llvm/test/Transforms/LoopVectorize/AArch64/fmax-without-fast-math-flags.ll
index 3c83c01929aae..f4e5085c604f6 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/fmax-without-fast-math-flags.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/fmax-without-fast-math-flags.ll
@@ -60,12 +60,9 @@ define float @fmaxnum(ptr %src, i64 %n) {
 ; CHECK-NEXT:    [[TMP8]] = call <4 x float> @llvm.maxnum.v4f32(<4 x float> [[VEC_PHI1]], <4 x float> [[WIDE_LOAD2]])
 ; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[IV]], 8
 ; CHECK-NEXT:    [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
-; CHECK-NEXT:    [[TMP3:%.*]] = fcmp uno <4 x float> [[WIDE_LOAD]], [[WIDE_LOAD]]
-; CHECK-NEXT:    [[TMP4:%.*]] = fcmp uno <4 x float> [[WIDE_LOAD2]], [[WIDE_LOAD2]]
-; CHECK-NEXT:    [[TMP18:%.*]] = freeze <4 x i1> [[TMP3]]
-; CHECK-NEXT:    [[TMP15:%.*]] = freeze <4 x i1> [[TMP4]]
-; CHECK-NEXT:    [[TMP5:%.*]] = or <4 x i1> [[TMP18]], [[TMP15]]
-; CHECK-NEXT:    [[TMP6:%.*]] = call i1 @llvm.vector.reduce.or.v4i1(<4 x i1> [[TMP5]])
+; CHECK-NEXT:    [[TMP18:%.*]] = fcmp uno <4 x float> [[WIDE_LOAD]], [[WIDE_LOAD2]]
+; CHECK-NEXT:    [[TMP19:%.*]] = freeze <4 x i1> [[TMP18]]
+; CHECK-NEXT:    [[TMP6:%.*]] = call i1 @llvm.vector.reduce.or.v4i1(<4 x i1> [[TMP19]])
 ; CHECK-NEXT:    [[TMP10:%.*]] = or i1 [[TMP6]], [[TMP9]]
 ; CHECK-NEXT:    br i1 [[TMP10]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/fmin-without-fast-math-flags.ll b/llvm/test/Transforms/LoopVectorize/AArch64/fmin-without-fast-math-flags.ll
index 711a9cd03ac15..89b5d1395a4a0 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/fmin-without-fast-math-flags.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/fmin-without-fast-math-flags.ll
@@ -60,12 +60,9 @@ define float @fminnum(ptr %src, i64 %n) {
 ; CHECK-NEXT:    [[TMP8]] = call <4 x float> @llvm.minnum.v4f32(<4 x float> [[VEC_PHI1]], <4 x float> [[WIDE_LOAD2]])
 ; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[IV]], 8
 ; CHECK-NEXT:    [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
-; CHECK-NEXT:    [[TMP3:%.*]] = fcmp uno <4 x float> [[WIDE_LOAD]], [[WIDE_LOAD]]
-; CHECK-NEXT:    [[TMP4:%.*]] = fcmp uno <4 x float> [[WIDE_LOAD2]], [[WIDE_LOAD2]]
-; CHECK-NEXT:    [[TMP15:%.*]] = freeze <4 x i1> [[TMP3]]
-; CHECK-NEXT:    [[TMP18:%.*]] = freeze <4 x i1> [[TMP4]]
-; CHECK-NEXT:    [[TMP5:%.*]] = or <4 x i1> [[TMP15]], [[TMP18]]
-; CHECK-NEXT:    [[TMP6:%.*]] = call i1 @llvm.vector.reduce.or.v4i1(<4 x i1> [[TMP5]])
+; CHECK-NEXT:    [[TMP15:%.*]] = fcmp uno <4 x float> [[WIDE_LOAD]], [[WIDE_LOAD2]]
+; CHECK-NEXT:    [[TMP19:%.*]] = freeze <4 x i1> [[TMP15]]
+; CHECK-NEXT:    [[TMP6:%.*]] = call i1 @llvm.vector.reduce.or.v4i1(<4 x i1> [[TMP19]])
 ; CHECK-NEXT:    [[TMP10:%.*]] = or i1 [[TMP6]], [[TMP9]]
 ; CHECK-NEXT:    br i1 [[TMP10]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
diff --git a/llvm/test/Transforms/LoopVectorize/fmax-without-fast-math-flags-interleave.ll b/llvm/test/Transforms/LoopVectorize/fmax-without-fast-math-flags-interleave.ll
index af648df9fc5c7..eab3492ebe2fa 100644
--- a/llvm/test/Transforms/LoopVectorize/fmax-without-fast-math-flags-interleave.ll
+++ b/llvm/test/Transforms/LoopVectorize/fmax-without-fast-math-flags-interleave.ll
@@ -60,12 +60,9 @@ define float @fmaxnum(ptr %src, i64 %n) {
 ; CHECK-NEXT:    [[TMP8]] = call <4 x float> @llvm.maxnum.v4f32(<4 x float> [[VEC_PHI1]], <4 x float> [[WIDE_LOAD2]])
 ; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[IV]], 8
 ; CHECK-NEXT:    [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
-; CHECK-NEXT:    [[TMP3:%.*]] = fcmp uno <4 x float> [[WIDE_LOAD]], [[WIDE_LOAD]]
-; CHECK-NEXT:    [[TMP4:%.*]] = fcmp uno <4 x float> [[WIDE_LOAD2]], [[WIDE_LOAD2]]
-; CHECK-NEXT:    [[TMP15:%.*]] = freeze <4 x i1> [[TMP3]]
-; CHECK-NEXT:    [[TMP18:%.*]] = freeze <4 x i1> [[TMP4]]
-; CHECK-NEXT:    [[TMP5:%.*]] = or <4 x i1> [[TMP15]], [[TMP18]]
-; CHECK-NEXT:    [[TMP6:%.*]] = call i1 @llvm.vector.reduce.or.v4i1(<4 x i1> [[TMP5]])
+; CHECK-NEXT:    [[TMP15:%.*]] = fcmp uno <4 x float> [[WIDE_LOAD]], [[WIDE_LOAD2]]
+; CHECK-NEXT:    [[TMP19:%.*]] = freeze <4 x i1> [[TMP15]]
+; CHECK-NEXT:    [[TMP6:%.*]] = call i1 @llvm.vector.reduce.or.v4i1(<4 x i1> [[TMP19]])
 ; CHECK-NEXT:    [[TMP10:%.*]] = or i1 [[TMP6]], [[TMP9]]
 ; CHECK-NEXT:    br i1 [[TMP10]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:

Fold any-of (fcmp uno %A, %A), (fcmp uno %B, %B), ... -> any-of (fcmp uno %A, %B), ... This pattern is generated to check if any vector lane is NaN, and combining multiple compares is beneficial on architectures that have dedicated instructions. Alive2 Proof: https://alive2.llvm.org/ce/z/vA_aoM

Mel-Chen · 2025-11-14T09:08:27Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+    if (NewOps.size() == NumOps / 2) {
+      VPValue *NewAnyOf = Builder.createNaryOp(VPInstruction::AnyOf, NewOps);
+      return Def->replaceAllUsesWith(NewAnyOf);
+    }


Should we clean up the recipes in NewOps if NewOps.size() != NumOps / 2?

The latest update should always use the newly created recipes, thanks

ayalz · 2025-11-15T12:17:11Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+  // Fold any-of (fcmp uno %A, %A), (fcmp uno %B, %B), ... ->
+  //      any-of (fcmp uno %A, %B), ...


Perhaps better (than all-adjacent-pairs-or-nothing is) to search for a single pair of any two operands one doing (fcmp uno %A, %A) and the other (fcmp uno %B, %B) and replace them by one operand doing (fcmp uno %A, %B), to be repeated while found?

Tests below seem to cover the numOperands == 2 case only?

Updated to keep track of unpaired compares, and merge the next matching pair. Should be tested now with tests with IC = 3 and IC =5, thanks

This adds additional test coverage for folding FCMP uno (#166823)

…r fmaxnum. This adds additional test coverage for folding FCMP uno (llvm/llvm-project#166823)

fhahn requested review from Mel-Chen, aniragil, ayalz and rengolin November 6, 2025 18:40

llvmbot added vectorizers llvm:transforms labels Nov 6, 2025

fhahn force-pushed the vplan-merge-fcmp-uno branch from c707451 to c83adad Compare November 8, 2025 21:47

Mel-Chen reviewed Nov 14, 2025

View reviewed changes

ayalz reviewed Nov 15, 2025

View reviewed changes

fhahn added a commit that referenced this pull request Nov 15, 2025

[LV] Add test with to check different interleave counts for fmaxnum.

59d2e93

This adds additional test coverage for folding FCMP uno (#166823)

Merge remote-tracking branch 'origin/main' into vplan-merge-fcmp-uno

65c171d

llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Nov 15, 2025

Automerge: [LV] Add test with to check different interleave counts fo…

908295e

…r fmaxnum. This adds additional test coverage for folding FCMP uno (llvm/llvm-project#166823)

!fixup look for unpaired ops

bde7098

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[VPlan] Merge `fcmp uno` feeding AnyOf. #166823

[VPlan] Merge `fcmp uno` feeding AnyOf. #166823

fhahn commented Nov 6, 2025

Uh oh!

llvmbot commented Nov 6, 2025 •

edited

Loading

Uh oh!

Mel-Chen Nov 14, 2025

Uh oh!

fhahn Nov 15, 2025

Uh oh!

ayalz Nov 15, 2025

Uh oh!

fhahn Nov 15, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

		// Fold any-of (fcmp uno %A, %A), (fcmp uno %B, %B), ... ->
		// any-of (fcmp uno %A, %B), ...

[VPlan] Merge fcmp uno feeding AnyOf. #166823

Are you sure you want to change the base?

[VPlan] Merge fcmp uno feeding AnyOf. #166823

Conversation

fhahn commented Nov 6, 2025

Uh oh!

llvmbot commented Nov 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Mel-Chen Nov 14, 2025

Choose a reason for hiding this comment

Uh oh!

fhahn Nov 15, 2025

Choose a reason for hiding this comment

Uh oh!

ayalz Nov 15, 2025

Choose a reason for hiding this comment

Uh oh!

fhahn Nov 15, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

[VPlan] Merge `fcmp uno` feeding AnyOf. #166823

[VPlan] Merge `fcmp uno` feeding AnyOf. #166823

llvmbot commented Nov 6, 2025 •

edited

Loading