
Conversation

LewisCrawford
Contributor

Add support for the new maximumnum and minimumnum intrinsics in various optimizations in InstSimplify.

Also, change the behavior of optimizing maxnum(sNaN, x) to simplify to qNaN instead of x to better match the LLVM IR spec, and add more tests for sNaN behavior for all 3 max/min intrinsic types.

@LewisCrawford LewisCrawford requested a review from nikic as a code owner May 12, 2025 17:02
@llvmbot llvmbot added llvm:instcombine Covers the InstCombine, InstSimplify and AggressiveInstCombine passes llvm:ir llvm:analysis Includes value tracking, cost tables and constant folding llvm:transforms labels May 12, 2025
@llvmbot
Member

llvmbot commented May 12, 2025

@llvm/pr-subscribers-backend-amdgpu
@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-llvm-analysis

@llvm/pr-subscribers-llvm-ir

Author: Lewis Crawford (LewisCrawford)

Changes

Add support for the new maximumnum and minimumnum intrinsics in various optimizations in InstSimplify.

Also, change the behavior of optimizing maxnum(sNaN, x) to simplify to qNaN instead of x to better match the LLVM IR spec, and add more tests for sNaN behavior for all 3 max/min intrinsic types.
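The NaN-handling table this patch implements can be sketched at the bit level, which keeps sNaN vs qNaN distinguishable without relying on host float behavior. This is an illustrative model only (the names `maxnum`/`maximumnum` here are plain Python functions, not the LLVM intrinsics), assuming IEEE-754 binary64 encodings:

```python
# Bit patterns for IEEE-754 binary64 values (illustrative constants).
QNAN = 0x7FF8000000000000  # quiet NaN: quiet bit (bit 51) set
SNAN = 0x7FF4000000000000  # signaling NaN: quiet bit clear, payload nonzero
ONE  = 0x3FF0000000000000  # 1.0

def is_nan(b):
    # NaN: exponent all-ones, mantissa nonzero.
    return (b >> 52) & 0x7FF == 0x7FF and b & ((1 << 52) - 1) != 0

def is_snan(b):
    return is_nan(b) and not (b >> 51) & 1

def quiet(b):
    # Set the quiet bit, as NaN propagation does.
    return b | (1 << 51)

def maxnum(x, y):
    """Model of llvm.maxnum after this patch: a qNaN operand acts as
    'missing data', but any sNaN operand yields a qNaN result."""
    if is_snan(x): return quiet(x)
    if is_snan(y): return quiet(y)
    if is_nan(x):  return y
    if is_nan(y):  return x
    return max(x, y)  # demo only: valid for non-negative finite patterns

def maximumnum(x, y):
    """Model of llvm.maximumnum: any NaN, quiet or signaling, acts as
    'missing data'; NaN is returned only when both operands are NaN."""
    if is_nan(x): return y if not is_nan(y) else quiet(x)
    if is_nan(y): return x
    return max(x, y)  # demo only: valid for non-negative finite patterns
```

This matches the folds in the patch: `maxnum(x, snan) -> qnan` while `maximumnum(x, snan) -> x`.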


Patch is 49.41 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/139581.diff

4 Files Affected:

  • (modified) llvm/include/llvm/IR/PatternMatch.h (+15)
  • (modified) llvm/lib/Analysis/InstructionSimplify.cpp (+35-14)
  • (modified) llvm/lib/Analysis/ValueTracking.cpp (+2)
  • (modified) llvm/test/Transforms/InstSimplify/fminmax-folds.ll (+873-132)
diff --git a/llvm/include/llvm/IR/PatternMatch.h b/llvm/include/llvm/IR/PatternMatch.h
index 6242a686e7bc0..c4d1bd095847d 100644
--- a/llvm/include/llvm/IR/PatternMatch.h
+++ b/llvm/include/llvm/IR/PatternMatch.h
@@ -707,10 +707,25 @@ m_SpecificInt_ICMP(ICmpInst::Predicate Predicate, const APInt &Threshold) {
 struct is_nan {
   bool isValue(const APFloat &C) const { return C.isNaN(); }
 };
+
+struct is_snan {
+  bool isValue(const APFloat &C) const { return C.isSignaling(); }
+};
+
+struct is_qnan {
+  bool isValue(const APFloat &C) const { return C.isNaN() && !C.isSignaling(); }
+};
+
 /// Match an arbitrary NaN constant. This includes quiet and signalling nans.
 /// For vectors, this includes constants with undefined elements.
 inline cstfp_pred_ty<is_nan> m_NaN() { return cstfp_pred_ty<is_nan>(); }
 
+/// Match quiet NaN constants, including vectors with undefined elements.
+inline cstfp_pred_ty<is_qnan> m_qNaN() { return cstfp_pred_ty<is_qnan>(); }
+
+/// Match signalling NaN constants, including vectors with undefined elements.
+inline cstfp_pred_ty<is_snan> m_sNaN() { return cstfp_pred_ty<is_snan>(); }
+
 struct is_nonnan {
   bool isValue(const APFloat &C) const { return !C.isNaN(); }
 };
diff --git a/llvm/lib/Analysis/InstructionSimplify.cpp b/llvm/lib/Analysis/InstructionSimplify.cpp
index 85e3be9cc45c3..49774bcb45585 100644
--- a/llvm/lib/Analysis/InstructionSimplify.cpp
+++ b/llvm/lib/Analysis/InstructionSimplify.cpp
@@ -6415,7 +6415,8 @@ static Value *foldMinMaxSharedOp(Intrinsic::ID IID, Value *Op0, Value *Op1) {
 static Value *foldMinimumMaximumSharedOp(Intrinsic::ID IID, Value *Op0,
                                          Value *Op1) {
   assert((IID == Intrinsic::maxnum || IID == Intrinsic::minnum ||
-          IID == Intrinsic::maximum || IID == Intrinsic::minimum) &&
+          IID == Intrinsic::maximum || IID == Intrinsic::minimum ||
+          IID == Intrinsic::maximumnum || IID == Intrinsic::minimumnum) &&
          "Unsupported intrinsic");
 
   auto *M0 = dyn_cast<IntrinsicInst>(Op0);
@@ -6711,7 +6712,16 @@ Value *llvm::simplifyBinaryIntrinsic(Intrinsic::ID IID, Type *ReturnType,
   case Intrinsic::maxnum:
   case Intrinsic::minnum:
   case Intrinsic::maximum:
-  case Intrinsic::minimum: {
+  case Intrinsic::minimum:
+  case Intrinsic::maximumnum:
+  case Intrinsic::minimumnum: {
+    // In several cases here, we deviate from exact IEEE 754 semantics
+    // to enable optimizations (as allowed by the LLVM IR spec).
+    //
+    // For instance, we often return one of the arguments unmodified instead of
+    // inserting an llvm.canonicalize to transform input sNaNs into qNaNs or to
+    // respect any FTZ semantics, and sometimes assume all NaN inputs are qNaNs.
+
     // If the arguments are the same, this is a no-op.
     if (Op0 == Op1)
       return Op0;
@@ -6725,32 +6735,43 @@ Value *llvm::simplifyBinaryIntrinsic(Intrinsic::ID IID, Type *ReturnType,
       return Op0;
 
     bool PropagateNaN = IID == Intrinsic::minimum || IID == Intrinsic::maximum;
-    bool IsMin = IID == Intrinsic::minimum || IID == Intrinsic::minnum;
-
-    // minnum(X, nan) -> X
-    // maxnum(X, nan) -> X
-    // minimum(X, nan) -> nan
-    // maximum(X, nan) -> nan
-    if (match(Op1, m_NaN()))
-      return PropagateNaN ? propagateNaN(cast<Constant>(Op1)) : Op0;
+    bool PropagateSNaN = IID == Intrinsic::minnum || IID == Intrinsic::maxnum;
+    bool IsMin = IID == Intrinsic::minimum || IID == Intrinsic::minnum ||
+                 IID == Intrinsic::minimumnum;
+
+    // minnum(x, qnan) -> x
+    // maxnum(x, qnan) -> x
+    // minnum(x, snan) -> qnan
+    // maxnum(x, snan) -> qnan
+    // minimum(X, nan) -> qnan
+    // maximum(X, nan) -> qnan
+    if (match(Op1, m_NaN())) {
+      if (PropagateNaN || (PropagateSNaN && match(Op1, m_sNaN())))
+        return propagateNaN(cast<Constant>(Op1));
+      return Op0;
+    }
 
     // In the following folds, inf can be replaced with the largest finite
     // float, if the ninf flag is set.
     const APFloat *C;
     if (match(Op1, m_APFloat(C)) &&
         (C->isInfinity() || (Call && Call->hasNoInfs() && C->isLargest()))) {
-      // minnum(X, -inf) -> -inf
-      // maxnum(X, +inf) -> +inf
+      // minnum(X, -inf) -> -inf (ignoring sNaN -> qNaN propagation)
+      // maxnum(X, +inf) -> +inf (ignoring sNaN -> qNaN propagation)
       // minimum(X, -inf) -> -inf if nnan
       // maximum(X, +inf) -> +inf if nnan
+      // minimumnum(X, -inf) -> -inf
+      // maximumnum(X, +inf) -> +inf
       if (C->isNegative() == IsMin &&
           (!PropagateNaN || (Call && Call->hasNoNaNs())))
         return ConstantFP::get(ReturnType, *C);
 
       // minnum(X, +inf) -> X if nnan
       // maxnum(X, -inf) -> X if nnan
-      // minimum(X, +inf) -> X
-      // maximum(X, -inf) -> X
+      // minimum(X, +inf) -> X (ignoring quieting of sNaNs)
+      // maximum(X, -inf) -> X (ignoring quieting of sNaNs)
+      // maximumnum(X, -inf) -> X if nnan
+      // minimumnum(X, +inf) -> X if nnan
       if (C->isNegative() != IsMin &&
           (PropagateNaN || (Call && Call->hasNoNaNs())))
         return Op0;
diff --git a/llvm/lib/Analysis/ValueTracking.cpp b/llvm/lib/Analysis/ValueTracking.cpp
index 3d403531cea2f..c01ebbce15804 100644
--- a/llvm/lib/Analysis/ValueTracking.cpp
+++ b/llvm/lib/Analysis/ValueTracking.cpp
@@ -9310,6 +9310,8 @@ Intrinsic::ID llvm::getInverseMinMaxIntrinsic(Intrinsic::ID MinMaxID) {
   case Intrinsic::minimum: return Intrinsic::maximum;
   case Intrinsic::maxnum: return Intrinsic::minnum;
   case Intrinsic::minnum: return Intrinsic::maxnum;
+  case Intrinsic::maximumnum: return Intrinsic::minimumnum;
+  case Intrinsic::minimumnum: return Intrinsic::maximumnum;
   default: llvm_unreachable("Unexpected intrinsic");
   }
 }
diff --git a/llvm/test/Transforms/InstSimplify/fminmax-folds.ll b/llvm/test/Transforms/InstSimplify/fminmax-folds.ll
index fff6cfd8a3b4b..0510a713dba0b 100644
--- a/llvm/test/Transforms/InstSimplify/fminmax-folds.ll
+++ b/llvm/test/Transforms/InstSimplify/fminmax-folds.ll
@@ -54,6 +54,70 @@ define float @test_minimum_const_nan(float %x) {
   ret float %r
 }
 
+define float @test_maximumnum_const_nan(float %x) {
+; CHECK-LABEL: @test_maximumnum_const_nan(
+; CHECK-NEXT:    ret float [[R:%.*]]
+;
+  %r = call float @llvm.maximumnum.f32(float %x, float 0x7fff000000000000)
+  ret float %r
+}
+
+define float @test_minimumnum_const_nan(float %x) {
+; CHECK-LABEL: @test_minimumnum_const_nan(
+; CHECK-NEXT:    ret float [[R:%.*]]
+;
+  %r = call float @llvm.minimumnum.f32(float %x, float 0x7fff000000000000)
+  ret float %r
+}
+
+define float @test_minnum_const_snan(float %x) {
+; CHECK-LABEL: @test_minnum_const_snan(
+; CHECK-NEXT:    ret float 0x7FFC000000000000
+;
+  %r = call float @llvm.minnum.f32(float %x, float 0x7ff4000000000000)
+  ret float %r
+}
+
+define float @test_maxnum_const_snan(float %x) {
+; CHECK-LABEL: @test_maxnum_const_snan(
+; CHECK-NEXT:    ret float 0x7FFC000000000000
+;
+  %r = call float @llvm.maxnum.f32(float %x, float 0x7ff4000000000000)
+  ret float %r
+}
+
+define float @test_maximum_const_snan(float %x) {
+; CHECK-LABEL: @test_maximum_const_snan(
+; CHECK-NEXT:    ret float 0x7FFC000000000000
+;
+  %r = call float @llvm.maximum.f32(float %x, float 0x7ff4000000000000)
+  ret float %r
+}
+
+define float @test_minimum_const_snan(float %x) {
+; CHECK-LABEL: @test_minimum_const_snan(
+; CHECK-NEXT:    ret float 0x7FFC000000000000
+;
+  %r = call float @llvm.minimum.f32(float %x, float 0x7ff4000000000000)
+  ret float %r
+}
+
+define float @test_maximumnum_const_snan(float %x) {
+; CHECK-LABEL: @test_maximumnum_const_snan(
+; CHECK-NEXT:    ret float [[X:%.*]]
+;
+  %r = call float @llvm.maximumnum.f32(float %x, float 0x7ff4000000000000)
+  ret float %r
+}
+
+define float @test_minimumnum_const_snan(float %x) {
+; CHECK-LABEL: @test_minimumnum_const_snan(
+; CHECK-NEXT:    ret float [[X:%.*]]
+;
+  %r = call float @llvm.minimumnum.f32(float %x, float 0x7ff4000000000000)
+  ret float %r
+}
+
 define float @test_minnum_const_inf(float %x) {
 ; CHECK-LABEL: @test_minnum_const_inf(
 ; CHECK-NEXT:    [[R:%.*]] = call float @llvm.minnum.f32(float [[X:%.*]], float 0x7FF0000000000000)
@@ -88,6 +152,23 @@ define float @test_minimum_const_inf(float %x) {
   ret float %r
 }
 
+define float @test_maximumnum_const_inf(float %x) {
+; CHECK-LABEL: @test_maximumnum_const_inf(
+; CHECK-NEXT:    ret float 0x7FF0000000000000
+;
+  %r = call float @llvm.maximumnum.f32(float %x, float 0x7ff0000000000000)
+  ret float %r
+}
+
+define float @test_minimumnum_const_inf(float %x) {
+; CHECK-LABEL: @test_minimumnum_const_inf(
+; CHECK-NEXT:    [[R:%.*]] = call float @llvm.minimumnum.f32(float [[X:%.*]], float 0x7FF0000000000000)
+; CHECK-NEXT:    ret float [[R]]
+;
+  %r = call float @llvm.minimumnum.f32(float %x, float 0x7ff0000000000000)
+  ret float %r
+}
+
 define float @test_minnum_const_neg_inf(float %x) {
 ; CHECK-LABEL: @test_minnum_const_neg_inf(
 ; CHECK-NEXT:    ret float 0xFFF0000000000000
@@ -122,6 +203,23 @@ define float @test_minimum_const_neg_inf(float %x) {
   ret float %r
 }
 
+define float @test_maximumnum_const_neg_inf(float %x) {
+; CHECK-LABEL: @test_maximumnum_const_neg_inf(
+; CHECK-NEXT:    [[X:%.*]] = call float @llvm.maximumnum.f32(float [[X1:%.*]], float 0xFFF0000000000000)
+; CHECK-NEXT:    ret float [[X]]
+;
+  %r = call float @llvm.maximumnum.f32(float %x, float 0xfff0000000000000)
+  ret float %r
+}
+
+define float @test_minimumnum_const_neg_inf(float %x) {
+; CHECK-LABEL: @test_minimumnum_const_neg_inf(
+; CHECK-NEXT:    ret float 0xFFF0000000000000
+;
+  %r = call float @llvm.minimumnum.f32(float %x, float 0xfff0000000000000)
+  ret float %r
+}
+
 define float @test_minnum_const_inf_nnan(float %x) {
 ; CHECK-LABEL: @test_minnum_const_inf_nnan(
 ; CHECK-NEXT:    ret float [[X:%.*]]
@@ -154,6 +252,22 @@ define float @test_minimum_const_inf_nnan(float %x) {
   ret float %r
 }
 
+define float @test_maximumnum_const_inf_nnan(float %x) {
+; CHECK-LABEL: @test_maximumnum_const_inf_nnan(
+; CHECK-NEXT:    ret float 0x7FF0000000000000
+;
+  %r = call nnan float @llvm.maximumnum.f32(float %x, float 0x7ff0000000000000)
+  ret float %r
+}
+
+define float @test_minimumnum_const_inf_nnan(float %x) {
+; CHECK-LABEL: @test_minimumnum_const_inf_nnan(
+; CHECK-NEXT:    ret float [[X:%.*]]
+;
+  %r = call nnan float @llvm.minimumnum.f32(float %x, float 0x7ff0000000000000)
+  ret float %r
+}
+
 define float @test_minnum_const_inf_nnan_comm(float %x) {
 ; CHECK-LABEL: @test_minnum_const_inf_nnan_comm(
 ; CHECK-NEXT:    ret float [[X:%.*]]
@@ -186,6 +300,22 @@ define float @test_minimum_const_inf_nnan_comm(float %x) {
   ret float %r
 }
 
+define float @test_maximumnum_const_inf_nnan_comm(float %x) {
+; CHECK-LABEL: @test_maximumnum_const_inf_nnan_comm(
+; CHECK-NEXT:    ret float 0x7FF0000000000000
+;
+  %r = call nnan float @llvm.maximumnum.f32(float 0x7ff0000000000000, float %x)
+  ret float %r
+}
+
+define float @test_minimumnum_const_inf_nnan_comm(float %x) {
+; CHECK-LABEL: @test_minimumnum_const_inf_nnan_comm(
+; CHECK-NEXT:    ret float [[X:%.*]]
+;
+  %r = call nnan float @llvm.minimumnum.f32(float 0x7ff0000000000000, float %x)
+  ret float %r
+}
+
 define <2 x float> @test_minnum_const_inf_nnan_comm_vec(<2 x float> %x) {
 ; CHECK-LABEL: @test_minnum_const_inf_nnan_comm_vec(
 ; CHECK-NEXT:    ret <2 x float> [[X:%.*]]
@@ -218,6 +348,22 @@ define <2 x float> @test_minimum_const_inf_nnan_comm_vec(<2 x float> %x) {
   ret <2 x float> %r
 }
 
+define <2 x float> @test_maximumnum_const_inf_nnan_comm_vec(<2 x float> %x) {
+; CHECK-LABEL: @test_maximumnum_const_inf_nnan_comm_vec(
+; CHECK-NEXT:    ret <2 x float> splat (float 0x7FF0000000000000)
+;
+  %r = call nnan <2 x float> @llvm.maximumnum.v2f32(<2 x float> <float 0x7ff0000000000000, float 0x7ff0000000000000>, <2 x float> %x)
+  ret <2 x float> %r
+}
+
+define <2 x float> @test_minimumnum_const_inf_nnan_comm_vec(<2 x float> %x) {
+; CHECK-LABEL: @test_minimumnum_const_inf_nnan_comm_vec(
+; CHECK-NEXT:    ret <2 x float> [[X:%.*]]
+;
+  %r = call nnan <2 x float> @llvm.minimumnum.v2f32(<2 x float> <float 0x7ff0000000000000, float 0x7ff0000000000000>, <2 x float> %x)
+  ret <2 x float> %r
+}
+
 define float @test_minnum_const_neg_inf_nnan(float %x) {
 ; CHECK-LABEL: @test_minnum_const_neg_inf_nnan(
 ; CHECK-NEXT:    ret float 0xFFF0000000000000
@@ -250,6 +396,22 @@ define float @test_minimum_const_neg_inf_nnan(float %x) {
   ret float %r
 }
 
+define float @test_maximumnum_const_neg_inf_nnan(float %x) {
+; CHECK-LABEL: @test_maximumnum_const_neg_inf_nnan(
+; CHECK-NEXT:    ret float [[X:%.*]]
+;
+  %r = call nnan float @llvm.maximumnum.f32(float %x, float 0xfff0000000000000)
+  ret float %r
+}
+
+define float @test_minimumnum_const_neg_inf_nnan(float %x) {
+; CHECK-LABEL: @test_minimumnum_const_neg_inf_nnan(
+; CHECK-NEXT:    ret float 0xFFF0000000000000
+;
+  %r = call nnan float @llvm.minimumnum.f32(float %x, float 0xfff0000000000000)
+  ret float %r
+}
+
 define float @test_minnum_const_max(float %x) {
 ; CHECK-LABEL: @test_minnum_const_max(
 ; CHECK-NEXT:    [[R:%.*]] = call float @llvm.minnum.f32(float [[X:%.*]], float 0x47EFFFFFE0000000)
@@ -286,6 +448,24 @@ define float @test_minimum_const_max(float %x) {
   ret float %r
 }
 
+define float @test_maximumnum_const_max(float %x) {
+; CHECK-LABEL: @test_maximumnum_const_max(
+; CHECK-NEXT:    [[R:%.*]] = call float @llvm.maximumnum.f32(float [[X:%.*]], float 0x47EFFFFFE0000000)
+; CHECK-NEXT:    ret float [[R]]
+;
+  %r = call float @llvm.maximumnum.f32(float %x, float 0x47efffffe0000000)
+  ret float %r
+}
+
+define float @test_minimumnum_const_max(float %x) {
+; CHECK-LABEL: @test_minimumnum_const_max(
+; CHECK-NEXT:    [[R:%.*]] = call float @llvm.minimumnum.f32(float [[X:%.*]], float 0x47EFFFFFE0000000)
+; CHECK-NEXT:    ret float [[R]]
+;
+  %r = call float @llvm.minimumnum.f32(float %x, float 0x47efffffe0000000)
+  ret float %r
+}
+
 define float @test_minnum_const_neg_max(float %x) {
 ; CHECK-LABEL: @test_minnum_const_neg_max(
 ; CHECK-NEXT:    [[R:%.*]] = call float @llvm.minnum.f32(float [[X:%.*]], float 0xC7EFFFFFE0000000)
@@ -322,6 +502,24 @@ define float @test_minimum_const_neg_max(float %x) {
   ret float %r
 }
 
+define float @test_maximumnum_const_neg_max(float %x) {
+; CHECK-LABEL: @test_maximumnum_const_neg_max(
+; CHECK-NEXT:    [[R:%.*]] = call float @llvm.maximumnum.f32(float [[X:%.*]], float 0xC7EFFFFFE0000000)
+; CHECK-NEXT:    ret float [[R]]
+;
+  %r = call float @llvm.maximumnum.f32(float %x, float 0xc7efffffe0000000)
+  ret float %r
+}
+
+define float @test_minimumnum_const_neg_max(float %x) {
+; CHECK-LABEL: @test_minimumnum_const_neg_max(
+; CHECK-NEXT:    [[R:%.*]] = call float @llvm.minimumnum.f32(float [[X:%.*]], float 0xC7EFFFFFE0000000)
+; CHECK-NEXT:    ret float [[R]]
+;
+  %r = call float @llvm.minimumnum.f32(float %x, float 0xc7efffffe0000000)
+  ret float %r
+}
+
 define float @test_minnum_const_max_ninf(float %x) {
 ; CHECK-LABEL: @test_minnum_const_max_ninf(
 ; CHECK-NEXT:    [[R:%.*]] = call ninf float @llvm.minnum.f32(float [[X:%.*]], float 0x47EFFFFFE0000000)
@@ -356,6 +554,23 @@ define float @test_minimum_const_max_ninf(float %x) {
   ret float %r
 }
 
+define float @test_maximumnum_const_max_ninf(float %x) {
+; CHECK-LABEL: @test_maximumnum_const_max_ninf(
+; CHECK-NEXT:    ret float 0x47EFFFFFE0000000
+;
+  %r = call ninf float @llvm.maximumnum.f32(float %x, float 0x47efffffe0000000)
+  ret float %r
+}
+
+define float @test_minimumnum_const_max_ninf(float %x) {
+; CHECK-LABEL: @test_minimumnum_const_max_ninf(
+; CHECK-NEXT:    [[R:%.*]] = call ninf float @llvm.minimumnum.f32(float [[X:%.*]], float 0x47EFFFFFE0000000)
+; CHECK-NEXT:    ret float [[R]]
+;
+  %r = call ninf float @llvm.minimumnum.f32(float %x, float 0x47efffffe0000000)
+  ret float %r
+}
+
 define float @test_minnum_const_neg_max_ninf(float %x) {
 ; CHECK-LABEL: @test_minnum_const_neg_max_ninf(
 ; CHECK-NEXT:    ret float 0xC7EFFFFFE0000000
@@ -390,6 +605,23 @@ define float @test_minimum_const_neg_max_ninf(float %x) {
   ret float %r
 }
 
+define float @test_maximumnum_const_neg_max_ninf(float %x) {
+; CHECK-LABEL: @test_maximumnum_const_neg_max_ninf(
+; CHECK-NEXT:    [[X:%.*]] = call ninf float @llvm.maximumnum.f32(float [[X1:%.*]], float 0xC7EFFFFFE0000000)
+; CHECK-NEXT:    ret float [[X]]
+;
+  %r = call ninf float @llvm.maximumnum.f32(float %x, float 0xc7efffffe0000000)
+  ret float %r
+}
+
+define float @test_minimumnum_const_neg_max_ninf(float %x) {
+; CHECK-LABEL: @test_minimumnum_const_neg_max_ninf(
+; CHECK-NEXT:    ret float 0xC7EFFFFFE0000000
+;
+  %r = call ninf float @llvm.minimumnum.f32(float %x, float 0xc7efffffe0000000)
+  ret float %r
+}
+
 define float @test_minnum_const_max_nnan_ninf(float %x) {
 ; CHECK-LABEL: @test_minnum_const_max_nnan_ninf(
 ; CHECK-NEXT:    ret float [[X:%.*]]
@@ -422,6 +654,22 @@ define float @test_minimum_const_max_nnan_ninf(float %x) {
   ret float %r
 }
 
+define float @test_maximumnum_const_max_nnan_ninf(float %x) {
+; CHECK-LABEL: @test_maximumnum_const_max_nnan_ninf(
+; CHECK-NEXT:    ret float 0x47EFFFFFE0000000
+;
+  %r = call nnan ninf float @llvm.maximumnum.f32(float %x, float 0x47efffffe0000000)
+  ret float %r
+}
+
+define float @test_minimumnum_const_max_nnan_ninf(float %x) {
+; CHECK-LABEL: @test_minimumnum_const_max_nnan_ninf(
+; CHECK-NEXT:    ret float [[X:%.*]]
+;
+  %r = call nnan ninf float @llvm.minimumnum.f32(float %x, float 0x47efffffe0000000)
+  ret float %r
+}
+
 define float @test_minnum_const_neg_max_nnan_ninf(float %x) {
 ; CHECK-LABEL: @test_minnum_const_neg_max_nnan_ninf(
 ; CHECK-NEXT:    ret float 0xC7EFFFFFE0000000
@@ -454,8 +702,24 @@ define float @test_minimum_const_neg_max_nnan_ninf(float %x) {
   ret float %r
 }
 
+define float @test_maximumnum_const_neg_max_nnan_ninf(float %x) {
+; CHECK-LABEL: @test_maximumnum_const_neg_max_nnan_ninf(
+; CHECK-NEXT:    ret float [[X:%.*]]
+;
+  %r = call nnan ninf float @llvm.maximumnum.f32(float %x, float 0xc7efffffe0000000)
+  ret float %r
+}
+
+define float @test_minimumnum_const_neg_max_nnan_ninf(float %x) {
+; CHECK-LABEL: @test_minimumnum_const_neg_max_nnan_ninf(
+; CHECK-NEXT:    ret float 0xC7EFFFFFE0000000
+;
+  %r = call nnan ninf float @llvm.minimumnum.f32(float %x, float 0xc7efffffe0000000)
+  ret float %r
+}
+
 ; From the LangRef for minnum/maxnum:
-; "If either operand is a NaN, returns the other non-NaN operand."
+; "If either operand is a qNaN, returns the other non-NaN operand. Returns NaN only if both operands are NaN or if either operand is sNaN."
 
 define double @maxnum_nan_op0(double %x) {
 ; CHECK-LABEL: @maxnum_nan_op0(
@@ -521,115 +785,99 @@ define <2 x double> @minnum_nan_op1_vec(<2 x double> %x) {
   ret <2 x double> %r
 }
 
-define float @maxnum_undef_op1(float %x) {
-; CHECK-LABEL: @maxnum_undef_op1(
-; CHECK-NEXT:    ret float [[X:%.*]]
-;
-  %val = call float @llvm.maxnum.f32(float %x, float undef)
-  ret float %val
-}
-
-define float @maxnum_poison_op1(float %x) {
-; CHECK-LABEL: @maxnum_poison_op1(
-; CHECK-NEXT:    ret float [[X:%.*]]
-;
-  %val = call float @llvm.maxnum.f32(float %x, float poison)
-  ret float %val
-}
-
-define float @maxnum_undef_op0(float %x) {
-; CHECK-LABEL: @maxnum_undef_op0(
-; CHECK-NEXT:    ret float [[X:%.*]]
+define double @maxnum_snan_op0(double %x) {
+; CHECK-LABEL: @maxnum_snan_op0(
+; CHECK-NEXT:    ret double 0x7FFC000000000000
 ;
-  %val = call float @llvm.maxnum.f32(float undef, float %x)
-  ret float %val
+  %r = call double @llvm.maxnum.f64(double 0x7ff4000000000000, double %x)
+  ret double %r
 }
 
-define float @maxnum_poison_op0(float %x) {
-; CHECK-LABEL: @maxnum_poison_op0(
-; CHECK-NEXT:    ret float [[X:%.*]]
+define double @maxnum_snan_op1(double %x) {
+; CHECK-LABEL: @maxnum_snan_op1(
+; CHECK-NEXT:    ret double 0x7FFC00000000DEAD
 ;
-  %val = call float @llvm.maxnum.f32(float poison, float %x)
-  ret float %val
+  %r = call double @llvm.maxnum.f64(double %x, double 0x7ff400000000dead)
+  ret double %r
 }
 
-define float @minnum_undef_op1(float %x) {
-; CHECK-LABEL: @minnum_undef_op1(
-; CHECK-NEXT:    ret float [[X:%.*]]
+define double @minnum_snan_op0(double %x) {
+; CHECK-LABEL: @minnum_snan_op0(
+; CHECK-NEXT:    ret double 0x7FFC000DEAD00000
 ;
-  %val = call float @llvm.minnum.f32(float %x, float undef)
-  ret float %val
+  %r = call double @llvm.minnum.f64(double 0x7ff4000dead00000, d...
[truncated]


github-actions bot commented May 12, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@LewisCrawford LewisCrawford force-pushed the instsimplify_maximumnum branch from 319224c to 5c41a4e Compare May 12, 2025 17:05
@LewisCrawford LewisCrawford force-pushed the instsimplify_maximumnum branch 2 times, most recently from 2248c98 to 5751b4a Compare May 13, 2025 09:55
Contributor

This is just documenting a bug; you can't rely on the simple matcher. Either use m_APFloat and check whether it's a splat quiet or signaling NaN, or do the complicated handling to process this per element.

Contributor Author

As far as I can tell, this is still legal (if slightly surprising) behaviour because of this wording in the spec:

Floating-point math operations are allowed to treat all NaNs as if they were quiet NaNs. For example, “pow(1.0, SNaN)” may be simplified to 1.0.

What would your preferred behaviour here be for the <sNaN, qNaN> case and <sNaN, poison> case? The way the langref states it, it seems like the preferred approach is to sacrifice sNaN correctness any time it allows an optimization (and only try to preserve it when using StrictFP or constrained FP intrinsics).
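The LangRef allowance quoted above can be seen in an ordinary libm call. C's `pow` (which Python's `math.pow` wraps) defines `pow(1.0, y) == 1.0` for every `y`, including NaN, so the simplification never has to quiet or propagate the NaN operand:

```python
import math

# "pow(1.0, SNaN) may be simplified to 1.0": the result does not depend on
# the NaN operand at all, so its signaling-ness may be ignored.
result = math.pow(1.0, float("nan"))
print(result)  # 1.0
```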

Contributor Author

I've changed it now so we disable the optimization for maxnum/minnum if there is a mix of sNaNs and qNaNs in the vector. This is probably such an edge-case that it won't impact real-world optimizations, so we might as well try and preserve correctness there. I've kept using the simple matchers to still allow for undef and poison elements.

Contributor

minnum/maxnum are a special case and require getting the snan handling right, because they have inverted behavior with an snan input.

but in general it's a better QoI to get the correct handling and quiet the returned value. The dropped canonicalization is more useful for instruction optimizations that may lose a quiet. In the constant folding case we can handle it elementwise, it's just more of a hassle

Contributor Author

We cannot handle maxnum(&lt;x, y&gt;, &lt;sNaN, qNaN&gt;) elementwise here if we want to propagate the sNaN, because we would need to return &lt;qNaN, y&gt;, which would require a vector-shuffle instruction, and InstructionSimplify is not allowed to insert new instructions. It can only return either the full vector &lt;x, y&gt; or a new constant vector. In the new elementwise version I've just written, I've kept this behaviour of just disabling the optimization in this case, as we'd need one element from the LHS and one from the RHS.
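The constraint can be sketched as a per-lane decision that must agree across the whole vector. This is a hypothetical model, not the patch's C++; lane tags like `'snan'`/`'qnan'` and the return strings are invented for illustration:

```python
def try_fold_lanes(const_lanes):
    """Per lane: an sNaN element folds to a new qNaN constant, while a qNaN
    element folds to the matching LHS element. A whole-vector fold is only
    possible when every lane comes from the same side."""
    sides = ["const" if lane == "snan" else "lhs" for lane in const_lanes]
    if all(s == "lhs" for s in sides):
        return "return-lhs"
    if all(s == "const" for s in sides):
        return "return-const"
    # Mixed lanes would need a shufflevector, which InstSimplify
    # cannot create, so the fold must bail out.
    return None

print(try_fold_lanes(["snan", "qnan"]))  # None
```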

Contributor

With this chain of NaN-matchers, it might actually be easier to just do the elementwise handling; propagateNaN already has the elementwise code in it.

Contributor Author

I think element-wise comparisons will take this slightly out of the current scope of the patch, as they will introduce cases like maxnum(&lt;x, y&gt;, &lt;sNaN, qNaN&gt;) -&gt; &lt;qNaN, y&gt;, which require insert/extract chains to be created, and may hit cases where there is no clear perf win from re-creating partially-folded vectors (where some elements are NaN) versus just applying maxnum to the original vector despite some elements being NaN. To avoid these partially-applied cases, we'd need to check whether the whole vector is sNaN or qNaN (ignoring poison/undef elements), which is what the current patch already does with the matchers.

It might be worth considering these kinds of element-wise folds in the future (and maybe for partial Inf, partial poison, and partial FMAX vectors too, in addition to NaNs), but I think that would be better suited to a separate patch.

Contributor Author

I've now rewritten the whole thing elementwise to avoid using any pattern-matchers (or calls to propagateNaN). It now allows combinations of different foldable vector elements, as long as we return either the full non-constant LHS value, or a full constant value based on the constant RHS arg (with some NaN quieting applied).

Contributor Author

@arsenm would you be able to take a look at the new elementwise code and see if it looks ok to merge soon?

Add support for the new maximumnum and minimumnum intrinsics in various optimizations in InstSimplify.

Also, change the behavior of optimizing maxnum(sNaN, x) to simplify to qNaN instead of x to better match the LLVM IR spec, and add more tests for sNaN behavior for all 3 max/min intrinsic types.
- Add tests for vectors of <snan, qnan> pairs.
- Add more vector tests for <qnan, poison> and <snan, poison>
- Remove comment wording about FTZ.
- Clarify behavior of fmaxnum(x, <snan, qnan>) -> x in comments (assume snan == qnan)
Add a vec2 variant for all the +inf/-inf/+max/-max
tests for maxnum/maximum/maximumnum.
Instead of transforming maxnum(x, <sNaN, qNaN>) -> x,
avoid folding the maxnum away so the sNaN element can
correctly be propagated as a qNaN.
Update some failing AMDGPU tests using maxnum's updated sNaN behaviour
and maximumnum's new set of optimizations.
Rewrite all the fmin/fmax(x, const) optimizations in
InstSimplify to use elementwise checking instead of
pattern-matching.

This allows for improved splat-vector handling, as well as
handling of mixed constant vectors e.g. <sNan, Inf, poison>.
@LewisCrawford LewisCrawford force-pushed the instsimplify_maximumnum branch from 995a1a3 to 4e7a52e Compare July 3, 2025 14:41
@LewisCrawford LewisCrawford requested a review from arsenm July 3, 2025 14:51
Remove sNaN + qNaN patterns added in earlier version of
the patch, since they are no longer used in the new
elementwise approach.
Re-generate 2 AMDGPU tests that are failing after rebasing this change:
 - llvm/test/CodeGen/AMDGPU/fneg-combines.new.ll
 - llvm/test/Transforms/InstCombine/AMDGPU/fmed3.ll
Put the elementwise checks for fixed-vector elements in an else block
after checking for splats so that fixed-vector splats do not get
checked twice if they fail to be optimized in the first check.
@LewisCrawford
Contributor Author

LewisCrawford commented Aug 18, 2025

Can you elaborate, please? AFAICT, https://docs.nvidia.com/cuda/parallel-thread-execution/#floating-point-instructions-max makes no promises about the instruction behavior regarding sNaN/qNaN handling.

This is the problem. PTX does not treat sNaN/qNaN differently, and will return the other non-NaN input, whereas LLVM's maxnum/minnum intrinsics state that they must return NaN if either operand is sNaN. So the maximumnum/minimumnum intrinsics match the semantics of PTX better since they don't make any distinction between sNaNs and qNaNs.

That said, unlike maxnum the instruction does appear to preserve signed zero order, so in that sense it does match maximumnum better.

That is true for IEEE maxnum, but for LLVM maxnum/minnum the IR spec states: "Follows the semantics of minNum in IEEE-754-2008, except that -0.0 < +0.0 for the purposes of this intrinsic". All 3 LLVM intrinsic variants maxnum/maximum/maximumnum treat -0.0 < +0.0 here like PTX expects (except if there are nsz flags present). The only minor semantic issue here is for sNaNs.
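The -0.0 &lt; +0.0 refinement quoted above is easy to model. A sketch (NaN handling omitted for brevity; `maxnum_signed_zero` is an invented name, not an LLVM API):

```python
import math

def maxnum_signed_zero(x, y):
    """Model of the LangRef refinement: like IEEE maxNum, except that
    -0.0 is ordered below +0.0 for the purposes of the intrinsic."""
    if x == 0.0 and y == 0.0:
        # +0.0 == -0.0 under ==, so an ordinary compare cannot pick a side;
        # choose by sign bit instead, preferring +0.0 when present.
        return y if math.copysign(1.0, x) < 0 else x
    return max(x, y)

print(maxnum_signed_zero(-0.0, 0.0))  # 0.0 (positive zero)
```

By contrast, plain IEEE-754 maxNum may return either zero for `maxNum(-0.0, +0.0)`, which is why the LangRef spells out the stricter ordering.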

In any case, the current handling of nvvm_fmax/fmin has been this way for a pretty long time now (698c31b), and I'd be somewhat reluctant to change that.

The maximumnum/minimumnum intrinsics were only added in June 2024, so it makes sense that NVVM used maxnum/minnum in 2017 as the closest alternative at the time. At the moment, maximumnum does not seem well supported by many passes yet, for instance it did not have const-folding until I added it a few months ago ( #138700 ). The NVPTX backend does not support it nicely yet either, and generates:

setp.nan.f32    %p1, %r1, %r1;
selp.f32        %r3, %r2, %r1, %p1;
setp.nan.f32    %p2, %r2, %r2;
selp.f32        %r4, %r3, %r2, %p2;
setp.gt.f32     %p3, %r3, %r4;
selp.f32        %r5, %r3, %r4, %p3;
setp.eq.s32     %p4, %r3, 0;
selp.f32        %r6, %r3, %r5, %p4;
setp.eq.s32     %p5, %r4, 0;
selp.f32        %r7, %r4, %r6, %p5;
setp.eq.f32     %p6, %r5, 0f00000000;
selp.f32        %r8, %r7, %r5, %p6;

instead of:

max.f32 %r3, %r1, %r2;

So there's quite a long way to go before we can switch to using maximumnum here instead of maxnum. I've started by trying to bring middle-end passes like InstSimplify/ConstantFolding into alignment with the optimizations, and @arsenm has also been doing a lot of work recently improving support for maximumnum/minimumnum in other passes like ValueTracking too (e.g. #138729, #138737 ), so there is definitely some momentum towards it eventually getting support parity with maxnum.

We may want to introduce a new intrinsic to provide IEEE-compliant behaviour. Even if it will be slower initially, we can live with that for now.

As far as I can tell, PTX max.f32 has exactly the same semantics as LLVM maximumnum, so I don't think there is any benefit of introducing a new NVPTX-specific intrinsic. In the long-term, it would be good to map nvvm.fmax -> llvm.maximumnum -> PTX max.f32. If other code uses llvm.maxnum, it would technically need some extra checks added for sNaNs so it can return qNaN in those cases to match the LLVM IR semantics, since PTX max.f32 will return the other non-NaN argument instead. However, since we don't care too much about sNaNs in NVPTX, it may be fine to just keep that mapped to PTX max.f32 as well, at least in the short term.

Whether or not NVPTX does switch from nvvm.fmax -> llvm.maxnum to nvvm.fmax -> llvm.maximumnum in the long term, I still think it would be good to improve the support for llvm.maximumnum in passes like this for the ecosystem in general (and add the sNaN correctness fixes for maxnum here too).

Swap min/max lines to be in consistent order with other similar
comments.
Make the code simpler, shorter, and clearer by:
 * Splitting the lambda out into a separate function
 * Instead of returning an optional of a bool + value, use an enum +
   value
 * The new enum specifies whether to avoid optimizing, return LHS,
   return const, or return either.
 * Use the same enum to cover the logic of both the individual scalar
   elements, and the vector as a whole.
 * Use this enum to cover the fact that undef/poison could choose either
   side, rather than tracking it with a separate check.
 * Extract common code from the splat vs fixed vs scalar paths.
 * Merge undef handling with the above approach too, rather than using a
   separate check.
 * Simplify some comments.
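As a rough sketch of how a single enum can drive both the per-element and whole-vector decisions (hypothetical names, Python for illustration only; the real change is C++ in InstructionSimplify.cpp):

```python
from enum import Enum

class Fold(Enum):
    NONE = 0          # cannot simplify this element
    RETURN_LHS = 1    # element simplifies to the non-constant operand
    RETURN_CONST = 2  # element simplifies to the constant operand
    EITHER = 3        # poison/undef element: either choice is fine

def combine(acc, elt):
    # Merge one element's decision into the vector-wide decision:
    # EITHER defers to the other lanes, while lanes that disagree
    # about which operand to return block the fold entirely.
    if acc == Fold.NONE or elt == Fold.NONE:
        return Fold.NONE
    if acc == elt or elt == Fold.EITHER:
        return acc
    if acc == Fold.EITHER:
        return elt
    return Fold.NONE

def classify_vector(per_element_decisions):
    acc = Fold.EITHER  # an all-poison vector can fold either way
    for d in per_element_decisions:
        acc = combine(acc, d)
    return acc
```

The key property is that poison/undef lanes (EITHER) never block a fold on their own, while conflicting lanes collapse the whole decision to NONE, which is what lets one enum replace the earlier optional-of-bool-plus-value plumbing.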
@LewisCrawford LewisCrawford force-pushed the instsimplify_maximumnum branch from 13060ae to 12c05ac Compare August 20, 2025 13:12

Artem-B commented Aug 20, 2025

PTX does not treat sNaN/qNaN differently, and will return the other non-NaN input, whereas LLVM's maxnum/minnum intrinsics state that they must return NaN if either operand is sNaN.

So, the practical impact is that the code that currently uses nvvm_fmin/fmax will end up always propagating sNaNs, while the actual fmax/fmin PTX instruction would not.

The actual problem is that the users never got the "just fmax/fmin instruction" they intended to have, and by now that behavior has effectively become part of the API. The good news is that nvvm.fmin/fmax do not appear to be widely used. At least I can't find any references to them outside of LLVM/clang and their tests, in our code base.

So the maximumnum/minimumnum intrinsics match the semantics of PTX better since they don't make any distinction between sNaNs and qNaNs.

Agreed. The switch itself is trivial, so we should be able to revert the change easily, if there are problems.

Adding lowering of maximumnum/minimumnum -> fmax.f32/fmin.f32 should be pretty straightforward, too.

@LewisCrawford

So, the practical impact is that the code that currently uses nvvm_fmin/fmax will end up always propagating sNaNs, while the actual fmax/fmin PTX instruction would not.

Yes, although the sNaN propagation issue could only happen in the middle-end: nvvm_fmax currently becomes PTX max, so it is only while it is interpreted as llvm.maxnum in the middle-end (e.g. during constant-folding) that there could be any semantic differences.

Also, this InstSimplify optimization I'm updating in this patch did not handle maxnum(x, sNaN) -> qNaN before (it used to only do maxnum(x, NaN) -> x in the same way PTX max does), so this different behaviour might become slightly more apparent after the sNaN correctness fixes included in my patch here.

nvvm.fmin/fmax do not appear to be widely used

They are widely used in NVCC.

Adding lowering of maximumnum/minimumnum -> fmax.f32/fmin.f32 should be pretty straightforward, too.

Yes, I'm working on a patch for this at the moment, which I should hopefully be able to upstream soon. It's 5 lines of code plus some tests, so was very straightforward to add.

@LewisCrawford

Here is the NVPTX maximumnum ISel patch I mentioned above: #155804

I think we'll need to land this InstSimplify patch, the above ISel patch, and maybe a couple of other improvements like better SelectionDAG expansion of v2f16 maximumnum calls in ISel before we'd be fully ready to use llvm.maximumnum for nvvm.fmax, but we're getting closer.

@LewisCrawford

@arsenm Does the latest version of the patch look ok to merge?

@LewisCrawford

@Artem-B does this patch seem ok to merge?

@LewisCrawford LewisCrawford self-assigned this Sep 19, 2025
LewisCrawford added a commit to LewisCrawford/llvm-project that referenced this pull request Sep 24, 2025
Refactor all the tests in fminmax-folds.ll so that they are
grouped by optimization, rather than by intrinsic.

Instead of calling 1 intrinsic per function, each function now
tests all 6 variants of the intrinsic. Results are stored to named
pointers to maintain readability in this more compact form. This
makes it much easier to compare the outputs from each intrinsic,
rather than having them scattered in different functions in different
parts of the file. It is also much more compact, so despite adding
>50% more tests, the file is ~500 lines shorter.

The tests added include:
 * Adding maximumnum and minimumnum everywhere (currently not optimized,
   but added as a baseline for future optimizations in llvm#139581).
 * Adding separate tests for SNaN and QNaN (as a baseline for
   correctness improvements in llvm#139581)
 * Adding tests for scalable vectors
 * Increasing the variety of types used in various tests by using
   more f16, f64, and vector types in tests.

The only coverage removed is for tests with undef (only poison is now
tested for).

Overall, this refactor should increase coverage, improve readability
with more comments and clear section headers, and make the tests
much more compact and easier to review in llvm#139581 by providing a
clear baseline for each intrinsic's current behaviour.

Artem-B commented Sep 24, 2025

It's OK with me, but the remaining open questions are between you and @arsenm. If he's OK with the current state, you're good to go.


@Artem-B Artem-B left a comment


LGTM, but please check with @arsenm if all concerns had been addressed.

LewisCrawford added a commit that referenced this pull request Sep 25, 2025
Refactor all the tests in `fminmax-folds.ll` so that they are grouped by
optimization, rather than by intrinsic.
Merge branch 'main' into instsimplify_maximumnum

Replace the old fminmax-folds.ll test with the newly refactored version
from llvm#160504. These tests will not pass yet with the new branch, but
this provides a clean baseline.

Conflicts:
        llvm/test/Transforms/InstSimplify/fminmax-folds.ll
Run update_test_checks.py on fminmax-folds.ll to get the
results that change from this patchset (especially the snan/qnan
changes, and adding maximumnum/minimumnum).
The update_test_checks.py script seems to have used the old
variable names from the previous version of the tests, rather
than updating them to match the variable names they now represent.

Changed several nested min/max tests to use names like
MAXIMUMNUM_XY instead of MAXIMUMNUM_NESTED to make this consistent
with the other CHECK lines and avoid confusion with the nested
value that the variable used to represent in the old version of the
test.
Add test-cases for the newly added elementwise optimizations
covering vectors like <poison, Inf> and <poison, Inf, SNaN>.
Fix cases where variable X was being renamed MINIMUMNUM
Fix problem with MINNUM_RES getting renamed to MINIMUM_RES
@LewisCrawford

@arsenm I've updated all the fminmax-folds.ll tests now after the refactor in #160504

The diff is now 139 insertions(+), 106 deletions(-), so it should hopefully be a lot easier to review than the previous version with +1800 new lines, if you've got time to take another look sometime soon.

YixingZhang007 pushed a commit to YixingZhang007/llvm-project that referenced this pull request Sep 27, 2025
Refactor all the tests in `fminmax-folds.ll` so that they are grouped by
optimization, rather than by intrinsic.