Skip to content

Conversation

@LewisCrawford
Copy link
Contributor

The behaviour of constant-folding maxnum(sNaN, x) and minnum(sNaN, x) has become controversial, and there are ongoing discussions about which behaviour we want to specify in the LLVM IR LangRef.

See:

This patch removes optimizations and constant-folding support for maxnum(sNaN, x) but keeps it folded/optimized for qNaN. This should allow for some more flexibility so the implementation can conform to either the old or new version of the semantics specified without any changes.

As far as I am aware, optimizations involving constant sNaN should generally be edge-cases that rarely occur, so here should hopefully be very little real-world performance impact from disabling these optimizations.

The behaviour of constant-folding maxnum(sNaN, x) and
minnum(sNaN, x) has become controvertial, and there
are ongoing discussions about which behaviour we want
to specify in the LLVM IR LangRef.

See:
  - llvm#170082
  - llvm#168838
  - llvm#138451
  - llvm#170067
  - https://discourse.llvm.org/t/rfc-a-consistent-set-of-semantics-for-the-floating-point-minimum-and-maximum-operations/89006

This patch removes optimizations and constant-folding
support for maxnum(sNaN, x) but keeps it folded/optimized
for qNaNs. This should allow for some more flexibility
so the implementation can conform to either the old
or new version of the semantics specified without any
changes.

As far as I am aware, optimizations involving constant sNaN
should generally be edge-cases that rarely occur, so here
should hopefully be very little real-world performance impact
from disabling these optimizations.
@LewisCrawford LewisCrawford requested a review from nikic as a code owner December 1, 2025 18:50
@llvmbot llvmbot added backend:AMDGPU backend:X86 llvm:globalisel llvm:instcombine Covers the InstCombine, InstSimplify and AggressiveInstCombine passes labels Dec 1, 2025
@LewisCrawford LewisCrawford removed the request for review from nikic December 1, 2025 18:51
@llvmbot llvmbot added llvm:SelectionDAG SelectionDAGISel as well llvm:analysis Includes value tracking, cost tables and constant folding labels Dec 1, 2025
@LewisCrawford LewisCrawford requested a review from arsenm December 1, 2025 18:51
@llvmbot
Copy link
Member

llvmbot commented Dec 1, 2025

@llvm/pr-subscribers-backend-arm
@llvm/pr-subscribers-backend-amdgpu
@llvm/pr-subscribers-llvm-analysis

@llvm/pr-subscribers-backend-x86

Author: Lewis Crawford (LewisCrawford)

Changes

The behaviour of constant-folding maxnum(sNaN, x) and minnum(sNaN, x) has become controversial, and there are ongoing discussions about which behaviour we want to specify in the LLVM IR LangRef.

See:

This patch removes optimizations and constant-folding support for maxnum(sNaN, x) but keeps it folded/optimized for qNaN. This should allow for some more flexibility so the implementation can conform to either the old or new version of the semantics specified without any changes.

As far as I am aware, optimizations involving constant sNaN should generally be edge-cases that rarely occur, so here should hopefully be very little real-world performance impact from disabling these optimizations.


Patch is 24.39 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/170181.diff

11 Files Affected:

  • (modified) llvm/lib/Analysis/ConstantFolding.cpp (+4)
  • (modified) llvm/lib/Analysis/InstructionSimplify.cpp (+11-14)
  • (modified) llvm/lib/CodeGen/GlobalISel/Utils.cpp (+4)
  • (modified) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (+7-8)
  • (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp (+4)
  • (modified) llvm/test/CodeGen/AMDGPU/fcanonicalize-elimination.ll (+4-2)
  • (modified) llvm/test/CodeGen/X86/fmaxnum.ll (+37-8)
  • (modified) llvm/test/CodeGen/X86/fminnum.ll (+37-8)
  • (modified) llvm/test/Transforms/InstCombine/simplify-demanded-fpclass.ll (+4-2)
  • (modified) llvm/test/Transforms/InstSimplify/ConstProp/min-max.ll (+8-4)
  • (modified) llvm/test/Transforms/InstSimplify/fminmax-folds.ll (+22-13)
diff --git a/llvm/lib/Analysis/ConstantFolding.cpp b/llvm/lib/Analysis/ConstantFolding.cpp
index 916154b465af4..63d12ee585e64 100644
--- a/llvm/lib/Analysis/ConstantFolding.cpp
+++ b/llvm/lib/Analysis/ConstantFolding.cpp
@@ -3348,8 +3348,12 @@ static Constant *ConstantFoldIntrinsicCall2(Intrinsic::ID IntrinsicID, Type *Ty,
       case Intrinsic::copysign:
         return ConstantFP::get(Ty->getContext(), APFloat::copySign(Op1V, Op2V));
       case Intrinsic::minnum:
+        if (Op1V.isSignaling() || Op2V.isSignaling())
+          return nullptr;
         return ConstantFP::get(Ty->getContext(), minnum(Op1V, Op2V));
       case Intrinsic::maxnum:
+        if (Op1V.isSignaling() || Op2V.isSignaling())
+          return nullptr;
         return ConstantFP::get(Ty->getContext(), maxnum(Op1V, Op2V));
       case Intrinsic::minimum:
         return ConstantFP::get(Ty->getContext(), minimum(Op1V, Op2V));
diff --git a/llvm/lib/Analysis/InstructionSimplify.cpp b/llvm/lib/Analysis/InstructionSimplify.cpp
index 59a213b47825a..bd85444d7d2b0 100644
--- a/llvm/lib/Analysis/InstructionSimplify.cpp
+++ b/llvm/lib/Analysis/InstructionSimplify.cpp
@@ -6620,7 +6620,8 @@ static MinMaxOptResult OptimizeConstMinMax(const Constant *RHSConst,
   assert(OutNewConstVal != nullptr);
 
   bool PropagateNaN = IID == Intrinsic::minimum || IID == Intrinsic::maximum;
-  bool PropagateSNaN = IID == Intrinsic::minnum || IID == Intrinsic::maxnum;
+  bool ReturnsOtherForAllNaNs =
+      IID == Intrinsic::minimumnum || IID == Intrinsic::maximumnum;
   bool IsMin = IID == Intrinsic::minimum || IID == Intrinsic::minnum ||
                IID == Intrinsic::minimumnum;
 
@@ -6637,29 +6638,27 @@ static MinMaxOptResult OptimizeConstMinMax(const Constant *RHSConst,
 
   // minnum(x, qnan) -> x
   // maxnum(x, qnan) -> x
-  // minnum(x, snan) -> qnan
-  // maxnum(x, snan) -> qnan
   // minimum(X, nan) -> qnan
   // maximum(X, nan) -> qnan
   // minimumnum(X, nan) -> x
   // maximumnum(X, nan) -> x
   if (CAPF.isNaN()) {
-    if (PropagateNaN || (PropagateSNaN && CAPF.isSignaling())) {
+    if (PropagateNaN) {
       *OutNewConstVal = ConstantFP::get(CFP->getType(), CAPF.makeQuiet());
       return MinMaxOptResult::UseNewConstVal;
+    } else if (ReturnsOtherForAllNaNs || !CAPF.isSignaling()) {
+      return MinMaxOptResult::UseOtherVal;
     }
-    return MinMaxOptResult::UseOtherVal;
+    return MinMaxOptResult::CannotOptimize;
   }
 
   if (CAPF.isInfinity() || (Call && Call->hasNoInfs() && CAPF.isLargest())) {
-    // minnum(X, -inf) -> -inf (ignoring sNaN -> qNaN propagation)
-    // maxnum(X, +inf) -> +inf (ignoring sNaN -> qNaN propagation)
     // minimum(X, -inf) -> -inf if nnan
     // maximum(X, +inf) -> +inf if nnan
     // minimumnum(X, -inf) -> -inf
     // maximumnum(X, +inf) -> +inf
     if (CAPF.isNegative() == IsMin &&
-        (!PropagateNaN || (Call && Call->hasNoNaNs()))) {
+        (ReturnsOtherForAllNaNs || (Call && Call->hasNoNaNs()))) {
       *OutNewConstVal = const_cast<Constant *>(RHSConst);
       return MinMaxOptResult::UseNewConstVal;
     }
@@ -7004,12 +7003,10 @@ Value *llvm::simplifyBinaryIntrinsic(Intrinsic::ID IID, Type *ReturnType,
   case Intrinsic::minimum:
   case Intrinsic::maximumnum:
   case Intrinsic::minimumnum: {
-    // In several cases here, we deviate from exact IEEE 754 semantics
-    // to enable optimizations (as allowed by the LLVM IR spec).
-    //
-    // For instance, we may return one of the arguments unmodified instead of
-    // inserting an llvm.canonicalize to transform input sNaNs into qNaNs,
-    // or may assume all NaN inputs are qNaNs.
+    // In some cases here, we deviate from exact IEEE-754 semantics to enable
+    // optimizations (as allowed by the LLVM IR spec) by returning one of the
+    // arguments unmodified instead of inserting an llvm.canonicalize to
+    // transform input sNaNs into qNaNs,
 
     // If the arguments are the same, this is a no-op (ignoring NaN quieting)
     if (Op0 == Op1)
diff --git a/llvm/lib/CodeGen/GlobalISel/Utils.cpp b/llvm/lib/CodeGen/GlobalISel/Utils.cpp
index e8954a3d9899b..bc01cb65c4a69 100644
--- a/llvm/lib/CodeGen/GlobalISel/Utils.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/Utils.cpp
@@ -768,8 +768,12 @@ llvm::ConstantFoldFPBinOp(unsigned Opcode, const Register Op1,
     C1.copySign(C2);
     return C1;
   case TargetOpcode::G_FMINNUM:
+    if (C1.isSignaling() || C2.isSignaling())
+      return std::nullopt;
     return minnum(C1, C2);
   case TargetOpcode::G_FMAXNUM:
+    if (C1.isSignaling() || C2.isSignaling())
+      return std::nullopt;
     return maxnum(C1, C2);
   case TargetOpcode::G_FMINIMUM:
     return minimum(C1, C2);
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 0f3a207cc6414..70950084ee6b7 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -19505,7 +19505,8 @@ SDValue DAGCombiner::visitFMinMax(SDNode *N) {
   const SDNodeFlags Flags = N->getFlags();
   unsigned Opc = N->getOpcode();
   bool PropAllNaNsToQNaNs = Opc == ISD::FMINIMUM || Opc == ISD::FMAXIMUM;
-  bool PropOnlySNaNsToQNaNs = Opc == ISD::FMINNUM || Opc == ISD::FMAXNUM;
+  bool ReturnsOtherForAllNaNs =
+      Opc == ISD::FMINIMUMNUM || Opc == ISD::FMAXIMUMNUM;
   bool IsMin =
       Opc == ISD::FMINNUM || Opc == ISD::FMINIMUM || Opc == ISD::FMINIMUMNUM;
   SelectionDAG::FlagInserter FlagsInserter(DAG, N);
@@ -19524,32 +19525,30 @@ SDValue DAGCombiner::visitFMinMax(SDNode *N) {
 
     // minnum(X, qnan) -> X
     // maxnum(X, qnan) -> X
-    // minnum(X, snan) -> qnan
-    // maxnum(X, snan) -> qnan
     // minimum(X, nan) -> qnan
     // maximum(X, nan) -> qnan
     // minimumnum(X, nan) -> X
     // maximumnum(X, nan) -> X
     if (AF.isNaN()) {
-      if (PropAllNaNsToQNaNs || (AF.isSignaling() && PropOnlySNaNsToQNaNs)) {
+      if (PropAllNaNsToQNaNs) {
         if (AF.isSignaling())
           return DAG.getConstantFP(AF.makeQuiet(), SDLoc(N), VT);
         return N->getOperand(1);
+      } else if (ReturnsOtherForAllNaNs || !AF.isSignaling()) {
+        return N->getOperand(0);
       }
-      return N->getOperand(0);
+      return SDValue();
     }
 
     // In the following folds, inf can be replaced with the largest finite
     // float, if the ninf flag is set.
     if (AF.isInfinity() || (Flags.hasNoInfs() && AF.isLargest())) {
-      // minnum(X, -inf) -> -inf (ignoring sNaN -> qNaN propagation)
-      // maxnum(X, +inf) -> +inf (ignoring sNaN -> qNaN propagation)
       // minimum(X, -inf) -> -inf if nnan
       // maximum(X, +inf) -> +inf if nnan
       // minimumnum(X, -inf) -> -inf
       // maximumnum(X, +inf) -> +inf
       if (IsMin == AF.isNegative() &&
-          (!PropAllNaNsToQNaNs || Flags.hasNoNaNs()))
+          (ReturnsOtherForAllNaNs || Flags.hasNoNaNs()))
         return N->getOperand(1);
 
       // minnum(X, +inf) -> X if nnan
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index c9519ce1610b2..7c2bd74b84fe4 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -7382,8 +7382,12 @@ SDValue SelectionDAG::foldConstantFPMath(unsigned Opcode, const SDLoc &DL,
       C1.copySign(C2);
       return getConstantFP(C1, DL, VT);
     case ISD::FMINNUM:
+      if (C1.isSignaling() || C2.isSignaling())
+        return SDValue();
       return getConstantFP(minnum(C1, C2), DL, VT);
     case ISD::FMAXNUM:
+      if (C1.isSignaling() || C2.isSignaling())
+        return SDValue();
       return getConstantFP(maxnum(C1, C2), DL, VT);
     case ISD::FMINIMUM:
       return getConstantFP(minimum(C1, C2), DL, VT);
diff --git a/llvm/test/CodeGen/AMDGPU/fcanonicalize-elimination.ll b/llvm/test/CodeGen/AMDGPU/fcanonicalize-elimination.ll
index 05d3e9c381910..e7685d53b2d10 100644
--- a/llvm/test/CodeGen/AMDGPU/fcanonicalize-elimination.ll
+++ b/llvm/test/CodeGen/AMDGPU/fcanonicalize-elimination.ll
@@ -497,10 +497,12 @@ define amdgpu_kernel void @test_fold_canonicalize_minnum_value_f32(ptr addrspace
   ret void
 }
 
-; FIXME: Should there be more checks here? minnum with sNaN operand is simplified to qNaN.
+; FIXME: Should there be more checks here? minnum with sNaN operand might get simplified away.
 
 ; GCN-LABEL: test_fold_canonicalize_sNaN_value_f32:
-; GCN: v_mov_b32_e32 v{{.+}}, 0x7fc00000
+; GCN: {{flat|global}}_load_dword [[LOAD:v[0-9]+]]
+; VI: v_mul_f32_e32 v{{[0-9]+}}, 1.0, [[LOAD]]
+; GFX9: v_max_f32_e32 v{{[0-9]+}}, [[LOAD]], [[LOAD]]
 define amdgpu_kernel void @test_fold_canonicalize_sNaN_value_f32(ptr addrspace(1) %arg) {
   %id = tail call i32 @llvm.amdgcn.workitem.id.x()
   %gep = getelementptr inbounds float, ptr addrspace(1) %arg, i32 %id
diff --git a/llvm/test/CodeGen/X86/fmaxnum.ll b/llvm/test/CodeGen/X86/fmaxnum.ll
index 150bef01bdbe0..6a03628d9f078 100644
--- a/llvm/test/CodeGen/X86/fmaxnum.ll
+++ b/llvm/test/CodeGen/X86/fmaxnum.ll
@@ -676,15 +676,44 @@ define float @test_maxnum_neg_inf_nnan(float %x, float %y) nounwind {
 
 ; Test SNaN quieting
 define float @test_maxnum_snan(float %x) {
-; SSE-LABEL: test_maxnum_snan:
-; SSE:       # %bb.0:
-; SSE-NEXT:    movss {{.*#+}} xmm0 = [NaN,0.0E+0,0.0E+0,0.0E+0]
-; SSE-NEXT:    retq
+; SSE2-LABEL: test_maxnum_snan:
+; SSE2:       # %bb.0:
+; SSE2-NEXT:    movss {{.*#+}} xmm2 = [NaN,0.0E+0,0.0E+0,0.0E+0]
+; SSE2-NEXT:    movaps %xmm0, %xmm1
+; SSE2-NEXT:    cmpunordss %xmm0, %xmm1
+; SSE2-NEXT:    movaps %xmm1, %xmm3
+; SSE2-NEXT:    andps %xmm2, %xmm3
+; SSE2-NEXT:    maxss %xmm0, %xmm2
+; SSE2-NEXT:    andnps %xmm2, %xmm1
+; SSE2-NEXT:    orps %xmm3, %xmm1
+; SSE2-NEXT:    movaps %xmm1, %xmm0
+; SSE2-NEXT:    retq
 ;
-; AVX-LABEL: test_maxnum_snan:
-; AVX:       # %bb.0:
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [NaN,0.0E+0,0.0E+0,0.0E+0]
-; AVX-NEXT:    retq
+; SSE4-LABEL: test_maxnum_snan:
+; SSE4:       # %bb.0:
+; SSE4-NEXT:    movss {{.*#+}} xmm1 = [NaN,0.0E+0,0.0E+0,0.0E+0]
+; SSE4-NEXT:    maxss %xmm0, %xmm1
+; SSE4-NEXT:    cmpunordss %xmm0, %xmm0
+; SSE4-NEXT:    blendvps %xmm0, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
+; SSE4-NEXT:    movaps %xmm1, %xmm0
+; SSE4-NEXT:    retq
+;
+; AVX1-LABEL: test_maxnum_snan:
+; AVX1:       # %bb.0:
+; AVX1-NEXT:    vmovss {{.*#+}} xmm1 = [NaN,0.0E+0,0.0E+0,0.0E+0]
+; AVX1-NEXT:    vmaxss %xmm0, %xmm1, %xmm1
+; AVX1-NEXT:    vcmpunordss %xmm0, %xmm0, %xmm0
+; AVX1-NEXT:    vblendvps %xmm0, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm0
+; AVX1-NEXT:    retq
+;
+; AVX512-LABEL: test_maxnum_snan:
+; AVX512:       # %bb.0:
+; AVX512-NEXT:    vmovss {{.*#+}} xmm2 = [NaN,0.0E+0,0.0E+0,0.0E+0]
+; AVX512-NEXT:    vmaxss %xmm0, %xmm2, %xmm1
+; AVX512-NEXT:    vcmpunordss %xmm0, %xmm0, %k1
+; AVX512-NEXT:    vmovss %xmm2, %xmm1, %xmm1 {%k1}
+; AVX512-NEXT:    vmovaps %xmm1, %xmm0
+; AVX512-NEXT:    retq
   %r = call float @llvm.maxnum.f32(float 0x7ff4000000000000, float %x)
   ret float %r
 }
diff --git a/llvm/test/CodeGen/X86/fminnum.ll b/llvm/test/CodeGen/X86/fminnum.ll
index 4aa1a618be758..5c882c99d4f14 100644
--- a/llvm/test/CodeGen/X86/fminnum.ll
+++ b/llvm/test/CodeGen/X86/fminnum.ll
@@ -676,15 +676,44 @@ define float @test_minnum_inf_nnan(float %x, float %y) nounwind {
 
 ; Test SNaN quieting
 define float @test_minnum_snan(float %x) {
-; SSE-LABEL: test_minnum_snan:
-; SSE:       # %bb.0:
-; SSE-NEXT:    movss {{.*#+}} xmm0 = [NaN,0.0E+0,0.0E+0,0.0E+0]
-; SSE-NEXT:    retq
+; SSE2-LABEL: test_minnum_snan:
+; SSE2:       # %bb.0:
+; SSE2-NEXT:    movss {{.*#+}} xmm2 = [NaN,0.0E+0,0.0E+0,0.0E+0]
+; SSE2-NEXT:    movaps %xmm0, %xmm1
+; SSE2-NEXT:    cmpunordss %xmm0, %xmm1
+; SSE2-NEXT:    movaps %xmm1, %xmm3
+; SSE2-NEXT:    andps %xmm2, %xmm3
+; SSE2-NEXT:    minss %xmm0, %xmm2
+; SSE2-NEXT:    andnps %xmm2, %xmm1
+; SSE2-NEXT:    orps %xmm3, %xmm1
+; SSE2-NEXT:    movaps %xmm1, %xmm0
+; SSE2-NEXT:    retq
 ;
-; AVX-LABEL: test_minnum_snan:
-; AVX:       # %bb.0:
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [NaN,0.0E+0,0.0E+0,0.0E+0]
-; AVX-NEXT:    retq
+; SSE4-LABEL: test_minnum_snan:
+; SSE4:       # %bb.0:
+; SSE4-NEXT:    movss {{.*#+}} xmm1 = [NaN,0.0E+0,0.0E+0,0.0E+0]
+; SSE4-NEXT:    minss %xmm0, %xmm1
+; SSE4-NEXT:    cmpunordss %xmm0, %xmm0
+; SSE4-NEXT:    blendvps %xmm0, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
+; SSE4-NEXT:    movaps %xmm1, %xmm0
+; SSE4-NEXT:    retq
+;
+; AVX1-LABEL: test_minnum_snan:
+; AVX1:       # %bb.0:
+; AVX1-NEXT:    vmovss {{.*#+}} xmm1 = [NaN,0.0E+0,0.0E+0,0.0E+0]
+; AVX1-NEXT:    vminss %xmm0, %xmm1, %xmm1
+; AVX1-NEXT:    vcmpunordss %xmm0, %xmm0, %xmm0
+; AVX1-NEXT:    vblendvps %xmm0, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm0
+; AVX1-NEXT:    retq
+;
+; AVX512-LABEL: test_minnum_snan:
+; AVX512:       # %bb.0:
+; AVX512-NEXT:    vmovss {{.*#+}} xmm2 = [NaN,0.0E+0,0.0E+0,0.0E+0]
+; AVX512-NEXT:    vminss %xmm0, %xmm2, %xmm1
+; AVX512-NEXT:    vcmpunordss %xmm0, %xmm0, %k1
+; AVX512-NEXT:    vmovss %xmm2, %xmm1, %xmm1 {%k1}
+; AVX512-NEXT:    vmovaps %xmm1, %xmm0
+; AVX512-NEXT:    retq
   %r = call float @llvm.minnum.f32(float 0x7ff4000000000000, float %x)
   ret float %r
 }
diff --git a/llvm/test/Transforms/InstCombine/simplify-demanded-fpclass.ll b/llvm/test/Transforms/InstCombine/simplify-demanded-fpclass.ll
index df60078dbf452..a7ff967d3123b 100644
--- a/llvm/test/Transforms/InstCombine/simplify-demanded-fpclass.ll
+++ b/llvm/test/Transforms/InstCombine/simplify-demanded-fpclass.ll
@@ -10,6 +10,8 @@ declare float @llvm.trunc.f32(float)
 declare float @llvm.arithmetic.fence.f32(float)
 declare float @llvm.minnum.f32(float, float)
 declare float @llvm.maxnum.f32(float, float)
+declare float @llvm.minimumnum.f32(float, float)
+declare float @llvm.maximumnum.f32(float, float)
 
 
 define float @ninf_user_select_inf(i1 %cond, float %x, float %y) {
@@ -1314,7 +1316,7 @@ define nofpclass(pinf) float @ret_nofpclass_pinf__minnum_ninf(i1 %cond, float %x
 ; CHECK-SAME: (i1 [[COND:%.*]], float [[X:%.*]]) {
 ; CHECK-NEXT:    ret float 0xFFF0000000000000
 ;
-  %min = call float @llvm.minnum.f32(float %x, float 0xFFF0000000000000)
+  %min = call float @llvm.minimumnum.f32(float %x, float 0xFFF0000000000000)
   ret float %min
 }
 
@@ -1335,6 +1337,6 @@ define nofpclass(ninf) float @ret_nofpclass_ninf__maxnum_pinf(i1 %cond, float %x
 ; CHECK-SAME: (i1 [[COND:%.*]], float [[X:%.*]]) {
 ; CHECK-NEXT:    ret float 0x7FF0000000000000
 ;
-  %max = call float @llvm.maxnum.f32(float %x, float 0x7FF0000000000000)
+  %max = call float @llvm.maximumnum.f32(float %x, float 0x7FF0000000000000)
   ret float %max
 }
diff --git a/llvm/test/Transforms/InstSimplify/ConstProp/min-max.ll b/llvm/test/Transforms/InstSimplify/ConstProp/min-max.ll
index a633d29179896..84bec15d6ed32 100644
--- a/llvm/test/Transforms/InstSimplify/ConstProp/min-max.ll
+++ b/llvm/test/Transforms/InstSimplify/ConstProp/min-max.ll
@@ -97,7 +97,8 @@ define float @minnum_float_qnan_p0() {
 
 define float @minnum_float_p0_snan() {
 ; CHECK-LABEL: @minnum_float_p0_snan(
-; CHECK-NEXT:    ret float 0x7FFC000000000000
+; CHECK-NEXT:    [[MIN:%.*]] = call float @llvm.minnum.f32(float 0.000000e+00, float 0x7FF4000000000000)
+; CHECK-NEXT:    ret float [[MIN]]
 ;
   %min = call float @llvm.minnum.f32(float 0.0, float 0x7FF4000000000000)
   ret float %min
@@ -105,7 +106,8 @@ define float @minnum_float_p0_snan() {
 
 define float @minnum_float_snan_p0() {
 ; CHECK-LABEL: @minnum_float_snan_p0(
-; CHECK-NEXT:    ret float 0x7FFC000000000000
+; CHECK-NEXT:    [[MIN:%.*]] = call float @llvm.minnum.f32(float 0x7FF4000000000000, float 0.000000e+00)
+; CHECK-NEXT:    ret float [[MIN]]
 ;
   %min = call float @llvm.minnum.f32(float 0x7FF4000000000000, float 0.0)
   ret float %min
@@ -205,7 +207,8 @@ define float @maxnum_float_qnan_p0() {
 
 define float @maxnum_float_p0_snan() {
 ; CHECK-LABEL: @maxnum_float_p0_snan(
-; CHECK-NEXT:    ret float 0x7FFC000000000000
+; CHECK-NEXT:    [[MAX:%.*]] = call float @llvm.maxnum.f32(float 0.000000e+00, float 0x7FF4000000000000)
+; CHECK-NEXT:    ret float [[MAX]]
 ;
   %max = call float @llvm.maxnum.f32(float 0.0, float 0x7FF4000000000000)
   ret float %max
@@ -213,7 +216,8 @@ define float @maxnum_float_p0_snan() {
 
 define float @maxnum_float_snan_p0() {
 ; CHECK-LABEL: @maxnum_float_snan_p0(
-; CHECK-NEXT:    ret float 0x7FFC000000000000
+; CHECK-NEXT:    [[MAX:%.*]] = call float @llvm.maxnum.f32(float 0x7FF4000000000000, float 0.000000e+00)
+; CHECK-NEXT:    ret float [[MAX]]
 ;
   %max = call float @llvm.maxnum.f32(float 0x7FF4000000000000, float 0.0)
   ret float %max
diff --git a/llvm/test/Transforms/InstSimplify/fminmax-folds.ll b/llvm/test/Transforms/InstSimplify/fminmax-folds.ll
index 091e85920c0df..7544f7190df89 100644
--- a/llvm/test/Transforms/InstSimplify/fminmax-folds.ll
+++ b/llvm/test/Transforms/InstSimplify/fminmax-folds.ll
@@ -43,11 +43,13 @@ define void @minmax_qnan_f32(float %x, ptr %minnum_res, ptr %maxnum_res, ptr %mi
 ; Note that maxnum/minnum return qnan here for snan inputs, unlike maximumnum/minimumnum
 define void @minmax_snan_f32(float %x, ptr %minnum_res, ptr %maxnum_res, ptr %minimum_res, ptr %maximum_res, ptr %minimumnum_res, ptr %maximumnum_res) {
 ; CHECK-LABEL: @minmax_snan_f32(
-; CHECK-NEXT:    store float 0x7FFC000000000000, ptr [[MINNUM_RES:%.*]], align 4
-; CHECK-NEXT:    store float 0x7FFC000000000000, ptr [[MAXNUM_RES:%.*]], align 4
+; CHECK-NEXT:    [[MINNUM:%.*]] = call float @llvm.minnum.f32(float [[X:%.*]], float 0x7FF4000000000000)
+; CHECK-NEXT:    store float [[MINNUM]], ptr [[MINNUM_RES:%.*]], align 4
+; CHECK-NEXT:    [[MAXNUM:%.*]] = call float @llvm.maxnum.f32(float [[X]], float 0x7FF4000000000000)
+; CHECK-NEXT:    store float [[MAXNUM]], ptr [[MAXNUM_RES:%.*]], align 4
 ; CHECK-NEXT:    store float 0x7FFC000000000000, ptr [[MINIMUM_RES:%.*]], align 4
 ; CHECK-NEXT:    store float 0x7FFC000000000000, ptr [[MAXIMUM_RES:%.*]], align 4
-; CHECK-NEXT:    store float [[X:%.*]], ptr [[MINIMUMNUM_RES:%.*]], align 4
+; CHECK-NEXT:    store float [[X]], ptr [[MINIMUMNUM_RES:%.*]], align 4
 ; CHECK-NEXT:    store float [[X]], ptr [[MAXIMUMNUM_RES:%.*]], align 4
 ; CHECK-NEXT:    ret void
 ;
@@ -98,11 +100,13 @@ define void @minmax_qnan_nxv2f64_op0(<vscale x 2 x double> %x, ptr %minnum_res,
 ; Note that maxnum/minnum return qnan here for snan inputs, unlike maximumnum/minimumnum
 define void @minmax_snan_nxv2f64_op1(<vscale x 2 x double> %x, ptr %minnum_res, ptr %maxnum_res, ptr %minimum_res, ptr %maximum_res, ptr %minimumnum_res, ptr %maximumnum_res) {
 ; CHECK-LABEL: @minmax_snan_nxv2f64_op1(
-; CHECK-NEXT:    store <vscale x 2 x double> splat (double 0x7FFC00DEAD00DEAD), ptr [[MINNUM_RES:%.*]], align 16
-; CHECK-NEXT:    store <vscale x 2 x double> splat (double 0x7FFC00DEAD00DEAD), ptr [[MAXNUM_RES:%.*]], align 16
+; CHECK-NEXT:    [[MINNUM:%.*]] = call <vscale x 2 x double> @llvm.minnum.nxv2f64(<vscale x 2 x double> splat (double 0x7FF400DEAD00DEAD), <vscale x 2 x double> [[X:%.*]])
+; CHECK-NEXT:    store <vscale x 2 x double> [[MINNUM]], ptr [[MINNUM_RES:%.*]], align 16
+; CHECK-NEXT:    [[MAXNUM:%.*]] = call <vscale x 2 x double> @llvm.maxnum.nxv2f64(<vscale x 2 x double> splat (double 0x7FF400DEAD00DEAD), <vscale x 2 x double> [[X]])
+; CHECK-NEXT:    store <vscale x 2 x double> [[MAXNUM]], ptr [[MAXNUM_RES:%.*]], align 16
 ; CHECK-NEXT:    store <vscale x 2 x double> splat (double 0x7FFC00DEAD00DEAD), ptr [[MINIMUM_RES:%.*]], align 16
 ; CHECK-NEXT:    store <vscale x 2 x double> splat (double 0x7FFC00DEAD00DEAD), ptr [[MAXIMUM_RES:%.*]], align 16
-; CHECK-NEXT:    store <vscale x 2 x double> [[X:%.*]], ptr [[MINIMUMNUM_RES:%.*]], align 16
+; CHECK-NEXT:    store <vscale x 2 x double> [[X]], ptr [[MINIMUMNUM_RES:%.*]], align 16
 ; CHECK-NEXT:    store <vscale x 2 x double> [[X]], ptr [[MAXIMUMNUM_RES:%.*]], align 16
 ; CHECK-NEXT:    ret void
 ;
@@ -255,7 +259,8 @@ define void @minmax_pos_inf_f32(float %x, ptr %minnum_res, ptr %maxnum_res, ptr
 ; CHECK-LABEL: @minmax_pos_inf_f32(
 ; CHECK-NEXT:    [[MINNUM:%.*]] = call float @llvm.minnum.f32(float [[X:%.*]], float 0x7FF0000000000000)
 ; CHECK-NEXT:    store float [[MINNUM]], ptr [[MINNUM_RES:%.*]], align 4
-; CHECK-NEXT:    store float 0x7FF0000000000000, ptr [[MAXNUM_RES:%.*]], align 4
+; CHECK-NEXT:    [[MAXNUM:%.*]] = call float @llvm.maxnum.f32(float [[X]], float 0x7FF0000000000000)
+; CHECK-NEXT:    store float [[MAXNUM]], ptr [[MAXNUM_RES:%.*]], align 4
 ; CHECK-NEXT:    store float ...
[truncated]

@llvmbot
Copy link
Member

llvmbot commented Dec 1, 2025

@llvm/pr-subscribers-llvm-selectiondag

Author: Lewis Crawford (LewisCrawford)

Changes

The behaviour of constant-folding maxnum(sNaN, x) and minnum(sNaN, x) has become controversial, and there are ongoing discussions about which behaviour we want to specify in the LLVM IR LangRef.

See:

This patch removes optimizations and constant-folding support for maxnum(sNaN, x) but keeps it folded/optimized for qNaN. This should allow for some more flexibility so the implementation can conform to either the old or new version of the semantics specified without any changes.

As far as I am aware, optimizations involving constant sNaN should generally be edge-cases that rarely occur, so here should hopefully be very little real-world performance impact from disabling these optimizations.


Patch is 24.39 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/170181.diff

11 Files Affected:

  • (modified) llvm/lib/Analysis/ConstantFolding.cpp (+4)
  • (modified) llvm/lib/Analysis/InstructionSimplify.cpp (+11-14)
  • (modified) llvm/lib/CodeGen/GlobalISel/Utils.cpp (+4)
  • (modified) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (+7-8)
  • (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp (+4)
  • (modified) llvm/test/CodeGen/AMDGPU/fcanonicalize-elimination.ll (+4-2)
  • (modified) llvm/test/CodeGen/X86/fmaxnum.ll (+37-8)
  • (modified) llvm/test/CodeGen/X86/fminnum.ll (+37-8)
  • (modified) llvm/test/Transforms/InstCombine/simplify-demanded-fpclass.ll (+4-2)
  • (modified) llvm/test/Transforms/InstSimplify/ConstProp/min-max.ll (+8-4)
  • (modified) llvm/test/Transforms/InstSimplify/fminmax-folds.ll (+22-13)
diff --git a/llvm/lib/Analysis/ConstantFolding.cpp b/llvm/lib/Analysis/ConstantFolding.cpp
index 916154b465af4..63d12ee585e64 100644
--- a/llvm/lib/Analysis/ConstantFolding.cpp
+++ b/llvm/lib/Analysis/ConstantFolding.cpp
@@ -3348,8 +3348,12 @@ static Constant *ConstantFoldIntrinsicCall2(Intrinsic::ID IntrinsicID, Type *Ty,
       case Intrinsic::copysign:
         return ConstantFP::get(Ty->getContext(), APFloat::copySign(Op1V, Op2V));
       case Intrinsic::minnum:
+        if (Op1V.isSignaling() || Op2V.isSignaling())
+          return nullptr;
         return ConstantFP::get(Ty->getContext(), minnum(Op1V, Op2V));
       case Intrinsic::maxnum:
+        if (Op1V.isSignaling() || Op2V.isSignaling())
+          return nullptr;
         return ConstantFP::get(Ty->getContext(), maxnum(Op1V, Op2V));
       case Intrinsic::minimum:
         return ConstantFP::get(Ty->getContext(), minimum(Op1V, Op2V));
diff --git a/llvm/lib/Analysis/InstructionSimplify.cpp b/llvm/lib/Analysis/InstructionSimplify.cpp
index 59a213b47825a..bd85444d7d2b0 100644
--- a/llvm/lib/Analysis/InstructionSimplify.cpp
+++ b/llvm/lib/Analysis/InstructionSimplify.cpp
@@ -6620,7 +6620,8 @@ static MinMaxOptResult OptimizeConstMinMax(const Constant *RHSConst,
   assert(OutNewConstVal != nullptr);
 
   bool PropagateNaN = IID == Intrinsic::minimum || IID == Intrinsic::maximum;
-  bool PropagateSNaN = IID == Intrinsic::minnum || IID == Intrinsic::maxnum;
+  bool ReturnsOtherForAllNaNs =
+      IID == Intrinsic::minimumnum || IID == Intrinsic::maximumnum;
   bool IsMin = IID == Intrinsic::minimum || IID == Intrinsic::minnum ||
                IID == Intrinsic::minimumnum;
 
@@ -6637,29 +6638,27 @@ static MinMaxOptResult OptimizeConstMinMax(const Constant *RHSConst,
 
   // minnum(x, qnan) -> x
   // maxnum(x, qnan) -> x
-  // minnum(x, snan) -> qnan
-  // maxnum(x, snan) -> qnan
   // minimum(X, nan) -> qnan
   // maximum(X, nan) -> qnan
   // minimumnum(X, nan) -> x
   // maximumnum(X, nan) -> x
   if (CAPF.isNaN()) {
-    if (PropagateNaN || (PropagateSNaN && CAPF.isSignaling())) {
+    if (PropagateNaN) {
       *OutNewConstVal = ConstantFP::get(CFP->getType(), CAPF.makeQuiet());
       return MinMaxOptResult::UseNewConstVal;
+    } else if (ReturnsOtherForAllNaNs || !CAPF.isSignaling()) {
+      return MinMaxOptResult::UseOtherVal;
     }
-    return MinMaxOptResult::UseOtherVal;
+    return MinMaxOptResult::CannotOptimize;
   }
 
   if (CAPF.isInfinity() || (Call && Call->hasNoInfs() && CAPF.isLargest())) {
-    // minnum(X, -inf) -> -inf (ignoring sNaN -> qNaN propagation)
-    // maxnum(X, +inf) -> +inf (ignoring sNaN -> qNaN propagation)
     // minimum(X, -inf) -> -inf if nnan
     // maximum(X, +inf) -> +inf if nnan
     // minimumnum(X, -inf) -> -inf
     // maximumnum(X, +inf) -> +inf
     if (CAPF.isNegative() == IsMin &&
-        (!PropagateNaN || (Call && Call->hasNoNaNs()))) {
+        (ReturnsOtherForAllNaNs || (Call && Call->hasNoNaNs()))) {
       *OutNewConstVal = const_cast<Constant *>(RHSConst);
       return MinMaxOptResult::UseNewConstVal;
     }
@@ -7004,12 +7003,10 @@ Value *llvm::simplifyBinaryIntrinsic(Intrinsic::ID IID, Type *ReturnType,
   case Intrinsic::minimum:
   case Intrinsic::maximumnum:
   case Intrinsic::minimumnum: {
-    // In several cases here, we deviate from exact IEEE 754 semantics
-    // to enable optimizations (as allowed by the LLVM IR spec).
-    //
-    // For instance, we may return one of the arguments unmodified instead of
-    // inserting an llvm.canonicalize to transform input sNaNs into qNaNs,
-    // or may assume all NaN inputs are qNaNs.
+    // In some cases here, we deviate from exact IEEE-754 semantics to enable
+    // optimizations (as allowed by the LLVM IR spec) by returning one of the
+    // arguments unmodified instead of inserting an llvm.canonicalize to
+    // transform input sNaNs into qNaNs,
 
     // If the arguments are the same, this is a no-op (ignoring NaN quieting)
     if (Op0 == Op1)
diff --git a/llvm/lib/CodeGen/GlobalISel/Utils.cpp b/llvm/lib/CodeGen/GlobalISel/Utils.cpp
index e8954a3d9899b..bc01cb65c4a69 100644
--- a/llvm/lib/CodeGen/GlobalISel/Utils.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/Utils.cpp
@@ -768,8 +768,12 @@ llvm::ConstantFoldFPBinOp(unsigned Opcode, const Register Op1,
     C1.copySign(C2);
     return C1;
   case TargetOpcode::G_FMINNUM:
+    if (C1.isSignaling() || C2.isSignaling())
+      return std::nullopt;
     return minnum(C1, C2);
   case TargetOpcode::G_FMAXNUM:
+    if (C1.isSignaling() || C2.isSignaling())
+      return std::nullopt;
     return maxnum(C1, C2);
   case TargetOpcode::G_FMINIMUM:
     return minimum(C1, C2);
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 0f3a207cc6414..70950084ee6b7 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -19505,7 +19505,8 @@ SDValue DAGCombiner::visitFMinMax(SDNode *N) {
   const SDNodeFlags Flags = N->getFlags();
   unsigned Opc = N->getOpcode();
   bool PropAllNaNsToQNaNs = Opc == ISD::FMINIMUM || Opc == ISD::FMAXIMUM;
-  bool PropOnlySNaNsToQNaNs = Opc == ISD::FMINNUM || Opc == ISD::FMAXNUM;
+  bool ReturnsOtherForAllNaNs =
+      Opc == ISD::FMINIMUMNUM || Opc == ISD::FMAXIMUMNUM;
   bool IsMin =
       Opc == ISD::FMINNUM || Opc == ISD::FMINIMUM || Opc == ISD::FMINIMUMNUM;
   SelectionDAG::FlagInserter FlagsInserter(DAG, N);
@@ -19524,32 +19525,30 @@ SDValue DAGCombiner::visitFMinMax(SDNode *N) {
 
     // minnum(X, qnan) -> X
     // maxnum(X, qnan) -> X
-    // minnum(X, snan) -> qnan
-    // maxnum(X, snan) -> qnan
     // minimum(X, nan) -> qnan
     // maximum(X, nan) -> qnan
     // minimumnum(X, nan) -> X
     // maximumnum(X, nan) -> X
     if (AF.isNaN()) {
-      if (PropAllNaNsToQNaNs || (AF.isSignaling() && PropOnlySNaNsToQNaNs)) {
+      if (PropAllNaNsToQNaNs) {
         if (AF.isSignaling())
           return DAG.getConstantFP(AF.makeQuiet(), SDLoc(N), VT);
         return N->getOperand(1);
+      } else if (ReturnsOtherForAllNaNs || !AF.isSignaling()) {
+        return N->getOperand(0);
       }
-      return N->getOperand(0);
+      return SDValue();
     }
 
     // In the following folds, inf can be replaced with the largest finite
     // float, if the ninf flag is set.
     if (AF.isInfinity() || (Flags.hasNoInfs() && AF.isLargest())) {
-      // minnum(X, -inf) -> -inf (ignoring sNaN -> qNaN propagation)
-      // maxnum(X, +inf) -> +inf (ignoring sNaN -> qNaN propagation)
       // minimum(X, -inf) -> -inf if nnan
       // maximum(X, +inf) -> +inf if nnan
       // minimumnum(X, -inf) -> -inf
       // maximumnum(X, +inf) -> +inf
       if (IsMin == AF.isNegative() &&
-          (!PropAllNaNsToQNaNs || Flags.hasNoNaNs()))
+          (ReturnsOtherForAllNaNs || Flags.hasNoNaNs()))
         return N->getOperand(1);
 
       // minnum(X, +inf) -> X if nnan
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index c9519ce1610b2..7c2bd74b84fe4 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -7382,8 +7382,12 @@ SDValue SelectionDAG::foldConstantFPMath(unsigned Opcode, const SDLoc &DL,
       C1.copySign(C2);
       return getConstantFP(C1, DL, VT);
     case ISD::FMINNUM:
+      if (C1.isSignaling() || C2.isSignaling())
+        return SDValue();
       return getConstantFP(minnum(C1, C2), DL, VT);
     case ISD::FMAXNUM:
+      if (C1.isSignaling() || C2.isSignaling())
+        return SDValue();
       return getConstantFP(maxnum(C1, C2), DL, VT);
     case ISD::FMINIMUM:
       return getConstantFP(minimum(C1, C2), DL, VT);
diff --git a/llvm/test/CodeGen/AMDGPU/fcanonicalize-elimination.ll b/llvm/test/CodeGen/AMDGPU/fcanonicalize-elimination.ll
index 05d3e9c381910..e7685d53b2d10 100644
--- a/llvm/test/CodeGen/AMDGPU/fcanonicalize-elimination.ll
+++ b/llvm/test/CodeGen/AMDGPU/fcanonicalize-elimination.ll
@@ -497,10 +497,12 @@ define amdgpu_kernel void @test_fold_canonicalize_minnum_value_f32(ptr addrspace
   ret void
 }
 
-; FIXME: Should there be more checks here? minnum with sNaN operand is simplified to qNaN.
+; FIXME: Should there be more checks here? minnum with sNaN operand might get simplified away.
 
 ; GCN-LABEL: test_fold_canonicalize_sNaN_value_f32:
-; GCN: v_mov_b32_e32 v{{.+}}, 0x7fc00000
+; GCN: {{flat|global}}_load_dword [[LOAD:v[0-9]+]]
+; VI: v_mul_f32_e32 v{{[0-9]+}}, 1.0, [[LOAD]]
+; GFX9: v_max_f32_e32 v{{[0-9]+}}, [[LOAD]], [[LOAD]]
 define amdgpu_kernel void @test_fold_canonicalize_sNaN_value_f32(ptr addrspace(1) %arg) {
   %id = tail call i32 @llvm.amdgcn.workitem.id.x()
   %gep = getelementptr inbounds float, ptr addrspace(1) %arg, i32 %id
diff --git a/llvm/test/CodeGen/X86/fmaxnum.ll b/llvm/test/CodeGen/X86/fmaxnum.ll
index 150bef01bdbe0..6a03628d9f078 100644
--- a/llvm/test/CodeGen/X86/fmaxnum.ll
+++ b/llvm/test/CodeGen/X86/fmaxnum.ll
@@ -676,15 +676,44 @@ define float @test_maxnum_neg_inf_nnan(float %x, float %y) nounwind {
 
 ; Test SNaN quieting
 define float @test_maxnum_snan(float %x) {
-; SSE-LABEL: test_maxnum_snan:
-; SSE:       # %bb.0:
-; SSE-NEXT:    movss {{.*#+}} xmm0 = [NaN,0.0E+0,0.0E+0,0.0E+0]
-; SSE-NEXT:    retq
+; SSE2-LABEL: test_maxnum_snan:
+; SSE2:       # %bb.0:
+; SSE2-NEXT:    movss {{.*#+}} xmm2 = [NaN,0.0E+0,0.0E+0,0.0E+0]
+; SSE2-NEXT:    movaps %xmm0, %xmm1
+; SSE2-NEXT:    cmpunordss %xmm0, %xmm1
+; SSE2-NEXT:    movaps %xmm1, %xmm3
+; SSE2-NEXT:    andps %xmm2, %xmm3
+; SSE2-NEXT:    maxss %xmm0, %xmm2
+; SSE2-NEXT:    andnps %xmm2, %xmm1
+; SSE2-NEXT:    orps %xmm3, %xmm1
+; SSE2-NEXT:    movaps %xmm1, %xmm0
+; SSE2-NEXT:    retq
 ;
-; AVX-LABEL: test_maxnum_snan:
-; AVX:       # %bb.0:
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [NaN,0.0E+0,0.0E+0,0.0E+0]
-; AVX-NEXT:    retq
+; SSE4-LABEL: test_maxnum_snan:
+; SSE4:       # %bb.0:
+; SSE4-NEXT:    movss {{.*#+}} xmm1 = [NaN,0.0E+0,0.0E+0,0.0E+0]
+; SSE4-NEXT:    maxss %xmm0, %xmm1
+; SSE4-NEXT:    cmpunordss %xmm0, %xmm0
+; SSE4-NEXT:    blendvps %xmm0, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
+; SSE4-NEXT:    movaps %xmm1, %xmm0
+; SSE4-NEXT:    retq
+;
+; AVX1-LABEL: test_maxnum_snan:
+; AVX1:       # %bb.0:
+; AVX1-NEXT:    vmovss {{.*#+}} xmm1 = [NaN,0.0E+0,0.0E+0,0.0E+0]
+; AVX1-NEXT:    vmaxss %xmm0, %xmm1, %xmm1
+; AVX1-NEXT:    vcmpunordss %xmm0, %xmm0, %xmm0
+; AVX1-NEXT:    vblendvps %xmm0, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm0
+; AVX1-NEXT:    retq
+;
+; AVX512-LABEL: test_maxnum_snan:
+; AVX512:       # %bb.0:
+; AVX512-NEXT:    vmovss {{.*#+}} xmm2 = [NaN,0.0E+0,0.0E+0,0.0E+0]
+; AVX512-NEXT:    vmaxss %xmm0, %xmm2, %xmm1
+; AVX512-NEXT:    vcmpunordss %xmm0, %xmm0, %k1
+; AVX512-NEXT:    vmovss %xmm2, %xmm1, %xmm1 {%k1}
+; AVX512-NEXT:    vmovaps %xmm1, %xmm0
+; AVX512-NEXT:    retq
   %r = call float @llvm.maxnum.f32(float 0x7ff4000000000000, float %x)
   ret float %r
 }
diff --git a/llvm/test/CodeGen/X86/fminnum.ll b/llvm/test/CodeGen/X86/fminnum.ll
index 4aa1a618be758..5c882c99d4f14 100644
--- a/llvm/test/CodeGen/X86/fminnum.ll
+++ b/llvm/test/CodeGen/X86/fminnum.ll
@@ -676,15 +676,44 @@ define float @test_minnum_inf_nnan(float %x, float %y) nounwind {
 
 ; Test SNaN quieting
 define float @test_minnum_snan(float %x) {
-; SSE-LABEL: test_minnum_snan:
-; SSE:       # %bb.0:
-; SSE-NEXT:    movss {{.*#+}} xmm0 = [NaN,0.0E+0,0.0E+0,0.0E+0]
-; SSE-NEXT:    retq
+; SSE2-LABEL: test_minnum_snan:
+; SSE2:       # %bb.0:
+; SSE2-NEXT:    movss {{.*#+}} xmm2 = [NaN,0.0E+0,0.0E+0,0.0E+0]
+; SSE2-NEXT:    movaps %xmm0, %xmm1
+; SSE2-NEXT:    cmpunordss %xmm0, %xmm1
+; SSE2-NEXT:    movaps %xmm1, %xmm3
+; SSE2-NEXT:    andps %xmm2, %xmm3
+; SSE2-NEXT:    minss %xmm0, %xmm2
+; SSE2-NEXT:    andnps %xmm2, %xmm1
+; SSE2-NEXT:    orps %xmm3, %xmm1
+; SSE2-NEXT:    movaps %xmm1, %xmm0
+; SSE2-NEXT:    retq
 ;
-; AVX-LABEL: test_minnum_snan:
-; AVX:       # %bb.0:
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [NaN,0.0E+0,0.0E+0,0.0E+0]
-; AVX-NEXT:    retq
+; SSE4-LABEL: test_minnum_snan:
+; SSE4:       # %bb.0:
+; SSE4-NEXT:    movss {{.*#+}} xmm1 = [NaN,0.0E+0,0.0E+0,0.0E+0]
+; SSE4-NEXT:    minss %xmm0, %xmm1
+; SSE4-NEXT:    cmpunordss %xmm0, %xmm0
+; SSE4-NEXT:    blendvps %xmm0, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
+; SSE4-NEXT:    movaps %xmm1, %xmm0
+; SSE4-NEXT:    retq
+;
+; AVX1-LABEL: test_minnum_snan:
+; AVX1:       # %bb.0:
+; AVX1-NEXT:    vmovss {{.*#+}} xmm1 = [NaN,0.0E+0,0.0E+0,0.0E+0]
+; AVX1-NEXT:    vminss %xmm0, %xmm1, %xmm1
+; AVX1-NEXT:    vcmpunordss %xmm0, %xmm0, %xmm0
+; AVX1-NEXT:    vblendvps %xmm0, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm0
+; AVX1-NEXT:    retq
+;
+; AVX512-LABEL: test_minnum_snan:
+; AVX512:       # %bb.0:
+; AVX512-NEXT:    vmovss {{.*#+}} xmm2 = [NaN,0.0E+0,0.0E+0,0.0E+0]
+; AVX512-NEXT:    vminss %xmm0, %xmm2, %xmm1
+; AVX512-NEXT:    vcmpunordss %xmm0, %xmm0, %k1
+; AVX512-NEXT:    vmovss %xmm2, %xmm1, %xmm1 {%k1}
+; AVX512-NEXT:    vmovaps %xmm1, %xmm0
+; AVX512-NEXT:    retq
   %r = call float @llvm.minnum.f32(float 0x7ff4000000000000, float %x)
   ret float %r
 }
diff --git a/llvm/test/Transforms/InstCombine/simplify-demanded-fpclass.ll b/llvm/test/Transforms/InstCombine/simplify-demanded-fpclass.ll
index df60078dbf452..a7ff967d3123b 100644
--- a/llvm/test/Transforms/InstCombine/simplify-demanded-fpclass.ll
+++ b/llvm/test/Transforms/InstCombine/simplify-demanded-fpclass.ll
@@ -10,6 +10,8 @@ declare float @llvm.trunc.f32(float)
 declare float @llvm.arithmetic.fence.f32(float)
 declare float @llvm.minnum.f32(float, float)
 declare float @llvm.maxnum.f32(float, float)
+declare float @llvm.minimumnum.f32(float, float)
+declare float @llvm.maximumnum.f32(float, float)
 
 
 define float @ninf_user_select_inf(i1 %cond, float %x, float %y) {
@@ -1314,7 +1316,7 @@ define nofpclass(pinf) float @ret_nofpclass_pinf__minnum_ninf(i1 %cond, float %x
 ; CHECK-SAME: (i1 [[COND:%.*]], float [[X:%.*]]) {
 ; CHECK-NEXT:    ret float 0xFFF0000000000000
 ;
-  %min = call float @llvm.minnum.f32(float %x, float 0xFFF0000000000000)
+  %min = call float @llvm.minimumnum.f32(float %x, float 0xFFF0000000000000)
   ret float %min
 }
 
@@ -1335,6 +1337,6 @@ define nofpclass(ninf) float @ret_nofpclass_ninf__maxnum_pinf(i1 %cond, float %x
 ; CHECK-SAME: (i1 [[COND:%.*]], float [[X:%.*]]) {
 ; CHECK-NEXT:    ret float 0x7FF0000000000000
 ;
-  %max = call float @llvm.maxnum.f32(float %x, float 0x7FF0000000000000)
+  %max = call float @llvm.maximumnum.f32(float %x, float 0x7FF0000000000000)
   ret float %max
 }
diff --git a/llvm/test/Transforms/InstSimplify/ConstProp/min-max.ll b/llvm/test/Transforms/InstSimplify/ConstProp/min-max.ll
index a633d29179896..84bec15d6ed32 100644
--- a/llvm/test/Transforms/InstSimplify/ConstProp/min-max.ll
+++ b/llvm/test/Transforms/InstSimplify/ConstProp/min-max.ll
@@ -97,7 +97,8 @@ define float @minnum_float_qnan_p0() {
 
 define float @minnum_float_p0_snan() {
 ; CHECK-LABEL: @minnum_float_p0_snan(
-; CHECK-NEXT:    ret float 0x7FFC000000000000
+; CHECK-NEXT:    [[MIN:%.*]] = call float @llvm.minnum.f32(float 0.000000e+00, float 0x7FF4000000000000)
+; CHECK-NEXT:    ret float [[MIN]]
 ;
   %min = call float @llvm.minnum.f32(float 0.0, float 0x7FF4000000000000)
   ret float %min
@@ -105,7 +106,8 @@ define float @minnum_float_p0_snan() {
 
 define float @minnum_float_snan_p0() {
 ; CHECK-LABEL: @minnum_float_snan_p0(
-; CHECK-NEXT:    ret float 0x7FFC000000000000
+; CHECK-NEXT:    [[MIN:%.*]] = call float @llvm.minnum.f32(float 0x7FF4000000000000, float 0.000000e+00)
+; CHECK-NEXT:    ret float [[MIN]]
 ;
   %min = call float @llvm.minnum.f32(float 0x7FF4000000000000, float 0.0)
   ret float %min
@@ -205,7 +207,8 @@ define float @maxnum_float_qnan_p0() {
 
 define float @maxnum_float_p0_snan() {
 ; CHECK-LABEL: @maxnum_float_p0_snan(
-; CHECK-NEXT:    ret float 0x7FFC000000000000
+; CHECK-NEXT:    [[MAX:%.*]] = call float @llvm.maxnum.f32(float 0.000000e+00, float 0x7FF4000000000000)
+; CHECK-NEXT:    ret float [[MAX]]
 ;
   %max = call float @llvm.maxnum.f32(float 0.0, float 0x7FF4000000000000)
   ret float %max
@@ -213,7 +216,8 @@ define float @maxnum_float_p0_snan() {
 
 define float @maxnum_float_snan_p0() {
 ; CHECK-LABEL: @maxnum_float_snan_p0(
-; CHECK-NEXT:    ret float 0x7FFC000000000000
+; CHECK-NEXT:    [[MAX:%.*]] = call float @llvm.maxnum.f32(float 0x7FF4000000000000, float 0.000000e+00)
+; CHECK-NEXT:    ret float [[MAX]]
 ;
   %max = call float @llvm.maxnum.f32(float 0x7FF4000000000000, float 0.0)
   ret float %max
diff --git a/llvm/test/Transforms/InstSimplify/fminmax-folds.ll b/llvm/test/Transforms/InstSimplify/fminmax-folds.ll
index 091e85920c0df..7544f7190df89 100644
--- a/llvm/test/Transforms/InstSimplify/fminmax-folds.ll
+++ b/llvm/test/Transforms/InstSimplify/fminmax-folds.ll
@@ -43,11 +43,13 @@ define void @minmax_qnan_f32(float %x, ptr %minnum_res, ptr %maxnum_res, ptr %mi
 ; Note that maxnum/minnum return qnan here for snan inputs, unlike maximumnum/minimumnum
 define void @minmax_snan_f32(float %x, ptr %minnum_res, ptr %maxnum_res, ptr %minimum_res, ptr %maximum_res, ptr %minimumnum_res, ptr %maximumnum_res) {
 ; CHECK-LABEL: @minmax_snan_f32(
-; CHECK-NEXT:    store float 0x7FFC000000000000, ptr [[MINNUM_RES:%.*]], align 4
-; CHECK-NEXT:    store float 0x7FFC000000000000, ptr [[MAXNUM_RES:%.*]], align 4
+; CHECK-NEXT:    [[MINNUM:%.*]] = call float @llvm.minnum.f32(float [[X:%.*]], float 0x7FF4000000000000)
+; CHECK-NEXT:    store float [[MINNUM]], ptr [[MINNUM_RES:%.*]], align 4
+; CHECK-NEXT:    [[MAXNUM:%.*]] = call float @llvm.maxnum.f32(float [[X]], float 0x7FF4000000000000)
+; CHECK-NEXT:    store float [[MAXNUM]], ptr [[MAXNUM_RES:%.*]], align 4
 ; CHECK-NEXT:    store float 0x7FFC000000000000, ptr [[MINIMUM_RES:%.*]], align 4
 ; CHECK-NEXT:    store float 0x7FFC000000000000, ptr [[MAXIMUM_RES:%.*]], align 4
-; CHECK-NEXT:    store float [[X:%.*]], ptr [[MINIMUMNUM_RES:%.*]], align 4
+; CHECK-NEXT:    store float [[X]], ptr [[MINIMUMNUM_RES:%.*]], align 4
 ; CHECK-NEXT:    store float [[X]], ptr [[MAXIMUMNUM_RES:%.*]], align 4
 ; CHECK-NEXT:    ret void
 ;
@@ -98,11 +100,13 @@ define void @minmax_qnan_nxv2f64_op0(<vscale x 2 x double> %x, ptr %minnum_res,
 ; Note that maxnum/minnum return qnan here for snan inputs, unlike maximumnum/minimumnum
 define void @minmax_snan_nxv2f64_op1(<vscale x 2 x double> %x, ptr %minnum_res, ptr %maxnum_res, ptr %minimum_res, ptr %maximum_res, ptr %minimumnum_res, ptr %maximumnum_res) {
 ; CHECK-LABEL: @minmax_snan_nxv2f64_op1(
-; CHECK-NEXT:    store <vscale x 2 x double> splat (double 0x7FFC00DEAD00DEAD), ptr [[MINNUM_RES:%.*]], align 16
-; CHECK-NEXT:    store <vscale x 2 x double> splat (double 0x7FFC00DEAD00DEAD), ptr [[MAXNUM_RES:%.*]], align 16
+; CHECK-NEXT:    [[MINNUM:%.*]] = call <vscale x 2 x double> @llvm.minnum.nxv2f64(<vscale x 2 x double> splat (double 0x7FF400DEAD00DEAD), <vscale x 2 x double> [[X:%.*]])
+; CHECK-NEXT:    store <vscale x 2 x double> [[MINNUM]], ptr [[MINNUM_RES:%.*]], align 16
+; CHECK-NEXT:    [[MAXNUM:%.*]] = call <vscale x 2 x double> @llvm.maxnum.nxv2f64(<vscale x 2 x double> splat (double 0x7FF400DEAD00DEAD), <vscale x 2 x double> [[X]])
+; CHECK-NEXT:    store <vscale x 2 x double> [[MAXNUM]], ptr [[MAXNUM_RES:%.*]], align 16
 ; CHECK-NEXT:    store <vscale x 2 x double> splat (double 0x7FFC00DEAD00DEAD), ptr [[MINIMUM_RES:%.*]], align 16
 ; CHECK-NEXT:    store <vscale x 2 x double> splat (double 0x7FFC00DEAD00DEAD), ptr [[MAXIMUM_RES:%.*]], align 16
-; CHECK-NEXT:    store <vscale x 2 x double> [[X:%.*]], ptr [[MINIMUMNUM_RES:%.*]], align 16
+; CHECK-NEXT:    store <vscale x 2 x double> [[X]], ptr [[MINIMUMNUM_RES:%.*]], align 16
 ; CHECK-NEXT:    store <vscale x 2 x double> [[X]], ptr [[MAXIMUMNUM_RES:%.*]], align 16
 ; CHECK-NEXT:    ret void
 ;
@@ -255,7 +259,8 @@ define void @minmax_pos_inf_f32(float %x, ptr %minnum_res, ptr %maxnum_res, ptr
 ; CHECK-LABEL: @minmax_pos_inf_f32(
 ; CHECK-NEXT:    [[MINNUM:%.*]] = call float @llvm.minnum.f32(float [[X:%.*]], float 0x7FF0000000000000)
 ; CHECK-NEXT:    store float [[MINNUM]], ptr [[MINNUM_RES:%.*]], align 4
-; CHECK-NEXT:    store float 0x7FF0000000000000, ptr [[MAXNUM_RES:%.*]], align 4
+; CHECK-NEXT:    [[MAXNUM:%.*]] = call float @llvm.maxnum.f32(float [[X]], float 0x7FF0000000000000)
+; CHECK-NEXT:    store float [[MAXNUM]], ptr [[MAXNUM_RES:%.*]], align 4
 ; CHECK-NEXT:    store float ...
[truncated]

@github-actions
Copy link

github-actions bot commented Dec 1, 2025

🐧 Linux x64 Test Results

  • 186804 tests passed
  • 4897 tests skipped

Copy link
Member

Artem-B commented Dec 1, 2025

As far as I am aware, optimizations involving constant sNaN should generally be edge-cases that rarely occur, so here should hopefully be very little real-world performance impact from disabling these optimizations.

I'm curious, do we want to keep these optimizations around if the target makes no distinction between signalling and non-signalling NaNs? AFAICT, sNaN makes no difference on NVIDIA GPUs, and never causes any traps or updates any flags.

Copy link
Member

@Artem-B Artem-B left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM in general.

Copy link
Contributor

@phoebewang phoebewang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm good with the change assuming sNaN is rare in reality. I still want it back once we made consensus in LangRef.

@phoebewang
Copy link
Contributor

As far as I am aware, optimizations involving constant sNaN should generally be edge-cases that rarely occur, so here should hopefully be very little real-world performance impact from disabling these optimizations.

I'm curious, do we want to keep these optimizations around if the target makes no distinction between signalling and non-signalling NaNs? AFAICT, sNaN makes no difference on NVIDIA GPUs, and never causes any traps or updates any flags.

X86 doesn't distinguish it either. We need either change the LangRef or change targets' implementation to match with it. But either way, the optimizations are worth well. We should bring it back once the controversy settled.

Comment on lines +713 to +715
; AVX512-NEXT: vcmpunordss %xmm0, %xmm0, %k1
; AVX512-NEXT: vmovss %xmm2, %xmm1, %xmm1 {%k1}
; AVX512-NEXT: vmovaps %xmm1, %xmm0

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably something for a different PR,
but this really looks like a use case for VFIXUPIMMP[SD] with [QS]NaN to src1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A MOVSS is enough if NaN is known. But we don't need bother doing this in each backend. Just waiting for the code in DAGCombiner.cpp‎ brought back :)

@LewisCrawford
Copy link
Contributor Author

do we want to keep these optimizations around if the target makes no distinction between signalling and non-signalling NaNs? AFAICT, sNaN makes no difference on NVIDIA GPUs, and never causes any traps or updates any flags.

Even on backends like NVPTX that don't really care about sNaNs, the specification discussions around what the correct behaviour of maxnum(x, sNaN) will still impact the behaviour on backends like NVPTX if the middle-end chooses to either optimize it to x or qNaN, even if the backend would not have triggered a trap / flag update.

For the NVPTX backend, we are hoping to transition nvvm_fmax from mapping to maxnum to using maximumnum instead, as this better reflects the way the PTX spec intends NaNs to be handled (by treating sNaN and qNaN identically).

We should bring it back once the controversy settled.

Most of these optimizations will probably be ok to bring back if the result is that the LangRef clearly defines the exact expected result here for all targets. If it instead goes for something like "return value is non-deterministic if one or both inputs are sNaN" as mentioned here #170082 (comment) , then maybe it's better to keep disabled so the constant and non-constant SNaNs produce consistent behaviour. Maybe we might also want to add a target-specific check for whether the backend cares about SNaN trap / flag setting too (but this is probably already covered by strictfp + constrained intrinsics), so hopefully we should be able to add these optimizations back at some point.

@LewisCrawford LewisCrawford merged commit ea3fdc5 into llvm:main Dec 2, 2025
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend:AMDGPU backend:ARM backend:X86 llvm:analysis Includes value tracking, cost tables and constant folding llvm:globalisel llvm:instcombine Covers the InstCombine, InstSimplify and AggressiveInstCombine passes llvm:SelectionDAG SelectionDAGISel as well llvm:transforms

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants