Avoid maxnum(sNaN, x) optimizations / folds #170181
The behaviour of constant-folding maxnum(sNaN, x) and minnum(sNaN, x) has become controversial, and there are ongoing discussions about which behaviour we want to specify in the LLVM IR LangRef. See:

- llvm#170082
- llvm#168838
- llvm#138451
- llvm#170067
- https://discourse.llvm.org/t/rfc-a-consistent-set-of-semantics-for-the-floating-point-minimum-and-maximum-operations/89006

This patch removes optimization and constant-folding support for maxnum(sNaN, x) and minnum(sNaN, x), but keeps the folds for qNaN operands. This leaves enough flexibility for the implementation to conform to either the old or the new version of the specified semantics without further changes.

As far as I am aware, optimizations involving a constant sNaN are edge cases that rarely occur, so there should hopefully be very little real-world performance impact from disabling them.
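To make the disagreement concrete, here is a hedged Python sketch (not LLVM code) of the two candidate semantics the description alludes to: the IEEE 754-2008 maxNum reading, where a signaling NaN input produces a quiet NaN, versus an "ignore all NaNs" reading in the style of 754-2019 maximumNumber. The function names are illustrative, and values are kept as raw bit patterns so the host FPU can never silently quiet a sNaN. The two readings agree when the NaN operand is quiet, and disagree exactly when it is signaling, which is why the patch only blocks the sNaN folds.

```python
import struct

# Bit-level view of IEEE-754 binary64 values, so signaling NaNs are
# never accidentally quieted by host floating-point operations.
EXP_MASK = 0x7FF0_0000_0000_0000
FRAC_MASK = 0x000F_FFFF_FFFF_FFFF
QUIET_BIT = 0x0008_0000_0000_0000

def to_bits(f):
    return struct.unpack("<Q", struct.pack("<d", f))[0]

def to_float(b):
    return struct.unpack("<d", struct.pack("<Q", b))[0]

def is_nan(b):
    return (b & EXP_MASK) == EXP_MASK and (b & FRAC_MASK) != 0

def is_snan(b):
    return is_nan(b) and not (b & QUIET_BIT)

def maxnum_754_2008(a, b):
    """IEEE 754-2008 maxNum: a signaling NaN input yields a quiet NaN;
    a single quiet NaN input is ignored in favour of the other operand."""
    if is_snan(a) or is_snan(b):
        return (a if is_snan(a) else b) | QUIET_BIT
    if is_nan(a):
        return b
    if is_nan(b):
        return a
    return a if to_float(a) >= to_float(b) else b

def maxnum_ignore_all_nans(a, b):
    """Alternative reading (like 754-2019 maximumNumber): any NaN,
    signaling or quiet, is ignored in favour of the number operand."""
    if is_nan(a):
        return b if not is_nan(b) else (a | QUIET_BIT)
    if is_nan(b):
        return a
    return a if to_float(a) >= to_float(b) else b

SNAN = 0x7FF4_0000_0000_0000  # quiet bit clear -> signaling NaN
QNAN = 0x7FF8_0000_0000_0000  # quiet bit set   -> quiet NaN
ONE = to_bits(1.0)

# Both readings agree for a qNaN operand...
assert maxnum_754_2008(QNAN, ONE) == ONE
assert maxnum_ignore_all_nans(QNAN, ONE) == ONE
# ...but disagree for an sNaN operand, the case this patch stops folding.
assert is_nan(maxnum_754_2008(SNAN, ONE)) and not is_snan(maxnum_754_2008(SNAN, ONE))
assert maxnum_ignore_all_nans(SNAN, ONE) == ONE
```

A constant fold baked into the compiler must pick one of these answers; returning nullptr / SDValue() for sNaN operands instead, as the patch does, defers the choice until the LangRef discussion settles.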
@llvm/pr-subscribers-backend-arm @llvm/pr-subscribers-backend-x86

Author: Lewis Crawford (LewisCrawford)

Changes: The behaviour of constant-folding maxnum(sNaN, x) and minnum(sNaN, x) has become controversial, and there are ongoing discussions about which behaviour to specify in the LLVM IR LangRef. See the issues and RFC linked in the description above.

This patch removes optimization and constant-folding support for maxnum(sNaN, x) and minnum(sNaN, x) but keeps the folds for qNaN operands. As far as I am aware, optimizations involving a constant sNaN are edge cases that rarely occur, so there should be very little real-world performance impact.

Patch is 24.39 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/170181.diff

11 Files Affected:
diff --git a/llvm/lib/Analysis/ConstantFolding.cpp b/llvm/lib/Analysis/ConstantFolding.cpp
index 916154b465af4..63d12ee585e64 100644
--- a/llvm/lib/Analysis/ConstantFolding.cpp
+++ b/llvm/lib/Analysis/ConstantFolding.cpp
@@ -3348,8 +3348,12 @@ static Constant *ConstantFoldIntrinsicCall2(Intrinsic::ID IntrinsicID, Type *Ty,
case Intrinsic::copysign:
return ConstantFP::get(Ty->getContext(), APFloat::copySign(Op1V, Op2V));
case Intrinsic::minnum:
+ if (Op1V.isSignaling() || Op2V.isSignaling())
+ return nullptr;
return ConstantFP::get(Ty->getContext(), minnum(Op1V, Op2V));
case Intrinsic::maxnum:
+ if (Op1V.isSignaling() || Op2V.isSignaling())
+ return nullptr;
return ConstantFP::get(Ty->getContext(), maxnum(Op1V, Op2V));
case Intrinsic::minimum:
return ConstantFP::get(Ty->getContext(), minimum(Op1V, Op2V));
diff --git a/llvm/lib/Analysis/InstructionSimplify.cpp b/llvm/lib/Analysis/InstructionSimplify.cpp
index 59a213b47825a..bd85444d7d2b0 100644
--- a/llvm/lib/Analysis/InstructionSimplify.cpp
+++ b/llvm/lib/Analysis/InstructionSimplify.cpp
@@ -6620,7 +6620,8 @@ static MinMaxOptResult OptimizeConstMinMax(const Constant *RHSConst,
assert(OutNewConstVal != nullptr);
bool PropagateNaN = IID == Intrinsic::minimum || IID == Intrinsic::maximum;
- bool PropagateSNaN = IID == Intrinsic::minnum || IID == Intrinsic::maxnum;
+ bool ReturnsOtherForAllNaNs =
+ IID == Intrinsic::minimumnum || IID == Intrinsic::maximumnum;
bool IsMin = IID == Intrinsic::minimum || IID == Intrinsic::minnum ||
IID == Intrinsic::minimumnum;
@@ -6637,29 +6638,27 @@ static MinMaxOptResult OptimizeConstMinMax(const Constant *RHSConst,
// minnum(x, qnan) -> x
// maxnum(x, qnan) -> x
- // minnum(x, snan) -> qnan
- // maxnum(x, snan) -> qnan
// minimum(X, nan) -> qnan
// maximum(X, nan) -> qnan
// minimumnum(X, nan) -> x
// maximumnum(X, nan) -> x
if (CAPF.isNaN()) {
- if (PropagateNaN || (PropagateSNaN && CAPF.isSignaling())) {
+ if (PropagateNaN) {
*OutNewConstVal = ConstantFP::get(CFP->getType(), CAPF.makeQuiet());
return MinMaxOptResult::UseNewConstVal;
+ } else if (ReturnsOtherForAllNaNs || !CAPF.isSignaling()) {
+ return MinMaxOptResult::UseOtherVal;
}
- return MinMaxOptResult::UseOtherVal;
+ return MinMaxOptResult::CannotOptimize;
}
if (CAPF.isInfinity() || (Call && Call->hasNoInfs() && CAPF.isLargest())) {
- // minnum(X, -inf) -> -inf (ignoring sNaN -> qNaN propagation)
- // maxnum(X, +inf) -> +inf (ignoring sNaN -> qNaN propagation)
// minimum(X, -inf) -> -inf if nnan
// maximum(X, +inf) -> +inf if nnan
// minimumnum(X, -inf) -> -inf
// maximumnum(X, +inf) -> +inf
if (CAPF.isNegative() == IsMin &&
- (!PropagateNaN || (Call && Call->hasNoNaNs()))) {
+ (ReturnsOtherForAllNaNs || (Call && Call->hasNoNaNs()))) {
*OutNewConstVal = const_cast<Constant *>(RHSConst);
return MinMaxOptResult::UseNewConstVal;
}
@@ -7004,12 +7003,10 @@ Value *llvm::simplifyBinaryIntrinsic(Intrinsic::ID IID, Type *ReturnType,
case Intrinsic::minimum:
case Intrinsic::maximumnum:
case Intrinsic::minimumnum: {
- // In several cases here, we deviate from exact IEEE 754 semantics
- // to enable optimizations (as allowed by the LLVM IR spec).
- //
- // For instance, we may return one of the arguments unmodified instead of
- // inserting an llvm.canonicalize to transform input sNaNs into qNaNs,
- // or may assume all NaN inputs are qNaNs.
+ // In some cases here, we deviate from exact IEEE-754 semantics to enable
+ // optimizations (as allowed by the LLVM IR spec) by returning one of the
+ // arguments unmodified instead of inserting an llvm.canonicalize to
+ // transform input sNaNs into qNaNs,
// If the arguments are the same, this is a no-op (ignoring NaN quieting)
if (Op0 == Op1)
diff --git a/llvm/lib/CodeGen/GlobalISel/Utils.cpp b/llvm/lib/CodeGen/GlobalISel/Utils.cpp
index e8954a3d9899b..bc01cb65c4a69 100644
--- a/llvm/lib/CodeGen/GlobalISel/Utils.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/Utils.cpp
@@ -768,8 +768,12 @@ llvm::ConstantFoldFPBinOp(unsigned Opcode, const Register Op1,
C1.copySign(C2);
return C1;
case TargetOpcode::G_FMINNUM:
+ if (C1.isSignaling() || C2.isSignaling())
+ return std::nullopt;
return minnum(C1, C2);
case TargetOpcode::G_FMAXNUM:
+ if (C1.isSignaling() || C2.isSignaling())
+ return std::nullopt;
return maxnum(C1, C2);
case TargetOpcode::G_FMINIMUM:
return minimum(C1, C2);
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 0f3a207cc6414..70950084ee6b7 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -19505,7 +19505,8 @@ SDValue DAGCombiner::visitFMinMax(SDNode *N) {
const SDNodeFlags Flags = N->getFlags();
unsigned Opc = N->getOpcode();
bool PropAllNaNsToQNaNs = Opc == ISD::FMINIMUM || Opc == ISD::FMAXIMUM;
- bool PropOnlySNaNsToQNaNs = Opc == ISD::FMINNUM || Opc == ISD::FMAXNUM;
+ bool ReturnsOtherForAllNaNs =
+ Opc == ISD::FMINIMUMNUM || Opc == ISD::FMAXIMUMNUM;
bool IsMin =
Opc == ISD::FMINNUM || Opc == ISD::FMINIMUM || Opc == ISD::FMINIMUMNUM;
SelectionDAG::FlagInserter FlagsInserter(DAG, N);
@@ -19524,32 +19525,30 @@ SDValue DAGCombiner::visitFMinMax(SDNode *N) {
// minnum(X, qnan) -> X
// maxnum(X, qnan) -> X
- // minnum(X, snan) -> qnan
- // maxnum(X, snan) -> qnan
// minimum(X, nan) -> qnan
// maximum(X, nan) -> qnan
// minimumnum(X, nan) -> X
// maximumnum(X, nan) -> X
if (AF.isNaN()) {
- if (PropAllNaNsToQNaNs || (AF.isSignaling() && PropOnlySNaNsToQNaNs)) {
+ if (PropAllNaNsToQNaNs) {
if (AF.isSignaling())
return DAG.getConstantFP(AF.makeQuiet(), SDLoc(N), VT);
return N->getOperand(1);
+ } else if (ReturnsOtherForAllNaNs || !AF.isSignaling()) {
+ return N->getOperand(0);
}
- return N->getOperand(0);
+ return SDValue();
}
// In the following folds, inf can be replaced with the largest finite
// float, if the ninf flag is set.
if (AF.isInfinity() || (Flags.hasNoInfs() && AF.isLargest())) {
- // minnum(X, -inf) -> -inf (ignoring sNaN -> qNaN propagation)
- // maxnum(X, +inf) -> +inf (ignoring sNaN -> qNaN propagation)
// minimum(X, -inf) -> -inf if nnan
// maximum(X, +inf) -> +inf if nnan
// minimumnum(X, -inf) -> -inf
// maximumnum(X, +inf) -> +inf
if (IsMin == AF.isNegative() &&
- (!PropAllNaNsToQNaNs || Flags.hasNoNaNs()))
+ (ReturnsOtherForAllNaNs || Flags.hasNoNaNs()))
return N->getOperand(1);
// minnum(X, +inf) -> X if nnan
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index c9519ce1610b2..7c2bd74b84fe4 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -7382,8 +7382,12 @@ SDValue SelectionDAG::foldConstantFPMath(unsigned Opcode, const SDLoc &DL,
C1.copySign(C2);
return getConstantFP(C1, DL, VT);
case ISD::FMINNUM:
+ if (C1.isSignaling() || C2.isSignaling())
+ return SDValue();
return getConstantFP(minnum(C1, C2), DL, VT);
case ISD::FMAXNUM:
+ if (C1.isSignaling() || C2.isSignaling())
+ return SDValue();
return getConstantFP(maxnum(C1, C2), DL, VT);
case ISD::FMINIMUM:
return getConstantFP(minimum(C1, C2), DL, VT);
diff --git a/llvm/test/CodeGen/AMDGPU/fcanonicalize-elimination.ll b/llvm/test/CodeGen/AMDGPU/fcanonicalize-elimination.ll
index 05d3e9c381910..e7685d53b2d10 100644
--- a/llvm/test/CodeGen/AMDGPU/fcanonicalize-elimination.ll
+++ b/llvm/test/CodeGen/AMDGPU/fcanonicalize-elimination.ll
@@ -497,10 +497,12 @@ define amdgpu_kernel void @test_fold_canonicalize_minnum_value_f32(ptr addrspace
ret void
}
-; FIXME: Should there be more checks here? minnum with sNaN operand is simplified to qNaN.
+; FIXME: Should there be more checks here? minnum with sNaN operand might get simplified away.
; GCN-LABEL: test_fold_canonicalize_sNaN_value_f32:
-; GCN: v_mov_b32_e32 v{{.+}}, 0x7fc00000
+; GCN: {{flat|global}}_load_dword [[LOAD:v[0-9]+]]
+; VI: v_mul_f32_e32 v{{[0-9]+}}, 1.0, [[LOAD]]
+; GFX9: v_max_f32_e32 v{{[0-9]+}}, [[LOAD]], [[LOAD]]
define amdgpu_kernel void @test_fold_canonicalize_sNaN_value_f32(ptr addrspace(1) %arg) {
%id = tail call i32 @llvm.amdgcn.workitem.id.x()
%gep = getelementptr inbounds float, ptr addrspace(1) %arg, i32 %id
diff --git a/llvm/test/CodeGen/X86/fmaxnum.ll b/llvm/test/CodeGen/X86/fmaxnum.ll
index 150bef01bdbe0..6a03628d9f078 100644
--- a/llvm/test/CodeGen/X86/fmaxnum.ll
+++ b/llvm/test/CodeGen/X86/fmaxnum.ll
@@ -676,15 +676,44 @@ define float @test_maxnum_neg_inf_nnan(float %x, float %y) nounwind {
; Test SNaN quieting
define float @test_maxnum_snan(float %x) {
-; SSE-LABEL: test_maxnum_snan:
-; SSE: # %bb.0:
-; SSE-NEXT: movss {{.*#+}} xmm0 = [NaN,0.0E+0,0.0E+0,0.0E+0]
-; SSE-NEXT: retq
+; SSE2-LABEL: test_maxnum_snan:
+; SSE2: # %bb.0:
+; SSE2-NEXT: movss {{.*#+}} xmm2 = [NaN,0.0E+0,0.0E+0,0.0E+0]
+; SSE2-NEXT: movaps %xmm0, %xmm1
+; SSE2-NEXT: cmpunordss %xmm0, %xmm1
+; SSE2-NEXT: movaps %xmm1, %xmm3
+; SSE2-NEXT: andps %xmm2, %xmm3
+; SSE2-NEXT: maxss %xmm0, %xmm2
+; SSE2-NEXT: andnps %xmm2, %xmm1
+; SSE2-NEXT: orps %xmm3, %xmm1
+; SSE2-NEXT: movaps %xmm1, %xmm0
+; SSE2-NEXT: retq
;
-; AVX-LABEL: test_maxnum_snan:
-; AVX: # %bb.0:
-; AVX-NEXT: vmovss {{.*#+}} xmm0 = [NaN,0.0E+0,0.0E+0,0.0E+0]
-; AVX-NEXT: retq
+; SSE4-LABEL: test_maxnum_snan:
+; SSE4: # %bb.0:
+; SSE4-NEXT: movss {{.*#+}} xmm1 = [NaN,0.0E+0,0.0E+0,0.0E+0]
+; SSE4-NEXT: maxss %xmm0, %xmm1
+; SSE4-NEXT: cmpunordss %xmm0, %xmm0
+; SSE4-NEXT: blendvps %xmm0, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
+; SSE4-NEXT: movaps %xmm1, %xmm0
+; SSE4-NEXT: retq
+;
+; AVX1-LABEL: test_maxnum_snan:
+; AVX1: # %bb.0:
+; AVX1-NEXT: vmovss {{.*#+}} xmm1 = [NaN,0.0E+0,0.0E+0,0.0E+0]
+; AVX1-NEXT: vmaxss %xmm0, %xmm1, %xmm1
+; AVX1-NEXT: vcmpunordss %xmm0, %xmm0, %xmm0
+; AVX1-NEXT: vblendvps %xmm0, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm0
+; AVX1-NEXT: retq
+;
+; AVX512-LABEL: test_maxnum_snan:
+; AVX512: # %bb.0:
+; AVX512-NEXT: vmovss {{.*#+}} xmm2 = [NaN,0.0E+0,0.0E+0,0.0E+0]
+; AVX512-NEXT: vmaxss %xmm0, %xmm2, %xmm1
+; AVX512-NEXT: vcmpunordss %xmm0, %xmm0, %k1
+; AVX512-NEXT: vmovss %xmm2, %xmm1, %xmm1 {%k1}
+; AVX512-NEXT: vmovaps %xmm1, %xmm0
+; AVX512-NEXT: retq
%r = call float @llvm.maxnum.f32(float 0x7ff4000000000000, float %x)
ret float %r
}
diff --git a/llvm/test/CodeGen/X86/fminnum.ll b/llvm/test/CodeGen/X86/fminnum.ll
index 4aa1a618be758..5c882c99d4f14 100644
--- a/llvm/test/CodeGen/X86/fminnum.ll
+++ b/llvm/test/CodeGen/X86/fminnum.ll
@@ -676,15 +676,44 @@ define float @test_minnum_inf_nnan(float %x, float %y) nounwind {
; Test SNaN quieting
define float @test_minnum_snan(float %x) {
-; SSE-LABEL: test_minnum_snan:
-; SSE: # %bb.0:
-; SSE-NEXT: movss {{.*#+}} xmm0 = [NaN,0.0E+0,0.0E+0,0.0E+0]
-; SSE-NEXT: retq
+; SSE2-LABEL: test_minnum_snan:
+; SSE2: # %bb.0:
+; SSE2-NEXT: movss {{.*#+}} xmm2 = [NaN,0.0E+0,0.0E+0,0.0E+0]
+; SSE2-NEXT: movaps %xmm0, %xmm1
+; SSE2-NEXT: cmpunordss %xmm0, %xmm1
+; SSE2-NEXT: movaps %xmm1, %xmm3
+; SSE2-NEXT: andps %xmm2, %xmm3
+; SSE2-NEXT: minss %xmm0, %xmm2
+; SSE2-NEXT: andnps %xmm2, %xmm1
+; SSE2-NEXT: orps %xmm3, %xmm1
+; SSE2-NEXT: movaps %xmm1, %xmm0
+; SSE2-NEXT: retq
;
-; AVX-LABEL: test_minnum_snan:
-; AVX: # %bb.0:
-; AVX-NEXT: vmovss {{.*#+}} xmm0 = [NaN,0.0E+0,0.0E+0,0.0E+0]
-; AVX-NEXT: retq
+; SSE4-LABEL: test_minnum_snan:
+; SSE4: # %bb.0:
+; SSE4-NEXT: movss {{.*#+}} xmm1 = [NaN,0.0E+0,0.0E+0,0.0E+0]
+; SSE4-NEXT: minss %xmm0, %xmm1
+; SSE4-NEXT: cmpunordss %xmm0, %xmm0
+; SSE4-NEXT: blendvps %xmm0, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
+; SSE4-NEXT: movaps %xmm1, %xmm0
+; SSE4-NEXT: retq
+;
+; AVX1-LABEL: test_minnum_snan:
+; AVX1: # %bb.0:
+; AVX1-NEXT: vmovss {{.*#+}} xmm1 = [NaN,0.0E+0,0.0E+0,0.0E+0]
+; AVX1-NEXT: vminss %xmm0, %xmm1, %xmm1
+; AVX1-NEXT: vcmpunordss %xmm0, %xmm0, %xmm0
+; AVX1-NEXT: vblendvps %xmm0, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm0
+; AVX1-NEXT: retq
+;
+; AVX512-LABEL: test_minnum_snan:
+; AVX512: # %bb.0:
+; AVX512-NEXT: vmovss {{.*#+}} xmm2 = [NaN,0.0E+0,0.0E+0,0.0E+0]
+; AVX512-NEXT: vminss %xmm0, %xmm2, %xmm1
+; AVX512-NEXT: vcmpunordss %xmm0, %xmm0, %k1
+; AVX512-NEXT: vmovss %xmm2, %xmm1, %xmm1 {%k1}
+; AVX512-NEXT: vmovaps %xmm1, %xmm0
+; AVX512-NEXT: retq
%r = call float @llvm.minnum.f32(float 0x7ff4000000000000, float %x)
ret float %r
}
diff --git a/llvm/test/Transforms/InstCombine/simplify-demanded-fpclass.ll b/llvm/test/Transforms/InstCombine/simplify-demanded-fpclass.ll
index df60078dbf452..a7ff967d3123b 100644
--- a/llvm/test/Transforms/InstCombine/simplify-demanded-fpclass.ll
+++ b/llvm/test/Transforms/InstCombine/simplify-demanded-fpclass.ll
@@ -10,6 +10,8 @@ declare float @llvm.trunc.f32(float)
declare float @llvm.arithmetic.fence.f32(float)
declare float @llvm.minnum.f32(float, float)
declare float @llvm.maxnum.f32(float, float)
+declare float @llvm.minimumnum.f32(float, float)
+declare float @llvm.maximumnum.f32(float, float)
define float @ninf_user_select_inf(i1 %cond, float %x, float %y) {
@@ -1314,7 +1316,7 @@ define nofpclass(pinf) float @ret_nofpclass_pinf__minnum_ninf(i1 %cond, float %x
; CHECK-SAME: (i1 [[COND:%.*]], float [[X:%.*]]) {
; CHECK-NEXT: ret float 0xFFF0000000000000
;
- %min = call float @llvm.minnum.f32(float %x, float 0xFFF0000000000000)
+ %min = call float @llvm.minimumnum.f32(float %x, float 0xFFF0000000000000)
ret float %min
}
@@ -1335,6 +1337,6 @@ define nofpclass(ninf) float @ret_nofpclass_ninf__maxnum_pinf(i1 %cond, float %x
; CHECK-SAME: (i1 [[COND:%.*]], float [[X:%.*]]) {
; CHECK-NEXT: ret float 0x7FF0000000000000
;
- %max = call float @llvm.maxnum.f32(float %x, float 0x7FF0000000000000)
+ %max = call float @llvm.maximumnum.f32(float %x, float 0x7FF0000000000000)
ret float %max
}
diff --git a/llvm/test/Transforms/InstSimplify/ConstProp/min-max.ll b/llvm/test/Transforms/InstSimplify/ConstProp/min-max.ll
index a633d29179896..84bec15d6ed32 100644
--- a/llvm/test/Transforms/InstSimplify/ConstProp/min-max.ll
+++ b/llvm/test/Transforms/InstSimplify/ConstProp/min-max.ll
@@ -97,7 +97,8 @@ define float @minnum_float_qnan_p0() {
define float @minnum_float_p0_snan() {
; CHECK-LABEL: @minnum_float_p0_snan(
-; CHECK-NEXT: ret float 0x7FFC000000000000
+; CHECK-NEXT: [[MIN:%.*]] = call float @llvm.minnum.f32(float 0.000000e+00, float 0x7FF4000000000000)
+; CHECK-NEXT: ret float [[MIN]]
;
%min = call float @llvm.minnum.f32(float 0.0, float 0x7FF4000000000000)
ret float %min
@@ -105,7 +106,8 @@ define float @minnum_float_p0_snan() {
define float @minnum_float_snan_p0() {
; CHECK-LABEL: @minnum_float_snan_p0(
-; CHECK-NEXT: ret float 0x7FFC000000000000
+; CHECK-NEXT: [[MIN:%.*]] = call float @llvm.minnum.f32(float 0x7FF4000000000000, float 0.000000e+00)
+; CHECK-NEXT: ret float [[MIN]]
;
%min = call float @llvm.minnum.f32(float 0x7FF4000000000000, float 0.0)
ret float %min
@@ -205,7 +207,8 @@ define float @maxnum_float_qnan_p0() {
define float @maxnum_float_p0_snan() {
; CHECK-LABEL: @maxnum_float_p0_snan(
-; CHECK-NEXT: ret float 0x7FFC000000000000
+; CHECK-NEXT: [[MAX:%.*]] = call float @llvm.maxnum.f32(float 0.000000e+00, float 0x7FF4000000000000)
+; CHECK-NEXT: ret float [[MAX]]
;
%max = call float @llvm.maxnum.f32(float 0.0, float 0x7FF4000000000000)
ret float %max
@@ -213,7 +216,8 @@ define float @maxnum_float_p0_snan() {
define float @maxnum_float_snan_p0() {
; CHECK-LABEL: @maxnum_float_snan_p0(
-; CHECK-NEXT: ret float 0x7FFC000000000000
+; CHECK-NEXT: [[MAX:%.*]] = call float @llvm.maxnum.f32(float 0x7FF4000000000000, float 0.000000e+00)
+; CHECK-NEXT: ret float [[MAX]]
;
%max = call float @llvm.maxnum.f32(float 0x7FF4000000000000, float 0.0)
ret float %max
diff --git a/llvm/test/Transforms/InstSimplify/fminmax-folds.ll b/llvm/test/Transforms/InstSimplify/fminmax-folds.ll
index 091e85920c0df..7544f7190df89 100644
--- a/llvm/test/Transforms/InstSimplify/fminmax-folds.ll
+++ b/llvm/test/Transforms/InstSimplify/fminmax-folds.ll
@@ -43,11 +43,13 @@ define void @minmax_qnan_f32(float %x, ptr %minnum_res, ptr %maxnum_res, ptr %mi
; Note that maxnum/minnum return qnan here for snan inputs, unlike maximumnum/minimumnum
define void @minmax_snan_f32(float %x, ptr %minnum_res, ptr %maxnum_res, ptr %minimum_res, ptr %maximum_res, ptr %minimumnum_res, ptr %maximumnum_res) {
; CHECK-LABEL: @minmax_snan_f32(
-; CHECK-NEXT: store float 0x7FFC000000000000, ptr [[MINNUM_RES:%.*]], align 4
-; CHECK-NEXT: store float 0x7FFC000000000000, ptr [[MAXNUM_RES:%.*]], align 4
+; CHECK-NEXT: [[MINNUM:%.*]] = call float @llvm.minnum.f32(float [[X:%.*]], float 0x7FF4000000000000)
+; CHECK-NEXT: store float [[MINNUM]], ptr [[MINNUM_RES:%.*]], align 4
+; CHECK-NEXT: [[MAXNUM:%.*]] = call float @llvm.maxnum.f32(float [[X]], float 0x7FF4000000000000)
+; CHECK-NEXT: store float [[MAXNUM]], ptr [[MAXNUM_RES:%.*]], align 4
; CHECK-NEXT: store float 0x7FFC000000000000, ptr [[MINIMUM_RES:%.*]], align 4
; CHECK-NEXT: store float 0x7FFC000000000000, ptr [[MAXIMUM_RES:%.*]], align 4
-; CHECK-NEXT: store float [[X:%.*]], ptr [[MINIMUMNUM_RES:%.*]], align 4
+; CHECK-NEXT: store float [[X]], ptr [[MINIMUMNUM_RES:%.*]], align 4
; CHECK-NEXT: store float [[X]], ptr [[MAXIMUMNUM_RES:%.*]], align 4
; CHECK-NEXT: ret void
;
@@ -98,11 +100,13 @@ define void @minmax_qnan_nxv2f64_op0(<vscale x 2 x double> %x, ptr %minnum_res,
; Note that maxnum/minnum return qnan here for snan inputs, unlike maximumnum/minimumnum
define void @minmax_snan_nxv2f64_op1(<vscale x 2 x double> %x, ptr %minnum_res, ptr %maxnum_res, ptr %minimum_res, ptr %maximum_res, ptr %minimumnum_res, ptr %maximumnum_res) {
; CHECK-LABEL: @minmax_snan_nxv2f64_op1(
-; CHECK-NEXT: store <vscale x 2 x double> splat (double 0x7FFC00DEAD00DEAD), ptr [[MINNUM_RES:%.*]], align 16
-; CHECK-NEXT: store <vscale x 2 x double> splat (double 0x7FFC00DEAD00DEAD), ptr [[MAXNUM_RES:%.*]], align 16
+; CHECK-NEXT: [[MINNUM:%.*]] = call <vscale x 2 x double> @llvm.minnum.nxv2f64(<vscale x 2 x double> splat (double 0x7FF400DEAD00DEAD), <vscale x 2 x double> [[X:%.*]])
+; CHECK-NEXT: store <vscale x 2 x double> [[MINNUM]], ptr [[MINNUM_RES:%.*]], align 16
+; CHECK-NEXT: [[MAXNUM:%.*]] = call <vscale x 2 x double> @llvm.maxnum.nxv2f64(<vscale x 2 x double> splat (double 0x7FF400DEAD00DEAD), <vscale x 2 x double> [[X]])
+; CHECK-NEXT: store <vscale x 2 x double> [[MAXNUM]], ptr [[MAXNUM_RES:%.*]], align 16
; CHECK-NEXT: store <vscale x 2 x double> splat (double 0x7FFC00DEAD00DEAD), ptr [[MINIMUM_RES:%.*]], align 16
; CHECK-NEXT: store <vscale x 2 x double> splat (double 0x7FFC00DEAD00DEAD), ptr [[MAXIMUM_RES:%.*]], align 16
-; CHECK-NEXT: store <vscale x 2 x double> [[X:%.*]], ptr [[MINIMUMNUM_RES:%.*]], align 16
+; CHECK-NEXT: store <vscale x 2 x double> [[X]], ptr [[MINIMUMNUM_RES:%.*]], align 16
; CHECK-NEXT: store <vscale x 2 x double> [[X]], ptr [[MAXIMUMNUM_RES:%.*]], align 16
; CHECK-NEXT: ret void
;
@@ -255,7 +259,8 @@ define void @minmax_pos_inf_f32(float %x, ptr %minnum_res, ptr %maxnum_res, ptr
; CHECK-LABEL: @minmax_pos_inf_f32(
; CHECK-NEXT: [[MINNUM:%.*]] = call float @llvm.minnum.f32(float [[X:%.*]], float 0x7FF0000000000000)
; CHECK-NEXT: store float [[MINNUM]], ptr [[MINNUM_RES:%.*]], align 4
-; CHECK-NEXT: store float 0x7FF0000000000000, ptr [[MAXNUM_RES:%.*]], align 4
+; CHECK-NEXT: [[MAXNUM:%.*]] = call float @llvm.maxnum.f32(float [[X]], float 0x7FF0000000000000)
+; CHECK-NEXT: store float [[MAXNUM]], ptr [[MAXNUM_RES:%.*]], align 4
; CHECK-NEXT: store float ...
[truncated]
@llvm/pr-subscribers-llvm-selectiondag Author: Lewis Crawford (LewisCrawford)
declare float @llvm.maxnum.f32(float, float)
+declare float @llvm.minimumnum.f32(float, float)
+declare float @llvm.maximumnum.f32(float, float)
define float @ninf_user_select_inf(i1 %cond, float %x, float %y) {
@@ -1314,7 +1316,7 @@ define nofpclass(pinf) float @ret_nofpclass_pinf__minnum_ninf(i1 %cond, float %x
; CHECK-SAME: (i1 [[COND:%.*]], float [[X:%.*]]) {
; CHECK-NEXT: ret float 0xFFF0000000000000
;
- %min = call float @llvm.minnum.f32(float %x, float 0xFFF0000000000000)
+ %min = call float @llvm.minimumnum.f32(float %x, float 0xFFF0000000000000)
ret float %min
}
@@ -1335,6 +1337,6 @@ define nofpclass(ninf) float @ret_nofpclass_ninf__maxnum_pinf(i1 %cond, float %x
; CHECK-SAME: (i1 [[COND:%.*]], float [[X:%.*]]) {
; CHECK-NEXT: ret float 0x7FF0000000000000
;
- %max = call float @llvm.maxnum.f32(float %x, float 0x7FF0000000000000)
+ %max = call float @llvm.maximumnum.f32(float %x, float 0x7FF0000000000000)
ret float %max
}
diff --git a/llvm/test/Transforms/InstSimplify/ConstProp/min-max.ll b/llvm/test/Transforms/InstSimplify/ConstProp/min-max.ll
index a633d29179896..84bec15d6ed32 100644
--- a/llvm/test/Transforms/InstSimplify/ConstProp/min-max.ll
+++ b/llvm/test/Transforms/InstSimplify/ConstProp/min-max.ll
@@ -97,7 +97,8 @@ define float @minnum_float_qnan_p0() {
define float @minnum_float_p0_snan() {
; CHECK-LABEL: @minnum_float_p0_snan(
-; CHECK-NEXT: ret float 0x7FFC000000000000
+; CHECK-NEXT: [[MIN:%.*]] = call float @llvm.minnum.f32(float 0.000000e+00, float 0x7FF4000000000000)
+; CHECK-NEXT: ret float [[MIN]]
;
%min = call float @llvm.minnum.f32(float 0.0, float 0x7FF4000000000000)
ret float %min
@@ -105,7 +106,8 @@ define float @minnum_float_p0_snan() {
define float @minnum_float_snan_p0() {
; CHECK-LABEL: @minnum_float_snan_p0(
-; CHECK-NEXT: ret float 0x7FFC000000000000
+; CHECK-NEXT: [[MIN:%.*]] = call float @llvm.minnum.f32(float 0x7FF4000000000000, float 0.000000e+00)
+; CHECK-NEXT: ret float [[MIN]]
;
%min = call float @llvm.minnum.f32(float 0x7FF4000000000000, float 0.0)
ret float %min
@@ -205,7 +207,8 @@ define float @maxnum_float_qnan_p0() {
define float @maxnum_float_p0_snan() {
; CHECK-LABEL: @maxnum_float_p0_snan(
-; CHECK-NEXT: ret float 0x7FFC000000000000
+; CHECK-NEXT: [[MAX:%.*]] = call float @llvm.maxnum.f32(float 0.000000e+00, float 0x7FF4000000000000)
+; CHECK-NEXT: ret float [[MAX]]
;
%max = call float @llvm.maxnum.f32(float 0.0, float 0x7FF4000000000000)
ret float %max
@@ -213,7 +216,8 @@ define float @maxnum_float_p0_snan() {
define float @maxnum_float_snan_p0() {
; CHECK-LABEL: @maxnum_float_snan_p0(
-; CHECK-NEXT: ret float 0x7FFC000000000000
+; CHECK-NEXT: [[MAX:%.*]] = call float @llvm.maxnum.f32(float 0x7FF4000000000000, float 0.000000e+00)
+; CHECK-NEXT: ret float [[MAX]]
;
%max = call float @llvm.maxnum.f32(float 0x7FF4000000000000, float 0.0)
ret float %max
diff --git a/llvm/test/Transforms/InstSimplify/fminmax-folds.ll b/llvm/test/Transforms/InstSimplify/fminmax-folds.ll
index 091e85920c0df..7544f7190df89 100644
--- a/llvm/test/Transforms/InstSimplify/fminmax-folds.ll
+++ b/llvm/test/Transforms/InstSimplify/fminmax-folds.ll
@@ -43,11 +43,13 @@ define void @minmax_qnan_f32(float %x, ptr %minnum_res, ptr %maxnum_res, ptr %mi
; Note that maxnum/minnum return qnan here for snan inputs, unlike maximumnum/minimumnum
define void @minmax_snan_f32(float %x, ptr %minnum_res, ptr %maxnum_res, ptr %minimum_res, ptr %maximum_res, ptr %minimumnum_res, ptr %maximumnum_res) {
; CHECK-LABEL: @minmax_snan_f32(
-; CHECK-NEXT: store float 0x7FFC000000000000, ptr [[MINNUM_RES:%.*]], align 4
-; CHECK-NEXT: store float 0x7FFC000000000000, ptr [[MAXNUM_RES:%.*]], align 4
+; CHECK-NEXT: [[MINNUM:%.*]] = call float @llvm.minnum.f32(float [[X:%.*]], float 0x7FF4000000000000)
+; CHECK-NEXT: store float [[MINNUM]], ptr [[MINNUM_RES:%.*]], align 4
+; CHECK-NEXT: [[MAXNUM:%.*]] = call float @llvm.maxnum.f32(float [[X]], float 0x7FF4000000000000)
+; CHECK-NEXT: store float [[MAXNUM]], ptr [[MAXNUM_RES:%.*]], align 4
; CHECK-NEXT: store float 0x7FFC000000000000, ptr [[MINIMUM_RES:%.*]], align 4
; CHECK-NEXT: store float 0x7FFC000000000000, ptr [[MAXIMUM_RES:%.*]], align 4
-; CHECK-NEXT: store float [[X:%.*]], ptr [[MINIMUMNUM_RES:%.*]], align 4
+; CHECK-NEXT: store float [[X]], ptr [[MINIMUMNUM_RES:%.*]], align 4
; CHECK-NEXT: store float [[X]], ptr [[MAXIMUMNUM_RES:%.*]], align 4
; CHECK-NEXT: ret void
;
@@ -98,11 +100,13 @@ define void @minmax_qnan_nxv2f64_op0(<vscale x 2 x double> %x, ptr %minnum_res,
; Note that maxnum/minnum return qnan here for snan inputs, unlike maximumnum/minimumnum
define void @minmax_snan_nxv2f64_op1(<vscale x 2 x double> %x, ptr %minnum_res, ptr %maxnum_res, ptr %minimum_res, ptr %maximum_res, ptr %minimumnum_res, ptr %maximumnum_res) {
; CHECK-LABEL: @minmax_snan_nxv2f64_op1(
-; CHECK-NEXT: store <vscale x 2 x double> splat (double 0x7FFC00DEAD00DEAD), ptr [[MINNUM_RES:%.*]], align 16
-; CHECK-NEXT: store <vscale x 2 x double> splat (double 0x7FFC00DEAD00DEAD), ptr [[MAXNUM_RES:%.*]], align 16
+; CHECK-NEXT: [[MINNUM:%.*]] = call <vscale x 2 x double> @llvm.minnum.nxv2f64(<vscale x 2 x double> splat (double 0x7FF400DEAD00DEAD), <vscale x 2 x double> [[X:%.*]])
+; CHECK-NEXT: store <vscale x 2 x double> [[MINNUM]], ptr [[MINNUM_RES:%.*]], align 16
+; CHECK-NEXT: [[MAXNUM:%.*]] = call <vscale x 2 x double> @llvm.maxnum.nxv2f64(<vscale x 2 x double> splat (double 0x7FF400DEAD00DEAD), <vscale x 2 x double> [[X]])
+; CHECK-NEXT: store <vscale x 2 x double> [[MAXNUM]], ptr [[MAXNUM_RES:%.*]], align 16
; CHECK-NEXT: store <vscale x 2 x double> splat (double 0x7FFC00DEAD00DEAD), ptr [[MINIMUM_RES:%.*]], align 16
; CHECK-NEXT: store <vscale x 2 x double> splat (double 0x7FFC00DEAD00DEAD), ptr [[MAXIMUM_RES:%.*]], align 16
-; CHECK-NEXT: store <vscale x 2 x double> [[X:%.*]], ptr [[MINIMUMNUM_RES:%.*]], align 16
+; CHECK-NEXT: store <vscale x 2 x double> [[X]], ptr [[MINIMUMNUM_RES:%.*]], align 16
; CHECK-NEXT: store <vscale x 2 x double> [[X]], ptr [[MAXIMUMNUM_RES:%.*]], align 16
; CHECK-NEXT: ret void
;
@@ -255,7 +259,8 @@ define void @minmax_pos_inf_f32(float %x, ptr %minnum_res, ptr %maxnum_res, ptr
; CHECK-LABEL: @minmax_pos_inf_f32(
; CHECK-NEXT: [[MINNUM:%.*]] = call float @llvm.minnum.f32(float [[X:%.*]], float 0x7FF0000000000000)
; CHECK-NEXT: store float [[MINNUM]], ptr [[MINNUM_RES:%.*]], align 4
-; CHECK-NEXT: store float 0x7FF0000000000000, ptr [[MAXNUM_RES:%.*]], align 4
+; CHECK-NEXT: [[MAXNUM:%.*]] = call float @llvm.maxnum.f32(float [[X]], float 0x7FF0000000000000)
+; CHECK-NEXT: store float [[MAXNUM]], ptr [[MAXNUM_RES:%.*]], align 4
; CHECK-NEXT: store float ...
[truncated]
🐧 Linux x64 Test Results
I'm curious, do we want to keep these optimizations around if the target makes no distinction between signalling and non-signalling NaNs? AFAICT, sNaN makes no difference on NVIDIA GPUs, and never causes any traps or updates any flags.
Artem-B
left a comment
LGTM in general.
phoebewang
left a comment
I'm good with the change assuming sNaN is rare in reality. I still want it back once we made consensus in LangRef.
X86 doesn't distinguish it either. We need to either change the LangRef or change the targets' implementations to match it. But either way, the optimizations are worthwhile. We should bring them back once the controversy is settled.
; AVX512-NEXT: vcmpunordss %xmm0, %xmm0, %k1
; AVX512-NEXT: vmovss %xmm2, %xmm1, %xmm1 {%k1}
; AVX512-NEXT: vmovaps %xmm1, %xmm0
probably something for a different PR,
but this really looks like a use case for VFIXUPIMMP[SD] with [QS]NaN to src1
A MOVSS is enough if the NaN is known. But we don't need to bother doing this in each backend; just wait for the code in DAGCombiner.cpp to be brought back :)
Even on backends like NVPTX that don't really care about sNaNs, the specification discussions around what the correct behaviour of `maxnum` should be are still relevant. For the NVPTX backend, we are hoping to transition
Most of these optimizations will probably be ok to bring back if the result is that the LangRef clearly defines the exact expected result here for all targets. If it instead goes for something like "return value is non-deterministic if one or both inputs are sNaN" as mentioned in #170082 (comment), then maybe it's better to keep them disabled so that constant and non-constant sNaNs produce consistent behaviour. Maybe we also want to add a target-specific check for whether the backend cares about sNaN trap/flag setting too (but this is probably already covered by
The behaviour of constant-folding `maxnum(sNaN, x)` and `minnum(sNaN, x)` has become controversial, and there are ongoing discussions about which behaviour we want to specify in the LLVM IR LangRef. See:

This patch removes optimizations and constant-folding support for `maxnum(sNaN, x)` but keeps it folded/optimized for `qNaN`. This should allow for some more flexibility, so the implementation can conform to either the old or new version of the specified semantics without any changes.

As far as I am aware, optimizations involving constant `sNaN` should generally be edge-cases that rarely occur, so there should hopefully be very little real-world performance impact from disabling these optimizations.
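As a minimal sketch of the kind of fold this patch disables (the constants mirror those in the `min-max.ll` tests above):

```llvm
; Before this patch, InstSimplify constant-folded the sNaN operand away:
;
;   %min = call float @llvm.minnum.f32(float 0x7FF4000000000000, float 0.0)
;     ==> ret float 0x7FFC000000000000   ; unconditionally quieted to a qNaN
;
; After this patch, calls with a constant sNaN argument are left intact,
; so the final behaviour is deferred until the LangRef semantics are settled:
define float @minnum_float_snan_p0() {
  %min = call float @llvm.minnum.f32(float 0x7FF4000000000000, float 0.000000e+00)
  ret float %min
}
; Note: 0x7FF4... encodes an sNaN and 0x7FFC... a qNaN; the two bit
; patterns differ only in bit 51, the IEEE-754 quiet bit.
```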