[InstSimplify] Optimize maximumnum and minimumnum #139581
@llvm/pr-subscribers-backend-amdgpu @llvm/pr-subscribers-llvm-ir

Author: Lewis Crawford (LewisCrawford)

Changes: Add support for the new maximumnum and minimumnum intrinsics in various optimizations in InstSimplify. Also, change the behavior of optimizing maxnum(sNaN, x) to simplify to qNaN instead of x to better match the LLVM IR spec, and add more tests for sNaN behavior for all 3 max/min intrinsic types.

Patch is 49.41 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/139581.diff

4 Files Affected:
diff --git a/llvm/include/llvm/IR/PatternMatch.h b/llvm/include/llvm/IR/PatternMatch.h
index 6242a686e7bc0..c4d1bd095847d 100644
--- a/llvm/include/llvm/IR/PatternMatch.h
+++ b/llvm/include/llvm/IR/PatternMatch.h
@@ -707,10 +707,25 @@ m_SpecificInt_ICMP(ICmpInst::Predicate Predicate, const APInt &Threshold) {
struct is_nan {
bool isValue(const APFloat &C) const { return C.isNaN(); }
};
+
+struct is_snan {
+ bool isValue(const APFloat &C) const { return C.isSignaling(); }
+};
+
+struct is_qnan {
+ bool isValue(const APFloat &C) const { return C.isNaN() && !C.isSignaling(); }
+};
+
/// Match an arbitrary NaN constant. This includes quiet and signalling nans.
/// For vectors, this includes constants with undefined elements.
inline cstfp_pred_ty<is_nan> m_NaN() { return cstfp_pred_ty<is_nan>(); }
+/// Match quiet NaN constants, including vectors with undefined elements.
+inline cstfp_pred_ty<is_qnan> m_qNaN() { return cstfp_pred_ty<is_qnan>(); }
+
+/// Match signalling NaN constants, including vectors with undefined elements.
+inline cstfp_pred_ty<is_snan> m_sNaN() { return cstfp_pred_ty<is_snan>(); }
+
struct is_nonnan {
bool isValue(const APFloat &C) const { return !C.isNaN(); }
};
diff --git a/llvm/lib/Analysis/InstructionSimplify.cpp b/llvm/lib/Analysis/InstructionSimplify.cpp
index 85e3be9cc45c3..49774bcb45585 100644
--- a/llvm/lib/Analysis/InstructionSimplify.cpp
+++ b/llvm/lib/Analysis/InstructionSimplify.cpp
@@ -6415,7 +6415,8 @@ static Value *foldMinMaxSharedOp(Intrinsic::ID IID, Value *Op0, Value *Op1) {
static Value *foldMinimumMaximumSharedOp(Intrinsic::ID IID, Value *Op0,
Value *Op1) {
assert((IID == Intrinsic::maxnum || IID == Intrinsic::minnum ||
- IID == Intrinsic::maximum || IID == Intrinsic::minimum) &&
+ IID == Intrinsic::maximum || IID == Intrinsic::minimum ||
+ IID == Intrinsic::maximumnum || IID == Intrinsic::minimumnum) &&
"Unsupported intrinsic");
auto *M0 = dyn_cast<IntrinsicInst>(Op0);
@@ -6711,7 +6712,16 @@ Value *llvm::simplifyBinaryIntrinsic(Intrinsic::ID IID, Type *ReturnType,
case Intrinsic::maxnum:
case Intrinsic::minnum:
case Intrinsic::maximum:
- case Intrinsic::minimum: {
+ case Intrinsic::minimum:
+ case Intrinsic::maximumnum:
+ case Intrinsic::minimumnum: {
+ // In several cases here, we deviate from exact IEEE 754 semantics
+ // to enable optimizations (as allowed by the LLVM IR spec).
+ //
+ // For instance, we often return one of the arguments unmodified instead of
+ // inserting an llvm.canonicalize to transform input sNaNs into qNaNs or to
+ // respect any FTZ semantics, and sometimes assume all NaN inputs are qNaNs.
+
// If the arguments are the same, this is a no-op.
if (Op0 == Op1)
return Op0;
@@ -6725,32 +6735,43 @@ Value *llvm::simplifyBinaryIntrinsic(Intrinsic::ID IID, Type *ReturnType,
return Op0;
bool PropagateNaN = IID == Intrinsic::minimum || IID == Intrinsic::maximum;
- bool IsMin = IID == Intrinsic::minimum || IID == Intrinsic::minnum;
-
- // minnum(X, nan) -> X
- // maxnum(X, nan) -> X
- // minimum(X, nan) -> nan
- // maximum(X, nan) -> nan
- if (match(Op1, m_NaN()))
- return PropagateNaN ? propagateNaN(cast<Constant>(Op1)) : Op0;
+ bool PropagateSNaN = IID == Intrinsic::minnum || IID == Intrinsic::maxnum;
+ bool IsMin = IID == Intrinsic::minimum || IID == Intrinsic::minnum ||
+ IID == Intrinsic::minimumnum;
+
+ // minnum(x, qnan) -> x
+ // maxnum(x, qnan) -> x
+ // minnum(x, snan) -> qnan
+ // maxnum(x, snan) -> qnan
+ // minimum(X, nan) -> qnan
+ // maximum(X, nan) -> qnan
+ if (match(Op1, m_NaN())) {
+ if (PropagateNaN || (PropagateSNaN && match(Op1, m_sNaN())))
+ return propagateNaN(cast<Constant>(Op1));
+ return Op0;
+ }
// In the following folds, inf can be replaced with the largest finite
// float, if the ninf flag is set.
const APFloat *C;
if (match(Op1, m_APFloat(C)) &&
(C->isInfinity() || (Call && Call->hasNoInfs() && C->isLargest()))) {
- // minnum(X, -inf) -> -inf
- // maxnum(X, +inf) -> +inf
+ // minnum(X, -inf) -> -inf (ignoring sNaN -> qNaN propagation)
+ // maxnum(X, +inf) -> +inf (ignoring sNaN -> qNaN propagation)
// minimum(X, -inf) -> -inf if nnan
// maximum(X, +inf) -> +inf if nnan
+ // minimumnum(X, -inf) -> -inf
+ // maximumnum(X, +inf) -> +inf
if (C->isNegative() == IsMin &&
(!PropagateNaN || (Call && Call->hasNoNaNs())))
return ConstantFP::get(ReturnType, *C);
// minnum(X, +inf) -> X if nnan
// maxnum(X, -inf) -> X if nnan
- // minimum(X, +inf) -> X
- // maximum(X, -inf) -> X
+ // minimum(X, +inf) -> X (ignoring quieting of sNaNs)
+ // maximum(X, -inf) -> X (ignoring quieting of sNaNs)
+ // maximumnum(X, -inf) -> X if nnan
+ // minimumnum(X, +inf) -> X if nnan
if (C->isNegative() != IsMin &&
(PropagateNaN || (Call && Call->hasNoNaNs())))
return Op0;
diff --git a/llvm/lib/Analysis/ValueTracking.cpp b/llvm/lib/Analysis/ValueTracking.cpp
index 3d403531cea2f..c01ebbce15804 100644
--- a/llvm/lib/Analysis/ValueTracking.cpp
+++ b/llvm/lib/Analysis/ValueTracking.cpp
@@ -9310,6 +9310,8 @@ Intrinsic::ID llvm::getInverseMinMaxIntrinsic(Intrinsic::ID MinMaxID) {
case Intrinsic::minimum: return Intrinsic::maximum;
case Intrinsic::maxnum: return Intrinsic::minnum;
case Intrinsic::minnum: return Intrinsic::maxnum;
+ case Intrinsic::maximumnum: return Intrinsic::minimumnum;
+ case Intrinsic::minimumnum: return Intrinsic::maximumnum;
default: llvm_unreachable("Unexpected intrinsic");
}
}
diff --git a/llvm/test/Transforms/InstSimplify/fminmax-folds.ll b/llvm/test/Transforms/InstSimplify/fminmax-folds.ll
index fff6cfd8a3b4b..0510a713dba0b 100644
--- a/llvm/test/Transforms/InstSimplify/fminmax-folds.ll
+++ b/llvm/test/Transforms/InstSimplify/fminmax-folds.ll
@@ -54,6 +54,70 @@ define float @test_minimum_const_nan(float %x) {
ret float %r
}
+define float @test_maximumnum_const_nan(float %x) {
+; CHECK-LABEL: @test_maximumnum_const_nan(
+; CHECK-NEXT: ret float [[R:%.*]]
+;
+ %r = call float @llvm.maximumnum.f32(float %x, float 0x7fff000000000000)
+ ret float %r
+}
+
+define float @test_minimumnum_const_nan(float %x) {
+; CHECK-LABEL: @test_minimumnum_const_nan(
+; CHECK-NEXT: ret float [[R:%.*]]
+;
+ %r = call float @llvm.minimumnum.f32(float %x, float 0x7fff000000000000)
+ ret float %r
+}
+
+define float @test_minnum_const_snan(float %x) {
+; CHECK-LABEL: @test_minnum_const_snan(
+; CHECK-NEXT: ret float 0x7FFC000000000000
+;
+ %r = call float @llvm.minnum.f32(float %x, float 0x7ff4000000000000)
+ ret float %r
+}
+
+define float @test_maxnum_const_snan(float %x) {
+; CHECK-LABEL: @test_maxnum_const_snan(
+; CHECK-NEXT: ret float 0x7FFC000000000000
+;
+ %r = call float @llvm.maxnum.f32(float %x, float 0x7ff4000000000000)
+ ret float %r
+}
+
+define float @test_maximum_const_snan(float %x) {
+; CHECK-LABEL: @test_maximum_const_snan(
+; CHECK-NEXT: ret float 0x7FFC000000000000
+;
+ %r = call float @llvm.maximum.f32(float %x, float 0x7ff4000000000000)
+ ret float %r
+}
+
+define float @test_minimum_const_snan(float %x) {
+; CHECK-LABEL: @test_minimum_const_snan(
+; CHECK-NEXT: ret float 0x7FFC000000000000
+;
+ %r = call float @llvm.minimum.f32(float %x, float 0x7ff4000000000000)
+ ret float %r
+}
+
+define float @test_maximumnum_const_snan(float %x) {
+; CHECK-LABEL: @test_maximumnum_const_snan(
+; CHECK-NEXT: ret float [[X:%.*]]
+;
+ %r = call float @llvm.maximumnum.f32(float %x, float 0x7ff4000000000000)
+ ret float %r
+}
+
+define float @test_minimumnum_const_snan(float %x) {
+; CHECK-LABEL: @test_minimumnum_const_snan(
+; CHECK-NEXT: ret float [[X:%.*]]
+;
+ %r = call float @llvm.minimumnum.f32(float %x, float 0x7ff4000000000000)
+ ret float %r
+}
+
define float @test_minnum_const_inf(float %x) {
; CHECK-LABEL: @test_minnum_const_inf(
; CHECK-NEXT: [[R:%.*]] = call float @llvm.minnum.f32(float [[X:%.*]], float 0x7FF0000000000000)
@@ -88,6 +152,23 @@ define float @test_minimum_const_inf(float %x) {
ret float %r
}
+define float @test_maximumnum_const_inf(float %x) {
+; CHECK-LABEL: @test_maximumnum_const_inf(
+; CHECK-NEXT: ret float 0x7FF0000000000000
+;
+ %r = call float @llvm.maximumnum.f32(float %x, float 0x7ff0000000000000)
+ ret float %r
+}
+
+define float @test_minimumnum_const_inf(float %x) {
+; CHECK-LABEL: @test_minimumnum_const_inf(
+; CHECK-NEXT: [[R:%.*]] = call float @llvm.minimumnum.f32(float [[X:%.*]], float 0x7FF0000000000000)
+; CHECK-NEXT: ret float [[R]]
+;
+ %r = call float @llvm.minimumnum.f32(float %x, float 0x7ff0000000000000)
+ ret float %r
+}
+
define float @test_minnum_const_neg_inf(float %x) {
; CHECK-LABEL: @test_minnum_const_neg_inf(
; CHECK-NEXT: ret float 0xFFF0000000000000
@@ -122,6 +203,23 @@ define float @test_minimum_const_neg_inf(float %x) {
ret float %r
}
+define float @test_maximumnum_const_neg_inf(float %x) {
+; CHECK-LABEL: @test_maximumnum_const_neg_inf(
+; CHECK-NEXT: [[X:%.*]] = call float @llvm.maximumnum.f32(float [[X1:%.*]], float 0xFFF0000000000000)
+; CHECK-NEXT: ret float [[X]]
+;
+ %r = call float @llvm.maximumnum.f32(float %x, float 0xfff0000000000000)
+ ret float %r
+}
+
+define float @test_minimumnum_const_neg_inf(float %x) {
+; CHECK-LABEL: @test_minimumnum_const_neg_inf(
+; CHECK-NEXT: ret float 0xFFF0000000000000
+;
+ %r = call float @llvm.minimumnum.f32(float %x, float 0xfff0000000000000)
+ ret float %r
+}
+
define float @test_minnum_const_inf_nnan(float %x) {
; CHECK-LABEL: @test_minnum_const_inf_nnan(
; CHECK-NEXT: ret float [[X:%.*]]
@@ -154,6 +252,22 @@ define float @test_minimum_const_inf_nnan(float %x) {
ret float %r
}
+define float @test_maximumnum_const_inf_nnan(float %x) {
+; CHECK-LABEL: @test_maximumnum_const_inf_nnan(
+; CHECK-NEXT: ret float 0x7FF0000000000000
+;
+ %r = call nnan float @llvm.maximumnum.f32(float %x, float 0x7ff0000000000000)
+ ret float %r
+}
+
+define float @test_minimumnum_const_inf_nnan(float %x) {
+; CHECK-LABEL: @test_minimumnum_const_inf_nnan(
+; CHECK-NEXT: ret float [[X:%.*]]
+;
+ %r = call nnan float @llvm.minimumnum.f32(float %x, float 0x7ff0000000000000)
+ ret float %r
+}
+
define float @test_minnum_const_inf_nnan_comm(float %x) {
; CHECK-LABEL: @test_minnum_const_inf_nnan_comm(
; CHECK-NEXT: ret float [[X:%.*]]
@@ -186,6 +300,22 @@ define float @test_minimum_const_inf_nnan_comm(float %x) {
ret float %r
}
+define float @test_maximumnum_const_inf_nnan_comm(float %x) {
+; CHECK-LABEL: @test_maximumnum_const_inf_nnan_comm(
+; CHECK-NEXT: ret float 0x7FF0000000000000
+;
+ %r = call nnan float @llvm.maximumnum.f32(float 0x7ff0000000000000, float %x)
+ ret float %r
+}
+
+define float @test_minimumnum_const_inf_nnan_comm(float %x) {
+; CHECK-LABEL: @test_minimumnum_const_inf_nnan_comm(
+; CHECK-NEXT: ret float [[X:%.*]]
+;
+ %r = call nnan float @llvm.minimumnum.f32(float 0x7ff0000000000000, float %x)
+ ret float %r
+}
+
define <2 x float> @test_minnum_const_inf_nnan_comm_vec(<2 x float> %x) {
; CHECK-LABEL: @test_minnum_const_inf_nnan_comm_vec(
; CHECK-NEXT: ret <2 x float> [[X:%.*]]
@@ -218,6 +348,22 @@ define <2 x float> @test_minimum_const_inf_nnan_comm_vec(<2 x float> %x) {
ret <2 x float> %r
}
+define <2 x float> @test_maximumnum_const_inf_nnan_comm_vec(<2 x float> %x) {
+; CHECK-LABEL: @test_maximumnum_const_inf_nnan_comm_vec(
+; CHECK-NEXT: ret <2 x float> splat (float 0x7FF0000000000000)
+;
+ %r = call nnan <2 x float> @llvm.maximumnum.v2f32(<2 x float> <float 0x7ff0000000000000, float 0x7ff0000000000000>, <2 x float> %x)
+ ret <2 x float> %r
+}
+
+define <2 x float> @test_minimumnum_const_inf_nnan_comm_vec(<2 x float> %x) {
+; CHECK-LABEL: @test_minimumnum_const_inf_nnan_comm_vec(
+; CHECK-NEXT: ret <2 x float> [[X:%.*]]
+;
+ %r = call nnan <2 x float> @llvm.minimumnum.v2f32(<2 x float> <float 0x7ff0000000000000, float 0x7ff0000000000000>, <2 x float> %x)
+ ret <2 x float> %r
+}
+
define float @test_minnum_const_neg_inf_nnan(float %x) {
; CHECK-LABEL: @test_minnum_const_neg_inf_nnan(
; CHECK-NEXT: ret float 0xFFF0000000000000
@@ -250,6 +396,22 @@ define float @test_minimum_const_neg_inf_nnan(float %x) {
ret float %r
}
+define float @test_maximumnum_const_neg_inf_nnan(float %x) {
+; CHECK-LABEL: @test_maximumnum_const_neg_inf_nnan(
+; CHECK-NEXT: ret float [[X:%.*]]
+;
+ %r = call nnan float @llvm.maximumnum.f32(float %x, float 0xfff0000000000000)
+ ret float %r
+}
+
+define float @test_minimumnum_const_neg_inf_nnan(float %x) {
+; CHECK-LABEL: @test_minimumnum_const_neg_inf_nnan(
+; CHECK-NEXT: ret float 0xFFF0000000000000
+;
+ %r = call nnan float @llvm.minimumnum.f32(float %x, float 0xfff0000000000000)
+ ret float %r
+}
+
define float @test_minnum_const_max(float %x) {
; CHECK-LABEL: @test_minnum_const_max(
; CHECK-NEXT: [[R:%.*]] = call float @llvm.minnum.f32(float [[X:%.*]], float 0x47EFFFFFE0000000)
@@ -286,6 +448,24 @@ define float @test_minimum_const_max(float %x) {
ret float %r
}
+define float @test_maximumnum_const_max(float %x) {
+; CHECK-LABEL: @test_maximumnum_const_max(
+; CHECK-NEXT: [[R:%.*]] = call float @llvm.maximumnum.f32(float [[X:%.*]], float 0x47EFFFFFE0000000)
+; CHECK-NEXT: ret float [[R]]
+;
+ %r = call float @llvm.maximumnum.f32(float %x, float 0x47efffffe0000000)
+ ret float %r
+}
+
+define float @test_minimumnum_const_max(float %x) {
+; CHECK-LABEL: @test_minimumnum_const_max(
+; CHECK-NEXT: [[R:%.*]] = call float @llvm.minimumnum.f32(float [[X:%.*]], float 0x47EFFFFFE0000000)
+; CHECK-NEXT: ret float [[R]]
+;
+ %r = call float @llvm.minimumnum.f32(float %x, float 0x47efffffe0000000)
+ ret float %r
+}
+
define float @test_minnum_const_neg_max(float %x) {
; CHECK-LABEL: @test_minnum_const_neg_max(
; CHECK-NEXT: [[R:%.*]] = call float @llvm.minnum.f32(float [[X:%.*]], float 0xC7EFFFFFE0000000)
@@ -322,6 +502,24 @@ define float @test_minimum_const_neg_max(float %x) {
ret float %r
}
+define float @test_maximumnum_const_neg_max(float %x) {
+; CHECK-LABEL: @test_maximumnum_const_neg_max(
+; CHECK-NEXT: [[R:%.*]] = call float @llvm.maximumnum.f32(float [[X:%.*]], float 0xC7EFFFFFE0000000)
+; CHECK-NEXT: ret float [[R]]
+;
+ %r = call float @llvm.maximumnum.f32(float %x, float 0xc7efffffe0000000)
+ ret float %r
+}
+
+define float @test_minimumnum_const_neg_max(float %x) {
+; CHECK-LABEL: @test_minimumnum_const_neg_max(
+; CHECK-NEXT: [[R:%.*]] = call float @llvm.minimumnum.f32(float [[X:%.*]], float 0xC7EFFFFFE0000000)
+; CHECK-NEXT: ret float [[R]]
+;
+ %r = call float @llvm.minimumnum.f32(float %x, float 0xc7efffffe0000000)
+ ret float %r
+}
+
define float @test_minnum_const_max_ninf(float %x) {
; CHECK-LABEL: @test_minnum_const_max_ninf(
; CHECK-NEXT: [[R:%.*]] = call ninf float @llvm.minnum.f32(float [[X:%.*]], float 0x47EFFFFFE0000000)
@@ -356,6 +554,23 @@ define float @test_minimum_const_max_ninf(float %x) {
ret float %r
}
+define float @test_maximumnum_const_max_ninf(float %x) {
+; CHECK-LABEL: @test_maximumnum_const_max_ninf(
+; CHECK-NEXT: ret float 0x47EFFFFFE0000000
+;
+ %r = call ninf float @llvm.maximumnum.f32(float %x, float 0x47efffffe0000000)
+ ret float %r
+}
+
+define float @test_minimumnum_const_max_ninf(float %x) {
+; CHECK-LABEL: @test_minimumnum_const_max_ninf(
+; CHECK-NEXT: [[R:%.*]] = call ninf float @llvm.minimumnum.f32(float [[X:%.*]], float 0x47EFFFFFE0000000)
+; CHECK-NEXT: ret float [[R]]
+;
+ %r = call ninf float @llvm.minimumnum.f32(float %x, float 0x47efffffe0000000)
+ ret float %r
+}
+
define float @test_minnum_const_neg_max_ninf(float %x) {
; CHECK-LABEL: @test_minnum_const_neg_max_ninf(
; CHECK-NEXT: ret float 0xC7EFFFFFE0000000
@@ -390,6 +605,23 @@ define float @test_minimum_const_neg_max_ninf(float %x) {
ret float %r
}
+define float @test_maximumnum_const_neg_max_ninf(float %x) {
+; CHECK-LABEL: @test_maximumnum_const_neg_max_ninf(
+; CHECK-NEXT: [[X:%.*]] = call ninf float @llvm.maximumnum.f32(float [[X1:%.*]], float 0xC7EFFFFFE0000000)
+; CHECK-NEXT: ret float [[X]]
+;
+ %r = call ninf float @llvm.maximumnum.f32(float %x, float 0xc7efffffe0000000)
+ ret float %r
+}
+
+define float @test_minimumnum_const_neg_max_ninf(float %x) {
+; CHECK-LABEL: @test_minimumnum_const_neg_max_ninf(
+; CHECK-NEXT: ret float 0xC7EFFFFFE0000000
+;
+ %r = call ninf float @llvm.minimumnum.f32(float %x, float 0xc7efffffe0000000)
+ ret float %r
+}
+
define float @test_minnum_const_max_nnan_ninf(float %x) {
; CHECK-LABEL: @test_minnum_const_max_nnan_ninf(
; CHECK-NEXT: ret float [[X:%.*]]
@@ -422,6 +654,22 @@ define float @test_minimum_const_max_nnan_ninf(float %x) {
ret float %r
}
+define float @test_maximumnum_const_max_nnan_ninf(float %x) {
+; CHECK-LABEL: @test_maximumnum_const_max_nnan_ninf(
+; CHECK-NEXT: ret float 0x47EFFFFFE0000000
+;
+ %r = call nnan ninf float @llvm.maximumnum.f32(float %x, float 0x47efffffe0000000)
+ ret float %r
+}
+
+define float @test_minimumnum_const_max_nnan_ninf(float %x) {
+; CHECK-LABEL: @test_minimumnum_const_max_nnan_ninf(
+; CHECK-NEXT: ret float [[X:%.*]]
+;
+ %r = call nnan ninf float @llvm.minimumnum.f32(float %x, float 0x47efffffe0000000)
+ ret float %r
+}
+
define float @test_minnum_const_neg_max_nnan_ninf(float %x) {
; CHECK-LABEL: @test_minnum_const_neg_max_nnan_ninf(
; CHECK-NEXT: ret float 0xC7EFFFFFE0000000
@@ -454,8 +702,24 @@ define float @test_minimum_const_neg_max_nnan_ninf(float %x) {
ret float %r
}
+define float @test_maximumnum_const_neg_max_nnan_ninf(float %x) {
+; CHECK-LABEL: @test_maximumnum_const_neg_max_nnan_ninf(
+; CHECK-NEXT: ret float [[X:%.*]]
+;
+ %r = call nnan ninf float @llvm.maximumnum.f32(float %x, float 0xc7efffffe0000000)
+ ret float %r
+}
+
+define float @test_minimumnum_const_neg_max_nnan_ninf(float %x) {
+; CHECK-LABEL: @test_minimumnum_const_neg_max_nnan_ninf(
+; CHECK-NEXT: ret float 0xC7EFFFFFE0000000
+;
+ %r = call nnan ninf float @llvm.minimumnum.f32(float %x, float 0xc7efffffe0000000)
+ ret float %r
+}
+
; From the LangRef for minnum/maxnum:
-; "If either operand is a NaN, returns the other non-NaN operand."
+; "If either operand is a qNaN, returns the other non-NaN operand. Returns NaN only if both operands are NaN or if either operand is sNaN."
define double @maxnum_nan_op0(double %x) {
; CHECK-LABEL: @maxnum_nan_op0(
@@ -521,115 +785,99 @@ define <2 x double> @minnum_nan_op1_vec(<2 x double> %x) {
ret <2 x double> %r
}
-define float @maxnum_undef_op1(float %x) {
-; CHECK-LABEL: @maxnum_undef_op1(
-; CHECK-NEXT: ret float [[X:%.*]]
-;
- %val = call float @llvm.maxnum.f32(float %x, float undef)
- ret float %val
-}
-
-define float @maxnum_poison_op1(float %x) {
-; CHECK-LABEL: @maxnum_poison_op1(
-; CHECK-NEXT: ret float [[X:%.*]]
-;
- %val = call float @llvm.maxnum.f32(float %x, float poison)
- ret float %val
-}
-
-define float @maxnum_undef_op0(float %x) {
-; CHECK-LABEL: @maxnum_undef_op0(
-; CHECK-NEXT: ret float [[X:%.*]]
+define double @maxnum_snan_op0(double %x) {
+; CHECK-LABEL: @maxnum_snan_op0(
+; CHECK-NEXT: ret double 0x7FFC000000000000
;
- %val = call float @llvm.maxnum.f32(float undef, float %x)
- ret float %val
+ %r = call double @llvm.maxnum.f64(double 0x7ff4000000000000, double %x)
+ ret double %r
}
-define float @maxnum_poison_op0(float %x) {
-; CHECK-LABEL: @maxnum_poison_op0(
-; CHECK-NEXT: ret float [[X:%.*]]
+define double @maxnum_snan_op1(double %x) {
+; CHECK-LABEL: @maxnum_snan_op1(
+; CHECK-NEXT: ret double 0x7FFC00000000DEAD
;
- %val = call float @llvm.maxnum.f32(float poison, float %x)
- ret float %val
+ %r = call double @llvm.maxnum.f64(double %x, double 0x7ff400000000dead)
+ ret double %r
}
-define float @minnum_undef_op1(float %x) {
-; CHECK-LABEL: @minnum_undef_op1(
-; CHECK-NEXT: ret float [[X:%.*]]
+define double @minnum_snan_op0(double %x) {
+; CHECK-LABEL: @minnum_snan_op0(
+; CHECK-NEXT: ret double 0x7FFC000DEAD00000
;
- %val = call float @llvm.minnum.f32(float %x, float undef)
- ret float %val
+ %r = call double @llvm.minnum.f64(double 0x7ff4000dead00000, d...
[truncated]
✅ With the latest revision this PR passed the C/C++ code formatter.
This is just documenting a bug; you can't rely on the simple matcher. Either use m_APFloat and check whether it is a splat quiet or signaling NaN, or do the more complicated handling to process this per element.
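A minimal sketch of the splat-constant check being suggested here, assuming it would sit next to the existing min/max handling; the helper name and enum are hypothetical, while m_APFloat and the APFloat predicates are real APIs:

```cpp
// Sketch only (not the code that landed in the PR): classify a constant
// operand as a splat quiet/signaling NaN using m_APFloat.
#include "llvm/ADT/APFloat.h"
#include "llvm/IR/PatternMatch.h"

using namespace llvm;
using namespace llvm::PatternMatch;

enum class SplatNaNKind { NotSplatNaN, Quiet, Signaling };

// Hypothetical helper: what kind of NaN splat (if any) is Op?
static SplatNaNKind classifySplatNaN(Value *Op) {
  const APFloat *C;
  // m_APFloat matches scalar FP constants and splat vector constants.
  if (!match(Op, m_APFloat(C)) || !C->isNaN())
    return SplatNaNKind::NotSplatNaN;
  return C->isSignaling() ? SplatNaNKind::Signaling : SplatNaNKind::Quiet;
}
```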
As far as I can tell, this is still legal (if slightly surprising) behaviour because of this wording in the spec:
Floating-point math operations are allowed to treat all NaNs as if they were quiet NaNs. For example, “pow(1.0, SNaN)” may be simplified to 1.0.
What would your preferred behaviour here be for the <sNaN, qNaN> case and the <sNaN, poison> case? The way the LangRef states it, it seems like the preferred approach is to sacrifice sNaN correctness any time it allows an optimization (and only try to preserve it when using StrictFP or constrained FP intrinsics).
I've changed it now so that we disable the optimization for maxnum/minnum if there is a mix of sNaNs and qNaNs in the vector. This is probably such an edge case that it won't impact real-world optimizations, so we might as well try to preserve correctness there. I've kept using the simple matchers to still allow for undef and poison elements.
minnum/maxnum are a special case and require getting the sNaN handling right, because they have inverted behavior with an sNaN input.
But in general it's better QoI to get the correct handling and quiet the returned value. The dropped canonicalization matters more for instruction optimizations that may lose a quieting operation. In the constant-folding case we can handle it elementwise; it's just more of a hassle.
We cannot handle maxnum(<x, y>, <sNaN, qNaN>) elementwise here if we want to propagate the sNaN, because we would need to return <qNaN, y>, which would require a vector-shuffle instruction, and InstructionSimplify is not allowed to insert new instructions. It can only return either the full vector <x, y> or a new constant vector. In the new elementwise version I've just written, I've kept the behaviour of simply disabling the optimization in this case, as we'd need one element from the LHS and one from the RHS.
With this chain of NaN matchers, it might actually be easier to just do the elementwise handling; propagateNaN already has the elementwise code in it.
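For context, a rough sketch of what "elementwise" handling of a vector FP constant can look like, using only existing Constant/APFloat APIs; the function itself is illustrative and is not the code from this patch:

```cpp
// Sketch only: walk a fixed-vector FP constant element by element, quiet any
// NaN elements, and rebuild the constant. Poison/undef elements are kept;
// anything unexpected aborts the fold by returning nullptr.
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/DerivedTypes.h"

using namespace llvm;

static Constant *quietNaNElements(Constant *C) {
  auto *VTy = dyn_cast<FixedVectorType>(C->getType());
  if (!VTy)
    return nullptr;
  SmallVector<Constant *, 8> NewElts;
  for (unsigned I = 0, E = VTy->getNumElements(); I != E; ++I) {
    Constant *Elt = C->getAggregateElement(I);
    if (!Elt)
      return nullptr;
    if (auto *CFP = dyn_cast<ConstantFP>(Elt)) {
      APFloat V = CFP->getValueAPF();
      if (V.isNaN())
        V.makeQuiet(); // sNaN -> qNaN; qNaN is unchanged
      NewElts.push_back(ConstantFP::get(Elt->getType(), V));
      continue;
    }
    if (isa<UndefValue>(Elt)) { // also covers poison elements
      NewElts.push_back(Elt);
      continue;
    }
    return nullptr;
  }
  return ConstantVector::get(NewElts);
}
```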
I think element-wise comparisons will take this slightly out of the current scope of the patch, as they will introduce cases like maxnum(<x, y>, <sNaN, qNaN>) -> <qNaN, y>, which require insert/extract chains to be created, and might end up with cases where there is no clear perf win from re-creating partially applied vectors (where some elements are NaN) versus just using maxnum on the original vector despite some elements being NaN. To avoid these partially applied cases, we'd need to check whether the whole vector is sNaN or qNaN (ignoring poison/undef elements), which is what the current patch already does with the matchers.
It might be worth considering these kinds of element-wise folds in future (and perhaps for partially Inf, partially poison, and partially FMAX vectors too, in addition to NaNs), but I think that would be better suited to a separate patch.
I've now rewritten the whole thing elementwise to avoid using any pattern matchers (or calls to propagateNaN). It now allows for combinations of different foldable vector elements, as long as we return either a full LHS non-constant value, or a full constant value based on the constant RHS argument (with some NaN quieting applied).
@arsenm would you be able to take a look at the new elementwise code and see if it looks ok to merge soon?
Add support for the new maximumnum and minimumnum intrinsics in various optimizations in InstSimplify. Also, change the behavior of optimizing maxnum(sNaN, x) to simplify to qNaN instead of x to better match the LLVM IR spec, and add more tests for sNaN behavior for all 3 max/min intrinsic types.
- Add tests for vectors of <snan, qnan> pairs. - Add more vector tests for <qnan, poison> and <snan, poison> - Remove comment wording about FTZ. - Clarify behavior of fmaxnum(x, <snan, qnan>) -> x in comments (assume snan == qnan)
Add a vec2 variant for all the +inf/-inf/+max/-max tests for maxnum/maximum/maximumnum.
Instead of transforming maxnum(x, <sNaN, qNaN>) -> x, avoid folding the maxnum away so the sNaN element can correctly be propagated as a qNaN.
Update some failing AMDGPU tests using maxnum's updated sNaN behaviour and maximumnum's new set of optimizations.
Rewrite all the fmin/fmax(x, const) optimizations in InstSimplify to use elementwise checking instead of pattern-matching. This allows for improved splat-vector handling, as well as handling of mixed constant vectors e.g. <sNan, Inf, poison>.
Remove sNaN + qNaN patterns added in earlier version of the patch, since they are no longer used in the new elementwise approach.
Re-generate 2 AMDGPU tests that are failing after rebasing this change: - llvm/test/CodeGen/AMDGPU/fneg-combines.new.ll - llvm/test/Transforms/InstCombine/AMDGPU/fmed3.ll
Put the elementwise checks for fixed-vector elements in an else block after checking for splats so that fixed-vector splats do not get checked twice if they fail to be optimized in the first check.
This is the problem. PTX does not treat sNaN/qNaN differently, and will return the other non-NaN input, whereas LLVM's maxnum/minnum intrinsics state that they must return NaN if either operand is sNaN. So the [...]
That is true for IEEE [...]
The [...]
instead of:
[...]
So there's quite a long way to go before we can switch to using [...]
As far as I can tell, PTX [...] Whether or not NVPTX does switch from [...]
Swap min/max lines to be in consistent order with other similar comments.
Make the code simpler, shorter, and clearer by: * Splitting the lambda out into a separate function * Instead of returning an optional of a bool + value, use an enum + value * The new enum specifies whether to avoid optimizing, return LHS, return const, or return either. * Use the same enum to cover the logic of both the individual scalar elements, and the vector as a whole. * Use this enum to cover the fact that undef/poison could choose either side, rather than tracking it with a separate check. * Extract common code from the splat vs fixed vs scalar paths. * Merge undef handling with the above approach too, rather than using a separate check. * Simplify some comments.
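A rough sketch of the enum-plus-combine structure this commit message describes; the names below (FoldAction, combineFoldActions) are hypothetical and are not the identifiers from the actual patch:

```cpp
// Illustrative only: per-element fold decisions and how they combine into a
// decision for the whole vector.
enum class FoldAction {
  NoFold,      // this element cannot be simplified
  ReturnLHS,   // must return the non-constant operand
  ReturnConst, // must return (a possibly quieted copy of) the constant
  ReturnEither // undef/poison element: either side is acceptable
};

// Combine one element's decision with the decision accumulated so far.
// Conflicting requirements (LHS vs. constant) force NoFold, reflecting the
// restriction that InstSimplify cannot build a mixed result vector.
static FoldAction combineFoldActions(FoldAction Acc, FoldAction Elt) {
  if (Acc == FoldAction::NoFold || Elt == FoldAction::NoFold)
    return FoldAction::NoFold;
  if (Acc == FoldAction::ReturnEither)
    return Elt;
  if (Elt == FoldAction::ReturnEither || Elt == Acc)
    return Acc;
  return FoldAction::NoFold;
}
```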
So, the practical impact is that the code that currently uses [...]
The actual problem is that the users never got the "just fmax/fmin instruction" they intended to have, and by now that behavior has effectively become part of the API. The good news is that nvvm.fmin/fmax do not appear to be widely used. At least I can't find any references to them outside of LLVM/clang and their tests, in our code base.
Agreed. The switch itself is trivial, so we should be able to revert the change easily if there are problems. Adding lowering of maximumnum/minimumnum -> fmax.f32/fmin.f32 should be pretty straightforward, too.
Yes, although the sNaN propagation issue could only happen in the middle end: nvvm_fmax currently becomes PTX max, so it is only while it is interpreted as llvm.maxnum in the middle end that there could be any semantic differences (e.g. during constant folding). Also, the InstSimplify optimization I'm updating in this patch did not handle maxnum(x, sNaN) -> qNaN before (it used to only do maxnum(x, NaN) -> x, in the same way PTX max does), so this different behaviour might become slightly more apparent after the sNaN correctness fixes included in my patch here.
They are widely used in NVCC.
Yes, I'm working on a patch for this at the moment, which I should hopefully be able to upstream soon. It's 5 lines of code plus some tests, so it was very straightforward to add.
Here is the NVPTX maximumnum ISel patch I mentioned above: #155804. I think we'll need to land this InstSimplify patch, the above ISel patch, and maybe a couple of other improvements like better SelectionDAG expansion of v2f16.
@arsenm Does the latest version of the patch look OK to merge?
@Artem-B Does this patch seem OK to merge?
Refactor all the tests in fminmax-folds.ll so that they are grouped by optimization, rather than by intrinsic. Instead of calling 1 intrinsic per function, each function now tests all 6 variants of the intrinsic. Results are stored to named pointers to maintain readability in this more compact form. This makes it much easier to compare the outputs from each intrinsic, rather than having them scattered in different functions in different parts of the file. It is also much more compact, so despite adding >50% more tests, the file is ~500 lines shorter. The tests added include: * Adding maximumnum and minimumnum everywhere (currently not optimized, but added as a baseline for future optimizations in llvm#139581). * Adding separate tests for SNaN and QNaN (as a baseline for correctness improvements in llvm#139581) * Adding tests for scalable vectors * Increasing the variety of types used in various tests by using more f16, f64, and vector types in tests. The only coverage removed is for tests with undef (only poison is now tested for). Overall, this refactor should increase coverage, improve readability with more comments and clear section headers, and make the tests much more compact and easier to review in llvm#139581 by providing a clear baseline for each intrinsic's current behaviour.
It's OK with me, but the remaining open questions are between you and @arsenm. If he's OK with the current state, you're good to go.
LGTM, but please check with @arsenm that all concerns have been addressed.
Merge branch 'main' into instsimplify_maximumnum Replace the old fminmax-folds.ll test with the newly refactored version from llvm#160504. These tests will not pass yet with the new branch, but this provides a clean baseline. Conflicts: llvm/test/Transforms/InstSimplify/fminmax-folds.ll
Run update_test_checks.py on fminmax-folds.ll to get the results that change from this patchset (especially the snan/qnan changes, and adding maximumnum/minimumnum).
The update_test_checks.py script seems to have used the old variable names from the previous version of the tests, rather than updating them to match the variable names they now represent. Changed several nested min/max tests to use names like MAXIMUMNUM_XY instead of MAXIMUMNUM_NESTED to make this consistent with the other CHECK lines and avoid confusion with the nested value that the variable used to represent in the old version of the test.
Add test-cases for the newly added elementwise optimizations covering vectors like <poison, Inf> and <poison, Inf, SNaN>.
Fix cases where variable X was being renamed to MINIMUMNUM
Fix problem with MINNUM_RES getting renamed to MINIMUM_RES
@arsenm I've updated all the fminmax-folds.ll tests now after the refactor in #160504. The diff is now [...]
Add support for the new maximumnum and minimumnum intrinsics in various optimizations in InstSimplify.
Also, change the behavior of optimizing maxnum(sNaN, x) to simplify to qNaN instead of x to better match the LLVM IR spec, and add more tests for sNaN behavior for all 3 max/min intrinsic types.
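For reference, here is a small standalone snippet (not part of the patch) illustrating the APFloat-level distinction the updated folds rely on: a qNaN constant operand can simply be dropped, while an sNaN operand has to come back quieted. Only standard APFloat APIs are used; the fold logic itself lives in InstructionSimplify.cpp.

```cpp
#include "llvm/ADT/APFloat.h"
#include <cassert>

using namespace llvm;

int main() {
  const fltSemantics &Sem = APFloat::IEEEsingle();
  APFloat SNaN = APFloat::getSNaN(Sem);
  APFloat QNaN = APFloat::getQNaN(Sem);
  assert(SNaN.isSignaling());
  assert(QNaN.isNaN() && !QNaN.isSignaling());

  // maxnum(x, qNaN) may fold to x, but maxnum(x, sNaN) must produce a quiet
  // NaN, so the constant operand is quieted before being returned.
  APFloat Quieted = SNaN;
  Quieted.makeQuiet();
  assert(Quieted.isNaN() && !Quieted.isSignaling());
  return 0;
}
```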