-
Notifications
You must be signed in to change notification settings - Fork 10.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[flang][OpenMP] Use maxnum/minnum for lowering of max/min reduction operators #89258
Conversation
…perators This patch changes the lowering of max and min to be lowered to arith::MaxNumFop and arith::MinNumFOp instead of using arith::MaximumFOp and arith::MinimumFOp. The arith::MaximumFOp and arith::MinimumFOp map to the corresponding intrinsics llvm.maximum.* and llvm.minimum.* intrinsics which conform to the semantics specified in the draft of IEEE 754-2019, which is not supported by all hardware. Instead using arith::MaximumFOp and arith::MinimumFOp will allow code generation for more targets and match the code generated by clang OpenMP.
@llvm/pr-subscribers-flang-fir-hlfir @llvm/pr-subscribers-flang-openmp Author: Jan Leyonberg (jsjodin) ChangesThis patch changes the lowering of max and min to be lowered to arith::MaxNumFop and arith::MinNumFOp instead of using arith::MaximumFOp and arith::MinimumFOp. The arith::MaximumFOp and arith::MinimumFOp map to the corresponding intrinsics llvm.maximum.* and llvm.minimum.* intrinsics which conform to the semantics specified in the draft of IEEE 754-2019, which is not supported by all hardware. Instead using arith::MaximumFOp and arith::MinimumFOp will allow code generation for more targets and match the code generated by clang OpenMP. Full diff: https://github.com/llvm/llvm-project/pull/89258.diff 9 Files Affected:
diff --git a/flang/lib/Lower/OpenMP/ReductionProcessor.cpp b/flang/lib/Lower/OpenMP/ReductionProcessor.cpp
index f42386fe2736dd..36ea9aba5e9856 100644
--- a/flang/lib/Lower/OpenMP/ReductionProcessor.cpp
+++ b/flang/lib/Lower/OpenMP/ReductionProcessor.cpp
@@ -220,12 +220,12 @@ mlir::Value ReductionProcessor::createScalarCombiner(
switch (redId) {
case ReductionIdentifier::MAX:
reductionOp =
- getReductionOperation<mlir::arith::MaximumFOp, mlir::arith::MaxSIOp>(
+ getReductionOperation<mlir::arith::MaxNumFOp, mlir::arith::MaxSIOp>(
builder, type, loc, op1, op2);
break;
case ReductionIdentifier::MIN:
reductionOp =
- getReductionOperation<mlir::arith::MinimumFOp, mlir::arith::MinSIOp>(
+ getReductionOperation<mlir::arith::MinNumFOp, mlir::arith::MinSIOp>(
builder, type, loc, op1, op2);
break;
case ReductionIdentifier::IOR:
diff --git a/flang/test/Lower/OpenMP/FIR/wsloop-reduction-max-byref.f90 b/flang/test/Lower/OpenMP/FIR/wsloop-reduction-max-byref.f90
index f0979ab95f568a..80b720e3aac1d1 100644
--- a/flang/test/Lower/OpenMP/FIR/wsloop-reduction-max-byref.f90
+++ b/flang/test/Lower/OpenMP/FIR/wsloop-reduction-max-byref.f90
@@ -11,7 +11,7 @@
!CHECK: ^bb0(%[[ARG0:.*]]: !fir.ref<f32>, %[[ARG1:.*]]: !fir.ref<f32>):
!CHECK: %[[LD0:.*]] = fir.load %[[ARG0]] : !fir.ref<f32>
!CHECK: %[[LD1:.*]] = fir.load %[[ARG1]] : !fir.ref<f32>
-!CHECK: %[[RES:.*]] = arith.maximumf %[[LD0]], %[[LD1]] {{.*}}: f32
+!CHECK: %[[RES:.*]] = arith.maxnumf %[[LD0]], %[[LD1]] {{.*}}: f32
!CHECK: fir.store %[[RES]] to %[[ARG0]] : !fir.ref<f32>
!CHECK: omp.yield(%[[ARG0]] : !fir.ref<f32>)
diff --git a/flang/test/Lower/OpenMP/FIR/wsloop-reduction-max.f90 b/flang/test/Lower/OpenMP/FIR/wsloop-reduction-max.f90
index 996296c2adc2b5..c3b821ea591246 100644
--- a/flang/test/Lower/OpenMP/FIR/wsloop-reduction-max.f90
+++ b/flang/test/Lower/OpenMP/FIR/wsloop-reduction-max.f90
@@ -6,7 +6,7 @@
!CHECK: omp.yield(%[[MINIMUM_VAL_F]] : f32)
!CHECK: combiner
!CHECK: ^bb0(%[[ARG0_F:.*]]: f32, %[[ARG1_F:.*]]: f32):
-!CHECK: %[[COMB_VAL_F:.*]] = arith.maximumf %[[ARG0_F]], %[[ARG1_F]] {{.*}}: f32
+!CHECK: %[[COMB_VAL_F:.*]] = arith.maxnumf %[[ARG0_F]], %[[ARG1_F]] {{.*}}: f32
!CHECK: omp.yield(%[[COMB_VAL_F]] : f32)
!CHECK: omp.declare_reduction @[[MAX_DECLARE_I:.*]] : i32 init {
diff --git a/flang/test/Lower/OpenMP/FIR/wsloop-reduction-min-byref.f90 b/flang/test/Lower/OpenMP/FIR/wsloop-reduction-min-byref.f90
index 24aa8e46e5bbbd..b284f8e5d96721 100644
--- a/flang/test/Lower/OpenMP/FIR/wsloop-reduction-min-byref.f90
+++ b/flang/test/Lower/OpenMP/FIR/wsloop-reduction-min-byref.f90
@@ -11,7 +11,7 @@
!CHECK: ^bb0(%[[ARG0:.*]]: !fir.ref<f32>, %[[ARG1:.*]]: !fir.ref<f32>):
!CHECK: %[[LD0:.*]] = fir.load %[[ARG0]] : !fir.ref<f32>
!CHECK: %[[LD1:.*]] = fir.load %[[ARG1]] : !fir.ref<f32>
-!CHECK: %[[RES:.*]] = arith.minimumf %[[LD0]], %[[LD1]] {{.*}}: f32
+!CHECK: %[[RES:.*]] = arith.minnumf %[[LD0]], %[[LD1]] {{.*}}: f32
!CHECK: fir.store %[[RES]] to %[[ARG0]] : !fir.ref<f32>
!CHECK: omp.yield(%[[ARG0]] : !fir.ref<f32>)
diff --git a/flang/test/Lower/OpenMP/FIR/wsloop-reduction-min.f90 b/flang/test/Lower/OpenMP/FIR/wsloop-reduction-min.f90
index 268f51c9dc9330..ab33e180ed883d 100644
--- a/flang/test/Lower/OpenMP/FIR/wsloop-reduction-min.f90
+++ b/flang/test/Lower/OpenMP/FIR/wsloop-reduction-min.f90
@@ -6,7 +6,7 @@
!CHECK: omp.yield(%[[MAXIMUM_VAL_F]] : f32)
!CHECK: combiner
!CHECK: ^bb0(%[[ARG0_F:.*]]: f32, %[[ARG1_F:.*]]: f32):
-!CHECK: %[[COMB_VAL_F:.*]] = arith.minimumf %[[ARG0_F]], %[[ARG1_F]] {{.*}}: f32
+!CHECK: %[[COMB_VAL_F:.*]] = arith.minnumf %[[ARG0_F]], %[[ARG1_F]] {{.*}}: f32
!CHECK: omp.yield(%[[COMB_VAL_F]] : f32)
!CHECK: omp.declare_reduction @[[MIN_DECLARE_I:.*]] : i32 init {
diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-max-byref.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-max-byref.f90
index ee562bbe15863e..2f6921edcb42a5 100644
--- a/flang/test/Lower/OpenMP/wsloop-reduction-max-byref.f90
+++ b/flang/test/Lower/OpenMP/wsloop-reduction-max-byref.f90
@@ -13,7 +13,7 @@
!CHECK: ^bb0(%[[ARG0:.*]]: !fir.ref<f32>, %[[ARG1:.*]]: !fir.ref<f32>):
!CHECK: %[[LD0:.*]] = fir.load %[[ARG0]] : !fir.ref<f32>
!CHECK: %[[LD1:.*]] = fir.load %[[ARG1]] : !fir.ref<f32>
-!CHECK: %[[RES:.*]] = arith.maximumf %[[LD0]], %[[LD1]] {{.*}}: f32
+!CHECK: %[[RES:.*]] = arith.maxnumf %[[LD0]], %[[LD1]] {{.*}}: f32
!CHECK: fir.store %[[RES]] to %[[ARG0]] : !fir.ref<f32>
!CHECK: omp.yield(%[[ARG0]] : !fir.ref<f32>)
diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-max.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-max.f90
index 6f11f0ec96a7d3..c9cf5cbf4f8c02 100644
--- a/flang/test/Lower/OpenMP/wsloop-reduction-max.f90
+++ b/flang/test/Lower/OpenMP/wsloop-reduction-max.f90
@@ -10,7 +10,7 @@
! CHECK-LABEL: } combiner {
! CHECK: ^bb0(%[[VAL_0:.*]]: f32, %[[VAL_1:.*]]: f32):
-! CHECK: %[[VAL_2:.*]] = arith.maximumf %[[VAL_0]], %[[VAL_1]] fastmath<contract> : f32
+! CHECK: %[[VAL_2:.*]] = arith.maxnumf %[[VAL_0]], %[[VAL_1]] fastmath<contract> : f32
! CHECK: omp.yield(%[[VAL_2]] : f32)
! CHECK: }
diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-min-byref.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-min-byref.f90
index c0372117a03b9d..84a376b46b8fbe 100644
--- a/flang/test/Lower/OpenMP/wsloop-reduction-min-byref.f90
+++ b/flang/test/Lower/OpenMP/wsloop-reduction-min-byref.f90
@@ -13,7 +13,7 @@
!CHECK: ^bb0(%[[ARG0:.*]]: !fir.ref<f32>, %[[ARG1:.*]]: !fir.ref<f32>):
!CHECK: %[[LD0:.*]] = fir.load %[[ARG0]] : !fir.ref<f32>
!CHECK: %[[LD1:.*]] = fir.load %[[ARG1]] : !fir.ref<f32>
-!CHECK: %[[RES:.*]] = arith.minimumf %[[LD0]], %[[LD1]] {{.*}}: f32
+!CHECK: %[[RES:.*]] = arith.minnumf %[[LD0]], %[[LD1]] {{.*}}: f32
!CHECK: fir.store %[[RES]] to %[[ARG0]] : !fir.ref<f32>
!CHECK: omp.yield(%[[ARG0]] : !fir.ref<f32>)
diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-min.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-min.f90
index 2c694f82e279a4..3ba279acd14c41 100644
--- a/flang/test/Lower/OpenMP/wsloop-reduction-min.f90
+++ b/flang/test/Lower/OpenMP/wsloop-reduction-min.f90
@@ -10,7 +10,7 @@
! CHECK-LABEL: } combiner {
! CHECK: ^bb0(%[[VAL_0:.*]]: f32, %[[VAL_1:.*]]: f32):
-! CHECK: %[[VAL_2:.*]] = arith.minimumf %[[VAL_0]], %[[VAL_1]] fastmath<contract> : f32
+! CHECK: %[[VAL_2:.*]] = arith.minnumf %[[VAL_0]], %[[VAL_1]] fastmath<contract> : f32
! CHECK: omp.yield(%[[VAL_2]] : f32)
! CHECK: }
|
If this fixes #87955 then can you add a fixes tag? |
This works for me on AArch64. I think it lowers the 128 version early to compare and select. Thanks for the fix. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LG.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM thanks
Thank you for the review, I will add a fixes tag. |
I have a concern with making a change this early in the compilation process only to avoid a selection error in the backend. First of all, we seem to replace an IEEE-compliant operation with one that isn't. Such a replacement should only be done if the compiler is allowed to do that (via relevant FP flags). Second, this is a legal operation in MLIR, and presumably in LLVM IR. If the backend can't generate valid code for it, then it's a missing functionality in the backend. Lastly, the replacing of an IEEE-compliant operation with a similar one should be done by the backend, since it's the backend that knows (1) if the operation needs to be replaced, and (2) what to replace it with. |
Okay, so does this PR fix anything? |
Yes it fixes #87955 |
I think this change is probably okay since OpenMP 5.0 with min/max reductions was released 2018 and the IEEE-compliance relates to a 2019 draft. So we're not changing the semantics of OpenMP in this case. |
Feel free to commit it as it is, but we should get the backend(s) to fix this. |
Opened #89278 to track it. |
…perators (llvm#89258) This patch changes the lowering of max and min to be lowered to arith::MaxNumFop and arith::MinNumFOp instead of using arith::MaximumFOp and arith::MinimumFOp. The arith::MaximumFOp and arith::MinimumFOp map to the corresponding intrinsics llvm.maximum.* and llvm.minimum.* intrinsics which conform to the semantics specified in the draft of IEEE 754-2019, which is not supported by all hardware. Instead using arith::MaximumFOp and arith::MinimumFOp will allow code generation for more targets and match the code generated by clang OpenMP. fixes llvm#87955
This patch changes the lowering of max and min to be lowered to arith::MaxNumFop and arith::MinNumFOp instead of using arith::MaximumFOp and arith::MinimumFOp. The arith::MaximumFOp and arith::MinimumFOp map to the corresponding intrinsics llvm.maximum.* and llvm.minimum.* intrinsics which conform to the semantics specified in the draft of IEEE 754-2019, which is not supported by all hardware. Instead using arith::MaximumFOp and arith::MinimumFOp will allow code generation for more targets and match the code generated by clang OpenMP.
fixes #87955