Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[flang][OpenMP] Use maxnum/minnum for lowering of max/min reduction operators #89258

Merged
merged 1 commit into from
Apr 19, 2024

Conversation

jsjodin
Copy link
Contributor

@jsjodin jsjodin commented Apr 18, 2024

This patch changes the lowering of max and min to be lowered to arith::MaxNumFop and arith::MinNumFOp instead of using arith::MaximumFOp and arith::MinimumFOp. The arith::MaximumFOp and arith::MinimumFOp map to the corresponding intrinsics llvm.maximum.* and llvm.minimum.* intrinsics which conform to the semantics specified in the draft of IEEE 754-2019, which is not supported by all hardware. Instead using arith::MaximumFOp and arith::MinimumFOp will allow code generation for more targets and match the code generated by clang OpenMP.

fixes #87955

…perators

This patch changes the lowering of max and min to be lowered to
arith::MaxNumFop and arith::MinNumFOp instead of using arith::MaximumFOp and
arith::MinimumFOp. The arith::MaximumFOp and arith::MinimumFOp map to the
corresponding intrinsics llvm.maximum.* and llvm.minimum.* intrinsics which
conform to the semantics specified in the draft of IEEE 754-2019, which is not
supported by all hardware. Instead using arith::MaximumFOp and
arith::MinimumFOp will allow code generation for more targets and match the
code generated by clang OpenMP.
@llvmbot
Copy link
Collaborator

llvmbot commented Apr 18, 2024

@llvm/pr-subscribers-flang-fir-hlfir

@llvm/pr-subscribers-flang-openmp

Author: Jan Leyonberg (jsjodin)

Changes

This patch changes the lowering of max and min to be lowered to arith::MaxNumFop and arith::MinNumFOp instead of using arith::MaximumFOp and arith::MinimumFOp. The arith::MaximumFOp and arith::MinimumFOp map to the corresponding intrinsics llvm.maximum.* and llvm.minimum.* intrinsics which conform to the semantics specified in the draft of IEEE 754-2019, which is not supported by all hardware. Instead using arith::MaximumFOp and arith::MinimumFOp will allow code generation for more targets and match the code generated by clang OpenMP.


Full diff: https://github.com/llvm/llvm-project/pull/89258.diff

9 Files Affected:

  • (modified) flang/lib/Lower/OpenMP/ReductionProcessor.cpp (+2-2)
  • (modified) flang/test/Lower/OpenMP/FIR/wsloop-reduction-max-byref.f90 (+1-1)
  • (modified) flang/test/Lower/OpenMP/FIR/wsloop-reduction-max.f90 (+1-1)
  • (modified) flang/test/Lower/OpenMP/FIR/wsloop-reduction-min-byref.f90 (+1-1)
  • (modified) flang/test/Lower/OpenMP/FIR/wsloop-reduction-min.f90 (+1-1)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-max-byref.f90 (+1-1)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-max.f90 (+1-1)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-min-byref.f90 (+1-1)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-min.f90 (+1-1)
diff --git a/flang/lib/Lower/OpenMP/ReductionProcessor.cpp b/flang/lib/Lower/OpenMP/ReductionProcessor.cpp
index f42386fe2736dd..36ea9aba5e9856 100644
--- a/flang/lib/Lower/OpenMP/ReductionProcessor.cpp
+++ b/flang/lib/Lower/OpenMP/ReductionProcessor.cpp
@@ -220,12 +220,12 @@ mlir::Value ReductionProcessor::createScalarCombiner(
   switch (redId) {
   case ReductionIdentifier::MAX:
     reductionOp =
-        getReductionOperation<mlir::arith::MaximumFOp, mlir::arith::MaxSIOp>(
+        getReductionOperation<mlir::arith::MaxNumFOp, mlir::arith::MaxSIOp>(
             builder, type, loc, op1, op2);
     break;
   case ReductionIdentifier::MIN:
     reductionOp =
-        getReductionOperation<mlir::arith::MinimumFOp, mlir::arith::MinSIOp>(
+        getReductionOperation<mlir::arith::MinNumFOp, mlir::arith::MinSIOp>(
             builder, type, loc, op1, op2);
     break;
   case ReductionIdentifier::IOR:
diff --git a/flang/test/Lower/OpenMP/FIR/wsloop-reduction-max-byref.f90 b/flang/test/Lower/OpenMP/FIR/wsloop-reduction-max-byref.f90
index f0979ab95f568a..80b720e3aac1d1 100644
--- a/flang/test/Lower/OpenMP/FIR/wsloop-reduction-max-byref.f90
+++ b/flang/test/Lower/OpenMP/FIR/wsloop-reduction-max-byref.f90
@@ -11,7 +11,7 @@
 !CHECK: ^bb0(%[[ARG0:.*]]: !fir.ref<f32>, %[[ARG1:.*]]: !fir.ref<f32>):
 !CHECK:   %[[LD0:.*]] = fir.load %[[ARG0]] : !fir.ref<f32>
 !CHECK:   %[[LD1:.*]] = fir.load %[[ARG1]] : !fir.ref<f32>
-!CHECK:   %[[RES:.*]] = arith.maximumf %[[LD0]], %[[LD1]] {{.*}}: f32
+!CHECK:   %[[RES:.*]] = arith.maxnumf %[[LD0]], %[[LD1]] {{.*}}: f32
 !CHECK:   fir.store %[[RES]] to %[[ARG0]] : !fir.ref<f32>
 !CHECK:   omp.yield(%[[ARG0]] : !fir.ref<f32>)
 
diff --git a/flang/test/Lower/OpenMP/FIR/wsloop-reduction-max.f90 b/flang/test/Lower/OpenMP/FIR/wsloop-reduction-max.f90
index 996296c2adc2b5..c3b821ea591246 100644
--- a/flang/test/Lower/OpenMP/FIR/wsloop-reduction-max.f90
+++ b/flang/test/Lower/OpenMP/FIR/wsloop-reduction-max.f90
@@ -6,7 +6,7 @@
 !CHECK:   omp.yield(%[[MINIMUM_VAL_F]] : f32)
 !CHECK: combiner
 !CHECK: ^bb0(%[[ARG0_F:.*]]: f32, %[[ARG1_F:.*]]: f32):
-!CHECK:   %[[COMB_VAL_F:.*]] = arith.maximumf %[[ARG0_F]], %[[ARG1_F]] {{.*}}: f32
+!CHECK:   %[[COMB_VAL_F:.*]] = arith.maxnumf %[[ARG0_F]], %[[ARG1_F]] {{.*}}: f32
 !CHECK:   omp.yield(%[[COMB_VAL_F]] : f32)
 
 !CHECK: omp.declare_reduction @[[MAX_DECLARE_I:.*]] : i32 init {
diff --git a/flang/test/Lower/OpenMP/FIR/wsloop-reduction-min-byref.f90 b/flang/test/Lower/OpenMP/FIR/wsloop-reduction-min-byref.f90
index 24aa8e46e5bbbd..b284f8e5d96721 100644
--- a/flang/test/Lower/OpenMP/FIR/wsloop-reduction-min-byref.f90
+++ b/flang/test/Lower/OpenMP/FIR/wsloop-reduction-min-byref.f90
@@ -11,7 +11,7 @@
 !CHECK: ^bb0(%[[ARG0:.*]]: !fir.ref<f32>, %[[ARG1:.*]]: !fir.ref<f32>):
 !CHECK:   %[[LD0:.*]] = fir.load %[[ARG0]] : !fir.ref<f32>
 !CHECK:   %[[LD1:.*]] = fir.load %[[ARG1]] : !fir.ref<f32>
-!CHECK:   %[[RES:.*]] = arith.minimumf %[[LD0]], %[[LD1]] {{.*}}: f32
+!CHECK:   %[[RES:.*]] = arith.minnumf %[[LD0]], %[[LD1]] {{.*}}: f32
 !CHECK:   fir.store %[[RES]] to %[[ARG0]] : !fir.ref<f32>
 !CHECK:   omp.yield(%[[ARG0]] : !fir.ref<f32>)
 
diff --git a/flang/test/Lower/OpenMP/FIR/wsloop-reduction-min.f90 b/flang/test/Lower/OpenMP/FIR/wsloop-reduction-min.f90
index 268f51c9dc9330..ab33e180ed883d 100644
--- a/flang/test/Lower/OpenMP/FIR/wsloop-reduction-min.f90
+++ b/flang/test/Lower/OpenMP/FIR/wsloop-reduction-min.f90
@@ -6,7 +6,7 @@
 !CHECK:   omp.yield(%[[MAXIMUM_VAL_F]] : f32)
 !CHECK: combiner
 !CHECK: ^bb0(%[[ARG0_F:.*]]: f32, %[[ARG1_F:.*]]: f32):
-!CHECK:   %[[COMB_VAL_F:.*]] = arith.minimumf %[[ARG0_F]], %[[ARG1_F]] {{.*}}: f32
+!CHECK:   %[[COMB_VAL_F:.*]] = arith.minnumf %[[ARG0_F]], %[[ARG1_F]] {{.*}}: f32
 !CHECK:   omp.yield(%[[COMB_VAL_F]] : f32)
 
 !CHECK: omp.declare_reduction @[[MIN_DECLARE_I:.*]] : i32 init {
diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-max-byref.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-max-byref.f90
index ee562bbe15863e..2f6921edcb42a5 100644
--- a/flang/test/Lower/OpenMP/wsloop-reduction-max-byref.f90
+++ b/flang/test/Lower/OpenMP/wsloop-reduction-max-byref.f90
@@ -13,7 +13,7 @@
 !CHECK: ^bb0(%[[ARG0:.*]]: !fir.ref<f32>, %[[ARG1:.*]]: !fir.ref<f32>):
 !CHECK:   %[[LD0:.*]] = fir.load %[[ARG0]] : !fir.ref<f32>
 !CHECK:   %[[LD1:.*]] = fir.load %[[ARG1]] : !fir.ref<f32>
-!CHECK:   %[[RES:.*]] = arith.maximumf %[[LD0]], %[[LD1]] {{.*}}: f32
+!CHECK:   %[[RES:.*]] = arith.maxnumf %[[LD0]], %[[LD1]] {{.*}}: f32
 !CHECK:   fir.store %[[RES]] to %[[ARG0]] : !fir.ref<f32>
 !CHECK:   omp.yield(%[[ARG0]] : !fir.ref<f32>)
 
diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-max.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-max.f90
index 6f11f0ec96a7d3..c9cf5cbf4f8c02 100644
--- a/flang/test/Lower/OpenMP/wsloop-reduction-max.f90
+++ b/flang/test/Lower/OpenMP/wsloop-reduction-max.f90
@@ -10,7 +10,7 @@
 
 ! CHECK-LABEL:   } combiner {
 ! CHECK:         ^bb0(%[[VAL_0:.*]]: f32, %[[VAL_1:.*]]: f32):
-! CHECK:           %[[VAL_2:.*]] = arith.maximumf %[[VAL_0]], %[[VAL_1]] fastmath<contract> : f32
+! CHECK:           %[[VAL_2:.*]] = arith.maxnumf %[[VAL_0]], %[[VAL_1]] fastmath<contract> : f32
 ! CHECK:           omp.yield(%[[VAL_2]] : f32)
 ! CHECK:         }
 
diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-min-byref.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-min-byref.f90
index c0372117a03b9d..84a376b46b8fbe 100644
--- a/flang/test/Lower/OpenMP/wsloop-reduction-min-byref.f90
+++ b/flang/test/Lower/OpenMP/wsloop-reduction-min-byref.f90
@@ -13,7 +13,7 @@
 !CHECK: ^bb0(%[[ARG0:.*]]: !fir.ref<f32>, %[[ARG1:.*]]: !fir.ref<f32>):
 !CHECK:   %[[LD0:.*]] = fir.load %[[ARG0]] : !fir.ref<f32>
 !CHECK:   %[[LD1:.*]] = fir.load %[[ARG1]] : !fir.ref<f32>
-!CHECK:   %[[RES:.*]] = arith.minimumf %[[LD0]], %[[LD1]] {{.*}}: f32
+!CHECK:   %[[RES:.*]] = arith.minnumf %[[LD0]], %[[LD1]] {{.*}}: f32
 !CHECK:   fir.store %[[RES]] to %[[ARG0]] : !fir.ref<f32>
 !CHECK:   omp.yield(%[[ARG0]] : !fir.ref<f32>)
 
diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-min.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-min.f90
index 2c694f82e279a4..3ba279acd14c41 100644
--- a/flang/test/Lower/OpenMP/wsloop-reduction-min.f90
+++ b/flang/test/Lower/OpenMP/wsloop-reduction-min.f90
@@ -10,7 +10,7 @@
 
 ! CHECK-LABEL:   } combiner {
 ! CHECK:         ^bb0(%[[VAL_0:.*]]: f32, %[[VAL_1:.*]]: f32):
-! CHECK:           %[[VAL_2:.*]] = arith.minimumf %[[VAL_0]], %[[VAL_1]] fastmath<contract> : f32
+! CHECK:           %[[VAL_2:.*]] = arith.minnumf %[[VAL_0]], %[[VAL_1]] fastmath<contract> : f32
 ! CHECK:           omp.yield(%[[VAL_2]] : f32)
 ! CHECK:         }
 

@kiranchandramohan
Copy link
Contributor

If this fixes #87955 then can you add a fixes tag?

@kiranchandramohan
Copy link
Contributor

If this fixes #87955 then can you add a fixes tag?

This works for me on AArch64. I think it lowers the 128 version early to compare and select. Thanks for the fix.

Copy link
Contributor

@kiranchandramohan kiranchandramohan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG.

Copy link
Contributor

@tblah tblah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thanks

@kiranchandramohan
Copy link
Contributor

If this fixes #87955 then can you add a fixes tag?

This works for me on AArch64. I think it lowers the 128 version early to compare and select. Thanks for the fix.

Sorry, I got that wrong. For #87955 it is a min reduction and the backend can convert minnum to a fminl call hence it works.

@jsjodin
Copy link
Contributor Author

jsjodin commented Apr 18, 2024

If this fixes #87955 then can you add a fixes tag?

This works for me on AArch64. I think it lowers the 128 version early to compare and select. Thanks for the fix.

Thank you for the review, I will add a fixes tag.

@kparzysz
Copy link
Contributor

I have a concern with making a change this early in the compilation process only to avoid a selection error in the backend. First of all, we seem to replace an IEEE-compliant operation with one that isn't. Such a replacement should only be done if the compiler is allowed to do that (via relevant FP flags). Second, this is a legal operation in MLIR, and presumably in LLVM IR. If the backend can't generate valid code for it, then it's a missing functionality in the backend. Lastly, the replacing of an IEEE-compliant operation with a similar one should be done by the backend, since it's the backend that knows (1) if the operation needs to be replaced, and (2) what to replace it with.

@jsjodin
Copy link
Contributor Author

jsjodin commented Apr 18, 2024

If this fixes #87955 then can you add a fixes tag?

This works for me on AArch64. I think it lowers the 128 version early to compare and select. Thanks for the fix.

Sorry, I got that wrong. For #87955 it is a min reduction and the backend can convert minnum to a fminl call hence it works.

Okay, so does this PR fix anything?

@kiranchandramohan
Copy link
Contributor

If this fixes #87955 then can you add a fixes tag?

This works for me on AArch64. I think it lowers the 128 version early to compare and select. Thanks for the fix.

Sorry, I got that wrong. For #87955 it is a min reduction and the backend can convert minnum to a fminl call hence it works.

Okay, so does this PR fix anything?

Yes it fixes #87955

@jsjodin
Copy link
Contributor Author

jsjodin commented Apr 18, 2024

I have a concern with making a change this early in the compilation process only to avoid a selection error in the backend. First of all, we seem to replace an IEEE-compliant operation with one that isn't. Such a replacement should only be done if the compiler is allowed to do that (via relevant FP flags). Second, this is a legal operation in MLIR, and presumably in LLVM IR. If the backend can't generate valid code for it, then it's a missing functionality in the backend. Lastly, the replacing of an IEEE-compliant operation with a similar one should be done by the backend, since it's the backend that knows (1) if the operation needs to be replaced, and (2) what to replace it with.

I think this change is probably okay since OpenMP 5.0 with min/max reductions was released 2018 and the IEEE-compliance relates to a 2019 draft. So we're not changing the semantics of OpenMP in this case.

@kparzysz
Copy link
Contributor

Feel free to commit it as it is, but we should get the backend(s) to fix this.

@kparzysz
Copy link
Contributor

Opened #89278 to track it.

@jsjodin jsjodin merged commit 0c455ee into llvm:main Apr 19, 2024
8 checks passed
aniplcc pushed a commit to aniplcc/llvm-project that referenced this pull request Apr 21, 2024
…perators (llvm#89258)

This patch changes the lowering of max and min to be lowered to
arith::MaxNumFop and arith::MinNumFOp instead of using arith::MaximumFOp
and arith::MinimumFOp. The arith::MaximumFOp and arith::MinimumFOp map
to the corresponding intrinsics llvm.maximum.* and llvm.minimum.*
intrinsics which conform to the semantics specified in the draft of IEEE
754-2019, which is not supported by all hardware. Instead using
arith::MaximumFOp and arith::MinimumFOp will allow code generation for
more targets and match the code generated by clang OpenMP.

fixes llvm#87955
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flang:fir-hlfir flang:openmp flang Flang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Flang][OpenMP] Compilation error of using a reduction clause (min) with a variable (kind=16)
5 participants