Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Flang][OpenMP] Implement "promotion" of use_device_ptr non-cptr arguments to use_device_addr #82834

Merged
merged 3 commits into from
Mar 13, 2024

Conversation

agozillon
Copy link
Contributor

This effectively implements some now deprecated OpenMP functionality that some applications (most notably at the moment GenASiS) unfortunately depend on (deprecated in specification version 5.2):

"If a list item in a use_device_ptr clause is not of type C_PTR, the behavior is as if the list item appeared in a use_device_addr clause. Support for such list items in a use_device_ptr clause is deprecated."

This PR downgrades the hard-error to a deprecated warning and "promotes" the above cases by simply moving the offending operands from the use_device_ptr value list to the back of the use_device_addr list (and moves the related symbols, locs and types that form the BlockArgs correspondingly) and then the generation of the target data construct proceeds as normal.

@llvmbot
Copy link
Collaborator

llvmbot commented Feb 23, 2024

@llvm/pr-subscribers-flang-fir-hlfir

@llvm/pr-subscribers-flang-openmp

Author: None (agozillon)

Changes

This effectively implements some now deprecated OpenMP functionality that some applications (most notably at the moment GenASiS) unfortunately depend on (deprecated in specification version 5.2):

"If a list item in a use_device_ptr clause is not of type C_PTR, the behavior is as if the list item appeared in a use_device_addr clause. Support for such list items in a use_device_ptr clause is deprecated."

This PR downgrades the hard-error to a deprecated warning and "promotes" the above cases by simply moving the offending operands from the use_device_ptr value list to the back of the use_device_addr list (and moves the related symbols, locs and types that form the BlockArgs correspondingly) and then the generation of the target data construct proceeds as normal.


Full diff: https://github.com/llvm/llvm-project/pull/82834.diff

4 Files Affected:

  • (modified) flang/lib/Lower/OpenMP/OpenMP.cpp (+66)
  • (modified) flang/lib/Semantics/check-omp-structure.cpp (+1-1)
  • (added) flang/test/Lower/OpenMP/use-device-ptr-to-use-device-addr.f90 (+72)
  • (modified) flang/test/Semantics/OpenMP/use_device_ptr1.f90 (+1-1)
diff --git a/flang/lib/Lower/OpenMP/OpenMP.cpp b/flang/lib/Lower/OpenMP/OpenMP.cpp
index 7953bf83cba0fe..922831fa734c8c 100644
--- a/flang/lib/Lower/OpenMP/OpenMP.cpp
+++ b/flang/lib/Lower/OpenMP/OpenMP.cpp
@@ -723,6 +723,58 @@ genTaskGroupOp(Fortran::lower::AbstractConverter &converter,
       /*task_reductions=*/nullptr, allocateOperands, allocatorOperands);
 }
 
+// This helper function implements the functionality of "promoting"
+// non-CPTR arguments of use_device_ptr to use_device_addr
+// arguments (automagic conversion of use_device_ptr ->
+// use_device_addr in these cases). The way we do so currently is
+// through the shuffling of operands from the devicePtrOperands to
+// deviceAddrOperands where neccesary and re-organizing the types,
+// locations and symbols to maintain the correct ordering of ptr/addr
+// input -> BlockArg.
+//
+// This effectively implements some deprecated OpenMP functionality
+// that some legacy applications unfortunately depend on
+// (deprecated in specification version 5.2):
+//
+// "If a list item in a use_device_ptr clause is not of type C_PTR,
+//  the behavior is as if the list item appeared in a use_device_addr
+//  clause. Support for such list items in a use_device_ptr clause
+//  is deprecated."
+static void promoteNonCPtrUseDevicePtrArgsToUseDeviceAddr(
+    llvm::SmallVector<mlir::Value> &devicePtrOperands,
+    llvm::SmallVector<mlir::Value> &deviceAddrOperands,
+    llvm::SmallVector<mlir::Type> &useDeviceTypes,
+    llvm::SmallVector<mlir::Location> &useDeviceLocs,
+    llvm::SmallVector<const Fortran::semantics::Symbol *> &useDeviceSymbols) {
+  auto moveElementToBack = [](size_t idx, auto &vector) {
+    auto *iter = std::next(vector.begin(), idx);
+    vector.push_back(*iter);
+    vector.erase(iter);
+  };
+
+  // Iterate over our use_device_ptr list and shift all non-cptr arguments into
+  // use_device_addr.
+  for (auto *it = devicePtrOperands.begin(); it != devicePtrOperands.end();) {
+    if (!fir::isa_builtin_cptr_type(fir::unwrapRefType(it->getType()))) {
+      deviceAddrOperands.push_back(*it);
+      // We have to shuffle the symbols around as well, to maintain
+      // the correct Input -> BlockArg for use_device_ptr/use_device_addr.
+      // NOTE: However, as map's do not seem to be included currently
+      // this isn't as pertinent, but we must try to maintain for
+      // future alterations. I believe the reason they are not currently
+      // is that the BlockArg assign/lowering needs to be extended
+      // to a greater set of types.
+      auto idx = std::distance(devicePtrOperands.begin(), it);
+      moveElementToBack(idx, useDeviceTypes);
+      moveElementToBack(idx, useDeviceLocs);
+      moveElementToBack(idx, useDeviceSymbols);
+      it = devicePtrOperands.erase(it);
+      continue;
+    }
+    ++it;
+  }
+}
+
 static mlir::omp::DataOp
 genDataOp(Fortran::lower::AbstractConverter &converter,
           Fortran::semantics::SemanticsContext &semaCtx,
@@ -745,6 +797,20 @@ genDataOp(Fortran::lower::AbstractConverter &converter,
                          useDeviceSymbols);
   cp.processUseDeviceAddr(deviceAddrOperands, useDeviceTypes, useDeviceLocs,
                           useDeviceSymbols);
+  // This function implements the deprecated functionality of use_device_ptr
+  // that allows users to provide non-CPTR arguments to it with the caveat
+  // that the compiler will treat them as use_device_addr. A lot of legacy
+  // code may still depend on this functionality, so we should support it
+  // in some manner. We do so currently by simply shifting non-cptr operands
+  // from the use_device_ptr list into the front of the use_device_addr list
+  // whilst maintaining the ordering of useDeviceLocs, useDeviceSymbols and
+  // useDeviceTypes to use_device_ptr/use_device_addr input for BlockArg
+  // ordering.
+  // TODO: Perhaps create a user provideable compiler option that will
+  // re-introduce a hard-error rather than a warning in these cases.
+  promoteNonCPtrUseDevicePtrArgsToUseDeviceAddr(
+      devicePtrOperands, deviceAddrOperands, useDeviceTypes, useDeviceLocs,
+      useDeviceSymbols);
   cp.processMap(currentLocation, llvm::omp::Directive::OMPD_target_data,
                 stmtCtx, mapOperands);
 
diff --git a/flang/lib/Semantics/check-omp-structure.cpp b/flang/lib/Semantics/check-omp-structure.cpp
index 54101ab8a42bbf..bf4debee1df34c 100644
--- a/flang/lib/Semantics/check-omp-structure.cpp
+++ b/flang/lib/Semantics/check-omp-structure.cpp
@@ -2948,7 +2948,7 @@ void OmpStructureChecker::Enter(const parser::OmpClause::UseDevicePtr &x) {
         if (name->symbol) {
           if (!(IsBuiltinCPtr(*(name->symbol)))) {
             context_.Say(itr->second->source,
-                "'%s' in USE_DEVICE_PTR clause must be of type C_PTR"_err_en_US,
+                "Use of non-C_PTR type '%s' in USE_DEVICE_PTR is deprecated, use USE_DEVICE_ADDR instead"_warn_en_US,
                 name->ToString());
           } else {
             useDevicePtrNameList.push_back(*name);
diff --git a/flang/test/Lower/OpenMP/use-device-ptr-to-use-device-addr.f90 b/flang/test/Lower/OpenMP/use-device-ptr-to-use-device-addr.f90
new file mode 100644
index 00000000000000..33b5971656010a
--- /dev/null
+++ b/flang/test/Lower/OpenMP/use-device-ptr-to-use-device-addr.f90
@@ -0,0 +1,72 @@
+!RUN: %flang_fc1 -emit-hlfir -fopenmp %s -o - | FileCheck %s
+!RUN: bbc -emit-hlfir -fopenmp %s -o - | FileCheck %s
+
+! This tests primary goal is to check the promotion of 
+! non-CPTR arguments from use_device_ptr to 
+! use_device_addr works, without breaking any 
+! functionality 
+
+!CHECK: func.func @{{.*}}only_use_device_ptr()
+!CHECK: omp.target_data use_device_ptr(%{{.*}} : !fir.ref<!fir.type<_QM__fortran_builtinsT__builtin_c_ptr{__address:i64}>>) use_device_addr(%{{.*}}, %{{.*}} : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xf32>>>>, !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>) {
+!CHECK: ^bb0(%{{.*}}: !fir.ref<!fir.type<_QM__fortran_builtinsT__builtin_c_ptr{__address:i64}>>, %{{.*}}: !fir.ref<!fir.box<!fir.ptr<!fir.array<?xf32>>>>, %{{.*}}: !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>):
+subroutine only_use_device_ptr 
+    use iso_c_binding
+    integer, pointer, dimension(:) :: array
+    real, pointer :: pa(:)
+    type(c_ptr) :: cptr
+
+   !$omp target data use_device_ptr(pa, cptr, array)
+   !$omp end target data 
+end subroutine
+
+!CHECK: func.func @{{.*}}mix_use_device_ptr_and_addr()
+!CHECK: omp.target_data use_device_ptr({{.*}} : !fir.ref<!fir.type<_QM__fortran_builtinsT__builtin_c_ptr{__address:i64}>>) use_device_addr(%{{.*}}, %{{.*}} : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>, !fir.ref<!fir.box<!fir.ptr<!fir.array<?xf32>>>>) {
+!CHECK: ^bb0(%{{.*}}: !fir.ref<!fir.type<_QM__fortran_builtinsT__builtin_c_ptr{__address:i64}>>, %{{.*}}: !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>, %{{.*}}: !fir.ref<!fir.box<!fir.ptr<!fir.array<?xf32>>>>):
+subroutine mix_use_device_ptr_and_addr 
+    use iso_c_binding
+    integer, pointer, dimension(:) :: array
+    real, pointer :: pa(:)
+    type(c_ptr) :: cptr
+
+   !$omp target data use_device_ptr(pa, cptr) use_device_addr(array)
+   !$omp end target data 
+end subroutine
+
+!CHECK: func.func @{{.*}}only_use_device_addr()
+!CHECK: omp.target_data use_device_addr(%{{.*}}, %{{.*}}, %{{.*}} : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xf32>>>>, !fir.ref<!fir.type<_QM__fortran_builtinsT__builtin_c_ptr{__address:i64}>>, !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>) {
+!CHECK: ^bb0(%{{.*}}: !fir.ref<!fir.box<!fir.ptr<!fir.array<?xf32>>>>, %{{.*}}: !fir.ref<!fir.type<_QM__fortran_builtinsT__builtin_c_ptr{__address:i64}>>, %{{.*}}: !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>):
+subroutine only_use_device_addr 
+    use iso_c_binding
+    integer, pointer, dimension(:) :: array
+    real, pointer :: pa(:)
+    type(c_ptr) :: cptr
+
+   !$omp target data use_device_addr(pa, cptr, array)
+   !$omp end target data 
+end subroutine
+
+!CHECK: func.func @{{.*}}mix_use_device_ptr_and_addr_and_map()
+!CHECK: omp.target_data map_entries(%{{.*}}, %{{.*}} : !fir.ref<i32>, !fir.ref<i32>) use_device_ptr(%{{.*}} : !fir.ref<!fir.type<_QM__fortran_builtinsT__builtin_c_ptr{__address:i64}>>) use_device_addr(%{{.*}}, %{{.*}} : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>, !fir.ref<!fir.box<!fir.ptr<!fir.array<?xf32>>>>) {
+!CHECK: ^bb0(%{{.*}}: !fir.ref<!fir.type<_QM__fortran_builtinsT__builtin_c_ptr{__address:i64}>>, %{{.*}}: !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>, %{{.*}}: !fir.ref<!fir.box<!fir.ptr<!fir.array<?xf32>>>>):
+subroutine mix_use_device_ptr_and_addr_and_map
+    use iso_c_binding
+    integer :: i, j
+    integer, pointer, dimension(:) :: array
+    real, pointer :: pa(:)
+    type(c_ptr) :: cptr
+
+   !$omp target data use_device_ptr(pa, cptr) use_device_addr(array) map(tofrom: i, j)
+   !$omp end target data 
+end subroutine
+
+!CHECK: func.func @{{.*}}only_use_map()
+!CHECK: omp.target_data map_entries(%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}, %{{.*}} : !fir.llvm_ptr<!fir.ref<!fir.array<?xf32>>>, !fir.ref<!fir.box<!fir.ptr<!fir.array<?xf32>>>>, !fir.ref<!fir.type<_QM__fortran_builtinsT__builtin_c_ptr{__address:i64}>>, !fir.llvm_ptr<!fir.ref<!fir.array<?xi32>>>, !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>) {
+subroutine only_use_map
+    use iso_c_binding
+    integer, pointer, dimension(:) :: array
+    real, pointer :: pa(:)
+    type(c_ptr) :: cptr
+
+   !$omp target data map(pa, cptr, array)
+   !$omp end target data 
+end subroutine
diff --git a/flang/test/Semantics/OpenMP/use_device_ptr1.f90 b/flang/test/Semantics/OpenMP/use_device_ptr1.f90
index af89698a5c5a99..176fb5f35a8490 100644
--- a/flang/test/Semantics/OpenMP/use_device_ptr1.f90
+++ b/flang/test/Semantics/OpenMP/use_device_ptr1.f90
@@ -27,7 +27,7 @@ subroutine omp_target_data
       a = arrayB
    !$omp end target data
 
-   !ERROR: 'a' in USE_DEVICE_PTR clause must be of type C_PTR
+   !WARNING: Use of non-C_PTR type 'a' in USE_DEVICE_PTR is deprecated, use USE_DEVICE_ADDR instead
    !$omp target data map(tofrom: a) use_device_ptr(a)
       a = 2
    !$omp end target data

Comment on lines +800 to +883
// This function implements the deprecated functionality of use_device_ptr
// that allows users to provide non-CPTR arguments to it with the caveat
// that the compiler will treat them as use_device_addr. A lot of legacy
// code may still depend on this functionality, so we should support it
// in some manner. We do so currently by simply shifting non-cptr operands
// from the use_device_ptr list into the front of the use_device_addr list
// whilst maintaining the ordering of useDeviceLocs, useDeviceSymbols and
// useDeviceTypes to use_device_ptr/use_device_addr input for BlockArg
// ordering.
// TODO: Perhaps create a user provideable compiler option that will
// re-introduce a hard-error rather than a warning in these cases.
promoteNonCPtrUseDevicePtrArgsToUseDeviceAddr(
devicePtrOperands, deviceAddrOperands, useDeviceTypes, useDeviceLocs,
useDeviceSymbols);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it better to do this directly in cp.processUseDevicePtr by passing deviceAddrOperands as an additional argument?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could certainly try to do that if you would prefer it!

I opted for this originally as it keeps the behavior as a seperate process and currently processUseDevicePtr/processUseDeviceAddr effectively use the same function call currently under the hood, so I was a little apprehensive to overload it for just useDevcePtr related behavior.

Doing it this way also allowed keeping the ordering easier at the time as access to the complete list (of use device and addr) is possible (but I suppose that depends on your perspective of order in this case, i.e. should the non-cptr useDevicePtr arguments be inserted at the end or front of the useDeviceAddr list).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer an overload of processUseDevicePtr by adding deviceAddrOperands* as the final optional argument. You can switch the order of calls to processUseDevicePtr and processUseDeviceAddr to get the right ordering I guess.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer an overload of processUseDevicePtr by adding deviceAddrOperands* as the final optional argument. You can switch the order of calls to processUseDevicePtr and processUseDeviceAddr to get the right ordering I guess.

Sounds good, I'll give it a try and see what I can manage :-) I have another idea that may simplify things further as well, so I'll see if that works as well while I am revising the PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried to make the adjustments suggested, in this case I opted to just keep the one overload of processUseDevicePtr but with the extra optional argument (although, happy to add a secondary overload if you'd prefer). I did however, keep the ordering of processUseDevicePtr/processUseDeviceAddr calls, the most important thing is keeping the BlockArg <-> UseDevicePtr/UseDeviceAddr ordering I believe.

The alternative method I wanted to attempt, didn't work out quite as well as I desired, I believe it'd work but would require seperate vectors for the locs/syms/types for device ptr and addr and a few more optional parameters for each of these for processUseDevicePtr. We'd get rid of the promoteNonCPtrUseDevicePtrArgsToUseDeviceAddr from this PR and instead do the "promotion" inside of addUseDeviceClause when the optional arguments are present (effectively, if optional arguements are present, check if a type is a cptr, if it isn't move it into the optional arguments which would be the use_device_addr vectors). I stopped digging into it any further to see what the input on the current iteration was and see if you'd prefer this other method I am suggesting!

@kiranchandramohan
Copy link
Contributor

@agozillon I was hoping to rid of promoteNonCPtrUseDevicePtrArgsToUseDeviceAddr. If that is not possible then the previous approach has the minimal change and is better.

@agozillon
Copy link
Contributor Author

@agozillon I was hoping to rid of promoteNonCPtrUseDevicePtrArgsToUseDeviceAddr. If that is not possible then the previous approach has the minimal change and is better.

Ah I am very sorry then I misinterpreted your original meaning then. We could probably inline the code inside of the function where it's called just now and remove the function definition but completely removing what it does is unlikely.

In this case as we end up sharing a single set of syms/locs/types to make the BlockArgs, there's occasions we need to re-order them slightly whenever we move things around to keep the ordering of BlockArgs <-> devicePtr/deviceAddr (perhaps this doesn't matter as much as it does for TargetOp though). I don't think this changes regardless of the order we call processUseDevicePtr/processUseDeviceAddr or where the promotion is done as something we move could be anywhere in the devicePtr vector and it's corresponding element in the symbols etc. vector would need moved, e.g.

!$omp target data use_device_ptr(pa, cptr, array)

In the above case, we only depend on processUseDevicePtr, but we have two elements to move (pa and array) to create a new use_device_addr list, that's simple enough we just push them onto the vector of use_device_addr operands and erase them from use_device_ptr's, however, now the data making the BlockArgs is out of order, so we need to move them to a corresponding offset in their lists (which is a bit more complex as they share a single vector across both use_device_addr and use_device_ptr unlike the Values),

I could be missing something though, and there is every chance I am overcomplicating things so please do point it out if you think that's the case!

I could try finish the other idea I had if you would like? To see if it'd get rid of it/is more reasonable, but it won't be as small a code change as the original I think and it'll make the function interfaces a little more lengthy with optional parameters. It would be to do the "promotion" inside of the shared addUseDeviceClause function whenever the appropriate optional arguments have been specified and to have seperate lists for each of the components making up BlockArgs that will get merged together after we're done to create the BlockArgs and appropriate symbol bindings.

@agozillon
Copy link
Contributor Author

agozillon commented Mar 7, 2024

Just a little ping to see if I can get a quick answer to which direction you'd prefer @kiranchandramohan (revert to previous version or try the other idea I had)? Either works for me. No rush to answer as I know you're incredibly busy, just wanted to make sure the last (verbose) comment I made was interpreted as a question, so I know how to move forward!

@kiranchandramohan
Copy link
Contributor

Sorry about the delay on my side. I think you can go with the previous approach unless you strongly feel about the other idea.

@agozillon
Copy link
Contributor Author

No problem Kiran! Thank you for the quick reply and sorry for the bother. I'll just revert it to the previous state tomorrow then when I am updating the current PR set I have up based on feedback.

@agozillon
Copy link
Contributor Author

last commit just reverts back to the prior better state!

@agozillon
Copy link
Contributor Author

small ping to see if this is in a reasonable enough state to land or if there's some more progress to be made!

Copy link
Contributor

@kiranchandramohan kiranchandramohan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG.

@agozillon
Copy link
Contributor Author

Thank you very much for your help and the review @kiranchandramohan !

…ments to use_device_addr

This effectively implements some now deprecated OpenMP functionality
that some legacy applications unfortunately depend on
(deprecated in specification version 5.2):

"If a list item in a use_device_ptr clause is not of type C_PTR,
 the behavior is as if the list item appeared in a use_device_addr
 clause. Support for such list items in a use_device_ptr clause
 is deprecated."

This PR demotes the hard-error to a deprecated warning and
"promotes" the above cases by simply moving the offending
operands from the use_device_ptr value list to the
back of the use_device_addr list (and moves the related
symbols, locs and types that form the BlockArgs
correspondingly) and then the generation of the
target data construct procedes as normal.
…r than forward

This reverts commit 4f89314388a11289f7a4b56a9e2686bdbb62edf6.
@agozillon
Copy link
Contributor Author

I'll land this after the CI tests pass (at least the linux ones) after the recent rebase

@agozillon agozillon merged commit 096ee4e into llvm:main Mar 13, 2024
3 of 4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flang:fir-hlfir flang:openmp flang:semantics flang Flang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants