Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Flang][Lower][OpenMP] Add initial lowering of pointers/allocatables/target in map clauses to map_info and entries #68600

Closed
wants to merge 1 commit into from

Conversation

agozillon
Copy link
Contributor

@agozillon agozillon commented Oct 9, 2023

This patch seeks to add an initial lowering for pointers, allocatables and target variables explicitly captured by map in Flang OpenMP.

Currently the intention is to treat these types as a special case of OpenMP structure mapping as far as the runtime is concerned, where the box (descriptor information) is the holding container and the underlying pointer is contained within the container. The descriptor pointed to by the generated bounds provides all the data required to offload the pointer contained within the descriptor.

This comes from the concept that the eventual lowered LLVM IR Type for these types is effectively a structure containing descriptor information and a pointer to the data. And the target device kernel essentially treats these as such.

A future optimization might be to strip the descriptor information to turn these into naked pointers, but I feel the descriptor information is rather important and perhaps the OpenMP specification has something to say about maintaining Fortran descriptors!

This patch generates two map_info for each Fortran pointer, target or allocatable, implicitly mimicking the following relationship in C++:

template<typename T>
struct descriptor {
 ... other descriptor data
 T* data;
};

#pragma omp target map (tofrom: descriptor, descriptor->data)

Or in Fortran:

 type :: descriptor
    ... other descriptor data
   integer :: data(:) ! not completely accurate but showcases the idea
 end type scalar

!$omp target map(from:descriptor, descriptor%data)

Where the first map generated is the descriptor and the second argument is the pointer to the data. This results in the following operations generated per mapped allocatable/target/pointer:

  1. omp.bounds pointing to the descriptor information relevant to the
    bounds of the described data
  2. one map which contains no bounds information and it's varptr (main
    map argument) points to the descriptor (box), this is the first map
    in the above examples indicating the descriptor container.
  3. a second map which contains the bounds operation from 1, a varptr
    pointing to the data/address of the to be mapped
    pointer/target/allocatable variable contained within the descriptor
    (we generate a box_addr) and then a subsequent varptptr (which
    points to an owning object) which points to the descriptor, that
    was mapped in the first map, indicating a link between these two
    maps for later lowering to take advantage of.

NOTE: Currently there's a block on pointers/allocas/target contained in derived types
as it appears the bounds information isn't fully generated for these just yet.

@llvmbot
Copy link
Collaborator

llvmbot commented Oct 9, 2023

@llvm/pr-subscribers-flang-fir-hlfir
@llvm/pr-subscribers-mlir-openmp

@llvm/pr-subscribers-flang-openmp

Changes

This patch seeks to add an initial lowering for pointers, allocatables and target variables explicitly captured by map in Flang OpenMP.

Currently the intention is to treat these types as a special case of OpenMP structure mapping as far as the runtime is concerned, where the box (descriptor information) is the holding container and the underlying pointer is contained within the container. The descriptor pointed to by the generated bounds provides all the data required to offload the pointer contained within the descriptor.

This comes from the concept that the eventual lowered LLVM IR Type for these types is effectively a structure containing descriptor information and a pointer to the data. And the target device kernel essentially treats these as such.

A future optimization might be to strip the descriptor information to turn these into naked pointers, but I feel the descriptor information is rather important and perhaps the OpenMP specification has something to say about maintaining Fortran descriptors!

This patch generates two map_info for each Fortran pointer, target or allocatable, implicitly mimicking the following relationship in C++:

template<typename T>
struct descriptor {
... other descriptor data
T* data;
};

#pragma omp target map (tofrom: descriptor, descriptor->data)

Or in Fortran:
type :: descriptor
... other descriptor data
integer :: data(:) ! not completely accurate but showcases the idea
end type scalar

!$omp target map(from:scalar, scalar%data)

Where the first map generated is the descriptor and the second argument is the pointer to the data. This results in the following operations generated per mapped allocatable/target/pointer:

  1. omp.bounds pointing to the descriptor information relevant to the
    bounds of the described data
  2. one map which contains no bounds information and it's varptr (main
    map argument) points to the descriptor (box), this is the first map
    in the above examples indicating the descriptor container.
  3. a second map which contains the bounds operation from 1, a varptr
    pointing to the data/address of the to be mapped
    pointer/target/allocatable variable contained within the descriptor
    (we generate a box_addr) and then a subsequent varptptr (which
    points to an owning object) which points to the descriptor, that
    was mapped in the first map, indicating a link between these two
    maps for later lowering to take advantage of.

NOTE: Currently there's a block on pointers/allocas/target contained in derived types
as it appears the bounds information isn't fully generated for these just yet.


Full diff: https://github.com/llvm/llvm-project/pull/68600.diff

3 Files Affected:

  • (modified) flang/lib/Lower/OpenMP.cpp (+80-7)
  • (modified) flang/test/Lower/OpenMP/FIR/target.f90 (+3-2)
  • (added) flang/test/Lower/OpenMP/allocatable-pointer-target-map.f90 (+74)
diff --git a/flang/lib/Lower/OpenMP.cpp b/flang/lib/Lower/OpenMP.cpp
index 5f5e968eaaa6414..8002423e0a1961c 100644
--- a/flang/lib/Lower/OpenMP.cpp
+++ b/flang/lib/Lower/OpenMP.cpp
@@ -48,6 +48,9 @@ getOmpObjectSymbol(const Fortran::parser::OmpObject &ompObject) {
                     Fortran::parser::Unwrap<Fortran::parser::ArrayElement>(
                         designator)) {
               sym = GetFirstName(arrayEle->base).symbol;
+            } else if (auto *structComp = Fortran::parser::Unwrap<
+                           Fortran::parser::StructureComponent>(designator)) {
+              sym = structComp->component.symbol;
             } else if (const Fortran::parser::Name *name =
                            Fortran::semantics::getDesignatorNameIfDataRef(
                                designator)) {
@@ -1663,11 +1666,10 @@ bool ClauseProcessor::processLink(
 
 static mlir::omp::MapInfoOp
 createMapInfoOp(fir::FirOpBuilder &builder, mlir::Location loc,
-                mlir::Value baseAddr, std::stringstream &name,
-                mlir::SmallVector<mlir::Value> bounds, uint64_t mapType,
-                mlir::omp::VariableCaptureKind mapCaptureType, bool implicit,
-                mlir::Type retTy) {
-  mlir::Value varPtrPtr;
+                mlir::Value baseAddr, mlir::Value varPtrPtr,
+                const std::string &name, mlir::SmallVector<mlir::Value> bounds,
+                uint64_t mapType, mlir::omp::VariableCaptureKind mapCaptureType,
+                bool implicit, mlir::Type retTy) {
   if (auto boxTy = baseAddr.getType().dyn_cast<fir::BaseBoxType>()) {
     baseAddr = builder.create<fir::BoxAddrOp>(loc, baseAddr);
     retTy = baseAddr.getType();
@@ -1675,7 +1677,7 @@ createMapInfoOp(fir::FirOpBuilder &builder, mlir::Location loc,
 
   mlir::omp::MapInfoOp op =
       builder.create<mlir::omp::MapInfoOp>(loc, retTy, baseAddr);
-  op.setNameAttr(builder.getStringAttr(name.str()));
+  op.setNameAttr(builder.getStringAttr(name));
   op.setImplicit(implicit);
   op.setMapType(mapType);
   op.setMapCaptureType(mapCaptureType);
@@ -1752,11 +1754,82 @@ bool ClauseProcessor::processMap(
                                        semanticsContext, stmtCtx, ompObject,
                                        clauseLocation, asFortran, bounds);
 
+          auto checkIfStructComponent = [](const Fortran::parser::OmpObject
+                                               &ompObject) {
+            bool isComponent = false;
+            std::visit(
+                Fortran::common::visitors{
+                    [&](const Fortran::parser::Designator &designator) {
+                      if (auto *structComp = Fortran::parser::Unwrap<
+                              Fortran::parser::StructureComponent>(
+                              designator)) {
+                        if (std::holds_alternative<Fortran::parser::Name>(
+                                structComp->base.u))
+                          isComponent = true;
+                      }
+                    },
+                    [&](const Fortran::parser::Name &name) {}},
+                ompObject.u);
+
+            return isComponent;
+          };
+
+          // TODO: Currently, it appears there's missing symbol information
+          // and bounds information for allocatables and pointers inside
+          // of derived types. The latter needs some additional support
+          // added to the bounds generation whereas the former appears
+          // that it could be a problem when referring to pointer members
+          // via an OpenMP map clause, for the moment we do not handle
+          // these cases and must emit an error.
+          if (checkIfStructComponent(ompObject) &&
+              Fortran::semantics::IsAllocatableOrPointer(
+                  *getOmpObjectSymbol(ompObject)))
+            TODO(currentLocation,
+                 "pointer members of derived types are currently unmapped");
+
+          if (Fortran::semantics::IsAllocatableOrPointer(
+                  *getOmpObjectSymbol(ompObject))) {
+            // We mimic what will eventually be a structure containing a
+            // pointer mapping for allocatables/pointers/target e.g.:
+            //
+            // !$omp target map(from:in, in%map_ptr)
+            //
+            // ===>
+            //
+            // map_entry varptr(in) ....
+            // map_entry varptr(map_ptr) varptrptr(in) ...
+            //
+            // This is to attempt to keep the lowering of these consistent
+            // with structures containing pointers that are mapped like the
+            // example above, where we break it into the descriptor being the
+            // main "structure" being mapped and the contained pointer the
+            // specific member being referenced. This is of course implicit,
+            // the user just maps the pointer, target or allocatable.
+            mlir::Value descriptor =
+                converter.getSymbolAddress(*getOmpObjectSymbol(ompObject));
+            mapOperands.push_back(createMapInfoOp(
+                firOpBuilder, clauseLocation, descriptor, nullptr,
+                asFortran.str(), mlir::SmallVector<mlir::Value>{},
+                static_cast<std::underlying_type_t<
+                    llvm::omp::OpenMPOffloadMappingFlags>>(mapTypeBits),
+                mlir::omp::VariableCaptureKind::ByRef, false,
+                descriptor.getType()));
+            mapOperands.push_back(createMapInfoOp(
+                firOpBuilder, clauseLocation, baseAddr, descriptor,
+                asFortran.str(), bounds,
+                static_cast<std::underlying_type_t<
+                    llvm::omp::OpenMPOffloadMappingFlags>>(mapTypeBits),
+                mlir::omp::VariableCaptureKind::ByRef, false,
+                baseAddr.getType()));
+            continue;
+          }
+
           // Explicit map captures are captured ByRef by default,
           // optimisation passes may alter this to ByCopy or other capture
           // types to optimise
           mapOperands.push_back(createMapInfoOp(
-              firOpBuilder, clauseLocation, baseAddr, asFortran, bounds,
+              firOpBuilder, clauseLocation, baseAddr, nullptr, asFortran.str(),
+              bounds,
               static_cast<
                   std::underlying_type_t<llvm::omp::OpenMPOffloadMappingFlags>>(
                   mapTypeBits),
diff --git a/flang/test/Lower/OpenMP/FIR/target.f90 b/flang/test/Lower/OpenMP/FIR/target.f90
index 9b1fb5c15ac1d2d..6267bb4221a095c 100644
--- a/flang/test/Lower/OpenMP/FIR/target.f90
+++ b/flang/test/Lower/OpenMP/FIR/target.f90
@@ -248,8 +248,9 @@ end subroutine omp_target_device_ptr
  subroutine omp_target_device_addr
    integer, pointer :: a
    !CHECK: %[[VAL_0:.*]] = fir.alloca !fir.box<!fir.ptr<i32>> {bindc_name = "a", uniq_name = "_QFomp_target_device_addrEa"}
-   !CHECK: %[[MAP:.*]] = omp.map_info var_ptr({{.*}})   map_clauses(tofrom) capture(ByRef) -> {{.*}} {name = "a"}
-   !CHECK: omp.target_data map_entries(%[[MAP]] : {{.*}}) use_device_addr(%[[VAL_0]] : !fir.ref<!fir.box<!fir.ptr<i32>>>) {
+   !CHECK: %[[MAP_DESC:.*]] = omp.map_info var_ptr({{.*}})   map_clauses(tofrom) capture(ByRef) -> {{.*}} {name = "a"}
+   !CHECK: %[[MAP_PTR:.*]] = omp.map_info var_ptr({{.*}})   var_ptr_ptr({{.*}}) map_clauses(tofrom) capture(ByRef) -> {{.*}} {name = "a"}
+   !CHECK: omp.target_data map_entries(%[[MAP_DESC]], %[[MAP_PTR]] : {{.*}}) use_device_addr(%[[VAL_0]] : !fir.ref<!fir.box<!fir.ptr<i32>>>) {
    !$omp target data map(tofrom: a) use_device_addr(a)
    !CHECK: ^bb0(%[[VAL_1:.*]]: !fir.ref<!fir.box<!fir.ptr<i32>>>):
    !CHECK: {{.*}} = fir.load %[[VAL_1]] : !fir.ref<!fir.box<!fir.ptr<i32>>>
diff --git a/flang/test/Lower/OpenMP/allocatable-pointer-target-map.f90 b/flang/test/Lower/OpenMP/allocatable-pointer-target-map.f90
new file mode 100644
index 000000000000000..7cbe0e5db4d5f66
--- /dev/null
+++ b/flang/test/Lower/OpenMP/allocatable-pointer-target-map.f90
@@ -0,0 +1,74 @@
+
+!RUN: %flang_fc1 -emit-hlfir -fopenmp %s -o - | FileCheck %s
+
+subroutine map_pointer()
+    integer,  pointer :: map_ptr(:)     
+    allocate(map_ptr(10))
+    !CHECK: %[[ALLOCA:.*]] = fir.alloca !fir.box<!fir.ptr<!fir.array<?xi32>>> {bindc_name = "map_ptr", uniq_name = "_QFmap_pointerEmap_ptr"}
+    !CHECK: %[[DESC:.*]]:2 = hlfir.declare %[[ALLOCA]] {fortran_attrs = #fir.var_attrs<pointer>, uniq_name = "_QFmap_pointerEmap_ptr"} : (!fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>) -> (!fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>, !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>)
+    !CHECK: %[[LOAD_FROM_DESC:.*]] = fir.load %[[DESC]]#1 : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>
+    !CHECK: %[[MAP_BOUNDS:.*]] = omp.bounds   lower_bound({{.*}}) upper_bound({{.*}}) stride({{.*}}) start_idx({{.*}}) {stride_in_bytes = true}
+    !CHECK: %[[MAP_DESC:.*]] = omp.map_info var_ptr(%[[DESC]]#1 : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>)   map_clauses(tofrom) capture(ByRef) -> !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>> {name = "map_ptr"}
+    !CHECK: %[[PTR_ADDR:.*]] = fir.box_addr %[[LOAD_FROM_DESC]] : (!fir.box<!fir.ptr<!fir.array<?xi32>>>) -> !fir.ptr<!fir.array<?xi32>>
+    !CHECK: %[[MAP_PTR:.*]] = omp.map_info var_ptr(%[[PTR_ADDR]] : !fir.ptr<!fir.array<?xi32>>)   var_ptr_ptr(%[[DESC]]#1 : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>) map_clauses(tofrom) capture(ByRef) bounds(%[[MAP_BOUNDS]]) -> !fir.ptr<!fir.array<?xi32>> {name = "map_ptr"}
+    !CHECK: omp.target   map_entries(%[[MAP_DESC]], %[[MAP_PTR]] : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>, !fir.ptr<!fir.array<?xi32>>) {
+    !$omp target map(tofrom: map_ptr) 
+    !$omp end target
+end subroutine map_pointer
+
+subroutine map_alloca()
+    integer,  allocatable :: map_al(:) 
+    allocate(map_al(10)) 
+    !CHECK: %[[ALLOCA:.*]] = fir.alloca !fir.box<!fir.heap<!fir.array<?xi32>>> {bindc_name = "map_al", uniq_name = "_QFmap_allocaEmap_al"}
+    !CHECK: %[[DESC:.*]]:2 = hlfir.declare %[[ALLOCA]] {fortran_attrs = #fir.var_attrs<allocatable>, uniq_name = "_QFmap_allocaEmap_al"} : (!fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>) -> (!fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>, !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>)
+    !CHECK: %[[LOAD_FROM_DESC:.*]] = fir.load %[[DESC]]#1 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>
+    !CHECK: %[[MAP_BOUNDS:.*]] = omp.bounds   lower_bound({{.*}}) upper_bound({{.*}}) stride({{.*}}) start_idx({{.*}}) {stride_in_bytes = true}
+    !CHECK: %[[MAP_DESC:.*]] = omp.map_info var_ptr(%[[DESC]]#1 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>)   map_clauses(tofrom) capture(ByRef) -> !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>> {name = "map_al"}
+    !CHECK: %[[PTR_ADDR:.*]] = fir.box_addr %[[LOAD_FROM_DESC]] : (!fir.box<!fir.heap<!fir.array<?xi32>>>) -> !fir.heap<!fir.array<?xi32>>
+    !CHECK: %[[MAP_PTR:.*]] = omp.map_info var_ptr(%[[PTR_ADDR]] : !fir.heap<!fir.array<?xi32>>)   var_ptr_ptr(%[[DESC]]#1 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>) map_clauses(tofrom) capture(ByRef) bounds(%[[MAP_BOUNDS]]) -> !fir.heap<!fir.array<?xi32>> {name = "map_al"}
+    !CHECK: omp.target   map_entries(%[[MAP_DESC]], %[[MAP_PTR]] : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>, !fir.heap<!fir.array<?xi32>>) {
+    !$omp target map(tofrom: map_al) 
+    !$omp end target
+end subroutine map_alloca
+
+subroutine map_pointer_target()
+    integer,  pointer :: a(:)
+    integer, target :: b(10)
+    a => b
+    !CHECK: %[[ALLOCA:.*]] = fir.alloca !fir.box<!fir.ptr<!fir.array<?xi32>>> {bindc_name = "a", uniq_name = "_QFmap_pointer_targetEa"}
+    !CHECK: %[[DESC:.*]]:2 = hlfir.declare %[[ALLOCA]] {fortran_attrs = #fir.var_attrs<pointer>, uniq_name = "_QFmap_pointer_targetEa"} : (!fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>) -> (!fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>, !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>)
+    !CHECK: %[[LOAD_FROM_DESC:.*]] = fir.load %[[DESC]]#1 : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>
+    !CHECK: %[[MAP_BOUNDS:.*]] = omp.bounds   lower_bound({{.*}}) upper_bound({{.*}}) stride({{.*}}) start_idx({{.*}}) {stride_in_bytes = true}
+    !CHECK: %[[MAP_DESC:.*]] = omp.map_info var_ptr(%[[DESC]]#1 : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>)   map_clauses(tofrom) capture(ByRef) -> !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>> {name = "a"}
+    !CHECK: %[[PTR_ADDR:.*]] = fir.box_addr %[[LOAD_FROM_DESC]] : (!fir.box<!fir.ptr<!fir.array<?xi32>>>) -> !fir.ptr<!fir.array<?xi32>>
+    !CHECK: %[[MAP_PTR:.*]] = omp.map_info var_ptr(%[[PTR_ADDR]] : !fir.ptr<!fir.array<?xi32>>)   var_ptr_ptr(%[[DESC]]#1 : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>) map_clauses(tofrom) capture(ByRef) bounds(%[[MAP_BOUNDS]]) -> !fir.ptr<!fir.array<?xi32>> {name = "a"}
+    !CHECK: omp.target   map_entries(%[[MAP_DESC]], %[[MAP_PTR]] : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>, !fir.ptr<!fir.array<?xi32>>) {
+    !$omp target map(tofrom: a) 
+    !$omp end target
+end subroutine map_pointer_target
+
+subroutine map_pointer_target_section()
+    integer,target  :: A(30)
+    integer,pointer :: p(:)
+    !CHECK: %[[ALLOCA:.*]] = fir.alloca !fir.array<30xi32> {bindc_name = "a", fir.target, uniq_name = "_QFmap_pointer_target_sectionEa"}
+    !CHECK: %[[SHAPE:.*]] = fir.shape %c30 : (index) -> !fir.shape<1>
+    !CHECK: %[[DESC_1:.*]]:2 = hlfir.declare %[[ALLOCA]](%[[SHAPE]]) {fortran_attrs = #fir.var_attrs<target>, uniq_name = "_QFmap_pointer_target_sectionEa"} : (!fir.ref<!fir.array<30xi32>>, !fir.shape<1>) -> (!fir.ref<!fir.array<30xi32>>, !fir.ref<!fir.array<30xi32>>)
+    !CHECK: %[[ALLOCA_2:.*]] = fir.alloca !fir.box<!fir.ptr<!fir.array<?xi32>>> {bindc_name = "p", uniq_name = "_QFmap_pointer_target_sectionEp"}
+    !CHECK: %[[DESC_2:.*]]:2 = hlfir.declare %[[ALLOCA_2]] {fortran_attrs = #fir.var_attrs<pointer>, uniq_name = "_QFmap_pointer_target_sectionEp"} : (!fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>) -> (!fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>, !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>)
+    !CHECK: %[[MAP_1_BOUNDS:.*]] = omp.bounds   lower_bound({{.*}}) upper_bound({{.*}}) stride({{.*}}) start_idx({{.*}})
+    !CHECK: %[[MAP_1:.*]] = omp.map_info var_ptr(%[[DESC_1]]#1 : !fir.ref<!fir.array<30xi32>>)   map_clauses(tofrom) capture(ByRef) bounds(%[[MAP_1_BOUNDS]]) -> !fir.ref<!fir.array<30xi32>> {name = "a(1:4)"}
+    !CHECK: omp.target_data   map_entries(%[[MAP_1]] : !fir.ref<!fir.array<30xi32>>) {
+    !$omp target data map( A(1:4) )
+        p=>A
+        !CHECK: %[[LOAD:.*]] = fir.load %[[DESC_2]]#1 : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>
+        !CHECK: %[[MAP_3_BOUNDS:.*]] = omp.bounds   lower_bound({{.*}}) upper_bound({{.*}}) stride({{.*}}) start_idx({{.*}}) {stride_in_bytes = true}
+        !CHECK: %[[MAP_2:.*]] = omp.map_info var_ptr(%[[DESC_2]]#1 : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>)   map_clauses(tofrom) capture(ByRef) -> !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>> {name = "p(8:27)"}
+        !CHECK: %[[MAP_ADDR_OF:.*]] = fir.box_addr %[[LOAD]] : (!fir.box<!fir.ptr<!fir.array<?xi32>>>) -> !fir.ptr<!fir.array<?xi32>>
+        !CHECK: %[[MAP_3:.*]] = omp.map_info var_ptr(%[[MAP_ADDR_OF]] : !fir.ptr<!fir.array<?xi32>>)   var_ptr_ptr(%[[DESC_2]]#1 : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>) map_clauses(tofrom) capture(ByRef) bounds(%[[MAP_3_BOUNDS]]) -> !fir.ptr<!fir.array<?xi32>> {name = "p(8:27)"}
+        !CHECK: omp.target   map_entries(%[[MAP_2]], %[[MAP_3]] : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>, !fir.ptr<!fir.array<?xi32>>) {
+        !$omp target map( p(8:27) )
+        A(3) = 0
+        p(9) = 0
+        !$omp end target
+    !$omp end target data
+end subroutine map_pointer_target_section

@github-actions
Copy link

github-actions bot commented Oct 9, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.


if (Fortran::semantics::IsAllocatableOrPointer(
*getOmpObjectSymbol(ompObject))) {
// We mimic what will eventually be a structure containing a
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An inline comment that might add additional clarity alongside the commit description.

…target in map clauses to map_info and entries

This patch seeks to add an initial lowering for pointers, allocatables
and target variables explicitly captured by map in Flang OpenMP.

Currently the intention is to treat these types as a special case of
OpenMP structure mapping as far as the runtime is concerned,
where the box (descriptor information) is the holding container
and the underlying pointer is contained within the container.
The descriptor pointed to by the generated bounds provides
all the data required to offload the pointer contained within
the descriptor.

This comes from the concept that the eventual lowered
LLVM IR Type for these types is effectively a structure
containing descriptor information and a pointer to
the data. And the target device kernel essentially
treats these as such.

A future optimisation might be to strip the descriptor
information to turn these into naked pointers, but I
feel the descriptor information is rather important
and perhaps the OpenMP specification has something
to say about maintaining Fortran descriptors!

This patch generates two map_info for each Fortran
pointer, target or allocatable, implicitly mimicing the
following relationship in C++:

template<typename T>
struct descriptor {
 ... other descriptor data
 T* data;
};

Or in Fortran:
 type :: descriptor
    ... other descriptor data
   integer :: data(:) ! not completely accurate but showcases the idea
 end type scalar

!$omp target map(from:scalar, scalar%data)

Where the first map generated is the descriptor and the second argument
is the pointer to the data. This results in the following operations generated
per mapped allocatable/target/pointer:

1) omp.bounds pointing to the descriptor information relevant to the
   bounds of the described data
2) one map which contains no bounds information and it's varptr (main
   map argument) points to the descriptor (box), this is the first map
   in the above examples indicating the descriptor container.
3) a second map which contains the bounds operation from 1, a varptr
   pointing to the data/address of the to be mapped
   pointer/target/allocatable variable contained within the descriptor
   (we generate a box_addr) and then a subsequent varptptr (which
   points to an owning object) which points to the descriptor, that
   was mapped in the first map, indicating a link between these two
   maps for later lowering to take adavantage of.
Copy link
Contributor

@razvanlupusoru razvanlupusoru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than the comments I have on the various piece of code, I also am not sure about adding this code directly to lowering. Descriptors are an implementation detail. From a user's perspective, the clauses they use specifically refer to the data.

Thus the mapping is also an implementation detail - and one that the compiler can exploit. The descriptor can be mapped with a "to" clause or even firstprivatized. So I think the better place to put this code is a pass after lowering. Straightforward to do:

  1. Walk through each construct that consumes the map info operations.
  2. Collect the set of box references and box's data references in data clauses.
  3. Initial candidate set is all of the boxes whose base_addr is used in map info
  4. Subtract the set of boxes already in data clauses
  5. Generate the appropriate map info entry for the box and insert it right before the mapping operation for its data.

// We mimic what will eventually be a structure containing a
// pointer mapping for allocatables/pointers/target e.g.:
//
// !$omp target map(from:in, in%map_ptr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the "from" on the descriptor doesn't make a lot of sense to me. Two issues I see:

  • Accessing descriptor on device would mean reading uninitialized data.
  • In the case of allocatables, it suggests that the reallocation on device is somehow supported which I imagine is not the case (more specifically the descriptor is created and updated on device - and copied to host).

So the point is that the map entry for descriptor likely does not match the map entry for the data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "from" is just an arbitrary example, likely a bad one, however, currently it takes whatever mapping the original mapping has, perhaps it should be something different though! As I agree, from indicates we're allowing allocation
on device, although, I believe that might be something for the semantic stage to call out, as I imagine it's perhaps legal to assign to a "from" target on device, from a host side allocated pointer passed in as a "to", although my spec knowledge is quite lacking on this (and many many other areas, so do please point out when I'm wrong).

However, for the two bullet points:

  • You can access a descriptor on the device if it was passed to it via fromto/to, it is currently initialized in the current implementation I have, at least from my understanding. The examples, except the last one (which needs a chunk of work to function at runtime for unrelated reasons, it's an aspirational example for now) allows me to loop over the allocated data and assign values to it then return the updated data. The descriptor and data is currently lowered to the following LLVM-IR structure by default { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, where the ptr is the data and the rest of the descriptor information is contained in the structure and the kernel generated by the target region currently expects this and treats it as such (using a default LLVM-IR lowering), rather than as raw pointers. I currently map the structure in it's entirety as you would in the OpenMP examples I've provided with the explicit member accesses. However, perhaps this is the wrong way to go about it (and the method I mention in the note where we try to treat it as a naked pointer and discard the descriptor is the right way to go in the first place, although I'm unsure how to get to that stage currently), although it does function reasonably just now, albeit without any extreme tests being done, just simple first step tests.
  • I'm fairly certain you're correct that it won't work, I believe the descriptor would update, but the data would not be mapped back (if the allocation was even possible, which currently it's not). However, pointer/target re-assignment on device should likely be supported from my understanding, and I imagine mapping the descriptor across alongside the pointer + data and updating it may be the easiest method to do so, although I admit it's not something I've tested or dug into deeply yet!

This is just me giving some information from my current experience trying to get this mapping working for OpenMP so far, so perhaps my understanding is flawed/incorrect, so do take it all with a grain of salt, I'll happily defer to you all with more experience!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me clarify my last point since I realized it may not be clear. If we have an integer, allocatable, dimension(:) :: array and a map clause map(from:array), the descriptor itself does not get a from mapping.

In this case, most likely you want a to mapping since the descriptor fields provide information about the data. Bonus points if this is done with some sort of liveness analysis instead of just blind mapping.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, some of my other example points might have been made without reading this! Too used to phabricator grouping up comments before submitting them all at once, need to manually do that now.

I think I understand what you mean, and in the case of allocatable I believe that's correct, perhaps in all cases it is, would it be correct in the example I provided in the comment I provided below (pointer remap on device)? or would this be a case where a tofrom mapping for the descriptor would make sense (the example is a slightly simplified test case we have in an old bug report, so perhaps it's invalid OpenMP)? That's the only case where I think having some kind of from mapping for the descriptor would make sense, and perhaps I'm completely wrong.

// ===>
//
// map_entry varptr(in) ....
// map_entry varptr(map_ptr) varptrptr(in) ...
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like seeing the use of varPtrPtr since this is one aspect of OpenACC lowering which is not yet complete. :) However, I think there is a mismatch here - at least from how I thought about it when I added this operand in acc dialect. Consider how clang maps a pointer as array from a struct:
https://github.com/llvm/llvm-project/blob/main/clang/lib/CodeGen/CGOpenMPRuntime.cpp#L6989

Basically the varPtrPtr is literally meant to mean &varPtr. This means that this is the address of the "base_addr" field in the descriptor: aka (&(desc.base_addr)).
This is intentional - the whole descriptor does not need mapped if all we access is the data. Clang does it the same - the whole parent struct does not need mapped if only mapping a member.

Part of reason this wasn't solidified in lowering of OpenACC is because it is not yet clear how to really represent this abstraction. Descriptors don't have to be materialized into a struct type. And in that sense, in FIR, taking the address of a field in box type is not well defined.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jeanPerier @clementval I added you as reviewers to comment on this point. Is introducing a fir.box_addr_addr the right abstraction for solving this issue? How to ensure that such an operation is not abused or introduces any problems with alias analysis?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, then I misunderstood it then! Thank you for clarifying. Perhaps it'd be better creating a varParentPtr for OpenMP, if we continued down this route then, to seperate the two.

Yes, I thought we could probably ditch the descriptor at some point, but I thought of doing it as more of an optimisation case rather than the default, at least for the moment to get it working initially as it seemed simpler to get working initially whilst maintaining correctness at the trade off of it being slower for the time being. Simpler as the target region when lowered already takes into account the fact that the descriptor and pointer are lowered as a structure, so we simply treat it as such when mapping it. However, perhaps this isn't the best route to go down in the short-term or long-term? I've not yet looked at the complexity of implementing the other solution or the possible downsides, if any.

One case I can perhaps see that descriptors may be of use on the device is the following example:

      integer, target :: x(10)
      integer, pointer :: x_d(:)

      do i=1, 10
         x(i)=1
      enddo

      !$omp target map(from:x_d) map(to: x)
       x_d => x;
      !$omp end target

Where the descriptor for the pointer may need updated so that the host can appropriately utilise it, but this is just a thought and probably not a very well thought out one at that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I admit I don't see why create a varParentPtr instead of varPtrPtr. You can deduce the actual parent from the varPtrPtr since an access of a member (when it gets materialized). Also, if you don't have the access to the field directly how do you know how to do the attach action? Interestingly in this case, &descriptor == &(descriptor.base_addr), so that is probably why it works just fine for LLVM - but again I feel like varParentPtr is not the abstraction you really want.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're likely correct, what would you suggest is the correct abstraction in this case?

I'm unsure what you mean by attach in this case unfortunately, so I apologies. But I likely don't do it correctly, so I would very much love to hear the correct direction you think it should go so I can try to adjust it to that!

I can open a patch on the lowering that I have for it in the next few days hopefully if that'd help clarify anything/aid discussion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about using "attach" terminology - after looking through OpenMP spec, it does not seem to be the common use of the semantics I meant. https://www.openmp.org/spec-html/5.2/openmpsu6.html#x13-120001.2.6 look for "attached pointer".
Anyway, attach in my use of it, means, that if you have something like:
struct S {
double * p;
} s;
map(tofrom:s,s.p)
Then in device memory, s is slightly different. Its field p points to the device memory. The concept of changing device s's p field is called attach in OpenACC terms.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think fir.box_addr_addr would likely be OK if it helps here. In terms of side-effects/alias analysis it would be like a fir.coordinate or a GEP.

I would however not introduce similar operations for other descriptor fields like extents/bounds. It is safe to require fir.box codegen to include an address field somewhere, but it is not safe to assume that it will hold an extent field for instance (some descriptor format store the upper bound and recompute the extent when needed or vice-versa).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about using "attach" terminology - after looking through OpenMP spec, it does not seem to be the common use of the semantics I meant. https://www.openmp.org/spec-html/5.2/openmpsu6.html#x13-120001.2.6 look for "attached pointer". Anyway, attach in my use of it, means, that if you have something like: struct S { double * p; } s; map(tofrom:s,s.p) Then in device memory, s is slightly different. Its field p points to the device memory. The concept of changing device s's p field is called attach in OpenACC terms.

No need to apologies, even if you used the OpenMP terminology I wouldn't have understood in this case, so thank you for looking it up for me and teaching me in this case.

To perhaps give a bit more information to hopefully aid discussion, I believe it works as I do some cheating to recreate the GEP for the final segment in this example: https://github.com/llvm/llvm-project/blob/main/clang/lib/CodeGen/CGOpenMPRuntime.cpp#L6989 that likely only works in this specific case and I'd love to be able to fix it as it's on my TODO list! On the off chance that you're interested in looking at the lowering to LLVM-IR that I have in place currently, here's the little blob of code that I've currently managed to put together to generate the relevant runtime components to do the initial offload of the allocatables/pointers as a special case of a structure (sorry it's downstream and a tad messy, you can ignore effectively everything outside of this function): https://github.com/ROCm-Developer-Tools/llvm-project/pull/225/files#diff-2cbb5651f4570d81d55ac4198deda0f6f7341b2503479752ef2295da3774c586R1983

The varPtrPtr being the parent in this perhaps terrible use-case is primarily to indicate that something is part of another object and to then sort it into groupings of parent + children so they can be mapped as such. Using the varPtrPtr for this is likely not the best way to do things and is as you've stated a misuse of the field (hence the suggestion of a parent field, but that in itself is perhaps misguided thinking).

This is initial/WIP work as someone new to a lot of OpenMP/OpenACC and Fortran concepts, so I am more than happy to adjust/change things based on suggestions and look into other directions if there's better avenues! e.g. perhaps as I think you've said mapping the descriptor is not the way to go long term (although, for initial functionality it could be nice to treat them as such, with descriptor removal being a later step and more optimal approach).

Thank you for your input on this patch as well as @jeanPerier it's very appreciated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think fir.box_addr_addr would likely be OK if it helps here. In terms of side-effects/alias analysis it would be like a fir.coordinate or a GEP.

I would however not introduce similar operations for other descriptor fields like extents/bounds. It is safe to require fir.box codegen to include an address field somewhere, but it is not safe to assume that it will hold an extent field for instance (some descriptor format store the upper bound and recompute the extent when needed or vice-versa).

Adding a whole new operation might be a little overkill if it's only useful for this specific situation, I imagine, so unless it's absolutely necessary I'd not want to impose something like that on the FIR Dialect, but if it has other use cases/usages then perhaps it'd be good to add!

// with structures containing pointers that are mapped like the
// example above, where we break it into the descriptor being the
// main "structure" being mapped and the contained pointer the
// specific member being referenced. This is of course implicit,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You note that this is implicit - this should be reflected in the mapping operation too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what sense do you mean? Currently we're aiming to add implicit captures to the map list as well, not just explicit maps, as a sort of canonicalization process.

Do you mean we should not specially handle this case at the PFT -> MLIR lowering level and handle it later on in the MLIR -> LLVM-IR lowering process (or somewhere else)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This point is simply to distinguish between user mapping and compiler implied mapping. Currently in the generated IR, there's no way to distinguish.

Regarding the question about where to handle this - I suggest somewhere after lowering and before LLVM-IR generation (aka at the MLIR level). Lowering IMO should reflect the original user's code (and the user intended to map the data). Also it should not force a decision on how to get descriptor on device - like I noted, we may want to firstprivatize it in some cases.

Copy link
Contributor Author

@agozillon agozillon Oct 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for the clarification, I believe with the current work on IsolatedFromAbove the rest of the implicit maps will be raised into the map entries during the initial generation of the IR, but I think that's currently still under investigation and discussion.

However, for this particular case, perhaps as you suggested in the main comment on the patch, an MLIR pass is perhaps more apt if the lowered IR must maintain as much similarity to the originally written code as possible. However, the current series of OpenMP passes occurs before the first verification for the lowered IR occurs so I imagine that the emitted IR would still not reflect the users original code if we placed it there (perhaps there's a way to dump the IR earlier though).

I'll await further discussion before I move down this direction and update the patch for now!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not quite followed the IsolatedFromAbove change, but I understand what the issue would be in this case - namely it is not possible to do a live-in analysis since there can be no outside definitions. In this case, I suggest you update the map_info to ensure you can capture that the descriptor mapping is implicit and still generate it during lowering. Regarding the mapping action to apply, I suggest you add an empty placeholder that can be filled in by a follow-up pass which can decide whether it applies "to", "tofrom", or "firstprivate". Conservatively, you can make it "tofrom" during lowering - even though I don't necessarily prefer this option.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the input! I think this is roughly what's done currently without the follow up pass, however, I still need too see exactly how the IsolatedFromAbove changes exactly mesh with my current patches (other patch using your awesome bounds work is the array sectioning patch I have up, so do feel free to review that as well if you wish, I didn't add you as I wasn't sure you'd want to/have the time), so I imagine some revisions of them will be required after the patch lands.

As for the mapping action I'm happy making it "to" initially if this is what you'd prefer and it's possible (especially as I believe the previous example on pointer re-association is quite possibly illegal in OpenMP 5.0-5.2 after a re-reading of the data sharing portion of the specification). I think the firstprivate case may be a little further down the line as I'm not entirely sure how well it's supported in current Fortran+OpenMP

@agozillon
Copy link
Contributor Author

Other than the comments I have on the various piece of code, I also am not sure about adding this code directly to lowering. Descriptors are an implementation detail. From a user's perspective, the clauses they use specifically refer to the data.

Thus the mapping is also an implementation detail - and one that the compiler can exploit. The descriptor can be mapped with a "to" clause or even firstprivatized. So I think the better place to put this code is a pass after lowering.

Perhaps it is better to do this after the lowering in a pass, I'd be interested in other reviewers opinions on this as well! I don't mind where it's done, in this case the PFT lowering made the most sense to me initially, just maintaining as much of the map generation in the same location as possible and it being rather simple to do (and perhaps easier with the future IsolatedFromAbove changes).

Although, I do think there is some merit to descriptors being assigned the same mapping as it's associated pointer/allocatable, primarily with the example specified in another comment as the main motivator.

Straightforward to do:

  1. Walk through each construct that consumes the map info operations.
  2. Collect the set of box references and box's data references in data clauses.
  3. Initial candidate set is all of the boxes whose base_addr is used in map info
  4. Subtract the set of boxes already in data clauses
  5. Generate the appropriate map info entry for the box and insert it right before the mapping operation for its data.

Thank you very much for the pseudo code algorithm, I can take a look into this direction if this is the route we all wish to go down? When using data clauses in the above set of bullet points, do you mean map_entries in current OpenMP dialect parlance (list of map_info kept by various operations that consume them, e.g. TargetOp) or something else, sorry for the stupid question!

Thank you for the review comments and I welcome more! My apologies if the reply comments are a little out of synch, I've been replying to them as I go, and this is the last one I've replied to (although, I did skim all before replying some details may have been forgotten in the grand scheme and some duplicate statements might be there!).

@agozillon
Copy link
Contributor Author

If we wished to investigate other avenues such as mapping the underlying data without the descriptor as I think was one possible avenue suggested, I could also look into that, however, I may need some guidance in that case as to what the best approach would be.

@agozillon
Copy link
Contributor Author

Closing this PR as I'll be opening a new one that will supersede this very soon! So please feel free to feed into the conversation and review when it's open, it'd be very appreciated.

@agozillon agozillon closed this Nov 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flang:fir-hlfir flang:ir flang:openmp flang Flang issues not falling into any other category mlir:openmp
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants