Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Flang][OpenMP] Initial mapping of Fortran pointers and allocatables for target devices #71766

Merged
merged 1 commit into from
Feb 5, 2024

Conversation

agozillon
Copy link
Contributor

@agozillon agozillon commented Nov 9, 2023

This patch seeks to add an initial lowering for pointers, allocatable variables and some other cases of box types such as assumed size and shape. This works for both explicitly and implicit captured allocatables, but the implicit capture needs further investigation to make sure the semantics are correct in all cases (e.g. when used in conjunction with enter/exit data directives).

Currently the intention is to treat these types as a special case of OpenMP structure mapping as far as the runtime is concerned, where the box (descriptor information) is the holding container and the underlying data pointer is contained within the container. The descriptor pointed to by the generated bounds provides all the data required to offload the pointer contained within the descriptor. There's also the additional case where we must map other pointers within the descriptor information that may be utilised on the device, currently descriptor addendum information is the only case where this is required but as the descriptor is still under-going changes future additions may be required.

For the moment, the PFT -> FIR/HLFIR+OpenMP dialect lowering in Flang's Lower/OpenMP.cpp has had the processMap lowering code for Target operations extended to support generation of specialised map information for descriptor types (Box's) that (hopefully, as derived type member mapping hasn't been implemented yet, but should not be dissimilar) mimic how derived type member mapping when a parent structure is also being mapped will hopefully be performed, with a main mapping of the structure and extra map_info per member being attached to a new member argument field added to the map_info operation. The main difference that will likely be addressed in the future derived type member mapping PR is that the derived types will likely have symbols to be bound to block arguments, whereas the descriptor members do not, so currently the descriptor members are not added to the map operands, but for derived types that might not be the case. The descriptor members can be added to the map operands, but it would require an additional extension to TargetOp to have a mapping between map operands and block arguments to aid lowering and printing of the operation. So for the moment, I've delayed doing that until we confirm if it's necessary in a future PR for derived type member mapping.

There's an addition to FIR's CodeGen to support a specialised omp.map_info operation lowering for FIR to the LLVM dialect, this is to allow for appropriate conversion of Box types to there LLVM structure equivalent that is required for later OpenMP lowering.

Additions to OpenMPToLLVMIRTranslation.cpp's mapping infrastructure is also added that adds some initial lowering for structure types with member maps, attempts to make it generalizable to all structure + member mappings have been made not just descriptor mapping. But this hasn't been fully explored yet, so there may be some adjustments required (hopefully minor).

There is an outstanding known issue with scalar allocatable assignment where it will crash in the instruction selector for AMD GPU when we compile utilising HLFIR (but not FIR).

@llvmbot
Copy link
Collaborator

llvmbot commented Nov 9, 2023

@llvm/pr-subscribers-openacc
@llvm/pr-subscribers-flang-driver
@llvm/pr-subscribers-mlir
@llvm/pr-subscribers-flang-fir-hlfir
@llvm/pr-subscribers-flang-codegen
@llvm/pr-subscribers-mlir-llvm
@llvm/pr-subscribers-flang-openmp
@llvm/pr-subscribers-mlir-openmp

@llvm/pr-subscribers-openmp

Author: None (agozillon)

Changes

This patch seeks to add an initial lowering for pointers and allocatable variables explicitly captured by implicit and explicit map in Flang OpenMP.

Currently the intention is to treat these types as a special case of OpenMP structure mapping as far as the runtime is concerned, where the box (descriptor information) is the holding container and the underlying pointer is contained within the container. The descriptor pointed to by the generated bounds provides all the data required to offload the pointer contained within the descriptor.

This patch series is composed of several changes:

  • Special handling of PFT -> MLIR lowering for allocatables/pointers, essentially mapping the original symbol which is the descriptor and allocated data, and not the address of the data as given by the base address of the shared bounds generation infrastructure. We also toggle on a new boolean for omp.map_info when it is an allocatable/pointer to indicate the need for special lowering.
  • An addition to omp.map_info of a boolean variable is_fortran_allocatable, that indicates if the value being mapped is a fortran allocatable type (also pointer).
  • An addition of a specialised omp.map_info lowering to HLFIR/FIR's CodeGen when lowering from FIR+OpenMP -> LLVM+OpenMP, that allows specialised lowering of the descriptor type so that it doesn't devolve into an opaque pointer, as we need the type. Unfortunately this specialised lowering of descriptor types is only available to FIR. The default OpenMP lowering for omp.map_info remains, it's just superseded for Flang with the FIR specific lowering.
  • The specialised lowering for these fortran allocatables/pointers from MLIR -> LLVM-IR, which effectively treats the descriptor and data pointer as a parent structure + member of mapping like Clang would for a structure with a member pointer, when both are explicitly mapped.
  • A bunch of related runtime tests, minor changes to existing tests to incorporate new changes and new flang/mlir tests for the various stages of the compiler that were modified.

NOTE: This patch is broken into several commits currently (if it works as expected), each commits description has some further details on the changes incorporated within the commit, giving some of the reasoning behind the current decisions, so please do read the commits (possibly a more bite size way to parse the patch as well). The PR can be broken up into multiple patches, but I thought I'd try this method of having a stacked PR first, as it allows all of the changes to be co-located for initial discussion and giving a good overview. It's worth noting that a majority of the changes are just additions of further tests.


Patch is 82.71 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/71766.diff

20 Files Affected:

  • (modified) flang/lib/Lower/OpenMP.cpp (+21-4)
  • (modified) flang/lib/Optimizer/CodeGen/CodeGen.cpp (+45)
  • (modified) flang/test/Driver/OpenMP/map-types-and-sizes.f90 (+22-1)
  • (modified) flang/test/Lower/OpenMP/FIR/target.f90 (+1-1)
  • (added) flang/test/Lower/OpenMP/allocatable-array-bounds.f90 (+101)
  • (added) flang/test/Lower/OpenMP/allocatable-map.f90 (+16)
  • (modified) flang/test/Lower/OpenMP/target.f90 (+1-1)
  • (modified) mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td (+5-1)
  • (modified) mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (+225-11)
  • (modified) mlir/test/Dialect/OpenMP/ops.mlir (+15)
  • (added) mlir/test/Target/LLVMIR/omptarget-fortran-allocatable-types-host.mlir (+374)
  • (added) openmp/libomptarget/test/offloading/fortran/allocatable-array-section-1d-bounds.f90 (+46)
  • (added) openmp/libomptarget/test/offloading/fortran/allocatable-array-section-3d-bounds.f90 (+44)
  • (added) openmp/libomptarget/test/offloading/fortran/allocatable-map-scopes.f95 (+66)
  • (added) openmp/libomptarget/test/offloading/fortran/pointer-scopes-enter-exit-map.f90 (+83)
  • (added) openmp/libomptarget/test/offloading/fortran/pointer-target-array-section-3d-bounds.f90 (+43)
  • (added) openmp/libomptarget/test/offloading/fortran/pointer-target-map-scopes.f95 (+64)
  • (added) openmp/libomptarget/test/offloading/fortran/target_enter_exit_allocatables.f90 (+44)
  • (added) openmp/libomptarget/test/offloading/fortran/target_enter_exit_array.f90 (+41)
  • (added) openmp/libomptarget/test/offloading/fortran/target_single_value_allocate.f90 (+26)
diff --git a/flang/lib/Lower/OpenMP.cpp b/flang/lib/Lower/OpenMP.cpp
index c2568c629e521c1..96ad89b1e1b65c1 100644
--- a/flang/lib/Lower/OpenMP.cpp
+++ b/flang/lib/Lower/OpenMP.cpp
@@ -54,6 +54,9 @@ getOmpObjectSymbol(const Fortran::parser::OmpObject &ompObject) {
                     Fortran::parser::Unwrap<Fortran::parser::ArrayElement>(
                         designator)) {
               sym = GetFirstName(arrayEle->base).symbol;
+            } else if (auto *structComp = Fortran::parser::Unwrap<
+                           Fortran::parser::StructureComponent>(designator)) {
+              sym = structComp->component.symbol;
             } else if (const Fortran::parser::Name *name =
                            Fortran::semantics::getDesignatorNameIfDataRef(
                                designator)) {
@@ -1708,7 +1711,7 @@ createMapInfoOp(fir::FirOpBuilder &builder, mlir::Location loc,
                 mlir::Value baseAddr, std::stringstream &name,
                 mlir::SmallVector<mlir::Value> bounds, uint64_t mapType,
                 mlir::omp::VariableCaptureKind mapCaptureType, mlir::Type retTy,
-                bool isVal = false) {
+                bool isAllocatable = false, bool isVal = false) {
   mlir::Value val, varPtr, varPtrPtr;
   mlir::TypeAttr varType;
 
@@ -1730,6 +1733,10 @@ createMapInfoOp(fir::FirOpBuilder &builder, mlir::Location loc,
       builder.getIntegerAttr(builder.getIntegerType(64, false), mapType),
       builder.getAttr<mlir::omp::VariableCaptureKindAttr>(mapCaptureType),
       builder.getStringAttr(name.str()));
+
+  if (isAllocatable)
+    op.setIsFortranAllocatableAttr(builder.getBoolAttr(isAllocatable));
+
   return op;
 }
 
@@ -1798,6 +1805,13 @@ bool ClauseProcessor::processMap(
               converter, firOpBuilder, semanticsContext, stmtCtx, ompObject,
               clauseLocation, asFortran, bounds, treatIndexAsSection);
 
+          // Address of the descriptor, not the de-referenced data, currently
+          // required for later lowering to LLVM-IR
+          if (Fortran::semantics::IsAllocatableOrPointer(
+                  *getOmpObjectSymbol(ompObject)))
+            baseAddr =
+                converter.getSymbolAddress(*getOmpObjectSymbol(ompObject));
+
           // Explicit map captures are captured ByRef by default,
           // optimisation passes may alter this to ByCopy or other capture
           // types to optimise
@@ -1806,7 +1820,9 @@ bool ClauseProcessor::processMap(
               static_cast<
                   std::underlying_type_t<llvm::omp::OpenMPOffloadMappingFlags>>(
                   mapTypeBits),
-              mlir::omp::VariableCaptureKind::ByRef, baseAddr.getType());
+              mlir::omp::VariableCaptureKind::ByRef, baseAddr.getType(),
+              Fortran::semantics::IsAllocatableOrPointer(
+                  *getOmpObjectSymbol(ompObject)));
 
           mapOperands.push_back(mapOp);
           if (mapSymTypes)
@@ -2661,7 +2677,8 @@ genTargetOp(Fortran::lower::AbstractConverter &converter,
             static_cast<
                 std::underlying_type_t<llvm::omp::OpenMPOffloadMappingFlags>>(
                 mapFlag),
-            captureKind, baseOp.getType());
+            captureKind, baseOp.getType(),
+            Fortran::semantics::IsAllocatableOrPointer(sym));
 
         mapOperands.push_back(mapOp);
         mapSymTypes.push_back(baseOp.getType());
@@ -2686,7 +2703,7 @@ genTargetOp(Fortran::lower::AbstractConverter &converter,
           static_cast<
               std::underlying_type_t<llvm::omp::OpenMPOffloadMappingFlags>>(
               llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_IMPLICIT),
-          mlir::omp::VariableCaptureKind::ByCopy, val.getType(), true);
+          mlir::omp::VariableCaptureKind::ByCopy, val.getType(), false, true);
       mapOperands.push_back(mapOp);
     }
   };
diff --git a/flang/lib/Optimizer/CodeGen/CodeGen.cpp b/flang/lib/Optimizer/CodeGen/CodeGen.cpp
index 9eabacdc818f6f4..d0ac9ea262be81e 100644
--- a/flang/lib/Optimizer/CodeGen/CodeGen.cpp
+++ b/flang/lib/Optimizer/CodeGen/CodeGen.cpp
@@ -417,6 +417,46 @@ class FIROpAndTypeConversion : public FIROpConversion<FromOp> {
 };
 } // namespace
 
+namespace {
+// FIR Op specific conversion for MapInfoOp that overwrites the default OpenMP
+// Dialect lowering, this allows FIR specific lowering of types, required for
+// descriptors of allocatables currently.
+struct MapInfoOpConversion : public FIROpConversion<mlir::omp::MapInfoOp> {
+  using FIROpConversion::FIROpConversion;
+
+  mlir::LogicalResult
+  matchAndRewrite(mlir::omp::MapInfoOp curOp, OpAdaptor adaptor,
+                  mlir::ConversionPatternRewriter &rewriter) const override {
+    const mlir::TypeConverter *converter = getTypeConverter();
+
+    llvm::SmallVector<mlir::Type> resTypes;
+    if (failed(converter->convertTypes(curOp->getResultTypes(), resTypes)))
+      return mlir::failure();
+
+    // Copy attributes of the curOp except for the typeAttr which should
+    // be converted
+    llvm::SmallVector<mlir::NamedAttribute> newAttrs;
+    for (mlir::NamedAttribute attr : curOp->getAttrs()) {
+      if (auto typeAttr = mlir::dyn_cast<mlir::TypeAttr>(attr.getValue())) {
+        mlir::Type newAttr;
+        if (curOp.getIsFortranAllocatable()) {
+          newAttr = this->getBoxTypePair(typeAttr.getValue()).llvm;
+        } else {
+          newAttr = converter->convertType(typeAttr.getValue());
+        }
+        newAttrs.emplace_back(attr.getName(), mlir::TypeAttr::get(newAttr));
+      } else {
+        newAttrs.push_back(attr);
+      }
+    }
+
+    rewriter.replaceOpWithNewOp<mlir::omp::MapInfoOp>(
+        curOp, resTypes, adaptor.getOperands(), newAttrs);
+    return mlir::success();
+  }
+};
+} // namespace
+
 namespace {
 /// Lower `fir.address_of` operation to `llvm.address_of` operation.
 struct AddrOfOpConversion : public FIROpConversion<fir::AddrOfOp> {
@@ -3828,6 +3868,11 @@ class FIRToLLVMLowering
                                                                   options);
     mlir::populateFuncToLLVMConversionPatterns(typeConverter, pattern);
     mlir::populateOpenMPToLLVMConversionPatterns(typeConverter, pattern);
+
+    // Insert OpenMP FIR specific conversion patterns that override OpenMP
+    // dialect default conversion patterns.
+    pattern.insert<MapInfoOpConversion>(typeConverter, options);
+
     mlir::arith::populateArithToLLVMConversionPatterns(typeConverter, pattern);
     mlir::cf::populateControlFlowToLLVMConversionPatterns(typeConverter,
                                                           pattern);
diff --git a/flang/test/Driver/OpenMP/map-types-and-sizes.f90 b/flang/test/Driver/OpenMP/map-types-and-sizes.f90
index e4f429e479af158..aa7e0042860296b 100644
--- a/flang/test/Driver/OpenMP/map-types-and-sizes.f90
+++ b/flang/test/Driver/OpenMP/map-types-and-sizes.f90
@@ -22,7 +22,6 @@ subroutine mapType_array
   !$omp end target
 end subroutine mapType_array
 
-!CHECK: @.offload_sizes{{.*}} = private unnamed_addr constant [1 x i64] [i64 8]
 !CHECK: @.offload_maptypes{{.*}} = private unnamed_addr constant [1 x i64] [i64 547]
 subroutine mapType_ptr
   integer, pointer :: a
@@ -31,6 +30,16 @@ subroutine mapType_ptr
   !$omp end target
 end subroutine mapType_ptr
 
+!CHECK: @.offload_maptypes{{.*}} = private unnamed_addr constant [1 x i64] [i64 547]
+subroutine mapType_allocatable
+  integer, allocatable :: a
+  allocate(a)
+  !$omp target
+     a = 10
+  !$omp end target
+  deallocate(a)
+end subroutine mapType_allocatable
+
 !CHECK: @.offload_sizes{{.*}} = private unnamed_addr constant [2 x i64] [i64 8, i64 4]
 !CHECK: @.offload_maptypes{{.*}} = private unnamed_addr constant [2 x i64] [i64 544, i64 800]
 subroutine mapType_c_ptr
@@ -50,3 +59,15 @@ subroutine mapType_char
      a = 'b'
   !$omp end target
 end subroutine mapType_char
+
+!CHECK-LABEL: define void @maptype_ptr_() {
+!CHECK: %[[DESC_ELE_SIZE:.*]] = getelementptr { ptr, i64, i32, i8, i8, i8, i8 }, ptr %{{.*}}, i64 0, i32 1
+!CHECK: %[[DESC_ELE_SIZE_LOAD:.*]] = load i64, ptr %[[DESC_ELE_SIZE]], align 8
+!CHECK: %[[OFFLOAD_SIZE_ARR:.*]] = getelementptr inbounds [1 x i64], ptr %.offload_sizes, i32 0, i32 0  
+!CHECK: store i64 %[[DESC_ELE_SIZE_LOAD]], ptr %[[OFFLOAD_SIZE_ARR]], align 8
+
+!CHECK-LABEL: define void @maptype_allocatable_() {
+!CHECK: %[[DESC_ELE_SIZE:.*]] = getelementptr { ptr, i64, i32, i8, i8, i8, i8 }, ptr %{{.*}}, i64 0, i32 1
+!CHECK: %[[DESC_ELE_SIZE_LOAD:.*]] = load i64, ptr %[[DESC_ELE_SIZE]], align 8
+!CHECK: %[[OFFLOAD_SIZE_ARR_1:.*]] = getelementptr inbounds [1 x i64], ptr %.offload_sizes, i32 0, i32 0  
+!CHECK: store i64 %[[DESC_ELE_SIZE_LOAD]], ptr %[[OFFLOAD_SIZE_ARR_1]], align 8
\ No newline at end of file
diff --git a/flang/test/Lower/OpenMP/FIR/target.f90 b/flang/test/Lower/OpenMP/FIR/target.f90
index d5a8fb242de9219..cd5326f3224e220 100644
--- a/flang/test/Lower/OpenMP/FIR/target.f90
+++ b/flang/test/Lower/OpenMP/FIR/target.f90
@@ -316,7 +316,7 @@ end subroutine omp_target_device_ptr
  subroutine omp_target_device_addr
    integer, pointer :: a
    !CHECK: %[[VAL_0:.*]] = fir.alloca !fir.box<!fir.ptr<i32>> {bindc_name = "a", uniq_name = "_QFomp_target_device_addrEa"}
-   !CHECK: %[[MAP:.*]] = omp.map_info var_ptr({{.*}})   map_clauses(tofrom) capture(ByRef) -> {{.*}} {name = "a"}
+   !CHECK: %[[MAP:.*]] = omp.map_info var_ptr({{.*}})   map_clauses(tofrom) capture(ByRef) -> {{.*}} {is_fortran_allocatable = true, name = "a"}
    !CHECK: omp.target_data map_entries(%[[MAP]] : {{.*}}) use_device_addr(%[[VAL_0]] : !fir.ref<!fir.box<!fir.ptr<i32>>>) {
    !$omp target data map(tofrom: a) use_device_addr(a)
    !CHECK: ^bb0(%[[VAL_1:.*]]: !fir.ref<!fir.box<!fir.ptr<i32>>>):
diff --git a/flang/test/Lower/OpenMP/allocatable-array-bounds.f90 b/flang/test/Lower/OpenMP/allocatable-array-bounds.f90
new file mode 100644
index 000000000000000..aa2649632a87270
--- /dev/null
+++ b/flang/test/Lower/OpenMP/allocatable-array-bounds.f90
@@ -0,0 +1,101 @@
+!RUN: %flang_fc1 -emit-hlfir -fopenmp %s -o - | FileCheck %s --check-prefixes HOST
+
+!HOST-LABEL:  func.func @_QPread_write_section() {
+
+!HOST: %[[ALLOCA_1:.*]] = fir.alloca !fir.box<!fir.heap<!fir.array<?xi32>>> {bindc_name = "sp_read", uniq_name = "_QFread_write_sectionEsp_read"}
+!HOST: %[[DECLARE_1:.*]]:2 = hlfir.declare %[[ALLOCA_1]] {fortran_attrs = #fir.var_attrs<allocatable>, uniq_name = "_QFread_write_sectionEsp_read"} : (!fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>) -> (!fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>, !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>)
+
+!HOST: %[[ALLOCA_2:.*]] = fir.alloca !fir.box<!fir.heap<!fir.array<?xi32>>> {bindc_name = "sp_write", uniq_name = "_QFread_write_sectionEsp_write"}
+!HOST: %[[DECLARE_2:.*]]:2 = hlfir.declare %[[ALLOCA_2]] {fortran_attrs = #fir.var_attrs<allocatable>, uniq_name = "_QFread_write_sectionEsp_write"} : (!fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>) -> (!fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>, !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>)
+
+!HOST: %[[LOAD_1:.*]] = fir.load %[[DECLARE_1]]#1 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>
+!HOST: %[[LOAD_2:.*]] = fir.load %[[DECLARE_1]]#1 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>
+!HOST: %[[CONSTANT_1:.*]] = arith.constant 0 : index
+!HOST: %[[BOX_1:.*]]:3 = fir.box_dims %[[LOAD_2]], %[[CONSTANT_1]] : (!fir.box<!fir.heap<!fir.array<?xi32>>>, index) -> (index, index, index)
+!HOST: %[[CONSTANT_2:.*]] = arith.constant 0 : index
+!HOST: %[[BOX_2:.*]]:3 = fir.box_dims %[[LOAD_1]], %[[CONSTANT_2]] : (!fir.box<!fir.heap<!fir.array<?xi32>>>, index) -> (index, index, index)
+!HOST: %[[CONSTANT_3:.*]] = arith.constant 2 : index
+!HOST: %[[LB_1:.*]] = arith.subi %[[CONSTANT_3]], %[[BOX_1]]#0 : index
+!HOST: %[[CONSTANT_4:.*]] = arith.constant 5 : index
+!HOST: %[[UB_1:.*]] = arith.subi %[[CONSTANT_4]], %[[BOX_1]]#0 : index
+!HOST: %[[BOUNDS_1:.*]] = omp.bounds lower_bound(%[[LB_1]] : index) upper_bound(%[[UB_1]] : index) stride(%[[BOX_2]]#2 : index) start_idx(%[[BOX_1]]#0 : index) {stride_in_bytes = true}
+!HOST: %[[MAP_INFO_1:.*]] = omp.map_info var_ptr(%[[DECLARE_1]]#1 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>, !fir.box<!fir.heap<!fir.array<?xi32>>>) map_clauses(tofrom) capture(ByRef) bounds(%[[BOUNDS_1]]) -> !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>> {is_fortran_allocatable = true, name = "sp_read(2:5)"}
+    
+!HOST: %[[LOAD_3:.*]] = fir.load %[[DECLARE_2]]#1 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>
+!HOST: %[[LOAD_4:.*]] = fir.load %[[DECLARE_2]]#1 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>
+!HOST: %[[CONSTANT_5:.*]] = arith.constant 0 : index
+!HOST: %[[BOX_3:.*]]:3 = fir.box_dims %[[LOAD_4]], %[[CONSTANT_5]] : (!fir.box<!fir.heap<!fir.array<?xi32>>>, index) -> (index, index, index)
+!HOST: %[[CONSTANT_6:.*]] = arith.constant 0 : index
+!HOST: %[[BOX_4:.*]]:3 = fir.box_dims %[[LOAD_3]], %[[CONSTANT_6]] : (!fir.box<!fir.heap<!fir.array<?xi32>>>, index) -> (index, index, index)
+!HOST: %[[CONSTANT_7:.*]] = arith.constant 2 : index
+!HOST: %[[LB_2:.*]] = arith.subi %[[CONSTANT_7]], %[[BOX_3]]#0 : index
+!HOST: %[[CONSTANT_8:.*]] = arith.constant 5 : index
+!HOST: %[[UB_2:.*]] = arith.subi %[[CONSTANT_8]], %[[BOX_3]]#0 : index
+!HOST: %[[BOUNDS_2:.*]] = omp.bounds lower_bound(%[[LB_2]] : index) upper_bound(%[[UB_2]] : index) stride(%[[BOX_4]]#2 : index) start_idx(%[[BOX_3]]#0 : index) {stride_in_bytes = true}
+!HOST: %[[MAP_INFO_2:.*]] = omp.map_info var_ptr(%11#1 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>, !fir.box<!fir.heap<!fir.array<?xi32>>>) map_clauses(tofrom) capture(ByRef) bounds(%[[BOUNDS_2]]) -> !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>> {is_fortran_allocatable = true, name = "sp_write(2:5)"}
+subroutine read_write_section()
+    integer, allocatable :: sp_read(:)
+    integer, allocatable :: sp_write(:)
+    allocate(sp_read(10)) 
+    allocate(sp_write(10))
+    sp_write = (/0,0,0,0,0,0,0,0,0,0/)
+    sp_read = (/1,2,3,4,5,6,7,8,9,10/)
+
+!$omp target map(tofrom:sp_read(2:5)) map(tofrom:sp_write(2:5))
+    do i = 2, 5
+        sp_write(i) = sp_read(i)
+    end do
+!$omp end target
+end subroutine read_write_section
+
+module assumed_allocatable_array_routines
+    contains
+
+!HOST-LABEL: func.func @_QMassumed_allocatable_array_routinesPassumed_shape_array(
+
+!HOST: %[[DECLARE:.*]]:2 = hlfir.declare %[[ARG:.*]] {fortran_attrs = #fir.var_attrs<allocatable, intent_inout>, uniq_name = "_QMassumed_allocatable_array_routinesFassumed_shape_arrayEarr_read_write"} : (!fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>) -> (!fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>, !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>)
+!HOST: %[[LOAD_1:.*]] = fir.load %[[DECLARE]]#1 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>
+!HOST: %[[LOAD_2:.*]] = fir.load %[[DECLARE]]#1 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>
+!HOST: %[[CONSTANT_1:.*]] = arith.constant 0 : index
+!HOST: %[[BOX_1:.*]]:3 = fir.box_dims %[[LOAD_2]], %[[CONSTANT_1]] : (!fir.box<!fir.heap<!fir.array<?xi32>>>, index) -> (index, index, index)
+!HOST: %[[CONSTANT_2:.*]] = arith.constant 0 : index
+!HOST: %[[BOX_2:.*]]:3 = fir.box_dims %[[LOAD_1]], %[[CONSTANT_2]] : (!fir.box<!fir.heap<!fir.array<?xi32>>>, index) -> (index, index, index)
+!HOST: %[[CONSTANT_3:.*]] = arith.constant 2 : index
+!HOST: %[[LB:.*]] = arith.subi %[[CONSTANT_3]], %[[BOX_1]]#0 : index
+!HOST: %[[CONSTANT_4:.*]] = arith.constant 5 : index
+!HOST: %[[UB:.*]] = arith.subi %[[CONSTANT_4]], %[[BOX_1]]#0 : index
+!HOST: %[[BOUNDS:.*]] = omp.bounds lower_bound(%[[LB]] : index) upper_bound(%[[UB]] : index) stride(%[[BOX_2]]#2 : index) start_idx(%[[BOX_1]]#0 : index) {stride_in_bytes = true}
+!HOST: %[[MAP_INFO:.*]] = omp.map_info var_ptr(%[[DECLARE]]#1 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>, !fir.box<!fir.heap<!fir.array<?xi32>>>) map_clauses(tofrom) capture(ByRef) bounds(%[[BOUNDS]]) -> !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>> {is_fortran_allocatable = true, name = "arr_read_write(2:5)"}
+subroutine assumed_shape_array(arr_read_write)
+    integer, allocatable, intent(inout) :: arr_read_write(:)
+
+!$omp target map(tofrom:arr_read_write(2:5))
+    do i = 2, 5
+        arr_read_write(i) = i
+    end do
+!$omp end target
+end subroutine assumed_shape_array
+end module assumed_allocatable_array_routines
+
+!HOST-LABEL: func.func @_QPcall_assumed_shape_and_size_array() {
+!HOST: %[[ALLOCA:.*]] = fir.alloca !fir.box<!fir.heap<!fir.array<?xi32>>> {bindc_name = "arr_read_write", uniq_name = "_QFcall_assumed_shape_and_size_arrayEarr_read_write"}
+!HOST: %[[DECLARE:.*]]:2 = hlfir.declare %[[ALLOCA]] {fortran_attrs = #fir.var_attrs<allocatable>, uniq_name = "_QFcall_assumed_shape_and_size_arrayEarr_read_write"} : (!fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>) -> (!fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>, !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>)
+!HOST: %[[ALLOCA_MEM:.*]] = fir.allocmem !fir.array<?xi32>, %{{.*}} {fir.must_be_heap = true, uniq_name = "_QFcall_assumed_shape_and_size_arrayEarr_read_write.alloc"}
+!HOST: %[[SHAPE:.*]] = fir.shape %{{.*}} : (index) -> !fir.shape<1>
+!HOST: %[[EMBOX:.*]] = fir.embox %[[ALLOCA_MEM]](%[[SHAPE]]) : (!fir.heap<!fir.array<?xi32>>, !fir.shape<1>) -> !fir.box<!fir.heap<!fir.array<?xi32>>>
+!HOST: fir.store %[[EMBOX]] to %[[DECLARE]]#1 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>
+!HOST: %[[LOAD:.*]] = fir.load %[[DECLARE]]#0 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>
+!HOST: %[[CONSTANT_1:.*]] = arith.constant 10 : index
+!HOST: %[[CONSTANT_2:.*]] = arith.constant 20 : index
+!HOST: %[[CONSTANT_3:.*]] = arith.constant 1 : index
+!HOST: %[[CONSTANT_4:.*]] = arith.constant 11 : index
+!HOST: %[[SHAPE:.*]] = fir.shape %[[CONSTANT_4]] : (index) -> !fir.shape<1>
+!HOST: %[[DESIGNATE:.*]] = hlfir.designate %[[LOAD]] (%[[CONSTANT_1]]:%[[CONSTANT_2]]:%[[CONSTANT_3]])  shape %[[SHAPE]] : (!fir.box<!fir.heap<!fir.array<?xi32>>>, index, index, index, !fir.shape<1>) -> !fir.ref<!fir.array<11xi32>>
+!HOST: fir.call @_QPassumed_size_array(%[[DESIGNATE]]) fastmath<contract> : (!fir.ref<!fir.array<11xi32>>) -> ()
+subroutine call_assumed_shape_and_size_array
+    use assumed_allocatable_array_routines
+    integer, allocatable :: arr_read_write(:)
+    allocate(arr_read_write(20))
+    call assumed_size_array(arr_read_write(10:20))
+    deallocate(arr_read_write)
+end subroutine call_assumed_shape_and_size_array
diff --git a/flang/test/Lower/OpenMP/allocatable-map.f90 b/flang/test/Lower/OpenMP/allocatable-map.f90
new file mode 100644
index 000000000000000..7d10ff181d76e05
--- /dev/null
+++ b/flang/test/Lower/OpenMP/allocatable-map.f90
@@ -0,0 +1,16 @@
+!RUN: %flang_fc1 -emit-hlfir -fopenmp %s -o - | FileCheck %s --check-prefixes="HLFIRDIALECT"
+!RUN: %flang_fc1 -emit-fir -fopenmp %s -o - | fir-opt --fir-to-llvm-ir | FileCheck %s --check-prefixes="LLVMOMPDIALECT"
+
+!LLVMOMPDIALECT: %[[ALLOCA:.*]] = llvm.alloca {{.*}} x !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)> {bindc_name = "point"} : (i64) -> !llvm.ptr
+!LLVMOMPDIALECT: %[[MAP:.*]] = omp.map_info var_ptr(%[[ALLOCA]] : !llvm.ptr, !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>) map_clauses(implicit, tofrom) capture(ByRef) -> !llvm.ptr {is_fortran_allocatable = true, name = "point"}
+!LLVMOMPDIALECT: omp.target map_entries({{.*}}, %[[MAP]] -> {{.*}} : {{.*}}, !llvm.ptr) {
+
+!HLFIRDIALECT: %[[POINTER:.*]]:2 = hlfir.declare %0 {fortran_attrs = #fir.var_attrs<pointer>, uniq_name = "_QFpointer_routineEpoint"} : (!fir.ref<!fir.box<!fir.ptr<i32>>>) -> (!fir.ref<!fir.box<!fir.ptr<i32>>>, !fir.ref<!fir.box<!fir.ptr<i32>>>)
+!HLFIRDIALECT: %[[POINTER_MAP:.*]] = omp.map_info var_ptr(%[[POINTER]]#1 : !fir.ref<!fir.box<!fir.ptr<i32>>>, !fir.box<!fir.ptr<i32>>) map_clauses(implicit, tofrom) capture(ByRef) -> !fir.ref<!fir.box<!fir.ptr<i32>>> {is_fortran_allocatable = true, name = "point"}
+!HLFIRDIALECT: omp.target map_entries({{.*}}, %[[POINTER_MAP]] -> {{.*}} : {{.*}}, !fir.ref<!fir.box<!fir.ptr<i32>>>) {
+subroutine pointer_routine()
+    integer, pointer :: point 
+!$omp target map(tofrom:pointer)
+    point = 1
+!$omp end target
+end subroutine pointer_routine
diff --git a/flang/test/Lower/OpenMP/target.f90 b/flang/test/Lower/OpenMP/target.f90
index 86f456b847df90a..a409769ae4f8bf4 100644
--- a/flang/test/Lower/OpenMP/target.f90
+++ b/flang/test/Lower/OpenMP/target.f90
@@ -340,7 +340,7 @@ subroutine omp_target_device_addr
  ...
[truncated]

@agozillon
Copy link
Contributor Author

The actual change list to the compiler is a lot smaller than the indicated number in the PR, around 1/4 (300 lines or so) are the (more important) compiler changes, the rest are tests, which I may have went a little overboard with for the runtime tests in hindsight! But more test coverage is better I suppose...

@kiranchandramohan
Copy link
Contributor

I think it will be good to write down this as an RFC. This is specifically for two reasons:

  1. We are using Fortran-specific information in the dialect and its translation.
  2. We have decided to use the Flang descriptor for some purposes and the MapInfo/Bounds operation for other purposes. It will be good to clarify that and also decide what should be the best source for a particular kind of information.

@@ -1151,7 +1151,8 @@ def MapInfoOp : OpenMP_Op<"map_info", [AttrSizedOperandSegments]> {
Variadic<DataBoundsType>:$bounds, /* rank-0 to rank-{n-1} */
OptionalAttr<UI64Attr>:$map_type,
OptionalAttr<VariableCaptureKindAttr>:$map_capture_type,
OptionalAttr<StrAttr>:$name);
OptionalAttr<StrAttr>:$name,
DefaultValuedAttr<BoolAttr, "false">:$is_fortran_allocatable);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be a Unit Attribute?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can do, a UnitAttr is a much cleaner way to do it, thank you for pointing it out!

Comment on lines 1202 to 1204
- 'is_fortran_allocatable': Indicates if the var_ptr variable is a fortran allocatable
type, e.g. a pointer or allocatable containing a descriptor mapping that wraps the
data and contains further information on the mapped variable.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add more info to the commit message on what specifically about fortran_allocatables you need here?

Is it possible to avoid specific information about the frontend in the dialect?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll do my best to explain some of the choices made in this PR after I update the PR with the requested changes, I'll add a "ping" message when it's updated.

As for the RFC requested in another message, I'm more than happy to do so, but it might be worth discussing the choices in the PR first before we commit to an RFC, in-case any reviewer can perhaps see alternatives to the choices made after I explain them a little more. But perhaps it's just better to go ahead and open an RFC, if you think that's a better approach then I'm happy to do an RFC alongside the discussion in the patch? As I guess that's still a very good way to get some feedback.

Comment on lines 57 to 59
} else if (auto *structComp = Fortran::parser::Unwrap<
Fortran::parser::StructureComponent>(designator)) {
sym = structComp->component.symbol;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it OK to do the Structure Component portion in a separate patch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can do! It was originally to emit an error for trying to offload allocatables in derived types as it's unsupported currently, but I didn't add it to the PR as it seems unnecessary, so I can drop this change and add it if it's ever required.

// FIR Op specific conversion for MapInfoOp that overwrites the default OpenMP
// Dialect lowering, this allows FIR specific lowering of types, required for
// descriptors of allocatables currently.
struct MapInfoOpConversion : public FIROpConversion<mlir::omp::MapInfoOp> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should possibly be in a separate file or pass.

llvm::Type *type = moduleTranslation.convertType(inputType);
if (auto pointerType =
llvm::dyn_cast<mlir::omp::PointerLikeType>(inputType)) {
if (auto eleType = pointerType.getElementType()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not expected to work at the LLVM level due to opaque pointer changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sorry, this was a bit of an oversight when I created the PR from the downstream branch, I'll remove it.

@agozillon
Copy link
Contributor Author

I think it will be good to write down this as an RFC. This is specifically for two reasons:

  1. We are using Fortran-specific information in the dialect and its translation.
  2. We have decided to use the Flang descriptor for some purposes and the MapInfo/Bounds operation for other purposes. It will be good to clarify that and also decide what should be the best source for a particular kind of information.

I'll try to elaborate a bit more on this soon, after I update the PR. And I'd be happy to create an RFC, but perhaps it's worth waiting on a little bit more reviewer input after I update the PR with more details (on the chance reviewers might be able to indicate a better choice in certain areas that I hadn't came across)? If you don't think that's worthwhile though or it's more worthwhile to have two avenues of feedback (PR + RFC) I'm more than happy to open up the RFC just now rather than waiting a little bit!

However, for 2) it's quite possible to use the BoundsOp for a lot of the calculations as they directly access the descriptor fields, the only issue I've found is that there is no element type size field for the BoundsOp (as far as I'm aware at least). You can use getStride() (when getStrideInBytes is true, which is the case currently for allocatables), however, I opted to not do so as I imagine the value returned will change when a user specifies a stride, although, @razvanlupusoru can likely clarify if it would!

So, rather than having a mismatch of accessing the descriptor field directly in one case and using the bounds in all other cases I opted to just directly access the descriptor in all cases, however, I'd be happy to change the calculation to use BoundsOp where feasible if that seems more reasonable? I'm unsure if adding an ElementTypeSize field to bounds is a reasonable sounding change but that could also be an option to fully utilise just the BoundsOp for the calculation

@razvanlupusoru
Copy link
Contributor

I will take a look at this PR early next week and also respond to questions as needed! :) Sorry for the delay.

@agozillon
Copy link
Contributor Author

I am back in the office only on 8th.

Thanks for making progress here.

Can we move the addition of the extra MapInfos to a pass in Flang?

It might be good to document the present flow and details in flang/docs. This is probably easy to do since you have given a lot of explanations and these can be copypasted.

Hi Kiran, thank you very much for the quick reply, especially as you're on vacation! Please do not feel rushed to answer any other prompts on this PR, they can wait till you're back, I hope you have an awesome well deserved vacation! :-)

I'm happy to look into trying to add it as an extra pass, as well as adding some documentation. In this case, I was thinking about continuing to generate the extra MapInfos in the OpenMP lowering, but generate the block arguments that link to them later in the pass? But I can attempt to generate both the extra MapInfo and BlockArg in the opt pass and see how it goes as well!

@agozillon
Copy link
Contributor Author

agozillon commented Jan 5, 2024

Updated the PR to utilise a pass "OMPDescriptorMapInfoGenPass" which will expand the mapping of descriptor types into multiple maps (one for the parent descriptor and contained data pointer currently, and in the future perhaps the addendum when required), it will insert these newly generated maps into the map owning operation and in the case of TargetOp it inserts them into the BlockArg list. I also added an MLIR interface to allow easy access to these map operands for map owning operations. This simplifies the code inside of Lower/OpenMP.cpp and the OpenMPToLLVMIRTranslation.cpp a little bit.

I also added a document which describes the current lowering process a little bit and the idea behind it.

And I also rebased, which I hope doesn't make it too difficult to review the changeset, but it'd been a month and a bit since a rebase so it felt prudent to do so!

@agozillon
Copy link
Contributor Author

As an aside the target_single_value_allocate.f90 test will remain failing for this PR. I am working on a small-ish PR or two that will hopefully allow this to pass in the near future. The changes required are a little more general than just this PR and will likely fix issues other than just this test, so I'd rather not complicate or delay this PR further.

@agozillon
Copy link
Contributor Author

Small ping for some reviewer attention on this PR if anyone gets some spare time, thank you ahead of time!

Copy link
Contributor

@kiranchandramohan kiranchandramohan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can @skatrak and @TIFitis review the MapInfoOpConversion and the OMPDescriptorMapInfoGenPass pass?

flang/docs/OpenMP-descriptor-management.md Show resolved Hide resolved

} // namespace fir

#endif // FORTRAN_OPTIMIZER_CODEGEN_CODEGENOPENMP_H
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit:new line

Copy link
Contributor

@skatrak skatrak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Andrew for this work! I checked the MapInfoOpConversion and the OMPDescriptorMapInfoGenPass pass and it LGTM, I just have a couple of nits and small comments.

mappings are treated as if they were structure mappings with explicit member maps on the same directive as
their parent was mapped.

This method is generic in the sense that the OpenMP diaelct doesn't need to understand that it is mapping a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
This method is generic in the sense that the OpenMP diaelct doesn't need to understand that it is mapping a
This method is generic in the sense that the OpenMP dialect doesn't need to understand that it is mapping a

This method is generic in the sense that the OpenMP diaelct doesn't need to understand that it is mapping a
Fortran type containing a descriptor, it just thinks it's a record type from either Fortran or C++. However,
it is a little rigid in how the descriptor mappings are handled as there is no specialisation or possibility
to specialise the mappings for possible edge cases without poluting the dialect or lowering with further
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
to specialise the mappings for possible edge cases without poluting the dialect or lowering with further
to specialise the mappings for possible edge cases without polluting the dialect or lowering with further

op.getMapType().value()),
builder.getAttr<mlir::omp::VariableCaptureKindAttr>(
mlir::omp::VariableCaptureKind::ByRef),
builder.getStringAttr("")));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Add comment with the name of this argument

if (auto mapClauseOwner =
llvm::dyn_cast<mlir::omp::MapClauseOwningOpInterface>(target)) {
llvm::SmallVector<mlir::Value> newMapOps;
for (size_t i = 0; i < mapClauseOwner.getMapOperands().size(); ++i) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Store reference to mapClauseOwner.getMapOperands() outside of the loop and re-use it here and inside of the loop, so code is a bit less verbose

void genDescriptorMemberMaps(mlir::omp::MapInfoOp op,
fir::FirOpBuilder &builder,
mlir::Operation *target) {
llvm::SmallVector<mlir::Value> descriptorBaseAddrMembers;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be a vector? It looks like only one element is ever added to it, if I understand it correctly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not necessary at the moment, but will be if we continue down this implementation route in the long term and add the addendum (and any other possible future pointer elements of the descriptor), so it was perhaps me being a little over prepared, so I'll try to remove the vector use!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. I think it's fine to leave it as well, if you just add a comment to describe this there, so it's clear why it could hold multiple elements. Maybe even a TODO comment telling to add these other elements might do the trick.

!CHECK: %[[SIZE_DIFF:.*]] = sub i64 %[[ALLOCA_GEP_INT]], %[[ALLOCA_INT]]
!CHECK: %[[DIV:.*]] = sdiv exact i64 %[[SIZE_DIFF]], ptrtoint (ptr getelementptr (i8, ptr null, i32 1) to i64)
!CHECK: %[[OFFLOAD_SIZE_ARR:.*]] = getelementptr inbounds [3 x i64], ptr %.offload_sizes, i32 0, i32 0
!CHECK: store i64 %[[DIV]], ptr %[[OFFLOAD_SIZE_ARR]], align 8
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: new line

allocate(arr_read_write(20))
call assumed_size_array(arr_read_write(10:20))
deallocate(arr_read_write)
end subroutine call_assumed_shape_and_size_array
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: new line

auto parentClause =
mlir::dyn_cast<mlir::omp::MapInfoOp>(mapData.MapClause[mapDataIndex]);

////////// First Parent Map Segment //////////
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to split these 'segments' into separate functions, so it's a bit easier to follow? Feel free to ignore if you disagree, the function just looks to me to be already split into different logical units.

@agozillon
Copy link
Contributor Author

Thank you both very much for the review! I'll wait until @TIFitis has had a chance to have a look at the PR before I address the review points and update the PR. And it'll give anyone else that still wishes to provide some further feedback a chance to. So hopefully an update by the end of the week or Monday.

Copy link
Member

@TIFitis TIFitis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for working on this patch. Overall, I'm happy with the state of the patch. I've already discussed some of the suggestions I have with @agozillon offline.

Copy link
Member

@TIFitis TIFitis Jan 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please change the name of this and other tests such that they start with the name of the directive/construct being tested. In this case either target or map should be fine.

// as the printing and later processing currently requires a 1:1
// mapping of BlockArgs to MapInfoOp's at the same placement in
// each array (BlockArgs and MapOperands).
if (auto targetOp = llvm::dyn_cast<mlir::omp::TargetOp>(target)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: remove braces

mlir::isa_and_present<fir::BoxAddrOp>(
op.getVarPtr().getDefiningOp())) {
builder.setInsertionPoint(op);
// Currently a MapInfoOp argument can only show up on a single target
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there is a rule for having a single TargetOp user for a MapInfoOp. I believe @ergawy had previously added some tests in one of his patches where the MapInfoOp was reused.

Can you please add a check or assert here to make sure it's single user.

@@ -313,6 +313,9 @@ bool isBoxNone(mlir::Type ty);
/// e.g. !fir.box<!fir.type<derived>>
bool isBoxedRecordType(mlir::Type ty);

/// Return true iff `ty` is a type that contains descriptor information.
bool isTypeWithDescriptor(mlir::Type ty);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given this function might not see much use outside of the code added in this PR. Perhaps, we should have it as a local function/lambda instead?

If you feel like this is useful to have standalone, then we can keep it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's used in a few different files in this case and I believe I was asked to make a utility function for this check earlier in this review, so I'd prefer to keep it as a utility function for the time being! Likely quite helpful if what classifies a descriptor type in Flang ever changes as well and it hopefully helps to make it clear what a descriptor type is represented as for newer implementors.

@@ -0,0 +1,27 @@
//=== Optimizer/CodeGen/CodeGenOpenMP.h - OpenMP code generation -*- C++
//-*-===//
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: was this meant to be on the previous line? If it does not fit then adjust the comment or the prefix in the previous line.

: Pass<"omp-descriptor-map-info-gen", "mlir::ModuleOp"> {
let summary = "expands OpenMP MapInfo operations containing descriptors";
let description = [{
Expands MapInfo operations containing descriptor types into multiple MapInfo's for each pointer element in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Fit this into 80 chars.

Comment on lines 11 to 15
The descriptor mapping for OpenMP currently works differently to the planned direction for OpenACC, however,
it is possible and would likely be ideal to align the method with OpenACC in the future. However, at least
currently the OpenMP specification is less descriptive and has less stringent rules around descriptor based
types so does not require as complex a set of descriptor management rules (although, in certain cases
for the interim adopting OpenACC's rules where it makes sense could be useful).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better to focus on describing what is the implementation in OpenMP and move the comparison with OpenACC to the last paragraph under a separate heading.

flang/docs/OpenMP-descriptor-management.md Show resolved Hide resolved
@@ -1,4 +1,5 @@
add_flang_library(FIRCodeGen
CodeGenOpenMP.cpp
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This was probably in alphabetical order.

@@ -0,0 +1,349 @@
// RUN: mlir-translate -mlir-to-llvmir %s | FileCheck %s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can these tests be minimized?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will try to hand reduce it as this is currently the output after running CSE/dce via mlir-opt, it's likely a little unnecessary for it to be a fully functional test case for the purpose of the IR checks in this test so I'll see what I can do.

Comment on lines 2097 to 2140
// CHECK-LABEL: omp_targets_is_allocatable
// CHECK-SAME: (%[[ARG0:.*]]: !llvm.ptr, %[[ARG1:.*]]: !llvm.ptr)
func.func @omp_targets_is_allocatable(%arg0: !llvm.ptr, %arg1: !llvm.ptr) -> () {
// CHECK: %[[MAP0:.*]] = omp.map_info var_ptr(%[[ARG0]] : !llvm.ptr, !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>) map_clauses(tofrom) capture(ByRef) -> !llvm.ptr {is_fortran_allocatable, name = ""}
%mapv1 = omp.map_info var_ptr(%arg0 : !llvm.ptr, !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>) map_clauses(tofrom) capture(ByRef) -> !llvm.ptr {is_fortran_allocatable, name = ""}
// CHECK: %[[MAP1:.*]] = omp.map_info var_ptr(%[[ARG1]] : !llvm.ptr, !llvm.array<10 x i32>) map_clauses(to) capture(ByCopy) -> !llvm.ptr {name = ""}
%mapv2 = omp.map_info var_ptr(%arg1 : !llvm.ptr, !llvm.array<10 x i32>) map_clauses(to) capture(ByCopy) -> !llvm.ptr {name = ""}
// CHECK: omp.target map_entries(%[[MAP0]] -> {{.*}}, %[[MAP1]] -> {{.*}} : !llvm.ptr, !llvm.ptr)
omp.target map_entries(%mapv1 -> %arg2, %mapv2 -> %arg3 : !llvm.ptr, !llvm.ptr) {
^bb0(%arg2: !llvm.ptr, %arg3 : !llvm.ptr):
omp.terminator
}
return
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not clear what this test is testing. Since members is the new field that was added in OpenMPOps.td we should be testing an operation with that field here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the catch! Yes, this test seems to have missed a few updates and seems to be testing old behavior..

Comment on lines 101 to 112
newMapOps.push_back(mapClauseOwner.getMapOperands()[i]);
} else {
newMapOps.push_back(mapClauseOwner.getMapOperands()[i]);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the common code be moved outside?

flang/lib/Optimizer/CodeGen/CodeGenOpenMP.cpp Show resolved Hide resolved
Comment on lines 84 to 86
In later stages of the compilation flow when the OpenMP dialect is being lowered to LLVM-IR these descriptor
mappings are treated as if they were structure mappings with explicit member maps on the same directive as
their parent was mapped.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you comment on why the member field is necessary. What is the disadvantage of mapping the descriptor and the data-pointer in it separately without tying them with a member field?

@agozillon
Copy link
Contributor Author

agozillon commented Jan 15, 2024

Most recent update attempts to take into consideration all the recent reviewer feedback, I've not rebased it yet (don't want to make the change set difficult to review) so the most recent commit should be self explanative of the changes made. So please do double check I've made the changes you requested as you'd intended! And feel free to add further requests for change/comments or approve if we've reached a point where you're happy with the current state of the PR.

@agozillon
Copy link
Contributor Author

A small ping for some review time next week if at all possible please! Hopefully won't be too much to check or comment on as it feels like we're approaching a reasonable state now.

Copy link
Contributor

@kiranchandramohan kiranchandramohan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks OK. A few comments inline.

@agozillon
Copy link
Contributor Author

The most recent commit does the following:

  • Added missing headers
  • Modified pass to work on FuncOps
  • Extended OMPDescriptorMapInfoGen pass test to cover more of the passes behavior
  • Removed some usage of the Mutable interface inside of the pass to minimize it's usage to just the MapOwningOpInterface, as this simplifies the pass quite a bit and should reduce the require maintenance quite a bit.

I also rebased to check everything runs as it should still, apologies to the reviewers.

Copy link
Contributor

@kiranchandramohan kiranchandramohan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@agozillon
Copy link
Contributor Author

agozillon commented Jan 25, 2024

LGTM.

Thank you very much @kiranchandramohan!

I'll give this PR 24 hours (slightly over it as I wish to rebase and do some final testing to do my best to avoid last minute buildbot surprises as they're never fun) for anyone to add more comments or further requests for change before I land it as it's quite substantial.

@agozillon
Copy link
Contributor Author

p.s. Thank you to everyone who spent time reviewing this PR, I greatly appreciate your effort and help throughout the process, it has been invaluable.

Copy link
Contributor

@razvanlupusoru razvanlupusoru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will preface this by saying that the approach you've employed here looks pretty great. I like the documentation, the passes which handle some of the mapping detail after lowering, the testing, and the dialect operations changes.

So no need to let my question block this PR. However, there is still something unintuitive here and I cannot tell why it is done this way - you may have answered for others before though. Thank you again for your work on this!


...
%12 = fir.box_offset %1#1 base_addr : (!fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>) -> !fir.llvm_ptr<!fir.ref<!fir.array<?xi32>>>
%13 = omp.map_info var_ptr(%12 : !fir.llvm_ptr<!fir.ref<!fir.array<?xi32>>>, !fir.array<?xi32>) map_clauses(tofrom) capture(ByRef) bounds(%11) -> !fir.llvm_ptr<!fir.ref<!fir.array<?xi32>>> {name = ""}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the var_ptr a pointer to a pointer? This inconsistency feels unintuitive to me. A mapping operation is intended to map the data pointed to by a pointer recorded in var_ptr. In this case, it feels it is mapping a pointer instead of the data itself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I am understanding correctly (which I might not be, I'm sorry if that's the case), I believe that's a side affect of how the box_offset works currently (https://github.com/llvm/llvm-project/blob/main/flang/test/Fir/box-offset.fir#L16) which gets the address of the pointer fields in the descriptor, which at least currently is required for us to access the relevant descriptor fields on lowering without needing specialised descriptor related information (resolves to a GEP of the field in the descriptor structure if I recall correctly) stored in the map_info operation. I believe in this particular example (which in hindsight is perhaps not the clearest example for the document) we are also taking the address of an intermediate allocation for an assumed shape or sized argument, which is necessary for the box_addr to be utilised (requires an in memory box), which may also cloud things a little further.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I am understanding correctly (which I might not be, I'm sorry if that's the case), I believe that's a side affect of how the box_offset works currently (https://github.com/llvm/llvm-project/blob/main/flang/test/Fir/box-offset.fir#L16) which gets the address of the pointer fields in the descriptor, which at least currently is required for us to access the relevant descriptor fields on lowering without needing specialised descriptor related information (resolves to a GEP of the field in the descriptor structure if I recall correctly) stored in the map_info operation. I believe in this particular example (which in hindsight is perhaps not the clearest example for the document) we are also taking the address of an intermediate allocation for an assumed shape or sized argument, which is necessary for the box_addr to be utilised (requires an in memory box), which may also cloud things a little further.

Everything you said sounds right to me. And the fact that you need to create storage in some cases to be able to take address of box also sounds right to me. It's not just fir.box_offset that requires "pointer" type, but the data operations in both acc and omp dialects.

That said, the main concern I have is that you are storing a pointer to a pointer in a spot where it is not expected (semantics of var_ptr says The address of variable to copy and not the address of the address). I guess what I am saying is, should the result of fir.box_offset be stored in var_ptr_ptr instead? That is where I intended it to be used in the acc dialect.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's very true and I agree that it likely is more suitable to the var_ptr_ptr field, but the lowering isn't really setup very well at the moment to distinguish between the both just yet unfortunately, it always utilises the var_ptr currently (although, it could be modified to utilise var_ptr_ptr when it's provided in place of var_ptr, but that seems like a band-aid). I am hoping to change that soon in a future PR, but I am also considering changing the name to base pointer/offload pointer (or perhaps base pointer would be better as a new field) as that's currently what these are used as/most analogous to in the OpenMP lowering (initial seeds for filling in the kernel arg structs, baseptr and offload_ptr arguments).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed this in the review. Thanks @razvanlupusoru for bringing this up.

Besides the address of address issue, the documentation for the field also says that var_ptr_ptr is used when the variable mapped is a member of a structure. Since the descriptor mapping is being set up analogous to structure members it probably makes sense to use it to avoid confusion.
https://github.com/llvm/llvm-project/blob/main/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td#L1243

    - `var_ptr_ptr`: Used when the variable copied is a member of a class, structure
      or derived type and refers to the originating struct.

I am OK if this will be changed in follow up patches but please wait for @razvanlupusoru

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took a slightly different approach to the movement of the base_addr to the var_ptr_ptr field that doesn't require as many changes. Still not too sure what the best replacement for the var_ptr field for it is however unfortunately, but I've opted for the descriptor it comes from for the moment, I am open to other suggestions!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small ping for an answer to @kiranchandramohan's question if possible @razvanlupusoru so that I can move forward with either splitting the PR or landing it next week!

I have a follow up PR I am hoping to have ready next week that builds on this, so it would be nice to close this one out if at all possible, if it's in a reasonable enough state for that of course.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took another high level look at the implementation and the tests. I also took a look at the details of the dialect changes. Overall looks good to me and it feels that @agozillon addressed all of the concerns/suggestions brought up. So please consider an unofficial approval from me :) To give a formal approval, I would need to spend more time thinking about this - and at this point I think if any concerns arise, they can be addressed in follow-up changes.

Copy link
Contributor Author

@agozillon agozillon Feb 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much @razvanlupusoru I'm sorry for the bother. But yes, if any concerns arise with the implementation I am more than happy to make the alterations necessary to alleviate them (and I am still very happy to swap to the OpenACC method for descriptor mapping at some point in the future). I want a good functioning implementation of this (and anything else I work on) as much as everyone else.

As it stands this PR adds functionality we do not currently have and some initial record type mapping code. Which I hope to build on further in a PR I hope to have up a little after this lands (there will likely be more than one derived type/record related PR, so this would be the first of a few).

Are we happy enough with the current level of acceptance @kiranchandramohan for me to go ahead and try to land this either later today or tomorrow? I am also happy to give the PR a few more days for people to comment on it if they desire if you wish.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we happy enough with the current level of acceptance @kiranchandramohan for me to go ahead and try to land this either later today or tomorrow? I am also happy to give the PR a few more days for people to comment on it if they desire if you wish.

Please go ahead.

@agozillon
Copy link
Contributor Author

I will preface this by saying that the approach you've employed here looks pretty great. I like the documentation, the passes which handle some of the mapping detail after lowering, the testing, and the dialect operations changes.

Thank you very much! I am very excited to see the OpenACC implementation, I enjoyed the documentation very much, it showcased a lot of concepts and possibilities I wasn't aware of. Hopefully we'll be able to adopt that in OpenMP as well at some point in the future.

So no need to let my question block this PR. However, there is still something unintuitive here and I cannot tell why it is done this way - you may have answered for others before though. Thank you again for your work on this!

I'm happy to delay the PR if you believe it's something worth addressing. I've tried to answer your question, hopefully I understood it well enough to give a reasonable enough reply. And thank you for your review time, comments, suggestions and help, the PR wouldn't be what it is without your input.

@agozillon agozillon force-pushed the full-alloca-map-work branch 2 times, most recently from aa9301a to 8d58651 Compare February 1, 2024 17:54
… other descriptor types for target devices

This patch seeks to add an initial lowering for pointers and allocatable variables captured by implicit and explicit map in Flang OpenMP for Target operations that take map clauses e.g. Target, Target Update. Target Exit/Enter etc.

Currently this is done by treating the type that lowers to a descriptor (allocatable/pointer/assumed shape) as a map of a record type (e.g. a structure) as that's
effectively what descriptor types lower to in LLVM-IR and what they're represented as
in the Fortran runtime (written in C/C++). The descriptor effectively lowers to a strucutre
containing scalar and array elements that represent various aspects of the underlying
data being mapped (lower bound, upper bound, extent being the main ones of interest
in most cases) and a pointer to the allocated data. In this current iteration of the mapping
we map the structure in it's entirety and then attach the underlying data pointer and map
the data to the device, this allows most of the required data to be resident on the device
for use. Currently we do not support the addendum (another block of pointer data), but
it shouldn't be too difficult to extend this to support it.

The MapInfoOp generation for descriptor types is primarily handled in an optimisation
pass, where it expands BoxType (descriptor types) map captures into two maps, one for
the structure (scalar elements) and the other for the pointer data (base address) and
links them in a Parent <-> Child relationship. The later lowering processes will then treat
them as a conjoined structure with a pointer member map.
@agozillon agozillon merged commit 95fe47c into llvm:main Feb 5, 2024
5 of 6 checks passed
agozillon added a commit to agozillon/llvm-project that referenced this pull request Feb 5, 2024
…for target devices (llvm#71766)

This patch seeks to add an initial lowering for pointers and allocatable variables 
captured by implicit and explicit map in Flang OpenMP for Target operations that 
take map clauses e.g. Target, Target Update. Target Exit/Enter etc.

Currently this is done by treating the type that lowers to a descriptor 
(allocatable/pointer/assumed shape) as a map of a record type (e.g. a structure) as that's
effectively what descriptor types lower to in LLVM-IR and what they're represented as
in the Fortran runtime (written in C/C++). The descriptor effectively lowers to a structure
containing scalar and array elements that represent various aspects of the underlying
data being mapped (lower bound, upper bound, extent being the main ones of interest
in most cases) and a pointer to the allocated data. In this current iteration of the mapping
we map the structure in it's entirety and then attach the underlying data pointer and map
the data to the device, this allows most of the required data to be resident on the device
for use. Currently we do not support the addendum (another block of pointer data), but
it shouldn't be too difficult to extend this to support it.

The MapInfoOp generation for descriptor types is primarily handled in an optimization
pass, where it expands BoxType (descriptor types) map captures into two maps, one for
the structure (scalar elements) and the other for the pointer data (base address) and
links them in a Parent <-> Child relationship. The later lowering processes will then treat
them as a conjoined structure with a pointer member map.
ichaer added a commit to ichaer/llvm-project-onesided_lower_bound that referenced this pull request Feb 12, 2024
* llvm/main: (328 commits)
  [Flang][OpenMP] Attempt to make map-types-and-sizes.f90 test more agnostic to other architectures
  [Transforms] Add more cos combinations to SimplifyLibCalls and InstCombine (llvm#79699)
  [workflows] Close issues used for backports once the PR has been created (llvm#80394)
  [RISCV] Add support for RISC-V Pointer Masking (llvm#79929)
  [lldb] Cleanup regex in libcxx formatters (NFC) (llvm#80618)
  [lldb] Remove unused private TypeCategoryMap methods (NFC) (llvm#80602)
  [mlir][sparse] refine sparse assembler strategy (llvm#80521)
  [NFC] Fix typo (llvm#80703)
  Fix broken ARM processor features test (llvm#80717)
  [ValueTracking][NFC] Pass `SimplifyQuery` to `computeKnownFPClass` family (llvm#80657)
  [x86_64][windows][swift] do not use Swift async extended frame for wi… (llvm#80468)
  [X86] addConstantComments - add FP16 MOVSH asm comments support
  [X86] Regenerate some vector constant comments missed in recent patches to improve mask predicate handling in addConstantComments
  [clang][AMDGPU][CUDA] Handle __builtin_printf for device printf (llvm#68515)
  Add some clarification to email check message
  [GitHub][Workflows] Prevent multiple private email comments (temporarily) (llvm#80648)
  [workflows] Use /mnt as the build directory on Linux (llvm#80583)
  [Flang][OpenMP] Initial mapping of Fortran pointers and allocatables for target devices (llvm#71766)
  [AMDGPU] GlobalISel for f8 conversions (llvm#80503)
  [AMDGPU] Fixed byte_sel of v_cvt_f32_bf8/v_cvt_f32_fp8 (llvm#80502)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants