Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Flang][OpenMP] Initial mapping of Fortran pointers and allocatables for target devices #71766

Merged
merged 1 commit into from
Feb 5, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
125 changes: 125 additions & 0 deletions flang/docs/OpenMP-descriptor-management.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,125 @@
<!--===- docs/OpenMP-descriptor-management.md

Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
See https://llvm.org/LICENSE.txt for license information.
SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

-->

# OpenMP dialect: Fortran descriptor type mapping for offload

The initial method for mapping Fortran types tied to descriptors for OpenMP offloading is to treat these types
as a special case of OpenMP record type (C/C++ structure/class, Fortran derived type etc.) mapping as far as the
runtime is concerned. Where the box (descriptor information) is the holding container and the underlying
data pointer is contained within the container, and we must generate explicit maps for both the pointer member and
the container. As an example, a small C++ program that is equivalent to the concept described, with the
`mock_descriptor` class being representative of the class utilised for descriptors in Clang:

```C++
struct mock_descriptor {
long int x;
std::byte x1, x2, x3, x4;
void *pointer;
long int lx[1][3];
};

int main() {
mock_descriptor data;
#pragma omp target map(tofrom: data, data.pointer[:upper_bound])
{
do something...
}

return 0;
}
```
kiranchandramohan marked this conversation as resolved.
Show resolved Hide resolved

In the above, we have to map both the containing structure, with its non-pointer members and the
data pointed to by the pointer contained within the structure to appropriately access the data. This
is effectively what is done with descriptor types for the time being. Other pointers that are part
of the descriptor container such as the addendum should also be treated as the data pointer is
treated.

Currently, Flang will lower these descriptor types in the OpenMP lowering (lower/OpenMP.cpp) similarly
to all other map types, generating an omp.MapInfoOp containing relevant information required for lowering
the OpenMP dialect to LLVM-IR during the final stages of the MLIR lowering. However, after
the lowering to FIR/HLFIR has been performed an OpenMP dialect specific pass for Fortran,
`OMPDescriptorMapInfoGenPass` (Optimizer/OMPDescriptorMapInfoGen.cpp) will expand the
`omp.MapInfoOp`'s containing descriptors (which currently will be a `BoxType` or `BoxAddrOp`) into multiple
mappings, with one extra per pointer member in the descriptor that is supported on top of the original
descriptor map operation. These pointers members are linked to the parent descriptor by adding them to
the member field of the original descriptor map operation, they are then inserted into the relevant map
owning operation's (`omp.TargetOp`, `omp.DataOp` etc.) map operand list and in cases where the owning operation
is `IsolatedFromAbove`, it also inserts them as `BlockArgs` to canonicalize the mappings and simplify lowering.

An example transformation by the `OMPDescriptorMapInfoGenPass`:

```

...
%12 = omp.map_info var_ptr(%1#1 : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>, !fir.box<!fir.ptr<!fir.array<?xi32>>>) map_clauses(tofrom) capture(ByRef) bounds(%11) -> !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>> {name = "arg_alloc"}
...
omp.target map_entries(%12 -> %arg1, %13 -> %arg2 : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>, !fir.ref<i32>) {
^bb0(%arg1: !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>, %arg2: !fir.ref<i32>):
...

====>

...
%12 = fir.box_offset %1#1 base_addr : (!fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>) -> !fir.llvm_ptr<!fir.ref<!fir.array<?xi32>>>
%13 = omp.map_info var_ptr(%1#1 : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>, !fir.array<?xi32>) var_ptr_ptr(%12 : !fir.llvm_ptr<!fir.ref<!fir.array<?xi32>>>) map_clauses(tofrom) capture(ByRef) bounds(%11) -> !fir.llvm_ptr<!fir.ref<!fir.array<?xi32>>> {name = ""}
%14 = omp.map_info var_ptr(%1#1 : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>, !fir.box<!fir.ptr<!fir.array<?xi32>>>) map_clauses(tofrom) capture(ByRef) members(%13 : !fir.llvm_ptr<!fir.ref<!fir.array<?xi32>>>) -> !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>> {name = "arg_alloc"}
...
omp.target map_entries(%13 -> %arg1, %14 -> %arg2, %15 -> %arg3 : !fir.llvm_ptr<!fir.ref<!fir.array<?xi32>>>, !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>, !fir.ref<i32>) {
^bb0(%arg1: !fir.llvm_ptr<!fir.ref<!fir.array<?xi32>>>, %arg2: !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>, %arg3: !fir.ref<i32>):
...

```

In later stages of the compilation flow when the OpenMP dialect is being lowered to LLVM-IR these descriptor
mappings are treated as if they were structure mappings with explicit member maps on the same directive as
their parent was mapped.

This implementation utilises the member field of the `map_info` operation to indicate that the pointer
descriptor elements which are contained in their own `map_info` operation are part of their respective
parent descriptor. This allows the descriptor containing the descriptor pointer member to be mapped
as a composite entity during lowering, with the correct mappings being generated to tie them together,
allowing the OpenMP runtime to map them correctly, attaching the pointer member to the parent
structure so it can be accessed during execution. If we opt to not treat the descriptor as a single
entity we have issues with the member being correctly attached to the parent and being accessible,
this can cause runtime segfaults on the device when we try to access the data through the parent. It
may be possible to avoid this member mapping, treating them as individual entities, but treating a
composite mapping as an individual mapping could lead to problems such as the runtime taking
liberties with the mapping it usually wouldn't if it knew they were linked, we would also have to
be careful to maintian the correct order of mappings as we lower, if we misorder the maps, it'd be
possible to overwrite already written data, e.g. if we write the descriptor data pointer first, and
then the containing descriptor, we would overwrite the descriptor data pointer with the incorrect
address.

This method is generic in the sense that the OpenMP dialect doesn't need to understand that it is mapping a
Fortran type containing a descriptor, it just thinks it's a record type from either Fortran or C++. However,
it is a little rigid in how the descriptor mappings are handled as there is no specialisation or possibility
to specialise the mappings for possible edge cases without polluting the dialect or lowering with further
knowledge of Fortran and the FIR dialect.

# OpenMP dialect differences from OpenACC dialect

The descriptor mapping for OpenMP currently works differently to the planned direction for OpenACC, however,
it is possible and would likely be ideal to align the method with OpenACC in the future.

Currently the OpenMP specification is less descriptive and has less stringent rules around descriptor based
types so does not require as complex a set of descriptor management rules as OpenACC (although, in certain
cases for the interim adopting OpenACC's rules where it makes sense could be useful). To handle the more
complex descriptor mapping rules OpenACC has opted to utilise a more runtime oriented approach, where
specialized runtime functions for handling descriptor mapping for OpenACC are created and these runtime
function handles are attatched to a special OpenACC dialect operation. When this operation is lowered it
will lower to the attatched OpenACC descriptor mapping runtime function. This sounds like it will work
(no implementation yet) similarly to some of the existing HLFIR operations which optionally lower to
Fortran runtime calls.

This methodology described by OpenACC which utilises runtime functions to handle specialised mappings allows
more flexibility as a significant amount of the mapping logic can be moved into the runtime from the compiler.
It also allows specialisation of the mapping for fortran specific types. This may be a desireable approach
to take for OpenMP in the future, in particular if we find need to specialise mapping further for
descriptors or other Fortran types. However, for the moment the currently chosen implementation for OpenMP
appears sufficient as far as the OpenMP specification and current testing can show.
26 changes: 26 additions & 0 deletions flang/include/flang/Optimizer/CodeGen/CodeGenOpenMP.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
//===------- Optimizer/CodeGen/CodeGenOpenMP.h - OpenMP codegen -*- C++ -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#ifndef FORTRAN_OPTIMIZER_CODEGEN_CODEGENOPENMP_H
#define FORTRAN_OPTIMIZER_CODEGEN_CODEGENOPENMP_H

#include "mlir/Pass/Pass.h"
#include "mlir/Pass/PassRegistry.h"

namespace fir {
class LLVMTypeConverter;

/// Specialised conversion patterns of OpenMP operations for FIR to LLVM
/// dialect, utilised in cases where the default OpenMP dialect handling cannot
/// handle all cases for intermingled fir types and operations.
void populateOpenMPFIRToLLVMConversionPatterns(
LLVMTypeConverter &converter, mlir::RewritePatternSet &patterns);

} // namespace fir

#endif // FORTRAN_OPTIMIZER_CODEGEN_CODEGENOPENMP_H
3 changes: 3 additions & 0 deletions flang/include/flang/Optimizer/Dialect/FIRType.h
Original file line number Diff line number Diff line change
Expand Up @@ -321,6 +321,9 @@ bool isBoxNone(mlir::Type ty);
/// e.g. !fir.box<!fir.type<derived>>
bool isBoxedRecordType(mlir::Type ty);

/// Return true iff `ty` is a type that contains descriptor information.
bool isTypeWithDescriptor(mlir::Type ty);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given this function might not see much use outside of the code added in this PR. Perhaps, we should have it as a local function/lambda instead?

If you feel like this is useful to have standalone, then we can keep it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's used in a few different files in this case and I believe I was asked to make a utility function for this check earlier in this review, so I'd prefer to keep it as a utility function for the time being! Likely quite helpful if what classifies a descriptor type in Flang ever changes as well and it hopefully helps to make it clear what a descriptor type is represented as for newer implementors.


/// Return true iff `ty` is a scalar boxed record type.
/// e.g. !fir.box<!fir.type<derived>>
/// !fir.box<!fir.heap<!fir.type<derived>>>
Expand Down
1 change: 1 addition & 0 deletions flang/include/flang/Optimizer/Transforms/Passes.h
Original file line number Diff line number Diff line change
Expand Up @@ -76,6 +76,7 @@ std::unique_ptr<mlir::Pass>
createAlgebraicSimplificationPass(const mlir::GreedyRewriteConfig &config);
std::unique_ptr<mlir::Pass> createPolymorphicOpConversionPass();

std::unique_ptr<mlir::Pass> createOMPDescriptorMapInfoGenPass();
std::unique_ptr<mlir::Pass> createOMPFunctionFilteringPass();
std::unique_ptr<mlir::OperationPass<mlir::ModuleOp>>
createOMPMarkDeclareTargetPass();
Expand Down
12 changes: 12 additions & 0 deletions flang/include/flang/Optimizer/Transforms/Passes.td
Original file line number Diff line number Diff line change
Expand Up @@ -318,6 +318,18 @@ def LoopVersioning : Pass<"loop-versioning", "mlir::func::FuncOp"> {
let dependentDialects = [ "fir::FIROpsDialect" ];
}

def OMPDescriptorMapInfoGenPass
: Pass<"omp-descriptor-map-info-gen", "mlir::func::FuncOp"> {
let summary = "expands OpenMP MapInfo operations containing descriptors";
let description = [{
Expands MapInfo operations containing descriptor types into multiple
MapInfo's for each pointer element in the descriptor that requires
explicit individual mapping by the OpenMP runtime.
}];
let constructor = "::fir::createOMPDescriptorMapInfoGenPass()";
let dependentDialects = ["mlir::omp::OpenMPDialect"];
}

def OMPMarkDeclareTargetPass
: Pass<"omp-mark-declare-target", "mlir::ModuleOp"> {
let summary = "Marks all functions called by an OpenMP declare target function as declare target";
Expand Down
1 change: 1 addition & 0 deletions flang/include/flang/Tools/CLOptions.inc
Original file line number Diff line number Diff line change
Expand Up @@ -274,6 +274,7 @@ inline void createHLFIRToFIRPassPipeline(
/// rather than the host device.
inline void createOpenMPFIRPassPipeline(
mlir::PassManager &pm, bool isTargetDevice) {
pm.addPass(fir::createOMPDescriptorMapInfoGenPass());
pm.addPass(fir::createOMPMarkDeclareTargetPass());
if (isTargetDevice)
pm.addPass(fir::createOMPFunctionFilteringPass());
Expand Down
75 changes: 48 additions & 27 deletions flang/lib/Lower/OpenMP.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1821,27 +1821,25 @@ bool ClauseProcessor::processLink(

static mlir::omp::MapInfoOp
createMapInfoOp(fir::FirOpBuilder &builder, mlir::Location loc,
mlir::Value baseAddr, std::stringstream &name,
mlir::SmallVector<mlir::Value> bounds, uint64_t mapType,
mlir::omp::VariableCaptureKind mapCaptureType,
mlir::Type retTy) {
mlir::Value varPtr, varPtrPtr;
mlir::TypeAttr varType;

mlir::Value baseAddr, mlir::Value varPtrPtr, std::string name,
mlir::SmallVector<mlir::Value> bounds,
mlir::SmallVector<mlir::Value> members, uint64_t mapType,
mlir::omp::VariableCaptureKind mapCaptureType, mlir::Type retTy,
bool isVal = false) {
if (auto boxTy = baseAddr.getType().dyn_cast<fir::BaseBoxType>()) {
baseAddr = builder.create<fir::BoxAddrOp>(loc, baseAddr);
retTy = baseAddr.getType();
}

varPtr = baseAddr;
varType = mlir::TypeAttr::get(
mlir::TypeAttr varType = mlir::TypeAttr::get(
llvm::cast<mlir::omp::PointerLikeType>(retTy).getElementType());

mlir::omp::MapInfoOp op = builder.create<mlir::omp::MapInfoOp>(
loc, retTy, varPtr, varType, varPtrPtr, bounds,
loc, retTy, baseAddr, varType, varPtrPtr, members, bounds,
builder.getIntegerAttr(builder.getIntegerType(64, false), mapType),
builder.getAttr<mlir::omp::VariableCaptureKindAttr>(mapCaptureType),
builder.getStringAttr(name.str()));
builder.getStringAttr(name));

return op;
}

Expand Down Expand Up @@ -1904,28 +1902,37 @@ bool ClauseProcessor::processMap(
std::get<Fortran::parser::OmpObjectList>(mapClause->v.t).v) {
llvm::SmallVector<mlir::Value> bounds;
std::stringstream asFortran;

Fortran::lower::AddrAndBoundsInfo info =
Fortran::lower::gatherDataOperandAddrAndBounds<
Fortran::parser::OmpObject, mlir::omp::DataBoundsOp,
mlir::omp::DataBoundsType>(
converter, firOpBuilder, semanticsContext, stmtCtx, ompObject,
clauseLocation, asFortran, bounds, treatIndexAsSection);

auto origSymbol =
converter.getSymbolAddress(*getOmpObjectSymbol(ompObject));
mlir::Value symAddr = info.addr;
if (origSymbol && fir::isTypeWithDescriptor(origSymbol.getType()))
symAddr = origSymbol;

// Explicit map captures are captured ByRef by default,
// optimisation passes may alter this to ByCopy or other capture
// types to optimise
mlir::Value mapOp = createMapInfoOp(
firOpBuilder, clauseLocation, info.addr, asFortran, bounds,
firOpBuilder, clauseLocation, symAddr, mlir::Value{},
asFortran.str(), bounds, {},
static_cast<
std::underlying_type_t<llvm::omp::OpenMPOffloadMappingFlags>>(
mapTypeBits),
mlir::omp::VariableCaptureKind::ByRef, info.addr.getType());
mlir::omp::VariableCaptureKind::ByRef, symAddr.getType());

mapOperands.push_back(mapOp);
if (mapSymTypes)
mapSymTypes->push_back(info.addr.getType());
mapSymTypes->push_back(symAddr.getType());
if (mapSymLocs)
mapSymLocs->push_back(info.addr.getLoc());
mapSymLocs->push_back(symAddr.getLoc());

if (mapSymbols)
mapSymbols->push_back(getOmpObjectSymbol(ompObject));
}
Expand Down Expand Up @@ -2032,12 +2039,22 @@ bool ClauseProcessor::processMotionClauses(
converter, firOpBuilder, semanticsContext, stmtCtx, ompObject,
clauseLocation, asFortran, bounds, treatIndexAsSection);

auto origSymbol =
converter.getSymbolAddress(*getOmpObjectSymbol(ompObject));
mlir::Value symAddr = info.addr;
if (origSymbol && fir::isTypeWithDescriptor(origSymbol.getType()))
symAddr = origSymbol;

// Explicit map captures are captured ByRef by default,
// optimisation passes may alter this to ByCopy or other capture
// types to optimise
mlir::Value mapOp = createMapInfoOp(
firOpBuilder, clauseLocation, info.addr, asFortran, bounds,
firOpBuilder, clauseLocation, symAddr, mlir::Value{},
asFortran.str(), bounds, {},
static_cast<
std::underlying_type_t<llvm::omp::OpenMPOffloadMappingFlags>>(
mapTypeBits),
mlir::omp::VariableCaptureKind::ByRef, info.addr.getType());
mlir::omp::VariableCaptureKind::ByRef, symAddr.getType());

mapOperands.push_back(mapOp);
}
Expand Down Expand Up @@ -2812,7 +2829,8 @@ static void genBodyOfTargetOp(
std::stringstream name;
firOpBuilder.setInsertionPoint(targetOp);
mlir::Value mapOp = createMapInfoOp(
firOpBuilder, copyVal.getLoc(), copyVal, name, bounds,
firOpBuilder, copyVal.getLoc(), copyVal, mlir::Value{}, name.str(),
bounds, llvm::SmallVector<mlir::Value>{},
static_cast<
std::underlying_type_t<llvm::omp::OpenMPOffloadMappingFlags>>(
llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_IMPLICIT),
Expand Down Expand Up @@ -2934,18 +2952,21 @@ genTargetOp(Fortran::lower::AbstractConverter &converter,
llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_IMPLICIT;
mlir::omp::VariableCaptureKind captureKind =
mlir::omp::VariableCaptureKind::ByRef;
if (auto refType = baseOp.getType().dyn_cast<fir::ReferenceType>()) {
auto eleType = refType.getElementType();
if (fir::isa_trivial(eleType) || fir::isa_char(eleType)) {
captureKind = mlir::omp::VariableCaptureKind::ByCopy;
} else if (!fir::isa_builtin_cptr_type(eleType)) {
mapFlag |= llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_TO;
mapFlag |= llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_FROM;
}

mlir::Type eleType = baseOp.getType();
if (auto refType = baseOp.getType().dyn_cast<fir::ReferenceType>())
eleType = refType.getElementType();

if (fir::isa_trivial(eleType) || fir::isa_char(eleType)) {
captureKind = mlir::omp::VariableCaptureKind::ByCopy;
} else if (!fir::isa_builtin_cptr_type(eleType)) {
mapFlag |= llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_TO;
mapFlag |= llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_FROM;
}

mlir::Value mapOp = createMapInfoOp(
converter.getFirOpBuilder(), baseOp.getLoc(), baseOp, name, bounds,
converter.getFirOpBuilder(), baseOp.getLoc(), baseOp, mlir::Value{},
name.str(), bounds, {},
static_cast<
std::underlying_type_t<llvm::omp::OpenMPOffloadMappingFlags>>(
mapFlag),
Expand Down
1 change: 1 addition & 0 deletions flang/lib/Optimizer/CodeGen/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@ add_flang_library(FIRCodeGen
BoxedProcedure.cpp
CGOps.cpp
CodeGen.cpp
CodeGenOpenMP.cpp
PreCGRewrite.cpp
TBAABuilder.cpp
Target.cpp
Expand Down
6 changes: 6 additions & 0 deletions flang/lib/Optimizer/CodeGen/CodeGen.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@
#include "flang/Optimizer/CodeGen/CodeGen.h"

#include "CGOps.h"
#include "flang/Optimizer/CodeGen/CodeGenOpenMP.h"
#include "flang/Optimizer/Dialect/FIRAttr.h"
#include "flang/Optimizer/Dialect/FIROps.h"
#include "flang/Optimizer/Dialect/FIRType.h"
Expand Down Expand Up @@ -3959,6 +3960,11 @@ class FIRToLLVMLowering
mlir::populateMathToLibmConversionPatterns(pattern);
mlir::populateComplexToLLVMConversionPatterns(typeConverter, pattern);
mlir::populateVectorToLLVMConversionPatterns(typeConverter, pattern);

// Flang specific overloads for OpenMP operations, to allow for special
// handling of things like Box types.
fir::populateOpenMPFIRToLLVMConversionPatterns(typeConverter, pattern);

mlir::ConversionTarget target{*context};
target.addLegalDialect<mlir::LLVM::LLVMDialect>();
// The OpenMP dialect is legal for Operations without regions, for those
Expand Down