Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[OpenMP][Flang] Add "IsolatedFromAbove" trait to omp.target #67164

Closed
wants to merge 1 commit into from

Conversation

TIFitis
Copy link
Member

@TIFitis TIFitis commented Sep 22, 2023

This patch adds the PFT lowering changes required for adding the IsolatedFromAbove trait to omp.target.

Key Changes:

  • Add IsolatedFromAbove trait to target op in MLIR.
  • Main reason for this change is to prevent CSE and other similar optimisations from crossing region boundaries for target operations. The link below has the discourse discussion surrounding this issue.
  • Move implicit operand capturing to the PFT lowering stage.
  • Implicit operands are first added as implicitly captured map_operands with their map_types set accordingly to indicate this. Later, all map_operands including implicit ones are added as block arguments.
  • Remove implicit attribute from the MapInfoOp. This information is already captured by the map_type.
  • The custom printer and parser for the map_types have been updated to show the implicit and literal map_types.
  • Update related tests.
  • This fixes [MLIR][OpenMP] Optimizations/analyses cross omp.target op region boundaries. #63555.
  • This fixes [openmp][flang] map offloading commonblock failure #70488.

Related discussion: https://discourse.llvm.org/t/rfc-prevent-cse-from-removing-expressions-inside-some-non-isolatedfromabove-operation-regions/73150

@llvmbot
Copy link
Collaborator

llvmbot commented Sep 22, 2023

@llvm/pr-subscribers-flang-driver
@llvm/pr-subscribers-mlir
@llvm/pr-subscribers-mlir-llvm
@llvm/pr-subscribers-flang-fir-hlfir
@llvm/pr-subscribers-mlir-openmp

@llvm/pr-subscribers-flang-openmp

Changes

This patch adds the IsolatedFromAbove and OutlineableOpenMPOpInterface traits along with supporting changes to the OpenMP Target directive in order to prevent CSE optimisations from crossing region boundaries for target regions.

Related discussion: https://discourse.llvm.org/t/rfc-prevent-cse-from-removing-expressions-inside-some-non-isolatedfromabove-operation-regions/73150


Patch is 43.55 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/67164.diff

20 Files Affected:

  • (modified) flang/include/flang/Optimizer/Transforms/Passes.h (-2)
  • (modified) flang/include/flang/Optimizer/Transforms/Passes.td (+3-15)
  • (modified) flang/include/flang/Tools/CLOptions.inc (+1-3)
  • (modified) flang/lib/Lower/OpenMP.cpp (+112-15)
  • (modified) flang/lib/Optimizer/Transforms/CMakeLists.txt (-1)
  • (modified) flang/test/Fir/convert-to-llvm-openmp-and-fir.fir (+6-4)
  • (modified) flang/test/Lower/OpenMP/FIR/function-filtering-2.f90 (-2)
  • (modified) flang/test/Lower/OpenMP/FIR/function-filtering.f90 (-5)
  • (modified) flang/test/Lower/OpenMP/FIR/location.f90 (+1-1)
  • (removed) flang/test/Lower/OpenMP/FIR/omp-target-early-outlining.f90 (-89)
  • (modified) flang/test/Lower/OpenMP/FIR/target.f90 (+20-16)
  • (modified) mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td (+2-6)
  • (modified) mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp (+12)
  • (modified) mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (+20-7)
  • (modified) mlir/test/Conversion/OpenMPToLLVM/convert-to-llvmir.mlir (+9-5)
  • (modified) mlir/test/Dialect/OpenMP/canonicalize.mlir (+3-2)
  • (modified) mlir/test/Target/LLVMIR/omptarget-region-device-llvm.mlir (+4-3)
  • (modified) mlir/test/Target/LLVMIR/omptarget-region-llvm-target-device.mlir (+4-3)
  • (modified) mlir/test/Target/LLVMIR/omptarget-region-llvm.mlir (+4-3)
  • (modified) mlir/test/Target/LLVMIR/omptarget-region-parallel-llvm.mlir (+4-3)
diff --git a/flang/include/flang/Optimizer/Transforms/Passes.h b/flang/include/flang/Optimizer/Transforms/Passes.h
index 8aeb3e373298e88..460aeafbe1a7e19 100644
--- a/flang/include/flang/Optimizer/Transforms/Passes.h
+++ b/flang/include/flang/Optimizer/Transforms/Passes.h
@@ -74,8 +74,6 @@ std::unique_ptr<mlir::Pass>
 createAlgebraicSimplificationPass(const mlir::GreedyRewriteConfig &config);
 std::unique_ptr<mlir::Pass> createPolymorphicOpConversionPass();
 
-std::unique_ptr<mlir::OperationPass<mlir::ModuleOp>>
-createOMPEarlyOutliningPass();
 std::unique_ptr<mlir::Pass> createOMPFunctionFilteringPass();
 std::unique_ptr<mlir::OperationPass<mlir::ModuleOp>>
 createOMPMarkDeclareTargetPass();
diff --git a/flang/include/flang/Optimizer/Transforms/Passes.td b/flang/include/flang/Optimizer/Transforms/Passes.td
index 9474edf13ce4639..dd4d99a1060af90 100644
--- a/flang/include/flang/Optimizer/Transforms/Passes.td
+++ b/flang/include/flang/Optimizer/Transforms/Passes.td
@@ -16,7 +16,7 @@
 
 include "mlir/Pass/PassBase.td"
 
-class AbstractResultOptBase<string optExt, string operation> 
+class AbstractResultOptBase<string optExt, string operation>
   : Pass<"abstract-result-on-" # optExt # "-opt", operation> {
   let summary = "Convert fir.array, fir.box and fir.rec function result to "
                 "function argument";
@@ -277,8 +277,8 @@ def PolymorphicOpConversion : Pass<"fir-polymorphic-op", "::mlir::func::FuncOp">
   let summary =
     "Simplify operations on polymorphic types";
   let description = [{
-    This pass breaks up the lowering of operations on polymorphic types by 
-    introducing an intermediate FIR level that simplifies code geneation. 
+    This pass breaks up the lowering of operations on polymorphic types by
+    introducing an intermediate FIR level that simplifies code geneation.
   }];
   let constructor = "::fir::createPolymorphicOpConversionPass()";
   let dependentDialects = [
@@ -298,18 +298,6 @@ def LoopVersioning : Pass<"loop-versioning", "mlir::func::FuncOp"> {
   let dependentDialects = [ "fir::FIROpsDialect" ];
 }
 
-def OMPEarlyOutliningPass
-    : Pass<"omp-early-target-outlining", "mlir::ModuleOp"> {
-  let summary = "Outlines all target ops into separate functions";
-  let description = [{
-    This pass outlines all omp.target operations into individual functions.
-    It is invoked in the front end after the initial FIR has been constructed.
-    This pass is only needed when compiling for the target device to prevent
-    the optimizer to perform transforms across target region boundaries.
-  }];
-  let constructor = "::fir::createOMPEarlyOutliningPass()";
-}
-
 def OMPMarkDeclareTargetPass
     : Pass<"omp-mark-declare-target", "mlir::ModuleOp"> {
   let summary = "Marks all functions called by an OpenMP declare target function as declare target";
diff --git a/flang/include/flang/Tools/CLOptions.inc b/flang/include/flang/Tools/CLOptions.inc
index 616d9ddc066a75d..5458eb3187a24e4 100644
--- a/flang/include/flang/Tools/CLOptions.inc
+++ b/flang/include/flang/Tools/CLOptions.inc
@@ -266,10 +266,8 @@ inline void createHLFIRToFIRPassPipeline(
 inline void createOpenMPFIRPassPipeline(
     mlir::PassManager &pm, bool isTargetDevice) {
   pm.addPass(fir::createOMPMarkDeclareTargetPass());
-  if (isTargetDevice) {
-    pm.addPass(fir::createOMPEarlyOutliningPass());
+  if (isTargetDevice)
     pm.addPass(fir::createOMPFunctionFilteringPass());
-  }
 }
 
 #if !defined(FLANG_EXCLUDE_CODEGEN)
diff --git a/flang/lib/Lower/OpenMP.cpp b/flang/lib/Lower/OpenMP.cpp
index 5f5e968eaaa6414..c369cba4255d4f4 100644
--- a/flang/lib/Lower/OpenMP.cpp
+++ b/flang/lib/Lower/OpenMP.cpp
@@ -538,7 +538,11 @@ class ClauseProcessor {
                   const llvm::omp::Directive &directive,
                   Fortran::semantics::SemanticsContext &semanticsContext,
                   Fortran::lower::StatementContext &stmtCtx,
-                  llvm::SmallVectorImpl<mlir::Value> &mapOperands) const;
+                  llvm::SmallVectorImpl<mlir::Value> &mapOperands,
+                  llvm::SmallVectorImpl<mlir::Type> *mapSymTypes = nullptr,
+                  llvm::SmallVectorImpl<mlir::Location> *mapSymLocs = nullptr,
+                  llvm::SmallVectorImpl<const Fortran::semantics::Symbol *>
+                      *mapSymbols = nullptr) const;
   bool processReduction(
       mlir::Location currentLocation,
       llvm::SmallVectorImpl<mlir::Value> &reductionVars,
@@ -1665,7 +1669,7 @@ static mlir::omp::MapInfoOp
 createMapInfoOp(fir::FirOpBuilder &builder, mlir::Location loc,
                 mlir::Value baseAddr, std::stringstream &name,
                 mlir::SmallVector<mlir::Value> bounds, uint64_t mapType,
-                mlir::omp::VariableCaptureKind mapCaptureType, bool implicit,
+                mlir::omp::VariableCaptureKind mapCaptureType,
                 mlir::Type retTy) {
   mlir::Value varPtrPtr;
   if (auto boxTy = baseAddr.getType().dyn_cast<fir::BaseBoxType>()) {
@@ -1676,7 +1680,6 @@ createMapInfoOp(fir::FirOpBuilder &builder, mlir::Location loc,
   mlir::omp::MapInfoOp op =
       builder.create<mlir::omp::MapInfoOp>(loc, retTy, baseAddr);
   op.setNameAttr(builder.getStringAttr(name.str()));
-  op.setImplicit(implicit);
   op.setMapType(mapType);
   op.setMapCaptureType(mapCaptureType);
 
@@ -1695,7 +1698,11 @@ bool ClauseProcessor::processMap(
     mlir::Location currentLocation, const llvm::omp::Directive &directive,
     Fortran::semantics::SemanticsContext &semanticsContext,
     Fortran::lower::StatementContext &stmtCtx,
-    llvm::SmallVectorImpl<mlir::Value> &mapOperands) const {
+    llvm::SmallVectorImpl<mlir::Value> &mapOperands,
+    llvm::SmallVectorImpl<mlir::Type> *mapSymTypes,
+    llvm::SmallVectorImpl<mlir::Location> *mapSymLocs,
+    llvm::SmallVectorImpl<const Fortran::semantics::Symbol *> *mapSymbols)
+    const {
   fir::FirOpBuilder &firOpBuilder = converter.getFirOpBuilder();
   return findRepeatableClause<ClauseTy::Map>(
       [&](const ClauseTy::Map *mapClause,
@@ -1755,13 +1762,20 @@ bool ClauseProcessor::processMap(
           // Explicit map captures are captured ByRef by default,
           // optimisation passes may alter this to ByCopy or other capture
           // types to optimise
-          mapOperands.push_back(createMapInfoOp(
+          mlir::Value mapOp = createMapInfoOp(
               firOpBuilder, clauseLocation, baseAddr, asFortran, bounds,
               static_cast<
                   std::underlying_type_t<llvm::omp::OpenMPOffloadMappingFlags>>(
                   mapTypeBits),
-              mlir::omp::VariableCaptureKind::ByRef, false,
-              baseAddr.getType()));
+              mlir::omp::VariableCaptureKind::ByRef, baseAddr.getType());
+
+          mapOperands.push_back(mapOp);
+          if (mapSymTypes)
+            mapSymTypes->push_back(mapOp.getType());
+          if (mapSymLocs)
+            mapSymLocs->push_back(mapOp.getLoc());
+          if (mapSymbols)
+            mapSymbols->push_back(getOmpObjectSymbol(ompObject));
         }
       });
 }
@@ -2142,7 +2156,7 @@ static void createBodyOfOp(
   }
 }
 
-static void createBodyOfTargetDataOp(
+static void genBodyOfTargetDataOp(
     Fortran::lower::AbstractConverter &converter, mlir::omp::DataOp &dataOp,
     const llvm::SmallVector<mlir::Type> &useDeviceTypes,
     const llvm::SmallVector<mlir::Location> &useDeviceLocs,
@@ -2356,8 +2370,8 @@ genDataOp(Fortran::lower::AbstractConverter &converter,
   auto dataOp = converter.getFirOpBuilder().create<mlir::omp::DataOp>(
       currentLocation, ifClauseOperand, deviceOperand, devicePtrOperands,
       deviceAddrOperands, mapOperands);
-  createBodyOfTargetDataOp(converter, dataOp, useDeviceTypes, useDeviceLocs,
-                           useDeviceSymbols, currentLocation);
+  genBodyOfTargetDataOp(converter, dataOp, useDeviceTypes, useDeviceLocs,
+                        useDeviceSymbols, currentLocation);
   return dataOp;
 }
 
@@ -2400,6 +2414,52 @@ genEnterExitDataOp(Fortran::lower::AbstractConverter &converter,
                                    deviceOperand, nowaitAttr, mapOperands);
 }
 
+static void genBodyOfTargetOp(
+    Fortran::lower::AbstractConverter &converter, mlir::omp::TargetOp &targetOp,
+    const llvm::SmallVector<mlir::Type> &mapSymTypes,
+    const llvm::SmallVector<mlir::Location> &mapSymLocs,
+    const llvm::SmallVector<const Fortran::semantics::Symbol *> &mapSymbols,
+    const mlir::Location &currentLocation) {
+  assert(mapSymTypes.size() == mapSymLocs.size() &&
+         mapSymTypes.size() == mapSymbols.size());
+
+  fir::FirOpBuilder &firOpBuilder = converter.getFirOpBuilder();
+  mlir::Region &region = targetOp.getRegion();
+
+  firOpBuilder.createBlock(&region, {}, mapSymTypes, mapSymLocs);
+  firOpBuilder.create<mlir::omp::TerminatorOp>(currentLocation);
+  firOpBuilder.setInsertionPointToStart(&region.front());
+
+  unsigned argIndex = 0;
+  for (const Fortran::semantics::Symbol *sym : mapSymbols) {
+    const mlir::BlockArgument &arg = region.front().getArgument(argIndex);
+    fir::ExtendedValue extVal = converter.getSymbolExtendedValue(*sym);
+    mlir::Value val = fir::getBase(arg);
+    extVal.match(
+        [&](const fir::BoxValue &v) {
+          converter.bindSymbol(*sym, fir::BoxValue(val, v.getLBounds(),
+                                                   v.getExplicitParameters(),
+                                                   v.getExplicitExtents()));
+        },
+        [&](const fir::MutableBoxValue &v) {
+          converter.bindSymbol(*sym,
+                               fir::MutableBoxValue(val, v.getLBounds(),
+                                                    v.getMutableProperties()));
+        },
+        [&](const fir::ArrayBoxValue &v) {
+          converter.bindSymbol(*sym, fir::ArrayBoxValue(val, v.getExtents(),
+                                                        v.getLBounds(),
+                                                        v.getSourceBox()));
+        },
+        [&](const fir::UnboxedValue &v) { converter.bindSymbol(*sym, val); },
+        [&](const auto &) {
+          TODO(converter.getCurrentLocation(),
+               "target map clause operand unsupported type");
+        });
+    argIndex++;
+  }
+}
+
 static mlir::omp::TargetOp
 genTargetOp(Fortran::lower::AbstractConverter &converter,
             Fortran::lower::pft::Evaluation &eval,
@@ -2411,6 +2471,9 @@ genTargetOp(Fortran::lower::AbstractConverter &converter,
   mlir::Value ifClauseOperand, deviceOperand, threadLimitOperand;
   mlir::UnitAttr nowaitAttr;
   llvm::SmallVector<mlir::Value> mapOperands;
+  llvm::SmallVector<mlir::Type> mapSymTypes;
+  llvm::SmallVector<mlir::Location> mapSymLocs;
+  llvm::SmallVector<const Fortran::semantics::Symbol *> mapSymbols;
 
   ClauseProcessor cp(converter, clauseList);
   cp.processIf(stmtCtx,
@@ -2420,7 +2483,7 @@ genTargetOp(Fortran::lower::AbstractConverter &converter,
   cp.processThreadLimit(stmtCtx, threadLimitOperand);
   cp.processNowait(nowaitAttr);
   cp.processMap(currentLocation, directive, semanticsContext, stmtCtx,
-                mapOperands);
+                mapOperands, &mapSymTypes, &mapSymLocs, &mapSymbols);
   cp.processTODO<Fortran::parser::OmpClause::Private,
                  Fortran::parser::OmpClause::Depend,
                  Fortran::parser::OmpClause::Firstprivate,
@@ -2433,10 +2496,44 @@ genTargetOp(Fortran::lower::AbstractConverter &converter,
                  Fortran::parser::OmpClause::Defaultmap>(
       currentLocation, llvm::omp::Directive::OMPD_target);
 
-  return genOpWithBody<mlir::omp::TargetOp>(
-      converter, eval, currentLocation, outerCombined, &clauseList,
-      ifClauseOperand, deviceOperand, threadLimitOperand, nowaitAttr,
-      mapOperands);
+  auto captureImplicitMap = [&](const Fortran::semantics::Symbol &sym) {
+    if (llvm::find(mapSymbols, &sym) == mapSymbols.end()) {
+      mlir::Value baseOp = converter.getSymbolAddress(sym);
+      if (!baseOp)
+        if (const auto *details = sym.template detailsIf<
+                                  Fortran::semantics::HostAssocDetails>()) {
+          baseOp = converter.getSymbolAddress(details->symbol());
+          converter.copySymbolBinding(details->symbol(), sym);
+        }
+
+      if (baseOp) {
+        llvm::SmallVector<mlir::Value> bounds;
+        std::stringstream name;
+        name << sym.name().ToString();
+        mlir::Value mapOp = createMapInfoOp(
+            converter.getFirOpBuilder(), baseOp.getLoc(), baseOp, name, bounds,
+            static_cast<
+                std::underlying_type_t<llvm::omp::OpenMPOffloadMappingFlags>>(
+                llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_LITERAL |
+                llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_IMPLICIT),
+            mlir::omp::VariableCaptureKind::ByCopy, baseOp.getType());
+        mapOperands.push_back(mapOp);
+        mapSymTypes.push_back(baseOp.getType());
+        mapSymLocs.push_back(baseOp.getLoc());
+        mapSymbols.push_back(&sym);
+      }
+    }
+  };
+  Fortran::lower::pft::visitAllSymbols(eval, captureImplicitMap);
+
+  auto targetOp = converter.getFirOpBuilder().create<mlir::omp::TargetOp>(
+      currentLocation, ifClauseOperand, deviceOperand, threadLimitOperand,
+      nowaitAttr, mapOperands);
+
+  genBodyOfTargetOp(converter, targetOp, mapSymTypes, mapSymLocs, mapSymbols,
+                    currentLocation);
+
+  return targetOp;
 }
 
 static mlir::omp::TeamsOp
diff --git a/flang/lib/Optimizer/Transforms/CMakeLists.txt b/flang/lib/Optimizer/Transforms/CMakeLists.txt
index 3d2b7e5eaeade0a..4035fa9908a7778 100644
--- a/flang/lib/Optimizer/Transforms/CMakeLists.txt
+++ b/flang/lib/Optimizer/Transforms/CMakeLists.txt
@@ -16,7 +16,6 @@ add_flang_library(FIRTransforms
   AddDebugFoundation.cpp
   PolymorphicOpConversion.cpp
   LoopVersioning.cpp
-  OMPEarlyOutlining.cpp
   OMPFunctionFiltering.cpp
   OMPMarkDeclareTarget.cpp
 
diff --git a/flang/test/Fir/convert-to-llvm-openmp-and-fir.fir b/flang/test/Fir/convert-to-llvm-openmp-and-fir.fir
index 501a2bee29321a3..4054b57b02a90f7 100644
--- a/flang/test/Fir/convert-to-llvm-openmp-and-fir.fir
+++ b/flang/test/Fir/convert-to-llvm-openmp-and-fir.fir
@@ -420,7 +420,7 @@ func.func @_QPomp_target_data_empty() {
 
 // CHECK-LABEL:   llvm.func @_QPomp_target_data_empty
 // CHECK: omp.target_data   use_device_addr(%1 : !llvm.ptr<array<1024 x i32>>) {
-// CHECK: }
+// CHECK: } 
 
 // -----
 
@@ -434,11 +434,12 @@ func.func @_QPomp_target() {
   %2 = omp.bounds   lower_bound(%c0 : index) upper_bound(%1 : index) extent(%c512 : index) stride(%c1 : index) start_idx(%c1 : index)
   %3 = omp.map_info var_ptr(%0 : !fir.ref<!fir.array<512xi32>>)   map_clauses(tofrom) capture(ByRef) bounds(%2) -> !fir.ref<!fir.array<512xi32>> {name = "a"}
   omp.target   thread_limit(%c64_i32 : i32) map_entries(%3 : !fir.ref<!fir.array<512xi32>>) {
+    ^bb0(%arg0: !fir.ref<!fir.array<512xi32>>):
     %c10_i32 = arith.constant 10 : i32
     %c1_i64 = arith.constant 1 : i64
     %c1_i64_0 = arith.constant 1 : i64
     %4 = arith.subi %c1_i64, %c1_i64_0 : i64
-    %5 = fir.coordinate_of %0, %4 : (!fir.ref<!fir.array<512xi32>>, i64) -> !fir.ref<i32>
+    %5 = fir.coordinate_of %arg0, %4 : (!fir.ref<!fir.array<512xi32>>, i64) -> !fir.ref<i32>
     fir.store %c10_i32 to %5 : !fir.ref<i32>
     omp.terminator
   }
@@ -456,11 +457,12 @@ func.func @_QPomp_target() {
 // CHECK:           %[[BOUNDS:.*]] = omp.bounds   lower_bound(%[[LOWER]] : i64) upper_bound(%[[UPPER]] : i64) extent(%[[EXTENT]] : i64) stride(%[[STRIDE]] : i64) start_idx(%[[STRIDE]] : i64)
 // CHECK:           %[[MAP:.*]] = omp.map_info var_ptr(%2 : !llvm.ptr<array<512 x i32>>)   map_clauses(tofrom) capture(ByRef) bounds(%[[BOUNDS]]) -> !llvm.ptr<array<512 x i32>> {name = "a"}
 // CHECK:           omp.target   thread_limit(%[[VAL_2]] : i32) map_entries(%[[MAP]] : !llvm.ptr<array<512 x i32>>) {
+// CHECK:           ^bb0(%[[ARG_0:.*]]: !llvm.ptr<array<512 x i32>>):
 // CHECK:             %[[VAL_3:.*]] = llvm.mlir.constant(10 : i32) : i32
 // CHECK:             %[[VAL_4:.*]] = llvm.mlir.constant(1 : i64) : i64
 // CHECK:             %[[VAL_5:.*]] = llvm.mlir.constant(1 : i64) : i64
 // CHECK:             %[[VAL_6:.*]] = llvm.mlir.constant(0 : i64) : i64
-// CHECK:             %[[VAL_7:.*]] = llvm.getelementptr %[[VAL_1]][0, %[[VAL_6]]] : (!llvm.ptr<array<512 x i32>>, i64) -> !llvm.ptr<i32>
+// CHECK:             %[[VAL_7:.*]] = llvm.getelementptr %[[ARG_0]][0, %[[VAL_6]]] : (!llvm.ptr<array<512 x i32>>, i64) -> !llvm.ptr<i32>
 // CHECK:             llvm.store %[[VAL_3]], %[[VAL_7]] : !llvm.ptr<i32>
 // CHECK:             omp.terminator
 // CHECK:           }
@@ -827,4 +829,4 @@ func.func @sub_() {
     omp.terminator
   }
   return
-} 
+}
diff --git a/flang/test/Lower/OpenMP/FIR/function-filtering-2.f90 b/flang/test/Lower/OpenMP/FIR/function-filtering-2.f90
index 8065467504dd27d..68d970d94353643 100644
--- a/flang/test/Lower/OpenMP/FIR/function-filtering-2.f90
+++ b/flang/test/Lower/OpenMP/FIR/function-filtering-2.f90
@@ -26,9 +26,7 @@ subroutine no_declaretarget()
 end subroutine no_declaretarget
 
 ! MLIR-HOST: func.func @{{.*}}main(
-! MLIR-HOST-NOT: func.func @{{.*}}main_omp_outline{{.*}}()
 ! MLIR-DEVICE-NOT: func.func @{{.*}}main(
-! MLIR-DEVICE: func.func @{{.*}}main_omp_outline{{.*}}() attributes {omp.declare_target = #omp.declaretarget<device_type = (host), capture_clause = (to)>, omp.outline_parent_name = "_QQmain"}
 ! MLIR-ALL: return
 
 ! LLVM-HOST: define {{.*}} @{{.*}}main{{.*}}(
diff --git a/flang/test/Lower/OpenMP/FIR/function-filtering.f90 b/flang/test/Lower/OpenMP/FIR/function-filtering.f90
index c0ca351b9b335f0..352f88fc8f32fda 100644
--- a/flang/test/Lower/OpenMP/FIR/function-filtering.f90
+++ b/flang/test/Lower/OpenMP/FIR/function-filtering.f90
@@ -34,14 +34,9 @@ end function host_fn
 
 ! MLIR-HOST: func.func @{{.*}}target_subr(
 ! MLIR-HOST: return
-! MLIR-HOST-NOT: func.func @{{.*}}target_subr_omp_outline_0(
-! MLIR-DEVICE-NOT: func.func @{{.*}}target_subr(
-! MLIR-DEVICE: func.func @{{.*}}target_subr_omp_outline_0(
 ! MLIR-DEVICE: return
 
-! LLVM-ALL-NOT: define {{.*}} @{{.*}}target_subr_omp_outline_0{{.*}}(
 ! LLVM-HOST: define {{.*}} @{{.*}}target_subr{{.*}}(
-! LLVM-DEVICE-NOT: {{.*}} @{{.*}}target_subr{{.*}}(
 ! LLVM-ALL: define {{.*}} @__omp_offloading_{{.*}}_{{.*}}_target_subr__{{.*}}(
 subroutine target_subr(x)
   integer, intent(out) :: x
diff --git a/flang/test/Lower/OpenMP/FIR/location.f90 b/flang/test/Lower/OpenMP/FIR/location.f90
index 0e36e09b19e1942..84bbd1605179262 100644
--- a/flang/test/Lower/OpenMP/FIR/location.f90
+++ b/flang/test/Lower/OpenMP/FIR/location.f90
@@ -17,7 +17,7 @@ subroutine sub_parallel()
 !CHECK-LABEL: sub_target
 subroutine sub_target()
   print *, x
-!CHECK: omp.target  {
+!CHECK: omp.target
   !$omp target
     print *, x
 !CHECK:   omp.terminator loc(#[[TAR_LOC:.*]])
diff --git a/flang/test/Lower/OpenMP/FIR/omp-target-early-outlining.f90 b/flang/test/Lower/OpenMP/FIR/omp-target-early-outlining.f90
deleted file mode 100644
index eac19f889f433cd..000000000000000
--- a/flang/test/Lower/OpenMP/FIR/omp-target-early-outlining.f90
+++ /dev/null
@@ -1,89 +0,0 @@
-!REQUIRES: amdgpu-registered-target
-
-!RUN: %flang_fc1 -triple amdgcn-amd-amdhsa -emit-fir -fopenmp -fopenmp-is-target-device %s -o - | FileCheck %s
-!RUN: %flang_fc1 -triple x86_64-unknown-linux-gnu -emit-fir -fopenmp -fopenmp-is-target-device %s -o - | FileCheck %s
-!RUN: bbc -emit-fir -fopenmp -fopenmp-is-target-device %s -o - | FileCheck %s 
-!RUN: bbc -emit-fir -fopenmp -fopenmp-is-gpu -fopenmp-is-target-device %s -o - | FileCheck %s 
-
-!CHECK: func.func @_QPtarget_function
-
-!CHECK:  func.func @_QPwrite_index_omp_outline_0(%[[ARG0:.*]]: !fir.ref<i32>) attributes {omp.declare_target = #omp.declaretarget<device_type = (host), capture_clause = (to)>, omp.outline_parent_name = "_QPwrite_index"} {
-!CHECK-NEXT: %[[map_info0:.*]] = omp.map_info var_ptr(%[[ARG0]]{{.*}}
-!CHECK-NEXT: omp.target  map_entries(%[[map_info0]]{{.*}} {
-!CHECK: %[[CONSTANT_VALUE_10:.*]] = arith.constant 10 : i32
-!CHECK: fir.store %[[CONSTANT_VALUE_10]] to %[[ARG0]] : !fir.ref<i32>
-!CHECK: omp.terminator
-!CHECK-NEXT: }
-!CHECK-NEXT: return
-
-!CHECK:  func.func @_QPwrite_index_omp_outline_1(%[[ARG1:.*]]: !fir.ref<i32>) attributes {omp.declare_target = #omp.declaretarget<device_type = (host), capture_clause = (to)>, omp.outl...
[truncated]

@kiranchandramohan
Copy link
Contributor

Please provide an explanation of what is being achieved with this patch and what are the major changes.
Separate the MLIR and Flang changes into different Pull Requests if possible.

@TIFitis

This comment was marked as resolved.

@joker-eph
Copy link
Collaborator

The changes have been separated into different commits, let me know if you'd like me to create different PRs for them altogether.

Yes please.

@TIFitis

This comment was marked as resolved.

@TIFitis TIFitis changed the title [OpenMP][MLIR] Add "IsolatedFromAbove" and "OutlineableOpenMPOpInterface" trait to omp.target [OpenMP][Flang] Add "IsolatedFromAbove" and "OutlineableOpenMPOpInterface" trait to omp.target Sep 25, 2023
@TIFitis TIFitis force-pushed the add_traits branch 3 times, most recently from d10fb66 to ec8ff64 Compare September 27, 2023 14:15
@kiranchandramohan
Copy link
Contributor

Add IsolatedFromAbove and OutlineableOpenMPOpInterface traits to target op in MLIR.

The OutlineableOpenMPOpInterface looks like a separate change.

Move implicit operand capturing to the PFT lowering stage.

Could you expand on what is being done for the implicit operands? Are they added to the map operands? Or are they just entry block arguments?

Could you also comment on (and add to the summary) why this is advantageous to the outlining approach?

@agozillon
Copy link
Contributor

agozillon commented Sep 27, 2023

Add IsolatedFromAbove and OutlineableOpenMPOpInterface traits to target op in MLIR.

The OutlineableOpenMPOpInterface looks like a separate change.

Move implicit operand capturing to the PFT lowering stage.

Could you expand on what is being done for the implicit operands? Are they added to the map operands? Or are they just entry block arguments?

Could you also comment on (and add to the summary) why this is advantageous to the outlining approach?

I think there should be both map and block arguments, former to make lowering easier and carry map information for the captures, whilst making the mapping more explicit to readers of the IR and the latter to fulfil the IsolatedFromAbove interface, I think this patch does that currently. Perhaps sometime in the future the BlockArgs and Map clauses can be aligned so we only have one, not too sure how you'd go about doing that though, perhaps make an extended BlockArgs operation(?) for Map or a map clause interface for BlockArgs.

@TIFitis TIFitis force-pushed the add_traits branch 2 times, most recently from 6b4f0c0 to b5173a0 Compare November 1, 2023 13:22
@TIFitis
Copy link
Member Author

TIFitis commented Nov 1, 2023

@kiranchandramohan @jeanPerier Sorry to bother both of you, but I've run into an issue with this patch.

When the target region contains a do while loop like in the example below, the loop code is absent entirely from the generated FIR/HLFIR.

This happens for the target data directive as well for which we added a similar custom region body generator some time back.

Any idea why?

program main
   integer :: x(10) = (/0, 0, 0, 0, 0, 0, 0, 0, 0, 0/)
   integer :: i = 1
   integer :: j = 11
   !$omp target map(tofrom:x, i, j)
      x(1) = 10
      do while (i <= j)
         x(i) = i;
         i = i + 1
      end do
   !$omp end target
end program main

Generated FIR:

func.func @_QQmain() attributes {fir.bindc_name = "main"} {
  %0 = fir.address_of(@_QFEi) : !fir.ref<i32>
  %1 = fir.address_of(@_QFEj) : !fir.ref<i32>
  %2 = fir.address_of(@_QFEx) : !fir.ref<!fir.array<10xi32>>
  %c10 = arith.constant 10 : index
  %c1 = arith.constant 1 : index
  %c0 = arith.constant 0 : index
  %3 = arith.subi %c10, %c1 : index
  %4 = omp.bounds lower_bound(%c0 : index) upper_bound(%3 : index) extent(%c10 : index) stride(%c1 : index) start_idx(%c1 : index)
  %5 = omp.map_info var_ptr(%2 : !fir.ref<!fir.array<10xi32>>, !fir.array<10xi32>) map_clauses(tofrom) capture(ByRef) bounds(%4) -> !fir.ref<!fir.array<10xi32>> {name = "x"}
  %6 = omp.map_info var_ptr(%0 : !fir.ref<i32>, i32) map_clauses(tofrom) capture(ByRef) -> !fir.ref<i32> {name = "i"}
  %7 = omp.map_info var_ptr(%1 : !fir.ref<i32>, i32) map_clauses(tofrom) capture(ByRef) -> !fir.ref<i32> {name = "j"}
  omp.target_data map_entries(%5, %6, %7 : !fir.ref<!fir.array<10xi32>>, !fir.ref<i32>, !fir.ref<i32>) {
    %c10_i32 = arith.constant 10 : i32
    %c1_i64 = arith.constant 1 : i64
    %c1_i64_0 = arith.constant 1 : i64
    %8 = arith.subi %c1_i64, %c1_i64_0 : i64
    %9 = fir.coordinate_of %2, %8 : (!fir.ref<!fir.array<10xi32>>, i64) -> !fir.ref<i32>
    fir.store %c10_i32 to %9 : !fir.ref<i32>
    omp.terminator
  }
  return
}

@kiranchandramohan
Copy link
Contributor

@kiranchandramohan @jeanPerier Sorry to bother both of you, but I've run into an issue with this patch.

When the target region contains a do while loop like in the example below, the loop code is absent entirely from the generated FIR/HLFIR.

This happens for the target data directive as well for which we added a similar custom region body generator some time back.

Any idea why?

I am assuming this happens only when there is unstructured code (code with jumps, do while etc) in the body of the operation. There is some special handling for unstructured code, have a look at

// If it is an unstructured region and is not the outer region of a combined

This is a separate issue and can be fixed in another patch.

@agozillon
Copy link
Contributor

agozillon commented Nov 1, 2023

@kiranchandramohan @jeanPerier Sorry to bother both of you, but I've run into an issue with this patch.
When the target region contains a do while loop like in the example below, the loop code is absent entirely from the generated FIR/HLFIR.
This happens for the target data directive as well for which we added a similar custom region body generator some time back.
Any idea why?

I am assuming this happens only when there is unstructured code (code with jumps, do while etc) in the body of the operation. There is some special handling for unstructured code, have a look at

// If it is an unstructured region and is not the outer region of a combined

This is a separate issue and can be fixed in another patch.

Unfortunately this breaks a bunch of the runtime tests we currently have working in the openmp fortran test directory. Although, they can be converted to do loops, instead of while loops when this patch lands, if that helps work around the issue with while loops at all (the main reason they're do while, is lack of implicit capture handling which this patch series begins to tackle).

If it is a problem even for do loops, then I imagine that could be a little bit of an issue to maintain downstream as well, as we try to keep as close to top of the tree as we can right now and are doing a lot of work on loop related offload constructs.

However, I'm not opposed to dealing with it in a seperate patch if that is what we wish to do, just personally not a huge fan of landing this and breaking what we have working currently, as we try to find a fix.

@kiranchandramohan
Copy link
Contributor

However, I'm not opposed to dealing with it in a seperate patch if that is what we wish to do, just personally not a huge fan of landing this and breaking what we have working currently, as we try to find a fix.

To clarify, I was only giving a suggestion. You can fix it in this patch if that is what you want to do.

@agozillon
Copy link
Contributor

agozillon commented Nov 1, 2023

However, I'm not opposed to dealing with it in a seperate patch if that is what we wish to do, just personally not a huge fan of landing this and breaking what we have working currently, as we try to find a fix.

To clarify, I was only giving a suggestion. You can fix it in this patch if that is what you want to do.

Of course, mines is just my opinion (with some context points), and not the opinion of any wider group. I'm happy to leave it up to @TIFitis to decide if he wishes to land the patches as is, this won't break any buildbot because of those tests as is, as nothing runs the tests that it would break. If we want to land the patches as is, I am happy to approve them, I've read through the changes for this current iteration and I am happy with them.

flang/lib/Lower/OpenMP.cpp Outdated Show resolved Hide resolved
@TIFitis TIFitis force-pushed the add_traits branch 3 times, most recently from 110bbed to 3e7031f Compare November 2, 2023 13:56
Copy link
Contributor

@agozillon agozillon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank you very much for the patch.

TIFitis added a commit to TIFitis/llvm-project that referenced this pull request Nov 6, 2023
This patch adds the MLIR translation changes required for add the IsolatedFromAbove and OutlineableOpenMPOpInterface traits to omp.target. It links the newly added block arguments to their corresponding llvm values.

Depends on llvm#67164.
This patch adds the PFT lowering changes required for adding the IsolatedFromAbove trait to omp.target.

Key Changes:
	- Add IsolatedFromAbove trait to target op in MLIR.
	- Main reason for this change is to prevent CSE and other similar optimisations from crossing region boundaries for target operations. The link below has the discourse discussion surrounding this issue.
	- Move implicit operand capturing to the PFT lowering stage.
	- Implicit operands are first added as implicitly captured map_operands with their map_types set accordingly to indicate this. Later, all map_operands including implicit ones are added as block arguments.
	- Remove `implicit` attribute from the `MapInfoOp`. This information is already captured by the `map_type`.
	- The custom printer and parser for the map_types have been updated to show the `implicit` and `literal` map_types.
	- Update related tests.
	- This fixes llvm#63555.
	- This fixes llvm#70488.
TIFitis added a commit to TIFitis/llvm-project that referenced this pull request Nov 6, 2023
target_map_common_block2.f90
	- Fix the extra space in the print message.
	- llvm#67164 fixes this. So moving it outside of failing and also removing XFAIL marker.

basic-target-region-3D-array.f90
	- Corrected the check to account for the new lines printed.

Depends on llvm#67319
TIFitis added a commit that referenced this pull request Nov 6, 2023
This patch adds the MLIR translation changes required for add the IsolatedFromAbove and OutlineableOpenMPOpInterface traits to omp.target. It links the newly added block arguments to their corresponding llvm values.

Depends on #67164.
TIFitis added a commit that referenced this pull request Nov 6, 2023
target_map_common_block2.f90
	- Fix the extra space in the print message.
	- #67164 fixes this. So moving it outside of failing and also removing XFAIL marker.

basic-target-region-3D-array.f90
	- Corrected the check to account for the new lines printed.

Depends on #67319
@TIFitis
Copy link
Member Author

TIFitis commented Nov 6, 2023

Pushed commit fbaf2c6.

@TIFitis TIFitis closed this Nov 6, 2023
TIFitis added a commit that referenced this pull request Nov 6, 2023
@luporl
Copy link
Contributor

luporl commented Nov 7, 2023

It seems this introduced a regression in llvm-test-suite's gfortran tests, in some 2-stage buildbots:

@TIFitis
Copy link
Member Author

TIFitis commented Nov 7, 2023

It seems this introduced a regression in llvm-test-suite's gfortran tests, in some 2-stage buildbots:

@luporl Commit 6bb7c65 should have fixed this.

@luporl
Copy link
Contributor

luporl commented Nov 7, 2023

It fixed, thanks.

@TIFitis TIFitis deleted the add_traits branch November 9, 2023 19:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:openmp OpenMP related changes to Clang flang:driver flang:fir-hlfir flang:openmp flang Flang issues not falling into any other category mlir:llvm mlir:openmp mlir
Projects
None yet
9 participants