Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[flang][OpenMP] Decompose compound constructs, do recursive lowering #90098

Merged
merged 20 commits into from
May 13, 2024

Conversation

kparzysz
Copy link
Contributor

@kparzysz kparzysz commented Apr 25, 2024

A compound construct with a list of clauses is broken up into individual leaf/composite constructs. Each such construct has the list of clauses that apply to it based on the OpenMP spec.

Each lowering function (i.e. a function that generates MLIR ops) is now responsible for generating its body as described below.

Functions that receive AST nodes extract the construct, and the clauses from the node. They then create a work queue consisting of individual constructs, and invoke a common dispatch function to process (lower) the queue.

The dispatch function examines the current position in the queue, and invokes the appropriate lowering function. Each lowering function receives the queue as well, and once it needs to generate its body, it either invokes the dispatch function on the rest of the queue (if any), or processes nested evaluations if the work queue is at the end.

This will unify the interface a bit more.
A compound construct with a list of clauses is broken up into
individual leaf/composite constructs. Each such construct has
the list of clauses that apply to it based on the OpenMP spec.

Each lowering function (i.e. a function that generates MLIR ops)
is now responsible for generating its body as described below.

Functions that receive AST nodes extract the construct, and the
clauses from the node. They then create a work queue consisting
of individual constructs, and invoke a common dispatch function.

The dispatch function examines the current position in the queue,
and invokes the appropriate lowering function. Each lowering
function receives the queue as well, and once it needs to generate
its body, it either invokes the dispatch function on the rest of
the queue (if any), or processes nested evaluations if the work
queue is at the end.
@llvmbot llvmbot added flang Flang issues not falling into any other category flang:fir-hlfir flang:openmp clang:openmp OpenMP related changes to Clang labels Apr 25, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Apr 25, 2024

@llvm/pr-subscribers-flang-fir-hlfir

@llvm/pr-subscribers-flang-openmp

Author: Krzysztof Parzyszek (kparzysz)

Changes

A compound construct with a list of clauses is broken up into individual leaf/composite constructs. Each such construct has the list of clauses that apply to it based on the OpenMP spec.

Each lowering function (i.e. a function that generates MLIR ops) is now responsible for generating its body as described below.

Functions that receive AST nodes extract the construct, and the clauses from the node. They then create a work queue consisting of individual constructs, and invoke a common dispatch function.

The dispatch function examines the current position in the queue, and invokes the appropriate lowering function. Each lowering function receives the queue as well, and once it needs to generate its body, it either invokes the dispatch function on the rest of the queue (if any), or processes nested evaluations if the work queue is at the end.


Patch is 93.75 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/90098.diff

2 Files Affected:

  • (modified) flang/lib/Lower/OpenMP/OpenMP.cpp (+391-393)
  • (added) llvm/include/llvm/Frontend/OpenMP/ConstructDecompositionT.h (+985)
diff --git a/flang/lib/Lower/OpenMP/OpenMP.cpp b/flang/lib/Lower/OpenMP/OpenMP.cpp
index 47935e6cf8efcf..4b8afd42f639d5 100644
--- a/flang/lib/Lower/OpenMP/OpenMP.cpp
+++ b/flang/lib/Lower/OpenMP/OpenMP.cpp
@@ -36,6 +36,7 @@
 #include "mlir/Dialect/OpenMP/OpenMPDialect.h"
 #include "mlir/Transforms/RegionUtils.h"
 #include "llvm/ADT/STLExtras.h"
+#include "llvm/Frontend/OpenMP/ConstructDecompositionT.h"
 #include "llvm/Frontend/OpenMP/OMPConstants.h"
 
 using namespace Fortran::lower::omp;
@@ -72,6 +73,89 @@ static void genNestedEvaluations(Fortran::lower::AbstractConverter &converter,
     converter.genEval(e);
 }
 
+//===----------------------------------------------------------------------===//
+// Directive decomposition
+//===----------------------------------------------------------------------===//
+
+namespace {
+using DirectiveWithClauses = tomp::DirectiveWithClauses<lower::omp::Clause>;
+using ConstructQueue = List<DirectiveWithClauses>;
+} // namespace
+
+static void genOMPDispatch(Fortran::lower::AbstractConverter &converter,
+                           Fortran::lower::SymMap &symTable,
+                           Fortran::semantics::SemanticsContext &semaCtx,
+                           Fortran::lower::pft::Evaluation &eval,
+                           mlir::Location loc, const ConstructQueue &queue,
+                           ConstructQueue::iterator item);
+
+namespace {
+struct ConstructDecomposition {
+  ConstructDecomposition(mlir::ModuleOp modOp,
+                         semantics::SemanticsContext &semaCtx,
+                         lower::pft::Evaluation &ev,
+                         llvm::omp::Directive construct,
+                         const List<Clause> &clauses)
+      : semaCtx(semaCtx), mod(modOp), eval(ev) {
+    tomp::ConstructDecompositionT decompose(getOpenMPVersion(modOp), *this,
+                                            construct, llvm::ArrayRef(clauses));
+    output = std::move(decompose.output);
+  }
+
+  // Given an object, return its base object if one exists.
+  std::optional<Object> getBaseObject(const Object &object) {
+    return lower::omp::getBaseObject(object, semaCtx);
+  }
+
+  // Return the iteration variable of the associated loop if any.
+  std::optional<Object> getLoopIterVar() {
+    if (semantics::Symbol *symbol = getIterationVariableSymbol(eval))
+      return Object{symbol, /*designator=*/{}};
+    return std::nullopt;
+  }
+
+  semantics::SemanticsContext &semaCtx;
+  mlir::ModuleOp mod;
+  lower::pft::Evaluation &eval;
+  List<DirectiveWithClauses> output;
+};
+} // namespace
+
+LLVM_DUMP_METHOD static llvm::raw_ostream &
+operator<<(llvm::raw_ostream &os, const DirectiveWithClauses &dwc) {
+  os << llvm::omp::getOpenMPDirectiveName(dwc.id);
+  for (auto [index, clause] : llvm::enumerate(dwc.clauses)) {
+    os << (index == 0 ? '\t' : ' ');
+    os << llvm::omp::getOpenMPClauseName(clause.id);
+  }
+  return os;
+}
+
+static void splitCompoundConstruct(
+    mlir::ModuleOp modOp, Fortran::semantics::SemanticsContext &semaCtx,
+    Fortran::lower::pft::Evaluation &eval, llvm::omp::Directive construct,
+    const List<Clause> &clauses, List<DirectiveWithClauses> &directives) {
+
+  ConstructDecomposition decompose(modOp, semaCtx, eval, construct, clauses);
+  assert(!decompose.output.empty());
+
+  llvm::SmallVector<llvm::omp::Directive> loweringUnits;
+  std::ignore =
+      llvm::omp::getLeafOrCompositeConstructs(construct, loweringUnits);
+
+  int leafIndex = 0;
+  for (llvm::omp::Directive dir_id : loweringUnits) {
+    directives.push_back(DirectiveWithClauses{dir_id});
+    DirectiveWithClauses &dwc = directives.back();
+    llvm::ArrayRef<llvm::omp::Directive> leafsOrSelf =
+        llvm::omp::getLeafConstructsOrSelf(dir_id);
+    for (int i = 0, e = leafsOrSelf.size(); i != e; ++i) {
+      dwc.clauses.append(decompose.output[leafIndex].clauses);
+      ++leafIndex;
+    }
+  }
+}
+
 static fir::GlobalOp globalInitialization(
     Fortran::lower::AbstractConverter &converter,
     fir::FirOpBuilder &firOpBuilder, const Fortran::semantics::Symbol &sym,
@@ -460,81 +544,6 @@ markDeclareTarget(mlir::Operation *op,
   declareTargetOp.setDeclareTarget(deviceType, captureClause);
 }
 
-/// Split a combined directive into an outer leaf directive and the (possibly
-/// combined) rest of the combined directive. Composite directives and
-/// non-compound directives are not split, in which case it will return the
-/// input directive as its first output and an empty value as its second output.
-static std::pair<llvm::omp::Directive, std::optional<llvm::omp::Directive>>
-splitCombinedDirective(llvm::omp::Directive dir) {
-  using D = llvm::omp::Directive;
-  switch (dir) {
-  case D::OMPD_masked_taskloop:
-    return {D::OMPD_masked, D::OMPD_taskloop};
-  case D::OMPD_masked_taskloop_simd:
-    return {D::OMPD_masked, D::OMPD_taskloop_simd};
-  case D::OMPD_master_taskloop:
-    return {D::OMPD_master, D::OMPD_taskloop};
-  case D::OMPD_master_taskloop_simd:
-    return {D::OMPD_master, D::OMPD_taskloop_simd};
-  case D::OMPD_parallel_do:
-    return {D::OMPD_parallel, D::OMPD_do};
-  case D::OMPD_parallel_do_simd:
-    return {D::OMPD_parallel, D::OMPD_do_simd};
-  case D::OMPD_parallel_masked:
-    return {D::OMPD_parallel, D::OMPD_masked};
-  case D::OMPD_parallel_masked_taskloop:
-    return {D::OMPD_parallel, D::OMPD_masked_taskloop};
-  case D::OMPD_parallel_masked_taskloop_simd:
-    return {D::OMPD_parallel, D::OMPD_masked_taskloop_simd};
-  case D::OMPD_parallel_master:
-    return {D::OMPD_parallel, D::OMPD_master};
-  case D::OMPD_parallel_master_taskloop:
-    return {D::OMPD_parallel, D::OMPD_master_taskloop};
-  case D::OMPD_parallel_master_taskloop_simd:
-    return {D::OMPD_parallel, D::OMPD_master_taskloop_simd};
-  case D::OMPD_parallel_sections:
-    return {D::OMPD_parallel, D::OMPD_sections};
-  case D::OMPD_parallel_workshare:
-    return {D::OMPD_parallel, D::OMPD_workshare};
-  case D::OMPD_target_parallel:
-    return {D::OMPD_target, D::OMPD_parallel};
-  case D::OMPD_target_parallel_do:
-    return {D::OMPD_target, D::OMPD_parallel_do};
-  case D::OMPD_target_parallel_do_simd:
-    return {D::OMPD_target, D::OMPD_parallel_do_simd};
-  case D::OMPD_target_simd:
-    return {D::OMPD_target, D::OMPD_simd};
-  case D::OMPD_target_teams:
-    return {D::OMPD_target, D::OMPD_teams};
-  case D::OMPD_target_teams_distribute:
-    return {D::OMPD_target, D::OMPD_teams_distribute};
-  case D::OMPD_target_teams_distribute_parallel_do:
-    return {D::OMPD_target, D::OMPD_teams_distribute_parallel_do};
-  case D::OMPD_target_teams_distribute_parallel_do_simd:
-    return {D::OMPD_target, D::OMPD_teams_distribute_parallel_do_simd};
-  case D::OMPD_target_teams_distribute_simd:
-    return {D::OMPD_target, D::OMPD_teams_distribute_simd};
-  case D::OMPD_teams_distribute:
-    return {D::OMPD_teams, D::OMPD_distribute};
-  case D::OMPD_teams_distribute_parallel_do:
-    return {D::OMPD_teams, D::OMPD_distribute_parallel_do};
-  case D::OMPD_teams_distribute_parallel_do_simd:
-    return {D::OMPD_teams, D::OMPD_distribute_parallel_do_simd};
-  case D::OMPD_teams_distribute_simd:
-    return {D::OMPD_teams, D::OMPD_distribute_simd};
-  case D::OMPD_parallel_loop:
-    return {D::OMPD_parallel, D::OMPD_loop};
-  case D::OMPD_target_parallel_loop:
-    return {D::OMPD_target, D::OMPD_parallel_loop};
-  case D::OMPD_target_teams_loop:
-    return {D::OMPD_target, D::OMPD_teams_loop};
-  case D::OMPD_teams_loop:
-    return {D::OMPD_teams, D::OMPD_loop};
-  default:
-    return {dir, std::nullopt};
-  }
-}
-
 //===----------------------------------------------------------------------===//
 // Op body generation helper structures and functions
 //===----------------------------------------------------------------------===//
@@ -555,11 +564,6 @@ struct OpWithBodyGenInfo {
       : converter(converter), symTable(symTable), semaCtx(semaCtx), loc(loc),
         eval(eval), dir(dir) {}
 
-  OpWithBodyGenInfo &setGenNested(bool value) {
-    genNested = value;
-    return *this;
-  }
-
   OpWithBodyGenInfo &setOuterCombined(bool value) {
     outerCombined = value;
     return *this;
@@ -600,8 +604,6 @@ struct OpWithBodyGenInfo {
   Fortran::lower::pft::Evaluation &eval;
   /// [in] leaf directive for which to generate the op body.
   llvm::omp::Directive dir;
-  /// [in] whether to generate FIR for nested evaluations
-  bool genNested = true;
   /// [in] is this an outer operation - prevents privatization.
   bool outerCombined = false;
   /// [in] list of clauses to process.
@@ -622,7 +624,9 @@ struct OpWithBodyGenInfo {
 ///
 /// \param [in]   op - the operation the body belongs to.
 /// \param [in] info - options controlling code-gen for the construction.
-static void createBodyOfOp(mlir::Operation &op, OpWithBodyGenInfo &info) {
+static void createBodyOfOp(mlir::Operation &op, const OpWithBodyGenInfo &info,
+                           const ConstructQueue &queue,
+                           ConstructQueue::iterator item) {
   fir::FirOpBuilder &firOpBuilder = info.converter.getFirOpBuilder();
 
   auto insertMarker = [](fir::FirOpBuilder &builder) {
@@ -678,7 +682,10 @@ static void createBodyOfOp(mlir::Operation &op, OpWithBodyGenInfo &info) {
     }
   }
 
-  if (info.genNested) {
+  if (ConstructQueue::iterator next = std::next(item); next != queue.end()) {
+    genOMPDispatch(info.converter, info.symTable, info.semaCtx, info.eval,
+                   info.loc, queue, next);
+  } else {
     // genFIR(Evaluation&) tries to patch up unterminated blocks, causing
     // a lot of complications for our approach if the terminator generation
     // is delayed past this point. Insert a temporary terminator here, then
@@ -769,11 +776,12 @@ static void genBodyOfTargetDataOp(
     Fortran::lower::AbstractConverter &converter,
     Fortran::lower::SymMap &symTable,
     Fortran::semantics::SemanticsContext &semaCtx,
-    Fortran::lower::pft::Evaluation &eval, bool genNested,
-    mlir::omp::TargetDataOp &dataOp, llvm::ArrayRef<mlir::Type> useDeviceTypes,
+    Fortran::lower::pft::Evaluation &eval, mlir::omp::TargetDataOp &dataOp,
+    llvm::ArrayRef<mlir::Type> useDeviceTypes,
     llvm::ArrayRef<mlir::Location> useDeviceLocs,
     llvm::ArrayRef<const Fortran::semantics::Symbol *> useDeviceSymbols,
-    const mlir::Location &currentLocation) {
+    const mlir::Location &currentLocation, const ConstructQueue &queue,
+    ConstructQueue::iterator item) {
   fir::FirOpBuilder &firOpBuilder = converter.getFirOpBuilder();
   mlir::Region &region = dataOp.getRegion();
 
@@ -826,8 +834,13 @@ static void genBodyOfTargetDataOp(
 
   // Set the insertion point after the marker.
   firOpBuilder.setInsertionPointAfter(undefMarker.getDefiningOp());
-  if (genNested)
+
+  if (ConstructQueue::iterator next = std::next(item); next != queue.end()) {
+    genOMPDispatch(converter, symTable, semaCtx, eval, currentLocation, queue,
+                   next);
+  } else {
     genNestedEvaluations(converter, eval);
+  }
 }
 
 // This functions creates a block for the body of the targetOp's region. It adds
@@ -836,12 +849,13 @@ static void
 genBodyOfTargetOp(Fortran::lower::AbstractConverter &converter,
                   Fortran::lower::SymMap &symTable,
                   Fortran::semantics::SemanticsContext &semaCtx,
-                  Fortran::lower::pft::Evaluation &eval, bool genNested,
+                  Fortran::lower::pft::Evaluation &eval,
                   mlir::omp::TargetOp &targetOp,
                   llvm::ArrayRef<const Fortran::semantics::Symbol *> mapSyms,
                   llvm::ArrayRef<mlir::Location> mapSymLocs,
                   llvm::ArrayRef<mlir::Type> mapSymTypes,
-                  const mlir::Location &currentLocation) {
+                  const mlir::Location &currentLocation,
+                  const ConstructQueue &queue, ConstructQueue::iterator item) {
   assert(mapSymTypes.size() == mapSymLocs.size());
 
   fir::FirOpBuilder &firOpBuilder = converter.getFirOpBuilder();
@@ -981,15 +995,22 @@ genBodyOfTargetOp(Fortran::lower::AbstractConverter &converter,
 
   // Create the insertion point after the marker.
   firOpBuilder.setInsertionPointAfter(undefMarker.getDefiningOp());
-  if (genNested)
+
+  if (ConstructQueue::iterator next = std::next(item); next != queue.end()) {
+    genOMPDispatch(converter, symTable, semaCtx, eval, currentLocation, queue,
+                   next);
+  } else {
     genNestedEvaluations(converter, eval);
+  }
 }
 
 template <typename OpTy, typename... Args>
-static OpTy genOpWithBody(OpWithBodyGenInfo &info, Args &&...args) {
+static OpTy genOpWithBody(const OpWithBodyGenInfo &info,
+                          const ConstructQueue &queue,
+                          ConstructQueue::iterator item, Args &&...args) {
   auto op = info.converter.getFirOpBuilder().create<OpTy>(
       info.loc, std::forward<Args>(args)...);
-  createBodyOfOp(*op, info);
+  createBodyOfOp(*op, info, queue, item);
   return op;
 }
 
@@ -1274,7 +1295,8 @@ static mlir::omp::BarrierOp
 genBarrierOp(Fortran::lower::AbstractConverter &converter,
              Fortran::lower::SymMap &symTable,
              Fortran::semantics::SemanticsContext &semaCtx,
-             Fortran::lower::pft::Evaluation &eval, mlir::Location loc) {
+             Fortran::lower::pft::Evaluation &eval, mlir::Location loc,
+             const ConstructQueue &queue, ConstructQueue::iterator item) {
   return converter.getFirOpBuilder().create<mlir::omp::BarrierOp>(loc);
 }
 
@@ -1282,8 +1304,9 @@ static mlir::omp::CriticalOp
 genCriticalOp(Fortran::lower::AbstractConverter &converter,
               Fortran::lower::SymMap &symTable,
               Fortran::semantics::SemanticsContext &semaCtx,
-              Fortran::lower::pft::Evaluation &eval, bool genNested,
-              mlir::Location loc, const List<Clause> &clauses,
+              Fortran::lower::pft::Evaluation &eval, mlir::Location loc,
+              const List<Clause> &clauses, const ConstructQueue &queue,
+              ConstructQueue::iterator item,
               const std::optional<Fortran::parser::Name> &name) {
   fir::FirOpBuilder &firOpBuilder = converter.getFirOpBuilder();
   mlir::FlatSymbolRefAttr nameAttr;
@@ -1306,17 +1329,17 @@ genCriticalOp(Fortran::lower::AbstractConverter &converter,
 
   return genOpWithBody<mlir::omp::CriticalOp>(
       OpWithBodyGenInfo(converter, symTable, semaCtx, loc, eval,
-                        llvm::omp::Directive::OMPD_critical)
-          .setGenNested(genNested),
-      nameAttr);
+                        llvm::omp::Directive::OMPD_critical),
+      queue, item, nameAttr);
 }
 
 static mlir::omp::DistributeOp
 genDistributeOp(Fortran::lower::AbstractConverter &converter,
                 Fortran::lower::SymMap &symTable,
                 Fortran::semantics::SemanticsContext &semaCtx,
-                Fortran::lower::pft::Evaluation &eval, bool genNested,
-                mlir::Location loc, const List<Clause> &clauses) {
+                Fortran::lower::pft::Evaluation &eval, mlir::Location loc,
+                const List<Clause> &clauses, const ConstructQueue &queue,
+                ConstructQueue::iterator item) {
   TODO(loc, "Distribute construct");
   return nullptr;
 }
@@ -1326,7 +1349,8 @@ genFlushOp(Fortran::lower::AbstractConverter &converter,
            Fortran::lower::SymMap &symTable,
            Fortran::semantics::SemanticsContext &semaCtx,
            Fortran::lower::pft::Evaluation &eval, mlir::Location loc,
-           const ObjectList &objects, const List<Clause> &clauses) {
+           const ObjectList &objects, const List<Clause> &clauses,
+           const ConstructQueue &queue, ConstructQueue::iterator item) {
   llvm::SmallVector<mlir::Value> operandRange;
   genFlushClauses(converter, semaCtx, objects, clauses, loc, operandRange);
 
@@ -1338,12 +1362,13 @@ static mlir::omp::MasterOp
 genMasterOp(Fortran::lower::AbstractConverter &converter,
             Fortran::lower::SymMap &symTable,
             Fortran::semantics::SemanticsContext &semaCtx,
-            Fortran::lower::pft::Evaluation &eval, bool genNested,
-            mlir::Location loc) {
+            Fortran::lower::pft::Evaluation &eval, mlir::Location loc,
+            const List<Clause> &clauses, const ConstructQueue &queue,
+            ConstructQueue::iterator item) {
   return genOpWithBody<mlir::omp::MasterOp>(
       OpWithBodyGenInfo(converter, symTable, semaCtx, loc, eval,
-                        llvm::omp::Directive::OMPD_master)
-          .setGenNested(genNested));
+                        llvm::omp::Directive::OMPD_master),
+      queue, item);
 }
 
 static mlir::omp::OrderedOp
@@ -1351,7 +1376,8 @@ genOrderedOp(Fortran::lower::AbstractConverter &converter,
              Fortran::lower::SymMap &symTable,
              Fortran::semantics::SemanticsContext &semaCtx,
              Fortran::lower::pft::Evaluation &eval, mlir::Location loc,
-             const List<Clause> &clauses) {
+             const List<Clause> &clauses, const ConstructQueue &queue,
+             ConstructQueue::iterator item) {
   TODO(loc, "OMPD_ordered");
   return nullptr;
 }
@@ -1360,25 +1386,25 @@ static mlir::omp::OrderedRegionOp
 genOrderedRegionOp(Fortran::lower::AbstractConverter &converter,
                    Fortran::lower::SymMap &symTable,
                    Fortran::semantics::SemanticsContext &semaCtx,
-                   Fortran::lower::pft::Evaluation &eval, bool genNested,
-                   mlir::Location loc, const List<Clause> &clauses) {
+                   Fortran::lower::pft::Evaluation &eval, mlir::Location loc,
+                   const List<Clause> &clauses, const ConstructQueue &queue,
+                   ConstructQueue::iterator item) {
   mlir::omp::OrderedRegionClauseOps clauseOps;
   genOrderedRegionClauses(converter, semaCtx, clauses, loc, clauseOps);
 
   return genOpWithBody<mlir::omp::OrderedRegionOp>(
       OpWithBodyGenInfo(converter, symTable, semaCtx, loc, eval,
-                        llvm::omp::Directive::OMPD_ordered)
-          .setGenNested(genNested),
-      clauseOps);
+                        llvm::omp::Directive::OMPD_ordered),
+      queue, item, clauseOps);
 }
 
 static mlir::omp::ParallelOp
 genParallelOp(Fortran::lower::AbstractConverter &converter,
               Fortran::lower::SymMap &symTable,
               Fortran::semantics::SemanticsContext &semaCtx,
-              Fortran::lower::pft::Evaluation &eval, bool genNested,
-              mlir::Location loc, const List<Clause> &clauses,
-              bool outerCombined = false) {
+              Fortran::lower::pft::Evaluation &eval, mlir::Location loc,
+              const List<Clause> &clauses, const ConstructQueue &queue,
+              ConstructQueue::iterator item, bool outerCombined = false) {
   fir::FirOpBuilder &firOpBuilder = converter.getFirOpBuilder();
   Fortran::lower::StatementContext stmtCtx;
   mlir::omp::ParallelClauseOps clauseOps;
@@ -1397,14 +1423,14 @@ genParallelOp(Fortran::lower::AbstractConverter &converter,
   OpWithBodyGenInfo genInfo =
       OpWithBodyGenInfo(converter, symTable, semaCtx, loc, eval,
                         llvm::omp::Directive::OMPD_parallel)
-          .setGenNested(genNested)
           .setOuterCombined(outerCombined)
           .setClauses(&clauses)
           .setReductions(&reductionSyms, &reductionTypes)
           .setGenRegionEntryCb(reductionCallback);
 
   if (!enableDelayedPrivatization)
-    return genOpWithBody<mlir::omp::ParallelOp>(genInfo, clauseOps);
+    return genOpWithBody<mlir::omp::ParallelOp>(genInfo, queue, item,
+                                                clauseOps);
 
   bool privatize = !outerCombined;
   DataSharingProcessor dsp(converter, semaCtx, clauses, eval,
@@ -1447,22 +1473,23 @@ genParallelOp(Fortran::lower::AbstractConverter &converter,
 
   // TODO Merge with the reduction CB.
   genInfo.setGenRegionEntryCb(genRegionEntryCB).setDataSharingProcessor(&dsp);
-  return genOpWithBody<mlir::omp::ParallelOp>(genInfo, clauseOps);
+  return genOpWithBody<mlir::omp::ParallelOp>(genInfo, queue, item, clauseOps);
 }
 
 static mlir::omp::SectionOp
 genSectionOp(Fortran::lower::AbstractConverter &converter,
              Fortran::lower::SymMap &symTable,
              Fortran::semantics::SemanticsContext &semaCtx,
-             Fortran::lower::pft::Evaluation &eval, bool genNested...
[truncated]

@kiranchandramohan
Copy link
Contributor

Is it possible to add tests for this?

Can we move the decomposition logic into a separate file?

@kparzysz
Copy link
Contributor Author

Is it possible to add tests for this?

Can we move the decomposition logic into a separate file?

The parts between lines 76 and 158? Sure.

I have a couple more PRs to open, and I'll work on that next.

@kparzysz kparzysz changed the title [flang][OpenMP] Decompose compound construccts, do recursive lowering [flang][OpenMP] Decompose compound constructs, do recursive lowering Apr 25, 2024
@kparzysz
Copy link
Contributor Author

Tests are still coming...

Base automatically changed from users/kparzysz/spr/c08-symtable to main April 30, 2024 16:44
Fortran::lower::pft::Evaluation &eval, llvm::omp::Directive compound,
const List<Clause> &clauses) {

List<UnitConstruct> constructs;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Why not just SmallVector?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ConstructQueue is defined with List. We use List for lists of clauses, so it's just for consistency.

const UnitConstruct &uc);

ConstructQueue
buildConstructQueue(mlir::ModuleOp modOp,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add docs with example input(s) & expected output(s) of this function? Just to clarify its meaning/purpose more concretely.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a comment explaining what this function does.

flang/lib/Lower/OpenMP/OpenMP.cpp Outdated Show resolved Hide resolved
Comment on lines +60 to +65
if (first == container.end())
return first;
auto second = std::find_if(std::next(first), container.end(), pred);
if (second == container.end())
return first;
return container.end();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should work and might be a bit easier to read:

Suggested change
if (first == container.end())
return first;
auto second = std::find_if(std::next(first), container.end(), pred);
if (second == container.end())
return first;
return container.end();
std::count_if(container.begin(), container.end(), pred) == 1 ? first : container.end();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will go over [begin, first] for the second time.

Copy link

github-actions bot commented May 7, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@kparzysz
Copy link
Contributor Author

I think all the points made so far were either addressed, or they will be addressed in upcoming PRs.

Are there still any remaining issues?

Copy link
Contributor

@skatrak skatrak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Krzysztof for working on this and addressing my comments. LGTM.

@kparzysz kparzysz merged commit ca1bd59 into main May 13, 2024
4 checks passed
@kparzysz kparzysz deleted the users/kparzysz/spr/d09-recurse branch May 13, 2024 13:09
kparzysz added a commit that referenced this pull request May 13, 2024
kparzysz added a commit that referenced this pull request May 13, 2024
…90098)

A compound construct with a list of clauses is broken up into individual
leaf/composite constructs. Each such construct has the list of clauses
that apply to it based on the OpenMP spec.

Each lowering function (i.e. a function that generates MLIR ops) is now
responsible for generating its body as described below.

Functions that receive AST nodes extract the construct, and the clauses
from the node. They then create a work queue consisting of individual
constructs, and invoke a common dispatch function to process (lower) the
queue.

The dispatch function examines the current position in the queue, and
invokes the appropriate lowering function. Each lowering function
receives the queue as well, and once it needs to generate its body, it
either invokes the dispatch function on the rest of the queue (if any),
or processes nested evaluations if the work queue is at the end.

Re-application of ca1bd59 with fixes for
compilation errors.
@omjavaid
Copy link
Contributor

This appears to cause a regression in LLVM testsuite gfortran testsuire FAIL: test-suite::gfortran-regression-compile-regression__gomp__reduction7_f90.test
https://lab.llvm.org/buildbot/#/builders/197/builds/14288

@kparzysz
Copy link
Contributor Author

Thanks, I'll take a look.

@tblah
Copy link
Contributor

tblah commented May 14, 2024

A few tests fail for me in my local build since this commit

Lower/OpenMP/parallel-sections.f90
Lower/OpenMP/taskgroup.f90
Lower/OpenMP/teams.f90

But I see the aarch64 buildbot is green: https://lab.llvm.org/buildbot/#/builders/173 so perhaps this is a problem on my end. Do you have any idea where I should look?

Or has anyone else come across the same problem?

@kparzysz
Copy link
Contributor Author

kparzysz commented May 14, 2024

How does it fail? Is it just different output, or are you seeing a crash?

Edit: I'll take a look at it to see what could potentially be cause of the problem. Some earlier version of the code had non-determinism in it, maybe I haven't fixed all of that...

@psteinfeld
Copy link
Contributor

@kparzysz, this change is causing one of our internal tests to fail. Here's an excerpt from the log file:

flang -I. -I. -c -mp -O3 -DNO_GPU -DNO_AMD -DNO_PHI -DLINUX -DNO_GNU -DNO_PGI -fpic /proj/ta/tests/applications/cpu/GFC/src/stsubs.F90 -o ./OBJ/stsubs.o
f18: /proj/build/llvm/Linux_x86_64/flang/lib/Lower/OpenMP/Decomposer.cpp:96: Fortran::lower::omp::ConstructQueue Fortran::lower::omp::buildConstructQueue(mlir::ModuleOp, Fortran::semantics::SemanticsContext&, Fortran::lower::pft::Evaluation&, const Fortran::parser::CharBlock&, llvm::omp::Directive, Fortran::lower::omp::List<Fortran::lower::omp::Clause>&): Assertion `!decompose.output.empty() && "Construct decomposition failed"' failed.
make[1]: *** [/proj/ta/tests/applications/cpu/GFC/src/Makefile:216: OBJ/stsubs.o] Aborted (core dumped)
make[1]: Leaving directory '/tmp/qa/GFC_0'
make: *** [/proj/ta/tests/applications/cpu/GFC/makefile:15: build] Error 2
make TOOLKIT=LLVM SRC=/proj/ta/tests/applications/cpu/GFC/src -f /proj/ta/tests/applications/cpu/GFC/src/Makefile
make[1]: Entering directory '/tmp/qa/GFC_0'
flang -I. -I. -c -mp -O3 -DNO_GPU -DNO_AMD -DNO_PHI -DLINUX -DNO_GNU -DNO_PGI -fpic /proj/ta/tests/applications/cpu/GFC/src/stsubs.F90 -o ./OBJ/stsubs.o
f18: /proj/build/llvm/Linux_x86_64/flang/lib/Lower/OpenMP/Decomposer.cpp:96: Fortran::lower::omp::ConstructQueue Fortran::lower::omp::buildConstructQueue(mlir::ModuleOp, Fortran::semantics::SemanticsContext&, Fortran::lower::pft::Evaluation&, const Fortran::parser::CharBlock&, llvm::omp::Directive, Fortran::lower::omp::List<Fortran::lower::omp::Clause>&): Assertion `!decompose.output.empty() && "Construct decomposition failed"' failed.
make[1]: *** [/proj/ta/tests/applications/cpu/GFC/src/Makefile:216: OBJ/stsubs.o] Aborted (core dumped)

I'll work on getting a small reproducer.

@tblah
Copy link
Contributor

tblah commented May 14, 2024

How does it fail? Is it just different output, or are you seeing a crash?

All three hit assertion failures in Decomposer.cpp:96: "Construct decomposition failed". This looks the same as @psteinfeld's issue.

Edit: I'll take a look at it to see what could potentially be cause of the problem. Some earlier version of the code had non-determinism in it, maybe I haven't fixed all of that...

Thank you

@kparzysz
Copy link
Contributor Author

This appears to cause a regression in LLVM testsuite gfortran testsuire FAIL: test-suite::gfortran-regression-compile-regression__gomp__reduction7_f90.test https://lab.llvm.org/buildbot/#/builders/197/builds/14288

That test has a simd directive with a reduction clause that has task modifier, which is not valid when used with simd. The purpose of the test is to detect the use of the invalid modifier. The decomposition code was trying to apply the modifier to the construct for which it is valid (as prescribed by the spec), but in this case it hasn't found such a construct, and ended up not applying the modifier at all. This resulted in a reduction clause without the modifier, which is actually valid for simd, and was ultimately compiled successfully, against the expectation of the test.
I put out #92160 for this, but the proper fix would be to detect it in the semantic checks. The decomposition code assumes that the code it deals with is valid, and the best it can do is to assert.

@kparzysz
Copy link
Contributor Author

A few tests fail for me in my local build since this commit

Lower/OpenMP/parallel-sections.f90
Lower/OpenMP/taskgroup.f90
Lower/OpenMP/teams.f90

But I see the aarch64 buildbot is green: https://lab.llvm.org/buildbot/#/builders/173 so perhaps this is a problem on my end. Do you have any idea where I should look?

Or has anyone else come across the same problem?

I was just able to reproduce this now. I don't know what's causing the assertion yet, but I know why it didn't fail check-flang for me earlier on. The test is predicated on the presence of omp_runtime in the lit environment (because the test uses the omp_lib module). That module wasn't built in my environment, and the test was silently skipped. When I ran it by hand, flang-new complained about some other of its own modules not being valid. I had to rebuild/reinstall the entire project from scratch to force these modules to be rebuilt with a recent flang-new. FYI, the module errors I was seeing were

  use,intrinsic::iso_c_binding,only:c_new_line
                 ^^^^^^^^^^^^^
/work/kparzysz/c/org/bin/../include/flang/omp_lib_kinds.mod:52:16: error: Cannot read module file for module 'iso_c_binding': File is not the right module file for 'iso_c_binding': /work/kparzysz/c/org/bin/../include/flang/iso_c_binding.mod

@kparzysz
Copy link
Contributor Author

taskgroup.f90

This testcase is invalid: the allocate clause requires another clause that can privatize the given object. The only such clause is task_reduction, but it's not yet supported. The only way to fix the test now is to remove the allocate clause completely (#92173). My code also needs to recognize additional privatizing clauses (#92176).

@kparzysz
Copy link
Contributor Author

teams.f90

There is an if clause on the teams directive: this is only allowed in OpenMP 5.2 or later (#92180).

@kparzysz
Copy link
Contributor Author

parallel-sections.f90

This test uses allocate clause without a privatizing clause. The failure is of the same type to the one in "taskgroup.f90". One way to fix this is to add private(x, y) to the construct. This would cause the allocate to only apply to sections (since that's where the private clause would apply). #92185

@kparzysz
Copy link
Contributor Author

I'll work on getting a small reproducer.

@psteinfeld Could you check if adding -fopenmp-version=52 changes anything?

@psteinfeld
Copy link
Contributor

-fopenmp-version=52

From what I can tell, this doesn't make a difference.

@tblah
Copy link
Contributor

tblah commented May 15, 2024

Thank you very much for the quick fixes. I'm concerned that invalid openmp directives led to assertion failures rather than semantic failures. Please could you make an issue for this so we don't forget to address it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:openmp OpenMP related changes to Clang flang:fir-hlfir flang:openmp flang Flang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants