Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[flang][OpenMP] Add support for target ... private #78968

Closed
wants to merge 1 commit into from

Conversation

ergawy
Copy link
Member

@ergawy ergawy commented Jan 22, 2024

Adds support for the private clause in the target directive. In order to support that, the DataSharingProcessor was also restructured in order to separate the collection of prviate symbols from their actual privatization code-gen.

The commit adds both a code-gen and an offloading tests.

@llvmbot llvmbot added flang Flang issues not falling into any other category flang:fir-hlfir flang:openmp openmp:libomptarget OpenMP offload runtime labels Jan 22, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Jan 22, 2024

@llvm/pr-subscribers-flang-openmp

@llvm/pr-subscribers-flang-fir-hlfir

Author: Kareem Ergawy (ergawy)

Changes

Adds support for the private clause in the target directive. In order to support that, the DataSharingProcessor was also restructured in order to separate the collection of prviate symbols from their actual privatization code-gen.

The commit adds both a code-gen and an offloading tests.


Full diff: https://github.com/llvm/llvm-project/pull/78968.diff

3 Files Affected:

  • (modified) flang/lib/Lower/OpenMP.cpp (+60-23)
  • (added) flang/test/Lower/OpenMP/target_private.f90 (+30)
  • (added) openmp/libomptarget/test/offloading/fortran/target_private.f90 (+29)
diff --git a/flang/lib/Lower/OpenMP.cpp b/flang/lib/Lower/OpenMP.cpp
index 7dd25f75d9eb76..5e35d56668a92f 100644
--- a/flang/lib/Lower/OpenMP.cpp
+++ b/flang/lib/Lower/OpenMP.cpp
@@ -143,15 +143,17 @@ static void genNestedEvaluations(Fortran::lower::AbstractConverter &converter,
 //===----------------------------------------------------------------------===//
 
 class DataSharingProcessor {
+  using SymbolSet = llvm::SetVector<const Fortran::semantics::Symbol *>;
+
   bool hasLastPrivateOp;
   mlir::OpBuilder::InsertPoint lastPrivIP;
   mlir::OpBuilder::InsertPoint insPt;
   mlir::Value loopIV;
   // Symbols in private, firstprivate, and/or lastprivate clauses.
-  llvm::SetVector<const Fortran::semantics::Symbol *> privatizedSymbols;
-  llvm::SetVector<const Fortran::semantics::Symbol *> defaultSymbols;
-  llvm::SetVector<const Fortran::semantics::Symbol *> symbolsInNestedRegions;
-  llvm::SetVector<const Fortran::semantics::Symbol *> symbolsInParentRegions;
+  SymbolSet privatizedSymbols;
+  SymbolSet defaultSymbols;
+  SymbolSet symbolsInNestedRegions;
+  SymbolSet symbolsInParentRegions;
   Fortran::lower::AbstractConverter &converter;
   fir::FirOpBuilder &firOpBuilder;
   const Fortran::parser::OmpClauseList &opClauseList;
@@ -182,35 +184,54 @@ class DataSharingProcessor {
       : hasLastPrivateOp(false), converter(converter),
         firOpBuilder(converter.getFirOpBuilder()), opClauseList(opClauseList),
         eval(eval) {}
-  // Privatisation is split into two steps.
-  // Step1 performs cloning of all privatisation clauses and copying for
-  // firstprivates. Step1 is performed at the place where process/processStep1
+  // Privatisation is split into 3 steps:
+  //
+  // * Step1: collects all symbols that should be privatized.
+  //
+  // * Step2: performs cloning of all privatisation clauses and copying for
+  // firstprivates. Step2 is performed at the place where process/processStep2
   // is called. This is usually inside the Operation corresponding to the OpenMP
-  // construct, for looping constructs this is just before the Operation. The
-  // split into two steps was performed basically to be able to call
-  // privatisation for looping constructs before the operation is created since
-  // the bounds of the MLIR OpenMP operation can be privatised.
-  // Step2 performs the copying for lastprivates and requires knowledge of the
-  // MLIR operation to insert the last private update. Step2 adds
+  // construct, for looping constructs this is just before the Operation.
+  //
+  // * Step3: performs the copying for lastprivates and requires knowledge of
+  // the MLIR operation to insert the last private update. Step2 adds
   // dealocation code as well.
+  //
+  // The split was performed for the following reasons:
+  //
+  // 1. Step1 was split so that the `target` op knows which symbols should not
+  // be mapped into the target region due to being `private`. The implicit
+  // mapping happens before the op body is generated so we need to to collect
+  // the private symbols first and then later in the body actually privatize
+  // them.
+  //
+  // 2. Step2 was split in order to call privatisation for looping constructs
+  // before the operation is created since the bounds of the MLIR OpenMP
+  // operation can be privatised.
   void processStep1();
-  void processStep2(mlir::Operation *op, bool isLoop);
+  void processStep2();
+  void processStep3(mlir::Operation *op, bool isLoop);
 
   void setLoopIV(mlir::Value iv) {
     assert(!loopIV && "Loop iteration variable already set");
     loopIV = iv;
   }
+
+  const SymbolSet &getPrivatizedSymbols() const { return privatizedSymbols; }
 };
 
 void DataSharingProcessor::processStep1() {
   collectSymbolsForPrivatization();
   collectDefaultSymbols();
+}
+
+void DataSharingProcessor::processStep2() {
   privatize();
   defaultPrivatize();
   insertBarrier();
 }
 
-void DataSharingProcessor::processStep2(mlir::Operation *op, bool isLoop) {
+void DataSharingProcessor::processStep3(mlir::Operation *op, bool isLoop) {
   insPt = firOpBuilder.saveInsertionPoint();
   copyLastPrivatize(op);
   firOpBuilder.restoreInsertionPoint(insPt);
@@ -2306,11 +2327,12 @@ static void createBodyOfOp(
     if (!dsp) {
       DataSharingProcessor proc(converter, *clauses, eval);
       proc.processStep1();
-      proc.processStep2(op, isLoop);
+      proc.processStep2();
+      proc.processStep3(op, isLoop);
     } else {
       if (isLoop && args.size() > 0)
         dsp->setLoopIV(converter.getSymbolAddress(*args[0]));
-      dsp->processStep2(op, isLoop);
+      dsp->processStep3(op, isLoop);
     }
 
     if (storeOp)
@@ -2648,7 +2670,9 @@ static void genBodyOfTargetOp(
     const llvm::SmallVector<mlir::Type> &mapSymTypes,
     const llvm::SmallVector<mlir::Location> &mapSymLocs,
     const llvm::SmallVector<const Fortran::semantics::Symbol *> &mapSymbols,
-    const mlir::Location &currentLocation) {
+    const mlir::Location &currentLocation,
+    const Fortran::parser::OmpClauseList &clauseList,
+    DataSharingProcessor &dsp) {
   assert(mapSymTypes.size() == mapSymLocs.size());
 
   fir::FirOpBuilder &firOpBuilder = converter.getFirOpBuilder();
@@ -2657,6 +2681,8 @@ static void genBodyOfTargetOp(
   auto *regionBlock =
       firOpBuilder.createBlock(&region, {}, mapSymTypes, mapSymLocs);
 
+  dsp.processStep2();
+
   // Clones the `bounds` placing them inside the target region and returns them.
   auto cloneBound = [&](mlir::Value bound) {
     if (mlir::isMemoryEffectFree(bound.getDefiningOp())) {
@@ -2811,8 +2837,7 @@ genTargetOp(Fortran::lower::AbstractConverter &converter,
   cp.processNowait(nowaitAttr);
   cp.processMap(currentLocation, directive, semanticsContext, stmtCtx,
                 mapOperands, &mapSymTypes, &mapSymLocs, &mapSymbols);
-  cp.processTODO<Fortran::parser::OmpClause::Private,
-                 Fortran::parser::OmpClause::Depend,
+  cp.processTODO<Fortran::parser::OmpClause::Depend,
                  Fortran::parser::OmpClause::Firstprivate,
                  Fortran::parser::OmpClause::IsDevicePtr,
                  Fortran::parser::OmpClause::HasDeviceAddr,
@@ -2823,11 +2848,21 @@ genTargetOp(Fortran::lower::AbstractConverter &converter,
                  Fortran::parser::OmpClause::Defaultmap>(
       currentLocation, llvm::omp::Directive::OMPD_target);
 
+  DataSharingProcessor dsp(converter, clauseList, eval);
+  dsp.processStep1();
+
   // 5.8.1 Implicit Data-Mapping Attribute Rules
   // The following code follows the implicit data-mapping rules to map all the
-  // symbols used inside the region that have not been explicitly mapped using
-  // the map clause.
+  // symbols used inside the region that do not have explicit data-environment
+  // attribute clauses (neither data-sharing; e.g. `private`, nor `map`
+  // clauses).
   auto captureImplicitMap = [&](const Fortran::semantics::Symbol &sym) {
+    if (dsp.getPrivatizedSymbols().contains(&sym)) {
+      llvm::errs() << "sym is to be privatized: " << sym.name().ToString()
+                   << "\n";
+      return;
+    }
+
     if (llvm::find(mapSymbols, &sym) == mapSymbols.end()) {
       mlir::Value baseOp = converter.getSymbolAddress(sym);
       if (!baseOp)
@@ -2893,7 +2928,7 @@ genTargetOp(Fortran::lower::AbstractConverter &converter,
       nowaitAttr, mapOperands);
 
   genBodyOfTargetOp(converter, eval, genNested, targetOp, mapSymTypes,
-                    mapSymLocs, mapSymbols, currentLocation);
+                    mapSymLocs, mapSymbols, currentLocation, clauseList, dsp);
 
   return targetOp;
 }
@@ -3127,6 +3162,7 @@ createSimdLoop(Fortran::lower::AbstractConverter &converter,
   fir::FirOpBuilder &firOpBuilder = converter.getFirOpBuilder();
   DataSharingProcessor dsp(converter, loopOpClauseList, eval);
   dsp.processStep1();
+  dsp.processStep2();
 
   Fortran::lower::StatementContext stmtCtx;
   mlir::Value scheduleChunkClauseOperand, ifClauseOperand;
@@ -3179,6 +3215,7 @@ static void createWsLoop(Fortran::lower::AbstractConverter &converter,
   fir::FirOpBuilder &firOpBuilder = converter.getFirOpBuilder();
   DataSharingProcessor dsp(converter, beginClauseList, eval);
   dsp.processStep1();
+  dsp.processStep2();
 
   Fortran::lower::StatementContext stmtCtx;
   mlir::Value scheduleChunkClauseOperand;
diff --git a/flang/test/Lower/OpenMP/target_private.f90 b/flang/test/Lower/OpenMP/target_private.f90
new file mode 100644
index 00000000000000..98e3b79d035dfd
--- /dev/null
+++ b/flang/test/Lower/OpenMP/target_private.f90
@@ -0,0 +1,30 @@
+!Test data-sharing attribute clauses for the `target` directive.
+
+!RUN: %flang_fc1 -emit-hlfir -fopenmp %s -o - | FileCheck %s
+
+!CHECK-LABEL: func.func @_QPomp_target_private()
+subroutine omp_target_private
+    implicit none
+    integer :: x(1)
+
+!$omp target private(x)
+    x(1) = 42
+!$omp end target
+!CHECK: omp.target {
+!CHECK-DAG:    %[[C1:.*]] = arith.constant 1 : index
+!CHECK-DAG:    %[[PRIV_ALLOC:.*]] = fir.alloca !fir.array<1xi32> {bindc_name = "x",
+!CHECK-SAME:     pinned, uniq_name = "_QFomp_target_privateEx"}
+!CHECK-NEXT:   %[[SHAPE:.*]] = fir.shape %[[C1]] : (index) -> !fir.shape<1>
+!CHECK-NEXT:   %[[PRIV_DECL:.*]]:2 = hlfir.declare %[[PRIV_ALLOC]](%[[SHAPE]])
+!CHECK-SAME:     {uniq_name = "_QFomp_target_privateEx"} :
+!CHECK-SAME:     (!fir.ref<!fir.array<1xi32>>, !fir.shape<1>) ->
+!CHECK-SAME:     (!fir.ref<!fir.array<1xi32>>, !fir.ref<!fir.array<1xi32>>)
+!CHECK-DAG:    %[[C42:.*]] = arith.constant 42 : i32
+!CHECK-DAG:    %[[C1_2:.*]] = arith.constant 1 : index
+!CHECK-NEXT:   %[[PRIV_BINDING:.*]] = hlfir.designate %[[PRIV_DECL]]#0 (%[[C1_2]])
+!CHECK-SAME:     : (!fir.ref<!fir.array<1xi32>>, index) -> !fir.ref<i32>
+!CHECK-NEXT:   hlfir.assign %[[C42]] to %[[PRIV_BINDING]] : i32, !fir.ref<i32>
+!CHECK-NEXT:   omp.terminator
+!CHECK-NEXT: }
+
+end subroutine omp_target_private
diff --git a/openmp/libomptarget/test/offloading/fortran/target_private.f90 b/openmp/libomptarget/test/offloading/fortran/target_private.f90
new file mode 100644
index 00000000000000..486c23ec2ec8d2
--- /dev/null
+++ b/openmp/libomptarget/test/offloading/fortran/target_private.f90
@@ -0,0 +1,29 @@
+! Basic offloading test with a target region
+! REQUIRES: flang
+! UNSUPPORTED: nvptx64-nvidia-cuda-LTO
+! UNSUPPORTED: aarch64-unknown-linux-gnu
+! UNSUPPORTED: aarch64-unknown-linux-gnu-LTO
+! UNSUPPORTED: x86_64-pc-linux-gnu
+! UNSUPPORTED: x86_64-pc-linux-gnu-LTO
+
+! RUN: %libomptarget-compile-fortran-generic
+! RUN: env LIBOMPTARGET_INFO=16 %libomptarget-run-generic 2>&1 | %fcheck-generic
+program target_update
+    implicit none
+    integer :: x(1)
+    integer :: y(1)
+
+    x(1) = 42
+
+!$omp target private(x) map(tofrom: y)
+    x(1) = 84
+    y(1) = x(1)
+!$omp end target
+
+    print *, "x =", x(1)
+    print *, "y =", y(1)
+
+end program target_update
+! CHECK: "PluginInterface" device {{[0-9]+}} info: Launching kernel {{.*}}
+! CHECK: x = 42
+! CHECK: y = 84

Adds support for the `private` clause in the `target` directive. In
order to support that, the `DataSharingProcessor` was also restructured
in order to separate the collection of prviate symbols from their actual
privatization code-gen.

The commit adds both a code-gen and an offloading tests.
@clementval
Copy link
Contributor

What about having a proper representation of private on the target operation and delay the actual privatization to a later stage?

#75836

@ergawy
Copy link
Member Author

ergawy commented Jan 23, 2024

What about having a proper representation of private on the target operation and delay the actual privatization to a later stage?

#75836

Thanks for the suggestion, I think that would be much better indeed. To disuss this in more detail, for the target op, I think it should change from this:

omp.target ... {
bb0(...):
    %priv_x = fir.alloca ... {bindc_name = "x"} ... // alloca for privatized var
    ... use %priv_x ...
 }

To something like this:

omp.target ... private(%x -> %argx: type) {
bb0(%argx: type):
    ... use %argx ...
}

And then we either:

  1. Would move/adapt the alloca generation logic currently in DataSharingProcessor::privatize() to OpenMPToLLVMIRTranslation.cpp. However, we need to adapt that logic to directly generate LLVM IR alloca instructions instead of fir.allocas.
  2. Or, have a conversion pattern that gets inserted in the pipeline while we still have FIR (*). The advantage here is that, I think, we can more directly and easily share code between that DataSharingProcessor::privatize() and that new conversion pattern. Or even reuse the DataSharingProcessor util altogether somehow. What I mean is that I think the DataSharingProcessor can be grown into a standalone OpConversionPattern that does the whole privitatization logic as later in the dialect as we would like.

I am leaning towards the 2nd option since it would be cleaner and easier to adapt IMO. WDYT?

(*) I know that emitting fir.allocas for privatized variables may not be a good thing to have since it couples the FIR and OMP dialects but having a separate conversion pattern will a step towards uncoupling the 2 dialects (if we prefer to). Even if we do not care about the coupling, the separate conversion patter would be a cleaner and standalone way to handle all the privatization materializations in one self-contained place.

Looking forward for any thoughts on this :).

@kiranchandramohan
Copy link
Contributor

kiranchandramohan commented Jan 23, 2024

omp.target ... private(%x -> %argx: type) {
bb0(%argx: type):
... use %argx ...
}

You can augment the private with a function like op that declares how to make a clone (with alloca) of this type. Flang will create this with FIR/HLFIR. This will be converted by the current conversion patterns in codegen.cpp to LLVM dialect and then in OpenMPToLLVMIRTranslation.cpp you can hopefully just inline this function like op at the correct place.

  omp.private.decl @private_decl_i32 : !fir.ref<i32> init {
  ^bb0(%arg0: !fir.ref<i32>):
    %0 = fir.alloca i32
    omp.yield %0 : !fir.ref<i32>
  }
omp.target ... private(private_decl_i32 %x -> %argx: !fir.ref<i32>) {
bb0(%argx: !fir.ref<i32>):
     ... use %argx ...
}

The copyprivate patch reuses some of the copying (needed for firstprivate) from DataSharingProcessor.
#73128

You can hopefully combine what you do and what is there for copyprivate to implement firstprivate as well.

Note: You can find similar code in OpenMP reduction declarations and in OpenACC privatisation and reduction recipes

@ergawy
Copy link
Member Author

ergawy commented Jan 29, 2024

omp.target ... private(%x -> %argx: type) {
bb0(%argx: type):
... use %argx ...
}

You can augment the private with a function like op that declares how to make a clone (with alloca) of this type. Flang will create this with FIR/HLFIR. This will be converted by the current conversion patterns in codegen.cpp to LLVM dialect and then in OpenMPToLLVMIRTranslation.cpp you can hopefully just inline this function like op at the correct place.

  omp.private.decl @private_decl_i32 : !fir.ref<i32> init {
  ^bb0(%arg0: !fir.ref<i32>):
    %0 = fir.alloca i32
    omp.yield %0 : !fir.ref<i32>
  }
omp.target ... private(private_decl_i32 %x -> %argx: !fir.ref<i32>) {
bb0(%argx: !fir.ref<i32>):
     ... use %argx ...
}

The copyprivate patch reuses some of the copying (needed for firstprivate) from DataSharingProcessor. #73128

You can hopefully combine what you do and what is there for copyprivate to implement firstprivate as well.

Note: You can find similar code in OpenMP reduction declarations and in OpenACC privatisation and reduction recipes

@kiranchandramohan @clementval I created an incomplete WIP for delayed privatization in #79862. In the commit message I explain what is done and is still todo.

I know it is a bit early stage but I wanted to take your input on the current approach at least from a high-level of view. Any comments from anyone is highly appreciated.

I can abandon this PR and we can probably move the discussion to the WIP PR.

@clementval
Copy link
Contributor

Good! I'll have a look at it today.

@ergawy
Copy link
Member Author

ergawy commented Feb 6, 2024

Abandoning since we are moving to delayed privatization: #79862.

@ergawy ergawy closed this Feb 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flang:fir-hlfir flang:openmp flang Flang issues not falling into any other category openmp:libomptarget OpenMP offload runtime
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants