Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[flang][OpenMP] - Transform target offloading directives with dependencies during PFT to MLIR conversion #85130

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

bhandarkar-pranav
Copy link
Contributor

This PR changes PFT to MLIR lowering for the following directives when they have the depend clause on them.

!$ omp target depend(..)
!$ omp target enter data depend(..)
!$ omp target update data depend(..)
!$ omp target exit data depend(..)

With this PR, lowering now involves the creation of an omp.task operation that encloses the omp.target operation that is otherwise generated for the target construct. In addition, the depend clause from the target is moved to the enclosing new omp.task. The new omp.task is a mergeable task.

!$ omp target map(..) depend(in:a)
   b = a

is transformed to the following MLIR

omp.task mergeable depend(in:a) {
  omp.target map(..) {
    //MLIR for b = a;
  }
  omp.terminator
}

Note

This PR is an alternative to #83966. Its benefits are that unlike #83966 it does not do an entire pass over the IR and changes required for translation of the depend clause on target constructs into LLVM IR are entirely contained in this one change. This is in comparison to the approach of #83966 which would need adding that pass to the pass pipeline in flang.
Further, now the all depend clause related operands and attributes of omp.target can be removed from MLIR (patch coming soon after this PR is merged)

@llvmbot
Copy link
Collaborator

llvmbot commented Mar 13, 2024

@llvm/pr-subscribers-flang-fir-hlfir

Author: Pranav Bhandarkar (bhandarkar-pranav)

Changes

This PR changes PFT to MLIR lowering for the following directives when they have the depend clause on them.

!$ omp target depend(..)
!$ omp target enter data depend(..)
!$ omp target update data depend(..)
!$ omp target exit data depend(..)

With this PR, lowering now involves the creation of an omp.task operation that encloses the omp.target operation that is otherwise generated for the target construct. In addition, the depend clause from the target is moved to the enclosing new omp.task. The new omp.task is a mergeable task.

!$ omp target map(..) depend(in:a)
   b = a

is transformed to the following MLIR

omp.task mergeable depend(in:a) {
  omp.target map(..) {
    //MLIR for b = a;
  }
  omp.terminator
}

>[!NOTE]
>This PR is an alternative to #83966. Its benefits are that unlike #83966 it does not do an entire pass over the IR and changes required for translation of the depend clause on target constructs into LLVM IR are entirely contained in this one change. This is in comparison to the approach of #83966 which would need adding that pass to the pass pipeline in flang.
Further, now the all depend clause related operands and attributes of omp.target can be removed from MLIR (patch coming soon after this PR is merged)


Full diff: https://github.com/llvm/llvm-project/pull/85130.diff

4 Files Affected:

  • (modified) flang/lib/Lower/OpenMP/ClauseProcessor.cpp (+37)
  • (modified) flang/lib/Lower/OpenMP/ClauseProcessor.h (+7)
  • (modified) flang/lib/Lower/OpenMP/OpenMP.cpp (+6-14)
  • (modified) flang/test/Lower/OpenMP/target.f90 (+9-4)
diff --git a/flang/lib/Lower/OpenMP/ClauseProcessor.cpp b/flang/lib/Lower/OpenMP/ClauseProcessor.cpp
index a41f8312a28c9e..8961268fd63a17 100644
--- a/flang/lib/Lower/OpenMP/ClauseProcessor.cpp
+++ b/flang/lib/Lower/OpenMP/ClauseProcessor.cpp
@@ -808,6 +808,43 @@ bool ClauseProcessor::processDepend(
       });
 }
 
+bool ClauseProcessor::processTargetDepend(
+    mlir::Location currentLocation) const {
+  llvm::SmallVector<mlir::Attribute> dependTypeOperands;
+  llvm::SmallVector<mlir::Value> dependOperands;
+
+  processDepend(dependTypeOperands, dependOperands);
+  if (dependTypeOperands.empty())
+    return false;
+
+  // If 'dependTypeOperands' is not empty, this means the depend
+  // clause was used and we create an omp.task operation that'll
+  // enclose the omp.target operation corresponding to the target
+  // construct used. This new omp.task will be a mergeable task
+  // on which the depend clause will be tacked on. The depend
+  // clause on the original target construct is dropped.
+  fir::FirOpBuilder &firOpBuilder = converter.getFirOpBuilder();
+
+  // Create the new omp.task op.
+  // As per the OpenMP Spec a target directive creates a mergeable 'target
+  // task'
+  mlir::omp::TaskOp taskOp = firOpBuilder.create<mlir::omp::TaskOp>(
+      currentLocation, /*if_expr*/ mlir::Value(),
+      /*final_expr*/ mlir::Value(), /*untied*/ mlir::UnitAttr(),
+      /*mergeable*/ firOpBuilder.getUnitAttr(),
+      /*in_reduction_vars*/ mlir::ValueRange(), /*in_reductions*/ nullptr,
+      /*priority*/ mlir::Value(),
+      mlir::ArrayAttr::get(converter.getFirOpBuilder().getContext(),
+                           dependTypeOperands),
+      dependOperands, /*allocate_vars*/ mlir::ValueRange(),
+      /*allocate_vars*/ mlir::ValueRange());
+
+  firOpBuilder.createBlock(&taskOp.getRegion());
+  firOpBuilder.create<mlir::omp::TerminatorOp>(currentLocation);
+  firOpBuilder.setInsertionPointToStart(&taskOp.getRegion().front());
+  return true;
+}
+
 bool ClauseProcessor::processIf(
     Fortran::parser::OmpIfClause::DirectiveNameModifier directiveName,
     mlir::Value &result) const {
diff --git a/flang/lib/Lower/OpenMP/ClauseProcessor.h b/flang/lib/Lower/OpenMP/ClauseProcessor.h
index 11aff0be25053d..140c7a8d5cefe8 100644
--- a/flang/lib/Lower/OpenMP/ClauseProcessor.h
+++ b/flang/lib/Lower/OpenMP/ClauseProcessor.h
@@ -101,6 +101,13 @@ class ClauseProcessor {
       llvm::SmallVectorImpl<mlir::Attribute> &copyPrivateFuncs) const;
   bool processDepend(llvm::SmallVectorImpl<mlir::Attribute> &dependTypeOperands,
                      llvm::SmallVectorImpl<mlir::Value> &dependOperands) const;
+
+  // This is a special case of processDepend that processes the depend
+  // clause on Target ops - TargetOp, EnterDataOp, ExitDataOp, UpdateDataOp
+  // It sets up the generation of MLIR code for the target construct
+  // in question by first creating an enclosing omp.task operation and transfers
+  // the 'depend' clause and its arguments to this new omp.task operation.
+  bool processTargetDepend(mlir::Location currentLocation) const;
   bool
   processEnter(llvm::SmallVectorImpl<DeclareTargetCapturePair> &result) const;
   bool
diff --git a/flang/lib/Lower/OpenMP/OpenMP.cpp b/flang/lib/Lower/OpenMP/OpenMP.cpp
index 25bb4d9cff5d16..011e7d75245061 100644
--- a/flang/lib/Lower/OpenMP/OpenMP.cpp
+++ b/flang/lib/Lower/OpenMP/OpenMP.cpp
@@ -933,7 +933,7 @@ genEnterExitUpdateDataOp(Fortran::lower::AbstractConverter &converter,
   ClauseProcessor cp(converter, semaCtx, clauseList);
   cp.processIf(directiveName, ifClauseOperand);
   cp.processDevice(stmtCtx, deviceOperand);
-  cp.processDepend(dependTypeOperands, dependOperands);
+  cp.processTargetDepend(currentLocation);
   cp.processNowait(nowaitAttr);
 
   if constexpr (std::is_same_v<OpTy, mlir::omp::UpdateDataOp>) {
@@ -946,13 +946,9 @@ genEnterExitUpdateDataOp(Fortran::lower::AbstractConverter &converter,
     cp.processMap(currentLocation, directive, stmtCtx, mapOperands);
   }
 
-  return firOpBuilder.create<OpTy>(
-      currentLocation, ifClauseOperand, deviceOperand,
-      dependTypeOperands.empty()
-          ? nullptr
-          : mlir::ArrayAttr::get(converter.getFirOpBuilder().getContext(),
-                                 dependTypeOperands),
-      dependOperands, nowaitAttr, mapOperands);
+  return firOpBuilder.create<OpTy>(currentLocation, ifClauseOperand,
+                                   deviceOperand, nullptr, mlir::ValueRange(),
+                                   nowaitAttr, mapOperands);
 }
 
 // This functions creates a block for the body of the targetOp's region. It adds
@@ -1130,7 +1126,7 @@ genTargetOp(Fortran::lower::AbstractConverter &converter,
                ifClauseOperand);
   cp.processDevice(stmtCtx, deviceOperand);
   cp.processThreadLimit(stmtCtx, threadLimitOperand);
-  cp.processDepend(dependTypeOperands, dependOperands);
+  cp.processTargetDepend(currentLocation);
   cp.processNowait(nowaitAttr);
   cp.processMap(currentLocation, directive, stmtCtx, mapOperands, &mapSymTypes,
                 &mapSymLocs, &mapSymbols);
@@ -1232,11 +1228,7 @@ genTargetOp(Fortran::lower::AbstractConverter &converter,
 
   auto targetOp = converter.getFirOpBuilder().create<mlir::omp::TargetOp>(
       currentLocation, ifClauseOperand, deviceOperand, threadLimitOperand,
-      dependTypeOperands.empty()
-          ? nullptr
-          : mlir::ArrayAttr::get(converter.getFirOpBuilder().getContext(),
-                                 dependTypeOperands),
-      dependOperands, nowaitAttr, mapOperands);
+      nullptr, mlir::ValueRange(), nowaitAttr, mapOperands);
 
   genBodyOfTargetOp(converter, semaCtx, eval, genNested, targetOp, mapSymTypes,
                     mapSymLocs, mapSymbols, currentLocation);
diff --git a/flang/test/Lower/OpenMP/target.f90 b/flang/test/Lower/OpenMP/target.f90
index 030533e1a04553..10c928dda19b43 100644
--- a/flang/test/Lower/OpenMP/target.f90
+++ b/flang/test/Lower/OpenMP/target.f90
@@ -27,9 +27,10 @@ subroutine omp_target_enter_depend
    !$omp task depend(out: a)
    call foo(a)
    !$omp end task
+   !CHECK: omp.task mergeable depend(taskdependin -> %[[A]]#1 : !fir.ref<!fir.array<1024xi32>>) {
    !CHECK: %[[BOUNDS:.*]] = omp.bounds   lower_bound({{.*}}) upper_bound({{.*}}) extent({{.*}}) stride({{.*}}) start_idx({{.*}})
    !CHECK: %[[MAP:.*]] = omp.map_info var_ptr({{.*}})   map_clauses(to) capture(ByRef) bounds(%[[BOUNDS]]) -> !fir.ref<!fir.array<1024xi32>> {name = "a"}
-   !CHECK: omp.target_enter_data   map_entries(%[[MAP]] : !fir.ref<!fir.array<1024xi32>>) depend(taskdependin -> %[[A]]#1 : !fir.ref<!fir.array<1024xi32>>)
+   !CHECK: omp.target_enter_data   map_entries(%[[MAP]] : !fir.ref<!fir.array<1024xi32>>)
    !$omp target enter data map(to: a) depend(in: a)
     return
 end subroutine omp_target_enter_depend
@@ -166,9 +167,10 @@ subroutine omp_target_exit_depend
    !$omp task depend(out: a)
    call foo(a)
    !$omp end task
+   !CHECK: omp.task mergeable depend(taskdependout -> %[[A]]#1 : !fir.ref<!fir.array<1024xi32>>)
    !CHECK: %[[BOUNDS:.*]] = omp.bounds   lower_bound({{.*}}) upper_bound({{.*}}) extent({{.*}}) stride({{.*}}) start_idx({{.*}})
    !CHECK: %[[MAP:.*]] = omp.map_info var_ptr({{.*}})   map_clauses(from) capture(ByRef) bounds(%[[BOUNDS]]) -> !fir.ref<!fir.array<1024xi32>> {name = "a"}
-   !CHECK: omp.target_exit_data   map_entries(%[[MAP]] : !fir.ref<!fir.array<1024xi32>>) depend(taskdependout -> %[[A]]#1 : !fir.ref<!fir.array<1024xi32>>)
+   !CHECK: omp.target_exit_data   map_entries(%[[MAP]] : !fir.ref<!fir.array<1024xi32>>)
    !$omp target exit data map(from: a) depend(out: a)
 end subroutine omp_target_exit_depend
 
@@ -187,9 +189,10 @@ subroutine omp_target_update_depend
    call foo(a)
    !$omp end task
 
+   !CHECK: omp.task mergeable depend(taskdependin -> %[[A]]#1 : !fir.ref<!fir.array<1024xi32>>) {
    !CHECK: %[[BOUNDS:.*]] = omp.bounds
    !CHECK: %[[MAP:.*]] = omp.map_info var_ptr(%[[A]]#0 : !fir.ref<!fir.array<1024xi32>>, !fir.array<1024xi32>) map_clauses(to) capture(ByRef) bounds(%[[BOUNDS]]) -> !fir.ref<!fir.array<1024xi32>> {name = "a"}
-   !CHECK: omp.target_update_data motion_entries(%[[MAP]] : !fir.ref<!fir.array<1024xi32>>) depend(taskdependin -> %[[A]]#1 : !fir.ref<!fir.array<1024xi32>>)
+   !CHECK: omp.target_update_data motion_entries(%[[MAP]] : !fir.ref<!fir.array<1024xi32>>)
    !$omp target update to(a) depend(in:a)
 end subroutine omp_target_update_depend
 
@@ -367,12 +370,14 @@ subroutine omp_target_depend
    !$omp task depend(out: a)
    call foo(a)
    !$omp end task
+
+   !CHECK: omp.task mergeable depend(taskdependin -> %[[A]]#1 : !fir.ref<!fir.array<1024xi32>>) {
    !CHECK: %[[STRIDE_A:.*]] = arith.constant 1 : index
    !CHECK: %[[LBOUND_A:.*]] = arith.constant 0 : index
    !CHECK: %[[UBOUND_A:.*]] = arith.subi %c1024, %c1 : index
    !CHECK: %[[BOUNDS_A:.*]] = omp.bounds lower_bound(%[[LBOUND_A]] : index) upper_bound(%[[UBOUND_A]] : index) extent(%[[EXTENT_A]] : index) stride(%[[STRIDE_A]] : index) start_idx(%[[STRIDE_A]] : index)
    !CHECK: %[[MAP_A:.*]] = omp.map_info var_ptr(%[[A]]#0 : !fir.ref<!fir.array<1024xi32>>, !fir.array<1024xi32>) map_clauses(tofrom) capture(ByRef) bounds(%[[BOUNDS_A]]) -> !fir.ref<!fir.array<1024xi32>> {name = "a"}
-   !CHECK: omp.target map_entries(%[[MAP_A]] -> %[[BB0_ARG:.*]] : !fir.ref<!fir.array<1024xi32>>) depend(taskdependin -> %[[A]]#1 : !fir.ref<!fir.array<1024xi32>>) {
+   !CHECK: omp.target map_entries(%[[MAP_A]] -> %[[BB0_ARG:.*]] : !fir.ref<!fir.array<1024xi32>>)
    !$omp target map(tofrom: a) depend(in: a)
       a(1) = 10
       !CHECK: omp.terminator

@llvmbot
Copy link
Collaborator

llvmbot commented Mar 13, 2024

@llvm/pr-subscribers-flang-openmp

Author: Pranav Bhandarkar (bhandarkar-pranav)

Changes

This PR changes PFT to MLIR lowering for the following directives when they have the depend clause on them.

!$ omp target depend(..)
!$ omp target enter data depend(..)
!$ omp target update data depend(..)
!$ omp target exit data depend(..)

With this PR, lowering now involves the creation of an omp.task operation that encloses the omp.target operation that is otherwise generated for the target construct. In addition, the depend clause from the target is moved to the enclosing new omp.task. The new omp.task is a mergeable task.

!$ omp target map(..) depend(in:a)
   b = a

is transformed to the following MLIR

omp.task mergeable depend(in:a) {
  omp.target map(..) {
    //MLIR for b = a;
  }
  omp.terminator
}

>[!NOTE]
>This PR is an alternative to #83966. Its benefits are that unlike #83966 it does not do an entire pass over the IR and changes required for translation of the depend clause on target constructs into LLVM IR are entirely contained in this one change. This is in comparison to the approach of #83966 which would need adding that pass to the pass pipeline in flang.
Further, now the all depend clause related operands and attributes of omp.target can be removed from MLIR (patch coming soon after this PR is merged)


Full diff: https://github.com/llvm/llvm-project/pull/85130.diff

4 Files Affected:

  • (modified) flang/lib/Lower/OpenMP/ClauseProcessor.cpp (+37)
  • (modified) flang/lib/Lower/OpenMP/ClauseProcessor.h (+7)
  • (modified) flang/lib/Lower/OpenMP/OpenMP.cpp (+6-14)
  • (modified) flang/test/Lower/OpenMP/target.f90 (+9-4)
diff --git a/flang/lib/Lower/OpenMP/ClauseProcessor.cpp b/flang/lib/Lower/OpenMP/ClauseProcessor.cpp
index a41f8312a28c9e..8961268fd63a17 100644
--- a/flang/lib/Lower/OpenMP/ClauseProcessor.cpp
+++ b/flang/lib/Lower/OpenMP/ClauseProcessor.cpp
@@ -808,6 +808,43 @@ bool ClauseProcessor::processDepend(
       });
 }
 
+bool ClauseProcessor::processTargetDepend(
+    mlir::Location currentLocation) const {
+  llvm::SmallVector<mlir::Attribute> dependTypeOperands;
+  llvm::SmallVector<mlir::Value> dependOperands;
+
+  processDepend(dependTypeOperands, dependOperands);
+  if (dependTypeOperands.empty())
+    return false;
+
+  // If 'dependTypeOperands' is not empty, this means the depend
+  // clause was used and we create an omp.task operation that'll
+  // enclose the omp.target operation corresponding to the target
+  // construct used. This new omp.task will be a mergeable task
+  // on which the depend clause will be tacked on. The depend
+  // clause on the original target construct is dropped.
+  fir::FirOpBuilder &firOpBuilder = converter.getFirOpBuilder();
+
+  // Create the new omp.task op.
+  // As per the OpenMP Spec a target directive creates a mergeable 'target
+  // task'
+  mlir::omp::TaskOp taskOp = firOpBuilder.create<mlir::omp::TaskOp>(
+      currentLocation, /*if_expr*/ mlir::Value(),
+      /*final_expr*/ mlir::Value(), /*untied*/ mlir::UnitAttr(),
+      /*mergeable*/ firOpBuilder.getUnitAttr(),
+      /*in_reduction_vars*/ mlir::ValueRange(), /*in_reductions*/ nullptr,
+      /*priority*/ mlir::Value(),
+      mlir::ArrayAttr::get(converter.getFirOpBuilder().getContext(),
+                           dependTypeOperands),
+      dependOperands, /*allocate_vars*/ mlir::ValueRange(),
+      /*allocate_vars*/ mlir::ValueRange());
+
+  firOpBuilder.createBlock(&taskOp.getRegion());
+  firOpBuilder.create<mlir::omp::TerminatorOp>(currentLocation);
+  firOpBuilder.setInsertionPointToStart(&taskOp.getRegion().front());
+  return true;
+}
+
 bool ClauseProcessor::processIf(
     Fortran::parser::OmpIfClause::DirectiveNameModifier directiveName,
     mlir::Value &result) const {
diff --git a/flang/lib/Lower/OpenMP/ClauseProcessor.h b/flang/lib/Lower/OpenMP/ClauseProcessor.h
index 11aff0be25053d..140c7a8d5cefe8 100644
--- a/flang/lib/Lower/OpenMP/ClauseProcessor.h
+++ b/flang/lib/Lower/OpenMP/ClauseProcessor.h
@@ -101,6 +101,13 @@ class ClauseProcessor {
       llvm::SmallVectorImpl<mlir::Attribute> &copyPrivateFuncs) const;
   bool processDepend(llvm::SmallVectorImpl<mlir::Attribute> &dependTypeOperands,
                      llvm::SmallVectorImpl<mlir::Value> &dependOperands) const;
+
+  // This is a special case of processDepend that processes the depend
+  // clause on Target ops - TargetOp, EnterDataOp, ExitDataOp, UpdateDataOp
+  // It sets up the generation of MLIR code for the target construct
+  // in question by first creating an enclosing omp.task operation and transfers
+  // the 'depend' clause and its arguments to this new omp.task operation.
+  bool processTargetDepend(mlir::Location currentLocation) const;
   bool
   processEnter(llvm::SmallVectorImpl<DeclareTargetCapturePair> &result) const;
   bool
diff --git a/flang/lib/Lower/OpenMP/OpenMP.cpp b/flang/lib/Lower/OpenMP/OpenMP.cpp
index 25bb4d9cff5d16..011e7d75245061 100644
--- a/flang/lib/Lower/OpenMP/OpenMP.cpp
+++ b/flang/lib/Lower/OpenMP/OpenMP.cpp
@@ -933,7 +933,7 @@ genEnterExitUpdateDataOp(Fortran::lower::AbstractConverter &converter,
   ClauseProcessor cp(converter, semaCtx, clauseList);
   cp.processIf(directiveName, ifClauseOperand);
   cp.processDevice(stmtCtx, deviceOperand);
-  cp.processDepend(dependTypeOperands, dependOperands);
+  cp.processTargetDepend(currentLocation);
   cp.processNowait(nowaitAttr);
 
   if constexpr (std::is_same_v<OpTy, mlir::omp::UpdateDataOp>) {
@@ -946,13 +946,9 @@ genEnterExitUpdateDataOp(Fortran::lower::AbstractConverter &converter,
     cp.processMap(currentLocation, directive, stmtCtx, mapOperands);
   }
 
-  return firOpBuilder.create<OpTy>(
-      currentLocation, ifClauseOperand, deviceOperand,
-      dependTypeOperands.empty()
-          ? nullptr
-          : mlir::ArrayAttr::get(converter.getFirOpBuilder().getContext(),
-                                 dependTypeOperands),
-      dependOperands, nowaitAttr, mapOperands);
+  return firOpBuilder.create<OpTy>(currentLocation, ifClauseOperand,
+                                   deviceOperand, nullptr, mlir::ValueRange(),
+                                   nowaitAttr, mapOperands);
 }
 
 // This functions creates a block for the body of the targetOp's region. It adds
@@ -1130,7 +1126,7 @@ genTargetOp(Fortran::lower::AbstractConverter &converter,
                ifClauseOperand);
   cp.processDevice(stmtCtx, deviceOperand);
   cp.processThreadLimit(stmtCtx, threadLimitOperand);
-  cp.processDepend(dependTypeOperands, dependOperands);
+  cp.processTargetDepend(currentLocation);
   cp.processNowait(nowaitAttr);
   cp.processMap(currentLocation, directive, stmtCtx, mapOperands, &mapSymTypes,
                 &mapSymLocs, &mapSymbols);
@@ -1232,11 +1228,7 @@ genTargetOp(Fortran::lower::AbstractConverter &converter,
 
   auto targetOp = converter.getFirOpBuilder().create<mlir::omp::TargetOp>(
       currentLocation, ifClauseOperand, deviceOperand, threadLimitOperand,
-      dependTypeOperands.empty()
-          ? nullptr
-          : mlir::ArrayAttr::get(converter.getFirOpBuilder().getContext(),
-                                 dependTypeOperands),
-      dependOperands, nowaitAttr, mapOperands);
+      nullptr, mlir::ValueRange(), nowaitAttr, mapOperands);
 
   genBodyOfTargetOp(converter, semaCtx, eval, genNested, targetOp, mapSymTypes,
                     mapSymLocs, mapSymbols, currentLocation);
diff --git a/flang/test/Lower/OpenMP/target.f90 b/flang/test/Lower/OpenMP/target.f90
index 030533e1a04553..10c928dda19b43 100644
--- a/flang/test/Lower/OpenMP/target.f90
+++ b/flang/test/Lower/OpenMP/target.f90
@@ -27,9 +27,10 @@ subroutine omp_target_enter_depend
    !$omp task depend(out: a)
    call foo(a)
    !$omp end task
+   !CHECK: omp.task mergeable depend(taskdependin -> %[[A]]#1 : !fir.ref<!fir.array<1024xi32>>) {
    !CHECK: %[[BOUNDS:.*]] = omp.bounds   lower_bound({{.*}}) upper_bound({{.*}}) extent({{.*}}) stride({{.*}}) start_idx({{.*}})
    !CHECK: %[[MAP:.*]] = omp.map_info var_ptr({{.*}})   map_clauses(to) capture(ByRef) bounds(%[[BOUNDS]]) -> !fir.ref<!fir.array<1024xi32>> {name = "a"}
-   !CHECK: omp.target_enter_data   map_entries(%[[MAP]] : !fir.ref<!fir.array<1024xi32>>) depend(taskdependin -> %[[A]]#1 : !fir.ref<!fir.array<1024xi32>>)
+   !CHECK: omp.target_enter_data   map_entries(%[[MAP]] : !fir.ref<!fir.array<1024xi32>>)
    !$omp target enter data map(to: a) depend(in: a)
     return
 end subroutine omp_target_enter_depend
@@ -166,9 +167,10 @@ subroutine omp_target_exit_depend
    !$omp task depend(out: a)
    call foo(a)
    !$omp end task
+   !CHECK: omp.task mergeable depend(taskdependout -> %[[A]]#1 : !fir.ref<!fir.array<1024xi32>>)
    !CHECK: %[[BOUNDS:.*]] = omp.bounds   lower_bound({{.*}}) upper_bound({{.*}}) extent({{.*}}) stride({{.*}}) start_idx({{.*}})
    !CHECK: %[[MAP:.*]] = omp.map_info var_ptr({{.*}})   map_clauses(from) capture(ByRef) bounds(%[[BOUNDS]]) -> !fir.ref<!fir.array<1024xi32>> {name = "a"}
-   !CHECK: omp.target_exit_data   map_entries(%[[MAP]] : !fir.ref<!fir.array<1024xi32>>) depend(taskdependout -> %[[A]]#1 : !fir.ref<!fir.array<1024xi32>>)
+   !CHECK: omp.target_exit_data   map_entries(%[[MAP]] : !fir.ref<!fir.array<1024xi32>>)
    !$omp target exit data map(from: a) depend(out: a)
 end subroutine omp_target_exit_depend
 
@@ -187,9 +189,10 @@ subroutine omp_target_update_depend
    call foo(a)
    !$omp end task
 
+   !CHECK: omp.task mergeable depend(taskdependin -> %[[A]]#1 : !fir.ref<!fir.array<1024xi32>>) {
    !CHECK: %[[BOUNDS:.*]] = omp.bounds
    !CHECK: %[[MAP:.*]] = omp.map_info var_ptr(%[[A]]#0 : !fir.ref<!fir.array<1024xi32>>, !fir.array<1024xi32>) map_clauses(to) capture(ByRef) bounds(%[[BOUNDS]]) -> !fir.ref<!fir.array<1024xi32>> {name = "a"}
-   !CHECK: omp.target_update_data motion_entries(%[[MAP]] : !fir.ref<!fir.array<1024xi32>>) depend(taskdependin -> %[[A]]#1 : !fir.ref<!fir.array<1024xi32>>)
+   !CHECK: omp.target_update_data motion_entries(%[[MAP]] : !fir.ref<!fir.array<1024xi32>>)
    !$omp target update to(a) depend(in:a)
 end subroutine omp_target_update_depend
 
@@ -367,12 +370,14 @@ subroutine omp_target_depend
    !$omp task depend(out: a)
    call foo(a)
    !$omp end task
+
+   !CHECK: omp.task mergeable depend(taskdependin -> %[[A]]#1 : !fir.ref<!fir.array<1024xi32>>) {
    !CHECK: %[[STRIDE_A:.*]] = arith.constant 1 : index
    !CHECK: %[[LBOUND_A:.*]] = arith.constant 0 : index
    !CHECK: %[[UBOUND_A:.*]] = arith.subi %c1024, %c1 : index
    !CHECK: %[[BOUNDS_A:.*]] = omp.bounds lower_bound(%[[LBOUND_A]] : index) upper_bound(%[[UBOUND_A]] : index) extent(%[[EXTENT_A]] : index) stride(%[[STRIDE_A]] : index) start_idx(%[[STRIDE_A]] : index)
    !CHECK: %[[MAP_A:.*]] = omp.map_info var_ptr(%[[A]]#0 : !fir.ref<!fir.array<1024xi32>>, !fir.array<1024xi32>) map_clauses(tofrom) capture(ByRef) bounds(%[[BOUNDS_A]]) -> !fir.ref<!fir.array<1024xi32>> {name = "a"}
-   !CHECK: omp.target map_entries(%[[MAP_A]] -> %[[BB0_ARG:.*]] : !fir.ref<!fir.array<1024xi32>>) depend(taskdependin -> %[[A]]#1 : !fir.ref<!fir.array<1024xi32>>) {
+   !CHECK: omp.target map_entries(%[[MAP_A]] -> %[[BB0_ARG:.*]] : !fir.ref<!fir.array<1024xi32>>)
    !$omp target map(tofrom: a) depend(in: a)
       a(1) = 10
       !CHECK: omp.terminator

@bhandarkar-pranav bhandarkar-pranav changed the title [mlir][OpenMP] - Transform target offloading directives with dependencies during PFT to MLIR conversion [flang][OpenMP] - Transform target offloading directives with dependencies during PFT to MLIR conversion Mar 13, 2024
Copy link
Contributor

@skatrak skatrak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Pranav, I do prefer this approach and it looks good except some minimal nits. Please give time for @ergawy, @jsjodin to take a look in case they have any concerns with this approach.

dependTypeOperands),
dependOperands, nowaitAttr, mapOperands);
return firOpBuilder.create<OpTy>(currentLocation, ifClauseOperand,
deviceOperand, nullptr, mlir::ValueRange(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Add comment to document what these default values refer to.

Suggested change
deviceOperand, nullptr, mlir::ValueRange(),
deviceOperand, /*depends=*/ nullptr, /*depend_vars=*/ mlir::ValueRange(),

: mlir::ArrayAttr::get(converter.getFirOpBuilder().getContext(),
dependTypeOperands),
dependOperands, nowaitAttr, mapOperands);
nullptr, mlir::ValueRange(), nowaitAttr, mapOperands);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Document default values here as well.

@kiranchandramohan
Copy link
Contributor

The concern with this approach is that it is attempting early lowering or conversion of target dependencies. It also means there will be no lowering for target dependencies in MLIR. It is probably not good that there is no lowering for target dependencies in the OpenMP dialect when there is a representation for it.

The issue with the conversion OpenMP MLIR pass is that it introduces tasks wrapping target constructs only when there are dependencies. Why is it not there when there are no dependencies? If this is an LLVM openmp runtime-specific lowering or just matching what Clang is doing, then we can add it as part of a legalize-for-export kind of pass. Equivalently you could refactor the task dependence lowering in OpenMPIRBuilder and/or OpenMP Translation.

@bhandarkar-pranav
Copy link
Contributor Author

bhandarkar-pranav commented Mar 14, 2024

Thanks @skatrak and @kiranchandramohan. My replies to Kiran's comment are inline below

The concern with this approach is that it is attempting early lowering or conversion of target dependencies. It also means there will be no lowering for target dependencies in MLIR. It is probably not good that there is no lowering for target dependencies in the OpenMP dialect when there is a representation for it.

The other PR (#83966) which is an alternative solution to this does the lowering in MLIR instead. With this approach one of the outcomes would be that we could remove dependencies from the omp.target representation entirely.

The issue with the conversion OpenMP MLIR pass is that it introduces tasks wrapping target constructs only when there are dependencies. Why is it not there when there are no dependencies?

I don't understand why the the outer task is needed in the absence of dependencies. Dependencies are handled by the runtime on the host side and hence the outer task.

If this is an LLVM openmp runtime-specific lowering or just matching what Clang is doing, then we can add it as part of a legalize-for-export kind of pass. Equivalently you could refactor the task dependence lowering in OpenMPIRBuilder and/or OpenMP Translation.

Yes, it matches what clang is doing. But, clang does it when generating LLVM IR. But IMHO, this is a higher level transformation ideally suited for MLIR. So, I would prefer a legalize-for-export pass rather than handle this procedurally in OpenMP Translation. Clang, of course, has no choice because it doesn't use MLIR.

@kiranchandramohan
Copy link
Contributor

The issue with the conversion OpenMP MLIR pass is that it introduces tasks wrapping target constructs only when there are dependencies. Why is it not there when there are no dependencies?

I don't understand why the the outer task is needed in the absence of dependencies. Dependencies are handled by the runtime on the host side and hence the outer task.

Isn't there always a target task generated by the target construct? This could be implicit and I am assuming that you are making the implicit task for a target construct explicit for dependencies. And by making an explicit task in the presence of dependences we are making a difference with the regular target flow.

If this is an LLVM openmp runtime-specific lowering or just matching what Clang is doing, then we can add it as part of a legalize-for-export kind of pass. Equivalently you could refactor the task dependence lowering in OpenMPIRBuilder and/or OpenMP Translation.

Yes, it matches what clang is doing. But, clang does it when generating LLVM IR. But IMHO, this is a higher level transformation ideally suited for MLIR. So, I would prefer a legalize-for-export pass rather than handle this procedurally in OpenMP Translation. Clang, of course, has no choice because it doesn't use MLIR.

I am fine with it as a legalize-for-export pass. Doing this in MLIR slightly deviates from the OpenMPIRBuilder approach that the LLVM openmp developers wanted us to use. Could you write a quick RFC in MLIR discourse, point out that this is a good use case of MLIR progressive lowering and ping @jdoerfert @Meinersbur and check whether it is OK?

@bhandarkar-pranav
Copy link
Contributor Author

Thanks @kiranchandramohan

Isn't there always a target task generated by the target construct? This could be implicit and I am assuming that you are making the implicit task for a target construct explicit for dependencies. And by making an explicit task in the presence of dependences we are making a difference with the regular target flow.

Rereading the spec (5.2) it appears to me that an outer task should always be generated. Here is the relevant part taken from here

When a target construct is encountered, a new target task is generated. The target task region encloses the target region. The target task is complete after the execution of the target region is complete.

But, in the absence of the nowait clause on the target construct, this enclosing task is an included task i.e. undeferred and executed by the encountering thread . This is why, I think, Clang only generates an enclosing task if depend, in_reduction, nowait or thread_limit (if OpenMP > 5.1) clauses are used on the target construct.

I am fine with it as a legalize-for-export pass. Doing this in MLIR slightly deviates from the OpenMPIRBuilder approach that the LLVM openmp developers wanted us to use. Could you write a quick RFC in MLIR discourse, point out that this is a good use case of MLIR progressive lowering and ping @jdoerfert @Meinersbur and check whether it is OK?

Sounds good, but can the RFC wait a few weeks. I am out on parental leave starting Monday and think it is best to pick it up after I come back (mid April). If you think otherwise, let me know and I'll write up a quick RFC on Friday

@kiranchandramohan
Copy link
Contributor

But, in the absence of the nowait clause on the target construct, this enclosing task is an included task i.e. undeferred and executed by the encountering thread . This is why, I think, Clang only generates an enclosing task if depend, in_reduction, nowait or thread_limit (if OpenMP > 5.1) clauses are used on the target construct.

Makes sense. So I guess if go the legalize pass way then the nowait case also should be handled there.

Sounds good, but can the RFC wait a few weeks. I am out on parental leave starting Monday and think it is best to pick it up after I come back (mid April). If you think otherwise, let me know and I'll write up a quick RFC on Friday

Sure, that is fine. Happy parenting!

@bhandarkar-pranav
Copy link
Contributor Author

bhandarkar-pranav commented Mar 15, 2024

Makes sense. So I guess if go the legalize pass way then the nowait case also should be handled there.

Exactly.

Sure, that is fine. Happy parenting!

Thank you :)

…cies during PFT to MLIR conversion

This patch changes PFT to MLIR lowering for the following directives
when they have the `depend` clause on them.

!$ omp target depend(..)
!$ omp target enter data depend(..)
!$ omp target update data depend(..)
!$ omp target exit data depend(..)

Lowering now involves the creation of an `omp.task` operation that encloses
the `omp.target` operation that is otherwise generated for the target construct.
In addition, the depend clause from the target is moved to the enclosing
new `omp.task`. The new `omp.task` is a mergeable task.

```
!$ omp target map(..) depend(in:a)
   b = a
```
is transformed to the following MLIR

```
omp.task mergeable depend(in:a) {
  omp.target map(..) {
    //MLIR for b = a;
  }
  omp.terminator
}
```
…cies during PFT to MLIR conversion

This patch changes PFT to MLIR lowering for the following directives
when they have the `depend` clause on them.

!$ omp target depend(..)
!$ omp target enter data depend(..)
!$ omp target update data depend(..)
!$ omp target exit data depend(..)

Lowering now involves the creation of an `omp.task` operation that encloses
the `omp.target` operation that is otherwise generated for the target construct.
In addition, the depend clause from the target is moved to the enclosing
new `omp.task`. The new `omp.task` is a mergeable task.

```
!$ omp target map(..) depend(in:a)
   b = a
```
is transformed to the following MLIR

```
omp.task mergeable depend(in:a) {
  omp.target map(..) {
    //MLIR for b = a;
  }
  omp.terminator
}
```
Copy link

github-actions bot commented Apr 30, 2024

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff 143be6a60186d6c1a6a298d0b7acdc1a4d69a321 8ec38538f2e44c2196c21dcdcc4d06b69cb08ef9 -- flang/lib/Lower/OpenMP/ClauseProcessor.cpp flang/lib/Lower/OpenMP/ClauseProcessor.h flang/lib/Lower/OpenMP/OpenMP.cpp
View the diff from clang-format here.
diff --git a/flang/lib/Lower/OpenMP/ClauseProcessor.cpp b/flang/lib/Lower/OpenMP/ClauseProcessor.cpp
index d3f45cdd14..a03bb3a583 100644
--- a/flang/lib/Lower/OpenMP/ClauseProcessor.cpp
+++ b/flang/lib/Lower/OpenMP/ClauseProcessor.cpp
@@ -761,11 +761,11 @@ bool ClauseProcessor::processHasDeviceAddr(
         addUseDeviceClause(converter, devAddrClause.v, result.hasDeviceAddrVars,
                            isDeviceTypes, isDeviceLocs, isDeviceSymbols);
       });
-
 }
 
 bool ClauseProcessor::processTargetDepend(
-    mlir::Location currentLocation, mlir::omp::DependClauseOps &clauseOps) const {
+    mlir::Location currentLocation,
+    mlir::omp::DependClauseOps &clauseOps) const {
 
   processDepend(clauseOps);
   if (clauseOps.dependTypeAttrs.empty())
diff --git a/flang/lib/Lower/OpenMP/ClauseProcessor.h b/flang/lib/Lower/OpenMP/ClauseProcessor.h
index 24d8e9f1b5..cd3b940c88 100644
--- a/flang/lib/Lower/OpenMP/ClauseProcessor.h
+++ b/flang/lib/Lower/OpenMP/ClauseProcessor.h
@@ -99,7 +99,8 @@ public:
   // It sets up the generation of MLIR code for the target construct
   // in question by first creating an enclosing omp.task operation and transfers
   // the 'depend' clause and its arguments to this new omp.task operation.
-  bool processTargetDepend(mlir::Location currentLocation, mlir::omp::DependClauseOps &clauseOps) const;
+  bool processTargetDepend(mlir::Location currentLocation,
+                           mlir::omp::DependClauseOps &clauseOps) const;
   bool
   processEnter(llvm::SmallVectorImpl<DeclareTargetCapturePair> &result) const;
   bool processIf(omp::clause::If::DirectiveNameModifier directiveName,

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flang:fir-hlfir flang:openmp flang Flang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants