Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[flang][MLIR][OpenMP] Extend delayed privatization for scalar allocatables and pointers #84740

Merged
merged 1 commit into from
Mar 18, 2024

Conversation

ergawy
Copy link
Member

@ergawy ergawy commented Mar 11, 2024

One more step in extending support for delayed privatization. This diff adds support for scalar allocatables and pointers.

@ergawy
Copy link
Member Author

ergawy commented Mar 11, 2024

This PR replaces the previous overly complicated solution proposed in #84033.

@llvmbot llvmbot added flang Flang issues not falling into any other category flang:fir-hlfir flang:openmp labels Mar 11, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Mar 11, 2024

@llvm/pr-subscribers-flang-openmp

@llvm/pr-subscribers-flang-fir-hlfir

Author: Kareem Ergawy (ergawy)

Changes

One more step in extending support for delayed privatization. This diff adds support for scalar allocatables.


Full diff: https://github.com/llvm/llvm-project/pull/84740.diff

5 Files Affected:

  • (modified) flang/include/flang/Optimizer/Builder/BoxValue.h (+10)
  • (modified) flang/lib/Lower/Bridge.cpp (+14-10)
  • (modified) flang/lib/Lower/OpenMP/DataSharingProcessor.cpp (+5-3)
  • (added) flang/test/Lower/OpenMP/delayed-privatization-allocatable-firstprivate.f90 (+33)
  • (added) flang/test/Lower/OpenMP/delayed-privatization-allocatable-private.f90 (+40)
diff --git a/flang/include/flang/Optimizer/Builder/BoxValue.h b/flang/include/flang/Optimizer/Builder/BoxValue.h
index 2fed2d48a7a080..698a5afa17b23d 100644
--- a/flang/include/flang/Optimizer/Builder/BoxValue.h
+++ b/flang/include/flang/Optimizer/Builder/BoxValue.h
@@ -535,6 +535,16 @@ class ExtendedValue : public details::matcher<ExtendedValue> {
 
   const VT &matchee() const { return box; }
 
+  /// Clone an ExtendedValue to a new instance changing the base address.
+  ///
+  /// TODO So far, this is only a shallow clone; only the base address is
+  /// replaced. This will probably be extended to implement deep cloning to
+  /// support scenarios such as delayed privatization for ArrayBoxValue's.
+  ExtendedValue clone(mlir::Value newBase) const  {
+    return match([&](const UnboxedValue &box) -> ExtendedValue { return newBase; },
+                 [&](const auto &box) -> ExtendedValue { return box.clone(newBase); });
+  }
+
 private:
   VT box;
 };
diff --git a/flang/lib/Lower/Bridge.cpp b/flang/lib/Lower/Bridge.cpp
index a668ba4116faab..5fa914d087f020 100644
--- a/flang/lib/Lower/Bridge.cpp
+++ b/flang/lib/Lower/Bridge.cpp
@@ -748,7 +748,8 @@ class FirConverter : public Fortran::lower::AbstractConverter {
 
   void copyVar(mlir::Location loc, mlir::Value dst,
                mlir::Value src) override final {
-    copyVarHLFIR(loc, dst, src);
+    copyVarHLFIR(loc, Fortran::lower::SymbolBox::Intrinsic{dst},
+                 Fortran::lower::SymbolBox::Intrinsic{src});
   }
 
   void copyHostAssociateVar(
@@ -1009,10 +1010,7 @@ class FirConverter : public Fortran::lower::AbstractConverter {
       // `omp.private`'s `alloc` block. If this is the case, we return this
       // `SymbolBox::Intrinsic` value.
       if (Fortran::lower::SymbolBox v = symMap->lookupSymbol(sym))
-        return v.match(
-            [&](const Fortran::lower::SymbolBox::Intrinsic &)
-                -> Fortran::lower::SymbolBox { return v; },
-            [](const auto &) -> Fortran::lower::SymbolBox { return {}; });
+        return v;
 
       return {};
     }
@@ -1060,15 +1058,16 @@ class FirConverter : public Fortran::lower::AbstractConverter {
                const Fortran::lower::SymbolBox &rhs_sb) {
     mlir::Location loc = genLocation(sym.name());
     if (lowerToHighLevelFIR())
-      copyVarHLFIR(loc, lhs_sb.getAddr(), rhs_sb.getAddr());
+      copyVarHLFIR(loc, lhs_sb, rhs_sb);
     else
       copyVarFIR(loc, sym, lhs_sb, rhs_sb);
   }
 
-  void copyVarHLFIR(mlir::Location loc, mlir::Value dst, mlir::Value src) {
+  void copyVarHLFIR(mlir::Location loc, Fortran::lower::SymbolBox dst,
+                    Fortran::lower::SymbolBox src) {
     assert(lowerToHighLevelFIR());
-    hlfir::Entity lhs{dst};
-    hlfir::Entity rhs{src};
+    hlfir::Entity lhs{dst.getAddr()};
+    hlfir::Entity rhs{src.getAddr()};
     // Temporary_lhs is set to true in hlfir.assign below to avoid user
     // assignment to be used and finalization to be called on the LHS.
     // This may or may not be correct but mimics the current behaviour
@@ -1082,7 +1081,12 @@ class FirConverter : public Fortran::lower::AbstractConverter {
           /*keepLhsLengthInAllocatableAssignment=*/false,
           /*temporary_lhs=*/true);
     };
-    if (lhs.isAllocatable()) {
+
+    bool isBoxAllocatable = dst.match(
+        [](const fir::MutableBoxValue &box) { return box.isAllocatable(); },
+        [](const auto &box) { return false; });
+
+    if (isBoxAllocatable) {
       // Deep copy allocatable if it is allocated.
       // Note that when allocated, the RHS is already allocated with the LHS
       // shape for copy on entry in createHostAssociateVarClone.
diff --git a/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp b/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
index 717b8cc0276a30..bee533949c6564 100644
--- a/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
+++ b/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
@@ -50,6 +50,7 @@ void DataSharingProcessor::processStep2(mlir::Operation *op, bool isLoop) {
 }
 
 void DataSharingProcessor::insertDeallocs() {
+  // TODO Extend delayed privatization to include a `dealloc` region?
   for (const Fortran::semantics::Symbol *sym : privatizedSymbols)
     if (Fortran::semantics::IsAllocatable(sym->GetUltimate())) {
       converter.createHostAssociateVarCloneDealloc(*sym);
@@ -376,6 +377,7 @@ void DataSharingProcessor::doPrivatize(const Fortran::semantics::Symbol *sym) {
         symLoc, uniquePrivatizerName, symType,
         isFirstPrivate ? mlir::omp::DataSharingClauseType::FirstPrivate
                        : mlir::omp::DataSharingClauseType::Private);
+    fir::ExtendedValue symExV = converter.getSymbolExtendedValue(*sym);
 
     symTable->pushScope();
 
@@ -386,7 +388,7 @@ void DataSharingProcessor::doPrivatize(const Fortran::semantics::Symbol *sym) {
           &allocRegion, /*insertPt=*/{}, symType, symLoc);
 
       firOpBuilder.setInsertionPointToEnd(allocEntryBlock);
-      symTable->addSymbol(*sym, allocRegion.getArgument(0));
+      symTable->addSymbol(*sym, symExV.clone(allocRegion.getArgument(0)));
       symTable->pushScope();
       cloneSymbol(sym);
       firOpBuilder.create<mlir::omp::YieldOp>(
@@ -403,10 +405,10 @@ void DataSharingProcessor::doPrivatize(const Fortran::semantics::Symbol *sym) {
       mlir::Block *copyEntryBlock = firOpBuilder.createBlock(
           &copyRegion, /*insertPt=*/{}, {symType, symType}, {symLoc, symLoc});
       firOpBuilder.setInsertionPointToEnd(copyEntryBlock);
-      symTable->addSymbol(*sym, copyRegion.getArgument(0),
+      symTable->addSymbol(*sym, symExV.clone(copyRegion.getArgument(0)),
                           /*force=*/true);
       symTable->pushScope();
-      symTable->addSymbol(*sym, copyRegion.getArgument(1));
+      symTable->addSymbol(*sym, symExV.clone(copyRegion.getArgument(1)));
       auto ip = firOpBuilder.saveInsertionPoint();
       copyFirstPrivateSymbol(sym, &ip);
 
diff --git a/flang/test/Lower/OpenMP/delayed-privatization-allocatable-firstprivate.f90 b/flang/test/Lower/OpenMP/delayed-privatization-allocatable-firstprivate.f90
new file mode 100644
index 00000000000000..545221b4c1abfe
--- /dev/null
+++ b/flang/test/Lower/OpenMP/delayed-privatization-allocatable-firstprivate.f90
@@ -0,0 +1,33 @@
+! Test delayed privatization for allocatables: `firstprivate`.
+
+! RUN: bbc -emit-hlfir -fopenmp --openmp-enable-delayed-privatization -o - %s 2>&1 | FileCheck %s
+
+subroutine delayed_privatization_allocatable
+  implicit none
+  integer, allocatable :: var1
+
+!$omp parallel firstprivate(var1)
+  var1 = 10
+!$omp end parallel
+end subroutine
+
+! CHECK-LABEL: omp.private {type = firstprivate}
+! CHECK-SAME: @[[PRIVATIZER_SYM:.*]] : [[TYPE:!fir.ref<!fir.box<!fir.heap<i32>>>]] alloc {
+
+! CHECK-NEXT: ^bb0(%[[PRIV_ARG:.*]]: [[TYPE]]):
+
+! CHECK: } copy {
+! CHECK: ^bb0(%[[PRIV_ORIG_ARG:.*]]: [[TYPE]], %[[PRIV_PRIV_ARG:.*]]: [[TYPE]]):
+
+! CHECK-NEXT:  %[[PRIV_BASE_VAL:.*]] = fir.load %[[PRIV_PRIV_ARG]]
+! CHECK-NEXT:  %[[PRIV_BASE_BOX:.*]] = fir.box_addr %[[PRIV_BASE_VAL]]
+! CHECK-NEXT:  %[[PRIV_BASE_ADDR:.*]] = fir.convert %[[PRIV_BASE_BOX]]
+! CHECK-NEXT:  %[[C0:.*]] = arith.constant 0 : i64
+! CHECK-NEXT:  %[[COPY_COND:.*]] = arith.cmpi ne, %[[PRIV_BASE_ADDR]], %[[C0]] : i64
+
+! CHECK-NEXT:  fir.if %[[COPY_COND]] {
+! CHECK-NEXT:    %[[ORIG_BASE_VAL:.*]] = fir.load %[[PRIV_ORIG_ARG]]
+! CHECK-NEXT:    %[[ORIG_BASE_ADDR:.*]] = fir.box_addr %[[ORIG_BASE_VAL]]
+! CHECK-NEXT:    %[[ORIG_BASE_LD:.*]] = fir.load %[[ORIG_BASE_ADDR]]
+! CHECK-NEXT:    hlfir.assign %[[ORIG_BASE_LD]] to %[[PRIV_BASE_BOX]] temporary_lhs
+! CHECK-NEXT:  }
diff --git a/flang/test/Lower/OpenMP/delayed-privatization-allocatable-private.f90 b/flang/test/Lower/OpenMP/delayed-privatization-allocatable-private.f90
new file mode 100644
index 00000000000000..83fb516879ff20
--- /dev/null
+++ b/flang/test/Lower/OpenMP/delayed-privatization-allocatable-private.f90
@@ -0,0 +1,40 @@
+! Test delayed privatization for allocatables: `private`.
+
+! RUN: bbc -emit-hlfir -fopenmp --openmp-enable-delayed-privatization -o - %s 2>&1 | FileCheck %s
+
+subroutine delayed_privatization_allocatable
+  implicit none
+  integer, allocatable :: var1
+
+!$omp parallel private(var1)
+  var1 = 10
+!$omp end parallel
+end subroutine
+
+! CHECK-LABEL: omp.private {type = private}
+! CHECK-SAME: @[[PRIVATIZER_SYM:.*]] : [[TYPE:!fir.ref<!fir.box<!fir.heap<i32>>>]] alloc {
+
+! CHECK-NEXT: ^bb0(%[[PRIV_ARG:.*]]: [[TYPE]]):
+
+! CHECK-NEXT:   %[[PRIV_ALLOC:.*]] = fir.alloca !fir.box<!fir.heap<i32>> {bindc_name = "var1", pinned, uniq_name = "_QFdelayed_privatization_allocatableEvar1"}
+
+! CHECK-NEXT:   %[[PRIV_ARG_VAL:.*]] = fir.load %[[PRIV_ARG]] : !fir.ref<!fir.box<!fir.heap<i32>>>
+! CHECK-NEXT:   %[[PRIV_ARG_BOX:.*]] = fir.box_addr %[[PRIV_ARG_VAL]] : (!fir.box<!fir.heap<i32>>) -> !fir.heap<i32>
+! CHECK-NEXT:   %[[PRIV_ARG_ADDR:.*]] = fir.convert %[[PRIV_ARG_BOX]] : (!fir.heap<i32>) -> i64
+! CHECK-NEXT:   %[[C0:.*]] = arith.constant 0 : i64
+! CHECK-NEXT:   %[[ALLOC_COND:.*]] = arith.cmpi ne, %[[PRIV_ARG_ADDR]], %[[C0]] : i64
+
+! CHECK-NEXT:   fir.if %[[ALLOC_COND]] {
+! CHECK-NEXT:     %[[PRIV_ALLOCMEM:.*]] = fir.allocmem i32 {fir.must_be_heap = true, uniq_name = "_QFdelayed_privatization_allocatableEvar1.alloc"}
+! CHECK-NEXT:     %[[PRIV_ALLOCMEM_BOX:.*]] = fir.embox %[[PRIV_ALLOCMEM]] : (!fir.heap<i32>) -> !fir.box<!fir.heap<i32>>
+! CHECK-NEXT:     fir.store %[[PRIV_ALLOCMEM_BOX]] to %[[PRIV_ALLOC]] : !fir.ref<!fir.box<!fir.heap<i32>>>
+! CHECK-NEXT:   } else {
+! CHECK-NEXT:     %[[ZERO_BITS:.*]] = fir.zero_bits !fir.heap<i32>
+! CHECK-NEXT:     %[[ZERO_BOX:.*]] = fir.embox %[[ZERO_BITS]] : (!fir.heap<i32>) -> !fir.box<!fir.heap<i32>>
+! CHECK-NEXT:     fir.store %[[ZERO_BOX]] to %[[PRIV_ALLOC]] : !fir.ref<!fir.box<!fir.heap<i32>>>
+! CHECK-NEXT:   }
+
+! CHECK-NEXT:   %[[PRIV_DECL:.*]]:2 = hlfir.declare %[[PRIV_ALLOC]]
+! CHECK-NEXT:   omp.yield(%[[PRIV_DECL]]#0 : [[TYPE]])
+
+! CHECK-NEXT: }

Copy link

github-actions bot commented Mar 11, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@kiranchandramohan
Copy link
Contributor

Thanks @ergawy for the change. And sorry for the changes you had to make due to the previous usage of both base and original base.

I was expecting the deallocation of the allocatable variable to be inside a separate region in the PrivateClause Op.

init {
} copy {
} dealloc {
// insert dealloc here
}

@ergawy
Copy link
Member Author

ergawy commented Mar 11, 2024

... And sorry for the changes you had to make due to the previous usage of both base and original base.

No problem at all. I learned quite a bit trying to figure out the solution so the time spent did not go to waste for sure.

I was expecting the deallocation of the allocatable variable to be inside a separate region in the PrivateClause Op.

init {
} copy {
} dealloc {
// insert dealloc here
}

That should be the case indeed. I added a TODO in DataSharingProcessor::insertDeallocs() to extend omp.private op to add a dealloc region since we don't have that one (I did not anticipate the need for it since I was not familiar with how allocatables work when I started). I will create follow up PR(s) for this if you don't mind.

Comment on lines +15 to +21
! CHECK-SAME: @[[PRIVATIZER_SYM:.*]] : [[TYPE:!fir.ref<!fir.box<!fir.heap<i32>>>]] alloc {

! CHECK-NEXT: ^bb0(%[[PRIV_ARG:.*]]: [[TYPE]]):

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is allocated is not visible in the test here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that deferred to the private test?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, instead of just copying the check lines from the private test again, I skipped most of it here. Let me know if you prefer to repeat the checks here as well.

Copy link
Contributor

@kiranchandramohan kiranchandramohan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will create follow up PR(s) for this if you don't mind.

That is fine.

flang/lib/Lower/OpenMP/DataSharingProcessor.cpp Outdated Show resolved Hide resolved
@kiranchandramohan
Copy link
Contributor

Would pointers come in a separate patch?

else
copyVarFIR(loc, sym, lhs_sb, rhs_sb);
}

void copyVarHLFIR(mlir::Location loc, mlir::Value dst, mlir::Value src) {
void copyVarHLFIR(mlir::Location loc, Fortran::lower::SymbolBox dst,
Fortran::lower::SymbolBox src) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you need to change the argument here and the way allocatable were tested for?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With delayed privatization, the address that the dst box is not anymore a FortranVariableOpInterface value. It is rather a block argument to the copy region of the delayed privatizer:

omp.private {type = firstprivate} @_QF... : !fir.ref<!fir.box<!fir.heap<i32>>> alloc {
  ^bb0(%arg0: !fir.ref<!fir.box<!fir.heap<i32>>>):
  ...
  } copy {
  ^bb0(%arg0: !fir.ref<!fir.box<!fir.heap<i32>>>, %arg1: !fir.ref<!fir.box<!fir.heap<i32>>>):
  ...
  }

%arg1 in the above snippet is the dst value. So we don't have access to the fortran_attrs attributes attached to the original value anymore. However, the SymbolBox still has that information since it directly checks the type of the address.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, I just realized that for eager privatization, we also have to check for the FortranVariableOpInterface of the SymbolBox variant. I didn't catch the issue since all tests were green so I will add more tests for eager privatization for both allocatables and pointers in a follow-up PR. Seems like this was an oversight, I think.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in latest commit.

@ergawy
Copy link
Member Author

ergawy commented Mar 12, 2024

Edit: never mind, found the fix and it is pretty simple so I will include it in this PR after adding some tests.

Would pointers come in a separate patch?

@kiranchandramohan, I just tried delayed privatization with pointers and there is a difference between the copy logic in eager privatization vs. the delayed case:

Eager privatization:

      %6 = fir.load %3#0 : !fir.ref<!fir.box<!fir.ptr<i32>>>
      fir.store %6 to %5#0 : !fir.ref<!fir.box<!fir.ptr<i32>>>

Delayed privatization:

copy {
  ^bb0(%arg0: !fir.ref<!fir.box<!fir.ptr<i32>>>, %arg1: !fir.ref<!fir.box<!fir.ptr<i32>>>):
    %0 = fir.load %arg0 : !fir.ref<!fir.box<!fir.ptr<i32>>>
    %1 = fir.box_addr %0 : (!fir.box<!fir.ptr<i32>>) -> !fir.ptr<i32>
    %2 = fir.load %1 : !fir.ptr<i32>
    hlfir.assign %2 to %arg1 temporary_lhs : i32, !fir.ref<!fir.box<!fir.ptr<i32>>>
    omp.yield(%arg1 : !fir.ref<!fir.box<!fir.ptr<i32>>>)
  }

As far as I understand, firstprivate should initialize the clone to the same target of the original value which does not happen now with delayed privatization (it changes the value of the target).

So this will need proper handling which I think makes sense to defer to a follow-up PR.

@ergawy
Copy link
Member Author

ergawy commented Mar 12, 2024

Would pointers come in a separate patch?

Done in latest commit.

@ergawy ergawy changed the title [flang][MLIR][OpenMP] Extend delayed privatization for scalar allocatables [flang][MLIR][OpenMP] Extend delayed privatization for scalar allocatables and pointers Mar 12, 2024
@ergawy
Copy link
Member Author

ergawy commented Mar 12, 2024

I think this is now ready for another round of review :). I am working on supporting delayed privatization for array at the moment 👨‍🏭 .

@ergawy
Copy link
Member Author

ergawy commented Mar 14, 2024

Ping 🔔! Can you please have another look?

I have #85023 lined up for array support and also working on supporting CHARACTER at the moment.

Copy link
Contributor

@kiranchandramohan kiranchandramohan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG.

@ergawy ergawy force-pushed the delayed_privatization_14 branch 2 times, most recently from b0976ad to 7816049 Compare March 15, 2024 13:48
…ables and pointers

One more step in extending support for delayed privatization. This diff
adds support for scalar allocatables and pointers.
@ergawy ergawy merged commit 87cee71 into llvm:main Mar 18, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flang:fir-hlfir flang:openmp flang Flang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants