Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[flang][OpenMP] Fix construct privatization in default clause #72510

Merged

Conversation

NimishMishra
Copy link
Contributor

@NimishMishra NimishMishra commented Nov 16, 2023

Current implementation of default clause privatization incorrectly fails to privatize in presence of non-OpenMP constructs (i.e. nested constructs with regions whose symbols need to be privatized in the scope of the parent OpenMP construct). This patch fixes the same by considering non-OpenMP constructs separately by collecting symbols of a nested region if it is a non-OpenMP construct with a region, and privatizing it in the scope of the parent OpenMP construct.

Fixes #71914 and #71915

@llvmbot
Copy link
Collaborator

llvmbot commented Nov 16, 2023

@llvm/pr-subscribers-flang-fir-hlfir

@llvm/pr-subscribers-flang-openmp

Author: None (NimishMishra)

Changes

Current implementation of default clause privatization incorrectly fails to privatize in presence of non-OpenMP constructs. This patch fixes the same by considering non-OpenMP constructs separately.

Fixes #71914 and #71915 and


Full diff: https://github.com/llvm/llvm-project/pull/72510.diff

4 Files Affected:

  • (modified) flang/lib/Lower/Bridge.cpp (+1-1)
  • (modified) flang/lib/Lower/OpenMP.cpp (+1-7)
  • (modified) flang/test/Lower/OpenMP/FIR/default-clause.f90 (+15-6)
  • (modified) flang/test/Lower/OpenMP/default-clause.f90 (+87-5)
diff --git a/flang/lib/Lower/Bridge.cpp b/flang/lib/Lower/Bridge.cpp
index 872bf6bc729ecd0..c0415ffc2d9505c 100644
--- a/flang/lib/Lower/Bridge.cpp
+++ b/flang/lib/Lower/Bridge.cpp
@@ -855,7 +855,7 @@ class FirConverter : public Fortran::lower::AbstractConverter {
                               bool collectSymbol) {
             if (collectSymbol && oriSymbol.test(flag))
               symbolSet.insert(&oriSymbol);
-            if (checkHostAssociatedSymbols)
+            else if (checkHostAssociatedSymbols)
               if (const auto *details{
                       oriSymbol
                           .detailsIf<Fortran::semantics::HostAssocDetails>()})
diff --git a/flang/lib/Lower/OpenMP.cpp b/flang/lib/Lower/OpenMP.cpp
index 00f16026d9b4bb8..0bbdae0b4ae7c2b 100644
--- a/flang/lib/Lower/OpenMP.cpp
+++ b/flang/lib/Lower/OpenMP.cpp
@@ -120,7 +120,6 @@ class DataSharingProcessor {
   llvm::SetVector<const Fortran::semantics::Symbol *> privatizedSymbols;
   llvm::SetVector<const Fortran::semantics::Symbol *> defaultSymbols;
   llvm::SetVector<const Fortran::semantics::Symbol *> symbolsInNestedRegions;
-  llvm::SetVector<const Fortran::semantics::Symbol *> symbolsInParentRegions;
   Fortran::lower::AbstractConverter &converter;
   fir::FirOpBuilder &firOpBuilder;
   const Fortran::parser::OmpClauseList &opClauseList;
@@ -426,14 +425,10 @@ void DataSharingProcessor::collectSymbols(
                              /*collectSymbols=*/true,
                              /*collectHostAssociatedSymbols=*/true);
   for (Fortran::lower::pft::Evaluation &e : eval.getNestedEvaluations()) {
-    if (e.hasNestedEvaluations())
+    if (e.hasNestedEvaluations() && !e.isConstruct())
       converter.collectSymbolSet(e, symbolsInNestedRegions, flag,
                                  /*collectSymbols=*/true,
                                  /*collectHostAssociatedSymbols=*/false);
-    else
-      converter.collectSymbolSet(e, symbolsInParentRegions, flag,
-                                 /*collectSymbols=*/false,
-                                 /*collectHostAssociatedSymbols=*/true);
   }
 }
 
@@ -485,7 +480,6 @@ void DataSharingProcessor::defaultPrivatize() {
         !sym->GetUltimate().has<Fortran::semantics::DerivedTypeDetails>() &&
         !sym->GetUltimate().has<Fortran::semantics::NamelistDetails>() &&
         !symbolsInNestedRegions.contains(sym) &&
-        !symbolsInParentRegions.contains(sym) &&
         !privatizedSymbols.contains(sym)) {
       cloneSymbol(sym);
       copyFirstPrivateSymbol(sym);
diff --git a/flang/test/Lower/OpenMP/FIR/default-clause.f90 b/flang/test/Lower/OpenMP/FIR/default-clause.f90
index 14c0d375896a1a1..8d131c5d69b16a4 100644
--- a/flang/test/Lower/OpenMP/FIR/default-clause.f90
+++ b/flang/test/Lower/OpenMP/FIR/default-clause.f90
@@ -192,31 +192,40 @@ subroutine nested_default_clause_tests
 !CHECK: omp.parallel {
 !CHECK: %[[PRIVATE_X:.*]] = fir.alloca i32 {bindc_name = "x", pinned, uniq_name = "_QFnested_default_clause_testsEx"}
 !CHECK: %[[PRIVATE_Y:.*]] = fir.alloca i32 {bindc_name = "y", pinned, uniq_name = "_QFnested_default_clause_testsEy"}
-!CHECK: %[[PRIVATE_Z:.*]] = fir.alloca i32 {bindc_name = "z", pinned, uniq_name = "_QFnested_default_clause_testsEz"}
 !CHECK: %[[PRIVATE_W:.*]] = fir.alloca i32 {bindc_name = "w", pinned, uniq_name = "_QFnested_default_clause_testsEw"} 
+!CHECK: %[[PRIVATE_Z:.*]] = fir.alloca i32 {bindc_name = "z", pinned, uniq_name = "_QFnested_default_clause_testsEz"}
 !CHECK: omp.parallel {
 !CHECK: %[[PRIVATE_INNER_X:.*]] = fir.alloca i32 {bindc_name = "x", pinned, uniq_name = "_QFnested_default_clause_testsEx"}
 !CHECK: %[[temp:.*]] = fir.load %[[PRIVATE_X]] : !fir.ref<i32>
 !CHECK: fir.store %[[temp]] to %[[PRIVATE_INNER_X]] : !fir.ref<i32>
-!CHECK: %[[INNER_PRIVATE_Y:.*]] = fir.alloca i32 {bindc_name = "y", pinned, uniq_name = "_QFnested_default_clause_testsEy"}
+!CHECK: %[[PRIVATE_INNER_Y:.*]] = fir.alloca i32 {bindc_name = "y", pinned, uniq_name = "_QFnested_default_clause_testsEy"}
 !CHECK: %[[temp:.*]] = fir.load %[[PRIVATE_Y]] : !fir.ref<i32>
-!CHECK: fir.store %[[temp]] to %[[INNER_PRIVATE_Y]] : !fir.ref<i32>
-!CHECK: %[[temp:.*]] = fir.load %[[INNER_PRIVATE_Y]] : !fir.ref<i32>
+!CHECK: fir.store %[[temp]] to %[[PRIVATE_INNER_Y]] : !fir.ref<i32>
+!CHECK: %[[PRIVATE_INNER_W:.*]] = fir.alloca i32 {bindc_name = "w", pinned, uniq_name = "_QFnested_default_clause_testsEw"}
+!CHECK: %[[TEMP:.*]] = fir.load %[[PRIVATE_W]] : !fir.ref<i32>
+!CHECK: fir.store %[[TEMP]] to %[[PRIVATE_INNER_W]]
+!CHECK: %[[temp:.*]] = fir.load %[[PRIVATE_INNER_Y]] : !fir.ref<i32>
 !CHECK: fir.store %[[temp]] to %[[PRIVATE_INNER_X]] : !fir.ref<i32>
+!CHECK: %[[LOADED_W:.*]] = fir.load %[[PRIVATE_INNER_W]] : !fir.ref<i32>
+!CHECK: %[[CONST:.*]] = arith.constant 1 : i32
+!CHECK: %[[RESULT:.*]] = arith.addi %[[LOADED_W]], %[[CONST]] : i32
+!CHECK: fir.store %[[RESULT]] to %[[PRIVATE_INNER_W]] : !fir.ref<i32>
 !CHECK: omp.terminator
 !CHECK: }
 !CHECK: omp.parallel {
+!CHECK: %[[PRIVATE_INNER_Z:.*]] = fir.alloca i32 {bindc_name = "z", pinned, uniq_name = "_QFnested_default_clause_testsEz"}
 !CHECK: %[[PRIVATE_INNER_W:.*]] = fir.alloca i32 {bindc_name = "w", pinned, uniq_name = "_QFnested_default_clause_testsEw"} 
 !CHECK: %[[PRIVATE_INNER_X:.*]] = fir.alloca i32 {bindc_name = "x", pinned, uniq_name = "_QFnested_default_clause_testsEx"}
 !CHECK: %[[temp_1:.*]] = fir.load %[[PRIVATE_INNER_X]] : !fir.ref<i32>
-!CHECK: %[[temp_2:.*]] = fir.load %[[PRIVATE_Z]] : !fir.ref<i32>
-!CHECK: %[[result:.*]] = arith.addi %{{.*}}, %{{.*}} : i32
+!CHECK: %[[temp_2:.*]] = fir.load %[[PRIVATE_INNER_Z]] : !fir.ref<i32>
+!CHECK: %[[result:.*]] = arith.addi %[[temp_1]], %[[temp_2]] : i32
 !CHECK: fir.store %[[result]] to %[[PRIVATE_INNER_W]] : !fir.ref<i32>
 !CHECK: omp.terminator
 !CHECK: }
     !$omp parallel default(private)
         !$omp parallel default(firstprivate)
             x = y
+            w = w + 1
         !$omp end parallel
 
         !$omp parallel default(private) shared(z)
diff --git a/flang/test/Lower/OpenMP/default-clause.f90 b/flang/test/Lower/OpenMP/default-clause.f90
index 0a7443eecf52d77..c8188e21560a6fa 100644
--- a/flang/test/Lower/OpenMP/default-clause.f90
+++ b/flang/test/Lower/OpenMP/default-clause.f90
@@ -221,13 +221,14 @@ subroutine nested_default_clause_tests
     
     
 !CHECK: omp.parallel {
+!CHECK: %[[PRIVATE_X:.*]] = fir.alloca i32 {bindc_name = "x", pinned, uniq_name = "_QFnested_default_clause_testsEx"}
 !CHECK: %[[PRIVATE_X_DECL:.*]]:2 = hlfir.declare %[[PRIVATE_X]] {uniq_name = "_QFnested_default_clause_testsEx"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
 !CHECK: %[[PRIVATE_Y:.*]] = fir.alloca i32 {bindc_name = "y", pinned, uniq_name = "_QFnested_default_clause_testsEy"}
 !CHECK: %[[PRIVATE_Y_DECL:.*]]:2 = hlfir.declare %[[PRIVATE_Y]] {uniq_name = "_QFnested_default_clause_testsEy"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
-!CHECK: %[[PRIVATE_Z:.*]] = fir.alloca i32 {bindc_name = "z", pinned, uniq_name = "_QFnested_default_clause_testsEz"}
-!CHECK: %[[PRIVATE_Z_DECL:.*]]:2 = hlfir.declare %[[PRIVATE_Z]] {uniq_name = "_QFnested_default_clause_testsEz"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
 !CHECK: %[[PRIVATE_W:.*]] = fir.alloca i32 {bindc_name = "w", pinned, uniq_name = "_QFnested_default_clause_testsEw"}
 !CHECK: %[[PRIVATE_W_DECL:.*]]:2 = hlfir.declare %[[PRIVATE_W]] {uniq_name = "_QFnested_default_clause_testsEw"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
+!CHECK: %[[PRIVATE_Z:.*]] = fir.alloca i32 {bindc_name = "z", pinned, uniq_name = "_QFnested_default_clause_testsEz"}
+!CHECK: %[[PRIVATE_Z_DECL:.*]]:2 = hlfir.declare %[[PRIVATE_Z]] {uniq_name = "_QFnested_default_clause_testsEz"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
 !CHECK: omp.parallel {
 !CHECK: %[[PRIVATE_INNER_X:.*]] = fir.alloca i32 {bindc_name = "x", pinned, uniq_name = "_QFnested_default_clause_testsEx"}
 !CHECK: %[[PRIVATE_INNER_X_DECL:.*]]:2 = hlfir.declare %[[PRIVATE_INNER_X]] {uniq_name = "_QFnested_default_clause_testsEx"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
@@ -237,17 +238,27 @@ subroutine nested_default_clause_tests
 !CHECK: %[[INNER_PRIVATE_Y_DECL:.*]]:2 = hlfir.declare %[[INNER_PRIVATE_Y]] {uniq_name = "_QFnested_default_clause_testsEy"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
 !CHECK: %[[TEMP:.*]] = fir.load %[[PRIVATE_Y_DECL]]#0 : !fir.ref<i32>
 !CHECK: hlfir.assign %[[TEMP]] to %[[INNER_PRIVATE_Y_DECL]]#0 temporary_lhs : i32, !fir.ref<i32>
+!CHECK: %[[INNER_PRIVATE_W:.*]] = fir.alloca i32 {bindc_name = "w", pinned, uniq_name = "_QFnested_default_clause_testsEw"}
+!CHECK: %[[INNER_PRIVATE_W_DECL:.*]]:2 = hlfir.declare %[[INNER_PRIVATE_W]] {{.*}}
+!CHECK: %[[TEMP:.*]] = fir.load %[[PRIVATE_W_DECL]]#0 : !fir.ref<i32>
+!CHECK: hlfir.assign %[[TEMP]] to %[[INNER_PRIVATE_W_DECL]]#0 temporary_lhs : i32, !fir.ref<i32>
 !CHECK: %[[TEMP:.*]] = fir.load %[[INNER_PRIVATE_Y_DECL]]#0 : !fir.ref<i32>
 !CHECK: hlfir.assign %[[TEMP]] to %[[PRIVATE_INNER_X_DECL]]#0 : i32, !fir.ref<i32>
+!CHECK: %[[TEMP:.*]] = fir.load %[[INNER_PRIVATE_W_DECL]]#0 : !fir.ref<i32>
+!CHECK: %[[CONSTANT:.*]] = arith.constant 1 : i32
+!CHECK: %[[RESULT:.*]] = arith.addi %[[TEMP]], %[[CONSTANT]] : i32
+!CHECK: hlfir.assign %[[RESULT]] to %[[INNER_PRIVATE_W_DECL]]#0 : i32, !fir.ref<i32>
 !CHECK: omp.terminator
 !CHECK: }
 !CHECK: omp.parallel {
+!CHECK: %[[PRIVATE_INNER_Z:.*]] = fir.alloca i32 {bindc_name = "z", pinned, uniq_name = "_QFnested_default_clause_testsEz"}
+!CHECK: %[[PRIVATE_INNER_Z_DECL:.*]]:2 = hlfir.declare %[[PRIVATE_INNER_Z]] {{.*}}
 !CHECK: %[[PRIVATE_INNER_W:.*]] = fir.alloca i32 {bindc_name = "w", pinned, uniq_name = "_QFnested_default_clause_testsEw"}
 !CHECK: %[[PRIVATE_INNER_W_DECL:.*]]:2 = hlfir.declare %[[PRIVATE_INNER_W]] {uniq_name = "_QFnested_default_clause_testsEw"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
 !CHECK: %[[PRIVATE_INNER_X:.*]] = fir.alloca i32 {bindc_name = "x", pinned, uniq_name = "_QFnested_default_clause_testsEx"}
 !CHECK: %[[PRIVATE_INNER_X_DECL:.*]]:2 = hlfir.declare %[[PRIVATE_INNER_X]] {uniq_name = "_QFnested_default_clause_testsEx"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
 !CHECK: %[[TEMP_1:.*]] = fir.load %[[PRIVATE_INNER_X_DECL]]#0 : !fir.ref<i32>
-!CHECK: %[[TEMP_2:.*]] = fir.load %[[PRIVATE_Z_DECL]]#0 : !fir.ref<i32>
+!CHECK: %[[TEMP_2:.*]] = fir.load %[[PRIVATE_INNER_Z_DECL]]#0 : !fir.ref<i32>
 !CHECK: %[[RESULT:.*]] = arith.addi %{{.*}}, %{{.*}} : i32
 !CHECK: hlfir.assign %[[RESULT]] to %[[PRIVATE_INNER_W_DECL]]#0 : i32, !fir.ref<i32>
 !CHECK: omp.terminator
@@ -255,6 +266,7 @@ subroutine nested_default_clause_tests
     !$omp parallel default(private)
         !$omp parallel default(firstprivate)
             x = y
+            w = w + 1
         !$omp end parallel
 
         !$omp parallel default(private) shared(z)
@@ -317,14 +329,84 @@ subroutine nested_default_clause_tests
 !CHECK: omp.terminator
 !CHECK: }
 !CHECK: omp.terminator
-!CHECK: }
-!CHECK: return
 !CHECK: } 
 	!$omp parallel default(firstprivate)
 		!$omp single
 			x = y
 		!$omp end single
 	!$omp end parallel
+
+!CHECK: omp.parallel {
+!CHECK: %[[LOOP_VAR_ALLOCA:.*]] = fir.alloca i32 {adapt.valuebyref, pinned}
+!CHECK: %[[LOOP_VAR_DECLARE:.*]]:2 = hlfir.declare %[[LOOP_VAR_ALLOCA]] {{.*}}
+!CHECK: %[[X_ALLOCA:.*]] = fir.alloca i32 {bindc_name = "x", pinned, uniq_name = "_QFnested_default_clause_testsEx"}
+!CHECK: %[[X_DECLARE:.*]]:2 = hlfir.declare %[[X_ALLOCA]] {{.*}}
+!CHECK: %[[CONST_LB:.*]] = arith.constant 1 : i32
+!CHECK: %[[CONST_UB:.*]] = arith.constant 50 : i32
+!CHECK: %[[CONST_STEP:.*]] = arith.constant 1 : i32
+!CHECK: omp.wsloop for  (%[[ARG:.*]]) : i32 = (%[[CONST_LB]]) to (%[[CONST_UB]]) inclusive step (%[[CONST_STEP]]) {
+!CHECK: fir.store %[[ARG]] to %[[LOOP_VAR_DECLARE]]#1 : !fir.ref<i32>
+!CHECK: %[[LOADED_X:.*]] = fir.load %[[X_DECLARE]]#0 : !fir.ref<i32>
+!CHECK: %[[CONST:.*]] = arith.constant 1 : i32
+!CHECK: %[[RESULT:.*]] = arith.addi %[[LOADED_X]], %[[CONST]] : i32
+!CHECK: hlfir.assign %[[RESULT]] to %[[X_DECLARE]]#0 : i32, !fir.ref<i32>
+!CHECK: omp.yield
+!CHECK: }
+!CHECK: omp.terminator
+!CHECK: }
+        !$omp parallel do private(x)
+                do i=1, 50
+                  x = x + 1
+                end do
+        !$omp end parallel do
+
+!CHECK: omp.parallel {
+!CHECK: %[[LOOP_VAR:.*]] = fir.alloca i32 {adapt.valuebyref, pinned}
+!CHECK: %[[LOOP_VAR_DECLARE:.*]]:2 = hlfir.declare %[[LOOP_VAR]] {{.*}}
+!CHECK: %[[X_VAR:.*]] = fir.alloca i32 {bindc_name = "x", pinned, uniq_name = "_QFnested_default_clause_testsEx"}
+!CHECK: %[[X_VAR_DECLARE:.*]]:2 = hlfir.declare %[[X_VAR]] {{.*}}
+!CHECK: %[[Y_VAR:.*]] = fir.alloca i32 {bindc_name = "y", pinned, uniq_name = "_QFnested_default_clause_testsEy"}
+!CHECK: %[[Y_VAR_DECLARE:.*]]:2 = hlfir.declare %[[Y_VAR]] {{.*}}
+!CHECK: %[[Z_VAR:.*]] = fir.alloca i32 {bindc_name = "z", pinned, uniq_name = "_QFnested_default_clause_testsEz"}
+!CHECK: %[[Z_VAR_DECLARE:.*]]:2 = hlfir.declare %[[Z_VAR]] {{.*}}
+!CHECK: %[[CONST_LB:.*]] = arith.constant 1 : i32
+!CHECK: %[[CONST_UB:.*]] = arith.constant 10 : i32
+!CHECK: %[[CONST_STEP:.*]] = arith.constant 1 : i32
+!CHECK: omp.wsloop for (%[[ARG:.*]]) : i32 = (%[[CONST_LB]]) to (%[[CONST_UB]]) inclusive step (%[[CONST_STEP]]) {
+!CHECK: fir.store %[[ARG]] to %[[LOOP_VAR_DECLARE]]#1 : !fir.ref<i32>
+!CHECK: %[[LOADED_X:.*]] = fir.load %[[X_VAR_DECLARE]]#0 : !fir.ref<i32>
+!CHECK: %[[CONST:.*]] = arith.constant 1 : i32
+!CHECK: %[[ADD:.*]] = arith.addi %[[LOADED_X]], %[[CONST]] : i32
+!CHECK: hlfir.assign %[[ADD]] to %[[X_VAR_DECLARE]]#0 : i32, !fir.ref<i32>
+!CHECK: omp.parallel {
+!CHECK: %[[INNER_Y_ALLOCA:.*]] = fir.alloca i32 {bindc_name = "y", pinned, uniq_name = "_QFnested_default_clause_testsEy"}
+!CHECK: %[[INNER_Y_DECLARE:.*]]:2 = hlfir.declare %[[INNER_Y_ALLOCA]] {{.}}
+!CHECK: %[[TEMP:.*]] = fir.load %[[Y_VAR_DECLARE]]#0 : !fir.ref<i32>
+!CHECK: hlfir.assign %[[TEMP]] to %[[INNER_Y_DECLARE]]#0 temporary_lhs : i32, !fir.ref<i32>
+!CHECK: %[[INNER_Z_ALLOCA:.*]] = fir.alloca i32 {bindc_name = "z", pinned, uniq_name = "_QFnested_default_clause_testsEz"}
+!CHECK: %[[INNER_Z_DECLARE:.*]]:2 = hlfir.declare %[[INNER_Z_ALLOCA]] {{.}}
+!CHECK: %[[TEMP:.*]] = fir.load %[[Z_VAR_DECLARE]]#0 : !fir.ref<i32>
+!CHECK: hlfir.assign %[[TEMP]] to %[[INNER_Z_DECLARE]]#0 temporary_lhs : i32, !fir.ref<i32>
+!CHECK: %[[LOADED_Y:.*]] = fir.load %[[INNER_Y_DECLARE]]#0 : !fir.ref<i32>
+!CHECK: %[[LOADED_Z:.*]] = fir.load %[[INNER_Z_DECLARE]]#0 : !fir.ref<i32>
+!CHECK: %[[RESULT:.*]] = arith.addi %[[LOADED_Y]], %[[LOADED_Z]] : i32
+!CHECK: hlfir.assign %[[RESULT]] to %[[INNER_Y_DECLARE]]#0 : i32, !fir.ref<i32>
+!CHECK: omp.terminator
+!CHECK: }
+!CHECK: omp.yield
+!CHECK: }
+!CHECK: omp.terminator
+!CHECK: }
+!CHECK: return
+!CHECK: }
+  !$omp parallel do default(private) 
+      do i = 1, 10
+          x  = x + 1
+          !$omp parallel default(firstprivate)
+             y = y + z
+          !$omp end parallel
+      end do
+  !$omp end parallel do
 end subroutine
 
 !CHECK: func.func @_QPskipped_default_clause_checks() {

Copy link
Contributor

@kiranchandramohan kiranchandramohan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fails to privatize in presence of non-OpenMP constructs.

Could you comment on what non-OpenMP constructs are you talking about here? Is it constructs with regions?

In general, could you expan on the issue and the fix?

flang/lib/Lower/Bridge.cpp Show resolved Hide resolved
@@ -120,7 +120,6 @@ class DataSharingProcessor {
llvm::SetVector<const Fortran::semantics::Symbol *> privatizedSymbols;
llvm::SetVector<const Fortran::semantics::Symbol *> defaultSymbols;
llvm::SetVector<const Fortran::semantics::Symbol *> symbolsInNestedRegions;
llvm::SetVector<const Fortran::semantics::Symbol *> symbolsInParentRegions;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this not required anymore?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bad design from my end. Earlier, we were collecting both the symbol as well as the host-association details, and then removing one of them using this list. I have modified the current implementation to a use a else if, and hence this isn't required.

@NimishMishra NimishMishra force-pushed the default-privatization-nested-construct branch from b103d7f to 8fffade Compare November 22, 2023 09:53
flang/lib/Lower/OpenMP.cpp Outdated Show resolved Hide resolved
@NimishMishra
Copy link
Contributor Author

Ping for review!

@@ -426,14 +425,10 @@ void DataSharingProcessor::collectSymbols(
/*collectSymbols=*/true,
/*collectHostAssociatedSymbols=*/true);
for (Fortran::lower::pft::Evaluation &e : eval.getNestedEvaluations()) {
if (e.hasNestedEvaluations())
if (e.hasNestedEvaluations() && !e.isConstruct())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it collected in non-constructs (like OpenMP)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kiranchandramohan I was on leave mainly for past few days; sorry about the delayed response.

So we maintain two lists: defaultSymbols and symbolsInNestedRegions. For non-constructs like OpenMP, we would not like to collected the host associated symbols for nested regions (thereby collectHostAssociatedSymbols = false). They can be handled as the default clause may be in their respective nested blocks. This was the earlier design.

However, for non-OpenMP constructs, the handling is different. This is because even though the inner construct has a block region (like OpenMP constructs do), all privatization of variables in the nested construct needs to happen in the outer OpenMP construct. Hence, we do not need to populate the symbolsInNestedRegions list. Otherwise, no privatization will happen, leading to incorrect results.

That said, if you think so, I can relook the design of the lists once. It has been quite some time when we implemented the default clause. So maybe, we can think of improving this lowering flow once. What are your suggestions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kiranchandramohan A ping on this! How can we take this forward?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we maintain two lists: defaultSymbols and symbolsInNestedRegions. For non-constructs like OpenMP, we would not like to collected the host associated symbols for nested regions (thereby collectHostAssociatedSymbols = false). They can be handled as the default clause may be in their respective nested blocks. This was the earlier design.

Some OpenMP constructs (like critical) will not have a default clause because such a clause is not supported for these constructs. Some OpenMP constructs might not have them specified (i.e no default clause is present eventhough it is supported). Do we need to distinguish these cases? For those constructs which have them specified, for every pair of defaults do their exact values matter for the inner construct's symbols?

outer(default=shared), inner(default=shared) : Nothing to do
outer(default=private), inner(default=shared) : ?
outer(default=shared), inner(default=private) : ?
outer(default=private),inner(default=private) : ?

However, for non-OpenMP constructs, the handling is different. This is because even though the inner construct has a block region (like OpenMP constructs do), all privatization of variables in the nested construct needs to happen in the outer OpenMP construct. Hence, we do not need to populate the symbolsInNestedRegions list.

Makes sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kiranchandramohan I checked for the use-cases mentioned. The lowering seems to be okay. Should I add these combinations in the tests for default clause?

outer(default=private), inner(default=private)

Consider the following example:

integer :: j
!$omp parallel default(private)
      j = 10
      !$omp parallel default(firstprivate)
                 j = j + 5
      !$omp end parallel
!$omp end parallel

This lowers as:

%0 = fir.alloca i32 ......
%1:2 = hlfir.declare %0 .....
omp.parallel {
       %2 = fir.alloca i32 .....      // privatized j
        %3:2 = hlfir.declare %2
        %c = arith.constant 10 
        hlfir.assign %c to %3#0 ....
        omp.parallel {
                 %4 = fir.alloca i32 ....
                 %5:2 = hlfir.declare %4 ....
                 %6 = fir.load %3#0 .....
                 hlfir.assign %6 to %5#0 ......
        }
}

outer(default=private), inner(default=shared)

Example:

integer :: j
!$omp parallel default(private)
      j = 10
      !$omp parallel
                 j = j + 5
      !$omp end parallel
!$omp end parallel

This lowers as:

%0 = fir.alloca i32 ......
%1:2 = hlfir.declare %0 .....
omp.parallel {
       %2 = fir.alloca i32 .....      // privatized j
        %3:2 = hlfir.declare %2
        %c = arith.constant 10 
        hlfir.assign %c to %3#0 ....
        omp.parallel {
                 %4 = fir.load %3#0 .....
                 // add 5 to %4 and store in %5
                 hlfir.assign %5 to %3#0 ....
        }
}

outer(default=shared), inner(default=private)

Example:

integer :: j
!$omp parallel
      j = 10
      !$omp parallel default(private)
                 j = j + 5
      !$omp end parallel
!$omp end parallel

This lowers as:

%0 = fir.alloca i32 ......
%1:2 = hlfir.declare %0 .....
omp.parallel {
        %c = arith.constant 10 
        hlfir.assign %c to %3#0 ....
        omp.parallel {
                 %2 = fir.alloca ....
                 %3:2 = hlfir.declare %2 ....
                 %4 = fir.load %3#0 .....
                 // add 5 to %4 and store in %5
                 hlfir.assign %5 to %3#0 ....
        }
}

And should be consider firstprivate in the inner parallel:

%0 = fir.alloca i32 ......
%1:2 = hlfir.declare %0 .....
omp.parallel {
        %c = arith.constant 10 
        hlfir.assign %c to %3#0 ....
        omp.parallel {
                 %2 = fir.alloca ....
                 %3:2 = hlfir.declare %2 ....
                 %4 = fir.load %1#0 ....
                 hlfir.assign %4 to %3#0 .....
                 %5 = fir.load %3#0 .....
                 // add 5 to %4 and store in %6
                 hlfir.assign %6 to %3#0 ....
        }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, for constructs like critical not having an explicit default clause, the lowered IR is in line with the correct symbols that the construct should be using.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to understand the need to distinguish between OpenMP and non-OpenMP constructs for collecting symbols recursively. Particularly in cases where the variable on which a default applies is only visible inside the nested OpenMP construct. Consider the following example :

program mn
  call sb1()
contains
 subroutine sb1()
   integer :: x
   x = 20
   !$omp parallel default(private)
   !$omp critical
   x = 30
   !$omp end critical
   !$omp end parallel
   print *, x
 end subroutine
end program

But the following example with nested parallel works. I am assuming the semantics code is creating symbols here. Is that the reason?
If so, why not handle everything (nested openmp and nested non-openmp constructs) in the frontend? What is the split between semantics and the code here?

program mn
  call sb1()
contains
 subroutine sb1()
   integer :: x
   x = 20
   !$omp parallel default(private)
   !$omp parallel 
   x = 30
   !$omp end parallel
   !$omp end parallel
   print *, x
 end subroutine
end program

Copy link
Contributor Author

@NimishMishra NimishMishra Feb 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @kiranchandramohan,

I re-looked at the possibility of handling the entire default clause at one place. For instance, I checked whether we can handle it completely during lowering. One possible way:

  1. Use a function similar to collectSymbolSet with a bool checkFlag, such that the function checks symbol flag iff checkFlag = True, otherwise it simply adds the symbol to the symbol list.

  2. Given any OpenMP context, all symbols encountered in the same context can then be collected through checkFlag = True. In other words, any symbol in the same region where default(private) is specified should only collect symbols with the private data-sharing attribute. However, any symbol in a nested region can be collected irrespective of the specified flag on that symbol.

This approach, unfortunately, does not work without the symbol modifications we currently do in semantics.

For your question on handling OpenMP and non OpenMP constructs in semantics itself, currently, both are being handled. It's just that while lowering, for nested OpenMP constructs, we need to collect host-associated symbols as well. For non-OpenMP constructs, we skip host-associated symbols.

I tried a few variations in which to simply this (use a single list defaultSymbols instead of two lists: defaultSymbols and symbolsInNestedRegions). But so far, nothing concrete has come up...

@@ -426,14 +425,10 @@ void DataSharingProcessor::collectSymbols(
/*collectSymbols=*/true,
/*collectHostAssociatedSymbols=*/true);
for (Fortran::lower::pft::Evaluation &e : eval.getNestedEvaluations()) {
if (e.hasNestedEvaluations())
if (e.hasNestedEvaluations() && !e.isConstruct())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a hard time trying to understand how collectSymbols() work.
This is how I understand it:

  • First, all symbols used in eval that match flag are collected into defaultSymbols, including host associated symbols
    (why host associated symbols need to be collected? To look at the outer scope? But if the symbol in the outer scope matches the default DSA, does anything need to be done?)
  • For each evaluation that is a direct child of eval:
    • If the evaluation has children and is not a Fortran construct (such as if, do, etc)
      • Add the symbols used in it, that match flag, to symbolsInNestedRegions, skipping host associated symbols
        (because otherwise symbols from the top eval level could also be collected?)
  • Later, in defaultPrivatize, symbolsInNestedRegions are excluded from defaultSymbols, presumably because each nested region will handle them appropriately. Then, the remaining symbols are privatized, with some exceptions, such as procedures.

Is my understanding correct?

It would be nice to add a brief comment to collectSymbols, explaining what it does.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @luporl for the review. Yes, the understanding is correct. Since each nested region needs to handle the symbols separately, we needed to make collectSymbols as it is now. In fact, in response to Kiran's comment above, I have given the lowered IR for combinations of nested constructs with default clause. You could check if you are ok with the current behaviour.

And sure, I'll capture more details on collectSymbols with this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation @NimishMishra. The current behavior seems ok to me.
I'm planning to use collectSymbols to handle symbols with implicit DSA, such as those of task, that must be firstprivate if no default clause is present (and outer context doesn't define it as shared).

@mjklemm
Copy link
Contributor

mjklemm commented Feb 5, 2024

I have commented the correct data sharing for the given example as comments in the code:

program mn
  call sb1()
contains
 subroutine sb1()
   integer :: x 
   x = 20
   !$omp parallel default(private)  ! x is declare outside of the region, so it's shared by default, default(private) changes this to private
   !$omp parallel  ! the private x of the outer region is "declared" outside of this region, thus it's shared for the nested region
   x = 30
   !$omp end parallel
   !$omp end parallel
   print *, x
 end subroutine
end program

So, all OpenMP constructs that may take data-sharing attribute clauses may sort of have an implicit default data-sharing attribute or inherit from the outer scope. For instance, parallel uses the implicit default data-sharing attribute, while critical just uses what's coming from the outer scope.

@luporl
Copy link
Contributor

luporl commented Feb 5, 2024

So, all OpenMP constructs that may take data-sharing attribute clauses may sort of have an implicit default data-sharing attribute or inherit from the outer scope. For instance, parallel uses the implicit default data-sharing attribute, while critical just uses what's coming from the outer scope.

OpenMP 4.5, subclause 2.15.1.1 lists the rules of implicit data-sharing attributes. In fact, some of them are missing in Flang, like that of task: #64480

@luporl
Copy link
Contributor

luporl commented Mar 13, 2024

I have been using this patch (rebased on top of main) without issues for some time.
But today I have noticed a strange behavior with a test program, which I reduced to the following:

!$omp parallel do default(private) firstprivate(y)
  do i = 1, 10
    y = 1
    !$omp parallel default(private)
      y = 2
    !$omp end parallel
  end do
!$omp end parallel do
end

This program hits an assert:
Assertion failed: (success && "Privatization failed due to existing binding"), function cloneSymbol, file DataSharingProcessor.cpp, line 74.

The problem is that the inner parallel directive is inside a Fortran construct (do), which causes its symbols to not be collected as symbolsInNestedRegions. This results in defaultPrivatize trying to privatize y twice.

A possible solution could be to visit nested evaluations recursively and, in each one, collect the symbols from the first OpenMP construct (if any).

@NimishMishra
Copy link
Contributor Author

Thank you @luporl for pointing this out.

I think you are right in stating that a recursive solution would be better in this context. I'll work on this and send out a revision.

@NimishMishra
Copy link
Contributor Author

A possible solution could be to visit nested evaluations recursively and, in each one, collect the symbols from the first OpenMP construct (if any).

Hi @luporl,

I am revisiting this patch, and am wondering if recursion is needed here. Checking up with you once if I am missing something.

!$omp parallel do default(private) firstprivate(y)
  do i = 1, 10
    y = 1           ! Point X
    !$omp parallel default(private)
      y = 2        ! Point Y
    !$omp end parallel
  end do
!$omp end parallel do
end

Then collectSymbolSet() on the outermost evaluation shall collect all symbols within the region. Then we enter into its nested evaluations and catch do{... region ...}, which is essentially,

do{
     assignment-stmt
     parallel{... region ...}
}

So if we iterate over these evaluations, and do the following:

  1. For non-OpenMP constructs, skip adding symbols to symbolsInNestedRegion. This shall skip the symbol y marked at Point X.
  2. For OpenMP constructs, add symbols. This shall add the symbol y marked at Point Y.

Concisely, for a do{... region ...}, if we have a OpenMP region inside it, then we can add all symbols within it to symbolsInNestedRegion with just one collectSymbolSet call (which in turn is recursive). So I was wondering if we need additional recursion at this point.

@luporl
Copy link
Contributor

luporl commented Mar 26, 2024

Hi @NimishMishra,

I am revisiting this patch, and am wondering if recursion is needed here. Checking up with you once if I am missing something.

!$omp parallel do default(private) firstprivate(y)
  do i = 1, 10
    y = 1           ! Point X
    !$omp parallel default(private)
      y = 2        ! Point Y
    !$omp end parallel
  end do
!$omp end parallel do
end

Then collectSymbolSet() on the outermost evaluation shall collect all symbols within the region. Then we enter into its nested evaluations and catch do{... region ...}, which is essentially,

Right. IIUC, do{... region ...} is the only nested evaluation in this case.

do{
     assignment-stmt
     parallel{... region ...}
}

So if we iterate over these evaluations, and do the following:

1. For non-OpenMP constructs, skip adding symbols to `symbolsInNestedRegion`. This shall skip the symbol `y` marked at `Point X`.

2. For OpenMP constructs, add symbols. This shall add the symbol `y` marked at `Point Y`.

Concisely, for a do{... region ...}, if we have a OpenMP region inside it, then we can add all symbols within it to symbolsInNestedRegion with just one collectSymbolSet call (which in turn is recursive). So I was wondering if we need additional recursion at this point.

Let me see if I understood it correctly. Are you proposing to add an extra loop to iterate over the nested evaluations of each "top" nested evaluation, applying rules 1 and 2 to each of them?
If so, this should fix this test program, but not others with deeper nesting, such as:

!$omp parallel do default(private) firstprivate(y)
  do i = 1, 10
    y = 1
    do j = 1, 10
      !$omp parallel default(private)
        y = 2
      !$omp end parallel
    end do
  end do
!$omp end parallel do
end

That's why I think that, at least with this approach, recursion would be needed to go through the nested evaluations properly. I agree that whenever an OpenMP construct is found, one call to collectSymbolSet is enough to collect all symbols in it and no further recursion is needed in that branch. But for non-OpenMP constructs, I don't see how to find if there is a nested OpenMP construct in them without scanning each nested evaluation recursively. I hope this won't be too costly in large programs.

@NimishMishra
Copy link
Contributor Author

Thanks @luporl. Yeah, I think recursion is the best way to go forward. I made the required changes for a recursive solution, and for the following:

program main
      integer :: y, z
      !$omp parallel default(private) firstprivate(y)
      do i = 1, 10 
            y = 1
            do j = 1, 10
            z = 20
            !$omp parallel default(private)
                y = 2
            !$omp end parallel
            end do
        end do
      !$omp end parallel 
end program

I have the following FIR:

%0 = fir.alloca i32 {bindc_name = "i", uniq_name = "_QFEi"}
    %1 = fir.declare %0 {uniq_name = "_QFEi"} : (!fir.ref<i32>) -> !fir.ref<i32>
    %2 = fir.alloca i32 {bindc_name = "j", uniq_name = "_QFEj"}
    %3 = fir.declare %2 {uniq_name = "_QFEj"} : (!fir.ref<i32>) -> !fir.ref<i32>
    %4 = fir.alloca i32 {bindc_name = "y", uniq_name = "_QFEy"}
    %5 = fir.declare %4 {uniq_name = "_QFEy"} : (!fir.ref<i32>) -> !fir.ref<i32>
    %6 = fir.alloca i32 {bindc_name = "z", uniq_name = "_QFEz"}
    %7 = fir.declare %6 {uniq_name = "_QFEz"} : (!fir.ref<i32>) -> !fir.ref<i32>
    omp.parallel {
      %8 = fir.alloca i32 {bindc_name = "j", pinned}
      %9 = fir.declare %8 {uniq_name = "_QFEj"} : (!fir.ref<i32>) -> !fir.ref<i32>
      %10 = fir.alloca i32 {bindc_name = "i", pinned}
      %11 = fir.declare %10 {uniq_name = "_QFEi"} : (!fir.ref<i32>) -> !fir.ref<i32>
      %12 = fir.alloca i32 {bindc_name = "y", pinned, uniq_name = "_QFEy"}
      %13 = fir.declare %12 {uniq_name = "_QFEy"} : (!fir.ref<i32>) -> !fir.ref<i32>
      %14 = fir.load %5 : !fir.ref<i32>
      fir.store %14 to %13 : !fir.ref<i32>
      %15 = fir.alloca i32 {bindc_name = "z", pinned, uniq_name = "_QFEz"}
      %16 = fir.declare %15 {uniq_name = "_QFEz"} : (!fir.ref<i32>) -> !fir.ref<i32>
      %17 = fir.convert %c1_i32 : (i32) -> index
      %18 = fir.convert %c10_i32 : (i32) -> index
      %19 = fir.convert %17 : (index) -> i32
      %20:2 = fir.do_loop %arg0 = %17 to %18 step %c1 iter_args(%arg1 = %19) -> (index, i32) {
        fir.store %arg1 to %11 : !fir.ref<i32>
        fir.store %c1_i32 to %13 : !fir.ref<i32>
        %21 = fir.convert %c1_i32 : (i32) -> index
        %22 = fir.convert %c10_i32 : (i32) -> index
        %23 = fir.convert %21 : (index) -> i32
        %24:2 = fir.do_loop %arg2 = %21 to %22 step %c1 iter_args(%arg3 = %23) -> (index, i32) {
          fir.store %arg3 to %9 : !fir.ref<i32>
          fir.store %c20_i32 to %16 : !fir.ref<i32>
          omp.parallel {
            %33 = fir.alloca i32 {bindc_name = "y", pinned, uniq_name = "_QFEy"}
            %34 = fir.declare %33 {uniq_name = "_QFEy"} : (!fir.ref<i32>) -> !fir.ref<i32>
            fir.store %c2_i32 to %34 : !fir.ref<i32>
            omp.terminator
          }
          %29 = arith.addi %arg2, %c1 : index
          %30 = fir.convert %c1 : (index) -> i32
          %31 = fir.load %9 : !fir.ref<i32>
          %32 = arith.addi %31, %30 : i32
          fir.result %29, %32 : index, i32
        }
        fir.store %24#1 to %9 : !fir.ref<i32>
        %25 = arith.addi %arg0, %c1 : index
        %26 = fir.convert %c1 : (index) -> i32
        %27 = fir.load %11 : !fir.ref<i32>
        %28 = arith.addi %27, %26 : i32
        fir.result %25, %28 : index, i32
      }
      fir.store %20#1 to %11 : !fir.ref<i32>
      omp.terminator

The behaviour of the IR seems fine to me. Is it okay with you? I'll push the revision thereafter.

@luporl
Copy link
Contributor

luporl commented Mar 27, 2024

The behaviour of the IR seems fine to me. Is it okay with you? I'll push the revision thereafter.

@NimishMishra, the test program and resulting IR looks good to me too. Thanks for working on this.

@NimishMishra NimishMishra force-pushed the default-privatization-nested-construct branch from 8fffade to 9153b0a Compare March 28, 2024 06:53
@NimishMishra
Copy link
Contributor Author

NimishMishra commented Mar 28, 2024

The behaviour of the IR seems fine to me. Is it okay with you? I'll push the revision thereafter.

@NimishMishra, the test program and resulting IR looks good to me too. Thanks for working on this.

Thanks @luporl . I have pushed the change. I think a similar treatment (to deeply nested constructs) is also needed when we manipulate symbols while resolving directives. However, that problem has already been resolved with your change at #85989, where shared(z) uses the parent's copy. Once both the changes go in, the problem should be solved. Is that fine with you? Or should I merge both changes into one patch and then raise a review for the same?

Also requesting @kiranchandramohan's suggestion on this, once he is back...

@luporl
Copy link
Contributor

luporl commented Apr 1, 2024

Thanks @luporl . I have pushed the change. I think a similar treatment (to deeply nested constructs) is also needed when we manipulate symbols while resolving directives. However, that problem has already been resolved with your change at #85989, where shared(z) uses the parent's copy. Once both the changes go in, the problem should be solved. Is that fine with you? Or should I merge both changes into one patch and then raise a review for the same?

@NimishMishra, I think that, if it's not too much work, it would be nice to move from #85989 to this change the part that filters out symbols that are out of eval scope. It would be the if (clauseScopes.contains(&sym->owner())) symbols.insert(sym) part and all of its dependencies. This would avoid a regression in default clause handling, while #85989 doesn't land.

Copy link
Contributor

@luporl luporl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks for the comments explaining how default privatization works.

flang/lib/Lower/OpenMP/DataSharingProcessor.cpp Outdated Show resolved Hide resolved
flang/lib/Lower/OpenMP/DataSharingProcessor.cpp Outdated Show resolved Hide resolved
@kiranchandramohan
Copy link
Contributor

Have all the changes been made here @NimishMishra? If so can @luporl approve?

@luporl
Copy link
Contributor

luporl commented Apr 9, 2024

Have all the changes been made here @NimishMishra? If so can @luporl approve?

Besides some minor nitpicks, there is an issue remaining, that causes a regression in some specific cases, as when you have nested default clauses with shared, like the example below:

    !$omp parallel default(private)
        !$omp parallel default(firstprivate)
            x = y
        !$omp end parallel
        !$omp parallel default(private) shared(z)
            w = x + z
        !$omp end parallel
    !$omp end parallel 

In this case, z will end up being privatized in the inner parallel. #85989 fixes this.
I think it would be better to move that fix to this patch. But if we assume that #85989 is going to be merged soon and you are also ok with this, then this can be approved as is.

@NimishMishra
Copy link
Contributor Author

I am okay with the changes in #85989; that merge can go ahead once this PR is merged. If @kiranchandramohan is fine with #85989, we can do thus. Otherwise, I will port the relevant code from that PR into this one. Let me know the preferred approach...

Copy link
Contributor

@kiranchandramohan kiranchandramohan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few questions. Otherwise looks OK.

I am okay with the changes in #85989; that merge can go ahead once this PR is merged. If @kiranchandramohan is fine with #85989, we can do thus. Otherwise, I will port the relevant code from that PR into this one. Let me know the preferred approach...

We can submit this and submit the relevant code either along with #85989 or as a separate patch.

else
converter.collectSymbolSet(nestedEval, symbolsInNestedRegions, flag,
/*collectSymbols=*/true,
/*collectHostAssociatedSymbols=*/false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is collectHostAssociatedSymbols false here? Is it because a previous invocation of collectSymbolSet has already collected it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that is correct.

Comment on lines 300 to 335
converter.collectSymbolSet(eval, defaultSymbols, flag,
/*collectSymbols=*/true,
/*collectHostAssociatedSymbols=*/true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be folded into collectSymbolsInNestedRegions? I see the value of collectHostAssociatedSymbols is different.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, that might not work. defaultSymbols collect both the symbols as well as host-associations (if any). While collectSymbolsInNestedRegions skip host associations. This prevents any duplicate privatization

// In step 1, collect all symbols in `eval` that match `flag`
// into `defaultSymbols`.
// In step 2, for nested constructs (if any), if and only if
// the nested construct is an OpenMP construct, collect those nested
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean if it not an OpenMP construct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For non-OpenMP constructs, we skip collecting any symbols in symbolsInNestedRegions.

eval.getNestedEvaluations()) {
if (nestedEval.hasNestedEvaluations()) {
if (nestedEval.isConstruct())
// Recursively look for OpenMP constructs within `nestedEval`'s region
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean look for non-OpenMP constructs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, we want to capture symbols within nested OpenMP constructs in symbolsInNestedRegions.

In this case, the current evaluation is a non-OpenMP construct (like do). This recursive calls intends to find if there are any OpenMP constructs within this outer non-OpenMP construct.

@NimishMishra NimishMishra force-pushed the default-privatization-nested-construct branch 2 times, most recently from 3b12905 to d1da5ed Compare April 24, 2024 10:48
@NimishMishra
Copy link
Contributor Author

Thanks @luporl and @kiranchandramohan for the detailed reviews on this patch.

@luporl I have fixed description of collectSymbolSet as you mentioned. Does all look ok now? If so, could you approve the PR, post which I can submit it. Thanks!

@NimishMishra
Copy link
Contributor Author

Have a Error: clangformat failed to install. Will try rebasing after a while and see if that is fixed.

@kiranchandramohan
Copy link
Contributor

Writing a page (in flang/docs) on how default privatization is done (particularly how symbols are collected) will be good. You can do that after you submit this patch.

Copy link
Contributor

@luporl luporl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for all your work on this.

@NimishMishra NimishMishra force-pushed the default-privatization-nested-construct branch from d1da5ed to b86e099 Compare May 2, 2024 07:47
@NimishMishra
Copy link
Contributor Author

Accidentally closed. Rebasing and reopening

@NimishMishra NimishMishra reopened this May 2, 2024
@NimishMishra NimishMishra merged commit 37f6ba4 into llvm:main May 3, 2024
7 checks passed
@kiranchandramohan
Copy link
Contributor

@NimishMishra Could you remind me why we want to skip collecting symbols in nested OpenMP regions?

@NimishMishra
Copy link
Contributor Author

NimishMishra commented May 17, 2024

@NimishMishra Could you remind me why we want to skip collecting symbols in nested OpenMP regions?

Sure @kiranchandramohan. We need to do that since defaultSymbols captures both the symbol and its host association. In the nested OpenMP region, we collect the nested symbol (but not the host associations, that is why we set checkHostAssociatedSymbols = false in nested regions).

If we don't do this, then privatization fails due to already existing bindings.

      std::function<void(const Fortran::semantics::Symbol &, bool)>
          insertSymbols = [&](const Fortran::semantics::Symbol &oriSymbol,
                              bool collectSymbol) {
            if (collectSymbol && oriSymbol.test(flag))
              symbolSet.insert(&oriSymbol);
            else if (checkHostAssociatedSymbols)
              if (const auto *details{
                      oriSymbol
                          .detailsIf<Fortran::semantics::HostAssocDetails>()})
                insertSymbols(details->symbol(), true);
          };
      insertSymbols(sym, collectSymbols);
    };

This is the relevant functionality in collectSymbolSet. In case the check for symbol flag fails, we collect the host associated details in the outer region

@NimishMishra
Copy link
Contributor Author

I'm planning to spend time now on the documentation that you mentioned. I was held up mostly with the other issue, but now that is done. I'll send across a PR for review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flang:fir-hlfir flang:openmp flang Flang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Flang][OpenMP] default(private) failing with parallel do
5 participants