-
Notifications
You must be signed in to change notification settings - Fork 12k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[flang][OpenMP] Fix construct privatization in default clause #72510
[flang][OpenMP] Fix construct privatization in default clause #72510
Conversation
@llvm/pr-subscribers-flang-fir-hlfir @llvm/pr-subscribers-flang-openmp Author: None (NimishMishra) ChangesCurrent implementation of default clause privatization incorrectly fails to privatize in presence of non-OpenMP constructs. This patch fixes the same by considering non-OpenMP constructs separately. Full diff: https://github.com/llvm/llvm-project/pull/72510.diff 4 Files Affected:
diff --git a/flang/lib/Lower/Bridge.cpp b/flang/lib/Lower/Bridge.cpp
index 872bf6bc729ecd0..c0415ffc2d9505c 100644
--- a/flang/lib/Lower/Bridge.cpp
+++ b/flang/lib/Lower/Bridge.cpp
@@ -855,7 +855,7 @@ class FirConverter : public Fortran::lower::AbstractConverter {
bool collectSymbol) {
if (collectSymbol && oriSymbol.test(flag))
symbolSet.insert(&oriSymbol);
- if (checkHostAssociatedSymbols)
+ else if (checkHostAssociatedSymbols)
if (const auto *details{
oriSymbol
.detailsIf<Fortran::semantics::HostAssocDetails>()})
diff --git a/flang/lib/Lower/OpenMP.cpp b/flang/lib/Lower/OpenMP.cpp
index 00f16026d9b4bb8..0bbdae0b4ae7c2b 100644
--- a/flang/lib/Lower/OpenMP.cpp
+++ b/flang/lib/Lower/OpenMP.cpp
@@ -120,7 +120,6 @@ class DataSharingProcessor {
llvm::SetVector<const Fortran::semantics::Symbol *> privatizedSymbols;
llvm::SetVector<const Fortran::semantics::Symbol *> defaultSymbols;
llvm::SetVector<const Fortran::semantics::Symbol *> symbolsInNestedRegions;
- llvm::SetVector<const Fortran::semantics::Symbol *> symbolsInParentRegions;
Fortran::lower::AbstractConverter &converter;
fir::FirOpBuilder &firOpBuilder;
const Fortran::parser::OmpClauseList &opClauseList;
@@ -426,14 +425,10 @@ void DataSharingProcessor::collectSymbols(
/*collectSymbols=*/true,
/*collectHostAssociatedSymbols=*/true);
for (Fortran::lower::pft::Evaluation &e : eval.getNestedEvaluations()) {
- if (e.hasNestedEvaluations())
+ if (e.hasNestedEvaluations() && !e.isConstruct())
converter.collectSymbolSet(e, symbolsInNestedRegions, flag,
/*collectSymbols=*/true,
/*collectHostAssociatedSymbols=*/false);
- else
- converter.collectSymbolSet(e, symbolsInParentRegions, flag,
- /*collectSymbols=*/false,
- /*collectHostAssociatedSymbols=*/true);
}
}
@@ -485,7 +480,6 @@ void DataSharingProcessor::defaultPrivatize() {
!sym->GetUltimate().has<Fortran::semantics::DerivedTypeDetails>() &&
!sym->GetUltimate().has<Fortran::semantics::NamelistDetails>() &&
!symbolsInNestedRegions.contains(sym) &&
- !symbolsInParentRegions.contains(sym) &&
!privatizedSymbols.contains(sym)) {
cloneSymbol(sym);
copyFirstPrivateSymbol(sym);
diff --git a/flang/test/Lower/OpenMP/FIR/default-clause.f90 b/flang/test/Lower/OpenMP/FIR/default-clause.f90
index 14c0d375896a1a1..8d131c5d69b16a4 100644
--- a/flang/test/Lower/OpenMP/FIR/default-clause.f90
+++ b/flang/test/Lower/OpenMP/FIR/default-clause.f90
@@ -192,31 +192,40 @@ subroutine nested_default_clause_tests
!CHECK: omp.parallel {
!CHECK: %[[PRIVATE_X:.*]] = fir.alloca i32 {bindc_name = "x", pinned, uniq_name = "_QFnested_default_clause_testsEx"}
!CHECK: %[[PRIVATE_Y:.*]] = fir.alloca i32 {bindc_name = "y", pinned, uniq_name = "_QFnested_default_clause_testsEy"}
-!CHECK: %[[PRIVATE_Z:.*]] = fir.alloca i32 {bindc_name = "z", pinned, uniq_name = "_QFnested_default_clause_testsEz"}
!CHECK: %[[PRIVATE_W:.*]] = fir.alloca i32 {bindc_name = "w", pinned, uniq_name = "_QFnested_default_clause_testsEw"}
+!CHECK: %[[PRIVATE_Z:.*]] = fir.alloca i32 {bindc_name = "z", pinned, uniq_name = "_QFnested_default_clause_testsEz"}
!CHECK: omp.parallel {
!CHECK: %[[PRIVATE_INNER_X:.*]] = fir.alloca i32 {bindc_name = "x", pinned, uniq_name = "_QFnested_default_clause_testsEx"}
!CHECK: %[[temp:.*]] = fir.load %[[PRIVATE_X]] : !fir.ref<i32>
!CHECK: fir.store %[[temp]] to %[[PRIVATE_INNER_X]] : !fir.ref<i32>
-!CHECK: %[[INNER_PRIVATE_Y:.*]] = fir.alloca i32 {bindc_name = "y", pinned, uniq_name = "_QFnested_default_clause_testsEy"}
+!CHECK: %[[PRIVATE_INNER_Y:.*]] = fir.alloca i32 {bindc_name = "y", pinned, uniq_name = "_QFnested_default_clause_testsEy"}
!CHECK: %[[temp:.*]] = fir.load %[[PRIVATE_Y]] : !fir.ref<i32>
-!CHECK: fir.store %[[temp]] to %[[INNER_PRIVATE_Y]] : !fir.ref<i32>
-!CHECK: %[[temp:.*]] = fir.load %[[INNER_PRIVATE_Y]] : !fir.ref<i32>
+!CHECK: fir.store %[[temp]] to %[[PRIVATE_INNER_Y]] : !fir.ref<i32>
+!CHECK: %[[PRIVATE_INNER_W:.*]] = fir.alloca i32 {bindc_name = "w", pinned, uniq_name = "_QFnested_default_clause_testsEw"}
+!CHECK: %[[TEMP:.*]] = fir.load %[[PRIVATE_W]] : !fir.ref<i32>
+!CHECK: fir.store %[[TEMP]] to %[[PRIVATE_INNER_W]]
+!CHECK: %[[temp:.*]] = fir.load %[[PRIVATE_INNER_Y]] : !fir.ref<i32>
!CHECK: fir.store %[[temp]] to %[[PRIVATE_INNER_X]] : !fir.ref<i32>
+!CHECK: %[[LOADED_W:.*]] = fir.load %[[PRIVATE_INNER_W]] : !fir.ref<i32>
+!CHECK: %[[CONST:.*]] = arith.constant 1 : i32
+!CHECK: %[[RESULT:.*]] = arith.addi %[[LOADED_W]], %[[CONST]] : i32
+!CHECK: fir.store %[[RESULT]] to %[[PRIVATE_INNER_W]] : !fir.ref<i32>
!CHECK: omp.terminator
!CHECK: }
!CHECK: omp.parallel {
+!CHECK: %[[PRIVATE_INNER_Z:.*]] = fir.alloca i32 {bindc_name = "z", pinned, uniq_name = "_QFnested_default_clause_testsEz"}
!CHECK: %[[PRIVATE_INNER_W:.*]] = fir.alloca i32 {bindc_name = "w", pinned, uniq_name = "_QFnested_default_clause_testsEw"}
!CHECK: %[[PRIVATE_INNER_X:.*]] = fir.alloca i32 {bindc_name = "x", pinned, uniq_name = "_QFnested_default_clause_testsEx"}
!CHECK: %[[temp_1:.*]] = fir.load %[[PRIVATE_INNER_X]] : !fir.ref<i32>
-!CHECK: %[[temp_2:.*]] = fir.load %[[PRIVATE_Z]] : !fir.ref<i32>
-!CHECK: %[[result:.*]] = arith.addi %{{.*}}, %{{.*}} : i32
+!CHECK: %[[temp_2:.*]] = fir.load %[[PRIVATE_INNER_Z]] : !fir.ref<i32>
+!CHECK: %[[result:.*]] = arith.addi %[[temp_1]], %[[temp_2]] : i32
!CHECK: fir.store %[[result]] to %[[PRIVATE_INNER_W]] : !fir.ref<i32>
!CHECK: omp.terminator
!CHECK: }
!$omp parallel default(private)
!$omp parallel default(firstprivate)
x = y
+ w = w + 1
!$omp end parallel
!$omp parallel default(private) shared(z)
diff --git a/flang/test/Lower/OpenMP/default-clause.f90 b/flang/test/Lower/OpenMP/default-clause.f90
index 0a7443eecf52d77..c8188e21560a6fa 100644
--- a/flang/test/Lower/OpenMP/default-clause.f90
+++ b/flang/test/Lower/OpenMP/default-clause.f90
@@ -221,13 +221,14 @@ subroutine nested_default_clause_tests
!CHECK: omp.parallel {
+!CHECK: %[[PRIVATE_X:.*]] = fir.alloca i32 {bindc_name = "x", pinned, uniq_name = "_QFnested_default_clause_testsEx"}
!CHECK: %[[PRIVATE_X_DECL:.*]]:2 = hlfir.declare %[[PRIVATE_X]] {uniq_name = "_QFnested_default_clause_testsEx"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
!CHECK: %[[PRIVATE_Y:.*]] = fir.alloca i32 {bindc_name = "y", pinned, uniq_name = "_QFnested_default_clause_testsEy"}
!CHECK: %[[PRIVATE_Y_DECL:.*]]:2 = hlfir.declare %[[PRIVATE_Y]] {uniq_name = "_QFnested_default_clause_testsEy"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
-!CHECK: %[[PRIVATE_Z:.*]] = fir.alloca i32 {bindc_name = "z", pinned, uniq_name = "_QFnested_default_clause_testsEz"}
-!CHECK: %[[PRIVATE_Z_DECL:.*]]:2 = hlfir.declare %[[PRIVATE_Z]] {uniq_name = "_QFnested_default_clause_testsEz"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
!CHECK: %[[PRIVATE_W:.*]] = fir.alloca i32 {bindc_name = "w", pinned, uniq_name = "_QFnested_default_clause_testsEw"}
!CHECK: %[[PRIVATE_W_DECL:.*]]:2 = hlfir.declare %[[PRIVATE_W]] {uniq_name = "_QFnested_default_clause_testsEw"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
+!CHECK: %[[PRIVATE_Z:.*]] = fir.alloca i32 {bindc_name = "z", pinned, uniq_name = "_QFnested_default_clause_testsEz"}
+!CHECK: %[[PRIVATE_Z_DECL:.*]]:2 = hlfir.declare %[[PRIVATE_Z]] {uniq_name = "_QFnested_default_clause_testsEz"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
!CHECK: omp.parallel {
!CHECK: %[[PRIVATE_INNER_X:.*]] = fir.alloca i32 {bindc_name = "x", pinned, uniq_name = "_QFnested_default_clause_testsEx"}
!CHECK: %[[PRIVATE_INNER_X_DECL:.*]]:2 = hlfir.declare %[[PRIVATE_INNER_X]] {uniq_name = "_QFnested_default_clause_testsEx"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
@@ -237,17 +238,27 @@ subroutine nested_default_clause_tests
!CHECK: %[[INNER_PRIVATE_Y_DECL:.*]]:2 = hlfir.declare %[[INNER_PRIVATE_Y]] {uniq_name = "_QFnested_default_clause_testsEy"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
!CHECK: %[[TEMP:.*]] = fir.load %[[PRIVATE_Y_DECL]]#0 : !fir.ref<i32>
!CHECK: hlfir.assign %[[TEMP]] to %[[INNER_PRIVATE_Y_DECL]]#0 temporary_lhs : i32, !fir.ref<i32>
+!CHECK: %[[INNER_PRIVATE_W:.*]] = fir.alloca i32 {bindc_name = "w", pinned, uniq_name = "_QFnested_default_clause_testsEw"}
+!CHECK: %[[INNER_PRIVATE_W_DECL:.*]]:2 = hlfir.declare %[[INNER_PRIVATE_W]] {{.*}}
+!CHECK: %[[TEMP:.*]] = fir.load %[[PRIVATE_W_DECL]]#0 : !fir.ref<i32>
+!CHECK: hlfir.assign %[[TEMP]] to %[[INNER_PRIVATE_W_DECL]]#0 temporary_lhs : i32, !fir.ref<i32>
!CHECK: %[[TEMP:.*]] = fir.load %[[INNER_PRIVATE_Y_DECL]]#0 : !fir.ref<i32>
!CHECK: hlfir.assign %[[TEMP]] to %[[PRIVATE_INNER_X_DECL]]#0 : i32, !fir.ref<i32>
+!CHECK: %[[TEMP:.*]] = fir.load %[[INNER_PRIVATE_W_DECL]]#0 : !fir.ref<i32>
+!CHECK: %[[CONSTANT:.*]] = arith.constant 1 : i32
+!CHECK: %[[RESULT:.*]] = arith.addi %[[TEMP]], %[[CONSTANT]] : i32
+!CHECK: hlfir.assign %[[RESULT]] to %[[INNER_PRIVATE_W_DECL]]#0 : i32, !fir.ref<i32>
!CHECK: omp.terminator
!CHECK: }
!CHECK: omp.parallel {
+!CHECK: %[[PRIVATE_INNER_Z:.*]] = fir.alloca i32 {bindc_name = "z", pinned, uniq_name = "_QFnested_default_clause_testsEz"}
+!CHECK: %[[PRIVATE_INNER_Z_DECL:.*]]:2 = hlfir.declare %[[PRIVATE_INNER_Z]] {{.*}}
!CHECK: %[[PRIVATE_INNER_W:.*]] = fir.alloca i32 {bindc_name = "w", pinned, uniq_name = "_QFnested_default_clause_testsEw"}
!CHECK: %[[PRIVATE_INNER_W_DECL:.*]]:2 = hlfir.declare %[[PRIVATE_INNER_W]] {uniq_name = "_QFnested_default_clause_testsEw"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
!CHECK: %[[PRIVATE_INNER_X:.*]] = fir.alloca i32 {bindc_name = "x", pinned, uniq_name = "_QFnested_default_clause_testsEx"}
!CHECK: %[[PRIVATE_INNER_X_DECL:.*]]:2 = hlfir.declare %[[PRIVATE_INNER_X]] {uniq_name = "_QFnested_default_clause_testsEx"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
!CHECK: %[[TEMP_1:.*]] = fir.load %[[PRIVATE_INNER_X_DECL]]#0 : !fir.ref<i32>
-!CHECK: %[[TEMP_2:.*]] = fir.load %[[PRIVATE_Z_DECL]]#0 : !fir.ref<i32>
+!CHECK: %[[TEMP_2:.*]] = fir.load %[[PRIVATE_INNER_Z_DECL]]#0 : !fir.ref<i32>
!CHECK: %[[RESULT:.*]] = arith.addi %{{.*}}, %{{.*}} : i32
!CHECK: hlfir.assign %[[RESULT]] to %[[PRIVATE_INNER_W_DECL]]#0 : i32, !fir.ref<i32>
!CHECK: omp.terminator
@@ -255,6 +266,7 @@ subroutine nested_default_clause_tests
!$omp parallel default(private)
!$omp parallel default(firstprivate)
x = y
+ w = w + 1
!$omp end parallel
!$omp parallel default(private) shared(z)
@@ -317,14 +329,84 @@ subroutine nested_default_clause_tests
!CHECK: omp.terminator
!CHECK: }
!CHECK: omp.terminator
-!CHECK: }
-!CHECK: return
!CHECK: }
!$omp parallel default(firstprivate)
!$omp single
x = y
!$omp end single
!$omp end parallel
+
+!CHECK: omp.parallel {
+!CHECK: %[[LOOP_VAR_ALLOCA:.*]] = fir.alloca i32 {adapt.valuebyref, pinned}
+!CHECK: %[[LOOP_VAR_DECLARE:.*]]:2 = hlfir.declare %[[LOOP_VAR_ALLOCA]] {{.*}}
+!CHECK: %[[X_ALLOCA:.*]] = fir.alloca i32 {bindc_name = "x", pinned, uniq_name = "_QFnested_default_clause_testsEx"}
+!CHECK: %[[X_DECLARE:.*]]:2 = hlfir.declare %[[X_ALLOCA]] {{.*}}
+!CHECK: %[[CONST_LB:.*]] = arith.constant 1 : i32
+!CHECK: %[[CONST_UB:.*]] = arith.constant 50 : i32
+!CHECK: %[[CONST_STEP:.*]] = arith.constant 1 : i32
+!CHECK: omp.wsloop for (%[[ARG:.*]]) : i32 = (%[[CONST_LB]]) to (%[[CONST_UB]]) inclusive step (%[[CONST_STEP]]) {
+!CHECK: fir.store %[[ARG]] to %[[LOOP_VAR_DECLARE]]#1 : !fir.ref<i32>
+!CHECK: %[[LOADED_X:.*]] = fir.load %[[X_DECLARE]]#0 : !fir.ref<i32>
+!CHECK: %[[CONST:.*]] = arith.constant 1 : i32
+!CHECK: %[[RESULT:.*]] = arith.addi %[[LOADED_X]], %[[CONST]] : i32
+!CHECK: hlfir.assign %[[RESULT]] to %[[X_DECLARE]]#0 : i32, !fir.ref<i32>
+!CHECK: omp.yield
+!CHECK: }
+!CHECK: omp.terminator
+!CHECK: }
+ !$omp parallel do private(x)
+ do i=1, 50
+ x = x + 1
+ end do
+ !$omp end parallel do
+
+!CHECK: omp.parallel {
+!CHECK: %[[LOOP_VAR:.*]] = fir.alloca i32 {adapt.valuebyref, pinned}
+!CHECK: %[[LOOP_VAR_DECLARE:.*]]:2 = hlfir.declare %[[LOOP_VAR]] {{.*}}
+!CHECK: %[[X_VAR:.*]] = fir.alloca i32 {bindc_name = "x", pinned, uniq_name = "_QFnested_default_clause_testsEx"}
+!CHECK: %[[X_VAR_DECLARE:.*]]:2 = hlfir.declare %[[X_VAR]] {{.*}}
+!CHECK: %[[Y_VAR:.*]] = fir.alloca i32 {bindc_name = "y", pinned, uniq_name = "_QFnested_default_clause_testsEy"}
+!CHECK: %[[Y_VAR_DECLARE:.*]]:2 = hlfir.declare %[[Y_VAR]] {{.*}}
+!CHECK: %[[Z_VAR:.*]] = fir.alloca i32 {bindc_name = "z", pinned, uniq_name = "_QFnested_default_clause_testsEz"}
+!CHECK: %[[Z_VAR_DECLARE:.*]]:2 = hlfir.declare %[[Z_VAR]] {{.*}}
+!CHECK: %[[CONST_LB:.*]] = arith.constant 1 : i32
+!CHECK: %[[CONST_UB:.*]] = arith.constant 10 : i32
+!CHECK: %[[CONST_STEP:.*]] = arith.constant 1 : i32
+!CHECK: omp.wsloop for (%[[ARG:.*]]) : i32 = (%[[CONST_LB]]) to (%[[CONST_UB]]) inclusive step (%[[CONST_STEP]]) {
+!CHECK: fir.store %[[ARG]] to %[[LOOP_VAR_DECLARE]]#1 : !fir.ref<i32>
+!CHECK: %[[LOADED_X:.*]] = fir.load %[[X_VAR_DECLARE]]#0 : !fir.ref<i32>
+!CHECK: %[[CONST:.*]] = arith.constant 1 : i32
+!CHECK: %[[ADD:.*]] = arith.addi %[[LOADED_X]], %[[CONST]] : i32
+!CHECK: hlfir.assign %[[ADD]] to %[[X_VAR_DECLARE]]#0 : i32, !fir.ref<i32>
+!CHECK: omp.parallel {
+!CHECK: %[[INNER_Y_ALLOCA:.*]] = fir.alloca i32 {bindc_name = "y", pinned, uniq_name = "_QFnested_default_clause_testsEy"}
+!CHECK: %[[INNER_Y_DECLARE:.*]]:2 = hlfir.declare %[[INNER_Y_ALLOCA]] {{.}}
+!CHECK: %[[TEMP:.*]] = fir.load %[[Y_VAR_DECLARE]]#0 : !fir.ref<i32>
+!CHECK: hlfir.assign %[[TEMP]] to %[[INNER_Y_DECLARE]]#0 temporary_lhs : i32, !fir.ref<i32>
+!CHECK: %[[INNER_Z_ALLOCA:.*]] = fir.alloca i32 {bindc_name = "z", pinned, uniq_name = "_QFnested_default_clause_testsEz"}
+!CHECK: %[[INNER_Z_DECLARE:.*]]:2 = hlfir.declare %[[INNER_Z_ALLOCA]] {{.}}
+!CHECK: %[[TEMP:.*]] = fir.load %[[Z_VAR_DECLARE]]#0 : !fir.ref<i32>
+!CHECK: hlfir.assign %[[TEMP]] to %[[INNER_Z_DECLARE]]#0 temporary_lhs : i32, !fir.ref<i32>
+!CHECK: %[[LOADED_Y:.*]] = fir.load %[[INNER_Y_DECLARE]]#0 : !fir.ref<i32>
+!CHECK: %[[LOADED_Z:.*]] = fir.load %[[INNER_Z_DECLARE]]#0 : !fir.ref<i32>
+!CHECK: %[[RESULT:.*]] = arith.addi %[[LOADED_Y]], %[[LOADED_Z]] : i32
+!CHECK: hlfir.assign %[[RESULT]] to %[[INNER_Y_DECLARE]]#0 : i32, !fir.ref<i32>
+!CHECK: omp.terminator
+!CHECK: }
+!CHECK: omp.yield
+!CHECK: }
+!CHECK: omp.terminator
+!CHECK: }
+!CHECK: return
+!CHECK: }
+ !$omp parallel do default(private)
+ do i = 1, 10
+ x = x + 1
+ !$omp parallel default(firstprivate)
+ y = y + z
+ !$omp end parallel
+ end do
+ !$omp end parallel do
end subroutine
!CHECK: func.func @_QPskipped_default_clause_checks() {
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fails to privatize in presence of non-OpenMP constructs.
Could you comment on what non-OpenMP constructs are you talking about here? Is it constructs with regions?
In general, could you expan on the issue and the fix?
flang/lib/Lower/OpenMP.cpp
Outdated
@@ -120,7 +120,6 @@ class DataSharingProcessor { | |||
llvm::SetVector<const Fortran::semantics::Symbol *> privatizedSymbols; | |||
llvm::SetVector<const Fortran::semantics::Symbol *> defaultSymbols; | |||
llvm::SetVector<const Fortran::semantics::Symbol *> symbolsInNestedRegions; | |||
llvm::SetVector<const Fortran::semantics::Symbol *> symbolsInParentRegions; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this not required anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bad design from my end. Earlier, we were collecting both the symbol as well as the host-association details, and then removing one of them using this list. I have modified the current implementation to a use a else if
, and hence this isn't required.
b103d7f
to
8fffade
Compare
Ping for review! |
flang/lib/Lower/OpenMP.cpp
Outdated
@@ -426,14 +425,10 @@ void DataSharingProcessor::collectSymbols( | |||
/*collectSymbols=*/true, | |||
/*collectHostAssociatedSymbols=*/true); | |||
for (Fortran::lower::pft::Evaluation &e : eval.getNestedEvaluations()) { | |||
if (e.hasNestedEvaluations()) | |||
if (e.hasNestedEvaluations() && !e.isConstruct()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it collected in non-constructs (like OpenMP)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kiranchandramohan I was on leave mainly for past few days; sorry about the delayed response.
So we maintain two lists: defaultSymbols
and symbolsInNestedRegions
. For non-constructs like OpenMP, we would not like to collected the host associated symbols for nested regions (thereby collectHostAssociatedSymbols = false
). They can be handled as the default
clause may be in their respective nested blocks. This was the earlier design.
However, for non-OpenMP constructs, the handling is different. This is because even though the inner construct has a block region (like OpenMP constructs do), all privatization of variables in the nested construct needs to happen in the outer OpenMP construct. Hence, we do not need to populate the symbolsInNestedRegions
list. Otherwise, no privatization will happen, leading to incorrect results.
That said, if you think so, I can relook the design of the lists once. It has been quite some time when we implemented the default clause. So maybe, we can think of improving this lowering flow once. What are your suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kiranchandramohan A ping on this! How can we take this forward?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we maintain two lists: defaultSymbols and symbolsInNestedRegions. For non-constructs like OpenMP, we would not like to collected the host associated symbols for nested regions (thereby collectHostAssociatedSymbols = false). They can be handled as the default clause may be in their respective nested blocks. This was the earlier design.
Some OpenMP constructs (like critical) will not have a default clause because such a clause is not supported for these constructs. Some OpenMP constructs might not have them specified (i.e no default clause is present eventhough it is supported). Do we need to distinguish these cases? For those constructs which have them specified, for every pair of defaults do their exact values matter for the inner construct's symbols?
outer(default=shared), inner(default=shared) : Nothing to do
outer(default=private), inner(default=shared) : ?
outer(default=shared), inner(default=private) : ?
outer(default=private),inner(default=private) : ?
However, for non-OpenMP constructs, the handling is different. This is because even though the inner construct has a block region (like OpenMP constructs do), all privatization of variables in the nested construct needs to happen in the outer OpenMP construct. Hence, we do not need to populate the symbolsInNestedRegions list.
Makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kiranchandramohan I checked for the use-cases mentioned. The lowering seems to be okay. Should I add these combinations in the tests for default clause?
outer(default=private), inner(default=private)
Consider the following example:
integer :: j
!$omp parallel default(private)
j = 10
!$omp parallel default(firstprivate)
j = j + 5
!$omp end parallel
!$omp end parallel
This lowers as:
%0 = fir.alloca i32 ......
%1:2 = hlfir.declare %0 .....
omp.parallel {
%2 = fir.alloca i32 ..... // privatized j
%3:2 = hlfir.declare %2
%c = arith.constant 10
hlfir.assign %c to %3#0 ....
omp.parallel {
%4 = fir.alloca i32 ....
%5:2 = hlfir.declare %4 ....
%6 = fir.load %3#0 .....
hlfir.assign %6 to %5#0 ......
}
}
outer(default=private), inner(default=shared)
Example:
integer :: j
!$omp parallel default(private)
j = 10
!$omp parallel
j = j + 5
!$omp end parallel
!$omp end parallel
This lowers as:
%0 = fir.alloca i32 ......
%1:2 = hlfir.declare %0 .....
omp.parallel {
%2 = fir.alloca i32 ..... // privatized j
%3:2 = hlfir.declare %2
%c = arith.constant 10
hlfir.assign %c to %3#0 ....
omp.parallel {
%4 = fir.load %3#0 .....
// add 5 to %4 and store in %5
hlfir.assign %5 to %3#0 ....
}
}
outer(default=shared), inner(default=private)
Example:
integer :: j
!$omp parallel
j = 10
!$omp parallel default(private)
j = j + 5
!$omp end parallel
!$omp end parallel
This lowers as:
%0 = fir.alloca i32 ......
%1:2 = hlfir.declare %0 .....
omp.parallel {
%c = arith.constant 10
hlfir.assign %c to %3#0 ....
omp.parallel {
%2 = fir.alloca ....
%3:2 = hlfir.declare %2 ....
%4 = fir.load %3#0 .....
// add 5 to %4 and store in %5
hlfir.assign %5 to %3#0 ....
}
}
And should be consider firstprivate
in the inner parallel:
%0 = fir.alloca i32 ......
%1:2 = hlfir.declare %0 .....
omp.parallel {
%c = arith.constant 10
hlfir.assign %c to %3#0 ....
omp.parallel {
%2 = fir.alloca ....
%3:2 = hlfir.declare %2 ....
%4 = fir.load %1#0 ....
hlfir.assign %4 to %3#0 .....
%5 = fir.load %3#0 .....
// add 5 to %4 and store in %6
hlfir.assign %6 to %3#0 ....
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additionally, for constructs like critical
not having an explicit default clause, the lowered IR is in line with the correct symbols that the construct should be using.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to understand the need to distinguish between OpenMP and non-OpenMP constructs for collecting symbols recursively. Particularly in cases where the variable on which a default applies is only visible inside the nested OpenMP construct. Consider the following example :
program mn
call sb1()
contains
subroutine sb1()
integer :: x
x = 20
!$omp parallel default(private)
!$omp critical
x = 30
!$omp end critical
!$omp end parallel
print *, x
end subroutine
end program
But the following example with nested parallel works. I am assuming the semantics code is creating symbols here. Is that the reason?
If so, why not handle everything (nested openmp and nested non-openmp constructs) in the frontend? What is the split between semantics and the code here?
program mn
call sb1()
contains
subroutine sb1()
integer :: x
x = 20
!$omp parallel default(private)
!$omp parallel
x = 30
!$omp end parallel
!$omp end parallel
print *, x
end subroutine
end program
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @kiranchandramohan,
I re-looked at the possibility of handling the entire default clause at one place. For instance, I checked whether we can handle it completely during lowering. One possible way:
-
Use a function similar to
collectSymbolSet
with a boolcheckFlag
, such that the function checks symbol flag iffcheckFlag = True
, otherwise it simply adds the symbol to the symbol list. -
Given any OpenMP context, all symbols encountered in the same context can then be collected through
checkFlag = True
. In other words, any symbol in the same region wheredefault(private)
is specified should only collect symbols with theprivate
data-sharing attribute. However, any symbol in a nested region can be collected irrespective of the specified flag on that symbol.
This approach, unfortunately, does not work without the symbol modifications we currently do in semantics.
For your question on handling OpenMP and non OpenMP constructs in semantics itself, currently, both are being handled. It's just that while lowering, for nested OpenMP constructs, we need to collect host-associated symbols as well. For non-OpenMP constructs, we skip host-associated symbols.
I tried a few variations in which to simply this (use a single list defaultSymbols
instead of two lists: defaultSymbols
and symbolsInNestedRegions
). But so far, nothing concrete has come up...
flang/lib/Lower/OpenMP.cpp
Outdated
@@ -426,14 +425,10 @@ void DataSharingProcessor::collectSymbols( | |||
/*collectSymbols=*/true, | |||
/*collectHostAssociatedSymbols=*/true); | |||
for (Fortran::lower::pft::Evaluation &e : eval.getNestedEvaluations()) { | |||
if (e.hasNestedEvaluations()) | |||
if (e.hasNestedEvaluations() && !e.isConstruct()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a hard time trying to understand how collectSymbols()
work.
This is how I understand it:
- First, all symbols used in
eval
that matchflag
are collected intodefaultSymbols
, including host associated symbols
(why host associated symbols need to be collected? To look at the outer scope? But if the symbol in the outer scope matches the default DSA, does anything need to be done?) - For each evaluation that is a direct child of
eval
:- If the evaluation has children and is not a Fortran construct (such as if, do, etc)
- Add the symbols used in it, that match
flag
, tosymbolsInNestedRegions
, skipping host associated symbols
(because otherwise symbols from the topeval
level could also be collected?)
- Add the symbols used in it, that match
- If the evaluation has children and is not a Fortran construct (such as if, do, etc)
- Later, in
defaultPrivatize
,symbolsInNestedRegions
are excluded fromdefaultSymbols
, presumably because each nested region will handle them appropriately. Then, the remaining symbols are privatized, with some exceptions, such as procedures.
Is my understanding correct?
It would be nice to add a brief comment to collectSymbols
, explaining what it does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @luporl for the review. Yes, the understanding is correct. Since each nested region needs to handle the symbols separately, we needed to make collectSymbols
as it is now. In fact, in response to Kiran's comment above, I have given the lowered IR for combinations of nested constructs with default clause. You could check if you are ok with the current behaviour.
And sure, I'll capture more details on collectSymbols
with this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation @NimishMishra. The current behavior seems ok to me.
I'm planning to use collectSymbols
to handle symbols with implicit DSA, such as those of task
, that must be firstprivate
if no default
clause is present (and outer context doesn't define it as shared).
I have commented the correct data sharing for the given example as comments in the code: program mn
call sb1()
contains
subroutine sb1()
integer :: x
x = 20
!$omp parallel default(private) ! x is declare outside of the region, so it's shared by default, default(private) changes this to private
!$omp parallel ! the private x of the outer region is "declared" outside of this region, thus it's shared for the nested region
x = 30
!$omp end parallel
!$omp end parallel
print *, x
end subroutine
end program So, all OpenMP constructs that may take data-sharing attribute clauses may sort of have an implicit default data-sharing attribute or inherit from the outer scope. For instance, |
OpenMP 4.5, subclause 2.15.1.1 lists the rules of implicit data-sharing attributes. In fact, some of them are missing in Flang, like that of |
I have been using this patch (rebased on top of main) without issues for some time. !$omp parallel do default(private) firstprivate(y)
do i = 1, 10
y = 1
!$omp parallel default(private)
y = 2
!$omp end parallel
end do
!$omp end parallel do
end This program hits an assert: The problem is that the inner parallel directive is inside a Fortran construct ( A possible solution could be to visit nested evaluations recursively and, in each one, collect the symbols from the first OpenMP construct (if any). |
Thank you @luporl for pointing this out. I think you are right in stating that a recursive solution would be better in this context. I'll work on this and send out a revision. |
Hi @luporl, I am revisiting this patch, and am wondering if recursion is needed here. Checking up with you once if I am missing something.
Then
So if we iterate over these evaluations, and do the following:
Concisely, for a |
Hi @NimishMishra,
Right. IIUC,
Let me see if I understood it correctly. Are you proposing to add an extra loop to iterate over the nested evaluations of each "top" nested evaluation, applying rules 1 and 2 to each of them? !$omp parallel do default(private) firstprivate(y)
do i = 1, 10
y = 1
do j = 1, 10
!$omp parallel default(private)
y = 2
!$omp end parallel
end do
end do
!$omp end parallel do
end That's why I think that, at least with this approach, recursion would be needed to go through the nested evaluations properly. I agree that whenever an OpenMP construct is found, one call to |
Thanks @luporl. Yeah, I think recursion is the best way to go forward. I made the required changes for a recursive solution, and for the following:
I have the following FIR:
The behaviour of the IR seems fine to me. Is it okay with you? I'll push the revision thereafter. |
@NimishMishra, the test program and resulting IR looks good to me too. Thanks for working on this. |
8fffade
to
9153b0a
Compare
Thanks @luporl . I have pushed the change. I think a similar treatment (to deeply nested constructs) is also needed when we manipulate symbols while resolving directives. However, that problem has already been resolved with your change at #85989, where Also requesting @kiranchandramohan's suggestion on this, once he is back... |
@NimishMishra, I think that, if it's not too much work, it would be nice to move from #85989 to this change the part that filters out symbols that are out of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks for the comments explaining how default privatization works.
Have all the changes been made here @NimishMishra? If so can @luporl approve? |
Besides some minor nitpicks, there is an issue remaining, that causes a regression in some specific cases, as when you have nested default clauses with
In this case, |
I am okay with the changes in #85989; that merge can go ahead once this PR is merged. If @kiranchandramohan is fine with #85989, we can do thus. Otherwise, I will port the relevant code from that PR into this one. Let me know the preferred approach... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a few questions. Otherwise looks OK.
I am okay with the changes in #85989; that merge can go ahead once this PR is merged. If @kiranchandramohan is fine with #85989, we can do thus. Otherwise, I will port the relevant code from that PR into this one. Let me know the preferred approach...
We can submit this and submit the relevant code either along with #85989 or as a separate patch.
else | ||
converter.collectSymbolSet(nestedEval, symbolsInNestedRegions, flag, | ||
/*collectSymbols=*/true, | ||
/*collectHostAssociatedSymbols=*/false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is collectHostAssociatedSymbols
false
here? Is it because a previous invocation of collectSymbolSet has already collected it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that is correct.
converter.collectSymbolSet(eval, defaultSymbols, flag, | ||
/*collectSymbols=*/true, | ||
/*collectHostAssociatedSymbols=*/true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be folded into collectSymbolsInNestedRegions? I see the value of collectHostAssociatedSymbols
is different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, that might not work. defaultSymbols
collect both the symbols as well as host-associations (if any). While collectSymbolsInNestedRegions
skip host associations. This prevents any duplicate privatization
// In step 1, collect all symbols in `eval` that match `flag` | ||
// into `defaultSymbols`. | ||
// In step 2, for nested constructs (if any), if and only if | ||
// the nested construct is an OpenMP construct, collect those nested |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean if it not
an OpenMP construct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For non-OpenMP constructs, we skip collecting any symbols in symbolsInNestedRegions
.
eval.getNestedEvaluations()) { | ||
if (nestedEval.hasNestedEvaluations()) { | ||
if (nestedEval.isConstruct()) | ||
// Recursively look for OpenMP constructs within `nestedEval`'s region |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean look for non-OpenMP
constructs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, we want to capture symbols within nested OpenMP constructs in symbolsInNestedRegions
.
In this case, the current evaluation is a non-OpenMP construct (like do
). This recursive calls intends to find if there are any OpenMP constructs within this outer non-OpenMP construct.
3b12905
to
d1da5ed
Compare
Thanks @luporl and @kiranchandramohan for the detailed reviews on this patch. @luporl I have fixed description of |
Have a |
Writing a page (in flang/docs) on how default privatization is done (particularly how symbols are collected) will be good. You can do that after you submit this patch. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for all your work on this.
d1da5ed
to
b86e099
Compare
Accidentally closed. Rebasing and reopening |
@NimishMishra Could you remind me why we want to skip collecting symbols in nested OpenMP regions? |
Sure @kiranchandramohan. We need to do that since If we don't do this, then privatization fails due to already existing bindings.
This is the relevant functionality in |
I'm planning to spend time now on the documentation that you mentioned. I was held up mostly with the other issue, but now that is done. I'll send across a PR for review |
Current implementation of default clause privatization incorrectly fails to privatize in presence of non-OpenMP constructs (i.e. nested constructs with regions whose symbols need to be privatized in the scope of the parent OpenMP construct). This patch fixes the same by considering non-OpenMP constructs separately by collecting symbols of a nested region if it is a non-OpenMP construct with a region, and privatizing it in the scope of the parent OpenMP construct.
Fixes #71914 and #71915