Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[flang][OpenMP] Only use HLFIR base in privatization logic #84123

Merged
merged 1 commit into from
Mar 11, 2024

Conversation

ergawy
Copy link
Member

@ergawy ergawy commented Mar 6, 2024

Modifies the privatization logic so that the emitted code only used the
HLFIR base (i.e. SSA value #0 returned from hlfir.declare). Before
that, that emitted privatization logic was a mix of using #0 and #1
which leads to some difficulties trying to move to delayed privatization
(see the discussion on #84033).

@llvmbot llvmbot added flang Flang issues not falling into any other category flang:fir-hlfir flang:openmp labels Mar 6, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Mar 6, 2024

@llvm/pr-subscribers-flang-openmp

@llvm/pr-subscribers-flang-fir-hlfir

Author: Kareem Ergawy (ergawy)

Changes

An experimental diff to discuss using only the first component of hlfir.declare's result for allocatables. This is just to have a more informed discussion on #84033.

This is not meant to be merged, at least for now.


Full diff: https://github.com/llvm/llvm-project/pull/84123.diff

5 Files Affected:

  • (modified) flang/include/flang/Optimizer/Builder/HLFIRTools.h (+3-2)
  • (modified) flang/lib/Lower/Bridge.cpp (+5-4)
  • (modified) flang/lib/Optimizer/Builder/HLFIRTools.cpp (+9-6)
  • (modified) flang/test/Lower/OpenMP/parallel-private-clause-str.f90 (+5-5)
  • (modified) flang/test/Lower/OpenMP/parallel-private-clause.f90 (+17-17)
diff --git a/flang/include/flang/Optimizer/Builder/HLFIRTools.h b/flang/include/flang/Optimizer/Builder/HLFIRTools.h
index 170e134baef614..5177dd2746d7eb 100644
--- a/flang/include/flang/Optimizer/Builder/HLFIRTools.h
+++ b/flang/include/flang/Optimizer/Builder/HLFIRTools.h
@@ -194,7 +194,7 @@ class Entity : public mlir::Value {
   // information, or has access to it in other ways. Its advantage is that
   // it will never be a fir.box for explicit shape arrays, leading to simpler
   // FIR code generation.
-  mlir::Value getFirBase() const;
+  mlir::Value getFirBase(bool forceDeclBaseHACK = false) const;
 };
 
 /// Wrapper over an mlir::Value that can be viewed as a Fortran entity.
@@ -230,7 +230,8 @@ translateToExtendedValue(mlir::Location loc, fir::FirOpBuilder &builder,
 /// on the IR.
 fir::ExtendedValue
 translateToExtendedValue(mlir::Location loc, fir::FirOpBuilder &builder,
-                         fir::FortranVariableOpInterface fortranVariable);
+                         fir::FortranVariableOpInterface fortranVariable,
+                         bool forceDeclBaseHACK = false);
 
 /// Generate declaration for a fir::ExtendedValue in memory.
 fir::FortranVariableOpInterface
diff --git a/flang/lib/Lower/Bridge.cpp b/flang/lib/Lower/Bridge.cpp
index 8048693119b4c5..225aefbbf31af8 100644
--- a/flang/lib/Lower/Bridge.cpp
+++ b/flang/lib/Lower/Bridge.cpp
@@ -618,7 +618,7 @@ class FirConverter : public Fortran::lower::AbstractConverter {
     assert(details && "No host-association found");
     const Fortran::semantics::Symbol &hsym = details->symbol();
     mlir::Type hSymType = genType(hsym);
-    Fortran::lower::SymbolBox hsb = lookupSymbol(hsym);
+    Fortran::lower::SymbolBox hsb = lookupSymbol(hsym, nullptr, true);
 
     auto allocate = [&](llvm::ArrayRef<mlir::Value> shape,
                         llvm::ArrayRef<mlir::Value> typeParams) -> mlir::Value {
@@ -727,7 +727,7 @@ class FirConverter : public Fortran::lower::AbstractConverter {
   void createHostAssociateVarCloneDealloc(
       const Fortran::semantics::Symbol &sym) override final {
     mlir::Location loc = genLocation(sym.name());
-    Fortran::lower::SymbolBox hsb = lookupSymbol(sym);
+    Fortran::lower::SymbolBox hsb = lookupSymbol(sym, nullptr, true);
 
     fir::ExtendedValue hexv = symBoxToExtendedValue(hsb);
     hexv.match(
@@ -960,13 +960,14 @@ class FirConverter : public Fortran::lower::AbstractConverter {
   /// Find the symbol in the local map or return null.
   Fortran::lower::SymbolBox
   lookupSymbol(const Fortran::semantics::Symbol &sym,
-               Fortran::lower::SymMap *symMap = nullptr) {
+               Fortran::lower::SymMap *symMap = nullptr,
+               bool forceDeclBaseHACK = false) {
     symMap = symMap ? symMap : &localSymbols;
     if (lowerToHighLevelFIR()) {
       if (std::optional<fir::FortranVariableOpInterface> var =
               symMap->lookupVariableDefinition(sym)) {
         auto exv =
-            hlfir::translateToExtendedValue(toLocation(), *builder, *var);
+            hlfir::translateToExtendedValue(toLocation(), *builder, *var, forceDeclBaseHACK);
         return exv.match(
             [](mlir::Value x) -> Fortran::lower::SymbolBox {
               return Fortran::lower::SymbolBox::Intrinsic{x};
diff --git a/flang/lib/Optimizer/Builder/HLFIRTools.cpp b/flang/lib/Optimizer/Builder/HLFIRTools.cpp
index 4ffa303f27103a..9be563a41ea97a 100644
--- a/flang/lib/Optimizer/Builder/HLFIRTools.cpp
+++ b/flang/lib/Optimizer/Builder/HLFIRTools.cpp
@@ -183,11 +183,12 @@ static fir::CharBoxValue genUnboxChar(mlir::Location loc,
   return {addr, len};
 }
 
-mlir::Value hlfir::Entity::getFirBase() const {
+mlir::Value hlfir::Entity::getFirBase(bool forceDeclBaseHACK) const {
   if (fir::FortranVariableOpInterface variable = getIfVariableInterface()) {
     if (auto declareOp =
             mlir::dyn_cast<hlfir::DeclareOp>(variable.getOperation()))
-      return declareOp.getOriginalBase();
+      return forceDeclBaseHACK ? declareOp.getBase()
+                               : declareOp.getOriginalBase();
     if (auto associateOp =
             mlir::dyn_cast<hlfir::AssociateOp>(variable.getOperation()))
       return associateOp.getFirBase();
@@ -848,11 +849,12 @@ hlfir::LoopNest hlfir::genLoopNest(mlir::Location loc,
 
 static fir::ExtendedValue
 translateVariableToExtendedValue(mlir::Location loc, fir::FirOpBuilder &builder,
-                                 hlfir::Entity variable) {
+                                 hlfir::Entity variable,
+                                 bool forceDeclBaseHACK = false) {
   assert(variable.isVariable() && "must be a variable");
   /// When going towards FIR, use the original base value to avoid
   /// introducing descriptors at runtime when they are not required.
-  mlir::Value firBase = variable.getFirBase();
+  mlir::Value firBase = variable.getFirBase(forceDeclBaseHACK);
   if (variable.isMutableBox())
     return fir::MutableBoxValue(firBase, getExplicitTypeParams(variable),
                                 fir::MutableProperties{});
@@ -900,8 +902,9 @@ translateVariableToExtendedValue(mlir::Location loc, fir::FirOpBuilder &builder,
 
 fir::ExtendedValue
 hlfir::translateToExtendedValue(mlir::Location loc, fir::FirOpBuilder &builder,
-                                fir::FortranVariableOpInterface var) {
-  return translateVariableToExtendedValue(loc, builder, var);
+                                fir::FortranVariableOpInterface var,
+                                bool forceDeclBaseHACK) {
+  return translateVariableToExtendedValue(loc, builder, var, forceDeclBaseHACK);
 }
 
 std::pair<fir::ExtendedValue, std::optional<hlfir::CleanupFunction>>
diff --git a/flang/test/Lower/OpenMP/parallel-private-clause-str.f90 b/flang/test/Lower/OpenMP/parallel-private-clause-str.f90
index f668957624b497..025e51e0661764 100644
--- a/flang/test/Lower/OpenMP/parallel-private-clause-str.f90
+++ b/flang/test/Lower/OpenMP/parallel-private-clause-str.f90
@@ -10,7 +10,7 @@
 !CHECK:    %[[C_DECL:.*]]:2 = hlfir.declare %[[C_BOX_REF]] typeparams %{{.*}} {fortran_attrs = #fir.var_attrs<allocatable>, uniq_name = "_QFtest_allocatable_stringEc"} : (!fir.ref<!fir.box<!fir.heap<!fir.char<1,?>>>>, i32) -> (!fir.ref<!fir.box<!fir.heap<!fir.char<1,?>>>>, !fir.ref<!fir.box<!fir.heap<!fir.char<1,?>>>>)
 !CHECK:    omp.parallel {
 !CHECK:      %[[C_PVT_BOX_REF:.*]] = fir.alloca !fir.box<!fir.heap<!fir.char<1,?>>> {bindc_name = "c", pinned, uniq_name = "_QFtest_allocatable_stringEc"}
-!CHECK:      %[[C_BOX:.*]] = fir.load %[[C_DECL]]#1 : !fir.ref<!fir.box<!fir.heap<!fir.char<1,?>>>>
+!CHECK:      %[[C_BOX:.*]] = fir.load %[[C_DECL]]#0 : !fir.ref<!fir.box<!fir.heap<!fir.char<1,?>>>>
 !CHECK:      fir.if %{{.*}} {
 !CHECK:        %[[C_PVT_MEM:.*]] = fir.allocmem !fir.char<1,?>(%{{.*}} : index) {fir.must_be_heap = true, uniq_name = "_QFtest_allocatable_stringEc.alloc"}
 !CHECK:        %[[C_PVT_BOX:.*]] = fir.embox %[[C_PVT_MEM]] typeparams %{{.*}} : (!fir.heap<!fir.char<1,?>>, index) -> !fir.box<!fir.heap<!fir.char<1,?>>>
@@ -18,7 +18,7 @@
 !CHECK:      }
 !CHECK:      %[[C_PVT_DECL:.*]]:2 = hlfir.declare %[[C_PVT_BOX_REF]] {fortran_attrs = #fir.var_attrs<allocatable>, uniq_name = "_QFtest_allocatable_stringEc"} : (!fir.ref<!fir.box<!fir.heap<!fir.char<1,?>>>>) -> (!fir.ref<!fir.box<!fir.heap<!fir.char<1,?>>>>, !fir.ref<!fir.box<!fir.heap<!fir.char<1,?>>>>)
 !CHECK:      fir.if %{{.*}} {
-!CHECK:        %[[C_PVT_BOX:.*]] = fir.load %[[C_PVT_DECL]]#1 : !fir.ref<!fir.box<!fir.heap<!fir.char<1,?>>>>
+!CHECK:        %[[C_PVT_BOX:.*]] = fir.load %[[C_PVT_DECL]]#0 : !fir.ref<!fir.box<!fir.heap<!fir.char<1,?>>>>
 !CHECK:        %[[C_PVT_BOX_ADDR:.*]] = fir.box_addr %[[C_PVT_BOX]] : (!fir.box<!fir.heap<!fir.char<1,?>>>) -> !fir.heap<!fir.char<1,?>>
 !CHECK:        fir.freemem %[[C_PVT_BOX_ADDR]] : !fir.heap<!fir.char<1,?>>
 !CHECK:      }
@@ -38,16 +38,16 @@ subroutine test_allocatable_string(n)
 !CHECK:    %[[C_DECL:.*]]:2 = hlfir.declare %[[C_BOX_REF]] typeparams %{{.*}} {fortran_attrs = #fir.var_attrs<allocatable>, uniq_name = "_QFtest_allocatable_string_arrayEc"} : (!fir.ref<!fir.box<!fir.heap<!fir.array<?x!fir.char<1,?>>>>>, i32) -> (!fir.ref<!fir.box<!fir.heap<!fir.array<?x!fir.char<1,?>>>>>, !fir.ref<!fir.box<!fir.heap<!fir.array<?x!fir.char<1,?>>>>>)
 !CHECK:    omp.parallel {
 !CHECK:      %[[C_PVT_BOX_REF:.*]] = fir.alloca !fir.box<!fir.heap<!fir.array<?x!fir.char<1,?>>>> {bindc_name = "c", pinned, uniq_name = "_QFtest_allocatable_string_arrayEc"}
-!CHECK:      %{{.*}} = fir.load %[[C_DECL]]#1 : !fir.ref<!fir.box<!fir.heap<!fir.array<?x!fir.char<1,?>>>>>
+!CHECK:      %{{.*}} = fir.load %[[C_DECL]]#0 : !fir.ref<!fir.box<!fir.heap<!fir.array<?x!fir.char<1,?>>>>>
 !CHECK:      fir.if %{{.*}} {
 !CHECK:        %[[C_PVT_ALLOC:.*]] = fir.allocmem !fir.array<?x!fir.char<1,?>>(%{{.*}} : index), %{{.*}} {fir.must_be_heap = true, uniq_name = "_QFtest_allocatable_string_arrayEc.alloc"}
 !CHECK:        %[[C_PVT_BOX:.*]] = fir.embox %[[C_PVT_ALLOC]](%{{.*}}) typeparams %{{.*}} : (!fir.heap<!fir.array<?x!fir.char<1,?>>>, !fir.shapeshift<1>, index) -> !fir.box<!fir.heap<!fir.array<?x!fir.char<1,?>>>>
 !CHECK:        fir.store %[[C_PVT_BOX]] to %[[C_PVT_BOX_REF]] : !fir.ref<!fir.box<!fir.heap<!fir.array<?x!fir.char<1,?>>>>>
 !CHECK:      }
 !CHECK:      %[[C_PVT_DECL:.*]]:2 = hlfir.declare %[[C_PVT_BOX_REF]] {fortran_attrs = #fir.var_attrs<allocatable>, uniq_name = "_QFtest_allocatable_string_arrayEc"} : (!fir.ref<!fir.box<!fir.heap<!fir.array<?x!fir.char<1,?>>>>>) -> (!fir.ref<!fir.box<!fir.heap<!fir.array<?x!fir.char<1,?>>>>>, !fir.ref<!fir.box<!fir.heap<!fir.array<?x!fir.char<1,?>>>>>)
-!CHECK:      %{{.*}} = fir.load %[[C_PVT_DECL]]#1 : !fir.ref<!fir.box<!fir.heap<!fir.array<?x!fir.char<1,?>>>>>
+!CHECK:      %{{.*}} = fir.load %[[C_PVT_DECL]]#0 : !fir.ref<!fir.box<!fir.heap<!fir.array<?x!fir.char<1,?>>>>>
 !CHECK:      fir.if %{{.*}} {
-!CHECK:        %[[C_PVT_BOX:.*]] = fir.load %[[C_PVT_DECL]]#1 : !fir.ref<!fir.box<!fir.heap<!fir.array<?x!fir.char<1,?>>>>>
+!CHECK:        %[[C_PVT_BOX:.*]] = fir.load %[[C_PVT_DECL]]#0 : !fir.ref<!fir.box<!fir.heap<!fir.array<?x!fir.char<1,?>>>>>
 !CHECK:        %[[C_PVT_ADDR:.*]] = fir.box_addr %[[C_PVT_BOX]] : (!fir.box<!fir.heap<!fir.array<?x!fir.char<1,?>>>>) -> !fir.heap<!fir.array<?x!fir.char<1,?>>>
 !CHECK:        fir.freemem %[[C_PVT_ADDR]] : !fir.heap<!fir.array<?x!fir.char<1,?>>>
 !CHECK:      }
diff --git a/flang/test/Lower/OpenMP/parallel-private-clause.f90 b/flang/test/Lower/OpenMP/parallel-private-clause.f90
index 3e46d315f8cc47..5578b6710da7cd 100644
--- a/flang/test/Lower/OpenMP/parallel-private-clause.f90
+++ b/flang/test/Lower/OpenMP/parallel-private-clause.f90
@@ -150,8 +150,8 @@ subroutine private_clause_derived_type()
 !FIRDialect-DAG:    %[[X4_PVT:.*]] = fir.alloca !fir.box<!fir.heap<!fir.array<?xi32>>> {bindc_name = "x4", pinned, uniq_name = "{{.*}}Ex4"}
 !FIRDialect-DAG:    %[[X4_PVT_DECL:.*]]:2 = hlfir.declare %[[X4_PVT]] {fortran_attrs = #fir.var_attrs<allocatable>, uniq_name = "{{.*}}Ex4"} : (!fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>) -> (!fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>, !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>)
 
-!FIRDialect-DAG:    %[[TMP58:.*]] = fir.load %[[X4_DECL]]#1 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>
-!FIRDialect-DAG:    %[[TMP97:.*]] = fir.load %[[X4_DECL]]#1 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>
+!FIRDialect-DAG:    %[[TMP58:.*]] = fir.load %[[X4_DECL]]#0 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>
+!FIRDialect-DAG:    %[[TMP97:.*]] = fir.load %[[X4_DECL]]#0 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>
 !FIRDialect-DAG:    %[[TMP98:.*]]:3 = fir.box_dims %[[TMP97]], {{.*}} : (!fir.box<!fir.heap<!fir.array<?xi32>>>, index) -> (index, index, index)
 
 !FIRDialect-DAG:    %[[TMP101:.*]] = fir.allocmem !fir.array<?xi32>, {{.*}} {fir.must_be_heap = true, uniq_name = "{{.*}}Ex4.alloc"}
@@ -192,12 +192,12 @@ subroutine private_clause_allocatable()
 !FIRDialect-DAG: }
 !FIRDialect-DAG: %[[X5_PVT_DECL:.*]]:2 = hlfir.declare %[[X5_PVT]] {fortran_attrs = #fir.var_attrs<allocatable>, uniq_name = "_QFprivate_clause_real_call_allocatableEx5"} : (!fir.ref<!fir.box<!fir.heap<f32>>>) -> (!fir.ref<!fir.box<!fir.heap<f32>>>, !fir.ref<!fir.box<!fir.heap<f32>>>)
 !FIRDialect-DAG: fir.call @_QFprivate_clause_real_call_allocatablePhelper_private_clause_real_call_allocatable(%[[X5_PVT_DECL]]#0) fastmath<contract> : (!fir.ref<!fir.box<!fir.heap<f32>>>) -> ()
-!FIRDialect-DAG: %{{.*}} = fir.load %[[X5_PVT_DECL]]#1 : !fir.ref<!fir.box<!fir.heap<f32>>>
+!FIRDialect-DAG: %{{.*}} = fir.load %[[X5_PVT_DECL]]#0 : !fir.ref<!fir.box<!fir.heap<f32>>>
 
 !FIRDialect-DAG: fir.if %{{.*}} {
-!FIRDialect-DAG:   %{{.*}} = fir.load %[[X5_PVT_DECL]]#1 : !fir.ref<!fir.box<!fir.heap<f32>>>
+!FIRDialect-DAG:   %{{.*}} = fir.load %[[X5_PVT_DECL]]#0 : !fir.ref<!fir.box<!fir.heap<f32>>>
 
-!FIRDialect-DAG:     fir.store %{{.*}} to %[[X5_PVT_DECL]]#1 : !fir.ref<!fir.box<!fir.heap<f32>>>
+!FIRDialect-DAG:     fir.store %{{.*}} to %[[X5_PVT_DECL]]#0 : !fir.ref<!fir.box<!fir.heap<f32>>>
 !FIRDialect-DAG:   }
 !FIRDialect-DAG:   omp.terminator
 !FIRDialect-DAG:   }
@@ -313,12 +313,12 @@ subroutine simple_loop_1
     print*, i
   end do
   ! FIRDialect:     omp.yield
-  ! FIRDialect:     {{%.*}} = fir.load %[[R_DECL]]#1 : !fir.ref<!fir.box<!fir.heap<f32>>>
+  ! FIRDialect:     {{%.*}} = fir.load %[[R_DECL]]#0 : !fir.ref<!fir.box<!fir.heap<f32>>>
   ! FIRDialect:     fir.if {{%.*}} {
-  ! FIRDialect:     [[LD:%.*]] = fir.load %[[R_DECL]]#1 : !fir.ref<!fir.box<!fir.heap<f32>>>
+  ! FIRDialect:     [[LD:%.*]] = fir.load %[[R_DECL]]#0 : !fir.ref<!fir.box<!fir.heap<f32>>>
   ! FIRDialect:     [[AD:%.*]] = fir.box_addr [[LD]] : (!fir.box<!fir.heap<f32>>) -> !fir.heap<f32>
   ! FIRDialect:     fir.freemem [[AD]] : !fir.heap<f32>
-  ! FIRDialect:     fir.store {{%.*}} to %[[R_DECL]]#1 : !fir.ref<!fir.box<!fir.heap<f32>>>
+  ! FIRDialect:     fir.store {{%.*}} to %[[R_DECL]]#0 : !fir.ref<!fir.box<!fir.heap<f32>>>
   !$OMP END DO
   ! FIRDialect:  omp.terminator
   !$OMP END PARALLEL
@@ -351,12 +351,12 @@ subroutine simple_loop_2
     print*, i
   end do
   ! FIRDialect:     omp.yield
-  ! FIRDialect:     {{%.*}} = fir.load %[[R_DECL]]#1 : !fir.ref<!fir.box<!fir.heap<f32>>>
+  ! FIRDialect:     {{%.*}} = fir.load %[[R_DECL]]#0 : !fir.ref<!fir.box<!fir.heap<f32>>>
   ! FIRDialect:     fir.if {{%.*}} {
-  ! FIRDialect:     [[LD:%.*]] = fir.load %[[R_DECL]]#1 : !fir.ref<!fir.box<!fir.heap<f32>>>
+  ! FIRDialect:     [[LD:%.*]] = fir.load %[[R_DECL]]#0 : !fir.ref<!fir.box<!fir.heap<f32>>>
   ! FIRDialect:     [[AD:%.*]] = fir.box_addr [[LD]] : (!fir.box<!fir.heap<f32>>) -> !fir.heap<f32>
   ! FIRDialect:     fir.freemem [[AD]] : !fir.heap<f32>
-  ! FIRDialect:     fir.store {{%.*}} to %[[R_DECL]]#1 : !fir.ref<!fir.box<!fir.heap<f32>>>
+  ! FIRDialect:     fir.store {{%.*}} to %[[R_DECL]]#0 : !fir.ref<!fir.box<!fir.heap<f32>>>
   !$OMP END DO
   ! FIRDialect:  omp.terminator
   !$OMP END PARALLEL
@@ -388,12 +388,12 @@ subroutine simple_loop_3
     print*, i
   end do
   ! FIRDialect:     omp.yield
-  ! FIRDialect:     {{%.*}} = fir.load [[R_DECL]]#1 : !fir.ref<!fir.box<!fir.heap<f32>>>
+  ! FIRDialect:     {{%.*}} = fir.load [[R_DECL]]#0 : !fir.ref<!fir.box<!fir.heap<f32>>>
   ! FIRDialect:     fir.if {{%.*}} {
-  ! FIRDialect:     [[LD:%.*]] = fir.load [[R_DECL]]#1 : !fir.ref<!fir.box<!fir.heap<f32>>>
+  ! FIRDialect:     [[LD:%.*]] = fir.load [[R_DECL]]#0 : !fir.ref<!fir.box<!fir.heap<f32>>>
   ! FIRDialect:     [[AD:%.*]] = fir.box_addr [[LD]] : (!fir.box<!fir.heap<f32>>) -> !fir.heap<f32>
   ! FIRDialect:     fir.freemem [[AD]] : !fir.heap<f32>
-  ! FIRDialect:     fir.store {{%.*}} to [[R_DECL]]#1 : !fir.ref<!fir.box<!fir.heap<f32>>>
+  ! FIRDialect:     fir.store {{%.*}} to [[R_DECL]]#0 : !fir.ref<!fir.box<!fir.heap<f32>>>
   !$OMP END PARALLEL DO
   ! FIRDialect:  omp.terminator
 end subroutine
@@ -421,10 +421,10 @@ subroutine simd_loop_1
   end do
   !$OMP END SIMD
   ! FIRDialect:     omp.yield
-  ! FIRDialect:     {{%.*}} = fir.load [[R_DECL]]#1 : !fir.ref<!fir.box<!fir.heap<f32>>>
+  ! FIRDialect:     {{%.*}} = fir.load [[R_DECL]]#0 : !fir.ref<!fir.box<!fir.heap<f32>>>
   ! FIRDialect:     fir.if {{%.*}} {
-  ! FIRDialect:     [[LD:%.*]] = fir.load [[R_DECL]]#1 : !fir.ref<!fir.box<!fir.heap<f32>>>
+  ! FIRDialect:     [[LD:%.*]] = fir.load [[R_DECL]]#0 : !fir.ref<!fir.box<!fir.heap<f32>>>
   ! FIRDialect:     [[AD:%.*]] = fir.box_addr [[LD]] : (!fir.box<!fir.heap<f32>>) -> !fir.heap<f32>
   ! FIRDialect:     fir.freemem [[AD]] : !fir.heap<f32>
-  ! FIRDialect:     fir.store {{%.*}} to [[R_DECL]]#1 : !fir.ref<!fir.box<!fir.heap<f32>>>
+  ! FIRDialect:     fir.store {{%.*}} to [[R_DECL]]#0 : !fir.ref<!fir.box<!fir.heap<f32>>>
 end subroutine

Copy link

github-actions bot commented Mar 6, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Modifies the privatization logic so that the emitted code only used the
HLFIR base (i.e. SSA value `#0` returned from `hlfir.declare`). Before
that, that emitted privatization logic was a mix of using `#0` and `llvm#1`
which leads to some difficulties trying to move to delayed privatization
(see the discussion on llvm#84033).
@ergawy ergawy changed the title [not-for-merge] Experiment with using alloc's first component [flang][OpenMP] Only HLFIR base in privatization logic Mar 7, 2024
@ergawy
Copy link
Member Author

ergawy commented Mar 7, 2024

@kiranchandramohan @jeanPerier this is the most local solution I could find to apply what we discussed to privatization logic.

I think modifying getSymbolAddress as Kiran suggested at first might not work since:

  • In privatization logic, we need the SymbolBox and not just the address.
  • getSymbolAddress is not used by the privatization logic.

Hopefully this is a step in good direction.

Also, I am happy to help in general with cleaning up references to #1 in general (not just privatization). Let me know if you are interested, so that we can align on this somehow.

@jeanPerier
Copy link
Contributor

Your solution looks acceptable to me.

In the future, the clone creation could be based on HLFIR helpers (like hlfir::createTempFromMold) using #0 instead of going to the fir::ExtendedValue level and having custom code in the bridge. But these helpers may not do exactly what is needed for OpenMP currently (e.g, they will use allocmem for arrays, while the clone code creates allocas).

Also, I am happy to help in general with cleaning up references to #1 in general (not just privatization). Let me know if you are interested, so that we can align on this somehow.

In the case where #0 and #1 are both fir.box/fir.class, this could make sense given the commit that anyway "merged" them in HLFIR codegeneration. However, when #1 is not a fir.box, replacing it by #0 would most likely pessimize the generated code because FIR codegen for operations using fir.box is usually much more expensive/harder to optimize (by default, FIR codegen will assumed addressing of fir.box is not contiguous), so this would pessimize codes like this when going from HLFIR down to FIR (and later LLVM):

subroutine foo(x, y)
  real :: x(0:99), y(0:99)
  x = y
end subroutine

from the following FIR (-emit-fir of the loop body):

    fir.do_loop %arg2 = %c1 to %c100 step %c1 unordered {
      %5 = arith.addi %arg2, %c-1 : index
      %6 = fir.array_coor %3(%0) %5 : (!fir.ref<!fir.array<100xf32>>, !fir.shapeshift<1>, index) -> !fir.ref<f32>
      %7 = fir.load %6 : !fir.ref<f32>
      %8 = arith.addi %arg2, %c-1 : index
      %9 = fir.array_coor %1(%0) %8 : (!fir.ref<!fir.array<100xf32>>, !fir.shapeshift<1>, index) -> !fir.ref<f32>
      fir.store %7 to %9 : !fir.ref<f32>
    }

to

    fir.do_loop %arg2 = %c1 to %c100 step %c1 unordered {
      %5 = arith.addi %arg2, %c-1 : index
      %6 = fir.array_coor %4(%0) %5 : (!fir.box<!fir.array<100xf32>>, !fir.shapeshift<1>, index) -> !fir.ref<f32>
      %7 = fir.load %6 : !fir.ref<f32>
      %8 = arith.addi %arg2, %c-1 : index
      %9 = fir.array_coor %2(%0) %8 : (!fir.box<!fir.array<100xf32>>, !fir.shapeshift<1>, index) -> !fir.ref<f32>
      fir.store %7 to %9 : !fir.ref<f32>
    }

Because hlfir::translateToExtendedValue is used in the HLFIR to FIR pass to get rid of fir.box usages as much as possible, and use simple FIR operations as much as possible.

To be fair, LLVM with optimizations enabled does a good job at removing the temporary descriptors and replacing the byte stride addressing in the simple cases where it can see the descriptor creation. But I would not jump on changing this without a great confidence that increasing the descriptor usages in FIR operations (as opposed to HLFIR) won't be a problem for more complex optimizations and programs.

@ergawy ergawy changed the title [flang][OpenMP] Only HLFIR base in privatization logic [flang][OpenMP] Only use HLFIR base in privatization logic Mar 11, 2024
@ergawy
Copy link
Member Author

ergawy commented Mar 11, 2024

Thanks for the approval and for explaining some of the intricacies in this.

@ergawy ergawy merged commit 3b30559 into llvm:main Mar 11, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flang:fir-hlfir flang:openmp flang Flang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants