[flang][OpenMP] Support reduction of allocatable variables #88392

tblah · 2024-04-11T13:59:56Z

Both arrays and trivial scalars are supported. Both cases must use by-ref reductions because both are boxed.

My understanding of the standards are that OpenMP says that this should follow the rules of the intrinsic reduction operators in fortran, and fortran says that unallocated allocatable variables can only be referenced to allocate them or test if they are already allocated. Therefore we do not need a null pointer check in the combiner region.

Both arrays and trivial scalars are supported.

llvmbot · 2024-04-11T14:00:31Z

@llvm/pr-subscribers-flang-fir-hlfir

@llvm/pr-subscribers-flang-openmp

Author: Tom Eccles (tblah)

Changes

Both arrays and trivial scalars are supported. Both cases must use by-ref reductions because both are boxed.

My understanding of the standards are that OpenMP says that this should follow the rules of the intrinsic reduction operators in fortran, and fortran says that unallocated allocatable variables can only be referenced to allocate them or test if they are already allocated. Therefore we do not need a null pointer check in the combiner region.

Patch is 22.26 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/88392.diff

4 Files Affected:

(modified) flang/lib/Lower/OpenMP/ReductionProcessor.cpp (+72-14)
(removed) flang/test/Lower/OpenMP/Todo/reduction-allocatable.f90 (-21)
(added) flang/test/Lower/OpenMP/parallel-reduction-allocatable-array.f90 (+104)
(added) flang/test/Lower/OpenMP/wsloop-reduction-allocatable.f90 (+84)

diff --git a/flang/lib/Lower/OpenMP/ReductionProcessor.cpp b/flang/lib/Lower/OpenMP/ReductionProcessor.cpp
index 918edf27baf66c..736dca4ed13599 100644
--- a/flang/lib/Lower/OpenMP/ReductionProcessor.cpp
+++ b/flang/lib/Lower/OpenMP/ReductionProcessor.cpp
@@ -301,10 +301,11 @@ static void genBoxCombiner(fir::FirOpBuilder &builder, mlir::Location loc,
                            ReductionProcessor::ReductionIdentifier redId,
                            fir::BaseBoxType boxTy, mlir::Value lhs,
                            mlir::Value rhs) {
-  fir::SequenceType seqTy =
-      mlir::dyn_cast_or_null<fir::SequenceType>(boxTy.getEleTy());
-  // TODO: support allocatable arrays: !fir.box<!fir.heap<!fir.array<...>>>
-  if (!seqTy || seqTy.hasUnknownShape())
+  fir::SequenceType seqTy = mlir::dyn_cast_or_null<fir::SequenceType>(
+      fir::unwrapRefType(boxTy.getEleTy()));
+  fir::HeapType heapTy =
+      mlir::dyn_cast_or_null<fir::HeapType>(boxTy.getEleTy());
+  if ((!seqTy || seqTy.hasUnknownShape()) && !heapTy)
     TODO(loc, "Unsupported boxed type in OpenMP reduction");
 
   // load fir.ref<fir.box<...>>
@@ -312,6 +313,23 @@ static void genBoxCombiner(fir::FirOpBuilder &builder, mlir::Location loc,
   lhs = builder.create<fir::LoadOp>(loc, lhs);
   rhs = builder.create<fir::LoadOp>(loc, rhs);
 
+  if (heapTy && !seqTy) {
+    // get box contents (heap pointers)
+    lhs = builder.create<fir::BoxAddrOp>(loc, lhs);
+    rhs = builder.create<fir::BoxAddrOp>(loc, rhs);
+    mlir::Value lhsValAddr = lhs;
+
+    // load heap pointers
+    lhs = builder.create<fir::LoadOp>(loc, lhs);
+    rhs = builder.create<fir::LoadOp>(loc, rhs);
+
+    mlir::Value result = ReductionProcessor::createScalarCombiner(
+        builder, loc, redId, heapTy.getEleTy(), lhs, rhs);
+    builder.create<fir::StoreOp>(loc, result, lhsValAddr);
+    builder.create<mlir::omp::YieldOp>(loc, lhsAddr);
+    return;
+  }
+
   const unsigned rank = seqTy.getDimension();
   llvm::SmallVector<mlir::Value> extents;
   extents.reserve(rank);
@@ -338,6 +356,10 @@ static void genBoxCombiner(fir::FirOpBuilder &builder, mlir::Location loc,
 
   // Iterate over array elements, applying the equivalent scalar reduction:
 
+  // F2018 5.4.10.2: Unallocated allocatable variables may not be referenced
+  // and so no null check is needed here before indexing into the (possibly
+  // allocatable) arrays.
+
   // A hlfir::elemental here gets inlined with a temporary so create the
   // loop nest directly.
   // This function already controls all of the code in this region so we
@@ -412,9 +434,11 @@ createReductionCleanupRegion(fir::FirOpBuilder &builder, mlir::Location loc,
 
   mlir::Type valTy = fir::unwrapRefType(redTy);
   if (auto boxTy = mlir::dyn_cast_or_null<fir::BaseBoxType>(valTy)) {
-    mlir::Type innerTy = fir::extractSequenceType(boxTy);
-    if (!mlir::isa<fir::SequenceType>(innerTy))
-      typeError();
+    if (!mlir::isa<fir::HeapType>(boxTy.getEleTy())) {
+      mlir::Type innerTy = fir::extractSequenceType(boxTy);
+      if (!mlir::isa<fir::SequenceType>(innerTy))
+        typeError();
+    }
 
     mlir::Value arg = block->getArgument(0);
     arg = builder.loadIfRef(loc, arg);
@@ -443,6 +467,19 @@ createReductionCleanupRegion(fir::FirOpBuilder &builder, mlir::Location loc,
   typeError();
 }
 
+// like fir::unwrapSeqOrBoxedSeqType except it also works for non-sequence boxes
+static mlir::Type unwrapSeqOrBoxedType(mlir::Type ty) {
+  if (auto seqTy = ty.dyn_cast<fir::SequenceType>())
+    return seqTy.getEleTy();
+  if (auto boxTy = ty.dyn_cast<fir::BaseBoxType>()) {
+    auto eleTy = fir::unwrapRefType(boxTy.getEleTy());
+    if (auto seqTy = eleTy.dyn_cast<fir::SequenceType>())
+      return seqTy.getEleTy();
+    return eleTy;
+  }
+  return ty;
+}
+
 static mlir::Value
 createReductionInitRegion(fir::FirOpBuilder &builder, mlir::Location loc,
                           mlir::omp::DeclareReductionOp &reductionDecl,
@@ -450,7 +487,7 @@ createReductionInitRegion(fir::FirOpBuilder &builder, mlir::Location loc,
                           mlir::Type type, bool isByRef) {
   mlir::Type ty = fir::unwrapRefType(type);
   mlir::Value initValue = ReductionProcessor::getReductionInitValue(
-      loc, fir::unwrapSeqOrBoxedSeqType(ty), redId, builder);
+      loc, unwrapSeqOrBoxedType(ty), redId, builder);
 
   if (fir::isa_trivial(ty)) {
     if (isByRef) {
@@ -464,9 +501,24 @@ createReductionInitRegion(fir::FirOpBuilder &builder, mlir::Location loc,
 
   // all arrays are boxed
   if (auto boxTy = mlir::dyn_cast_or_null<fir::BaseBoxType>(ty)) {
-    assert(isByRef && "passing arrays by value is unsupported");
-    // TODO: support allocatable arrays: !fir.box<!fir.heap<!fir.array<...>>>
-    mlir::Type innerTy = fir::extractSequenceType(boxTy);
+    assert(isByRef && "passing boxes by value is unsupported");
+    mlir::Type innerTy = fir::unwrapRefType(boxTy.getEleTy());
+    if (fir::isa_trivial(innerTy)) {
+      // boxed non-sequence value e.g. !fir.box<!fir.heap<i32>>
+      if (!mlir::isa<fir::HeapType>(boxTy.getEleTy()))
+        TODO(loc, "Reduction of non-allocatable trivial typed box");
+      mlir::Value boxAlloca = builder.create<fir::AllocaOp>(loc, ty);
+      mlir::Value valAlloc = builder.create<fir::AllocMemOp>(loc, innerTy);
+      builder.createStoreWithConvert(loc, initValue, valAlloc);
+      mlir::Value box = builder.create<fir::EmboxOp>(loc, ty, valAlloc);
+      builder.create<fir::StoreOp>(loc, box, boxAlloca);
+
+      auto insPt = builder.saveInsertionPoint();
+      createReductionCleanupRegion(builder, loc, reductionDecl);
+      builder.restoreInsertionPoint(insPt);
+      return boxAlloca;
+    }
+    innerTy = fir::extractSequenceType(boxTy);
     if (!mlir::isa<fir::SequenceType>(innerTy))
       TODO(loc, "Unsupported boxed type for reduction");
     // Create the private copy from the initial fir.box:
@@ -478,9 +530,10 @@ createReductionInitRegion(fir::FirOpBuilder &builder, mlir::Location loc,
     // work by inserting stacksave/stackrestore around the reduction in
     // openmpirbuilder
     auto [temp, needsDealloc] = createTempFromMold(loc, builder, source);
-    // if needsDealloc isn't statically false, add cleanup region. TODO: always
+    // if needsDealloc isn't statically false, add cleanup region. Always
     // do this for allocatable boxes because they might have been re-allocated
     // in the body of the loop/parallel region
+
     std::optional<int64_t> cstNeedsDealloc =
         fir::getIntIfConstant(needsDealloc);
     assert(cstNeedsDealloc.has_value() &&
@@ -489,13 +542,18 @@ createReductionInitRegion(fir::FirOpBuilder &builder, mlir::Location loc,
       auto insPt = builder.saveInsertionPoint();
       createReductionCleanupRegion(builder, loc, reductionDecl);
       builder.restoreInsertionPoint(insPt);
+    } else {
+      assert(!mlir::isa<fir::HeapType>(boxTy.getEleTy()) &&
+             "Allocatable arrays must be heap allocated");
     }
 
     // Put the temporary inside of a box:
     hlfir::Entity box = hlfir::genVariableBox(loc, builder, temp);
-    builder.create<hlfir::AssignOp>(loc, initValue, box);
+    // hlfir::genVariableBox removes fir.heap<> around the element type
+    mlir::Value convertedBox = builder.createConvert(loc, ty, box.getBase());
+    builder.create<hlfir::AssignOp>(loc, initValue, convertedBox);
     mlir::Value boxAlloca = builder.create<fir::AllocaOp>(loc, ty);
-    builder.create<fir::StoreOp>(loc, box, boxAlloca);
+    builder.create<fir::StoreOp>(loc, convertedBox, boxAlloca);
     return boxAlloca;
   }
 
diff --git a/flang/test/Lower/OpenMP/Todo/reduction-allocatable.f90 b/flang/test/Lower/OpenMP/Todo/reduction-allocatable.f90
deleted file mode 100644
index 09aba6920232aa..00000000000000
--- a/flang/test/Lower/OpenMP/Todo/reduction-allocatable.f90
+++ /dev/null
@@ -1,21 +0,0 @@
-! RUN: %not_todo_cmd bbc -emit-fir -fopenmp -o - %s 2>&1 | FileCheck %s
-! RUN: %not_todo_cmd %flang_fc1 -emit-fir -fopenmp -o - %s 2>&1 | FileCheck %s
-
-! CHECK: not yet implemented: Reduction of some types is not supported
-subroutine reduction_allocatable
-  integer, allocatable :: x
-  integer :: i = 1
-
-  allocate(x)
-  x = 0
-
-  !$omp parallel num_threads(4)
-  !$omp do reduction(+:x)
-  do i = 1, 10
-    x = x + i
-  enddo
-  !$omp end do
-  !$omp end parallel
-
-  print *, x
-end subroutine
diff --git a/flang/test/Lower/OpenMP/parallel-reduction-allocatable-array.f90 b/flang/test/Lower/OpenMP/parallel-reduction-allocatable-array.f90
new file mode 100644
index 00000000000000..d1c9a26c595dce
--- /dev/null
+++ b/flang/test/Lower/OpenMP/parallel-reduction-allocatable-array.f90
@@ -0,0 +1,104 @@
+! RUN: bbc -emit-hlfir -fopenmp -o - %s | FileCheck %s
+! RUN: %flang_fc1 -emit-hlfir -fopenmp -o - %s | FileCheck %s
+
+program reduce
+integer :: i = 0
+integer, dimension(:), allocatable :: r
+
+allocate(r(2))
+
+!$omp parallel do reduction(+:r)
+do i=0,10
+  r(1) = i
+  r(2) = -i
+enddo
+!$omp end parallel do
+
+print *,r
+
+end program
+
+! CHECK-LABEL:   omp.declare_reduction @add_reduction_byref_box_heap_Uxi32 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>> init {
+! CHECK:         ^bb0(%[[VAL_0:.*]]: !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>):
+! CHECK:           %[[VAL_1:.*]] = arith.constant 0 : i32
+! CHECK:           %[[VAL_2:.*]] = fir.load %[[VAL_0]] : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>
+! CHECK:           %[[VAL_3:.*]] = arith.constant 0 : index
+! CHECK:           %[[VAL_4:.*]]:3 = fir.box_dims %[[VAL_2]], %[[VAL_3]] : (!fir.box<!fir.heap<!fir.array<?xi32>>>, index) -> (index, index, index)
+! CHECK:           %[[VAL_5:.*]] = fir.shape %[[VAL_4]]#1 : (index) -> !fir.shape<1>
+! CHECK:           %[[VAL_6:.*]] = fir.allocmem !fir.array<?xi32>, %[[VAL_4]]#1 {bindc_name = ".tmp", uniq_name = ""}
+! CHECK:           %[[VAL_7:.*]] = arith.constant true
+! CHECK:           %[[VAL_8:.*]]:2 = hlfir.declare %[[VAL_6]](%[[VAL_5]]) {uniq_name = ".tmp"} : (!fir.heap<!fir.array<?xi32>>, !fir.shape<1>) -> (!fir.box<!fir.array<?xi32>>, !fir.heap<!fir.array<?xi32>>)
+! CHECK:           %[[VAL_9:.*]] = fir.convert %[[VAL_8]]#0 : (!fir.box<!fir.array<?xi32>>) -> !fir.box<!fir.heap<!fir.array<?xi32>>>
+! CHECK:           hlfir.assign %[[VAL_1]] to %[[VAL_9]] : i32, !fir.box<!fir.heap<!fir.array<?xi32>>>
+! CHECK:           %[[VAL_10:.*]] = fir.alloca !fir.box<!fir.heap<!fir.array<?xi32>>>
+! CHECK:           fir.store %[[VAL_9]] to %[[VAL_10]] : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>
+! CHECK:           omp.yield(%[[VAL_10]] : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>)
+! CHECK:         } combiner {
+! CHECK:         ^bb0(%[[VAL_0:.*]]: !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>, %[[VAL_1:.*]]: !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>):
+! CHECK:           %[[VAL_2:.*]] = fir.load %[[VAL_0]] : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>
+! CHECK:           %[[VAL_3:.*]] = fir.load %[[VAL_1]] : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>
+! CHECK:           %[[VAL_4:.*]] = arith.constant 0 : index
+! CHECK:           %[[VAL_5:.*]]:3 = fir.box_dims %[[VAL_2]], %[[VAL_4]] : (!fir.box<!fir.heap<!fir.array<?xi32>>>, index) -> (index, index, index)
+! CHECK:           %[[VAL_6:.*]] = fir.shape_shift %[[VAL_5]]#0, %[[VAL_5]]#1 : (index, index) -> !fir.shapeshift<1>
+! CHECK:           %[[VAL_7:.*]] = arith.constant 1 : index
+! CHECK:           fir.do_loop %[[VAL_8:.*]] = %[[VAL_7]] to %[[VAL_5]]#1 step %[[VAL_7]] unordered {
+! CHECK:             %[[VAL_9:.*]] = fir.array_coor %[[VAL_2]](%[[VAL_6]]) %[[VAL_8]] : (!fir.box<!fir.heap<!fir.array<?xi32>>>, !fir.shapeshift<1>, index) -> !fir.ref<i32>
+! CHECK:             %[[VAL_10:.*]] = fir.array_coor %[[VAL_3]](%[[VAL_6]]) %[[VAL_8]] : (!fir.box<!fir.heap<!fir.array<?xi32>>>, !fir.shapeshift<1>, index) -> !fir.ref<i32>
+! CHECK:             %[[VAL_11:.*]] = fir.load %[[VAL_9]] : !fir.ref<i32>
+! CHECK:             %[[VAL_12:.*]] = fir.load %[[VAL_10]] : !fir.ref<i32>
+! CHECK:             %[[VAL_13:.*]] = arith.addi %[[VAL_11]], %[[VAL_12]] : i32
+! CHECK:             fir.store %[[VAL_13]] to %[[VAL_9]] : !fir.ref<i32>
+! CHECK:           }
+! CHECK:           omp.yield(%[[VAL_0]] : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>)
+! CHECK:         }  cleanup {
+! CHECK:         ^bb0(%[[VAL_0:.*]]: !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>):
+! CHECK:           %[[VAL_1:.*]] = fir.load %[[VAL_0]] : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>
+! CHECK:           %[[VAL_2:.*]] = fir.box_addr %[[VAL_1]] : (!fir.box<!fir.heap<!fir.array<?xi32>>>) -> !fir.heap<!fir.array<?xi32>>
+! CHECK:           %[[VAL_3:.*]] = fir.convert %[[VAL_2]] : (!fir.heap<!fir.array<?xi32>>) -> i64
+! CHECK:           %[[VAL_4:.*]] = arith.constant 0 : i64
+! CHECK:           %[[VAL_5:.*]] = arith.cmpi ne, %[[VAL_3]], %[[VAL_4]] : i64
+! CHECK:           fir.if %[[VAL_5]] {
+! CHECK:             fir.freemem %[[VAL_2]] : !fir.heap<!fir.array<?xi32>>
+! CHECK:           }
+! CHECK:           omp.yield
+! CHECK:         }
+
+! CHECK-LABEL:   func.func @_QQmain() attributes {fir.bindc_name = "reduce"} {
+! CHECK:           %[[VAL_0:.*]] = fir.address_of(@_QFEi) : !fir.ref<i32>
+! CHECK:           %[[VAL_1:.*]]:2 = hlfir.declare %[[VAL_0]] {uniq_name = "_QFEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
+! CHECK:           %[[VAL_2:.*]] = fir.address_of(@_QFEr) : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>
+! CHECK:           %[[VAL_3:.*]]:2 = hlfir.declare %[[VAL_2]] {fortran_attrs = {{.*}}<allocatable>, uniq_name = "_QFEr"} : (!fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>) -> (!fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>, !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>)
+! CHECK:           %[[VAL_4:.*]] = arith.constant 2 : i32
+! CHECK:           %[[VAL_5:.*]] = fir.convert %[[VAL_4]] : (i32) -> index
+! CHECK:           %[[VAL_6:.*]] = arith.constant 0 : index
+! CHECK:           %[[VAL_7:.*]] = arith.cmpi sgt, %[[VAL_5]], %[[VAL_6]] : index
+! CHECK:           %[[VAL_8:.*]] = arith.select %[[VAL_7]], %[[VAL_5]], %[[VAL_6]] : index
+! CHECK:           %[[VAL_9:.*]] = fir.allocmem !fir.array<?xi32>, %[[VAL_8]] {fir.must_be_heap = true, uniq_name = "_QFEr.alloc"}
+! CHECK:           %[[VAL_10:.*]] = fir.shape %[[VAL_8]] : (index) -> !fir.shape<1>
+! CHECK:           %[[VAL_11:.*]] = fir.embox %[[VAL_9]](%[[VAL_10]]) : (!fir.heap<!fir.array<?xi32>>, !fir.shape<1>) -> !fir.box<!fir.heap<!fir.array<?xi32>>>
+! CHECK:           fir.store %[[VAL_11]] to %[[VAL_3]]#1 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>
+! CHECK:           omp.parallel {
+! CHECK:             %[[VAL_12:.*]] = fir.alloca i32 {adapt.valuebyref, pinned}
+! CHECK:             %[[VAL_13:.*]]:2 = hlfir.declare %[[VAL_12]] {uniq_name = "_QFEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
+! CHECK:             %[[VAL_14:.*]] = arith.constant 0 : i32
+! CHECK:             %[[VAL_15:.*]] = arith.constant 10 : i32
+! CHECK:             %[[VAL_16:.*]] = arith.constant 1 : i32
+! CHECK:             omp.wsloop byref reduction(@add_reduction_byref_box_heap_Uxi32 %[[VAL_3]]#0 -> %[[VAL_17:.*]] : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>)  for  (%[[VAL_18:.*]]) : i32 = (%[[VAL_14]]) to (%[[VAL_15]]) inclusive step (%[[VAL_16]]) {
+! CHECK:               fir.store %[[VAL_18]] to %[[VAL_13]]#1 : !fir.ref<i32>
+! CHECK:               %[[VAL_19:.*]]:2 = hlfir.declare %[[VAL_17]] {fortran_attrs = {{.*}}<allocatable>, uniq_name = "_QFEr"} : (!fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>) -> (!fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>, !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>)
+! CHECK:               %[[VAL_20:.*]] = fir.load %[[VAL_13]]#0 : !fir.ref<i32>
+! CHECK:               %[[VAL_21:.*]] = fir.load %[[VAL_19]]#0 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>
+! CHECK:               %[[VAL_22:.*]] = arith.constant 1 : index
+! CHECK:               %[[VAL_23:.*]] = hlfir.designate %[[VAL_21]] (%[[VAL_22]])  : (!fir.box<!fir.heap<!fir.array<?xi32>>>, index) -> !fir.ref<i32>
+! CHECK:               hlfir.assign %[[VAL_20]] to %[[VAL_23]] : i32, !fir.ref<i32>
+! CHECK:               %[[VAL_24:.*]] = fir.load %[[VAL_13]]#0 : !fir.ref<i32>
+! CHECK:               %[[VAL_25:.*]] = arith.constant 0 : i32
+! CHECK:               %[[VAL_26:.*]] = arith.subi %[[VAL_25]], %[[VAL_24]] : i32
+! CHECK:               %[[VAL_27:.*]] = fir.load %[[VAL_19]]#0 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>
+! CHECK:               %[[VAL_28:.*]] = arith.constant 2 : index
+! CHECK:               %[[VAL_29:.*]] = hlfir.designate %[[VAL_27]] (%[[VAL_28]])  : (!fir.box<!fir.heap<!fir.array<?xi32>>>, index) -> !fir.ref<i32>
+! CHECK:               hlfir.assign %[[VAL_26]] to %[[VAL_29]] : i32, !fir.ref<i32>
+! CHECK:               omp.yield
+! CHECK:             }
+! CHECK:             omp.terminator
+! CHECK:           }
diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-allocatable.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-allocatable.f90
new file mode 100644
index 00000000000000..41f518818727b8
--- /dev/null
+++ b/flang/test/Lower/OpenMP/wsloop-reduction-allocatable.f90
@@ -0,0 +1,84 @@
+! RUN: bbc -emit-hlfir -fopenmp -o - %s | FileCheck %s
+! RUN: %flang_fc1 -emit-hlfir -fopenmp -o - %s | FileCheck %s
+
+program reduce
+integer :: i = 0
+integer, allocatable :: r
+
+allocate(r)
+r = 0
+
+!$omp parallel do reduction(+:r)
+do i=0,10
+  r = i
+enddo
+!$omp end parallel do
+
+print *,r
+
+end program
+
+! CHECK:         omp.declare_reduction @add_reduction_byref_box_heap_i32 : !fir.ref<!fir.box<!fir.heap<i32>>> init {
+! CHECK:         ^bb0(%[[VAL_0:.*]]: !fir.ref<!fir.box<!fir.heap<i32>>>):
+! CHECK:           %[[VAL_1:.*]] = arith.constant 0 : i32
+! CHECK:           %[[VAL_2:.*]] = fir.alloca !fir.box<!fir.heap<i32>>
+! CHECK:           %[[VAL_3:.*]] = fir.allocmem i32
+! CHECK:           fir.store %[[VAL_1]] to %[[VAL_3]] : !fir.heap<i32>
+! CHECK:           %[[VAL_4:.*]] = fir.embox %[[VAL_3]] : (!fir.heap<i32>) -> !fir.box<!fir.heap<i32>>
+! CHECK:           fir.store %[[VAL_4]] to %[[VAL_2]] : !fir.ref<!fir.box<!fir.heap<i32>>>
+! CHECK:           omp.yield(%[[VAL_2]] : !fir.ref<!fir.box<!fir.heap<i32>>>)
+! CHECK:         } combiner {
+! CHECK:         ^bb0(%[[VAL_0:.*]]: !fir.ref<!fir.box<!fir.heap<i32>>>, %[[VAL_1:.*]]: !fir.ref<!fir.box<!fir.heap<i32>>>):
+! CHECK:           %[[VAL_2:.*]] = fir.load %[[VAL_0]] : !fir.ref<!fir.box<!fir.heap<i32>>>
+! CHECK:           %[[VAL_3:.*]] = fir.load %[[VAL_1]] : !fir.ref<!fir.box<!fir.heap<i32>>>
+! CHECK:           %[[VAL_4:.*]] = fir.box_addr %[[VAL_2]] : (!fir.box<!fir.heap<i32>>) -> !fir.heap<i32>
+! CHECK:           %[[VAL_5:.*]] = fir.box_addr %[[VAL_3]] : (!fir.box<!fir.heap<i32>>) -> !fir.heap<i32>
+! CHECK:           %[[VAL_6:.*]] = fir.load %[[VAL_4]] : !fir.heap<i32>
+! CHECK:           %[[VAL_7:.*]] = fir.load %[[VAL_5]] : !fir.heap<i32>
+! CHECK:           %[[VAL_8:.*]] = arith.addi %[[VAL_6]], %[[VAL_7]] : i32
+! CHECK:           fir.store %[[VAL_8]] to %[[VAL_4]] : !fir.heap<i32>
+! CHECK:           omp.yield(%[[VAL_0]] : !fir.ref<!fir.box<!fir.heap<i32>>>)
+! CHECK:         }  cleanup {
+! CHECK:         ^bb0(%[[VAL_0:.*]]: !fir.ref<!fir.box<!fir.heap<i32>>>):
+! CHECK:           %[[VAL_1:.*]] = fir.load %[[VAL_0]] : !fir.ref<!fir.box<!fir.heap<i32>>>
+! CHECK:           %[[VAL_2:.*]] = fir.box_addr %[[VAL_1]] : (!fir.box<!fir.heap<i32>>) -> !fir.heap<i32>
+! CHECK:           %[[VAL_3:.*]] = fir.convert %[[VAL_2]] : (!fir.heap<i32>) -> i64
+! CHECK:           %[[VAL_4:.*]] = arith.constant 0 : i64
+! CHECK:           %[[VAL_5:.*]] = arith.cmpi ne, %[[VAL_3]], %[[VAL_4]] : i64
+! CHECK:           fir.if %[[VAL_5]] {
+! CHECK:             fir.freemem %[[VAL_2]] : !fir.heap<i32>
+! CHECK:           }
+! CHECK:           omp.yield
+! CHECK:         }
+
+! CHECK-LABEL:   func.func @_QQmain() attributes {fir.bindc_name = "reduce"} {
+! CHECK:           %[[VAL_0:.*]] = fir.address_of(@_QFEi) : !fir.ref<i32>
+! CHECK:           %[[VAL_1:.*]]:2 = hlfir.declare %[[VAL_0]] {uniq_name = "_QFEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
+! CHECK:           %[[VAL_2:.*]] = fir.alloca !fir.box<!fir.heap<i32>> {bindc_name = "r", uniq_name = "_QFEr"}
+! CHECK:           %[[VAL_3:.*]] = fir.zero_bits !fir.heap<i32>
+! CHECK:           %[[VAL_4:.*]] = fir.embox %[[VAL_3]] : (!fir.heap<i32>) -> !fir.box<!fir.heap<i...
[truncated]

flang/lib/Lower/OpenMP/ReductionProcessor.cpp

kiranchandramohan · 2024-04-15T11:13:14Z

flang/lib/Lower/OpenMP/ReductionProcessor.cpp

+    if (fir::isa_trivial(innerTy)) {
+      // boxed non-sequence value e.g. !fir.box<!fir.heap<i32>>
+      if (!mlir::isa<fir::HeapType>(boxTy.getEleTy()))
+        TODO(loc, "Reduction of non-allocatable trivial typed box");
+      mlir::Value boxAlloca = builder.create<fir::AllocaOp>(loc, ty);
+      mlir::Value valAlloc = builder.create<fir::AllocMemOp>(loc, innerTy);
+      builder.createStoreWithConvert(loc, initValue, valAlloc);
+      mlir::Value box = builder.create<fir::EmboxOp>(loc, ty, valAlloc);
+      builder.create<fir::StoreOp>(loc, box, boxAlloca);
+
+      auto insPt = builder.saveInsertionPoint();
+      createReductionCleanupRegion(builder, loc, reductionDecl);
+      builder.restoreInsertionPoint(insPt);
+      return boxAlloca;
+    }


For creating private copies the rules of allocatables are as follows. Do we have a check for the allocation status? Are these rules redefined in the standard for the reduction case?

For a list item or the subobject of a list item with the ALLOCATABLE attribute: -> If the allocation status is unallocated, the new list item or the subobject of the new list item will have an initial allocation status of unallocated; -> If the allocation status is allocated, the new list item or the subobject of the new list item will have an initial allocation status of allocated; and -> If the new list item or the subobject of the new list item is an array, its bounds will be the same as those of the original list item or the subobject of the original list item.

https://www.openmp.org/spec-html/5.0/openmpsu105.html

Thanks for pointing this out. I think I have now covered these with the addition of a7bf198

ye-luo · 2024-04-16T17:09:10Z

program main
  implicit none

  integer:: i
  REAL(8) :: EtsvdW_period
  REAL(8), DIMENSION(:,:), ALLOCATABLE :: FtsvdW_period,HtsvdW_period
  ALLOCATE(HtsvdW_period(3,3))

  !$omp parallel do reduction(+:EtsvdW_period) reduction(+:HtsvdW_period)
  do i=1,10
  enddo

endprogram

got ICE with flang-new -fopenmp main.f90

$ flang-new -fopenmp main.f90
flang-new: /home/yeluo/opt/llvm-clang/llvm-project/llvm/lib/IR/Instructions.cpp:3292: static BinaryOperator *llvm::BinaryOperator::Create(BinaryOps, Value *, Value *, const Twine &, Instruction *): Assertion `S1->getType() == S2->getType() && "Cannot create binary operator with two operands of differing type!"' failed.

If I fuse two reductions.

!$omp parallel do reduction(+:EtsvdW_period,HtsvdW_period)

the code compiles.

ye-luo · 2024-04-16T17:18:25Z

OK the issue doesn't seem directly related. reproducible on main #88935
After working around the syntax parsing bug, Quantum ESPRESSO compiles although I don't have a test case to hit the actually code path.

tblah · 2024-04-17T13:09:22Z

OK the issue doesn't seem directly related. reproducible on main #88935 After working around the syntax parsing bug, Quantum ESPRESSO compiles although I don't have a test case to hit the actually code path.

Thanks for reporting the issue and trying the patch!

tblah · 2024-04-18T15:00:31Z

There is a crash trying to find the right block to place an alloca during FIR lowering after the most recent changes. I will change the MR to WIP until this is fixed.

The change that led to this bug being exposed is that we now generate allocas nested inside of if statements instead of only in the immediate body of the reduction declare operation.

tblah · 2024-04-21T16:15:15Z

The bug should be fixed with 4dc7bd6. This is ready for review again

…e-array-reduction

kiranchandramohan

LGTM. Nice work Tom.

There was some effort to centralize code for deallocations.
#67003
There are also functions like genIsAllocatedOrAssociatedTest that could be helpful.

tblah · 2024-04-22T16:22:12Z

LGTM. Nice work Tom.

There was some effort to centralize code for deallocations. #67003 There are also functions like genIsAllocatedOrAssociatedTest that could be helpful.

Thanks for the recommendation. Using these APIs would require wrapping everything in a fir::MutableBoxValue. I didn't like that approach for a few reasons:

I feel it unnecessarily obfuscates what is being done in this case. I already use standard helpers to determine if the pointer is null. E.g. all that genIsAllocatedOrAssociated does is get the address and use the same helper to see if it is null.
It will make the code structure here more complex because only the allocatable boxes can be a MutableBoxValue
I want control of the generated fir::IfOp

A lot of the added complexity in the MutableBox code is for running finalizers etc. We only support trivial types here.

kiranchandramohan · 2024-04-22T16:26:03Z

LGTM. Nice work Tom.
There was some effort to centralize code for deallocations. #67003 There are also functions like genIsAllocatedOrAssociatedTest that could be helpful.

Thanks for the recommendation. Using these APIs would require wrapping everything in a fir::MutableBoxValue. I didn't like that approach for a few reasons:

I feel it unnecessarily obfuscates what is being done in this case. I already use standard helpers to determine if the pointer is null. E.g. all that genIsAllocatedOrAssociated does is get the address and use the same helper to see if it is null.

It will make the code structure here more complex because only the allocatable boxes can be a MutableBoxValue

I want control of the generated fir::IfOp

A lot of the added complexity in the MutableBox code is for running finalizers etc. We only support trivial types here.

OK.

BTW, do we have a TODO for types with finalizers?

tblah · 2024-04-22T16:28:30Z

BTW, do we have a TODO for types with finalizers?

Not a special one. I think it would hit the generic unsupported type in a reduction TODO.

[flang][OpenMP] Support reduction of allocatable variables

f55f251

Both arrays and trivial scalars are supported.

tblah requested review from DavidTruby, ergawy, ye-luo, skatrak, kiranchandramohan, Leporacanthicus, pawosm-arm and kparzysz April 11, 2024 13:59

llvmbot added flang Flang issues not falling into any other category flang:fir-hlfir flang:openmp labels Apr 11, 2024

kiranchandramohan reviewed Apr 15, 2024

View reviewed changes

Correctly handle unallocated allocatables

a7bf198

tblah force-pushed the ecclescake/allocatable-array-reduction branch from 83614c3 to a7bf198 Compare April 17, 2024 13:10

tblah changed the title ~~[flang][OpenMP] Support reduction of allocatable variables~~ WIP: [flang][OpenMP] Support reduction of allocatable variables Apr 18, 2024

Fix broken alloca location

4dc7bd6

The change that led to this bug being exposed is that we now generate allocas nested inside of if statements instead of only in the immediate body of the reduction declare operation.

tblah changed the title ~~WIP: [flang][OpenMP] Support reduction of allocatable variables~~ [flang][OpenMP] Support reduction of allocatable variables Apr 21, 2024

Merge remote-tracking branch 'origin/main' into ecclescake/allocatabl…

ddedfd8

…e-array-reduction

kiranchandramohan approved these changes Apr 22, 2024

View reviewed changes

tblah merged commit 8cc34fa into llvm:main Apr 23, 2024
4 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[flang][OpenMP] Support reduction of allocatable variables #88392

[flang][OpenMP] Support reduction of allocatable variables #88392

tblah commented Apr 11, 2024

llvmbot commented Apr 11, 2024 •

edited

kiranchandramohan Apr 15, 2024

tblah Apr 17, 2024 •

edited

ye-luo commented Apr 16, 2024

ye-luo commented Apr 16, 2024 •

edited

tblah commented Apr 17, 2024

tblah commented Apr 18, 2024

tblah commented Apr 21, 2024

kiranchandramohan left a comment

tblah commented Apr 22, 2024

kiranchandramohan commented Apr 22, 2024

tblah commented Apr 22, 2024

[flang][OpenMP] Support reduction of allocatable variables #88392

[flang][OpenMP] Support reduction of allocatable variables #88392

Conversation

tblah commented Apr 11, 2024

llvmbot commented Apr 11, 2024 • edited

kiranchandramohan Apr 15, 2024

Choose a reason for hiding this comment

tblah Apr 17, 2024 • edited

Choose a reason for hiding this comment

ye-luo commented Apr 16, 2024

ye-luo commented Apr 16, 2024 • edited

tblah commented Apr 17, 2024

tblah commented Apr 18, 2024

tblah commented Apr 21, 2024

kiranchandramohan left a comment

Choose a reason for hiding this comment

tblah commented Apr 22, 2024

kiranchandramohan commented Apr 22, 2024

tblah commented Apr 22, 2024

llvmbot commented Apr 11, 2024 •

edited

tblah Apr 17, 2024 •

edited

ye-luo commented Apr 16, 2024 •

edited