[flang][hlfir] Allow expanding realloc assignments with scalar RHS.
F18 10.2.1.3 p. 3 states:
If the variable is an unallocated allocatable array, expr shall have the same rank.

So if LHS is an array and RHS is a scalar, then LHS must be allocated and
the assignment is performed according to F18 10.2.1.3 p. 5:
If expr is a scalar and the variable is an array,
the expr is treated as if it were an array of the same shape as the
variable with every element of the array equal to the scalar value of expr.
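
As a rough illustration of the p. 5 rule, the C++ sketch below models an allocated rank-1 array as a base address plus an extent and stores the scalar into every element without touching the allocation. It is not flang code: `Descriptor` and `broadcastAssign` are hypothetical names, and the struct does not reflect flang's real descriptor layout.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a rank-1 array descriptor (assumption, not
// flang's actual layout). A null base_addr models "unallocated".
struct Descriptor {
  double *base_addr;
  std::size_t extent;
};

// Models `array = scalar` for an already allocated array: the shape is
// left unchanged and every element receives the scalar value.
void broadcastAssign(Descriptor &lhs, double scalar) {
  // A conforming program never reaches this point with an unallocated LHS,
  // so no (re)allocation logic is needed here.
  assert(lhs.base_addr != nullptr && "scalar RHS requires an allocated LHS");
  for (std::size_t i = 0; i < lhs.extent; ++i)
    lhs.base_addr[i] = scalar;
}
```

The point of the sketch is that neither `base_addr` nor `extent` is written, which is why the assignment can be expanded into a plain loop.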

This resolves a performance regression in CPU2006/437.leslie3d caused
by extra Assign runtime calls for ALLOCATABLE local arrays.
Note that the extra calls do not add overhead themselves.
The problem is that the descriptor for the ALLOCATABLE is passed
to the Assign runtime function, and this defeats LLVM's points-to
analysis.

Example:
```
      ALLOCATABLE DUDX(:),DUDY(:),DUDZ(:)
...
      ALLOCATE( QS(IMAX-1),FSK(IMAX-1,0:KMAX,ND),
     >      QDIFFZ(IMAX-1), RMU(IMAX-1), EKCOEF(IMAX-1),
     >      DUDX(IMAX-1),DUDY(IMAX-1),DUDZ(IMAX-1),
...
      DUDZ=0D0
...
               DO I = I1, I2
                  DUDZ(I) =
     >                  DZI * ABD * ((U(I,J,KBD) - U(I,J,KCD)) +
     >                       8.0D0 * (U(I,J, KK) - U(I,J,KBD))) * R6I
```

When `DUDZ=0D0` is not lowered to an Assign call, the `base_addr` of
`DUDZ`'s descriptor is the result of `malloc`, and LLVM is able to figure out
that accesses through this `base_addr` cannot overlap with accesses to,
for example, the module (global) variable DZI. This enables CSE and LICM
for the loop, eventually resulting in clean vectorization.

When `DUDZ`'s descriptor "escapes" to the Assign runtime function,
there are no guarantees about where `base_addr` may point.
I do not think this can be resolved with any existing LLVM function/argument
attributes. Maybe we will be able to communicate the no-aliasing information
to LLVM using the `Full Restrict Support` representation.
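
The escape problem can be sketched in C++ as follows. This is only an analogy under stated assumptions: `Descriptor`, `runtimeAssign`, and `kernel` are hypothetical names, and `runtimeAssign` stands in for the Assign runtime entry point. Both paths compute the same result; they differ only in what an optimizer may assume about `base_addr`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Stand-in for a module (global) variable such as DZI.
double dzi = 0.5;

// Hypothetical rank-1 descriptor (assumption, not flang's actual layout).
struct Descriptor {
  double *base_addr;
  std::size_t extent;
};

// Hypothetical stand-in for the Assign runtime entry point. In a real build
// it lives in a separate library, so the optimizer cannot see its body.
void runtimeAssign(Descriptor *lhs, double scalar);

double kernel(const double *u, std::size_t n, bool viaRuntime) {
  Descriptor d{static_cast<double *>(std::malloc(n * sizeof(double))), n};
  assert(d.base_addr != nullptr);

  if (viaRuntime) {
    // &d escapes: an opaque callee could have stored any address into
    // d.base_addr, including &dzi.
    runtimeAssign(&d, 0.0);
  } else {
    // Inline expansion: d.base_addr is still known to come from malloc.
    for (std::size_t i = 0; i < n; ++i)
      d.base_addr[i] = 0.0;
  }

  // If stores through base_addr may alias dzi, the load of dzi cannot be
  // hoisted out of this loop.
  for (std::size_t i = 0; i < n; ++i)
    d.base_addr[i] = dzi * u[i];

  double sum = 0.0;
  for (std::size_t i = 0; i < n; ++i)
    sum += d.base_addr[i];
  std::free(d.base_addr);
  return sum;
}

// Defined here only so that the sketch links as a single file.
void runtimeAssign(Descriptor *lhs, double scalar) {
  for (std::size_t i = 0; i < lhs->extent; ++i)
    lhs->base_addr[i] = scalar;
}
```

In the `viaRuntime` path the compiler has to treat the call as a black box, which mirrors the situation described above for the descriptor passed to Assign.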

For the purpose of enabling HLFIR by default, I am just aligning the IR
with what we have with FIR lowering.

Reviewed By: tblah

Differential Revision: https://reviews.llvm.org/D159391
vzakhari committed Sep 4, 2023
1 parent dd27036 commit 09361b1
Showing 2 changed files with 17 additions and 6 deletions.
@@ -459,9 +459,10 @@ class BroadcastAssignBufferization

 mlir::LogicalResult BroadcastAssignBufferization::matchAndRewrite(
     hlfir::AssignOp assign, mlir::PatternRewriter &rewriter) const {
-  if (assign.isAllocatableAssignment())
-    return rewriter.notifyMatchFailure(assign, "AssignOp may imply allocation");
-
+  // Since RHS is a scalar and LHS is an array, LHS must be allocated
+  // in a conforming Fortran program, and LHS cannot be reallocated
+  // as a result of the assignment. So we can ignore isAllocatableAssignment
+  // and do the transformation always.
   mlir::Value rhs = assign.getRhs();
   if (!fir::isa_trivial(rhs.getType()))
     return rewriter.notifyMatchFailure(
16 changes: 13 additions & 3 deletions flang/test/HLFIR/opt-scalar-assign.fir
@@ -86,9 +86,19 @@ func.func @_QPtest3(%arg0: !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>> {fir
 }
 // CHECK-LABEL: func.func @_QPtest3(
 // CHECK-SAME: %[[VAL_0:.*]]: !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>> {fir.bindc_name = "x"}) {
-// CHECK: %[[VAL_1:.*]] = arith.constant 0 : i32
-// CHECK: %[[VAL_2:.*]]:2 = hlfir.declare %[[VAL_0]] {fortran_attrs = #fir.var_attrs<allocatable>, uniq_name = "_QFtest3Ex"} : (!fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>) -> (!fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>, !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>)
-// CHECK: hlfir.assign %[[VAL_1]] to %[[VAL_2]]#0 realloc : i32, !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>
+// CHECK: %[[VAL_1:.*]] = arith.constant 1 : index
+// CHECK: %[[VAL_2:.*]] = arith.constant 0 : index
+// CHECK: %[[VAL_3:.*]] = arith.constant 0 : i32
+// CHECK: %[[VAL_4:.*]]:2 = hlfir.declare %[[VAL_0]] {fortran_attrs = #fir.var_attrs<allocatable>, uniq_name = "_QFtest3Ex"} : (!fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>) -> (!fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>, !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>)
+// CHECK: %[[VAL_5:.*]] = fir.load %[[VAL_4]]#0 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>
+// CHECK: %[[VAL_6:.*]]:3 = fir.box_dims %[[VAL_5]], %[[VAL_2]] : (!fir.box<!fir.heap<!fir.array<?xi32>>>, index) -> (index, index, index)
+// CHECK: fir.do_loop %[[VAL_7:.*]] = %[[VAL_1]] to %[[VAL_6]]#1 step %[[VAL_1]] unordered {
+// CHECK: %[[VAL_8:.*]]:3 = fir.box_dims %[[VAL_5]], %[[VAL_2]] : (!fir.box<!fir.heap<!fir.array<?xi32>>>, index) -> (index, index, index)
+// CHECK: %[[VAL_9:.*]] = arith.subi %[[VAL_8]]#0, %[[VAL_1]] : index
+// CHECK: %[[VAL_10:.*]] = arith.addi %[[VAL_7]], %[[VAL_9]] : index
+// CHECK: %[[VAL_11:.*]] = hlfir.designate %[[VAL_5]] (%[[VAL_10]]) : (!fir.box<!fir.heap<!fir.array<?xi32>>>, index) -> !fir.ref<i32>
+// CHECK: hlfir.assign %[[VAL_3]] to %[[VAL_11]] : i32, !fir.ref<i32>
+// CHECK: }
 // CHECK: return
 // CHECK: }
