The following MLIR file:

```mlir
func.func @legateMLIRKernel3(%arg0: memref<?xf64>, %arg1: memref<?xf64>) attributes {llvm.emit_c_interface} {
  %alloc = memref.alloc() : memref<1xf64>
  %alloc_0 = memref.alloc() : memref<1xf64>
  %c0 = arith.constant 0 : index
  %dim = memref.dim %arg0, %c0 : memref<?xf64>
  affine.for %arg2 = #map(%c0) to #map(%dim) {
    %0 = affine.load %arg0[%arg2] : memref<?xf64>
    %1 = arith.addf %0, %0 : f64
    affine.store %1, %alloc_0[0] : memref<1xf64>
    %2 = affine.load %alloc_0[0] : memref<1xf64>
    %3 = affine.load %arg0[%arg2] : memref<?xf64>
    %4 = arith.addf %2, %3 : f64
    affine.store %4, %alloc[0] : memref<1xf64>
    %5 = affine.load %alloc[0] : memref<1xf64>
    %6 = affine.load %arg0[%arg2] : memref<?xf64>
    %7 = arith.addf %5, %6 : f64
    affine.store %7, %arg1[%arg2] : memref<?xf64>
  }
  return
}
```

has some loads that should be eliminated once load-store forwarding is performed. Running the affine scalar replacement pass (`-affine-scalrep`), either directly via a PassManager or through `mlir-opt`, yields:

```mlir
func.func @legateMLIRKernel3(%arg0: memref<?xf64>, %arg1: memref<?xf64>) attributes {llvm.emit_c_interface} {
  %c0 = arith.constant 0 : index
  %dim = memref.dim %arg0, %c0 : memref<?xf64>
  affine.for %arg2 = #map(%c0) to #map(%dim) {
    %0 = affine.load %arg0[%arg2] : memref<?xf64>
    %1 = arith.addf %0, %0 : f64
    %2 = affine.load %arg0[%arg2] : memref<?xf64>
    %3 = arith.addf %1, %2 : f64
    %4 = affine.load %arg0[%arg2] : memref<?xf64>
    %5 = arith.addf %3, %4 : f64
    affine.store %5, %arg1[%arg2] : memref<?xf64>
  }
  return
}
```

This output still contains multiple redundant loads of `%arg0[%arg2]`. Applying the pass a second time removes them, which is the desired result. Based on the discussion at https://discourse.llvm.org/t/understanding-the-affine-loop-fusion-pass/69452/19?u=rohany, this is a bug: the pass should not require multiple applications to reach its fixed point.
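For reference, the expected fixed point (what a second `mlir-opt --affine-scalrep` run produces once the remaining loads of `%arg0[%arg2]` are forwarded to the first one) would look roughly like the sketch below. This is reconstructed by hand from the listing above, not captured pass output, and reuses the same `#map`:

```mlir
func.func @legateMLIRKernel3(%arg0: memref<?xf64>, %arg1: memref<?xf64>) attributes {llvm.emit_c_interface} {
  %c0 = arith.constant 0 : index
  %dim = memref.dim %arg0, %c0 : memref<?xf64>
  affine.for %arg2 = #map(%c0) to #map(%dim) {
    // The three identical loads of %arg0[%arg2] collapse into a single load.
    %0 = affine.load %arg0[%arg2] : memref<?xf64>
    %1 = arith.addf %0, %0 : f64
    %2 = arith.addf %1, %0 : f64
    %3 = arith.addf %2, %0 : f64
    affine.store %3, %arg1[%arg2] : memref<?xf64>
  }
  return
}
```

The two listings suggest that the first run's store-to-load forwarding (through `%alloc` and `%alloc_0`) is what exposes the duplicate loads of `%arg0[%arg2]`, and that the redundant-load elimination in that same run does not revisit them.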