[mlir][bufferization] Fix bug in bufferization of elementwise ops #97209

Conversation

matthias-springer
Member

There is an optimization in One-Shot Bufferize with respect to ops that bufferize to elementwise access. In such cases, a buffer copy can sometimes be avoided. E.g.:
```
%0 = tensor.empty()
%1 = linalg.fill ...
%2 = linalg.map ins(%1, ...) outs(%1)
```

In the above example, a buffer copy is not needed for %1, even though the same buffer is read/written by two different operands of the same op. That's because the op bufferizes to elementwise access.

```c++
// Two equivalent operands of the same op are not conflicting if the op
// bufferizes to element-wise access. I.e., all loads at a position
// happen before all stores to the same position.
```
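
Concretely, for an op such as `linalg.map`, each output element at position `i` depends only on the input elements at the same position `i`, so within a single execution of the op the load from a position always precedes the store to it. A minimal sketch of what the in-place bufferized form could look like (buffer names, bounds, and constants are illustrative, not taken from this patch):

```
scf.for %i = %c0 to %c8 step %c1 {
  // Load from position %i of the shared buffer ...
  %a = memref.load %buf[%i] : memref<8xf32>
  %b = memref.load %other[%i] : memref<8xf32>
  %r = arith.subf %a, %b : f32
  // ... happens before the store to the same position %i.
  memref.store %r, %buf[%i] : memref<8xf32>
}
```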

This optimization cannot be applied when op dominance cannot be used to rule out conflicts. E.g., when the `linalg.map` is inside of a loop. In such a case, the reads/writes happen multiple times and it is not guaranteed that "all loads at a position happen before all stores to the same position."
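
The regression test added by this patch (see the diff below) exercises exactly this pattern; in outline (induction variable and input names are illustrative):

```
%0 = tensor.empty() : tensor<32x1xf32>
// Must now bufferize out-of-place: if %1 shared a buffer with %0, the
// map below would overwrite the filled values during the first loop
// iteration, and later iterations would read corrupted data via %1.
%1 = linalg.fill ins(%cst : f32) outs(%0 : tensor<32x1xf32>) -> tensor<32x1xf32>
scf.for %iv = %lb to %ub step %step : i32 {
  %2 = linalg.map { arith.subf } ins(%1, %t : tensor<32x1xf32>, tensor<32x1xf32>) outs(%0 : tensor<32x1xf32>)
  // ...
}
```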

Fixes #90019.
@llvmbot llvmbot added mlir:linalg mlir mlir:bufferization Bufferization infrastructure labels Jun 30, 2024
@llvmbot
Member

llvmbot commented Jun 30, 2024

@llvm/pr-subscribers-mlir-bufferization

@llvm/pr-subscribers-mlir-linalg

Author: Matthias Springer (matthias-springer)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/97209.diff

2 Files Affected:

  • (modified) mlir/lib/Dialect/Bufferization/Transforms/OneShotAnalysis.cpp (+16-16)
  • (modified) mlir/test/Dialect/Linalg/one-shot-bufferize-analysis.mlir (+28)
```diff
diff --git a/mlir/lib/Dialect/Bufferization/Transforms/OneShotAnalysis.cpp b/mlir/lib/Dialect/Bufferization/Transforms/OneShotAnalysis.cpp
index d0b4e0dd4383e..975bfb4d41e0b 100644
--- a/mlir/lib/Dialect/Bufferization/Transforms/OneShotAnalysis.cpp
+++ b/mlir/lib/Dialect/Bufferization/Transforms/OneShotAnalysis.cpp
@@ -725,23 +725,23 @@ hasReadAfterWriteInterference(const DenseSet<OpOperand *> &usesRead,
                                      "mutually exclusive regions\n");
           continue;
         }
-      }
 
-      // Two equivalent operands of the same op are not conflicting if the op
-      // bufferizes to element-wise access. I.e., all loads at a position happen
-      // before all stores to the same position.
-      if (conflictingWritingOp == readingOp) {
-        if (auto bufferizableOp = options.dynCastBufferizableOp(readingOp)) {
-          if (bufferizableOp.bufferizesToElementwiseAccess(
-                  state, {uRead, uConflictingWrite})) {
-            if (hasEquivalentValueInReverseUseDefChain(
-                    state, uRead->get(), uConflictingWrite->get()) ||
-                hasEquivalentValueInReverseUseDefChain(
-                    state, uConflictingWrite->get(), uRead->get())) {
-              LLVM_DEBUG(
-                  llvm::dbgs()
-                  << "  no conflict: op bufferizes to element-wise access\n");
-              continue;
+        // Two equivalent operands of the same op are not conflicting if the op
+        // bufferizes to element-wise access. I.e., all loads at a position
+        // happen before all stores to the same position.
+        if (conflictingWritingOp == readingOp) {
+          if (auto bufferizableOp = options.dynCastBufferizableOp(readingOp)) {
+            if (bufferizableOp.bufferizesToElementwiseAccess(
+                    state, {uRead, uConflictingWrite})) {
+              if (hasEquivalentValueInReverseUseDefChain(
+                      state, uRead->get(), uConflictingWrite->get()) ||
+                  hasEquivalentValueInReverseUseDefChain(
+                      state, uConflictingWrite->get(), uRead->get())) {
+                LLVM_DEBUG(
+                    llvm::dbgs()
+                    << "  no conflict: op bufferizes to element-wise access\n");
+                continue;
+              }
             }
           }
         }
diff --git a/mlir/test/Dialect/Linalg/one-shot-bufferize-analysis.mlir b/mlir/test/Dialect/Linalg/one-shot-bufferize-analysis.mlir
index 2d79a80cddc2b..5b7c2baf9d84f 100644
--- a/mlir/test/Dialect/Linalg/one-shot-bufferize-analysis.mlir
+++ b/mlir/test/Dialect/Linalg/one-shot-bufferize-analysis.mlir
@@ -107,3 +107,31 @@ func.func @elementwise_no_conflict_4(%arg0: tensor<8x32x32x32xf32>, %arg1: tenso
   }
   return %r : tensor<8x32x32x32xf32>
 }
+
+// -----
+
+// CHECK-LABEL: func @elementwise_access_regression(
+//       CHECK:   linalg.fill {__inplace_operands_attr__ = ["none", "false"]}
+//       CHECK:   linalg.map
+//  CHECK-SAME:   {__inplace_operands_attr__ = ["true", "true", "true"]}
+//       CHECK:   linalg.map
+//  CHECK-SAME:   {__inplace_operands_attr__ = ["true", "true", "true"]}
+func.func private @f(%arg: tensor<32x1xf32>) -> ()
+func.func @elementwise_access_regression(%arg0: i32, %arg2: tensor<32x1xf32>, %arg3: tensor<32x1xf32>) {
+      %cst_0 = arith.constant 0.000000e+00 : f32
+      %c0_i32 = arith.constant 0 : i32
+      %c1_i32 = arith.constant 1 : i32
+      %0 = tensor.empty() : tensor<32x1xf32>
+
+      // This op must bufferize out-of-place so that the filled tensor is not
+      // overwritten by the ops inside of the loop.
+      %1 = linalg.fill ins(%cst_0 : f32) outs(%0 : tensor<32x1xf32>) -> tensor<32x1xf32>
+
+      scf.for %arg1 = %c0_i32 to %arg0 step %c1_i32 : i32 {
+        %2 = linalg.map { arith.subf } ins(%1, %arg2 : tensor<32x1xf32>, tensor<32x1xf32>) outs(%0 : tensor<32x1xf32>)
+        %3 = tensor.empty() : tensor<32x1xf32>
+        %4 = linalg.map { arith.subf } ins(%2, %arg3 : tensor<32x1xf32>, tensor<32x1xf32>) outs(%3 : tensor<32x1xf32>)
+        func.call @f(%4) : (tensor<32x1xf32>) -> ()
+      }
+      return
+}
```

@llvmbot
Member

llvmbot commented Jun 30, 2024

@llvm/pr-subscribers-mlir

@cxy-1993 cxy-1993 left a comment
Contributor

Thanks for the patch. LGTM.

@matthias-springer matthias-springer merged commit cf9b77a into main Jul 1, 2024
@matthias-springer matthias-springer deleted the users/matthias-springer/fix_elementwise_bufferization branch July 1, 2024 17:00
cpiaseque pushed a commit to cpiaseque/llvm-project that referenced this pull request Jul 3, 2024
kbluck pushed a commit to kbluck/llvm-project that referenced this pull request Jul 6, 2024