[mlir][bufferization] Fix bug in bufferization of elementwise ops #97209

Conversation

matthias-springer
Member

There is an optimization in One-Shot Bufferize with respect to ops that bufferize to elementwise access. In such cases, a buffer copy can sometimes be avoided. E.g.:
```
%0 = tensor.empty()
%1 = linalg.fill ...
%2 = linalg.map ins(%1, ...) outs(%1)
```

In the above example, a buffer copy is not needed for %1, even though the same buffer is read/written by two different operands of the same op. That's because the op bufferizes to elementwise access.

```c++
// Two equivalent operands of the same op are not conflicting if the op
// bufferizes to element-wise access. I.e., all loads at a position
// happen before all stores to the same position.
```
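
Concretely, for an op such as `linalg.map`, each output element at position `i` depends only on the input elements at the same position `i`, so within a single execution of the op the load from a position always precedes the store to it. A minimal sketch of what the in-place bufferized form could look like (buffer names, bounds, and constants are illustrative, not taken from this patch):

```
scf.for %i = %c0 to %c8 step %c1 {
  // Load from position %i of the shared buffer ...
  %a = memref.load %buf[%i] : memref<8xf32>
  %b = memref.load %other[%i] : memref<8xf32>
  %r = arith.subf %a, %b : f32
  // ... happens before the store to the same position %i.
  memref.store %r, %buf[%i] : memref<8xf32>
}
```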

This optimization cannot be applied when op dominance cannot be used to rule out conflicts. E.g., when the `linalg.map` is inside of a loop. In such a case, the reads/writes happen multiple times and it is not guaranteed that "all loads at a position happen before all stores to the same position."
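
The regression test added by this patch (see the diff below) exercises exactly this pattern; in outline (induction variable and input names are illustrative):

```
%0 = tensor.empty() : tensor<32x1xf32>
// Must now bufferize out-of-place: if %1 shared a buffer with %0, the
// map below would overwrite the filled values during the first loop
// iteration, and later iterations would read corrupted data via %1.
%1 = linalg.fill ins(%cst : f32) outs(%0 : tensor<32x1xf32>) -> tensor<32x1xf32>
scf.for %iv = %lb to %ub step %step : i32 {
  %2 = linalg.map { arith.subf } ins(%1, %t : tensor<32x1xf32>, tensor<32x1xf32>) outs(%0 : tensor<32x1xf32>)
  // ...
}
```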

Fixes #90019.
@llvmbot llvmbot added mlir:linalg mlir mlir:bufferization Bufferization infrastructure labels Jun 30, 2024
@llvmbot
Member

llvmbot commented Jun 30, 2024

@llvm/pr-subscribers-mlir-bufferization

@llvm/pr-subscribers-mlir-linalg

Author: Matthias Springer (matthias-springer)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/97209.diff

2 Files Affected:

  • (modified) mlir/lib/Dialect/Bufferization/Transforms/OneShotAnalysis.cpp (+16-16)
  • (modified) mlir/test/Dialect/Linalg/one-shot-bufferize-analysis.mlir (+28)
```diff
diff --git a/mlir/lib/Dialect/Bufferization/Transforms/OneShotAnalysis.cpp b/mlir/lib/Dialect/Bufferization/Transforms/OneShotAnalysis.cpp
index d0b4e0dd4383e..975bfb4d41e0b 100644
--- a/mlir/lib/Dialect/Bufferization/Transforms/OneShotAnalysis.cpp
+++ b/mlir/lib/Dialect/Bufferization/Transforms/OneShotAnalysis.cpp
@@ -725,23 +725,23 @@ hasReadAfterWriteInterference(const DenseSet<OpOperand *> &usesRead,
                                      "mutually exclusive regions\n");
           continue;
         }
-      }
 
-      // Two equivalent operands of the same op are not conflicting if the op
-      // bufferizes to element-wise access. I.e., all loads at a position happen
-      // before all stores to the same position.
-      if (conflictingWritingOp == readingOp) {
-        if (auto bufferizableOp = options.dynCastBufferizableOp(readingOp)) {
-          if (bufferizableOp.bufferizesToElementwiseAccess(
-                  state, {uRead, uConflictingWrite})) {
-            if (hasEquivalentValueInReverseUseDefChain(
-                    state, uRead->get(), uConflictingWrite->get()) ||
-                hasEquivalentValueInReverseUseDefChain(
-                    state, uConflictingWrite->get(), uRead->get())) {
-              LLVM_DEBUG(
-                  llvm::dbgs()
-                  << "  no conflict: op bufferizes to element-wise access\n");
-              continue;
+        // Two equivalent operands of the same op are not conflicting if the op
+        // bufferizes to element-wise access. I.e., all loads at a position
+        // happen before all stores to the same position.
+        if (conflictingWritingOp == readingOp) {
+          if (auto bufferizableOp = options.dynCastBufferizableOp(readingOp)) {
+            if (bufferizableOp.bufferizesToElementwiseAccess(
+                    state, {uRead, uConflictingWrite})) {
+              if (hasEquivalentValueInReverseUseDefChain(
+                      state, uRead->get(), uConflictingWrite->get()) ||
+                  hasEquivalentValueInReverseUseDefChain(
+                      state, uConflictingWrite->get(), uRead->get())) {
+                LLVM_DEBUG(
+                    llvm::dbgs()
+                    << "  no conflict: op bufferizes to element-wise access\n");
+                continue;
+              }
             }
           }
         }
diff --git a/mlir/test/Dialect/Linalg/one-shot-bufferize-analysis.mlir b/mlir/test/Dialect/Linalg/one-shot-bufferize-analysis.mlir
index 2d79a80cddc2b..5b7c2baf9d84f 100644
--- a/mlir/test/Dialect/Linalg/one-shot-bufferize-analysis.mlir
+++ b/mlir/test/Dialect/Linalg/one-shot-bufferize-analysis.mlir
@@ -107,3 +107,31 @@ func.func @elementwise_no_conflict_4(%arg0: tensor<8x32x32x32xf32>, %arg1: tenso
   }
   return %r : tensor<8x32x32x32xf32>
 }
+
+// -----
+
+// CHECK-LABEL: func @elementwise_access_regression(
+//       CHECK:   linalg.fill {__inplace_operands_attr__ = ["none", "false"]}
+//       CHECK:   linalg.map
+//  CHECK-SAME:   {__inplace_operands_attr__ = ["true", "true", "true"]}
+//       CHECK:   linalg.map
+//  CHECK-SAME:   {__inplace_operands_attr__ = ["true", "true", "true"]}
+func.func private @f(%arg: tensor<32x1xf32>) -> ()
+func.func @elementwise_access_regression(%arg0: i32, %arg2: tensor<32x1xf32>, %arg3: tensor<32x1xf32>) {
+      %cst_0 = arith.constant 0.000000e+00 : f32
+      %c0_i32 = arith.constant 0 : i32
+      %c1_i32 = arith.constant 1 : i32
+      %0 = tensor.empty() : tensor<32x1xf32>
+
+      // This op must bufferize out-of-place so that the filled tensor is not
+      // overwritten by the ops inside of the loop.
+      %1 = linalg.fill ins(%cst_0 : f32) outs(%0 : tensor<32x1xf32>) -> tensor<32x1xf32>
+
+      scf.for %arg1 = %c0_i32 to %arg0 step %c1_i32 : i32 {
+        %2 = linalg.map { arith.subf } ins(%1, %arg2 : tensor<32x1xf32>, tensor<32x1xf32>) outs(%0 : tensor<32x1xf32>)
+        %3 = tensor.empty() : tensor<32x1xf32>
+        %4 = linalg.map { arith.subf } ins(%2, %arg3 : tensor<32x1xf32>, tensor<32x1xf32>) outs(%3 : tensor<32x1xf32>)
+        func.call @f(%4) : (tensor<32x1xf32>) -> ()
+      }
+      return
+}
```

@llvmbot
Member

llvmbot commented Jun 30, 2024

@llvm/pr-subscribers-mlir

@cxy-1993 cxy-1993 left a comment
Contributor

Thanks for the patch. LGTM.

@matthias-springer matthias-springer merged commit cf9b77a into main Jul 1, 2024
@matthias-springer matthias-springer deleted the users/matthias-springer/fix_elementwise_bufferization branch July 1, 2024 17:00
cpiaseque pushed a commit to cpiaseque/llvm-project that referenced this pull request Jul 3, 2024
kbluck pushed a commit to kbluck/llvm-project that referenced this pull request Jul 6, 2024