[mlir][bufferization] Fix bug in bufferization of elementwise ops #97209

Conversation
There is an optimization in One-Shot Bufferize wrt. ops that bufferize to elementwise access. In such cases, a copy can sometimes be avoided. E.g.:

```
%0 = tensor.empty()
%1 = tensor.fill ...
%2 = linalg.map ins(%1, ...) outs(%1)
```

In the above example, a buffer copy is not needed for %1, even though the same buffer is read/written by two different operands (of the same op). That's because the op bufferizes to elementwise access:

```c++
// Two equivalent operands of the same op are not conflicting if the op
// bufferizes to element-wise access. I.e., all loads at a position
// happen before all stores to the same position.
```

This optimization cannot be applied when op dominance cannot be used to rule out conflicts, e.g., when the `linalg.map` is inside of a loop. In such a case, the reads/writes happen multiple times and it is not guaranteed that "all loads at a position happen before all stores to the same position."

Fixes #90019.
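For intuition, here is a minimal sketch of the unsound case: a `linalg.map` inside a loop whose input is produced by a `linalg.fill` outside the loop. This is an illustrative variant of the regression test added below (function and value names are made up, not taken from the patch):

```mlir
func.func @map_in_loop(%n: i32, %in: tensor<8xf32>) {
  %cst = arith.constant 0.0 : f32
  %c0_i32 = arith.constant 0 : i32
  %c1_i32 = arith.constant 1 : i32
  %0 = tensor.empty() : tensor<8xf32>
  %1 = linalg.fill ins(%cst : f32) outs(%0 : tensor<8xf32>) -> tensor<8xf32>
  scf.for %i = %c0_i32 to %n step %c1_i32 : i32 {
    // If %1 and %0 shared one buffer, the stores of iteration 0 would
    // clobber the filled values that iteration 1 still needs to read:
    // "all loads happen before all stores" holds within one execution of
    // the map, but not across loop iterations. Hence %1 needs a copy.
    %2 = linalg.map { arith.subf } ins(%1, %in : tensor<8xf32>, tensor<8xf32>) outs(%0 : tensor<8xf32>)
  }
  return
}
```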
@llvm/pr-subscribers-mlir-bufferization @llvm/pr-subscribers-mlir-linalg

Author: Matthias Springer (matthias-springer)

Changes

There is an optimization in One-Shot Bufferize wrt. ops that bufferize to elementwise access. A copy can sometimes be avoided. E.g.:

```
%0 = tensor.empty()
%1 = tensor.fill ...
%2 = linalg.map ins(%1, ...) outs(%1)
```

In the above example, a buffer copy is not needed for %1, even though the same buffer is read/written by two different operands (of the same op). That's because the op bufferizes to elementwise access:

```c++
// Two equivalent operands of the same op are not conflicting if the op
// bufferizes to element-wise access. I.e., all loads at a position
// happen before all stores to the same position.
```

This optimization cannot be applied when op dominance cannot be used to rule out conflicts, e.g., when the `linalg.map` is inside of a loop. In such a case, the reads/writes happen multiple times and it is not guaranteed that "all loads at a position happen before all stores to the same position."

Fixes #90019.

Full diff: https://github.com/llvm/llvm-project/pull/97209.diff

2 Files Affected:
- (modified) mlir/lib/Dialect/Bufferization/Transforms/OneShotAnalysis.cpp
- (modified) mlir/test/Dialect/Linalg/one-shot-bufferize-analysis.mlir
diff --git a/mlir/lib/Dialect/Bufferization/Transforms/OneShotAnalysis.cpp b/mlir/lib/Dialect/Bufferization/Transforms/OneShotAnalysis.cpp
index d0b4e0dd4383e..975bfb4d41e0b 100644
--- a/mlir/lib/Dialect/Bufferization/Transforms/OneShotAnalysis.cpp
+++ b/mlir/lib/Dialect/Bufferization/Transforms/OneShotAnalysis.cpp
@@ -725,23 +725,23 @@ hasReadAfterWriteInterference(const DenseSet<OpOperand *> &usesRead,
"mutually exclusive regions\n");
continue;
}
- }
- // Two equivalent operands of the same op are not conflicting if the op
- // bufferizes to element-wise access. I.e., all loads at a position happen
- // before all stores to the same position.
- if (conflictingWritingOp == readingOp) {
- if (auto bufferizableOp = options.dynCastBufferizableOp(readingOp)) {
- if (bufferizableOp.bufferizesToElementwiseAccess(
- state, {uRead, uConflictingWrite})) {
- if (hasEquivalentValueInReverseUseDefChain(
- state, uRead->get(), uConflictingWrite->get()) ||
- hasEquivalentValueInReverseUseDefChain(
- state, uConflictingWrite->get(), uRead->get())) {
- LLVM_DEBUG(
- llvm::dbgs()
- << " no conflict: op bufferizes to element-wise access\n");
- continue;
+ // Two equivalent operands of the same op are not conflicting if the op
+ // bufferizes to element-wise access. I.e., all loads at a position
+ // happen before all stores to the same position.
+ if (conflictingWritingOp == readingOp) {
+ if (auto bufferizableOp = options.dynCastBufferizableOp(readingOp)) {
+ if (bufferizableOp.bufferizesToElementwiseAccess(
+ state, {uRead, uConflictingWrite})) {
+ if (hasEquivalentValueInReverseUseDefChain(
+ state, uRead->get(), uConflictingWrite->get()) ||
+ hasEquivalentValueInReverseUseDefChain(
+ state, uConflictingWrite->get(), uRead->get())) {
+ LLVM_DEBUG(
+ llvm::dbgs()
+ << " no conflict: op bufferizes to element-wise access\n");
+ continue;
+ }
}
}
}
diff --git a/mlir/test/Dialect/Linalg/one-shot-bufferize-analysis.mlir b/mlir/test/Dialect/Linalg/one-shot-bufferize-analysis.mlir
index 2d79a80cddc2b..5b7c2baf9d84f 100644
--- a/mlir/test/Dialect/Linalg/one-shot-bufferize-analysis.mlir
+++ b/mlir/test/Dialect/Linalg/one-shot-bufferize-analysis.mlir
@@ -107,3 +107,31 @@ func.func @elementwise_no_conflict_4(%arg0: tensor<8x32x32x32xf32>, %arg1: tenso
}
return %r : tensor<8x32x32x32xf32>
}
+
+// -----
+
+// CHECK-LABEL: func @elementwise_access_regression(
+// CHECK: linalg.fill {__inplace_operands_attr__ = ["none", "false"]}
+// CHECK: linalg.map
+// CHECK-SAME: {__inplace_operands_attr__ = ["true", "true", "true"]}
+// CHECK: linalg.map
+// CHECK-SAME: {__inplace_operands_attr__ = ["true", "true", "true"]}
+func.func private @f(%arg: tensor<32x1xf32>) -> ()
+func.func @elementwise_access_regression(%arg0: i32, %arg2: tensor<32x1xf32>, %arg3: tensor<32x1xf32>) {
+ %cst_0 = arith.constant 0.000000e+00 : f32
+ %c0_i32 = arith.constant 0 : i32
+ %c1_i32 = arith.constant 1 : i32
+ %0 = tensor.empty() : tensor<32x1xf32>
+
+ // This op must bufferize out-of-place so that the filled tensor is not
+ // overwritten by the ops inside of the loop.
+ %1 = linalg.fill ins(%cst_0 : f32) outs(%0 : tensor<32x1xf32>) -> tensor<32x1xf32>
+
+ scf.for %arg1 = %c0_i32 to %arg0 step %c1_i32 : i32 {
+ %2 = linalg.map { arith.subf } ins(%1, %arg2 : tensor<32x1xf32>, tensor<32x1xf32>) outs(%0 : tensor<32x1xf32>)
+ %3 = tensor.empty() : tensor<32x1xf32>
+ %4 = linalg.map { arith.subf } ins(%2, %arg3 : tensor<32x1xf32>, tensor<32x1xf32>) outs(%3 : tensor<32x1xf32>)
+ func.call @f(%4) : (tensor<32x1xf32>) -> ()
+ }
+ return
+}
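To read the CHECK lines in this test: the `__inplace_operands_attr__` annotations produced by the analysis mark each operand as bufferizing in-place ("true"), out-of-place ("false"), or as a non-tensor operand ("none"). So `["none", "false"]` on the `linalg.fill` asserts that its `outs` tensor now bufferizes out-of-place, i.e., the fill gets its own buffer, while both `linalg.map` ops stay fully in-place. At the buffer level, the expected outcome looks roughly like the following hand-written sketch (illustrative memref IR under assumed names, not actual pass output):

```mlir
func.func private @f(%arg: memref<32x1xf32>) -> ()
func.func @sketch(%arg0: i32, %m2: memref<32x1xf32>, %m3: memref<32x1xf32>) {
  %cst_0 = arith.constant 0.000000e+00 : f32
  %c0_i32 = arith.constant 0 : i32
  %c1_i32 = arith.constant 1 : i32
  %buf = memref.alloc() : memref<32x1xf32>   // backs %0 = tensor.empty()
  %fill = memref.alloc() : memref<32x1xf32>  // separate buffer: the fill is out-of-place
  linalg.fill ins(%cst_0 : f32) outs(%fill : memref<32x1xf32>)
  scf.for %arg1 = %c0_i32 to %arg0 step %c1_i32 : i32 {
    // %fill is only read inside the loop; the maps write into %buf and a
    // fresh allocation, so the filled values survive every iteration.
    linalg.map { arith.subf } ins(%fill, %m2 : memref<32x1xf32>, memref<32x1xf32>) outs(%buf : memref<32x1xf32>)
    %tmp = memref.alloc() : memref<32x1xf32>  // backs %3 = tensor.empty()
    linalg.map { arith.subf } ins(%buf, %m3 : memref<32x1xf32>, memref<32x1xf32>) outs(%tmp : memref<32x1xf32>)
    func.call @f(%tmp) : (memref<32x1xf32>) -> ()
  }
  return
}
```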
Thanks for the patch. LGTM.