-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[mlir][linalg] set inbounds on xfer_read/writes
for assumeDynamicDimsMatchVecSizes
#160839
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[mlir][linalg] set inbounds on xfer_read/writes
for assumeDynamicDimsMatchVecSizes
#160839
Conversation
…umeDynamicDimsMatchVecSizes
@llvm/pr-subscribers-mlir-linalg Author: Ege Beysel (egebeysel) ChangesThe idea from #146531 was to introduce the flag In the existence of scalable tile sizes, subsequent patterns tend to overwrite the inbounds attribute and introduce masks further down when lowered to loads and stores. This PR explicitly sets the inbounds attribute to all-true for Full diff: https://github.com/llvm/llvm-project/pull/160839.diff 2 Files Affected:
diff --git a/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp b/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
index 15c467b21c81e..4db6057519da5 100644
--- a/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
+++ b/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
@@ -524,6 +524,23 @@ VectorizationState::maskOperation(RewriterBase &rewriter, Operation *opToMask,
if (!mask) {
LDBG() << "No mask required";
+ if (assumeDynamicDimsMatchVecSizes) {
+ LDBG() << "Assuming dynamic dimensions match vector sizes!";
+ // Set inbounds to all-true.
+ llvm::TypeSwitch<Operation *>(opToMask)
+ .Case<vector::TransferReadOp, vector::TransferWriteOp>(
+ [&](auto xferOp) {
+ SmallVector<bool> inBoundsMap(xferOp.getInBounds().size(),
+ true);
+ rewriter.modifyOpInPlace(xferOp, [&]() {
+ xferOp.setInBoundsAttr(
+ rewriter.getBoolArrayAttr(inBoundsMap));
+ });
+ })
+ .Default([](Operation *op) {
+ // No-op if the operation is not an xfer read or write.
+ });
+ }
return opToMask;
}
diff --git a/mlir/test/Dialect/Linalg/vectorization/linalg-ops.mlir b/mlir/test/Dialect/Linalg/vectorization/linalg-ops.mlir
index 62bf1f55c9af2..7fe707902072a 100644
--- a/mlir/test/Dialect/Linalg/vectorization/linalg-ops.mlir
+++ b/mlir/test/Dialect/Linalg/vectorization/linalg-ops.mlir
@@ -918,12 +918,16 @@ func.func @mmt4d_scalable_with_assume(%A: memref<16x16x8x1xf32>, %B: memref<16x1
// CHECK-SAME: %[[B:.*]]: memref<16x16x?x1xf32>,
// CHECK-SAME: %[[C_IN:.*]]: memref<16x16x8x?xf32>) {
// CHECK-NOT: mask
-// CHECK: %[[VEC_A:.*]] = vector.transfer_read %[[A]]{{.*}} : memref<16x16x8x1xf32>, vector<16x16x16x8x[4]x1xf32>
-// CHECK: %[[VEC_B:.*]] = vector.transfer_read %[[B]]{{.*}} : memref<16x16x?x1xf32>, vector<16x16x16x8x[4]x1xf32>
-// CHECK: %[[VEC_C:.*]] = vector.transfer_read %[[C_IN]]{{.*}} : memref<16x16x8x?xf32>, vector<16x16x8x[4]xf32>
+// CHECK: %[[VEC_A:.*]] = vector.transfer_read %[[A]]
+// CHECK-SAME: in_bounds = [true, true, true, true, true, true]{{.*}} : memref<16x16x8x1xf32>, vector<16x16x16x8x[4]x1xf32>
+// CHECK: %[[VEC_B:.*]] = vector.transfer_read %[[B]]
+// CHECK-SAME: in_bounds = [true, true, true, true, true, true]{{.*}} : memref<16x16x?x1xf32>, vector<16x16x16x8x[4]x1xf32>
+// CHECK: %[[VEC_C:.*]] = vector.transfer_read %[[C_IN]]
+// CHECK-SAME: in_bounds = [true, true, true, true]{{.*}} : memref<16x16x8x?xf32>, vector<16x16x8x[4]xf32>
// CHECK: %[[MUL:.*]] = arith.mulf %[[VEC_A]], %[[VEC_B]] : vector<16x16x16x8x[4]x1xf32>
// CHECK: %[[RED:.*]] = vector.multi_reduction <add>, %[[MUL]], %[[VEC_C]] [2, 5] : vector<16x16x16x8x[4]x1xf32> to vector<16x16x8x[4]xf32>
-// CHECK: vector.transfer_write %[[RED]], %[[C_IN]]{{.*}} : vector<16x16x8x[4]xf32>, memref<16x16x8x?xf32>
+// CHECK: vector.transfer_write %[[RED]], %[[C_IN]]
+// CHECK-SAME: in_bounds = [true, true, true, true]{{.*}} : vector<16x16x8x[4]xf32>, memref<16x16x8x?xf32>
module attributes {transform.with_named_sequence} {
transform.named_sequence @__transform_main(%arg1: !transform.any_op {transform.readonly}) {
@@ -1011,12 +1015,16 @@ func.func @batch_mmt4d_scalable_with_assume(%A: memref<2x16x16x8x1xf32>, %B: mem
// CHECK-SAME: %[[B:.*]]: memref<2x16x16x?x1xf32>,
// CHECK-SAME: %[[C_IN:.*]]: memref<2x16x16x8x?xf32>) {
// CHECK-NOT: mask
-// CHECK: %[[VEC_A:.*]] = vector.transfer_read %[[A]]{{.*}} : memref<2x16x16x8x1xf32>, vector<2x16x16x16x8x[4]x1xf32>
-// CHECK: %[[VEC_B:.*]] = vector.transfer_read %[[B]]{{.*}} : memref<2x16x16x?x1xf32>, vector<2x16x16x16x8x[4]x1xf32>
-// CHECK: %[[VEC_C:.*]] = vector.transfer_read %[[C_IN]]{{.*}} : memref<2x16x16x8x?xf32>, vector<2x16x16x8x[4]xf32>
+// CHECK: %[[VEC_A:.*]] = vector.transfer_read %[[A]]
+// CHECK-SAME: in_bounds = [true, true, true, true, true, true, true]{{.*}} : memref<2x16x16x8x1xf32>, vector<2x16x16x16x8x[4]x1xf32>
+// CHECK: %[[VEC_B:.*]] = vector.transfer_read %[[B]]
+// CHECK-SAME: in_bounds = [true, true, true, true, true, true, true]{{.*}} : memref<2x16x16x?x1xf32>, vector<2x16x16x16x8x[4]x1xf32>
+// CHECK: %[[VEC_C:.*]] = vector.transfer_read %[[C_IN]]
+// CHECK-SAME: in_bounds = [true, true, true, true, true]{{.*}} : memref<2x16x16x8x?xf32>, vector<2x16x16x8x[4]xf32>
// CHECK: %[[MUL:.*]] = arith.mulf %[[VEC_A]], %[[VEC_B]] : vector<2x16x16x16x8x[4]x1xf32>
// CHECK: %[[RED:.*]] = vector.multi_reduction <add>, %[[MUL]], %[[VEC_C]] [3, 6] : vector<2x16x16x16x8x[4]x1xf32> to vector<2x16x16x8x[4]xf32>
-// CHECK: vector.transfer_write %[[RED]], %[[C_IN]]{{.*}} : vector<2x16x16x8x[4]xf32>, memref<2x16x16x8x?xf32>
+// CHECK: vector.transfer_write %[[RED]], %[[C_IN]]
+// CHECK-SAME: in_bounds = [true, true, true, true, true]{{.*}} : vector<2x16x16x8x[4]xf32>, memref<2x16x16x8x?xf32>
module attributes {transform.with_named_sequence} {
transform.named_sequence @__transform_main(%arg1: !transform.any_op {transform.readonly}) {
|
@llvm/pr-subscribers-mlir Author: Ege Beysel (egebeysel) ChangesThe idea from #146531 was to introduce the flag In the existence of scalable tile sizes, subsequent patterns tend to overwrite the inbounds attribute and introduce masks further down when lowered to loads and stores. This PR explicitly sets the inbounds attribute to all-true for Full diff: https://github.com/llvm/llvm-project/pull/160839.diff 2 Files Affected:
diff --git a/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp b/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
index 15c467b21c81e..4db6057519da5 100644
--- a/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
+++ b/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
@@ -524,6 +524,23 @@ VectorizationState::maskOperation(RewriterBase &rewriter, Operation *opToMask,
if (!mask) {
LDBG() << "No mask required";
+ if (assumeDynamicDimsMatchVecSizes) {
+ LDBG() << "Assuming dynamic dimensions match vector sizes!";
+ // Set inbounds to all-true.
+ llvm::TypeSwitch<Operation *>(opToMask)
+ .Case<vector::TransferReadOp, vector::TransferWriteOp>(
+ [&](auto xferOp) {
+ SmallVector<bool> inBoundsMap(xferOp.getInBounds().size(),
+ true);
+ rewriter.modifyOpInPlace(xferOp, [&]() {
+ xferOp.setInBoundsAttr(
+ rewriter.getBoolArrayAttr(inBoundsMap));
+ });
+ })
+ .Default([](Operation *op) {
+ // No-op if the operation is not an xfer read or write.
+ });
+ }
return opToMask;
}
diff --git a/mlir/test/Dialect/Linalg/vectorization/linalg-ops.mlir b/mlir/test/Dialect/Linalg/vectorization/linalg-ops.mlir
index 62bf1f55c9af2..7fe707902072a 100644
--- a/mlir/test/Dialect/Linalg/vectorization/linalg-ops.mlir
+++ b/mlir/test/Dialect/Linalg/vectorization/linalg-ops.mlir
@@ -918,12 +918,16 @@ func.func @mmt4d_scalable_with_assume(%A: memref<16x16x8x1xf32>, %B: memref<16x1
// CHECK-SAME: %[[B:.*]]: memref<16x16x?x1xf32>,
// CHECK-SAME: %[[C_IN:.*]]: memref<16x16x8x?xf32>) {
// CHECK-NOT: mask
-// CHECK: %[[VEC_A:.*]] = vector.transfer_read %[[A]]{{.*}} : memref<16x16x8x1xf32>, vector<16x16x16x8x[4]x1xf32>
-// CHECK: %[[VEC_B:.*]] = vector.transfer_read %[[B]]{{.*}} : memref<16x16x?x1xf32>, vector<16x16x16x8x[4]x1xf32>
-// CHECK: %[[VEC_C:.*]] = vector.transfer_read %[[C_IN]]{{.*}} : memref<16x16x8x?xf32>, vector<16x16x8x[4]xf32>
+// CHECK: %[[VEC_A:.*]] = vector.transfer_read %[[A]]
+// CHECK-SAME: in_bounds = [true, true, true, true, true, true]{{.*}} : memref<16x16x8x1xf32>, vector<16x16x16x8x[4]x1xf32>
+// CHECK: %[[VEC_B:.*]] = vector.transfer_read %[[B]]
+// CHECK-SAME: in_bounds = [true, true, true, true, true, true]{{.*}} : memref<16x16x?x1xf32>, vector<16x16x16x8x[4]x1xf32>
+// CHECK: %[[VEC_C:.*]] = vector.transfer_read %[[C_IN]]
+// CHECK-SAME: in_bounds = [true, true, true, true]{{.*}} : memref<16x16x8x?xf32>, vector<16x16x8x[4]xf32>
// CHECK: %[[MUL:.*]] = arith.mulf %[[VEC_A]], %[[VEC_B]] : vector<16x16x16x8x[4]x1xf32>
// CHECK: %[[RED:.*]] = vector.multi_reduction <add>, %[[MUL]], %[[VEC_C]] [2, 5] : vector<16x16x16x8x[4]x1xf32> to vector<16x16x8x[4]xf32>
-// CHECK: vector.transfer_write %[[RED]], %[[C_IN]]{{.*}} : vector<16x16x8x[4]xf32>, memref<16x16x8x?xf32>
+// CHECK: vector.transfer_write %[[RED]], %[[C_IN]]
+// CHECK-SAME: in_bounds = [true, true, true, true]{{.*}} : vector<16x16x8x[4]xf32>, memref<16x16x8x?xf32>
module attributes {transform.with_named_sequence} {
transform.named_sequence @__transform_main(%arg1: !transform.any_op {transform.readonly}) {
@@ -1011,12 +1015,16 @@ func.func @batch_mmt4d_scalable_with_assume(%A: memref<2x16x16x8x1xf32>, %B: mem
// CHECK-SAME: %[[B:.*]]: memref<2x16x16x?x1xf32>,
// CHECK-SAME: %[[C_IN:.*]]: memref<2x16x16x8x?xf32>) {
// CHECK-NOT: mask
-// CHECK: %[[VEC_A:.*]] = vector.transfer_read %[[A]]{{.*}} : memref<2x16x16x8x1xf32>, vector<2x16x16x16x8x[4]x1xf32>
-// CHECK: %[[VEC_B:.*]] = vector.transfer_read %[[B]]{{.*}} : memref<2x16x16x?x1xf32>, vector<2x16x16x16x8x[4]x1xf32>
-// CHECK: %[[VEC_C:.*]] = vector.transfer_read %[[C_IN]]{{.*}} : memref<2x16x16x8x?xf32>, vector<2x16x16x8x[4]xf32>
+// CHECK: %[[VEC_A:.*]] = vector.transfer_read %[[A]]
+// CHECK-SAME: in_bounds = [true, true, true, true, true, true, true]{{.*}} : memref<2x16x16x8x1xf32>, vector<2x16x16x16x8x[4]x1xf32>
+// CHECK: %[[VEC_B:.*]] = vector.transfer_read %[[B]]
+// CHECK-SAME: in_bounds = [true, true, true, true, true, true, true]{{.*}} : memref<2x16x16x?x1xf32>, vector<2x16x16x16x8x[4]x1xf32>
+// CHECK: %[[VEC_C:.*]] = vector.transfer_read %[[C_IN]]
+// CHECK-SAME: in_bounds = [true, true, true, true, true]{{.*}} : memref<2x16x16x8x?xf32>, vector<2x16x16x8x[4]xf32>
// CHECK: %[[MUL:.*]] = arith.mulf %[[VEC_A]], %[[VEC_B]] : vector<2x16x16x16x8x[4]x1xf32>
// CHECK: %[[RED:.*]] = vector.multi_reduction <add>, %[[MUL]], %[[VEC_C]] [3, 6] : vector<2x16x16x16x8x[4]x1xf32> to vector<2x16x16x8x[4]xf32>
-// CHECK: vector.transfer_write %[[RED]], %[[C_IN]]{{.*}} : vector<2x16x16x8x[4]xf32>, memref<2x16x16x8x?xf32>
+// CHECK: vector.transfer_write %[[RED]], %[[C_IN]]
+// CHECK-SAME: in_bounds = [true, true, true, true, true]{{.*}} : vector<2x16x16x8x[4]xf32>, memref<2x16x16x8x?xf32>
module attributes {transform.with_named_sequence} {
transform.named_sequence @__transform_main(%arg1: !transform.any_op {transform.readonly}) {
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good % minor comments. I am away next week, so approving as is to unblock you.
Thanks for fixing this 🙏🏻
LDBG() << "No mask required"; | ||
if (assumeDynamicDimsMatchVecSizes) { | ||
LDBG() << "Assuming dynamic dimensions match vector sizes!"; | ||
// Set inbounds to all-true. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment tells me what is happening, but that's also what the code says :) Could you instead clarify why we need this extra step? I would also update the LLDB
message.
Basically,
- xfer_read/xfer_write are special and hence this special treatment (this does not concern any other Ops)
- We need to set
in_bounds
explicitly as otherwise things further down the line will assume "out-of-bouds".
// Set inbounds to all-true. | |
For vector.transfer_read and vector.transfer_write, there is also the `in-bounds` attribute that we need to set explicitly. Otherwise, "out-of-bounds" access will be assumed and masks will be generated. |
if (!mask) { | ||
LDBG() << "No mask required"; | ||
if (assumeDynamicDimsMatchVecSizes) { | ||
LDBG() << "Assuming dynamic dimensions match vector sizes!"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would move this inside the lambda. Otherwise it will be printed for every Op, which could be noise, no?
LDBG() << "Assuming dynamic dimensions match vector sizes!"; | |
LDBG() << "Assuming dynamic dimensions match vector sizes, set in_bounds to true!"; |
SmallVector<bool> inBoundsMap(xferOp.getInBounds().size(), | ||
true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note, assumeDynamicDimsMatchVecSizes
only applies to dynamic sizes ;-) Static should remain "out-of-bounds". IIRC, something else will later infer that it's in fact "in bounds" 🤞🏻
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, completely missed that 🥲 Thanks!
The idea from #146531 was to introduce the flag
assumeDynamicDimsMatchVecSizes
, to signal the vectorizer that the access should not be masked and is in-bounds. Though the masking part is handled,xfer_read/write
ops are created without explicitly setting the inbounds attribute, which defaults to all-false.In the existence of scalable tile sizes, subsequent patterns tend to overwrite the inbounds attribute and introduce masks further down when lowered to loads and stores. This PR explicitly sets the inbounds attribute to all-true for
xfer_read/write
ops if theassumeDynamicDimsMatchVecSizes
flag is set.