[mlir][linalg] set inbounds on `xfer_read/writes` for `assumeDynamicDimsMatchVecSizes` #160839

egebeysel · 2025-09-26T09:19:02Z

The idea from #146531 was to introduce the flag assumeDynamicDimsMatchVecSizes, to signal the vectorizer that the access should not be masked and is in-bounds. Though the masking part is handled, xfer_read/write ops are created without explicitly setting the inbounds attribute, which defaults to all-false.

In the existence of scalable tile sizes, subsequent patterns tend to overwrite the inbounds attribute and introduce masks further down when lowered to loads and stores. This PR explicitly sets the inbounds attribute to all-true for xfer_read/write ops if the assumeDynamicDimsMatchVecSizes flag is set.

…umeDynamicDimsMatchVecSizes

llvmbot · 2025-09-26T09:19:37Z

@llvm/pr-subscribers-mlir-linalg

Author: Ege Beysel (egebeysel)

Changes

The idea from #146531 was to introduce the flag assumeDynamicDimsMatchVecSizes, to signal the vectorizer that the access should not be masked and is in-bounds. Though the masking part is handled, xfer_read/write ops are created without explicitly setting the inbounds attribute, which defaults to all-false.

In the existence of scalable tile sizes, subsequent patterns tend to overwrite the inbounds attribute and introduce masks further down when lowered to loads and stores. This PR explicitly sets the inbounds attribute to all-true for xfer_read/write ops if the assumeDynamicDimsMatchVecSizes flag is set.

Full diff: https://github.com/llvm/llvm-project/pull/160839.diff

2 Files Affected:

(modified) mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp (+17)
(modified) mlir/test/Dialect/Linalg/vectorization/linalg-ops.mlir (+16-8)

diff --git a/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp b/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
index 15c467b21c81e..4db6057519da5 100644
--- a/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
+++ b/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
@@ -524,6 +524,23 @@ VectorizationState::maskOperation(RewriterBase &rewriter, Operation *opToMask,
 
   if (!mask) {
     LDBG() << "No mask required";
+    if (assumeDynamicDimsMatchVecSizes) {
+      LDBG() << "Assuming dynamic dimensions match vector sizes!";
+      // Set inbounds to all-true.
+      llvm::TypeSwitch<Operation *>(opToMask)
+          .Case<vector::TransferReadOp, vector::TransferWriteOp>(
+              [&](auto xferOp) {
+                SmallVector<bool> inBoundsMap(xferOp.getInBounds().size(),
+                                              true);
+                rewriter.modifyOpInPlace(xferOp, [&]() {
+                  xferOp.setInBoundsAttr(
+                      rewriter.getBoolArrayAttr(inBoundsMap));
+                });
+              })
+          .Default([](Operation *op) {
+            // No-op if the operation is not an xfer read or write.
+          });
+    }
     return opToMask;
   }
 
diff --git a/mlir/test/Dialect/Linalg/vectorization/linalg-ops.mlir b/mlir/test/Dialect/Linalg/vectorization/linalg-ops.mlir
index 62bf1f55c9af2..7fe707902072a 100644
--- a/mlir/test/Dialect/Linalg/vectorization/linalg-ops.mlir
+++ b/mlir/test/Dialect/Linalg/vectorization/linalg-ops.mlir
@@ -918,12 +918,16 @@ func.func @mmt4d_scalable_with_assume(%A: memref<16x16x8x1xf32>, %B: memref<16x1
 // CHECK-SAME:      %[[B:.*]]: memref<16x16x?x1xf32>,
 // CHECK-SAME:      %[[C_IN:.*]]: memref<16x16x8x?xf32>) {
 // CHECK-NOT:       mask
-// CHECK:           %[[VEC_A:.*]] = vector.transfer_read %[[A]]{{.*}} : memref<16x16x8x1xf32>, vector<16x16x16x8x[4]x1xf32>
-// CHECK:           %[[VEC_B:.*]] = vector.transfer_read %[[B]]{{.*}} : memref<16x16x?x1xf32>, vector<16x16x16x8x[4]x1xf32>
-// CHECK:           %[[VEC_C:.*]] = vector.transfer_read %[[C_IN]]{{.*}} : memref<16x16x8x?xf32>, vector<16x16x8x[4]xf32>
+// CHECK:           %[[VEC_A:.*]] = vector.transfer_read %[[A]]
+// CHECK-SAME:      in_bounds = [true, true, true, true, true, true]{{.*}} : memref<16x16x8x1xf32>, vector<16x16x16x8x[4]x1xf32>
+// CHECK:           %[[VEC_B:.*]] = vector.transfer_read %[[B]]
+// CHECK-SAME:      in_bounds = [true, true, true, true, true, true]{{.*}} : memref<16x16x?x1xf32>, vector<16x16x16x8x[4]x1xf32>
+// CHECK:           %[[VEC_C:.*]] = vector.transfer_read %[[C_IN]]
+// CHECK-SAME:      in_bounds = [true, true, true, true]{{.*}} : memref<16x16x8x?xf32>, vector<16x16x8x[4]xf32>
 // CHECK:           %[[MUL:.*]] = arith.mulf %[[VEC_A]], %[[VEC_B]] : vector<16x16x16x8x[4]x1xf32>
 // CHECK:           %[[RED:.*]] = vector.multi_reduction <add>, %[[MUL]], %[[VEC_C]] [2, 5] : vector<16x16x16x8x[4]x1xf32> to vector<16x16x8x[4]xf32>
-// CHECK:           vector.transfer_write %[[RED]], %[[C_IN]]{{.*}} : vector<16x16x8x[4]xf32>, memref<16x16x8x?xf32>
+// CHECK:           vector.transfer_write %[[RED]], %[[C_IN]]
+// CHECK-SAME:      in_bounds = [true, true, true, true]{{.*}} : vector<16x16x8x[4]xf32>, memref<16x16x8x?xf32>
 
 module attributes {transform.with_named_sequence} {
   transform.named_sequence @__transform_main(%arg1: !transform.any_op {transform.readonly}) {
@@ -1011,12 +1015,16 @@ func.func @batch_mmt4d_scalable_with_assume(%A: memref<2x16x16x8x1xf32>, %B: mem
 // CHECK-SAME:      %[[B:.*]]: memref<2x16x16x?x1xf32>,
 // CHECK-SAME:      %[[C_IN:.*]]: memref<2x16x16x8x?xf32>) {
 // CHECK-NOT:       mask
-// CHECK:           %[[VEC_A:.*]] = vector.transfer_read %[[A]]{{.*}} : memref<2x16x16x8x1xf32>, vector<2x16x16x16x8x[4]x1xf32>
-// CHECK:           %[[VEC_B:.*]] = vector.transfer_read %[[B]]{{.*}} : memref<2x16x16x?x1xf32>, vector<2x16x16x16x8x[4]x1xf32>
-// CHECK:           %[[VEC_C:.*]] = vector.transfer_read %[[C_IN]]{{.*}} : memref<2x16x16x8x?xf32>, vector<2x16x16x8x[4]xf32>
+// CHECK:           %[[VEC_A:.*]] = vector.transfer_read %[[A]]
+// CHECK-SAME:      in_bounds = [true, true, true, true, true, true, true]{{.*}} : memref<2x16x16x8x1xf32>, vector<2x16x16x16x8x[4]x1xf32>
+// CHECK:           %[[VEC_B:.*]] = vector.transfer_read %[[B]]
+// CHECK-SAME:      in_bounds = [true, true, true, true, true, true, true]{{.*}} : memref<2x16x16x?x1xf32>, vector<2x16x16x16x8x[4]x1xf32>
+// CHECK:           %[[VEC_C:.*]] = vector.transfer_read %[[C_IN]]
+// CHECK-SAME:      in_bounds = [true, true, true, true, true]{{.*}} : memref<2x16x16x8x?xf32>, vector<2x16x16x8x[4]xf32>
 // CHECK:           %[[MUL:.*]] = arith.mulf %[[VEC_A]], %[[VEC_B]] : vector<2x16x16x16x8x[4]x1xf32>
 // CHECK:           %[[RED:.*]] = vector.multi_reduction <add>, %[[MUL]], %[[VEC_C]] [3, 6] : vector<2x16x16x16x8x[4]x1xf32> to vector<2x16x16x8x[4]xf32>
-// CHECK:           vector.transfer_write %[[RED]], %[[C_IN]]{{.*}} : vector<2x16x16x8x[4]xf32>, memref<2x16x16x8x?xf32>
+// CHECK:           vector.transfer_write %[[RED]], %[[C_IN]]
+// CHECK-SAME:      in_bounds = [true, true, true, true, true]{{.*}} : vector<2x16x16x8x[4]xf32>, memref<2x16x16x8x?xf32>
 
 module attributes {transform.with_named_sequence} {
   transform.named_sequence @__transform_main(%arg1: !transform.any_op {transform.readonly}) {

llvmbot · 2025-09-26T09:19:37Z

@llvm/pr-subscribers-mlir

Author: Ege Beysel (egebeysel)

Changes

The idea from #146531 was to introduce the flag assumeDynamicDimsMatchVecSizes, to signal the vectorizer that the access should not be masked and is in-bounds. Though the masking part is handled, xfer_read/write ops are created without explicitly setting the inbounds attribute, which defaults to all-false.

In the existence of scalable tile sizes, subsequent patterns tend to overwrite the inbounds attribute and introduce masks further down when lowered to loads and stores. This PR explicitly sets the inbounds attribute to all-true for xfer_read/write ops if the assumeDynamicDimsMatchVecSizes flag is set.

Full diff: https://github.com/llvm/llvm-project/pull/160839.diff

2 Files Affected:

(modified) mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp (+17)
(modified) mlir/test/Dialect/Linalg/vectorization/linalg-ops.mlir (+16-8)

diff --git a/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp b/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
index 15c467b21c81e..4db6057519da5 100644
--- a/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
+++ b/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
@@ -524,6 +524,23 @@ VectorizationState::maskOperation(RewriterBase &rewriter, Operation *opToMask,
 
   if (!mask) {
     LDBG() << "No mask required";
+    if (assumeDynamicDimsMatchVecSizes) {
+      LDBG() << "Assuming dynamic dimensions match vector sizes!";
+      // Set inbounds to all-true.
+      llvm::TypeSwitch<Operation *>(opToMask)
+          .Case<vector::TransferReadOp, vector::TransferWriteOp>(
+              [&](auto xferOp) {
+                SmallVector<bool> inBoundsMap(xferOp.getInBounds().size(),
+                                              true);
+                rewriter.modifyOpInPlace(xferOp, [&]() {
+                  xferOp.setInBoundsAttr(
+                      rewriter.getBoolArrayAttr(inBoundsMap));
+                });
+              })
+          .Default([](Operation *op) {
+            // No-op if the operation is not an xfer read or write.
+          });
+    }
     return opToMask;
   }
 
diff --git a/mlir/test/Dialect/Linalg/vectorization/linalg-ops.mlir b/mlir/test/Dialect/Linalg/vectorization/linalg-ops.mlir
index 62bf1f55c9af2..7fe707902072a 100644
--- a/mlir/test/Dialect/Linalg/vectorization/linalg-ops.mlir
+++ b/mlir/test/Dialect/Linalg/vectorization/linalg-ops.mlir
@@ -918,12 +918,16 @@ func.func @mmt4d_scalable_with_assume(%A: memref<16x16x8x1xf32>, %B: memref<16x1
 // CHECK-SAME:      %[[B:.*]]: memref<16x16x?x1xf32>,
 // CHECK-SAME:      %[[C_IN:.*]]: memref<16x16x8x?xf32>) {
 // CHECK-NOT:       mask
-// CHECK:           %[[VEC_A:.*]] = vector.transfer_read %[[A]]{{.*}} : memref<16x16x8x1xf32>, vector<16x16x16x8x[4]x1xf32>
-// CHECK:           %[[VEC_B:.*]] = vector.transfer_read %[[B]]{{.*}} : memref<16x16x?x1xf32>, vector<16x16x16x8x[4]x1xf32>
-// CHECK:           %[[VEC_C:.*]] = vector.transfer_read %[[C_IN]]{{.*}} : memref<16x16x8x?xf32>, vector<16x16x8x[4]xf32>
+// CHECK:           %[[VEC_A:.*]] = vector.transfer_read %[[A]]
+// CHECK-SAME:      in_bounds = [true, true, true, true, true, true]{{.*}} : memref<16x16x8x1xf32>, vector<16x16x16x8x[4]x1xf32>
+// CHECK:           %[[VEC_B:.*]] = vector.transfer_read %[[B]]
+// CHECK-SAME:      in_bounds = [true, true, true, true, true, true]{{.*}} : memref<16x16x?x1xf32>, vector<16x16x16x8x[4]x1xf32>
+// CHECK:           %[[VEC_C:.*]] = vector.transfer_read %[[C_IN]]
+// CHECK-SAME:      in_bounds = [true, true, true, true]{{.*}} : memref<16x16x8x?xf32>, vector<16x16x8x[4]xf32>
 // CHECK:           %[[MUL:.*]] = arith.mulf %[[VEC_A]], %[[VEC_B]] : vector<16x16x16x8x[4]x1xf32>
 // CHECK:           %[[RED:.*]] = vector.multi_reduction <add>, %[[MUL]], %[[VEC_C]] [2, 5] : vector<16x16x16x8x[4]x1xf32> to vector<16x16x8x[4]xf32>
-// CHECK:           vector.transfer_write %[[RED]], %[[C_IN]]{{.*}} : vector<16x16x8x[4]xf32>, memref<16x16x8x?xf32>
+// CHECK:           vector.transfer_write %[[RED]], %[[C_IN]]
+// CHECK-SAME:      in_bounds = [true, true, true, true]{{.*}} : vector<16x16x8x[4]xf32>, memref<16x16x8x?xf32>
 
 module attributes {transform.with_named_sequence} {
   transform.named_sequence @__transform_main(%arg1: !transform.any_op {transform.readonly}) {
@@ -1011,12 +1015,16 @@ func.func @batch_mmt4d_scalable_with_assume(%A: memref<2x16x16x8x1xf32>, %B: mem
 // CHECK-SAME:      %[[B:.*]]: memref<2x16x16x?x1xf32>,
 // CHECK-SAME:      %[[C_IN:.*]]: memref<2x16x16x8x?xf32>) {
 // CHECK-NOT:       mask
-// CHECK:           %[[VEC_A:.*]] = vector.transfer_read %[[A]]{{.*}} : memref<2x16x16x8x1xf32>, vector<2x16x16x16x8x[4]x1xf32>
-// CHECK:           %[[VEC_B:.*]] = vector.transfer_read %[[B]]{{.*}} : memref<2x16x16x?x1xf32>, vector<2x16x16x16x8x[4]x1xf32>
-// CHECK:           %[[VEC_C:.*]] = vector.transfer_read %[[C_IN]]{{.*}} : memref<2x16x16x8x?xf32>, vector<2x16x16x8x[4]xf32>
+// CHECK:           %[[VEC_A:.*]] = vector.transfer_read %[[A]]
+// CHECK-SAME:      in_bounds = [true, true, true, true, true, true, true]{{.*}} : memref<2x16x16x8x1xf32>, vector<2x16x16x16x8x[4]x1xf32>
+// CHECK:           %[[VEC_B:.*]] = vector.transfer_read %[[B]]
+// CHECK-SAME:      in_bounds = [true, true, true, true, true, true, true]{{.*}} : memref<2x16x16x?x1xf32>, vector<2x16x16x16x8x[4]x1xf32>
+// CHECK:           %[[VEC_C:.*]] = vector.transfer_read %[[C_IN]]
+// CHECK-SAME:      in_bounds = [true, true, true, true, true]{{.*}} : memref<2x16x16x8x?xf32>, vector<2x16x16x8x[4]xf32>
 // CHECK:           %[[MUL:.*]] = arith.mulf %[[VEC_A]], %[[VEC_B]] : vector<2x16x16x16x8x[4]x1xf32>
 // CHECK:           %[[RED:.*]] = vector.multi_reduction <add>, %[[MUL]], %[[VEC_C]] [3, 6] : vector<2x16x16x16x8x[4]x1xf32> to vector<2x16x16x8x[4]xf32>
-// CHECK:           vector.transfer_write %[[RED]], %[[C_IN]]{{.*}} : vector<2x16x16x8x[4]xf32>, memref<2x16x16x8x?xf32>
+// CHECK:           vector.transfer_write %[[RED]], %[[C_IN]]
+// CHECK-SAME:      in_bounds = [true, true, true, true, true]{{.*}} : vector<2x16x16x8x[4]xf32>, memref<2x16x16x8x?xf32>
 
 module attributes {transform.with_named_sequence} {
   transform.named_sequence @__transform_main(%arg1: !transform.any_op {transform.readonly}) {

banach-space

Overall looks good % minor comments. I am away next week, so approving as is to unblock you.

Thanks for fixing this 🙏🏻

banach-space · 2025-09-26T11:01:52Z

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp

    LDBG() << "No mask required";
+    if (assumeDynamicDimsMatchVecSizes) {
+      LDBG() << "Assuming dynamic dimensions match vector sizes!";
+      // Set inbounds to all-true.


This comment tells me what is happening, but that's also what the code says :) Could you instead clarify why we need this extra step? I would also update the LLDB message.

Basically,

xfer_read/xfer_write are special and hence this special treatment (this does not concern any other Ops)

We need to set in_bounds explicitly as otherwise things further down the line will assume "out-of-bouds".

Suggested change

// Set inbounds to all-true.

For vector.transfer_read and vector.transfer_write, there is also the `in-bounds` attribute that we need to set explicitly. Otherwise, "out-of-bounds" access will be assumed and masks will be generated.

banach-space · 2025-09-26T11:02:37Z

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp

  if (!mask) {
    LDBG() << "No mask required";
+    if (assumeDynamicDimsMatchVecSizes) {
+      LDBG() << "Assuming dynamic dimensions match vector sizes!";


I would move this inside the lambda. Otherwise it will be printed for every Op, which could be noise, no?

Suggested change

LDBG() << "Assuming dynamic dimensions match vector sizes!";

LDBG() << "Assuming dynamic dimensions match vector sizes, set in_bounds to true!";

banach-space · 2025-09-26T11:03:45Z

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp

+                SmallVector<bool> inBoundsMap(xferOp.getInBounds().size(),
+                                              true);


Note, assumeDynamicDimsMatchVecSizes only applies to dynamic sizes ;-) Static should remain "out-of-bounds". IIRC, something else will later infer that it's in fact "in bounds" 🤞🏻

Oh, completely missed that 🥲 Thanks!

[mlir][linalg] set inbounds to all-true for xfer reads/writes for ass…

9987bcc

…umeDynamicDimsMatchVecSizes

egebeysel requested review from banach-space, dcaballe, hanhanW, nicolasvasilache and Groverkss as code owners September 26, 2025 09:19

llvmbot added mlir:linalg mlir labels Sep 26, 2025

banach-space approved these changes Sep 26, 2025

View reviewed changes

egebeysel added a commit to egebeysel/llvm-project that referenced this pull request Sep 27, 2025

cherry-pick llvm#160839 : set inbounds to true

e048a71

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[mlir][linalg] set inbounds on `xfer_read/writes` for `assumeDynamicDimsMatchVecSizes` #160839

[mlir][linalg] set inbounds on `xfer_read/writes` for `assumeDynamicDimsMatchVecSizes` #160839

Uh oh!

egebeysel commented Sep 26, 2025

Uh oh!

llvmbot commented Sep 26, 2025

Uh oh!

llvmbot commented Sep 26, 2025

Uh oh!

banach-space left a comment

Uh oh!

banach-space Sep 26, 2025

Uh oh!

banach-space Sep 26, 2025

Uh oh!

banach-space Sep 26, 2025

Uh oh!

egebeysel Sep 26, 2025

Uh oh!

Uh oh!

	// Set inbounds to all-true.
	For vector.transfer_read and vector.transfer_write, there is also the `in-bounds` attribute that we need to set explicitly. Otherwise, "out-of-bounds" access will be assumed and masks will be generated.

	LDBG() << "Assuming dynamic dimensions match vector sizes!";
	LDBG() << "Assuming dynamic dimensions match vector sizes, set in_bounds to true!";

		SmallVector<bool> inBoundsMap(xferOp.getInBounds().size(),
		true);

[mlir][linalg] set inbounds on xfer_read/writes for assumeDynamicDimsMatchVecSizes #160839

Are you sure you want to change the base?

[mlir][linalg] set inbounds on xfer_read/writes for assumeDynamicDimsMatchVecSizes #160839

Uh oh!

Conversation

egebeysel commented Sep 26, 2025

Uh oh!

llvmbot commented Sep 26, 2025

Uh oh!

llvmbot commented Sep 26, 2025

Uh oh!

banach-space left a comment

Choose a reason for hiding this comment

Uh oh!

banach-space Sep 26, 2025

Choose a reason for hiding this comment

Uh oh!

banach-space Sep 26, 2025

Choose a reason for hiding this comment

Uh oh!

banach-space Sep 26, 2025

Choose a reason for hiding this comment

Uh oh!

egebeysel Sep 26, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

[mlir][linalg] set inbounds on `xfer_read/writes` for `assumeDynamicDimsMatchVecSizes` #160839

[mlir][linalg] set inbounds on `xfer_read/writes` for `assumeDynamicDimsMatchVecSizes` #160839