
[MLIR][tensor] Improve tensor.pack verifier to catch more cases with unconditional runtime errors #77217

Merged: 7 commits, Feb 19, 2024

Conversation

@srcarroll (Contributor) commented Jan 7, 2024

Previously, the tensor.pack verifier detected unconditional runtime errors only when tile sizes were static. Now dynamic tiles are also considered: the verifier only requires the input size and either the corresponding tile size or output size to be static in order to determine whether the op will unconditionally produce an error at runtime.
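For example (this is the regression test added by this PR): the tiles below are dynamic, but the static input and output shapes already decide the outcome. The source dimension of size 256 would have to split into 10 outer tiles, and since 10 does not divide 256 no runtime tile size can produce only full tiles, so the op is rejected unless a padding_value is set:

%0 = tensor.pack %input inner_dims_pos = [1, 0] inner_tiles = [%tile_size_0, %tile_size_1]
    into %output : tensor<256x128xf32> -> tensor<10x8x?x?xf32>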

@llvmbot (Collaborator) commented Jan 7, 2024

@llvm/pr-subscribers-mlir-linalg
@llvm/pr-subscribers-mlir

@llvm/pr-subscribers-mlir-tensor

Author: None (srcarroll)

Changes

Previously, the tensor.pack verifier detected undefined behavior only when tile sizes were static. Now dynamic tiles are also considered, and only the input size and either the corresponding tile size or output size need to be static to determine UB.


Full diff: https://github.com/llvm/llvm-project/pull/77217.diff

4 Files Affected:

  • (modified) mlir/include/mlir/Dialect/Tensor/IR/TensorOps.td (+4-3)
  • (modified) mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp (+4-2)
  • (modified) mlir/lib/Dialect/Tensor/IR/TensorOps.cpp (+23-7)
  • (modified) mlir/test/Dialect/Tensor/invalid.mlir (+9-1)
diff --git a/mlir/include/mlir/Dialect/Tensor/IR/TensorOps.td b/mlir/include/mlir/Dialect/Tensor/IR/TensorOps.td
index eb0c79c01bee1a..1c61ece2676a90 100644
--- a/mlir/include/mlir/Dialect/Tensor/IR/TensorOps.td
+++ b/mlir/include/mlir/Dialect/Tensor/IR/TensorOps.td
@@ -1943,11 +1943,12 @@ def Tensor_PackOp : Tensor_RelayoutOp<"pack", [
 
     // Returns true if we have enough static information to catch undefined
     // behavior when the tile size does not divide perfectly the dimension of
-    // the input tensor. If a given dimension or a tile associated with it is
-    // dynamic, the dimension is not considered as we don't have enough static
-    // information to understand if the tile perfectly divides that dimension.
+    // the input tensor. Detecting UB requires that the input size and either
+    // corresponding tile or output size are static.
     static bool requirePaddingValue(ArrayRef<int64_t> inputShape,
                                     ArrayRef<int64_t> innerDimsPos,
+                                    ArrayRef<int64_t> outputShape,
+                                    ArrayRef<int64_t> outerDimsPerm,
                                     ArrayRef<OpFoldResult> innerTiles);
 
     static Value createDestinationTensor(OpBuilder &b, Location loc,
diff --git a/mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp b/mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp
index 9d230e2c2e5749..c7fed41d234fd0 100644
--- a/mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp
+++ b/mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp
@@ -582,8 +582,10 @@ FailureOr<PackResult> linalg::pack(RewriterBase &rewriter,
             return getConstantIntValue(tile).has_value();
           });
       if (areConstantTiles && operandType.hasStaticShape() &&
-          !tensor::PackOp::requirePaddingValue(operandType.getShape(), innerPos,
-                                               innerPackSizes)) {
+          !tensor::PackOp::requirePaddingValue(
+              operandType.getShape(), innerPos,
+              dest.getType().cast<ShapedType>().getShape(), {},
+              innerPackSizes)) {
         packOps.push_back(rewriter.create<tensor::PackOp>(
             loc, operand, dest, innerPos, innerPackSizes));
       } else {
diff --git a/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp b/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
index 816e6ba8fed94e..4318e55fd213e3 100644
--- a/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
+++ b/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
@@ -3742,14 +3742,27 @@ SmallVector<int64_t> PackOp::getStaticTiles() {
 
 bool PackOp::requirePaddingValue(ArrayRef<int64_t> inputShape,
                                  ArrayRef<int64_t> innerDimsPos,
+                                 ArrayRef<int64_t> outputShape,
+                                 ArrayRef<int64_t> outerDimsPerm,
                                  ArrayRef<OpFoldResult> innerTiles) {
+  SmallVector<int64_t> outputTileSizes(
+      outputShape.take_front(inputShape.size()));
+  if (!outerDimsPerm.empty()) {
+    assert(outerDimsPerm.size() == outputTileSizes.size() &&
+           "expected output and outer_dims_perm to have same size");
+    applyPermutationToVector(outputTileSizes,
+                             invertPermutationVector(outerDimsPerm));
+  }
   for (auto [pos, tileSize] : llvm::zip_equal(innerDimsPos, innerTiles)) {
     if (ShapedType::isDynamic(inputShape[pos]))
       continue;
     std::optional<int64_t> constantTile = getConstantIntValue(tileSize);
-    if (!constantTile)
-      continue;
-    if (inputShape[pos] % (*constantTile) != 0)
+
+    if (!constantTile) {
+      if (!ShapedType::isDynamic(outputTileSizes[pos]) &&
+          (inputShape[pos] % outputTileSizes[pos] != 0))
+        return true;
+    } else if (inputShape[pos] % (*constantTile) != 0)
       return true;
   }
   return false;
@@ -3772,9 +3785,11 @@ LogicalResult PackOp::verify() {
 
   if (!paddingValue &&
       requirePaddingValue(getSourceType().getShape(), getInnerDimsPos(),
+                          getDestType().getShape(), getOuterDimsPerm(),
                           getMixedTiles())) {
-    return emitOpError("invalid tile factor provided. Only full tiles are "
-                       "supported when padding_value is not set");
+    return emitOpError(
+        "invalid tile factor or output size provided. Only full tiles are "
+        "supported when padding_value is not set");
   }
   return success();
 }
@@ -3975,8 +3990,9 @@ static bool paddingIsNotNeeded(PackOp op) {
     return false;
   if (ShapedType::isDynamicShape(op.getStaticInnerTiles()))
     return false;
-  return !PackOp::requirePaddingValue(srcType.getShape(), op.getInnerDimsPos(),
-                                      op.getMixedTiles());
+  return !PackOp::requirePaddingValue(
+      srcType.getShape(), op.getInnerDimsPos(), op.getDestType().getShape(),
+      op.getOuterDimsPerm(), op.getMixedTiles());
 }
 
 LogicalResult PackOp::canonicalize(PackOp packOp, PatternRewriter &rewriter) {
diff --git a/mlir/test/Dialect/Tensor/invalid.mlir b/mlir/test/Dialect/Tensor/invalid.mlir
index 735e5146e9dbc8..0eb8672d4d41b3 100644
--- a/mlir/test/Dialect/Tensor/invalid.mlir
+++ b/mlir/test/Dialect/Tensor/invalid.mlir
@@ -597,13 +597,21 @@ func.func @empty_wrong_number_of_operands(%sz : index) {
 // -----
 
 func.func @pack_invalid_no_padding_no_full_tiles(%input: tensor<256x128xf32>, %output: tensor<8x8x16x33xf32>) -> tensor<8x8x16x33xf32> {
-  // expected-error@+1 {{invalid tile factor provided. Only full tiles are supported when padding_value is not set}}
+  // expected-error@+1 {{invalid tile factor or output size provided. Only full tiles are supported when padding_value is not set}}
   %0 = tensor.pack %input inner_dims_pos = [1, 0] inner_tiles = [16, 33] into %output : tensor<256x128xf32>  -> tensor<8x8x16x33xf32>
   return %0 : tensor<8x8x16x33xf32>
 }
 
 // -----
 
+func.func @pack_invalid_no_padding_no_full_tiles_dyn_tiles(%input: tensor<256x128xf32>, %output: tensor<10x8x?x?xf32>, %tile_size_0: index, %tile_size_1: index) -> tensor<10x8x?x?xf32> {
+  // expected-error@+1 {{invalid tile factor or output size provided. Only full tiles are supported when padding_value is not set}}
+  %0 = tensor.pack %input inner_dims_pos = [1, 0] inner_tiles = [%tile_size_0, %tile_size_1] into %output : tensor<256x128xf32>  -> tensor<10x8x?x?xf32>
+  return %0 : tensor<10x8x?x?xf32>
+} 
+
+// -----
+
 func.func @pad_and_pack_invalid_type(%input: tensor<13x15xf32>, %output: tensor<2x8x8x2xf32>, %pad: i32) -> tensor<2x8x8x2xf32> {
   // expected-error@+1 {{expected padding_value has 'f32' but got: 'i32'}}
   %0 = tensor.pack %input padding_value(%pad: i32) inner_dims_pos = [0, 1] inner_tiles = [8, 2] into %output : tensor<13x15xf32> -> tensor<2x8x8x2xf32>

@srcarroll srcarroll changed the title Improve tensor.pack verifier to catch more UB cases [MLIR][tensor] Improve tensor.pack verifier to catch more UB cases Jan 7, 2024
@joker-eph (Collaborator)

Improve tensor.pack verifier to catch more UB cases

I'm confused to see "verifier" and "UB" in the same sentence: these are non-overlapping concepts. If the verifier defines an IR to be invalid, it can't be UB. Only valid IR can be UB since UB is a runtime concept.

@srcarroll (Contributor, Author) commented Jan 7, 2024

If the verifier defines an IR to be invalid, it can't be UB. Only valid IR can be UB since UB is a runtime concept.

Good point. Sorry, I tend to abuse terminology sometimes. Also, the current code uses this phrasing. I'll try to come up with a better description; any thoughts?

Basically the verifier is catching cases that would be unconditionally UB at runtime.

@srcarroll srcarroll changed the title [MLIR][tensor] Improve tensor.pack verifier to catch more UB cases [MLIR][tensor] Improve tensor.pack verifier to catch more cases with unconditional runtime errors Jan 7, 2024
@nicolasvasilache (Contributor)

I would just drop any notion of UB for the static case: it is invalid for an op to be a partial tile and to not provide a padding value.

In the dynamic case, we could always require a padding value (if feasible, I forget).

@joker-eph (Collaborator)

I would just drop any notion of UB for the static case: it is invalid for an op to be a partial tile and to not provide a padding value.

This is fine; just keep in mind that the canonicalizer/folder must include the same checks as the verifier when trying to constant-fold the dynamic case into the static one.

@chelini chelini self-requested a review January 8, 2024 15:08
@srcarroll (Contributor, Author)

In the dynamic case, we could always require a padding value (if feasible, I forget).

A padding value isn't required if the corresponding output dim is static and divides the corresponding static input dim. That's the whole point of this PR. Unless I'm mistaken about something.
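A minimal sketch of such a case (hypothetical shapes, not from the patch): the static outer output sizes 32 and 16 divide the static input sizes 256 and 128, and any runtime tile sizes consistent with these shapes (%t0 = 8, %t1 = 8) yield full tiles, so no padding_value is needed even though the tiles are dynamic:

%0 = tensor.pack %input inner_dims_pos = [0, 1] inner_tiles = [%t0, %t1]
    into %output : tensor<256x128xf32> -> tensor<32x16x?x?xf32>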

@chelini (Contributor) commented Jan 8, 2024

I would just drop any notion of UB for the static case: it is invalid for an op to be a partial tile and to not provide a padding value.

In the dynamic case, we could always require a padding value (if feasible, I forget).

Would it make sense to keep the padding behavior consistent between the static and dynamic cases, having padding optional in both?

@hanhanW (Contributor) commented Jan 8, 2024

Would it make sense to keep the padding behavior consistent between the static and dynamic cases, having padding optional in both?

+1 on keeping the padding behavior consistent between static and dynamic cases. It looks valid to me. I don't have a specific use case, but I think a user can allocate a big enough buffer based on inner_tile_sizes.

@srcarroll (Contributor, Author)

Would it make sense to keep the padding behavior consistent between the static and dynamic cases, having padding optional in both?

I don't think I understand this. Can you elaborate? I thought that's what I was doing. That is, in current main, when tile sizes are static we require padding when the tile size doesn't divide the input size. The proposed changes here extend this to the dynamic tile case, but require padding when the output size doesn't divide the input size. That's consistent by my definition, so obviously I'm missing something here.

@chelini (Contributor) commented Jan 8, 2024

Would it make sense to keep the padding behavior consistent between the static and dynamic cases, having padding optional in both?

I don't think I understand this. Can you elaborate? I thought that's what I was doing. That is, in current main, when tile sizes are static we require padding when the tile size doesn't divide the input size. The proposed changes here extend this to the dynamic tile case, but require padding when the output size doesn't divide the input size. That's consistent by my definition, so obviously I'm missing something here.

I answered Nicolas's comment and argued that keeping the padding optional in the dynamic case would be better (unless I misunderstood his message). I see op verification this way: if padding is specified, we pad along the high dimensions to make the tile complete; the tile then divides the dimension and the IR is valid. If padding is not specified, we have two options:

  • If we have enough static information to prove that a tile does not perfectly divide a dimension, the verifier should emit an error. As far as I can see, this PR goes in that direction: the input and output are constant, and you error out because the tile cannot fully divide the dimension unless we pad.
  • If the behavior is dynamic and it is impossible to prove that the tile perfectly divides the dimension, the verifier should not emit any error; it is undefined behavior at runtime if the tile does not divide and the padding attribute is not specified (see the sketch after this list).
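A minimal sketch contrasting the two cases (hypothetical shapes, not from the patch):

// Rejected: 256 would have to split into 10 full tiles, but 10 does not
// divide 256, so no runtime value of %t can avoid padding.
%a = tensor.pack %in inner_dims_pos = [0] inner_tiles = [%t] into %d0
    : tensor<256xf32> -> tensor<10x?xf32>
// Accepted: the outer size is dynamic too, so divisibility cannot be decided
// statically; a non-dividing %t without padding is an error at runtime.
%b = tensor.pack %in inner_dims_pos = [0] inner_tiles = [%t] into %d1
    : tensor<256xf32> -> tensor<?x?xf32>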

@srcarroll (Contributor, Author)

@nicolasvasilache any thoughts?

@hanhanW (Contributor) left a comment

This is very rare and I can't think of a scenario for it... FYI, the changes to requirePaddingValue will break all the downstream projects that use the method. I'm fine with the change, but do we really have a need for this?

%0 = tensor.pack %input inner_dims_pos = [1, 0] inner_tiles = [%tile_size_0, %tile_size_1] into %output : tensor<256x128xf32> -> tensor<10x8x?x?xf32>

Comment on lines 3765 to 3770
if (!constantTile) {
if (!ShapedType::isDynamic(outputTileSizes[pos]) &&
(inputShape[pos] % outputTileSizes[pos] != 0))
return true;
} else if (inputShape[pos] % (*constantTile) != 0)
return true;
@nicolasvasilache (Contributor)

My argument goes this way:

  • we can remove UB by always requiring the padding value to be specified. It generally seems desirable to me to avoid needlessly introducing UB and special runtime cases.
  • in the static case, when we have enough info, we can relax this to omit the padding value iff we can statically prove what we need to.

Ultimately, this smells like the "in_bounds" attribute that LLVM has on load/store and that we also added to vector.transfer_read/write.

The situation seems close enough to me to justify a similar attribute addition.

 I'm fine with the change, but do we really have a need for this?

%0 = tensor.pack %input inner_dims_pos = [1, 0] inner_tiles = [%tile_size_0, %tile_size_1] into %output : tensor<256x128xf32> -> tensor<10x8x?x?xf32>

I'm not planning to use it myself right now, but I can see a transformation guaranteeing that a dynamic tile size will not result in padding being pretty useful. We would probably be better off with an "in_bounds" style attribute for this.

FYI, the changes to requirePaddingValue will break all the downstream projects that use the method.

This has never been, and continues to not be, a factor for upstream MLIR. If it is the right thing to do, we should do it; if not, that's ok. Arguing design positions based on breaking downstream projects is dangerous: it gives the wrong incentives as it pigeonholes evolution.

@rengolin (Member) commented Feb 7, 2024

This has never been, and continues to not be, a factor for upstream MLIR. If it is the right thing to do, we should do it; if not, that's ok. Arguing design positions based on breaking downstream projects is dangerous: it gives the wrong incentives as it pigeonholes evolution.

Strong support for this statement.

  • we can remove UB by always requiring the padding value to be specified. It generally seems desirable to me to avoid needlessly introducing UB and special runtime cases.

I wonder what the value is of requiring an actual value at run time. IIUC, the padding is just the remainder of dividing the run-time dimension by the run-time tiling factor, on the last tile. Anything else would make no sense. Requiring users/lowerings to add a mod op for the padding value seems redundant. Just a flag to pad to the nearest multiple on the last tile would probably do.

If the source language "defines" undefined behaviour for run-time cases (like C++), then we let it pass. If not, the front end is responsible for lowering run-time checks to IR and doing what it normally would (crash, trap, raise an exception). It is not the IR's job to know which.

@srcarroll (Contributor, Author) commented Feb 7, 2024

I'm fine with the change, but do we really have a need for this?

@hanhanW To be honest, I personally don't need it. This was just a follow-up to the discussion in PR #76003, which I also don't personally need.

we can remove UB by always requiring the padding value to be specified. It generally seems desirable to me to avoid needlessly introducing UB and special runtime cases.

Seems reasonable to me. By the time I started touching this code there was already a precedent of detecting UB, so I was just trying to finish that off.

in the static case, when we have enough info, we can relax this to omit the padding value iff we can statically prove what we need to.

If I'm not mistaken, the proposed changes should cover all the cases.

We would probably be better off with an "in_bounds" style attribute for this.

I'm not opposed to that, but honestly don't know how that would work yet.

I wonder what the value is of requiring an actual value at run time. IIUC, the padding is just the remainder of dividing the run-time dimension by the run-time tiling factor, on the last tile. Anything else would make no sense. Requiring users/lowerings to add a mod op for the padding value seems redundant. Just a flag to pad to the nearest multiple on the last tile would probably do.

You are referring to the padding size, not the padding value (the value that fills the padded regions). A padding value is always required when padding is needed, and padding is needed when the tile size doesn't divide the input.
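For reference, a minimal sketch (adapted from the pad_and_pack test in the diff above) where padding is needed because 8 does not divide 13, so a padding_value supplying the fill element is mandatory:

%0 = tensor.pack %input padding_value(%pad : f32) inner_dims_pos = [0, 1] inner_tiles = [8, 2]
    into %output : tensor<13x15xf32> -> tensor<2x8x8x2xf32>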

@hanhanW (Contributor) left a comment

LGTM, thanks for the patch. Now we should cover all the cases in the verifier.

Ultimately, this smells like the "in_bounds" attribute that LLVM has on load/store and that we also added to vector.transfer_read/write.

IMO, we don't want to propagate an "in_bounds" attribute to high-level operations. This looks like a good stopping point to me at the tensor level, and we handle the case in vectorization.

@srcarroll srcarroll merged commit 9466c4e into llvm:main Feb 19, 2024
5 checks passed