
[MLIR][tensor] Improve tensor.pack verifier to catch more cases with unconditional runtime errors #77217

Merged: 7 commits, Feb 19, 2024

Conversation

@srcarroll (Contributor) commented Jan 7, 2024

Previously, the tensor.pack verifier detected unconditional runtime errors only when tile sizes were static. Now dynamic tiles are also considered: the verifier only requires the input size and either the corresponding tile size or output size to be static in order to determine whether the op will unconditionally produce an error at runtime.
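For example (this is the regression test added by this PR): the tiles below are dynamic, but the static input and output shapes already decide the outcome. The source dimension of size 256 would have to split into 10 outer tiles, and since 10 does not divide 256 no runtime tile size can produce only full tiles, so the op is rejected unless a padding_value is set:

%0 = tensor.pack %input inner_dims_pos = [1, 0] inner_tiles = [%tile_size_0, %tile_size_1]
    into %output : tensor<256x128xf32> -> tensor<10x8x?x?xf32>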

@llvmbot (Collaborator) commented Jan 7, 2024

@llvm/pr-subscribers-mlir-linalg
@llvm/pr-subscribers-mlir

@llvm/pr-subscribers-mlir-tensor

Author: None (srcarroll)

Changes

Previously, the tensor.pack verifier detected undefined behavior only when tile sizes were static. Now dynamic tiles are also considered, and only the input size and either the corresponding tile size or output size need to be static to determine UB.


Full diff: https://github.com/llvm/llvm-project/pull/77217.diff

4 Files Affected:

  • (modified) mlir/include/mlir/Dialect/Tensor/IR/TensorOps.td (+4-3)
  • (modified) mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp (+4-2)
  • (modified) mlir/lib/Dialect/Tensor/IR/TensorOps.cpp (+23-7)
  • (modified) mlir/test/Dialect/Tensor/invalid.mlir (+9-1)
diff --git a/mlir/include/mlir/Dialect/Tensor/IR/TensorOps.td b/mlir/include/mlir/Dialect/Tensor/IR/TensorOps.td
index eb0c79c01bee1a..1c61ece2676a90 100644
--- a/mlir/include/mlir/Dialect/Tensor/IR/TensorOps.td
+++ b/mlir/include/mlir/Dialect/Tensor/IR/TensorOps.td
@@ -1943,11 +1943,12 @@ def Tensor_PackOp : Tensor_RelayoutOp<"pack", [
 
     // Returns true if we have enough static information to catch undefined
     // behavior when the tile size does not divide perfectly the dimension of
-    // the input tensor. If a given dimension or a tile associated with it is
-    // dynamic, the dimension is not considered as we don't have enough static
-    // information to understand if the tile perfectly divides that dimension.
+    // the input tensor. Detecting UB requires that the input size and either
+    // corresponding tile or output size are static.
     static bool requirePaddingValue(ArrayRef<int64_t> inputShape,
                                     ArrayRef<int64_t> innerDimsPos,
+                                    ArrayRef<int64_t> outputShape,
+                                    ArrayRef<int64_t> outerDimsPerm,
                                     ArrayRef<OpFoldResult> innerTiles);
 
     static Value createDestinationTensor(OpBuilder &b, Location loc,
diff --git a/mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp b/mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp
index 9d230e2c2e5749..c7fed41d234fd0 100644
--- a/mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp
+++ b/mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp
@@ -582,8 +582,10 @@ FailureOr<PackResult> linalg::pack(RewriterBase &rewriter,
             return getConstantIntValue(tile).has_value();
           });
       if (areConstantTiles && operandType.hasStaticShape() &&
-          !tensor::PackOp::requirePaddingValue(operandType.getShape(), innerPos,
-                                               innerPackSizes)) {
+          !tensor::PackOp::requirePaddingValue(
+              operandType.getShape(), innerPos,
+              dest.getType().cast<ShapedType>().getShape(), {},
+              innerPackSizes)) {
         packOps.push_back(rewriter.create<tensor::PackOp>(
             loc, operand, dest, innerPos, innerPackSizes));
       } else {
diff --git a/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp b/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
index 816e6ba8fed94e..4318e55fd213e3 100644
--- a/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
+++ b/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
@@ -3742,14 +3742,27 @@ SmallVector<int64_t> PackOp::getStaticTiles() {
 
 bool PackOp::requirePaddingValue(ArrayRef<int64_t> inputShape,
                                  ArrayRef<int64_t> innerDimsPos,
+                                 ArrayRef<int64_t> outputShape,
+                                 ArrayRef<int64_t> outerDimsPerm,
                                  ArrayRef<OpFoldResult> innerTiles) {
+  SmallVector<int64_t> outputTileSizes(
+      outputShape.take_front(inputShape.size()));
+  if (!outerDimsPerm.empty()) {
+    assert(outerDimsPerm.size() == outputTileSizes.size() &&
+           "expected output and outer_dims_perm to have same size");
+    applyPermutationToVector(outputTileSizes,
+                             invertPermutationVector(outerDimsPerm));
+  }
   for (auto [pos, tileSize] : llvm::zip_equal(innerDimsPos, innerTiles)) {
     if (ShapedType::isDynamic(inputShape[pos]))
       continue;
     std::optional<int64_t> constantTile = getConstantIntValue(tileSize);
-    if (!constantTile)
-      continue;
-    if (inputShape[pos] % (*constantTile) != 0)
+
+    if (!constantTile) {
+      if (!ShapedType::isDynamic(outputTileSizes[pos]) &&
+          (inputShape[pos] % outputTileSizes[pos] != 0))
+        return true;
+    } else if (inputShape[pos] % (*constantTile) != 0)
       return true;
   }
   return false;
@@ -3772,9 +3785,11 @@ LogicalResult PackOp::verify() {
 
   if (!paddingValue &&
       requirePaddingValue(getSourceType().getShape(), getInnerDimsPos(),
+                          getDestType().getShape(), getOuterDimsPerm(),
                           getMixedTiles())) {
-    return emitOpError("invalid tile factor provided. Only full tiles are "
-                       "supported when padding_value is not set");
+    return emitOpError(
+        "invalid tile factor or output size provided. Only full tiles are "
+        "supported when padding_value is not set");
   }
   return success();
 }
@@ -3975,8 +3990,9 @@ static bool paddingIsNotNeeded(PackOp op) {
     return false;
   if (ShapedType::isDynamicShape(op.getStaticInnerTiles()))
     return false;
-  return !PackOp::requirePaddingValue(srcType.getShape(), op.getInnerDimsPos(),
-                                      op.getMixedTiles());
+  return !PackOp::requirePaddingValue(
+      srcType.getShape(), op.getInnerDimsPos(), op.getDestType().getShape(),
+      op.getOuterDimsPerm(), op.getMixedTiles());
 }
 
 LogicalResult PackOp::canonicalize(PackOp packOp, PatternRewriter &rewriter) {
diff --git a/mlir/test/Dialect/Tensor/invalid.mlir b/mlir/test/Dialect/Tensor/invalid.mlir
index 735e5146e9dbc8..0eb8672d4d41b3 100644
--- a/mlir/test/Dialect/Tensor/invalid.mlir
+++ b/mlir/test/Dialect/Tensor/invalid.mlir
@@ -597,13 +597,21 @@ func.func @empty_wrong_number_of_operands(%sz : index) {
 // -----
 
 func.func @pack_invalid_no_padding_no_full_tiles(%input: tensor<256x128xf32>, %output: tensor<8x8x16x33xf32>) -> tensor<8x8x16x33xf32> {
-  // expected-error@+1 {{invalid tile factor provided. Only full tiles are supported when padding_value is not set}}
+  // expected-error@+1 {{invalid tile factor or output size provided. Only full tiles are supported when padding_value is not set}}
   %0 = tensor.pack %input inner_dims_pos = [1, 0] inner_tiles = [16, 33] into %output : tensor<256x128xf32>  -> tensor<8x8x16x33xf32>
   return %0 : tensor<8x8x16x33xf32>
 }
 
 // -----
 
+func.func @pack_invalid_no_padding_no_full_tiles_dyn_tiles(%input: tensor<256x128xf32>, %output: tensor<10x8x?x?xf32>, %tile_size_0: index, %tile_size_1: index) -> tensor<10x8x?x?xf32> {
+  // expected-error@+1 {{invalid tile factor or output size provided. Only full tiles are supported when padding_value is not set}}
+  %0 = tensor.pack %input inner_dims_pos = [1, 0] inner_tiles = [%tile_size_0, %tile_size_1] into %output : tensor<256x128xf32>  -> tensor<10x8x?x?xf32>
+  return %0 : tensor<10x8x?x?xf32>
+} 
+
+// -----
+
 func.func @pad_and_pack_invalid_type(%input: tensor<13x15xf32>, %output: tensor<2x8x8x2xf32>, %pad: i32) -> tensor<2x8x8x2xf32> {
   // expected-error@+1 {{expected padding_value has 'f32' but got: 'i32'}}
   %0 = tensor.pack %input padding_value(%pad: i32) inner_dims_pos = [0, 1] inner_tiles = [8, 2] into %output : tensor<13x15xf32> -> tensor<2x8x8x2xf32>

@srcarroll srcarroll changed the title Improve tensor.pack verifier to catch more UB cases [MLIR][tensor] Improve tensor.pack verifier to catch more UB cases Jan 7, 2024
@joker-eph (Collaborator)

Improve tensor.pack verifier to catch more UB cases

I'm confused to see "verifier" and "UB" in the same sentence: these are non-overlapping concepts. If the verifier defines an IR to be invalid, it can't be UB. Only valid IR can be UB since UB is a runtime concept.

@srcarroll (Contributor, Author) commented Jan 7, 2024

If the verifier defines an IR to be invalid, it can't be UB. Only valid IR can be UB since UB is a runtime concept.

Good point. Sorry, I tend to abuse terminology sometimes. Also, the current code uses this phrasing. I'll try to come up with a better description; any thoughts?

Basically the verifier is catching cases that would be unconditionally UB at runtime.

@srcarroll srcarroll changed the title [MLIR][tensor] Improve tensor.pack verifier to catch more UB cases [MLIR][tensor] Improve tensor.pack verifier to catch more cases with unconditional runtime errors Jan 7, 2024
@nicolasvasilache (Contributor)

I would just drop any notion of UB for the static case: it is invalid for an op to be a partial tile and to not provide a padding value.

In the dynamic case, we could always require a padding value (if feasible, I forget).

@joker-eph (Collaborator)

I would just drop any notion of UB for the static case: it is invalid for an op to be a partial tile and to not provide a padding value.

This is fine; just keep in mind that the canonicalizer/folder must include the same checks as the verifier when trying to constant-fold the dynamic case into the static one.

@chelini chelini self-requested a review January 8, 2024 15:08
@srcarroll (Contributor, Author)

In the dynamic case, we could always require a padding value (if feasible, I forget).

A padding value isn't required if the corresponding output dim is static and divides the corresponding static input dim. That's the whole point of this PR. Unless I'm mistaken about something.
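A minimal sketch of such a case (hypothetical shapes, not from the patch): the static outer output sizes 32 and 16 divide the static input sizes 256 and 128, and any runtime tile sizes consistent with these shapes (%t0 = 8, %t1 = 8) yield full tiles, so no padding_value is needed even though the tiles are dynamic:

%0 = tensor.pack %input inner_dims_pos = [0, 1] inner_tiles = [%t0, %t1]
    into %output : tensor<256x128xf32> -> tensor<32x16x?x?xf32>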

@chelini (Contributor) commented Jan 8, 2024

I would just drop any notion of UB for the static case: it is invalid for an op to be a partial tile and to not provide a padding value.

In the dynamic case, we could always require a padding value (if feasible, I forget).

Would it make sense to keep the padding behavior consistent between the static and dynamic cases, having padding optional in both?

@hanhanW (Contributor) commented Jan 8, 2024

Would it make sense to keep the padding behavior consistent between the static and dynamic cases, having padding optional in both?

+1 on keeping the padding behavior consistent between static and dynamic cases. It looks valid to me. I don't have a specific use case, but I think a user can allocate a big enough buffer based on inner_tile_sizes.

@srcarroll (Contributor, Author)

Would it make sense to keep the padding behavior consistent between the static and dynamic cases, having padding optional in both?

I don't think I understand this. Can you elaborate? I thought that's what I was doing. That is, in current main, when tile sizes are static we require padding when the tile size doesn't divide the input size. The proposed changes here extend this to the dynamic tile case, but require padding when the output size doesn't divide the input size. That's consistent by my definition, so obviously I'm missing something here.

@chelini (Contributor) commented Jan 8, 2024

Would it make sense to keep the padding behavior consistent between the static and dynamic cases, having padding optional in both?

I don't think I understand this. Can you elaborate? I thought that's what I was doing. That is, in current main, when tile sizes are static we require padding when the tile size doesn't divide the input size. The proposed changes here extend this to the dynamic tile case, but require padding when the output size doesn't divide the input size. That's consistent by my definition, so obviously I'm missing something here.

I answered Nicolas's comment and argued that keeping the padding optional in the dynamic case would be better (unless I misunderstood his message). I see op verification this way: if padding is specified, we pad along the high dimensions to make the tile complete; the tile then divides the dimension and the IR is valid. If padding is not specified, we have two options:

  • If we have enough static information to prove that a tile does not perfectly divide a dimension, the verifier should emit an error. As far as I can see, this PR goes in that direction: the input and output are constant, and you error out because the tile cannot fully divide the dimension unless we pad.
  • If the behavior is dynamic and it is impossible to prove that the tile perfectly divides the dimension, the verifier should not emit any error; it is undefined behavior at runtime if the tile does not divide and the padding attribute is not specified (see the sketch after this list).
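A minimal sketch contrasting the two cases (hypothetical shapes, not from the patch):

// Rejected: 256 would have to split into 10 full tiles, but 10 does not
// divide 256, so no runtime value of %t can avoid padding.
%a = tensor.pack %in inner_dims_pos = [0] inner_tiles = [%t] into %d0
    : tensor<256xf32> -> tensor<10x?xf32>
// Accepted: the outer size is dynamic too, so divisibility cannot be decided
// statically; a non-dividing %t without padding is an error at runtime.
%b = tensor.pack %in inner_dims_pos = [0] inner_tiles = [%t] into %d1
    : tensor<256xf32> -> tensor<?x?xf32>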

@srcarroll (Contributor, Author)

@nicolasvasilache any thoughts?

@hanhanW (Contributor) left a comment

This is very rare and I can't think of a scenario for it... FYI, the changes to requirePaddingValue will break all the downstream projects that use the method. I'm fine with the change, but do we really have a need for this?

%0 = tensor.pack %input inner_dims_pos = [1, 0] inner_tiles = [%tile_size_0, %tile_size_1] into %output : tensor<256x128xf32> -> tensor<10x8x?x?xf32>

Comment on lines 3765 to 3770
if (!constantTile) {
if (!ShapedType::isDynamic(outputTileSizes[pos]) &&
(inputShape[pos] % outputTileSizes[pos] != 0))
return true;
} else if (inputShape[pos] % (*constantTile) != 0)
return true;
@nicolasvasilache (Contributor)

My argument goes this way:

  • we can remove UB by always requiring the padding value to be specified. It generally seems desirable to me to avoid needlessly introducing UB and special runtime cases.
  • in the static case, when we have enough info, we can relax this to omit the padding value iff we can statically prove what we need to.

Ultimately, this smells like the "in_bounds" attribute that LLVM has on load/store and that we also added to vector.transfer_read/write.

The situation seems close enough to me to justify a similar attribute addition.

 I'm fine with the change, but do we really have a need for this?

%0 = tensor.pack %input inner_dims_pos = [1, 0] inner_tiles = [%tile_size_0, %tile_size_1] into %output : tensor<256x128xf32> -> tensor<10x8x?x?xf32>

I'm not planning to use it myself right now, but I can see a transformation guaranteeing that a dynamic tile size will not result in padding being pretty useful. We would probably be better off with an "in_bounds" style attribute for this.

FYI, the changes to requirePaddingValue will break all the downstream projects that use the method.

This has never been, and continues to not be, a factor for upstream MLIR. If it is the right thing to do, we should do it; if not, that's ok. Arguing design positions based on breaking downstream projects is dangerous: it gives the wrong incentives as it pigeonholes evolution.

@rengolin (Member) commented Feb 7, 2024

This has never been, and continues to not be, a factor for upstream MLIR. If it is the right thing to do, we should do it; if not, that's ok. Arguing design positions based on breaking downstream projects is dangerous: it gives the wrong incentives as it pigeonholes evolution.

Strong support for this statement.

  • we can remove UB by always requiring the padding value to be specified. It generally seems desirable to me to avoid needlessly introducing UB and special runtime cases.

I wonder what the value is of requiring an actual value at run time. IIUC, the padding is just the remainder of dividing the run-time dimension by the run-time tiling factor, on the last tile. Anything else would make no sense. Requiring users/lowerings to add a mod op for the padding value seems redundant. Just a flag to pad to the nearest multiple on the last tile would probably do.

If the source language "defines" undefined behaviour for run-time cases (like C++), then we let it pass. If not, the front end is responsible for lowering run-time checks to IR and doing what it normally would (crash, trap, raise an exception). It is not the IR's job to know which.

@srcarroll (Contributor, Author) commented Feb 7, 2024

I'm fine with the change, but do we really have a need for this?

@hanhanW To be honest, I personally don't need it. This was just a follow-up to the discussion in PR #76003, which I also don't personally need.

we can remove UB by always requiring the padding value to be specified. It generally seems desirable to me to avoid needlessly introducing UB and special runtime cases.

Seems reasonable to me. By the time I started touching this code there was already a precedent of detecting UB, so I was just trying to finish that off.

in the static case, when we have enough info, we can relax this to omit the padding value iff we can statically prove what we need to.

If I'm not mistaken, the proposed changes should cover all the cases.

We would probably be better off with an "in_bounds" style attribute for this.

I'm not opposed to that, but honestly don't know how that would work yet.

I wonder what the value is of requiring an actual value at run time. IIUC, the padding is just the remainder of dividing the run-time dimension by the run-time tiling factor, on the last tile. Anything else would make no sense. Requiring users/lowerings to add a mod op for the padding value seems redundant. Just a flag to pad to the nearest multiple on the last tile would probably do.

You are referring to the padding size, not the padding value (the value that fills the padded regions). A padding value is always required when padding is needed, and padding is needed when the tile size doesn't divide the input.
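For reference, a minimal sketch (adapted from the pad_and_pack test in the diff above) where padding is needed because 8 does not divide 13, so a padding_value supplying the fill element is mandatory:

%0 = tensor.pack %input padding_value(%pad : f32) inner_dims_pos = [0, 1] inner_tiles = [8, 2]
    into %output : tensor<13x15xf32> -> tensor<2x8x8x2xf32>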

@hanhanW (Contributor) left a comment

LGTM, thanks for the patch. Now we should cover all the cases in the verifier.

Ultimately, this smells like the "in_bounds" attribute that LLVM has on load/store and that we also added to vector.transfer_read/write.

IMO, we don't want to propagate an "in_bounds" attribute to high-level operations. This looks like a good stopping point to me at the tensor level, and we handle the case in vectorization.

@srcarroll srcarroll merged commit 9466c4e into llvm:main Feb 19, 2024
5 checks passed