[MLIR][tensor] Improve tensor.pack verifier to catch more cases with unconditional runtime errors #77217
Conversation
@llvm/pr-subscribers-mlir-linalg @llvm/pr-subscribers-mlir-tensor

Author: None (srcarroll)

Changes: Previously, the tensor.pack verifier detected unconditional runtime errors only when tile sizes are static. Now, dynamic tiles are considered as well.

Full diff: https://github.com/llvm/llvm-project/pull/77217.diff

4 Files Affected:
diff --git a/mlir/include/mlir/Dialect/Tensor/IR/TensorOps.td b/mlir/include/mlir/Dialect/Tensor/IR/TensorOps.td
index eb0c79c01bee1a..1c61ece2676a90 100644
--- a/mlir/include/mlir/Dialect/Tensor/IR/TensorOps.td
+++ b/mlir/include/mlir/Dialect/Tensor/IR/TensorOps.td
@@ -1943,11 +1943,12 @@ def Tensor_PackOp : Tensor_RelayoutOp<"pack", [
// Returns true if we have enough static information to catch undefined
// behavior when the tile size does not divide perfectly the dimension of
- // the input tensor. If a given dimension or a tile associated with it is
- // dynamic, the dimension is not considered as we don't have enough static
- // information to understand if the tile perfectly divides that dimension.
+ // the input tensor. Detecting UB requires that the input size and either
+ // corresponding tile or output size are static.
static bool requirePaddingValue(ArrayRef<int64_t> inputShape,
ArrayRef<int64_t> innerDimsPos,
+ ArrayRef<int64_t> outputShape,
+ ArrayRef<int64_t> outerDimsPerm,
ArrayRef<OpFoldResult> innerTiles);
static Value createDestinationTensor(OpBuilder &b, Location loc,
diff --git a/mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp b/mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp
index 9d230e2c2e5749..c7fed41d234fd0 100644
--- a/mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp
+++ b/mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp
@@ -582,8 +582,10 @@ FailureOr<PackResult> linalg::pack(RewriterBase &rewriter,
return getConstantIntValue(tile).has_value();
});
if (areConstantTiles && operandType.hasStaticShape() &&
- !tensor::PackOp::requirePaddingValue(operandType.getShape(), innerPos,
- innerPackSizes)) {
+ !tensor::PackOp::requirePaddingValue(
+ operandType.getShape(), innerPos,
+ dest.getType().cast<ShapedType>().getShape(), {},
+ innerPackSizes)) {
packOps.push_back(rewriter.create<tensor::PackOp>(
loc, operand, dest, innerPos, innerPackSizes));
} else {
diff --git a/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp b/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
index 816e6ba8fed94e..4318e55fd213e3 100644
--- a/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
+++ b/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
@@ -3742,14 +3742,27 @@ SmallVector<int64_t> PackOp::getStaticTiles() {
bool PackOp::requirePaddingValue(ArrayRef<int64_t> inputShape,
ArrayRef<int64_t> innerDimsPos,
+ ArrayRef<int64_t> outputShape,
+ ArrayRef<int64_t> outerDimsPerm,
ArrayRef<OpFoldResult> innerTiles) {
+ SmallVector<int64_t> outputTileSizes(
+ outputShape.take_front(inputShape.size()));
+ if (!outerDimsPerm.empty()) {
+ assert(outerDimsPerm.size() == outputTileSizes.size() &&
+ "expected output and outer_dims_perm to have same size");
+ applyPermutationToVector(outputTileSizes,
+ invertPermutationVector(outerDimsPerm));
+ }
for (auto [pos, tileSize] : llvm::zip_equal(innerDimsPos, innerTiles)) {
if (ShapedType::isDynamic(inputShape[pos]))
continue;
std::optional<int64_t> constantTile = getConstantIntValue(tileSize);
- if (!constantTile)
- continue;
- if (inputShape[pos] % (*constantTile) != 0)
+
+ if (!constantTile) {
+ if (!ShapedType::isDynamic(outputTileSizes[pos]) &&
+ (inputShape[pos] % outputTileSizes[pos] != 0))
+ return true;
+ } else if (inputShape[pos] % (*constantTile) != 0)
return true;
}
return false;
@@ -3772,9 +3785,11 @@ LogicalResult PackOp::verify() {
if (!paddingValue &&
requirePaddingValue(getSourceType().getShape(), getInnerDimsPos(),
+ getDestType().getShape(), getOuterDimsPerm(),
getMixedTiles())) {
- return emitOpError("invalid tile factor provided. Only full tiles are "
- "supported when padding_value is not set");
+ return emitOpError(
+ "invalid tile factor or output size provided. Only full tiles are "
+ "supported when padding_value is not set");
}
return success();
}
@@ -3975,8 +3990,9 @@ static bool paddingIsNotNeeded(PackOp op) {
return false;
if (ShapedType::isDynamicShape(op.getStaticInnerTiles()))
return false;
- return !PackOp::requirePaddingValue(srcType.getShape(), op.getInnerDimsPos(),
- op.getMixedTiles());
+ return !PackOp::requirePaddingValue(
+ srcType.getShape(), op.getInnerDimsPos(), op.getDestType().getShape(),
+ op.getOuterDimsPerm(), op.getMixedTiles());
}
LogicalResult PackOp::canonicalize(PackOp packOp, PatternRewriter &rewriter) {
diff --git a/mlir/test/Dialect/Tensor/invalid.mlir b/mlir/test/Dialect/Tensor/invalid.mlir
index 735e5146e9dbc8..0eb8672d4d41b3 100644
--- a/mlir/test/Dialect/Tensor/invalid.mlir
+++ b/mlir/test/Dialect/Tensor/invalid.mlir
@@ -597,13 +597,21 @@ func.func @empty_wrong_number_of_operands(%sz : index) {
// -----
func.func @pack_invalid_no_padding_no_full_tiles(%input: tensor<256x128xf32>, %output: tensor<8x8x16x33xf32>) -> tensor<8x8x16x33xf32> {
- // expected-error@+1 {{invalid tile factor provided. Only full tiles are supported when padding_value is not set}}
+ // expected-error@+1 {{invalid tile factor or output size provided. Only full tiles are supported when padding_value is not set}}
%0 = tensor.pack %input inner_dims_pos = [1, 0] inner_tiles = [16, 33] into %output : tensor<256x128xf32> -> tensor<8x8x16x33xf32>
return %0 : tensor<8x8x16x33xf32>
}
// -----
+func.func @pack_invalid_no_padding_no_full_tiles_dyn_tiles(%input: tensor<256x128xf32>, %output: tensor<10x8x?x?xf32>, %tile_size_0: index, %tile_size_1: index) -> tensor<10x8x?x?xf32> {
+ // expected-error@+1 {{invalid tile factor or output size provided. Only full tiles are supported when padding_value is not set}}
+ %0 = tensor.pack %input inner_dims_pos = [1, 0] inner_tiles = [%tile_size_0, %tile_size_1] into %output : tensor<256x128xf32> -> tensor<10x8x?x?xf32>
+ return %0 : tensor<10x8x?x?xf32>
+}
+
+// -----
+
func.func @pad_and_pack_invalid_type(%input: tensor<13x15xf32>, %output: tensor<2x8x8x2xf32>, %pad: i32) -> tensor<2x8x8x2xf32> {
// expected-error@+1 {{expected padding_value has 'f32' but got: 'i32'}}
%0 = tensor.pack %input padding_value(%pad: i32) inner_dims_pos = [0, 1] inner_tiles = [8, 2] into %output : tensor<13x15xf32> -> tensor<2x8x8x2xf32>
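The divisibility logic in the TensorOps.cpp hunk above can be modeled in a few lines. The following is a hypothetical Python sketch, not the actual C++ API: `DYN`, the function names, and the `None`-for-dynamic encoding are illustrative stand-ins for `ShapedType::kDynamic` and the MLIR helpers.

```python
DYN = None  # illustrative stand-in for a dynamic dim or tile size

def apply_permutation(values, perm):
    # result[i] = values[perm[i]], mirroring applyPermutationToVector
    return [values[p] for p in perm]

def invert_permutation(perm):
    inv = [0] * len(perm)
    for i, p in enumerate(perm):
        inv[p] = i
    return inv

def require_padding_value(input_shape, inner_dims_pos,
                          output_shape, outer_dims_perm, inner_tiles):
    # The leading output dims are the outer (tiled) dims of the pack.
    outer_sizes = list(output_shape[:len(input_shape)])
    if outer_dims_perm:
        outer_sizes = apply_permutation(outer_sizes,
                                        invert_permutation(outer_dims_perm))
    for pos, tile in zip(inner_dims_pos, inner_tiles):
        if input_shape[pos] is DYN:
            continue
        if tile is DYN:
            # Dynamic tile: fall back to the static outer size, if any.
            if (outer_sizes[pos] is not DYN
                    and input_shape[pos] % outer_sizes[pos] != 0):
                return True
        elif input_shape[pos] % tile != 0:
            return True
    return False
```

On the new test case from invalid.mlir (input 256x128 packed into 10x8x?x? with dynamic tiles), the predicate returns true because 256 is not divisible by the static outer size 10, so padding is unconditionally required no matter what the dynamic tile sizes turn out to be.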
(title changed: "tensor.pack verifier to catch more UB cases")
I'm confused to see "verifier" and "UB" in the same sentence: these are non-overlapping concepts. If the verifier defines an IR to be invalid, it can't be UB. Only valid IR can be UB since UB is a runtime concept.
Good point. Sorry, I tend to abuse terminology sometimes. Also the current code uses this phrasing. I'll try to come up with a better description, but any thoughts? Basically the verifier is catching cases that would be unconditionally UB at runtime.
(title changed: "tensor.pack verifier to catch more UB cases" → "tensor.pack verifier to catch more cases with unconditional runtime errors")
I would just drop any notion of UB for the static case: it is invalid for an op to be a partial tile and to not provide a padding value. In the dynamic case, we could always require a padding value (if feasible, I forget).
This is fine; just keep in mind that canonicalization/folder must include the same checks as the verifier when trying to constant fold the dynamic case into the static one.
A padding value isn't required if the corresponding output dim is static and divides the corresponding static input dim. That's the whole point of this PR. Unless I'm mistaken about something.
Would it make sense to keep the padding behavior consistent between the static and dynamic behavior, having padding optional in both cases?
+1 on keeping the padding behavior consistent between static and dynamic cases. It is valid to me. I don't have a specific use case, but I think a user can allocate a big enough buffer based on inner_tile_sizes.
I don't think I understand this. Can you elaborate? I thought that's what I was doing. That is, in the current …
I answered Nicolas's comment and argued that keeping the padding optional in the dynamic case would be better (unless I misunderstood his message). I see op verification this way: If padding is specified, we will pad along high dimensions to make the tile complete; the tile will divide the dimension and the IR is valid. If padding is not specified, we have two options:
@nicolasvasilache any thoughts?
This is very rare and I can't think of a scenario for it... FYI, the changes to requirePaddingValue will break all the downstream projects that use the method. I'm fine with the change, but do we really have a need for this?
%0 = tensor.pack %input inner_dims_pos = [1, 0] inner_tiles = [%tile_size_0, %tile_size_1] into %output : tensor<256x128xf32> -> tensor<10x8x?x?xf32>
if (!constantTile) {
  if (!ShapedType::isDynamic(outputTileSizes[pos]) &&
      (inputShape[pos] % outputTileSizes[pos] != 0))
    return true;
} else if (inputShape[pos] % (*constantTile) != 0)
  return true;
We should use braces for the else-if, which keeps it uniform with the if block.
My argument goes this way:
Ultimately, this smells like the "in_bounds" attribute that LLVM has on load/store and that we also added to vector.transfer_read/write. The situation seems close enough to me to justify a similar attribute addition.
Not planning to use it myself right now but I can see how a transformation guaranteeing that a dynamic tile size will not result in padding seems pretty useful. We would probably be better off with an "in_bounds" style attribute for this.
This has never been, and continues to not be, a factor for upstream MLIR. If it is the right thing to do we should; if not, that's ok. Arguing design positions based on breaking downstream projects is dangerous: it gives the wrong incentives as it pigeonholes evolution.
Strong support for this statement.
I wonder what the value is of requiring an actual value at run time. IIUC, the padding is just the remainder of the division of the run-time dimension by the run-time tiling factor, on the last tile. Anything else would make no sense. Requiring users/lowering to add a … If the source language "defines" undefined behaviour for run-time cases (like C++), then we let it pass. If not, then they're responsible for lowering run-time checks to IR and doing what they normally would do (crash, trap, raise an exception). It is not the job of the IR to know.
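The remainder arithmetic this comment describes can be sketched directly. A minimal illustration (the function name is ours, not from the patch): the padding on the last tile is just the complement of the remainder, and it is zero exactly when the tile divides the dimension.

```python
def last_tile_padding(dim_size, tile_size):
    # Padding appears only on the last tile: the amount needed to round
    # dim_size up to the next multiple of tile_size.
    return (tile_size - dim_size % tile_size) % tile_size

# e.g. a dim of 13 tiled by 8 uses two tiles; the last tile is padded by 3
```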
@hanhanW To be honest, I personally don't need it. This was just a follow-up to the discussion in PR #76003, which I also don't personally need.
Seems reasonable to me. By the time I started touching this code there was already a precedent for detecting UB, so I was just trying to finish that off.
If I'm not mistaken, the proposed changes should cover all the cases.
I'm not opposed to that, but honestly don't know how that would work yet.
You are referring to the padding size, not the padding value (the value filling the padded regions). A padding value is always required when padding is needed, and padding is needed when the tile size doesn't divide the input.
LGTM, thanks for the patch. Now we should cover all the cases in the verifier.
Ultimately, this smells like the "in_bounds" attribute that LLVM has on load/store and that we also added to vector.transfer_read/write.
IMO, we don't want to propagate an "in_bounds" attribute to high-level operations. This looks like a good stopping point to me at the tensor level, and we can handle the case in vectorization.
Previously, the tensor.pack verifier detected unconditional runtime errors only when tile sizes are static. Now, dynamic tiles are considered, and we only require that the input size and either the corresponding tile or output size are static to determine whether the op will unconditionally produce errors at runtime.