Error when vectorising linalg.generic #11779
There is a bug that I hopefully fixed here: https://reviews.llvm.org/D141413. Could you please provide a small test that we can use for the patch? I tried to simplify your example, but I couldn't make it fail.
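For what it's worth, here is a hedged attempt at a smaller test (the function name and shapes are illustrative, not taken from the patch). The key ingredient appears to be an all-parallel `linalg.generic` whose output block argument is combined with a value derived from a block argument defined *outside* the generic's region:

```mlir
// Hypothetical minimal reproducer: %out is multiplied by a value
// computed from %j (a function argument, i.e. a block argument from
// outside the generic). The arith.mulf resembles a reduction
// combiner, but all iterators are "parallel", so nothing should be
// matched as a reduction.
func.func @not_a_reduction(%init: tensor<1x4xf32>, %j: index) -> tensor<1x4xf32> {
  %0 = linalg.generic {
      indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>],
      iterator_types = ["parallel", "parallel"]}
      outs(%init : tensor<1x4xf32>) {
  ^bb0(%out: f32):
    %i = linalg.index 0 : index
    %idx = arith.addi %j, %i : index
    %cast = arith.index_cast %idx : index to i32
    %f = arith.uitofp %cast : i32 to f32
    %m = arith.mulf %out, %f : f32
    linalg.yield %m : f32
  } -> tensor<1x4xf32>
  return %0 : tensor<1x4xf32>
}
```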
Much reduced:

```mlir
module {
  func.func @pipeline_dispatch_1_generic_1080x1920() {
    %c1 = arith.constant 1 : index
    %c4 = arith.constant 4 : index
    %c120 = arith.constant 120 : index
    %c64 = arith.constant 64 : index
    %c1080 = arith.constant 1080 : index
    %c1920 = arith.constant 1920 : index
    %c0 = arith.constant 0 : index
    %cst_6 = arith.constant 4.000000e+00 : f32
    %1 = hal.interface.binding.subspan set(0) binding(1) type(storage_buffer) offset(%c0) alignment(64) : !flow.dispatch.tensor<writeonly:tensor<1080x1920xf32>>
    %workgroup_id_x = hal.interface.workgroup.id[0] : index
    %workgroup_count_x = hal.interface.workgroup.count[0] : index
    %workgroup_id_y = hal.interface.workgroup.id[1] : index
    %workgroup_count_y = hal.interface.workgroup.count[1] : index
    %c120_7 = arith.constant 120 : index
    %3 = arith.muli %workgroup_id_y, %c120_7 : index
    %c120_8 = arith.constant 120 : index
    %4 = arith.muli %workgroup_count_y, %c120_8 : index
    %c64_9 = arith.constant 64 : index
    %5 = arith.muli %workgroup_id_x, %c64_9 : index
    %c64_10 = arith.constant 64 : index
    %6 = arith.muli %workgroup_count_x, %c64_10 : index
    scf.for %arg0 = %3 to %c1080 step %4 {
      scf.for %arg1 = %5 to %c1920 step %6 {
        %7 = flow.dispatch.tensor.load %1, offsets = [%arg0, %arg1], sizes = [120, 64], strides = [1, 1] : !flow.dispatch.tensor<writeonly:tensor<1080x1920xf32>> -> tensor<120x64xf32>
        %8 = scf.for %arg2 = %c0 to %c120 step %c1 iter_args(%arg3 = %7) -> (tensor<120x64xf32>) {
          %9 = scf.for %arg4 = %c0 to %c64 step %c4 iter_args(%arg5 = %arg3) -> (tensor<120x64xf32>) {
            %extracted_slice = tensor.extract_slice %arg5[%c0, %arg4] [1, 4] [1, 1] : tensor<120x64xf32> to tensor<1x4xf32>
            %10 = linalg.fill {__internal_linalg_transform__ = "1"} ins(%cst_6 : f32) outs(%extracted_slice : tensor<1x4xf32>) -> tensor<1x4xf32>
            %11 = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>], iterator_types = ["parallel", "parallel"]} outs(%10 : tensor<1x4xf32>) {
            ^bb0(%out: f32):
              %12 = linalg.index 0 : index
              %13 = arith.addi %arg4, %12 : index
              %18 = arith.index_cast %13 : index to i32
              %20 = arith.uitofp %18 : i32 to f32
              %67 = arith.mulf %out, %20 : f32
              linalg.yield %67 : f32
            } -> tensor<1x4xf32>
            %inserted_slice = tensor.insert_slice %11 into %arg5[%c0, %arg4] [1, 4] [1, 1] : tensor<1x4xf32> into tensor<120x64xf32>
            scf.yield %inserted_slice : tensor<120x64xf32>
          }
          scf.yield %9 : tensor<120x64xf32>
        }
        flow.dispatch.tensor.store %8, %1, offsets = [%arg0, %arg1], sizes = [120, 64], strides = [1, 1] : tensor<120x64xf32> -> !flow.dispatch.tensor<writeonly:tensor<1080x1920xf32>>
      }
    }
    return
  }
  transform.sequence failures(propagate) {
  ^bb0(%arg0: !pdl.operation):
    %0 = transform.structured.match ops{["linalg.generic"]} in %arg0
    %1 = get_closest_isolated_parent %0 : (!pdl.operation) -> !pdl.operation
    %2 = transform.structured.vectorize %1
  }
}
```
This was fixed upstream by:

> When detecting reductions, make sure the block argument is from the `linalg.generic` op. This fixes iree-org/iree#11779.
>
> Co-authored-by: Andrzej Warzynski <andrzej.warzynski@arm.com>
> Reviewed By: nicolasvasilache
> Differential Revision: https://reviews.llvm.org/D141413
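For contrast, here is a sketch of what a genuine reduction looks like (names and shapes are illustrative). The distinguishing features are a `"reduction"` iterator and a combiner that consumes the output block argument belonging to the generic's own region:

```mlir
// A real reduction: the "reduction" iterator on d1, plus a combiner
// (arith.addf) whose accumulator %acc is the generic's own output
// block argument.
func.func @row_sum(%in: tensor<4x8xf32>, %init: tensor<4xf32>) -> tensor<4xf32> {
  %0 = linalg.generic {
      indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>,
                       affine_map<(d0, d1) -> (d0)>],
      iterator_types = ["parallel", "reduction"]}
      ins(%in : tensor<4x8xf32>) outs(%init : tensor<4xf32>) {
  ^bb0(%x: f32, %acc: f32):
    %sum = arith.addf %acc, %x : f32
    linalg.yield %sum : f32
  } -> tensor<4xf32>
  return %0 : tensor<4xf32>
}
```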
Hi 👋🏻!
Here's the assertion that I am hitting:
It originates from `reduceIfNeeded`.
**INPUT MLIR:**

**TO REPRODUCE**

**CONTEXT**
The MLIR above was generated from the following TOSA snippet. It's a "trimmed" dump just before the `LinalgStrategyVectorizePass`. I only really removed `tensor.extract` ops, which are currently not vectorised (and for this crash to happen, I had to make sure that my `linalg.generic` operator is being vectorised).

**OBSERVATIONS**
I can see that the Linalg vectorizer looks for reductions, identifies one, and then things go wrong. It's not clear to me why it thinks there's a reduction there.
Without `tosa.mul` in my original TOSA example, the vectorizer no longer thinks that there are reductions and everything goes fine.

Apologies for the lengthy reproducer. It took me a while to get here, and I wanted to share this before doing more investigation.
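For reference, the suspect op is a plain elementwise multiply. A hedged sketch of the TOSA form (shapes and attribute spelling are illustrative; the `shift` attribute's exact form varies between TOSA dialect versions):

```mlir
// Elementwise tosa.mul; after lowering to Linalg this becomes the
// arith.mulf in the generic's body that the matcher apparently
// mistakes for a reduction combiner.
func.func @mul(%a: tensor<1080x1920xf32>, %b: tensor<1080x1920xf32>) -> tensor<1080x1920xf32> {
  %0 = "tosa.mul"(%a, %b) {shift = 0 : i32}
      : (tensor<1080x1920xf32>, tensor<1080x1920xf32>) -> tensor<1080x1920xf32>
  return %0 : tensor<1080x1920xf32>
}
```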
Happy to reduce more. Your pointers are much appreciated :)
-Andrzej