[DT] Turn encodings into nop for all the backends that do not yet support data-tiling #17719
Comments
I'm not able to create a repro, because it looks like we can handle the case at the codegen level. @lialan can you help? The goal of this issue is to make everything work when we turn off the early materialization pass:

iree/compiler/src/iree/compiler/GlobalOptimization/Passes.cpp
Lines 38 to 46 in fe571e4
There is a separate issue besides the nop pass. The issue I hit is in linalg_quantized_matmul_vs_linalg_matmul.mlir: the upstream Linalg shape inference drops the encodings, which looks incorrect to me. @lialan can you help fix it and do further investigation? To repro: (cc @bjacob)
This is the IR before and after canonicalization: https://gist.github.com/hanhanW/959cf2809098c3485ee1ebd6394e5836

Before:

```mlir
%6 = iree_encoding.set_encoding %0 : tensor<?x?xi8> -> tensor<?x?xi8, #iree_encoding.encoding<role = LHS, element_types = [i8, i8, i32], original_type = tensor<?x?xi8>, user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>>
%7 = iree_encoding.set_encoding %1 : tensor<?x?xi8> -> tensor<?x?xi8, #iree_encoding.encoding<role = RHS, element_types = [i8, i8, i32], original_type = tensor<?x?xi8>, user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>>
%8 = tensor.empty(%c3, %c5) : tensor<?x?xi32, #iree_encoding.encoding<role = RESULT, element_types = [i8, i8, i32], original_type = tensor<?x?xi32>, user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>>
%9 = linalg.fill ins(%c0_i32 : i32) outs(%8 : tensor<?x?xi32, #iree_encoding.encoding<role = RESULT, element_types = [i8, i8, i32], original_type = tensor<?x?xi32>, user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>>) -> tensor<?x?xi32, #iree_encoding.encoding<role = RESULT, element_types = [i8, i8, i32], original_type = tensor<?x?xi32>, user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>>
%10 = linalg.matmul ins(%6, %7 : tensor<?x?xi8, #iree_encoding.encoding<role = LHS, element_types = [i8, i8, i32], original_type = tensor<?x?xi8>, user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>>, tensor<?x?xi8, #iree_encoding.encoding<role = RHS, element_types = [i8, i8, i32], original_type = tensor<?x?xi8>, user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>>) outs(%9 : tensor<?x?xi32, #iree_encoding.encoding<role = RESULT, element_types = [i8, i8, i32], original_type = tensor<?x?xi32>, user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>>) -> tensor<?x?xi32, #iree_encoding.encoding<role = RESULT, element_types = [i8, i8, i32], original_type = tensor<?x?xi32>, user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>>
```

After:

```mlir
%5 = iree_encoding.set_encoding %0 : tensor<?x?xi8> -> tensor<?x?xi8, #iree_encoding.encoding<role = LHS, element_types = [i8, i8, i32], original_type = tensor<?x?xi8>, user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>>
%6 = iree_encoding.set_encoding %1 : tensor<?x?xi8> -> tensor<?x?xi8, #iree_encoding.encoding<role = RHS, element_types = [i8, i8, i32], original_type = tensor<?x?xi8>, user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>>
%7 = tensor.empty() : tensor<3x5xi32, #iree_encoding.encoding<role = RESULT, element_types = [i8, i8, i32], original_type = tensor<?x?xi32>, user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>>
%8 = linalg.fill ins(%c0_i32 : i32) outs(%7 : tensor<3x5xi32, #iree_encoding.encoding<role = RESULT, element_types = [i8, i8, i32], original_type = tensor<?x?xi32>, user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>>) -> tensor<3x5xi32, #iree_encoding.encoding<role = RESULT, element_types = [i8, i8, i32], original_type = tensor<?x?xi32>, user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>>
%cast_2 = tensor.cast %5 : tensor<?x?xi8, #iree_encoding.encoding<role = LHS, element_types = [i8, i8, i32], original_type = tensor<?x?xi8>, user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>> to tensor<3x?xi8>
%cast_3 = tensor.cast %6 : tensor<?x?xi8, #iree_encoding.encoding<role = RHS, element_types = [i8, i8, i32], original_type = tensor<?x?xi8>, user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>> to tensor<?x5xi8>
%9 = linalg.matmul ins(%cast_2, %cast_3 : tensor<3x?xi8>, tensor<?x5xi8>) outs(%8 : tensor<3x5xi32, #iree_encoding.encoding<role = RESULT, element_types = [i8, i8, i32], original_type = tensor<?x?xi32>, user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>>) -> tensor<3x5xi32, #iree_encoding.encoding<role = RESULT, element_types = [i8, i8, i32], original_type = tensor<?x?xi32>, user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>>
```
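Note how the `tensor.cast` ops introduced by shape inference erase the encoding from the operand types. An illustrative sketch of what a type-preserving cast would look like instead (here `#enc` is shorthand for the full `#iree_encoding.encoding<...>` attribute above, not actual pass output):

```mlir
// Hypothetical fix: the cast refines only the shape and keeps the
// encoding attribute, so the matmul still sees encoded operands.
%cast_2 = tensor.cast %5 : tensor<?x?xi8, #enc> to tensor<3x?xi8, #enc>
```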
To integrate data-tiling with multi-device and heterogeneous computing, we need to disable the early materialization pass in the GlobalOptimization phase. We are also going to move set_encoding to the stage after dispatch formation, where the early materialization pass won't work in many cases. To complete data-tiling support for all the other backends, we add MaterializeEncodingIntoNopPass to their pipelines. This is what MaterializeHomogeneousEncodingsPass does today, and we should be able to defer it to codegen for the other pipelines.
iree/compiler/src/iree/compiler/GlobalOptimization/MaterializeHomogeneousEncodings.cpp
Lines 38 to 45 in ac418d1
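For context, a sketch of what the nop materialization conceptually does (illustrative only, not actual pass output; `#enc` abbreviates a full `#iree_encoding.encoding<...>` attribute):

```mlir
// Before the pass: the encodings are still present in the types.
%e = iree_encoding.set_encoding %t : tensor<?x?xf32> -> tensor<?x?xf32, #enc>
%r = iree_encoding.unset_encoding %m : tensor<?x?xf32, #enc> -> tensor<?x?xf32>

// After the pass: each encoding is materialized into nothing, so both ops
// fold away and consumers use %t and %m directly as plain tensors.
```

This lets a backend that has no data-tiling lowering still compile programs that carry encoding ops.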
E.g., on the CPU side, it's added in buildLLVMCPUCodegenConfigurationPassPipelineImpl:
iree/compiler/src/iree/compiler/Codegen/LLVMCPU/Passes.cpp
Lines 752 to 765 in ac418d1
We can do the same for other backends. E.g., on LLVMGPU side, it'd be:
iree/compiler/src/iree/compiler/Codegen/LLVMGPU/Passes.cpp
Lines 1041 to 1051 in ac418d1
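A pseudocode-style sketch of the shape of that change for a backend (placeholder names; modeled on what the LLVMCPU pipeline does, and not buildable as-is since it depends on IREE/MLIR headers):

```cpp
// Sketch only: "SomeBackend" is a placeholder for LLVMGPU, VMVX, SPIR-V, etc.
static void buildSomeBackendCodegenConfigurationPassPipelineImpl(
    mlir::OpPassManager &modulePassManager) {
  // ... existing configuration passes for this backend ...

  // Materialize encodings into nops so the backend can compile IR that
  // still carries iree_encoding.set_encoding / unset_encoding ops.
  modulePassManager.addPass(createMaterializeEncodingIntoNopPass());
}
```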
Note: this also needs to be done for the VMVX and SPIR-V backends. As the title says, this needs to happen for all the backends that don't yet support data-tiling.
This is an incremental step to enable gpu data-tiling.