
[DT] Turn encodings into nop for all the backends that do not yet support data-tiling #17719

Open
Tracked by #17722
hanhanW opened this issue Jun 21, 2024 · 2 comments
Labels
codegen Shared code generation infrastructure and dialects

Comments


hanhanW commented Jun 21, 2024

To integrate data-tiling with multi-device and heterogeneous computing, we need to disable the early materialization pass in the GlobalOptimization phase; we are also going to move set_encoding to after dispatch formation, because early materialization does not work in many of these cases. To complete data-tiling support for all the other backends, we add MaterializeEncodingIntoNopPass to their pipelines. This is what MaterializeHomogeneousEncodingsPass does today, and we should be able to defer it to codegen for the other pipelines.

void runNopPipeline(ModuleOp &moduleOp) {
  OpPassManager passManager(moduleOp.getOperationName());
  FunctionLikeNest(passManager).addPass(createMaterializeEncodingIntoNopPass);
  FunctionLikeNest(passManager).addPass(createCanonicalizerPass);
  if (failed(runPipeline(passManager, moduleOp))) {
    return signalPassFailure();
  }
}

E.g., on the CPU side, it is added to buildLLVMCPUCodegenConfigurationPassPipelineImpl:

void buildLLVMCPUCodegenConfigurationPassPipelineImpl(
    OpPassManager &modulePassManager) {
  {
    FunctionLikeNest funcPassManager(modulePassManager);
    addCommonTargetExecutablePreprocessingPasses(funcPassManager,
                                                 clUseSoftmaxInterFusion);
  }
  modulePassManager.addPass(createMaterializeUserConfigsPass());
  FunctionLikeNest(modulePassManager)
      .addPass(createRematerializeParallelOpsPass)
      // TODO(#13888): This pass (createExpandF16OpToF32Pass) is being added
      // way too late and should instead be done during lowering to LLVM.
      .addPass(createExpandF16OpToF32Pass)
      .addPass([&]() { return createCPUMaterializeEncodingPass(); })

We can do the same for the other backends. E.g., on the LLVMGPU side, the target would be buildLLVMGPUCodegenConfigurationPassPipelineImpl, which currently looks like:

static void buildLLVMGPUCodegenConfigurationPassPipelineImpl(
    OpPassManager &modulePassManager) {
  {
    FunctionLikeNest funcPassManager(modulePassManager);
    funcPassManager.addPass(createGPUGeneralizeNamedOpsPass);
    addCommonTargetExecutablePreprocessingPasses(funcPassManager);
  }
  modulePassManager.addPass(createMaterializeUserConfigsPass());
  modulePassManager.addPass(createLLVMGPUSelectLoweringStrategyPass());
}
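
A minimal sketch of the change, mirroring the CPU placement right after createMaterializeUserConfigsPass (the exact position in the pipeline is an assumption, not something settled here):

static void buildLLVMGPUCodegenConfigurationPassPipelineImpl(
    OpPassManager &modulePassManager) {
  {
    FunctionLikeNest funcPassManager(modulePassManager);
    funcPassManager.addPass(createGPUGeneralizeNamedOpsPass);
    addCommonTargetExecutablePreprocessingPasses(funcPassManager);
  }
  modulePassManager.addPass(createMaterializeUserConfigsPass());
  // LLVMGPU has no data-tiled layouts yet, so drop the encodings instead of
  // materializing them into layout-changing ops, as runNopPipeline does above.
  FunctionLikeNest(modulePassManager)
      .addPass(createMaterializeEncodingIntoNopPass)
      .addPass(createCanonicalizerPass);
  modulePassManager.addPass(createLLVMGPUSelectLoweringStrategyPass());
}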

Note: this also needs to be done for the VMVX and SPIR-V backends. As the title says, this needs to be done for all the backends; the sketch below shows the generic shape of the change.
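
The shape of the change is the same for every remaining backend: the nop materialization plus a canonicalizer, added to that backend's configuration pipeline. Assuming a hypothetical builder (the name below is illustrative; each backend has its own entry point):

// Hypothetical builder name, for illustration only; each backend (VMVX,
// SPIR-V, ...) has its own configuration pipeline entry point.
static void buildSomeBackendCodegenConfigurationPassPipelineImpl(
    OpPassManager &modulePassManager) {
  // ... the backend's existing preprocessing/configuration passes ...
  FunctionLikeNest(modulePassManager)
      .addPass(createMaterializeEncodingIntoNopPass)
      .addPass(createCanonicalizerPass);
  // ... the backend's existing lowering-strategy selection ...
}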

This is an incremental step toward enabling GPU data-tiling.

@hanhanW hanhanW added the codegen Shared code generation infrastructure and dialects label Jun 21, 2024
@hanhanW hanhanW self-assigned this Jun 21, 2024
@hanhanW hanhanW changed the title from "[DTv2] Turn encodings into nop for all the backends that do not yet support data-tiling" to "[DT] Turn encodings into nop for all the backends that do not yet support data-tiling" Jun 24, 2024

hanhanW commented Jun 24, 2024

I'm not able to create a repro for the original failure, because it looks like we can already handle the case at the codegen level. @lialan can you help add createMaterializeEncodingIntoNopPass to all the other backends?

The goal of the issue is to make everything work when we turn off the early materialization pass:

// TODO(hanchung): Remove the flag. We don't want to do early materialization
// by default, because it won't work for heterogeneous computing. This is not
// the right layer for handling such information.
static llvm::cl::opt<bool> clEnableEarlyMaterialization(
    "iree-global-opt-enable-early-materialization",
    llvm::cl::desc(
        "Enables early materialization on encodings. Note, this flag should be "
        "false eventually. This does not work for heterogeneous computing."),
    llvm::cl::init(true));

There is a separate issue besides the nop pass. The failure I hit is in linalg_quantized_matmul_vs_linalg_matmul.mlir: it looks like the upstream linalg shape inference drops the encodings, which looks incorrect to me. @lialan can you help fix it and do the further investigation?

To repro: iree-compile --output-format=vm-bytecode --iree-hal-target-backends=llvm-cpu tests/e2e/regression/linalg_quantized_matmul_vs_linalg_matmul.mlir -o /tmp/a.vmfb --iree-global-opt-enable-early-materialization=false

(cc @bjacob)

@hanhanW hanhanW assigned lialan and unassigned hanhanW Jun 24, 2024

hanhanW commented Jun 24, 2024

This is the IR before and after canonicalization: https://gist.github.com/hanhanW/959cf2809098c3485ee1ebd6394e5836. Looking at the check_one_quantized_matmul_as_matmul_dynamic function, the shape inference creates tensor.cast ops that drop the encodings, because it does not take them into account.

Before:

    %6 = iree_encoding.set_encoding %0 : tensor<?x?xi8> -> tensor<?x?xi8, #iree_encoding.encoding<role =  LHS, element_types = [i8, i8, i32], original_type = tensor<?x?xi8>, user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>>
    %7 = iree_encoding.set_encoding %1 : tensor<?x?xi8> -> tensor<?x?xi8, #iree_encoding.encoding<role =  RHS, element_types = [i8, i8, i32], original_type = tensor<?x?xi8>, user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>>
    %8 = tensor.empty(%c3, %c5) : tensor<?x?xi32, #iree_encoding.encoding<role =  RESULT, element_types = [i8, i8, i32], original_type = tensor<?x?xi32>, user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>>
    %9 = linalg.fill ins(%c0_i32 : i32) outs(%8 : tensor<?x?xi32, #iree_encoding.encoding<role =  RESULT, element_types = [i8, i8, i32], original_type = tensor<?x?xi32>, user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>>) -> tensor<?x?xi32, #iree_encoding.encoding<role =  RESULT, element_types = [i8, i8, i32], original_type = tensor<?x?xi32>, user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>>
    %10 = linalg.matmul ins(%6, %7 : tensor<?x?xi8, #iree_encoding.encoding<role =  LHS, element_types = [i8, i8, i32], original_type = tensor<?x?xi8>, user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>>, tensor<?x?xi8, #iree_encoding.encoding<role =  RHS, element_types = [i8, i8, i32], original_type = tensor<?x?xi8>, user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>>) outs(%9 : tensor<?x?xi32, #iree_encoding.encoding<role =  RESULT, element_types = [i8, i8, i32], original_type = tensor<?x?xi32>, user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>>) -> tensor<?x?xi32, #iree_encoding.encoding<role =  RESULT, element_types = [i8, i8, i32], original_type = tensor<?x?xi32>, user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>>

After:

    %5 = iree_encoding.set_encoding %0 : tensor<?x?xi8> -> tensor<?x?xi8, #iree_encoding.encoding<role =  LHS, element_types = [i8, i8, i32], original_type = tensor<?x?xi8>, user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>>
    %6 = iree_encoding.set_encoding %1 : tensor<?x?xi8> -> tensor<?x?xi8, #iree_encoding.encoding<role =  RHS, element_types = [i8, i8, i32], original_type = tensor<?x?xi8>, user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>>
    %7 = tensor.empty() : tensor<3x5xi32, #iree_encoding.encoding<role =  RESULT, element_types = [i8, i8, i32], original_type = tensor<?x?xi32>, user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>>
    %8 = linalg.fill ins(%c0_i32 : i32) outs(%7 : tensor<3x5xi32, #iree_encoding.encoding<role =  RESULT, element_types = [i8, i8, i32], original_type = tensor<?x?xi32>, user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>>) -> tensor<3x5xi32, #iree_encoding.encoding<role =  RESULT, element_types = [i8, i8, i32], original_type = tensor<?x?xi32>, user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>>
    %cast_2 = tensor.cast %5 : tensor<?x?xi8, #iree_encoding.encoding<role =  LHS, element_types = [i8, i8, i32], original_type = tensor<?x?xi8>, user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>> to tensor<3x?xi8>
    %cast_3 = tensor.cast %6 : tensor<?x?xi8, #iree_encoding.encoding<role =  RHS, element_types = [i8, i8, i32], original_type = tensor<?x?xi8>, user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>> to tensor<?x5xi8>
    %9 = linalg.matmul ins(%cast_2, %cast_3 : tensor<3x?xi8>, tensor<?x5xi8>) outs(%8 : tensor<3x5xi32, #iree_encoding.encoding<role =  RESULT, element_types = [i8, i8, i32], original_type = tensor<?x?xi32>, user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>>) -> tensor<3x5xi32, #iree_encoding.encoding<role =  RESULT, element_types = [i8, i8, i32], original_type = tensor<?x?xi32>, user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>>
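
The casts back to tensor<3x?xi8> (from %5) and tensor<?x5xi8> (from %6) drop the #iree_encoding.encoding attribute entirely, so the later materialization has nothing left to act on. Below is a minimal sketch of the kind of guard the upstream shape-inference canonicalization presumably needs; the helper and the pattern name are illustrative, not the actual upstream code:

// Illustrative sketch: skip static-shape inference for any operand whose
// tensor type carries an encoding, so the inferred tensor.cast cannot
// silently drop it. Names here are hypothetical, not the upstream API.
#include "mlir/Dialect/Linalg/IR/Linalg.h"
#include "mlir/IR/BuiltinTypes.h"

using namespace mlir;

static bool hasEncodedOperand(linalg::LinalgOp linalgOp) {
  for (OpOperand &opOperand : linalgOp->getOpOperands()) {
    auto tensorType = dyn_cast<RankedTensorType>(opOperand.get().getType());
    if (tensorType && tensorType.getEncoding())
      return true;
  }
  return false;
}

// Inside the (hypothetical) InferStaticShapeOfOperands pattern:
//   if (hasEncodedOperand(linalgOp))
//     return rewriter.notifyMatchFailure(linalgOp, "operand has an encoding");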
