
[DT] Turn encodings into nop for all the backends that do not yet support data-tiling #17719

Open
Tracked by #17722
hanhanW opened this issue Jun 21, 2024 · 2 comments
Labels
codegen Shared code generation infrastructure and dialects

Comments


hanhanW commented Jun 21, 2024

To integrate data-tiling with multi-device and heterogeneous computing, we need to disable the early materialization pass in the GlobalOptimization phase; we are also going to move set_encoding to after dispatch formation, because early materialization does not work in many of these cases. To complete data-tiling support for all the other backends, we add MaterializeEncodingIntoNopPass to their pipelines. This is what MaterializeHomogeneousEncodingsPass does today, and we should be able to defer it to codegen for the other pipelines.

void runNopPipeline(ModuleOp &moduleOp) {
  OpPassManager passManager(moduleOp.getOperationName());
  FunctionLikeNest(passManager).addPass(createMaterializeEncodingIntoNopPass);
  FunctionLikeNest(passManager).addPass(createCanonicalizerPass);
  if (failed(runPipeline(passManager, moduleOp))) {
    return signalPassFailure();
  }
}

E.g., on the CPU side, it is added to buildLLVMCPUCodegenConfigurationPassPipelineImpl:

void buildLLVMCPUCodegenConfigurationPassPipelineImpl(
    OpPassManager &modulePassManager) {
  {
    FunctionLikeNest funcPassManager(modulePassManager);
    addCommonTargetExecutablePreprocessingPasses(funcPassManager,
                                                 clUseSoftmaxInterFusion);
  }
  modulePassManager.addPass(createMaterializeUserConfigsPass());
  FunctionLikeNest(modulePassManager)
      .addPass(createRematerializeParallelOpsPass)
      // TODO(#13888): This pass (createExpandF16OpToF32Pass) is being added
      // way too late and should instead be done during lowering to LLVM.
      .addPass(createExpandF16OpToF32Pass)
      .addPass([&]() { return createCPUMaterializeEncodingPass(); })

We can do the same for the other backends. E.g., on the LLVMGPU side, the target would be buildLLVMGPUCodegenConfigurationPassPipelineImpl, which currently looks like:

static void buildLLVMGPUCodegenConfigurationPassPipelineImpl(
    OpPassManager &modulePassManager) {
  {
    FunctionLikeNest funcPassManager(modulePassManager);
    funcPassManager.addPass(createGPUGeneralizeNamedOpsPass);
    addCommonTargetExecutablePreprocessingPasses(funcPassManager);
  }
  modulePassManager.addPass(createMaterializeUserConfigsPass());
  modulePassManager.addPass(createLLVMGPUSelectLoweringStrategyPass());
}
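
A minimal sketch of the change, mirroring the CPU placement right after createMaterializeUserConfigsPass (the exact position in the pipeline is an assumption, not something settled here):

static void buildLLVMGPUCodegenConfigurationPassPipelineImpl(
    OpPassManager &modulePassManager) {
  {
    FunctionLikeNest funcPassManager(modulePassManager);
    funcPassManager.addPass(createGPUGeneralizeNamedOpsPass);
    addCommonTargetExecutablePreprocessingPasses(funcPassManager);
  }
  modulePassManager.addPass(createMaterializeUserConfigsPass());
  // LLVMGPU has no data-tiled layouts yet, so drop the encodings instead of
  // materializing them into layout-changing ops, as runNopPipeline does above.
  FunctionLikeNest(modulePassManager)
      .addPass(createMaterializeEncodingIntoNopPass)
      .addPass(createCanonicalizerPass);
  modulePassManager.addPass(createLLVMGPUSelectLoweringStrategyPass());
}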

Note: this also needs to be done for the VMVX and SPIR-V backends. As the title says, this needs to be done for all the backends; the sketch below shows the generic shape of the change.
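
The shape of the change is the same for every remaining backend: the nop materialization plus a canonicalizer, added to that backend's configuration pipeline. Assuming a hypothetical builder (the name below is illustrative; each backend has its own entry point):

// Hypothetical builder name, for illustration only; each backend (VMVX,
// SPIR-V, ...) has its own configuration pipeline entry point.
static void buildSomeBackendCodegenConfigurationPassPipelineImpl(
    OpPassManager &modulePassManager) {
  // ... the backend's existing preprocessing/configuration passes ...
  FunctionLikeNest(modulePassManager)
      .addPass(createMaterializeEncodingIntoNopPass)
      .addPass(createCanonicalizerPass);
  // ... the backend's existing lowering-strategy selection ...
}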

This is an incremental step toward enabling GPU data-tiling.

@hanhanW hanhanW added the codegen Shared code generation infrastructure and dialects label Jun 21, 2024
@hanhanW hanhanW self-assigned this Jun 21, 2024
@hanhanW hanhanW changed the title from "[DTv2] Turn encodings into nop for all the backends that do not yet support data-tiling" to "[DT] Turn encodings into nop for all the backends that do not yet support data-tiling" Jun 24, 2024

hanhanW commented Jun 24, 2024

I'm not able to create a repro for the original failure, because it looks like we can already handle the case at the codegen level. @lialan can you help add createMaterializeEncodingIntoNopPass to all the other backends?

The goal of the issue is to make everything work when we turn off the early materialization pass:

// TODO(hanchung): Remove the flag. We don't want to do early materialization
// by default, because it won't work for heterogeneous computing. This is not
// the right layer for handling such information.
static llvm::cl::opt<bool> clEnableEarlyMaterialization(
    "iree-global-opt-enable-early-materialization",
    llvm::cl::desc(
        "Enables early materialization on encodings. Note, this flag should be "
        "false eventually. This does not work for heterogeneous computing."),
    llvm::cl::init(true));

There is a separate issue besides the nop pass. The failure I hit is in linalg_quantized_matmul_vs_linalg_matmul.mlir: it looks like the upstream linalg shape inference drops the encodings, which looks incorrect to me. @lialan can you help fix it and do the further investigation?

To repro: iree-compile --output-format=vm-bytecode --iree-hal-target-backends=llvm-cpu tests/e2e/regression/linalg_quantized_matmul_vs_linalg_matmul.mlir -o /tmp/a.vmfb --iree-global-opt-enable-early-materialization=false

(cc @bjacob)

@hanhanW hanhanW assigned lialan and unassigned hanhanW Jun 24, 2024

hanhanW commented Jun 24, 2024

This is the IR before and after canonicalization: https://gist.github.com/hanhanW/959cf2809098c3485ee1ebd6394e5836. Looking at the check_one_quantized_matmul_as_matmul_dynamic function, the shape inference creates tensor.cast ops that drop the encodings, because it does not take them into account.

Before:

    %6 = iree_encoding.set_encoding %0 : tensor<?x?xi8> -> tensor<?x?xi8, #iree_encoding.encoding<role =  LHS, element_types = [i8, i8, i32], original_type = tensor<?x?xi8>, user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>>
    %7 = iree_encoding.set_encoding %1 : tensor<?x?xi8> -> tensor<?x?xi8, #iree_encoding.encoding<role =  RHS, element_types = [i8, i8, i32], original_type = tensor<?x?xi8>, user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>>
    %8 = tensor.empty(%c3, %c5) : tensor<?x?xi32, #iree_encoding.encoding<role =  RESULT, element_types = [i8, i8, i32], original_type = tensor<?x?xi32>, user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>>
    %9 = linalg.fill ins(%c0_i32 : i32) outs(%8 : tensor<?x?xi32, #iree_encoding.encoding<role =  RESULT, element_types = [i8, i8, i32], original_type = tensor<?x?xi32>, user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>>) -> tensor<?x?xi32, #iree_encoding.encoding<role =  RESULT, element_types = [i8, i8, i32], original_type = tensor<?x?xi32>, user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>>
    %10 = linalg.matmul ins(%6, %7 : tensor<?x?xi8, #iree_encoding.encoding<role =  LHS, element_types = [i8, i8, i32], original_type = tensor<?x?xi8>, user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>>, tensor<?x?xi8, #iree_encoding.encoding<role =  RHS, element_types = [i8, i8, i32], original_type = tensor<?x?xi8>, user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>>) outs(%9 : tensor<?x?xi32, #iree_encoding.encoding<role =  RESULT, element_types = [i8, i8, i32], original_type = tensor<?x?xi32>, user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>>) -> tensor<?x?xi32, #iree_encoding.encoding<role =  RESULT, element_types = [i8, i8, i32], original_type = tensor<?x?xi32>, user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>>

After:

    %5 = iree_encoding.set_encoding %0 : tensor<?x?xi8> -> tensor<?x?xi8, #iree_encoding.encoding<role =  LHS, element_types = [i8, i8, i32], original_type = tensor<?x?xi8>, user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>>
    %6 = iree_encoding.set_encoding %1 : tensor<?x?xi8> -> tensor<?x?xi8, #iree_encoding.encoding<role =  RHS, element_types = [i8, i8, i32], original_type = tensor<?x?xi8>, user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>>
    %7 = tensor.empty() : tensor<3x5xi32, #iree_encoding.encoding<role =  RESULT, element_types = [i8, i8, i32], original_type = tensor<?x?xi32>, user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>>
    %8 = linalg.fill ins(%c0_i32 : i32) outs(%7 : tensor<3x5xi32, #iree_encoding.encoding<role =  RESULT, element_types = [i8, i8, i32], original_type = tensor<?x?xi32>, user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>>) -> tensor<3x5xi32, #iree_encoding.encoding<role =  RESULT, element_types = [i8, i8, i32], original_type = tensor<?x?xi32>, user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>>
    %cast_2 = tensor.cast %5 : tensor<?x?xi8, #iree_encoding.encoding<role =  LHS, element_types = [i8, i8, i32], original_type = tensor<?x?xi8>, user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>> to tensor<3x?xi8>
    %cast_3 = tensor.cast %6 : tensor<?x?xi8, #iree_encoding.encoding<role =  RHS, element_types = [i8, i8, i32], original_type = tensor<?x?xi8>, user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>> to tensor<?x5xi8>
    %9 = linalg.matmul ins(%cast_2, %cast_3 : tensor<3x?xi8>, tensor<?x5xi8>) outs(%8 : tensor<3x5xi32, #iree_encoding.encoding<role =  RESULT, element_types = [i8, i8, i32], original_type = tensor<?x?xi32>, user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>>) -> tensor<3x5xi32, #iree_encoding.encoding<role =  RESULT, element_types = [i8, i8, i32], original_type = tensor<?x?xi32>, user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>>
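
The casts back to tensor<3x?xi8> (from %5) and tensor<?x5xi8> (from %6) drop the #iree_encoding.encoding attribute entirely, so the later materialization has nothing left to act on. Below is a minimal sketch of the kind of guard the upstream shape-inference canonicalization presumably needs; the helper and the pattern name are illustrative, not the actual upstream code:

// Illustrative sketch: skip static-shape inference for any operand whose
// tensor type carries an encoding, so the inferred tensor.cast cannot
// silently drop it. Names here are hypothetical, not the upstream API.
#include "mlir/Dialect/Linalg/IR/Linalg.h"
#include "mlir/IR/BuiltinTypes.h"

using namespace mlir;

static bool hasEncodedOperand(linalg::LinalgOp linalgOp) {
  for (OpOperand &opOperand : linalgOp->getOpOperands()) {
    auto tensorType = dyn_cast<RankedTensorType>(opOperand.get().getType());
    if (tensorType && tensorType.getEncoding())
      return true;
  }
  return false;
}

// Inside the (hypothetical) InferStaticShapeOfOperands pattern:
//   if (hasEncodedOperand(linalgOp))
//     return rewriter.notifyMatchFailure(linalgOp, "operand has an encoding");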
