[GPU][DT] Materialize encodings for GPU targets #17185

Open
Tracked by #17181 ...
hanhanW opened this issue Apr 25, 2024 · 2 comments

hanhanW commented Apr 25, 2024

IMO, the pass would look at the IREE::GPU::MmaInterfaceAttr attributes to enumerate tile sizes and set up encoding configs for each operand role; this gives us the set of supported intrinsics. E.g.,

```cpp
auto mmaAttrs =
    llvm::to_vector(mmaKinds->getAsRange<IREE::GPU::MmaInterfaceAttr>());
SmallVector<GPUMatmulShapeType> intrinsics;
intrinsics.reserve(mmaAttrs.size());
for (auto mma : mmaAttrs) {
  auto [mSize, nSize, kSize] = mma.getMNKShape();
  auto [aType, bType, cType] = mma.getABCElementTypes();
  // Skip intrinsics that do not match the target's subgroup size.
  if (mma.getSubgroupSize() != targetSubgroupSize)
    continue;
  intrinsics.emplace_back(mSize, nSize, kSize, aType, bType, cType);
}
```

I think, as an initial step, we can set the inner tile sizes to the intrinsic sizes and increase them later (for better unrolling). As discussed today, let's use linalg.generic to represent the mmt4d-like op for now.
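To make the "inner tiles = intrinsic sizes" step concrete, here is a minimal, self-contained sketch in plain C++. The `MmaIntrinsic` and `InnerTileSizes` structs and the `chooseInnerTiles` helper are hypothetical stand-ins for the IREE types above, not the actual API:

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for IREE::GPU::MmaInterfaceAttr: just the data the
// materialization pass would query from each intrinsic.
struct MmaIntrinsic {
  int64_t m, n, k;       // Intrinsic M/N/K shape.
  int64_t subgroupSize;  // Subgroup size the intrinsic requires.
};

// Inner tile sizes for the mmt4d-like packed layout.
struct InnerTileSizes {
  int64_t m, n, k;
};

// Initial step from the comment above: pick the first intrinsic whose
// subgroup size matches the target, and seed the inner tile sizes with the
// intrinsic sizes. Unrolling factors > 1 can be layered on later.
std::optional<InnerTileSizes>
chooseInnerTiles(const std::vector<MmaIntrinsic> &intrinsics,
                 int64_t targetSubgroupSize) {
  for (const MmaIntrinsic &mma : intrinsics) {
    if (mma.subgroupSize != targetSubgroupSize)
      continue;
    return InnerTileSizes{mma.m, mma.n, mma.k};
  }
  return std::nullopt;  // No supported intrinsic for this target.
}
```

For example, with a hypothetical 16x16x16 intrinsic at subgroup size 64, the initial inner tiles would simply be 16x16x16.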


hanhanW commented May 2, 2024

Putting an additional resource here before I forget. We can review it together when we discuss the details.

Here is an upstream method that contains very helpful logic. Ideally, we should refactor it and create a new method that returns the packed generic op. We can't use the method directly in the materialization pass because it also generates pack/unpack ops. What we need in the materialization pattern is just the packed linalg op, so we can replace the contraction op with it.

https://github.com/llvm/llvm-project/blob/4b75fcf0a50f4be955b611e8e20d84d90ea133c8/mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h#L1120-L1130
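A rough shape of the proposed refactor, as a self-contained plain-C++ sketch. All names here are hypothetical (strings stand in for MLIR ops, and the `PackResult`-like struct only mimics the upstream one); the point is splitting "build the packed op" from "emit the pack/unpack ops", so the materialization pattern can call only the first half:

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified mirror of the upstream result struct: the packed
// op plus the pack/unpack ops that bracket it.
struct PackResult {
  std::vector<std::string> packOps;
  std::string packedOp;
  std::vector<std::string> unPackOps;
};

// The piece the materialization pattern needs: build only the packed
// generic op, without materializing any pack/unpack ops around it.
std::string buildPackedGenericOp(const std::string &contractionOp) {
  return "packed(" + contractionOp + ")";
}

// The existing full entry point, re-expressed on top of the refactored
// helper; callers that do want pack/unpack ops keep using this version.
PackResult packWithPackUnpackOps(const std::string &contractionOp) {
  PackResult result;
  result.packOps = {"pack %lhs", "pack %rhs", "pack %acc"};
  result.packedOp = buildPackedGenericOp(contractionOp);
  result.unPackOps = {"unpack %result"};
  return result;
}
```

With this split, the materialization pattern would call only `buildPackedGenericOp` and replace the contraction op with its result, while the original behavior stays available through the full entry point.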


hanhanW commented Jun 21, 2024

I'm using this issue because it already has some context, e.g., the upstream method that we can use in this work. In the recent LLVMGPUTileAndFuse pipeline (pipeline_test), we already generate some pack/unpack ops in matmul codegen. This is what we want to do in GPU data-tiling.

The goal now is to move those packs into the materialization of set/unset encodings.

@hanhanW hanhanW added the codegen/hip ROCm code generation compiler backend label Jun 21, 2024