
[LLVMCPU] Bad packing codegen with different outer_dims_perm #17461

Open

Max191 opened this issue May 21, 2024 · 2 comments
Labels: codegen/llvm (LLVM code generation compiler backend)

Max191 (Contributor) commented May 21, 2024

Some tensor.pack and tensor.unpack ops become significantly slower with different outer_dims_perm values. The following gist has a bad tensor.pack and tensor.unpack case, as well as a good tensor.pack case: https://gist.github.com/Max191/a32c07b72272e74cf625cd810ae09c0a
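For readers who don't open the gist, here is a minimal sketch of the shape of the problem. The function names, the 128x256 shape, and the 16x16 tiles are hypothetical and chosen only to illustrate the effect of outer_dims_perm; the actual repro cases are in the gist above.

  // Hypothetical example, for illustration only; the real cases are in the gist.
  // "Good" pack: outer tile dims stay in source order.
  func.func @pack_good(%src: tensor<128x256xf32>) -> tensor<8x16x16x16xf32> {
    %dest = tensor.empty() : tensor<8x16x16x16xf32>
    %pack = tensor.pack %src inner_dims_pos = [0, 1] inner_tiles = [16, 16] into %dest : tensor<128x256xf32> -> tensor<8x16x16x16xf32>
    return %pack : tensor<8x16x16x16xf32>
  }

  // "Bad" pack: same data and tiles, but the outer tile dims are permuted.
  func.func @pack_bad(%src: tensor<128x256xf32>) -> tensor<16x8x16x16xf32> {
    %dest = tensor.empty() : tensor<16x8x16x16xf32>
    %pack = tensor.pack %src outer_dims_perm = [1, 0] inner_dims_pos = [0, 1] inner_tiles = [16, 16] into %dest : tensor<128x256xf32> -> tensor<16x8x16x16xf32>
    return %pack : tensor<16x8x16x16xf32>
  }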

Compile with:

iree-compile packing.mlir \
  --iree-hal-target-backends=llvm-cpu \
  --iree-llvmcpu-target-cpu=znver4 \
  --iree-llvmcpu-enable-ukernels=mmt4d \
  -o /tmp/packing.vmfb

This gist also shows the difference: https://gist.github.com/Max191/2d6a74f4f7be1951ac359b6fd8db60ca
One of the benchmarks is an unpack + transpose, and the other is a pure unpack that produces the same result. Tile size selection also differs between the two. A rough sketch of the two variants follows the compile command below. These benchmarks can be compiled with:

iree-compile packing.mlir \
  --iree-hal-target-backends=llvm-cpu \
  --iree-llvmcpu-target-cpu=znver4 \
  --iree-llvmcpu-enable-ukernels=mmt4d \
  --compile-from=executable-sources \
  -o /tmp/packing.vmfb
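The exact benchmark IR is in the gist above; as a rough, hypothetical sketch (the function names are made up, and the shapes are borrowed from the unpack case quoted later in this thread rather than from the gist), the two variants look roughly like this: a pure unpack that folds the transpose into outer_dims_perm, versus a plain unpack followed by a linalg.transpose.

  // Sketch only; shapes mirror the unpack case discussed below, not necessarily the gist.
  // Variant 1: pure unpack, with the transpose folded into outer_dims_perm.
  func.func @pure_unpack(%arg0: tensor<64x1828x8x16x16xf32>) -> tensor<29241x128x64xf32> {
    %0 = tensor.empty() : tensor<29241x128x64xf32>
    %unpack = tensor.unpack %arg0 outer_dims_perm = [2, 0, 1] inner_dims_pos = [0, 1] inner_tiles = [16, 16] into %0 : tensor<64x1828x8x16x16xf32> -> tensor<29241x128x64xf32>
    return %unpack : tensor<29241x128x64xf32>
  }

  // Variant 2: plain unpack (identity outer_dims_perm) followed by an explicit transpose.
  func.func @unpack_then_transpose(%arg0: tensor<64x1828x8x16x16xf32>) -> tensor<29241x128x64xf32> {
    %0 = tensor.empty() : tensor<64x29241x128xf32>
    %unpack = tensor.unpack %arg0 inner_dims_pos = [1, 2] inner_tiles = [16, 16] into %0 : tensor<64x1828x8x16x16xf32> -> tensor<64x29241x128xf32>
    %1 = tensor.empty() : tensor<29241x128x64xf32>
    %transposed = linalg.transpose ins(%unpack : tensor<64x29241x128xf32>) outs(%1 : tensor<29241x128x64xf32>) permutation = [1, 2, 0]
    return %transposed : tensor<29241x128x64xf32>
  }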
hanhanW added the codegen/llvm (LLVM code generation compiler backend) label on May 21, 2024
hanhanW (Contributor) commented May 22, 2024

The pack issue is temporarily covered by the ukernel, so let's focus on the unpack kernel in this issue. The unpack op has transpose variants, and we need to take them into account now. @pashu123 please take a stab at it.

  func.func @unpack_bad(%arg0: tensor<64x1828x8x16x16xf32>) -> tensor<29241x128x64xf32> {
    %cst = arith.constant 0.000000e+00 : bf16
    %4 = tensor.empty() : tensor<29241x128x64xf32>
    %unpack = tensor.unpack %arg0 outer_dims_perm = [2, 0, 1] inner_dims_pos = [0, 1] inner_tiles = [16, 16] into %4 : tensor<64x1828x8x16x16xf32> -> tensor<29241x128x64xf32>
    return %unpack : tensor<29241x128x64xf32>
  }

hanhanW (Contributor) commented May 23, 2024

Putting a note here. I think the current plan is:

  1. Enable unpack ukernels.
  2. Measure the performance gap.
  3. Plan out the work for unpack codegen.
