[LLVMCPU] Bad packing codegen with different outer_dims_perm
#17461
Labels: codegen/llvm (LLVM code generation compiler backend)
Some `tensor.pack` and `tensor.unpack` ops become significantly slower with different `outer_dims_perm` values. The following gist has a bad `tensor.pack` and `tensor.unpack` case, as well as a good `tensor.pack` case: https://gist.github.com/Max191/a32c07b72272e74cf625cd810ae09c0a

Compile with:
This gist also shows the difference: https://gist.github.com/Max191/2d6a74f4f7be1951ac359b6fd8db60ca
One of the benchmarks is an unpack + transpose, and the other is a pure unpack that does the same thing. There are differences in tile size selection here. These benchmarks can be compiled with:
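
For readers unfamiliar with `outer_dims_perm`, the following pure-Python sketch (not taken from the gists, and not the compile command) illustrates how an "unpack + transpose" and a "pure unpack" can produce the same result. It is a simplified 2-D model of the pack/unpack semantics, assuming tile sizes divide the dimensions evenly and no padding. The helper names `pack_2d`, `unpack_2d`, `_tile_sizes`, and `_source_index` are hypothetical, and the specific op pair in the benchmarks may differ from the one modeled here.

```python
# Simplified 2-D model of tensor.pack / tensor.unpack index mapping.
# Assumptions: 2-D source, both dims tiled, sizes divisible by tiles, no padding.

def _tile_sizes(inner_tiles, inner_dims_pos):
    # inner_tiles[k] is the tile size applied to source dim inner_dims_pos[k].
    tile_of = [0, 0]
    for k, d in enumerate(inner_dims_pos):
        tile_of[d] = inner_tiles[k]
    return tile_of

def _source_index(a, b, i0, i1, tile_of, inner_dims_pos, outer_dims_perm):
    # Packed outer dim j iterates over the tiles of source dim outer_dims_perm[j].
    outer_idx = [0, 0]
    outer_idx[outer_dims_perm[0]] = a
    outer_idx[outer_dims_perm[1]] = b
    # Packed inner dim k indexes within a tile of source dim inner_dims_pos[k].
    inner_idx = [0, 0]
    inner_idx[inner_dims_pos[0]] = i0
    inner_idx[inner_dims_pos[1]] = i1
    return (outer_idx[0] * tile_of[0] + inner_idx[0],
            outer_idx[1] * tile_of[1] + inner_idx[1])

def pack_2d(src, inner_tiles, inner_dims_pos=(0, 1), outer_dims_perm=(0, 1)):
    tile_of = _tile_sizes(inner_tiles, inner_dims_pos)
    dims = (len(src), len(src[0]))
    outer = [dims[d] // tile_of[d] for d in range(2)]
    n_a, n_b = outer[outer_dims_perm[0]], outer[outer_dims_perm[1]]

    def at(a, b, i0, i1):
        r, c = _source_index(a, b, i0, i1, tile_of, inner_dims_pos, outer_dims_perm)
        return src[r][c]

    return [[[[at(a, b, i0, i1) for i1 in range(inner_tiles[1])]
              for i0 in range(inner_tiles[0])]
             for b in range(n_b)]
            for a in range(n_a)]

def unpack_2d(packed, inner_tiles, inner_dims_pos=(0, 1), outer_dims_perm=(0, 1)):
    tile_of = _tile_sizes(inner_tiles, inner_dims_pos)
    outer = [0, 0]
    outer[outer_dims_perm[0]] = len(packed)
    outer[outer_dims_perm[1]] = len(packed[0])
    rows, cols = outer[0] * tile_of[0], outer[1] * tile_of[1]
    dst = [[None] * cols for _ in range(rows)]
    for a, plane in enumerate(packed):
        for b, tile in enumerate(plane):
            for i0, row in enumerate(tile):
                for i1, value in enumerate(row):
                    r, c = _source_index(a, b, i0, i1, tile_of,
                                         inner_dims_pos, outer_dims_perm)
                    dst[r][c] = value
    return dst

# A 4x6 matrix packed into 2x3 tiles with the outer (tile-grid) dims swapped.
a = [[r * 6 + c for c in range(6)] for r in range(4)]
packed = pack_2d(a, (2, 3), outer_dims_perm=(1, 0))

# "Pure unpack": undo the packing directly using the same outer_dims_perm.
pure = unpack_2d(packed, (2, 3), outer_dims_perm=(1, 0))

# "Unpack + transpose": read the same buffer as a packing of the transposed
# matrix (identity outer perm, swapped inner_dims_pos), then transpose back.
transposed = unpack_2d(packed, (2, 3), inner_dims_pos=(1, 0))
via_transpose = [list(col) for col in zip(*transposed)]

print(pure == a, via_transpose == a)
```

Both paths compute the same matrix in this model, so any performance gap between such forms would come from how each is lowered (for example, the tile sizes chosen), not from the amount of data moved.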