[CPU] Extra buffer created for `tensor.unpack` when second dim is not distributed #16868
Labels:
- codegen/llvm: LLVM code generation compiler backend
- codegen: shared code generation infrastructure and dialects
I'm hitting a compilation error when changing the way we distribute `tensor.pack`. It looks like if we only distribute one of the dims (e.g., `[32, 0]` distribution tile sizes), we end up generating an extra buffer, and compilation crashes due to a stack allocation.

I managed to reduce the issue to the following code before `ConvertToDestinationPassingStyle`:

and this is the code after `ConvertToDestinationPassingStyle`:

It looks like we generate a buffer allocation (`%45 = bufferization.alloc_tensor(%39, %42) : tensor<?x?xf32>`) that leads to the extra stack allocation at the end of the pipeline.

CC: @hanhanW, @MaheshRavishankar
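For readers without the full reducer (the IR snippets above are not included here), a hypothetical minimal sketch of the problematic pattern might look like the following. This is not the original reproducer; the value names, shapes, and tile sizes are illustrative assumptions. The key point is that `ConvertToDestinationPassingStyle` materializes a fresh `bufferization.alloc_tensor` as the `tensor.unpack` destination instead of reusing an existing result tensor, and that allocation later lowers to a stack buffer:

```mlir
// Hypothetical sketch (not the actual reduced IR): a fresh destination
// tensor is allocated for the unpack result rather than being threaded
// through from the dispatch result.
%alloc = bufferization.alloc_tensor(%d0, %d1) : tensor<?x?xf32>
%unpacked = tensor.unpack %src
    inner_dims_pos = [0, 1] inner_tiles = [32, 16]
    into %alloc : tensor<?x?x32x16xf32> -> tensor<?x?xf32>
```

Under destination-passing style, the fix direction would be to make the unpack write into the existing destination tensor so no intermediate allocation survives bufferization.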