[DT][CPU] Some casting ops are not fused with mmt4d kernels #15826
@Max191, @MaheshRavishankar, and I had an offline discussion about the issue. If we fuse the truncf with the mmt4d op and no ukernel is available for that type, we pay extra load/store costs before calling the ukernel. We should not set encodings on all casting ops; we should do it only for casting ops that widen the bitwidth. In that case, the truncf op will be fused with its producers (i.e., generics), so we no longer hit the problem of setting encodings on cast ops that are then not fused with mmt4d ops. There are two sub-tasks to address the issue:
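A minimal sketch of the proposed policy, assuming a simplified model in which each cast op is described only by its op name and its source and destination element types. BITWIDTH, CastOp, and should_set_encoding are hypothetical names for illustration, not IREE or MLIR APIs.

```python
from dataclasses import dataclass

# Hypothetical bit-width table for the element types discussed in this issue.
BITWIDTH = {"i8": 8, "i32": 32, "f16": 16, "bf16": 16, "f32": 32}


@dataclass(frozen=True)
class CastOp:
    """Simplified stand-in for a CastOpInterface op (not a real IREE/MLIR class)."""
    name: str  # e.g. "arith.extf", "arith.truncf", "arith.sitofp"
    src: str   # source element type
    dst: str   # destination element type


def should_set_encoding(op: CastOp) -> bool:
    """Set encodings (and allow fusion into mmt4d) only for widening casts.

    Narrowing casts such as arith.truncf get no encoding, so dispatch
    formation is free to fuse them with their producers instead.
    """
    return BITWIDTH[op.dst] > BITWIDTH[op.src]
```

Under this rule, should_set_encoding(CastOp("arith.extf", "f16", "f32")) is True, while should_set_encoding(CastOp("arith.truncf", "f32", "f16")) is False, so the truncf is left to fuse with its producer.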
@Max191, given that you've touched these files recently, can you help fix the issue? (FYI @bjacob)
I think #15760 is also related to this, so I have a question: we want to extend i8 to f32 on LHS/RHS in some cases, so that even if we don't have the corresponding ukernels, we can still use f32 ukernels with LHS/RHS extended. My current idea is to fuse cast ops with the pack op (and potentially its producer, so …
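The fallback described in this comment could look roughly like the sketch below: if no ukernel exists for the original i8 operand types, extend LHS/RHS to f32 and use an f32 ukernel instead. SUPPORTED_UKERNELS and select_ukernel are hypothetical; the set does not reflect which mmt4d ukernels IREE actually provides.

```python
from typing import Optional, Tuple

# Hypothetical set of available mmt4d ukernels, keyed by (lhs, rhs, out) element types.
SUPPORTED_UKERNELS = {
    ("f32", "f32", "f32"),
    ("f16", "f16", "f32"),
}

Types = Tuple[str, str, str]


def select_ukernel(lhs: str, rhs: str, out: str) -> Optional[Tuple[Types, bool]]:
    """Return (ukernel element types, whether LHS/RHS must be extended to f32).

    Returns None when neither the original types nor the widened f32 types
    have a ukernel, i.e. codegen would fall back to non-ukernel code.
    """
    if (lhs, rhs, out) in SUPPORTED_UKERNELS:
        return (lhs, rhs, out), False
    # Fallback from this comment: extend i8 operands to f32 (e.g. fused into the pack op).
    widened = tuple("f32" if t == "i8" else t for t in (lhs, rhs))
    candidate = (widened[0], widened[1], out)
    if candidate != (lhs, rhs, out) and candidate in SUPPORTED_UKERNELS:
        return candidate, True
    return None
```

For example, select_ukernel("i8", "i8", "f32") picks the f32 ukernel and reports that both operands need extending, while an i8.i8.i32 request returns None because the hypothetical table has no f32.f32.i32 entry.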
Hey, sorry, I wrote the tasks the wrong way around. I've updated the comment. We don't want to set encodings on …
…ree-org#15845) This is a step towards iree-org#15826
We set encodings on CastOpInterface ops so that cast ops can be fused into the mmt4d dispatch. However, we only fuse cast ops when they are not "group dequantization" ops. The fusion does not happen for something like an arith.truncf %lhs : f32 to f16 feeding an f16.f16.f16 matmul. This results in additional dispatches: we get a pack dispatch, an arith.truncf dispatch, and an mmt4d dispatch.

We should fix this either in set_encoding or in dispatch formation. With the former, we would have an arith.truncf + pack dispatch and an mmt4d dispatch. With the latter, we would have an [optional consumers] + pack dispatch and an arith.truncf + mmt4d dispatch.

To repro:
Run
iree-compile --output-format=vm-bytecode --iree-hal-target-backends=llvm-cpu --iree-llvmcpu-target-cpu=cascadelake --iree-llvmcpu-target-triple=x86_64-unknown-linux-gnu ~/repro.mlir -o /tmp/a.vmfb
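To make the two proposed fixes concrete, here is a toy model of dispatch formation over a linear chain of ops. It assumes the op order pack, then arith.truncf, then mmt4d for the current situation (inferred from the dispatch list in the issue), ignores the optional consumers of pack, and uses a hypothetical form_dispatches helper with a greedy pairwise fusion rule. It is not how IREE's dispatch formation actually works.

```python
from typing import List, Set, Tuple

Chain = List[str]
Fusion = Set[Tuple[str, str]]


def form_dispatches(chain: Chain, fusable: Fusion) -> List[List[str]]:
    """Greedily group a linear op chain into dispatches.

    An op joins the previous dispatch when (last op of that dispatch, op)
    is in `fusable`; otherwise it starts a new dispatch.
    """
    dispatches: List[List[str]] = []
    for op in chain:
        if dispatches and (dispatches[-1][-1], op) in fusable:
            dispatches[-1].append(op)
        else:
            dispatches.append([op])
    return dispatches


# Current behavior (assumed order): the encoding is set above the cast, so
# truncf sits between pack and mmt4d and fuses with neither.
current = form_dispatches(["pack", "arith.truncf", "mmt4d"], set())

# Fix in set_encoding: no encoding on the narrowing cast, so truncf runs
# before pack and fuses with it.
fix_set_encoding = form_dispatches(
    ["arith.truncf", "pack", "mmt4d"], {("arith.truncf", "pack")}
)

# Fix in dispatch formation: keep the op order, but let truncf fuse into mmt4d.
fix_dispatch_formation = form_dispatches(
    ["pack", "arith.truncf", "mmt4d"], {("arith.truncf", "mmt4d")}
)
```

In this model, current yields three dispatches, and each fix yields two: truncf lands with pack in the first and with mmt4d in the second.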