[Draft] PR to track changes in SDXL. #16854
base: main
Conversation
Insert slices always fold into the `flow.dispatch.tensor.store` ops and can be fused with all producers.
Enables certain transpose fusions. Handles this case by swapping the operands of the contraction and transposing the result. Does not change any default behavior for SDXL because this path is not yet exercised.
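As background, a minimal sketch (plain Python, not IREE code) of the linear-algebra identity such a rewrite relies on: a transpose of a contraction's result can be absorbed by swapping the contraction's operands and transposing them instead.

```python
# Illustrative only: transpose(matmul(A, B)) == matmul(transpose(B), transpose(A)),
# so a trailing transpose can be folded away by swapping the contraction
# operands (with the operand transposes absorbed into the indexing maps).

def matmul(a, b):
    """Naive matmul on nested lists; requires len(a[0]) == len(b)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

A = [[1, 2, 3], [4, 5, 6]]        # 2x3
B = [[7, 8], [9, 10], [11, 12]]   # 3x2

# Original pattern: contraction followed by a transpose of the result.
original = transpose(matmul(A, B))
# Rewritten pattern: swap the operands and transpose them instead.
rewritten = matmul(transpose(B), transpose(A))

assert original == rewritten
```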
…16748) This allows converting convolutions with >1 batch size and 1x1 filter to matmuls.
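A sketch of why this works (plain Python with nested lists; function names are illustrative, not IREE APIs): a 1x1-filter convolution applies the same channel-mixing matrix at every (batch, h, w) position, so N, H, and W can all be collapsed into the matmul's M dimension, which handles batch sizes greater than 1.

```python
# Hypothetical sketch: 1x1 convolution in NHWC layout as a matmul.

def conv2d_1x1(inp, filt):
    """inp: [N][H][W][C], filt: [C][F] (a 1x1xCxF filter squeezed to CxF)."""
    N, H, W, C = len(inp), len(inp[0]), len(inp[0][0]), len(inp[0][0][0])
    F = len(filt[0])
    return [[[[sum(inp[n][h][w][c] * filt[c][f] for c in range(C))
               for f in range(F)]
              for w in range(W)]
             for h in range(H)]
            for n in range(N)]

def conv2d_1x1_as_matmul(inp, filt):
    """Collapse (N, H, W) into one M dimension, matmul, then expand back."""
    N, H, W = len(inp), len(inp[0]), len(inp[0][0])
    flat = [inp[n][h][w] for n in range(N) for h in range(H) for w in range(W)]
    out = [[sum(row[c] * filt[c][f] for c in range(len(row)))
            for f in range(len(filt[0]))]
           for row in flat]
    it = iter(out)
    return [[[next(it) for _ in range(W)] for _ in range(H)] for _ in range(N)]

inp = [[[[1, 2], [3, 4]], [[5, 6], [7, 8]]],         # batch 0: HxWxC = 2x2x2
       [[[9, 10], [11, 12]], [[13, 14], [15, 16]]]]  # batch 1
filt = [[1, 0, 2], [0, 1, 3]]                        # C=2 -> F=3

assert conv2d_1x1(inp, filt) == conv2d_1x1_as_matmul(inp, filt)
```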
Repurpose the workgroup swizzling pass to do more general workgroup reordering. Add a filter function to run in MMA pipelines only. Do not use workgroup counts from the runtime, as these don't currently work on ROCm.
Co-authored-by: MaheshRavishankar <mahesh@nod-labs.com> Co-authored-by: MaheshRavishankar <1663364+MaheshRavishankar@users.noreply.github.com>
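A hypothetical sketch of the kind of workgroup reordering such a pass performs (plain Python; the grouping scheme and names are assumptions, not IREE's actual mapping): instead of plain row-major order, visit the grid in narrow bands of rows so that workgroups likely to share data execute close together in time.

```python
# Illustrative only: reorder a 2D workgroup grid so that `group_size`
# rows are swept together, column by column, improving locality between
# consecutively launched workgroups.

def reorder(grid_x, grid_y, group_size):
    """Return the (x, y) visit order with rows processed in bands."""
    order = []
    for y0 in range(0, grid_y, group_size):
        rows = range(y0, min(y0 + group_size, grid_y))
        for x in range(grid_x):     # sweep all columns within the band
            for y in rows:          # visit each row of the band per column
                order.append((x, y))
    return order

order = reorder(2, 4, 2)
# The reordering is a permutation of the full grid.
assert sorted(order) == [(x, y) for x in range(2) for y in range(4)]
```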
This yields the same performance on the full model but has lower overhead on isolated microbenchmarks.
It excludes the filter from the first level of tiling, promotes images, and tiles the filter.
This adds a winograd pipeline for LLVMGPU. The `--iree-codegen-winograd-use-forall` flag is needed to get distribution on input and output transforms. --------- Co-authored-by: harsh <harsh@nod-labs.com>
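As background on what a Winograd pipeline computes, here is a minimal 1D F(2,3) sketch in plain Python (not IREE's implementation): the input and filter are moved into a transformed domain where the convolution becomes an elementwise product, reducing 6 multiplies to 4 for each pair of outputs.

```python
# Illustrative only: Winograd F(2,3) for 1D convolution.

def winograd_f23(d, g):
    """Two outputs of a 1D conv of a 4-element input tile `d` with a
    3-tap filter `g`, using 4 multiplies instead of 6."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter transform: u = G @ g
    u = (g0, (g0 + g1 + g2) / 2, (g0 - g1 + g2) / 2, g2)
    # Input transform: v = B^T @ d
    v = (d0 - d2, d1 + d2, d2 - d1, d1 - d3)
    # Elementwise (Hadamard) product in the transformed domain
    m = [ui * vi for ui, vi in zip(u, v)]
    # Output transform: y = A^T @ m
    return [m[0] + m[1] + m[2], m[1] - m[2] - m[3]]

def direct_conv(d, g):
    """Reference: direct sliding-window convolution (6 multiplies)."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]

d = [1.0, 2.0, 3.0, 4.0]
g = [0.5, -1.0, 2.0]
assert winograd_f23(d, g) == direct_conv(d, g)
```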
Additionally adds a flag to control promotion of the filter. Co-authored-by: MaheshRavishankar <mahesh@nod-labs.com>
Consider batch size in the heuristic. This is so that we do not create allocas. Co-authored-by: Jakub Kuderski <jkudersk@amd.com>
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up-to-date status, view the checks section at the bottom of the pull request.
The CLA check is complaining about a commit from @kuhar, FYI: https://github.com/openxla/iree/pull/16854/checks?check_run_id=22897648244 (check which email address you use as your default / to commit changes)
This allows for importing all or some parameters from a parameter file into the compiler. Currently only one import can be specified with the flag, but we could extend that to multiple in the future by following the same flag conventions as the runtime tooling. If a scope is provided, only parameters with that scope will be imported. If a parameter (optionally within a scope) is named explicitly, it will be imported. If a maximum size is specified, all parameters <= that size will be imported. This also renames the export flags, as they are inconsistent. --------- Co-authored-by: Ben Vanik <ben.vanik@gmail.com>
Thanks for the heads up, I have to move across machines frequently and this was a one-off. Feel free to update it before landing on main. My main commit email is jakub@nod-labs.com.
We are observing that it is always better to promote the filter, so turn it on by default.
Not planning to land this. Just a place to see all the commits in the branch w.r.t. main
The smallest bounding box inference could fail even though the op is really bounded by tile sizes. We provide an option to use the smallest bounding values as a fallback: shark-infra/llvm-project@55ff42c To enable the new pipeline, we add `--iree-codegen-llvmgpu-use-vector-distribution` to the `iree-compile` tool. Full IR dump: https://gist.github.com/hanhanW/d9ee3111c5f86b0e7ad7ebdac46fe7c9 --------- Co-authored-by: Kunwar Grover <groverkss@gmail.com>
No description provided.