
Finish LinalgExt operation support on all backends #16886

Open
MaheshRavishankar opened this issue Mar 22, 2024 · 0 comments
One of the issues faced during SDXL support (#16854) was missing support for the operations added in LinalgExt across all codegen backends, i.e., CPU, SPIR-V, and LLVMGPU.

Main Issues

  1. iree_linalg_ext.attention https://github.com/openxla/iree/blob/2cdf1452bb2f877baf8723ab567363094bea10bd/compiler/src/iree/compiler/Dialect/LinalgExt/IR/LinalgExtOps.td#L514
    The main issue here was that the TileAndDecomposeAttentionPass was not really tested on any end-to-end compilation path. An efficient compilation of this op was built up using a transform dialect script that was custom-tuned for a single architecture, so it was hard to test models containing this operation on any other hardware. A sketch of the op appears after this list.
  2. iree_linalg_ext.winograd.input_transform
    https://github.com/openxla/iree/blob/2cdf1452bb2f877baf8723ab567363094bea10bd/compiler/src/iree/compiler/Dialect/LinalgExt/IR/LinalgExtOps.td#L1043
    This operation was working on the SPIR-V and CPU backends, but not on the LLVMGPU backend. Again, this was not tested end-to-end on all backends, but it had some coverage on the CPU and SPIR-V backends (https://github.com/openxla/iree/blob/main/tests/e2e/linalg_ext_ops/winograd_input.mlir), so it was relatively easy to get working on the LLVMGPU backend. A sketch of this op also appears after this list.
  3. iree_linalg_ext.winograd.filter_transform
    This operation does not actually exist. The Winograd filter transform was implemented by constant-folding the filter weights: to support this, the filters for the convolution needed to be converted from resources to inline constants and were evaluated (very slowly) at compile time.
  4. iree_linalg_ext.winograd.output_transform
    Like the input transform, this operation was working on the SPIR-V and CPU backends, but not on the LLVMGPU backend. It had similar partial coverage on the CPU and SPIR-V backends (https://github.com/openxla/iree/blob/main/tests/e2e/linalg_ext_ops/winograd_output.mlir), so it was relatively easy to get working on the LLVMGPU backend.
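
For reference, here is a minimal sketch of what the attention op looks like in IR, computing softmax(Q * K^T) * V. The shapes are illustrative and the exact assembly format is an assumption that may differ between IREE versions:

```mlir
// Minimal sketch; shapes are illustrative and the assembly format may
// differ between IREE versions.
func.func @attention_example(%query: tensor<1x1024x64xf32>,
                             %key: tensor<1x1024x64xf32>,
                             %value: tensor<1x1024x64xf32>) -> tensor<1x1024x64xf32> {
  %empty = tensor.empty() : tensor<1x1024x64xf32>
  // Computes softmax(Q * K^T) * V along the sequence dimension.
  %out = iree_linalg_ext.attention
           ins(%query, %key, %value
               : tensor<1x1024x64xf32>, tensor<1x1024x64xf32>, tensor<1x1024x64xf32>)
           outs(%empty : tensor<1x1024x64xf32>) -> tensor<1x1024x64xf32>
  return %out : tensor<1x1024x64xf32>
}
```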
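And a sketch of the Winograd input transform (the output transform takes the mirrored form, mapping tiles back to the image). The attributes and shapes below are modeled on the e2e test linked above but should be treated as illustrative:

```mlir
// Minimal sketch, mirroring the e2e test referenced above; shapes are
// illustrative. output_tile_size(6) with kernel_size(3) implies 8x8 input
// tiles, and a 10x10 image yields a 2x2 grid of tiles.
func.func @winograd_input_example(%input: tensor<1x10x10x1280xf32>) -> tensor<8x8x1x2x2x1280xf32> {
  %empty = tensor.empty() : tensor<8x8x1x2x2x1280xf32>
  %out = iree_linalg_ext.winograd.input_transform
           output_tile_size(6) kernel_size(3) image_dimensions([1, 2])
           ins(%input : tensor<1x10x10x1280xf32>)
           outs(%empty : tensor<8x8x1x2x2x1280xf32>) -> tensor<8x8x1x2x2x1280xf32>
  return %out : tensor<8x8x1x2x2x1280xf32>
}
```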

Covered commits

Immediate next steps

  1. Make iree_linalg_ext.attention work on all backends (at least the CPU and LLVMGPU backends) and have it tested in CI. The goal is baseline functional support across architectures, which makes the op robust and easy to port.
    • More in-tree end-to-end tests are needed to lock in op support. Even the modest testing of iree_linalg_ext.winograd.input_transform and iree_linalg_ext.winograd.output_transform on the CPU and SPIR-V backends made them easy to port to the LLVMGPU backend.
    • The TileAndDecomposeAttentionPass needs to be fixed. This might require re-evaluating the pass implementation to use the PartialReductionTilingOpInterface.
  2. Add an iree_linalg_ext.winograd.filter_transform operation to the LinalgExt dialect (a hypothetical sketch of such an op follows this list).
    • Also make sure that the const_eval framework in IREE can pick up and fold away these operations.
  3. Add more tests for the iree_linalg_ext.winograd.input_transform and iree_linalg_ext.winograd.output_transform ops in isolation, as well as tests that convert a whole convolution into its Winograd form and check that the combined pipeline works.
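
A possible shape for the proposed filter transform op, written to mirror the existing input/output transforms. This op did not exist when this issue was filed, so everything below (the op name, attributes, and shapes) is a hypothetical sketch, not an existing API:

```mlir
// Hypothetical sketch: this op does not exist yet. The name, attributes, and
// shapes are assumptions modeled on winograd.input_transform/output_transform.
// A 3x3 HWCF filter would expand to 8x8 tiles for output_tile_size(6).
func.func @winograd_filter_example(%filter: tensor<3x3x64x128xf32>) -> tensor<8x8x64x128xf32> {
  %empty = tensor.empty() : tensor<8x8x64x128xf32>
  %out = iree_linalg_ext.winograd.filter_transform
           output_tile_size(6) kernel_size(3) kernel_dimensions([0, 1])
           ins(%filter : tensor<3x3x64x128xf32>)
           outs(%empty : tensor<8x8x64x128xf32>) -> tensor<8x8x64x128xf32>
  return %out : tensor<8x8x64x128xf32>
}
```

Making such an op a first-class LinalgExt operation would let the const_eval framework fold it away for constant filters, instead of relying on inlining the filter resources and evaluating the transform slowly at compile time.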