Revert "[LinalgExt] Add online_attention op" #17658

Merged: ScottTodd merged 1 commit into `main` from `revert-17536-new-decomposition-attention` on Jun 12, 2024


ScottTodd (Member)

Reverts #17536

This caused `sdxl-scheduled-unet-3-tank` to hit timeouts when compiling for CPU: https://github.com/iree-org/iree/actions/runs/9484305572/job/26134004282

ScottTodd (Member, Author)

`test_models :: cpu_llvm_task` passed, the Bazel build flaked, and other builds are still pending. Going to merge without waiting for the other checks.

ScottTodd marked this pull request as ready for review June 12, 2024 23:49
ScottTodd merged commit 2ff4102 into main Jun 12, 2024
52 of 53 checks passed
ScottTodd deleted the revert-17536-new-decomposition-attention branch June 12, 2024 23:49
MaheshRavishankar (Contributor)

Cc @Groverkss

monorimet added a commit that referenced this pull request Jun 14, 2024
Groverkss added a commit that referenced this pull request Jun 17, 2024
This patch adds a new online_attention op. This op represents a
partially reduced attention op which can be tiled along its k2
reduction dimension. This op also has indexing maps, supports tiling on
all dimensions other than the k1 dimension, and can be decomposed
according to any given indexing maps.

This patch also makes the CPU backend use online attention to decompose
the op and tile its reduction dimension, which allows it to be tiled
along the N and batch dimensions and to be tiled using LLVMCPUTile.

This is a reland of #17658, with
more conservative tile size selection to avoid excessive unrolling.
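A minimal, hypothetical Python sketch of what "partially reduced along k2" means numerically. It assumes the conventional attention shapes Q: [M, K1], K: [K2, K1], V: [K2, N], with the batch dimension dropped; the function names and the `tile_k2` parameter are illustrative and are not IREE APIs. This is the standard online-softmax (flash-attention style) recurrence, not the op's actual decomposition: each K2 tile updates a running row max, a running row sum, and an unnormalized accumulator, and normalization happens only after the last tile. K1 is never tiled here, matching the note that k1 is the one dimension the op does not tile.

```python
import math


def scores(q, k_rows, scale):
    """Scaled Q @ K^T for a block of K rows: [M][K1] x [T][K1] -> [M][T]."""
    return [[scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k_rows]
            for q_row in q]


def reference_attention(q, k, v, scale):
    """softmax(scale * Q @ K^T) @ V, reducing over all of K2 in one step."""
    n_dim = len(v[0])
    out = []
    for row in scores(q, k, scale):
        row_max = max(row)
        p = [math.exp(x - row_max) for x in row]
        total = sum(p)
        out.append([sum(p[j] * v[j][c] for j in range(len(v))) / total
                    for c in range(n_dim)])
    return out


def online_attention(q, k, v, scale, tile_k2):
    """Hypothetical helper: same result, consuming K2 in tiles of `tile_k2` rows."""
    m_dim, n_dim = len(q), len(v[0])
    row_max = [-math.inf] * m_dim                # running max score per row
    row_sum = [0.0] * m_dim                      # running sum of exp(score - max)
    acc = [[0.0] * n_dim for _ in range(m_dim)]  # unnormalized partial output
    for start in range(0, len(k), tile_k2):
        k_tile = k[start:start + tile_k2]
        v_tile = v[start:start + tile_k2]
        s = scores(q, k_tile, scale)
        for i in range(m_dim):
            new_max = max(row_max[i], max(s[i]))
            # Rescale what was accumulated under the old max (exp(-inf) == 0.0).
            correction = math.exp(row_max[i] - new_max)
            p = [math.exp(x - new_max) for x in s[i]]
            row_sum[i] = row_sum[i] * correction + sum(p)
            for c in range(n_dim):
                acc[i][c] = acc[i][c] * correction + sum(
                    p[j] * v_tile[j][c] for j in range(len(p)))
            row_max[i] = new_max
    # Only after the last K2 tile is the softmax normalization applied.
    return [[acc[i][c] / row_sum[i] for c in range(n_dim)] for i in range(m_dim)]


# Example: M=3, K1=2, K2=5, N=2.
q = [[0.1, 0.2], [0.3, -0.1], [1.0, 0.5]]
k = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8], [0.9, 0.0], [0.2, 0.2]]
v = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [2.0, -1.0], [0.3, 0.7]]
scale = 1 / math.sqrt(len(q[0]))

ref = reference_attention(q, k, v, scale)
for tile in (1, 2, 5):  # tile sizes need not divide K2
    out = online_attention(q, k, v, scale, tile)
    assert all(abs(a - b) < 1e-9 for r, o in zip(ref, out) for a, b in zip(r, o))

# N is not reduced, so tiling it is independent: column 0 alone matches.
col0 = online_attention(q, k, [[row[0]] for row in v], scale, 2)
assert all(abs(c[0] - r[0]) < 1e-9 for c, r in zip(col0, ref))
```

The `correction` factor is what lets each K2 tile be folded in without revisiting earlier tiles. N and batch are not reduced, so in this sketch they can be split into independent pieces with no cross-tile combination (batch would just be an outer loop).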
LLITCHEV pushed a commit to LLITCHEV/iree that referenced this pull request Jul 30, 2024
Reverts iree-org#17536

This caused `sdxl-scheduled-unet-3-tank` to hit timeouts when compiling
for CPU:
https://github.com/iree-org/iree/actions/runs/9484305572/job/26134004282

Signed-off-by: Lubo Litchev <lubol@google.com>
LLITCHEV pushed a commit to LLITCHEV/iree that referenced this pull request Jul 30, 2024
This patch adds a new online_attention op. This op represents a
partially reduced attention op which can be tiled along its k2
reduction dimension. This op also has indexing maps, supports tiling on
all dimensions other than the k1 dimension, and can be decomposed
according to any given indexing maps.

This patch also makes the CPU backend use online attention to decompose
the op and tile its reduction dimension, which allows it to be tiled
along the N and batch dimensions and to be tiled using LLVMCPUTile.

This is a reland of iree-org#17658, with
more conservative tile size selection to avoid excessive unrolling.

Signed-off-by: Lubo Litchev <lubol@google.com>