Conversation

petercad commented Oct 4, 2025

This PR updates FlashAttention to the new copy/MMA atoms.

Changes:

  • Prefill and decode unified into a single implementation, allowing simultaneous subgroup-level parallelization across K and Q rather than an either-or choice.
  • GEMMs and softmax grouped together, and the full k loop consolidated into an FMHA mainloop class (see the mainloop sketch after this list).
    • This will facilitate further manual pipelining/overlap of GEMM with softmax.
  • New copy/MMA atoms and reorders used to transparently support arbitrary data types.
  • Automatic copy/MMA operator selection (see the dispatch sketch after this list).
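
For orientation, here is a minimal scalar sketch of the structure the mainloop class consolidates: the first GEMM (S = Q·Kᵀ), the online-softmax rescaling, and the second GEMM (P·V), all inside a single k loop. This is plain reference C++ with hypothetical names, not the actual CuTe/SYCL kernel.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

// Hypothetical scalar reference for one query row attended over K/V tiles.
// d = head size, kv_len = key/value sequence length, tile = k-loop tile size.
void fmha_mainloop_reference(const float* q, const float* K, const float* V,
                             float* out, int d, int kv_len, int tile) {
  float scale = 1.0f / std::sqrt(static_cast<float>(d));
  float running_max = -INFINITY;    // m_i in the online-softmax recurrence
  float running_sum = 0.0f;         // l_i
  std::vector<float> acc(d, 0.0f);  // unnormalized output accumulator

  for (int k0 = 0; k0 < kv_len; k0 += tile) {
    int kt = std::min(tile, kv_len - k0);

    // GEMM 1: S = Q * K^T for this tile, tracking the tile's row max.
    std::vector<float> s(kt);
    float tile_max = -INFINITY;
    for (int j = 0; j < kt; ++j) {
      float dot = 0.0f;
      for (int c = 0; c < d; ++c) dot += q[c] * K[(k0 + j) * d + c];
      s[j] = dot * scale;
      tile_max = std::max(tile_max, s[j]);
    }

    // Online softmax: rescale prior state by exp(old_max - new_max).
    float new_max = std::max(running_max, tile_max);
    float correction = std::exp(running_max - new_max);
    running_sum *= correction;
    for (int c = 0; c < d; ++c) acc[c] *= correction;

    // GEMM 2: accumulate P * V with P = exp(S - new_max).
    for (int j = 0; j < kt; ++j) {
      float p = std::exp(s[j] - new_max);
      running_sum += p;
      for (int c = 0; c < d; ++c) acc[c] += p * V[(k0 + j) * d + c];
    }
    running_max = new_max;
  }

  // Final normalization by the accumulated softmax denominator.
  for (int c = 0; c < d; ++c) out[c] = acc[c] / running_sum;
}

int main() {
  const int d = 4, kv_len = 6, tile = 2;
  std::vector<float> q(d, 0.5f), K(kv_len * d, 0.25f), V(kv_len * d, 1.0f), out(d);
  fmha_mainloop_reference(q.data(), K.data(), V.data(), out.data(), d, kv_len, tile);
  for (int c = 0; c < d; ++c) std::printf("%f ", out[c]);  // uniform V => each ~1.0
  std::printf("\n");
  return 0;
}
```

Keeping all three stages in one loop body is what makes the planned manual pipelining practical: softmax work for one tile can then be overlapped with GEMM work for the next.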
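
Similarly, a toy illustration of what compile-time operator selection can look like. The tags and dispatch below are hypothetical stand-ins; the real selection keys on the Xe copy/MMA atom traits.

```cpp
#include <cstdio>
#include <type_traits>

// Hypothetical operator tags standing in for hardware MMA atoms.
struct XeMma_F16  { static constexpr const char* name = "xe.mma.f16";  };
struct XeMma_BF16 { static constexpr const char* name = "xe.mma.bf16"; };
struct XeMma_F32  { static constexpr const char* name = "xe.mma.f32";  };

struct half_t {};      // placeholder element types, for illustration only
struct bfloat16_t {};

// Compile-time dispatch: map the element type to an MMA operator tag.
template <class Element>
constexpr auto select_mma_op() {
  if constexpr (std::is_same_v<Element, half_t>)          return XeMma_F16{};
  else if constexpr (std::is_same_v<Element, bfloat16_t>) return XeMma_BF16{};
  else                                                    return XeMma_F32{};
}

int main() {
  std::printf("%s\n", decltype(select_mma_op<half_t>())::name);      // xe.mma.f16
  std::printf("%s\n", decltype(select_mma_op<bfloat16_t>())::name);  // xe.mma.bf16
  return 0;
}
```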

Current status: nearly all prefill/decode examples are working, with performance similar to or better than the old examples.

Known issues:

  • Head size 192 decode config doesn't compile yet -- to be fixed.
  • Strange SYCL compiler behavior/bug with the tSrS->tArP reorder: the compiler apparently believes there is UB somewhere and omits a large section of the kernel as a result. For the moment, a direct copy serves as a workaround while I pin down the issue; I haven't been able to reproduce the behavior with the reorder in isolation.

Additional features (causal masking, variable sequence lengths, etc.) to be added later.

petercad changed the title from "[Umbrella commit] Re-implement FlashAttention with new Xe atoms" to "Re-implement FlashAttention with new Xe atoms" on Oct 4, 2025
petercad commented Oct 4, 2025

I will break up this large commit into self-contained smaller commits after review is complete.
