
Conversation

@YangKai0616
Contributor

This PR:

  1. Refactored the code to improve performance.
  2. Added support for PagedKV functionality (see the sketch below).
  3. Verified that test results in the transformers unit tests match those on CUDA.

I have successfully built this PR locally using nix.
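
For context on item 2: in a paged KV cache, keys and values live in fixed-size physical blocks, and a per-sequence block table maps logical token positions to those blocks. Below is a minimal PyTorch sketch of that gather. The block size, shapes, and all names are illustrative assumptions, not this PR's kernel API:

```python
# Minimal sketch of the paged-KV idea (illustrative only, not this repo's API):
# the KV cache is stored as fixed-size physical blocks, and a per-sequence
# block table maps logical token positions to physical blocks.
import torch

block_size = 16                       # tokens per physical block (assumption)
num_blocks, num_heads, head_dim = 64, 8, 64

# Physical key cache: [num_blocks, block_size, num_heads, head_dim]
k_cache = torch.randn(num_blocks, block_size, num_heads, head_dim)

# Block table for one sequence of 40 tokens -> 3 blocks (last one partial)
block_table = torch.tensor([5, 12, 3])
seq_len = 40

# Gather the logical K for this sequence from the paged cache
k_blocks = k_cache[block_table]                      # [3, block_size, H, D]
k_logical = k_blocks.reshape(-1, num_heads, head_dim)[:seq_len]
print(k_logical.shape)                               # torch.Size([40, 8, 64])
```

The point of the indirection is that blocks for one sequence need not be contiguous in memory, which is what lets the attention kernel serve many sequences from one shared cache pool.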

@YangKai0616
Contributor Author

Hi @drbh @danieldk, please help review this PR.
Once the binary files of this PR are uploaded, XPU can use flash_attention_2 efficiently in transformers. Thanks!
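
For reference, a hedged sketch of what that enables on the transformers side. The `attn_implementation` argument is the standard transformers switch; the model id is an arbitrary assumption for illustration:

```python
# Sketch: loading a model with FlashAttention-2 in transformers on an XPU
# device. The model id is an assumption; any FA2-supported model works.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-1B"  # illustrative choice
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",  # route attention to the FA2 kernel
).to("xpu")

inputs = tokenizer("Hello", return_tensors="pt").to("xpu")
out = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(out[0]))
```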

Member

@danieldk left a comment


Looks good! Running CI and merging through #66.

@danieldk
Member

Merged in #66.

@danieldk closed this Nov 12, 2025