Adds comprehensive documentation with logo and Chinese translation #89
@@ -1,9 +1,25 @@

# Flash Dynamic Mask Attention

<div align="center">
<img src="./assets/logo.png" alt="SmallDoges" width="100%">
</div>

<div align="center">

**English** | [简体中文](./README_zh.md)

</div>

**Trainable Dynamic Mask Sparse Attention**

> Jingze Shi, Yifan Wu, Bingheng Wu, Yiran Peng, Liangdong Wang, Guang Liu, Yuyu Luo

> Paper: https://huggingface.co/papers/2508.02124



Flash-DMA is a high-performance attention implementation that integrates Flash Attention's memory efficiency with Dynamic Mask Attention's sparse computation capabilities for processing extremely long sequences in transformer models.

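For orientation, here is a minimal usage sketch. It assumes a `flash_dmattn_func` entry point with a Flash-Attention-style interface; the import path, argument names, and tensor layout are assumptions for illustration, so check the package documentation for the actual signature.

```python
import torch
from flash_dmattn import flash_dmattn_func  # assumed entry point; verify against the package docs

# Flash-Attention-style [batch, seq_len, num_heads, head_dim] layout (assumed here).
batch, seq_len, num_heads, head_dim = 1, 4096, 8, 128
q = torch.randn(batch, seq_len, num_heads, head_dim, device="cuda", dtype=torch.bfloat16)
k = torch.randn(batch, seq_len, num_heads, head_dim, device="cuda", dtype=torch.bfloat16)
v = torch.randn(batch, seq_len, num_heads, head_dim, device="cuda", dtype=torch.bfloat16)

out = flash_dmattn_func(q, k, v, is_causal=True)  # argument names are illustrative
print(out.shape)  # expected: (1, 4096, 8, 128)
```
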
## Key Features

- **Sparse Attention Computation**: Dynamically selects the most important keys for each query, reducing computation from $O(N^2)$ to $O(N \cdot w)$ where $w \ll N$ (see the sketch after this list).

@@ -12,6 +28,14 @@ Flash-DMA is a high-performance attention implementation that integrates Flash A

- **Long Sequence Support**: Efficiently handles sequences of 128K+ tokens through dynamic masking when sequence length exceeds `keep_window_size`.
- **Advanced Integration**: Complete integration from Python frontend to CUDA backend with optimized memory layouts and sparse computation strategies.

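The sketch below restates the selection step from the first bullet in plain PyTorch: once the sequence is longer than `keep_window_size`, each query keeps only its highest-scoring keys and every other position is masked out before the softmax. This is a conceptual dense reference, not the fused CUDA kernel, and the function name is made up for illustration.

```python
import torch
import torch.nn.functional as F

def dynamic_mask_attention_reference(q, k, v, keep_window_size=2048):
    """Illustrative dense reference for dynamic mask attention (not the Flash-DMA kernel).

    q, k, v: [batch, num_heads, seq_len, head_dim]
    """
    scale = q.shape[-1] ** -0.5
    scores = torch.matmul(q, k.transpose(-2, -1)) * scale  # [B, H, N, N]

    seq_len = scores.shape[-1]
    if seq_len > keep_window_size:
        # Keep only the top keep_window_size keys per query; mask out the rest.
        topk = scores.topk(keep_window_size, dim=-1).indices
        mask = torch.full_like(scores, float("-inf"))
        mask.scatter_(-1, topk, 0.0)
        scores = scores + mask

    attn = F.softmax(scores, dim=-1)
    return torch.matmul(attn, v)
```
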
## Performance

We present the expected speedup of Flash-DMA over standard PyTorch SDPA.





<!--  -->

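To reproduce a rough local comparison against the SDPA baseline shown above, a simple CUDA-event timing loop such as the following can be used. Only the PyTorch SDPA call is timed here; timing the Flash-DMA kernel the same way and dividing gives the speedup ratio. That second call is left as a comment because its exact signature is not documented in this excerpt.

```python
import torch
import torch.nn.functional as F

def time_attention(fn, iters=20):
    """Average per-call latency in milliseconds measured with CUDA events."""
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    for _ in range(3):  # warm-up
        fn()
    torch.cuda.synchronize()
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters

batch, num_heads, seq_len, head_dim = 1, 8, 8192, 128
q, k, v = (torch.randn(batch, num_heads, seq_len, head_dim, device="cuda", dtype=torch.bfloat16)
           for _ in range(3))

sdpa_ms = time_attention(lambda: F.scaled_dot_product_attention(q, k, v, is_causal=True))
print(f"SDPA baseline: {sdpa_ms:.2f} ms")
# Time the Flash-DMA call the same way and divide to get the speedup ratio.
```
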
@@ -0,0 +1,282 @@

<div align="center">
<img src="./assets/logo.png" alt="SmallDoges" width="100%">

**Copilot AI (Aug 7, 2025)** commented on the banner image reference:

> 

The image reference uses a relative path that may not exist. Consider verifying that the `assets/flash_dmattn_banner.png` file exists in the repository before adding this reference.

**Copilot AI (Aug 7, 2025)** commented on the logo image reference:

> <img src="./assets/logo.png" alt="SmallDoges" width="100%">

The image reference uses a relative path that may not exist. Consider verifying that the `assets/logo.png` file exists in the repository before adding this reference.