Add Sage Attention for Lumina #22

Merged
sdbds merged 1 commit into sdbds:lumina from rockerBOO:sage_attn
Mar 3, 2025
Conversation

@rockerBOO
Requires https://github.com/thu-ml/SageAttention

  • Sage Attention 2: 3.8s/it
  • Flash Attention 2: 4.33s/it

Actual numbers may vary; this was not a full run, so treat these as rough estimates.
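For context, a backend swap like the one this PR makes is usually wired as a small dispatch function: use SageAttention's `sageattn` kernel when it is importable and the tensors are on CUDA, and fall back to PyTorch's built-in scaled dot-product attention otherwise. This is a hedged sketch, not the PR's actual diff; the `attention` helper name and the tensor shapes are illustrative assumptions, while the `sageattn(q, k, v, tensor_layout=..., is_causal=...)` call follows the upstream SageAttention README.

```python
import torch
import torch.nn.functional as F

try:
    # Requires a CUDA build of https://github.com/thu-ml/SageAttention
    from sageattention import sageattn
    HAVE_SAGE = True
except ImportError:
    HAVE_SAGE = False

def attention(q, k, v, is_causal=False):
    """Dispatch to SageAttention when available, else PyTorch SDPA.

    q, k, v: (batch, heads, seq, head_dim). Falling back keeps CPU or
    unsupported setups working with identical output shapes.
    """
    if HAVE_SAGE and q.is_cuda:
        return sageattn(q, k, v, tensor_layout="HND", is_causal=is_causal)
    return F.scaled_dot_product_attention(q, k, v, is_causal=is_causal)

q = k = v = torch.randn(1, 2, 16, 8)
out = attention(q, k, v)
print(out.shape)  # torch.Size([1, 2, 16, 8])
```

Keeping the SDPA fallback means the model still runs (and trains) when the SageAttention extension is not installed.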

@sdbds
Owner

sdbds commented Mar 2, 2025

Sage Attention can't be used in training.
I think adding SpargeAttn
could get more speed.

@rockerBOO
Author

rockerBOO commented Mar 2, 2025

Hmm, good point, but it does seem to be "training" the LoRA: the weights are present and appear to hold real values. I'll look a little deeper to see what is actually going on; maybe it's just keeping the initialized weights or something.

thu-ml/SageAttention#60

SpargeAttn might require a significant "autotune" step to produce a sparse attention model (thu-ml/SpargeAttn#6). I was initially exploring adding SpargeAttn but found that issue.

@rockerBOO
Copy link
Author

OK, it seems it isn't adjusting the attention parameters, but the other parameters (like the MLP/feed-forward layers) do get updated. I think keeping SageAttention could be fine, noting that it is not for training? The backward-pass workarounds that were applied for Flash Attention might make it work in the future.
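The behavior described above, where the attention path stops contributing gradients while the MLP still trains, can be reproduced in a minimal sketch. This is an illustration under an assumption: detaching the attention output is used here to stand in for a kernel that has no registered backward pass; `TinyBlock` and its layer names are hypothetical, not Lumina's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyBlock(nn.Module):
    """Hypothetical minimal block: attention projections plus an MLP."""
    def __init__(self, d=8):
        super().__init__()
        self.qkv = nn.Linear(d, 3 * d)  # attention input projections
        self.mlp = nn.Linear(d, d)      # feed-forward layer

    def forward(self, x):
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # .detach() simulates an attention kernel with no backward pass,
        # which is what happens when the op lacks autograd support.
        attn = F.scaled_dot_product_attention(q, k, v).detach()
        return self.mlp(x + attn)

block = TinyBlock()
loss = block(torch.randn(2, 4, 8)).sum()
loss.backward()
print(block.qkv.weight.grad is None)      # True: attention projections get no gradient
print(block.mlp.weight.grad is not None)  # True: the MLP still trains
```

Checking `param.grad` after a backward pass like this is a quick way to confirm which parts of a model are actually being updated.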

@sdbds sdbds merged commit 09c4710 into sdbds:lumina Mar 3, 2025
1 check passed