[Feature request] Support attention logits cap with tanh #257
The Grok model uses tanh to cap the attention logits. Could you support this feature in FlashInfer? If you need community help, any instructions on how to add it would be appreciated.

Grok (JAX):
https://github.com/xai-org/grok-1/blob/7050ed204b8206bb8645c7b7bbef7252f79561b0/model.py#L864-L865

SGLang implementation (Triton):
https://github.com/sgl-project/sglang/blob/2cea6146d8735780da602c0dfa0569b0fb5d47ba/python/sglang/srt/layers/extend_attention.py#L101-L102
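For reference, the cap is a single elementwise transform on the pre-softmax attention scores: logits ← cap · tanh(logits / cap). Below is a minimal PyTorch sketch of the technique; the function name, tensor layout, and the cap value of 30.0 are illustrative (masking is omitted for brevity), not FlashInfer's API:

```python
import math
import torch

def attention_with_logit_cap(q, k, v, logits_cap=30.0):
    """Scaled dot-product attention with a tanh cap on the logits.

    q: [q_len, num_heads, head_dim]; k, v: [kv_len, num_heads, head_dim].
    The pre-softmax scores are squashed into (-logits_cap, logits_cap).
    """
    scale = 1.0 / math.sqrt(q.shape[-1])
    # Standard scaled dot-product logits, shape [num_heads, q_len, kv_len].
    logits = torch.einsum("qhd,khd->hqk", q, k) * scale
    # The "logit cap": tanh saturates smoothly at +/- logits_cap, bounding
    # large scores without the non-differentiable kink of a hard clip.
    logits = logits_cap * torch.tanh(logits / logits_cap)
    weights = torch.softmax(logits, dim=-1)
    return torch.einsum("hqk,khd->qhd", weights, v)
```

Dividing by the cap before the tanh and scaling back afterwards keeps the transform close to the identity for small logits, so behavior only changes where scores would otherwise grow unboundedly.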
Comments

Sounds good, this should be easy to support.
Is there a formal name for this "Attention with Logits Cap" method?
There is no formal name; maybe just call it "logit cap".
@yzh119 Any progress on this issue? |
Done in #298 |
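For anyone landing here later: if #298 exposed the cap as a keyword argument, calling it from the Python API might look like the sketch below. The `logits_soft_cap` parameter name and the exact signature are assumptions on my part; check the PR for the actual interface.

```python
import torch
import flashinfer

# Illustrative shapes: 128 query tokens, 1024 KV tokens, 32 heads, dim 128.
q = torch.randn(128, 32, 128, dtype=torch.float16, device="cuda")
k = torch.randn(1024, 32, 128, dtype=torch.float16, device="cuda")
v = torch.randn(1024, 32, 128, dtype=torch.float16, device="cuda")

# `logits_soft_cap` is assumed here, not verified against #298; omitting it
# would presumably fall back to uncapped attention.
out = flashinfer.single_prefill_with_kv_cache(
    q, k, v, causal=True, logits_soft_cap=30.0
)
```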