
[Feature request] Support attention logits cap with tanh #257

Closed
merrymercy opened this issue May 24, 2024 · 5 comments


merrymercy commented May 24, 2024

The grok model uses tanh to cap the attention logits. Could you support this feature in flashinfer? If you need community help, any instructions on how to add this will be appreciated.

Grok (jax):
https://github.com/xai-org/grok-1/blob/7050ed204b8206bb8645c7b7bbef7252f79561b0/model.py#L864-L865

SGLang implementation (triton):
https://github.com/sgl-project/sglang/blob/2cea6146d8735780da602c0dfa0569b0fb5d47ba/python/sglang/srt/layers/extend_attention.py#L101-L102
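
For reference, the capping itself is a one-line transform on the pre-softmax attention scores: each logit is squashed into (-cap, cap) via `cap * tanh(logits / cap)`. A minimal PyTorch sketch of the idea (the `logit_cap` name and the default value 30.0 are illustrative, not taken from either linked implementation):

```python
import torch
import torch.nn.functional as F

def attention_with_logit_cap(q, k, v, logit_cap: float = 30.0):
    """Scaled dot-product attention with a tanh cap on the pre-softmax logits.

    The cap bounds each logit to (-logit_cap, logit_cap):
        logits <- logit_cap * tanh(logits / logit_cap)
    The value of `logit_cap` is a model hyperparameter.
    """
    scale = q.shape[-1] ** -0.5
    logits = (q @ k.transpose(-2, -1)) * scale            # [..., q_len, kv_len]
    logits = logit_cap * torch.tanh(logits / logit_cap)   # soft cap on logits
    return F.softmax(logits, dim=-1) @ v
```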

yzh119 self-assigned this May 24, 2024

yzh119 commented May 24, 2024

Sounds good, should be easy to support.


yzh119 commented May 24, 2024

Is there a formal name for this "Attention with Logits Cap" method?

merrymercy (Author) commented

There is no formal name. Maybe just call it "logit cap".

merrymercy (Author) commented

@yzh119 Any progress on this issue?
FYI, TensorRT-LLM recently added this feature https://github.com/NVIDIA/TensorRT-LLM/blob/db4edea1e1359bcfcac7bbb87c1b639b5611c721/tensorrt_llm/functional.py#L4519-L4521

yzh119 added a commit that referenced this issue Jun 14, 2024

yzh119 commented Jun 14, 2024

Done in #298
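
For anyone landing here later, a hedged usage sketch of the landed feature: recent flashinfer releases expose the cap as an optional `logits_soft_cap` argument on the attention entry points, but whether that exact argument name and entry point shipped in #298 is an assumption on my part, so check the PR for the interface it actually added.

```python
import torch
import flashinfer

# Assumed NHD layout: q is [qo_len, num_qo_heads, head_dim],
# k and v are [kv_len, num_kv_heads, head_dim].
q = torch.randn(128, 32, 128, dtype=torch.float16, device="cuda")
k = torch.randn(1024, 32, 128, dtype=torch.float16, device="cuda")
v = torch.randn(1024, 32, 128, dtype=torch.float16, device="cuda")

# logits_soft_cap applies cap * tanh(logits / cap) before softmax.
# 30.0 is just an example value here; the real value is a model hyperparameter.
out = flashinfer.single_prefill_with_kv_cache(
    q, k, v, causal=True, logits_soft_cap=30.0
)
```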

yzh119 closed this as completed Jun 14, 2024