Hi, I'm writing my own operator using the fused attention kernel as a template. However, I found that fused attention requires an Ampere architecture:
https://github.com/openai/triton/blob/d376020f90002757eea3ea9475d4f7cfc2ec5ead/python/triton/ops/flash_attention.py#L200
I do not understand why this restriction exists.
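For context, here is a minimal way to see whether your GPU meets that guard, assuming the check is based on CUDA compute capability (Ampere corresponds to major version 8):

```python
import torch

# Query the compute capability of the current CUDA device.
major, minor = torch.cuda.get_device_capability()
print(f"compute capability: {major}.{minor}")

# Assumption: the assert in flash_attention.py rejects pre-Ampere GPUs,
# i.e. anything with compute capability major < 8.
if major < 8:
    print("pre-Ampere GPU: the fused attention op is expected to raise here")
```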
Also, it seems that only head_dim=64 is supported, right? How can I fix this for the head_dim=32 case?
https://github.com/openai/triton/blob/d376020f90002757eea3ea9475d4f7cfc2ec5ead/python/triton/ops/flash_attention.py#L207
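Here is a hypothetical repro sketch for the head_dim=32 case. It assumes the op is exposed as `triton.ops.attention` with a `(q, k, v, sm_scale)` signature; the exact name and signature may differ in your Triton version. With the block-dimension constant hard-coded to 64 at the linked line, this call is expected to fail unless that constant (and any related asserts) is relaxed to accept 32:

```python
import torch
import triton.ops

# Shapes: batch, heads, sequence length, head dimension (the case in question).
B, H, N_CTX, D_HEAD = 2, 4, 1024, 32

q = torch.randn(B, H, N_CTX, D_HEAD, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)
sm_scale = D_HEAD ** -0.5  # standard 1/sqrt(d) attention scaling

# Expected to trip the head_dim restriction while BLOCK_DMODEL is fixed at 64.
out = triton.ops.attention(q, k, v, sm_scale)
```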
There is some more information in #616.