Why is fused attention only applicable on Ampere GPUs? #1279

rayleizhu · 2023-03-05T11:21:10Z

Hi, I'm writing my own operator using fused attention as a template. However, I found that fused attention requires the Ampere architecture:

https://github.com/openai/triton/blob/d376020f90002757eea3ea9475d4f7cfc2ec5ead/python/triton/ops/flash_attention.py#L200

I do not understand this.

Does this mean the template uses some architecture-specific operators?
How should I modify it to run on a Volta GPU?

Besides, it seems that only head_dim=64 is supported, right? How can I fix it for the head_dim=32 case?

https://github.com/openai/triton/blob/d376020f90002757eea3ea9475d4f7cfc2ec5ead/python/triton/ops/flash_attention.py#L207
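For illustration, the two guards being asked about can be sketched in plain Python. This is a hypothetical reconstruction, not the code at the linked lines: the function name `check_fused_attention_support`, its parameters, and the default set of supported head dimensions are assumptions made for the example. The only facts relied on are that Ampere corresponds to NVIDIA compute capability 8.x and Volta to 7.0.

```python
from typing import Iterable, Tuple

# Compute capability major version: 8.x is Ampere, 7.0 is Volta.
AMPERE_MAJOR = 8


def check_fused_attention_support(
    capability: Tuple[int, int],
    head_dim: int,
    supported_head_dims: Iterable[int] = (64,),  # assumed default, per the question
) -> None:
    """Raise if the device or head dimension fails the (assumed) guards.

    `capability` is a (major, minor) compute-capability pair.
    """
    major, _minor = capability
    if major < AMPERE_MAJOR:
        raise RuntimeError(
            f"fused attention needs compute capability >= 8.0, got {capability}"
        )
    if head_dim not in tuple(supported_head_dims):
        raise ValueError(f"unsupported head_dim={head_dim}")
```

With PyTorch installed, the capability pair can be obtained from `torch.cuda.get_device_capability()`. Passing the guards says nothing about whether the kernel itself runs correctly on an older architecture or with a different head dimension; that depends on what the compiler supports.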

ptillet · 2023-03-06T01:37:55Z

There is some more information in #616.

