Initial Flash Attention support on ROCM #114309
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/114309
Note: Links to docs will display an error until the docs builds have been completed. ✅ No failures as of commit f75038b with merge base 310f6ab. This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot label ciflow/rocm
@pytorchbot rebase
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.
Successfully rebased |
Force-pushed from 5693b15 to d38d585
Force-pushed from d38d585 to f252833
CI failures are likely unrelated to this PR.
@malfet Hi Nikita, please add anyone else who would be relevant as a reviewer.
@pytorchbot revert -m "trunk tests are failing" -c nosignal
@pytorchbot successfully started a revert job. Check the current status here.
Reverting PR 114309 failed. Reason: 1 mandatory check(s) are pending/not yet run. Dig deeper by viewing the pending checks on hud.
/easycla
@pytorchbot revert -m "trunk tests are failing" -c nosignal
@pytorchbot successfully started a revert job. Check the current status here.
Reverting PR 114309 failed. Reason: 1 mandatory check(s) failed. Dig deeper by viewing the failures on hud.
This reverts commit 5bddbed. Pull Request resolved: #115975 Approved by: https://github.com/atalman, https://github.com/malfet
This pull request adds initial Flash Attention support for the AMD/ROCm platform. It adds a specialized Triton repository/branch as a compile-time dependency for the Flash Attention math library on AMD/ROCm. This Triton submodule is not used at runtime and will not be shipped in the final PyTorch package. We plan to release this specialized Triton as a separate project.

Known limitations:
- [ ] Only supports MI200 series GPUs (i.e., `gcnArchName == gfx90a:sramecc+:xnack-`).
- [ ] Only supports power-of-two sequence lengths.
- [ ] No support for varlen APIs.
- [ ] Only supports head dimensions 16, 32, 64, and 128.
- [ ] Performance is still being optimized.

Fixes pytorch#112997
Pull Request resolved: pytorch#114309
Approved by: https://github.com/jeffdaily, https://github.com/malfet
---------
Co-authored-by: Joseph Groenenboom <joseph.groenenboom@amd.com>
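To make the MI200-only constraint concrete, here is a minimal Python sketch of how a caller might gate on the reported architecture before relying on this path. The helper name `is_flash_attention_supported` is hypothetical; `gcnArchName` is the ROCm device property the limitation above refers to, and it is absent on CUDA builds, hence the `getattr` default.

```python
import torch

def is_flash_attention_supported(device_index: int = 0) -> bool:
    """Hypothetical helper: True only on MI200-series (gfx90a) GPUs,
    the sole architecture this initial ROCm enablement supports."""
    if not torch.cuda.is_available():
        return False
    props = torch.cuda.get_device_properties(device_index)
    # gcnArchName is only populated on ROCm builds of PyTorch;
    # on CUDA builds the attribute does not exist.
    arch = getattr(props, "gcnArchName", "")
    return arch.startswith("gfx90a")
```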
Note about the updates. This PR:
1. Skips more flash attention related UTs on MI200 (see the sketch after this note).
2. Fixes additional ATen compiling errors after hipification.
3. Fixes the author "root" of a specific commit.
4. Includes the patch from Nikita in favor of block-level static initialization.

CAVEAT: This revised PR has a commit that modifies the CI to force it to run on MI200 nodes. That specific commit must be reverted before merge.

Original PR (#114309) note: This pull request adds initial Flash Attention support for the AMD/ROCm platform. It adds a specialized Triton repository/branch as a compile-time dependency for the Flash Attention math library on AMD/ROCm. This Triton submodule is not used at runtime and will not be shipped in the final PyTorch package. We plan to release this specialized Triton as a separate project.

Known limitations:
- Only supports MI200 series GPUs (i.e., `gcnArchName == gfx90a:sramecc+:xnack-`).
- Only supports power-of-two sequence lengths.
- No support for varlen APIs.
- Only supports head dimensions 16, 32, 64, and 128.
- Performance is still being optimized.

Fixes #112997
Pull Request resolved: #115981
Approved by: https://github.com/malfet
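A minimal sketch of the kind of UT gating item 1 describes, assuming a unittest-style suite; `on_mi200`, the test class, and the test name are illustrative, not the actual decorators or tests in the PyTorch test suite.

```python
import unittest
import torch

def on_mi200() -> bool:
    # Probe the GCN architecture string; empty on non-ROCm builds.
    if not torch.cuda.is_available():
        return False
    arch = getattr(torch.cuda.get_device_properties(0), "gcnArchName", "")
    return arch.startswith("gfx90a")

class TestFlashAttention(unittest.TestCase):
    @unittest.skipIf(on_mi200(), "flash attention UT skipped on MI200 for now")
    def test_flash_attention_varlen(self):
        # varlen APIs are not supported by the initial ROCm enablement
        ...

if __name__ == "__main__":
    unittest.main()
```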
@xinyazhang It got
Hi, this is a bug from the build system changes related to AOTriton. Our internal branch already has the fixes, along with the AOTriton V2 integration. They will be sent upstream as a PR when the whole feature set is complete.
@xinyazhang Thanks for the information! Any ETA, please?
This pull request adds initial Flash Attention support for the AMD/ROCm platform. It adds a specialized Triton repository/branch as a compile-time dependency for the Flash Attention math library on AMD/ROCm. This Triton submodule is not used at runtime and will not be shipped in the final PyTorch package. We plan to release this specialized Triton as a separate project.
Known limitations:
- Only supports MI200 series GPUs (i.e., `gcnArchName == gfx90a:sramecc+:xnack-`).
- Only supports power-of-two sequence lengths.
- No support for varlen APIs.
- Only supports head dimensions 16, 32, 64, and 128.
- Performance is still being optimized.

Fixes #112997
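A minimal usage sketch under the limitations above, assuming a PyTorch 2.x ROCm build on an MI200 GPU: the shapes use a power-of-two sequence length and a supported head dimension, and `torch.backends.cuda.sdp_kernel` forces the flash backend so the new path is actually exercised rather than silently falling back.

```python
import torch
import torch.nn.functional as F

# Power-of-two sequence length (1024) and a supported head dim (64),
# matching the limitations listed above.
q = torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.float16)
k = torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.float16)
v = torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.float16)

# Force the flash backend; with math and mem-efficient disabled, an
# unsupported configuration errors out instead of silently falling back.
with torch.backends.cuda.sdp_kernel(
    enable_flash=True, enable_math=False, enable_mem_efficient=False
):
    out = F.scaled_dot_product_attention(q, k, v)

print(out.shape)  # torch.Size([2, 8, 1024, 64])
```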
cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang