[2.0.1] Disable SDPA FlashAttention backward and mem eff attention on sm86+ for head_dim above 64 #99736

Merged
2 commits merged into pytorch:release/2.0 on Apr 24, 2023

Conversation

cpuhrsch (Contributor)

2.0.1 submission of #99105 plus sm89 CI enablement.

Expand the sdpa_utils.h check to disable FlashAttention when autograd is required, as well as memory-efficient attention, in the following cases:

  • head_dim > 64
  • sm86 or newer

Previously these kernels were only disabled on sm86, and only for head_dim equal to 128.
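For context, here is a minimal Python sketch that mirrors the expanded rule at the public API level; it is illustrative only and is not the sdpa_utils.h check itself (which lives in C++ inside the dispatcher). The helper name pick_sdpa_backends is invented for this sketch; torch.backends.cuda.sdp_kernel and torch.nn.functional.scaled_dot_product_attention are the PyTorch 2.0 APIs it drives.

import torch
import torch.nn.functional as F

def pick_sdpa_backends(head_dim: int, requires_grad: bool) -> dict:
    # Illustrative mirror of the PR's rule: on sm86 or newer with head_dim > 64,
    # avoid FlashAttention when gradients are needed and avoid memory-efficient
    # attention entirely; the math kernel stays available as the fallback.
    major, minor = torch.cuda.get_device_capability()
    sm = major * 10 + minor
    restricted = sm >= 86 and head_dim > 64
    return {
        "enable_flash": not (restricted and requires_grad),  # forward-only flash is unaffected
        "enable_mem_efficient": not restricted,
        "enable_math": True,
    }

# head_dim = 96 (> 64), so on an sm86/sm89 GPU this routes SDPA to the math kernel.
q = torch.randn(2, 8, 128, 96, device="cuda", dtype=torch.float16, requires_grad=True)
k, v = torch.randn_like(q), torch.randn_like(q)

with torch.backends.cuda.sdp_kernel(**pick_sdpa_backends(q.size(-1), q.requires_grad)):
    out = F.scaled_dot_product_attention(q, k, v)
    out.sum().backward()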

pytorch-bot bot commented Apr 21, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/99736

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please review it.

⏳ No Failures, 2 Pending

As of commit d3c53c4:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@malfet (Contributor) left a comment:

Please make sure to remove the pull.yml change, but otherwise this looks good to me.

Comment on pull.yml, lines 369 to 392:

  linux-bionic-cuda11_8-py3_10-gcc7-sm89-build:
    name: linux-bionic-cuda11.8-py3.10-gcc7-sm89
    uses: ./.github/workflows/_linux-build.yml
    with:
      build-environment: linux-bionic-cuda11.8-py3.10-gcc7-sm89
      docker-image-name: pytorch-linux-bionic-cuda11.8-cudnn8-py3-gcc7
      cuda-arch-list: '8.9'
      test-matrix: |
        { include: [
          { config: "default", shard: 1, num_shards: 4, runner: "linux.gcp.l4" },
          { config: "default", shard: 2, num_shards: 4, runner: "linux.gcp.l4" },
          { config: "default", shard: 3, num_shards: 4, runner: "linux.gcp.l4" },
          { config: "default", shard: 4, num_shards: 4, runner: "linux.gcp.l4" },
        ]}

  linux-bionic-cuda11_8-py3_10-gcc7-sm89-test:
    name: linux-bionic-cuda11.8-py3.10-gcc7-sm89
    uses: ./.github/workflows/_linux-test.yml
    needs: linux-bionic-cuda11_8-py3_10-gcc7-sm89-build
    with:
      build-environment: linux-bionic-cuda11.8-py3.10-gcc7-sm89
      docker-image: ${{ needs.linux-bionic-cuda11_8-py3_10-gcc7-sm89-build.outputs.docker-image }}
      test-matrix: ${{ needs.linux-bionic-cuda11_8-py3_10-gcc7-sm89-build.outputs.test-matrix }}
      use-gha: anything-non-empty-to-use-gha
Contributor commented:

This needs to be removed before the cherry-pick is landed.

cpuhrsch (Contributor, Author) replied:

Oh, I thought you wanted to include this so we can get sm89 tests on the release branch. I'll remove it.

cpuhrsch and others added 2 commits April 24, 2023 09:07
…or head_dim above 64 (pytorch#99105)

Expand the sdpa_utils.h check to disable FlashAttention when autograd is required, as well as memory-efficient attention, in the following cases:
- head_dim > 64
- sm86 or newer

Previously these kernels were only disabled on sm86, and only for head_dim equal to 128.

Pull Request resolved: pytorch#99105
Approved by: https://github.com/malfet
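As a hedged verification sketch (assuming a CUDA build of PyTorch 2.0.1 carrying this fix and an sm86-or-newer GPU): if the flash backend is forced on its own for a head_dim > 64 input that requires gradients, the expanded check should leave no usable kernel, and scaled_dot_product_attention is expected to raise a RuntimeError at dispatch.

import torch
import torch.nn.functional as F

# Expectation under the patched check, not an official reproduction recipe.
q = torch.randn(1, 4, 64, 96, device="cuda", dtype=torch.float16, requires_grad=True)
k, v = torch.randn_like(q), torch.randn_like(q)

# Only the flash backend is allowed; with head_dim 96 and autograd on sm86+,
# the dispatcher should reject it and report that no kernel is available.
with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False):
    try:
        F.scaled_dot_product_attention(q, k, v).sum().backward()
    except RuntimeError as err:
        print("FlashAttention rejected for this configuration:", err)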
albanD (Collaborator) commented Apr 24, 2023:

The new workflow has been removed, and the branch has been rebased and force-pushed.

@atalman atalman merged commit e9ebda2 into pytorch:release/2.0 Apr 24, 2023
78 checks passed