[2.0.1] Disable SDPA FlashAttention backward and mem eff attention on sm86+ for head_dim above 64 #99736

Merged
2 commits merged into pytorch:release/2.0 on Apr 24, 2023

Conversation

cpuhrsch (Contributor)

2.0.1 submission of #99105 plus sm89 CI enablement.

Expand the sdpa_utils.h check to disable FlashAttention when autograd is required, as well as memory-efficient attention, in the following cases:

  • head_dim > 64
  • sm86 or newer

Previously these kernels were only disabled on sm86, and only for head_dim equal to 128.
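For context, here is a minimal Python sketch that mirrors the expanded rule at the public API level; it is illustrative only and is not the sdpa_utils.h check itself (which lives in C++ inside the dispatcher). The helper name pick_sdpa_backends is invented for this sketch; torch.backends.cuda.sdp_kernel and torch.nn.functional.scaled_dot_product_attention are the PyTorch 2.0 APIs it drives.

import torch
import torch.nn.functional as F

def pick_sdpa_backends(head_dim: int, requires_grad: bool) -> dict:
    # Illustrative mirror of the PR's rule: on sm86 or newer with head_dim > 64,
    # avoid FlashAttention when gradients are needed and avoid memory-efficient
    # attention entirely; the math kernel stays available as the fallback.
    major, minor = torch.cuda.get_device_capability()
    sm = major * 10 + minor
    restricted = sm >= 86 and head_dim > 64
    return {
        "enable_flash": not (restricted and requires_grad),  # forward-only flash is unaffected
        "enable_mem_efficient": not restricted,
        "enable_math": True,
    }

# head_dim = 96 (> 64), so on an sm86/sm89 GPU this routes SDPA to the math kernel.
q = torch.randn(2, 8, 128, 96, device="cuda", dtype=torch.float16, requires_grad=True)
k, v = torch.randn_like(q), torch.randn_like(q)

with torch.backends.cuda.sdp_kernel(**pick_sdpa_backends(q.size(-1), q.requires_grad)):
    out = F.scaled_dot_product_attention(q, k, v)
    out.sum().backward()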

pytorch-bot bot commented Apr 21, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/99736

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please review it.

⏳ No Failures, 2 Pending

As of commit d3c53c4:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@malfet (Contributor) left a comment:

Please make sure to remove the pull.yml change, but otherwise this looks good to me.

Comment on pull.yml, lines 369 to 392:

  linux-bionic-cuda11_8-py3_10-gcc7-sm89-build:
    name: linux-bionic-cuda11.8-py3.10-gcc7-sm89
    uses: ./.github/workflows/_linux-build.yml
    with:
      build-environment: linux-bionic-cuda11.8-py3.10-gcc7-sm89
      docker-image-name: pytorch-linux-bionic-cuda11.8-cudnn8-py3-gcc7
      cuda-arch-list: '8.9'
      test-matrix: |
        { include: [
          { config: "default", shard: 1, num_shards: 4, runner: "linux.gcp.l4" },
          { config: "default", shard: 2, num_shards: 4, runner: "linux.gcp.l4" },
          { config: "default", shard: 3, num_shards: 4, runner: "linux.gcp.l4" },
          { config: "default", shard: 4, num_shards: 4, runner: "linux.gcp.l4" },
        ]}

  linux-bionic-cuda11_8-py3_10-gcc7-sm89-test:
    name: linux-bionic-cuda11.8-py3.10-gcc7-sm89
    uses: ./.github/workflows/_linux-test.yml
    needs: linux-bionic-cuda11_8-py3_10-gcc7-sm89-build
    with:
      build-environment: linux-bionic-cuda11.8-py3.10-gcc7-sm89
      docker-image: ${{ needs.linux-bionic-cuda11_8-py3_10-gcc7-sm89-build.outputs.docker-image }}
      test-matrix: ${{ needs.linux-bionic-cuda11_8-py3_10-gcc7-sm89-build.outputs.test-matrix }}
      use-gha: anything-non-empty-to-use-gha
Contributor commented:

This needs to be removed before the cherry-pick is landed.

cpuhrsch (Contributor, Author) replied:

Oh, I thought you wanted to include this so we can get sm89 tests on the release branch. I'll remove it.

cpuhrsch and others added 2 commits April 24, 2023 09:07
…or head_dim above 64 (pytorch#99105)

Expand the sdpa_utils.h check to disable FlashAttention when autograd is required, as well as memory-efficient attention, in the following cases:
- head_dim > 64
- sm86 or newer

Previously these kernels were only disabled on sm86, and only for head_dim equal to 128.

Pull Request resolved: pytorch#99105
Approved by: https://github.com/malfet
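As a hedged verification sketch (assuming a CUDA build of PyTorch 2.0.1 carrying this fix and an sm86-or-newer GPU): if the flash backend is forced on its own for a head_dim > 64 input that requires gradients, the expanded check should leave no usable kernel, and scaled_dot_product_attention is expected to raise a RuntimeError at dispatch.

import torch
import torch.nn.functional as F

# Expectation under the patched check, not an official reproduction recipe.
q = torch.randn(1, 4, 64, 96, device="cuda", dtype=torch.float16, requires_grad=True)
k, v = torch.randn_like(q), torch.randn_like(q)

# Only the flash backend is allowed; with head_dim 96 and autograd on sm86+,
# the dispatcher should reject it and report that no kernel is available.
with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False):
    try:
        F.scaled_dot_product_attention(q, k, v).sum().backward()
    except RuntimeError as err:
        print("FlashAttention rejected for this configuration:", err)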
albanD (Collaborator) commented Apr 24, 2023:

The new workflow has been removed, and the branch has been rebased and force-pushed.

@atalman atalman merged commit e9ebda2 into pytorch:release/2.0 Apr 24, 2023
78 checks passed