[aten decomp] Update sdpa decomp #108371
Conversation
🔗 Helpful Links 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/108371
Note: Links to docs will display an error until the docs builds have been completed. ✅ No Failures as of commit 49068e0 with merge base b9fc6d7. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Talked to @larryliu0820 and realized that this is wrong. Need a different solution.
```diff
@@ -3993,7 +3993,7 @@ def scaled_dot_product_flash_attention(
         query, key, value, attn_mask, dropout_p, is_causal, None, scale=scale
     )
     return (
-        output,
+        output.transpose(1, 2),
```
oh wow this is so weird
It is indeed. Flash attention, https://fburl.com/38pwyabk, does that while the other one does not. I am going to check what happens if I trigger the non-flash variant. But surprisingly, the call to contiguous, https://github.com/pytorch/pytorch/blob/main/torch/nn/functional.py#L5441, does not appear in the trace, which is also very weird.
Is it a bug? It would be great to add this explanation in a comment. What's the relationship to the decomposition statement in the summary?
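For readers following along, here is a minimal sketch of what a math-style decomposition of scaled_dot_product_attention computes, assuming inputs in (batch, num_heads, seq_len, head_dim) layout and ignoring attn_mask, dropout, and is_causal for brevity. This is an illustration, not the actual decomposition in torch/_decomp; the layout assumption is where a transpose like the one in this diff matters, since a kernel that produces output in (batch, seq_len, num_heads, head_dim) layout needs a transpose(1, 2) to line up with the math-style result.

```python
import math
import torch

def sdpa_math_sketch(query, key, value, scale=None):
    # Illustrative only; assumes (batch, num_heads, seq_len, head_dim) inputs
    # and omits attn_mask, dropout_p, and is_causal handling.
    if scale is None:
        scale = 1.0 / math.sqrt(query.size(-1))
    # (batch, num_heads, q_len, k_len) attention weights
    attn = torch.softmax(torch.matmul(query, key.transpose(-2, -1)) * scale, dim=-1)
    # (batch, num_heads, q_len, head_dim) output in the "math" layout; a kernel
    # working in (batch, seq_len, num_heads, head_dim) would need .transpose(1, 2)
    # to match this layout.
    return torch.matmul(attn, value)
```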
@kimishpatel has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@pytorchbot merge (Initiating merge automatically since Phabricator Diff has merged)
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Stack from ghstack (oldest at bottom):
Summary:
The earlier decomp was routing the _flash* variant to the _math variant, and
this was resulting in a failure during torch.export, for a reason I
couldn't trace.
However, it seems that we should really have a decomp for
scaled_dot_product_attention instead of
scaled_dot_product_flash_attention. Right?
This diff adds that. It also adds a test to check that a model exported
via two-stage export has decomposed the op. This test needs improvement
to figure out what the core ATen opset is and check for anything that is
not inside it (see the sketch after this description).
Test Plan:
test_model_exports_to_core_aten
Reviewers:
Subscribers:
Tasks:
Tags:
Differential Revision: D48917461
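To make the test plan concrete, here is a rough sketch of the kind of check test_model_exports_to_core_aten performs. The toy module, the helper name, and the string-based op check below are illustrative assumptions rather than the actual test code, and export entry points have shifted across PyTorch releases; a full version would verify every remaining op against the core ATen opset, not just the SDPA ops.

```python
import torch
from torch.export import export

class TinyAttention(torch.nn.Module):
    # Hypothetical toy model used only for illustration.
    def forward(self, q, k, v):
        return torch.nn.functional.scaled_dot_product_attention(q, k, v)

def find_sdpa_nodes(exported_program):
    # Collect any scaled_dot_product_attention* calls still present in the
    # exported graph; after running decompositions there should be none.
    return [
        node.target
        for node in exported_program.graph.nodes
        if node.op == "call_function" and "scaled_dot_product" in str(node.target)
    ]

q = k = v = torch.randn(1, 2, 4, 8)
ep = export(TinyAttention(), (q, k, v)).run_decompositions()
assert not find_sdpa_nodes(ep), "sdpa was not decomposed into core ATen ops"
```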