[decomp] Fix _scaled_dot_product_flash_attention decomposition bug #113102
Conversation
For `F.scaled_dot_product_attention` we have an optional positional argument, `Tensor? attn_mask=None`. This is not handled correctly in `python_variables.cpp`, and it causes issues in tracing. Fix it so that the `is_default` lambda correctly returns true when the default value is None.

[ghstack-poisoned]
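As an illustration of the intended semantics, here is a minimal Python analogue (the actual fix is C++ in `python_variables.cpp`; the helper below is hypothetical, not the PR's code):

```python
# Hypothetical Python analogue of the C++ `is_default` check in
# python_variables.cpp (illustration only, not the PR's code). During
# tracing, an argument can be elided from the emitted call when it still
# holds its schema default -- including when that default is None, as with
# `Tensor? attn_mask=None`.
def is_default(schema_default, value):
    if schema_default is None:
        # The buggy version missed this branch, so a None attn_mask was
        # treated as if the caller had passed it explicitly.
        return value is None
    return value == schema_default
```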
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/113102
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit f7a6836 with merge base bdfde62. This comment was automatically generated by Dr. CI and updates every 15 minutes.
For `_scaled_dot_product_flash_attention` we don't have `Tensor? attn_mask=None`, but `scaled_dot_product_attention` does. In the original decomp there's a mix-up where I added this argument to `_scaled_dot_product_flash_attention`. Fix it so that `_scaled_dot_product_flash_attention` is decomposed correctly.

[ghstack-poisoned]
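For context, here is a hedged sketch of what a corrected decomposition's shape looks like (the function name and body are illustrative assumptions, not the code merged in this PR; the signatures are paraphrased from the aten schemas):

```python
import torch

aten = torch.ops.aten

# Illustrative sketch, not the PR's actual decomposition. The point: the
# flash variant's schema carries no attn_mask, so the decomposition's
# signature must not carry one either; the math fallback, which *does*
# accept an optional mask, is simply handed None.
def sdpfa_decomp_sketch(query, key, value, dropout_p=0.0, is_causal=False,
                        return_debug_mask=False, *, scale=None):
    out, attn = aten._scaled_dot_product_attention_math(
        query, key, value, None, dropout_p, is_causal, scale=scale)
    # The real op returns a larger tuple (logsumexp, cumulative sequence
    # lengths, RNG state, debug mask, ...); this sketch keeps only the output.
    return out
```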
Thanks a lot!
@pytorchbot merge
Merge failed. Reason: This PR needs a `release notes:` label. If your changes are user facing and intended to be a part of release notes, please use a label starting with `release notes:`. If not, please add the `topic: not user facing` label. To add a label, you can comment to pytorchbot, for example: `@pytorchbot label "topic: not user facing"`. For more information, see the PyTorch wiki.

Details for Dev Infra team: raised by workflow job.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA: 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
[decomp] Fix _scaled_dot_product_flash_attention decomposition bug (pytorch#113102)

For `_scaled_dot_product_flash_attention` we don't have `Tensor? attn_mask=None`, but `scaled_dot_product_attention` does. In the original decomp there's a mix-up where I added this argument to `_scaled_dot_product_flash_attention`. Fix it so that `_scaled_dot_product_flash_attention` is decomposed correctly.

Pull Request resolved: pytorch#113102
Approved by: https://github.com/ezyang
Stack from ghstack (oldest at bottom):
For `_scaled_dot_product_flash_attention` we don't have `Tensor? attn_mask=None`, but `scaled_dot_product_attention` does. In the original decomp there's a mix-up where I added this argument to `_scaled_dot_product_flash_attention`. Fix it so that `_scaled_dot_product_flash_attention` is decomposed correctly.
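One way to sanity-check a fix like this (a repro sketch under my own assumptions, not the PR's test plan) is to trace the op through its decomposition with `make_fx` and inspect the resulting graph:

```python
import torch
from torch._decomp import get_decompositions
from torch.fx.experimental.proxy_tensor import make_fx

# Repro sketch (assumes a CUDA device with fp16 inputs, which the flash
# kernel requires); this is not the PR's actual test plan.
decomps = get_decompositions(
    [torch.ops.aten._scaled_dot_product_flash_attention.default])

def f(q, k, v):
    # The op returns a tuple (output, logsumexp, ...); keep the output only.
    return torch.ops.aten._scaled_dot_product_flash_attention(q, k, v)[0]

q = k = v = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16)
gm = make_fx(f, decomposition_table=decomps)(q, k, v)
print(gm.graph)  # the decomposed graph should not reference any attn_mask
```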