
[decomp] Fix _scaled_dot_product_flash_attention decomposition bug #113102

Closed
larryliu0820 wants to merge 3 commits

Conversation

larryliu0820 (Contributor) commented Nov 7, 2023

Stack from ghstack (oldest at bottom):

For `_scaled_dot_product_flash_attention` we don't have

`Tensor? attn_mask=None`

but `scaled_dot_product_attention` does. In the original decomp there's a mixup where I added this argument to `_scaled_dot_product_flash_attention`.

Fix it so that `_scaled_dot_product_flash_attention` is decomposed correctly.

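For illustration only, here is a minimal Python sketch of the mismatch. The function name, the simplified signature, and the plain-math fallback below are assumptions made for this example; they are not the actual ATen schema or the real decomposition:

```python
import torch
import torch.nn.functional as F

# Public API:   F.scaled_dot_product_attention(query, key, value, attn_mask=None, ...)
# Flash kernel: _scaled_dot_product_flash_attention takes no attn_mask, so a
# decomposition for it must not declare (or forward) an attn_mask argument.
def sdpa_flash_decomp_sketch(query, key, value, dropout_p=0.0, is_causal=False, scale=None):
    # Plain-math fallback, purely illustrative; the real decomposition also
    # returns auxiliary outputs (logsumexp, rng state, ...) omitted here.
    scale = scale if scale is not None else query.size(-1) ** -0.5
    attn = (query @ key.transpose(-2, -1)) * scale
    if is_causal:
        L, S = query.size(-2), key.size(-2)
        causal = torch.ones(L, S, device=query.device).tril().bool()
        attn = attn.masked_fill(~causal, float("-inf"))
    attn = attn.softmax(-1)
    if dropout_p > 0.0:
        attn = F.dropout(attn, p=dropout_p)
    return attn @ value
```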

For `F.scaled_dot_product_attention` we have a positional argument:

`Tensor? attn_mask=None`

This is not handled correctly in `python_variables.cpp`, and it is causing issues in tracing.

Fix it so that the `is_default` lambda correctly returns true when the default value is None.

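As a hedged Python analogue of that idea (the actual change is in C++, and the `is_default` helper below is hypothetical, not the real lambda in `python_variables.cpp`): an argument declared with a `None` default should count as being at its default both when the caller omits it and when the caller passes None explicitly.

```python
def is_default(declared_default, was_passed, passed_value=None):
    """Hypothetical helper: is an argument effectively at its declared default?

    An optional argument declared as `Tensor? attn_mask=None` is at its default
    when the caller omits it, or when the caller explicitly passes None.
    """
    if not was_passed:
        return True
    return declared_default is None and passed_value is None


# Usage sketch:
assert is_default(None, was_passed=False)                     # omitted -> default
assert is_default(None, was_passed=True, passed_value=None)   # explicit None -> default
assert not is_default(None, was_passed=True, passed_value=object())
```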

[ghstack-poisoned]

pytorch-bot bot commented Nov 7, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/113102

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit f7a6836 with merge base bdfde62:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

larryliu0820 added a commit that referenced this pull request Nov 7, 2023
ghstack-source-id: 5a280b4d902f76c20b2f8761cb6c559b108681e4
Pull Request resolved: #113102
larryliu0820 changed the title from "Fix a small bug for positional argument default value being None" to "[decomp] Fix _scaled_dot_product_flash_attention decomposition bug" Nov 7, 2023
larryliu0820 added a commit that referenced this pull request Nov 7, 2023

ghstack-source-id: 2804646edd35bdf556b392edccbc5cdac7b76d13
Pull Request resolved: #113102
larryliu0820 added a commit that referenced this pull request Nov 7, 2023

ghstack-source-id: bb643d20f2e3657893e64c7cfe6245071befe322
Pull Request resolved: #113102
ezyang (Contributor) left a comment


Thanks a lot!

albanD requested a review from drisspg November 8, 2023 18:55
larryliu0820 (Contributor, Author) commented:

@pytorchbot merge

pytorch-bot added the ciflow/trunk label Nov 8, 2023
pytorchmergebot (Collaborator) commented:

Merge failed

Reason: This PR needs a release notes: label
If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.


larryliu0820 (Contributor, Author) commented:

@pytorchbot merge

pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

facebook-github-bot deleted the gh/larryliu0820/43/head branch November 12, 2023 15:24
Skylion007 pushed a commit to Skylion007/pytorch that referenced this pull request Nov 14, 2023
Pull Request resolved: pytorch#113102
Approved by: https://github.com/ezyang
Labels: ciflow/inductor, ciflow/trunk, Merged, topic: not user facing