
Disable check for dropout in MultiheadAttention fast_path #88831

Closed

Conversation

ecly (Contributor) commented Nov 10, 2022

Since we already enforce eval mode for the fast_path, we do not need to also check for a zero dropout value: a model trained with dropout will still have a non-zero dropout attribute in eval mode, even though dropout is not applied there.

Fixes #88806

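To make the change concrete, here is a minimal pure-Python sketch of the fast-path gate before and after this PR. `SimpleMHA` is a stand-in for `nn.MultiheadAttention`, and the two gate functions are illustrative only; they are not the exact internal fast_path condition, which checks many more properties.

```python
class SimpleMHA:
    """Hypothetical stand-in for nn.MultiheadAttention (illustration only)."""
    def __init__(self, dropout: float, training: bool):
        self.dropout = dropout      # configured dropout probability
        self.training = training    # False when the module is in eval mode

def fast_path_ok_before(m: SimpleMHA) -> bool:
    # Old gate: any non-zero configured dropout disabled the fast path,
    # even in eval mode where dropout is never actually applied.
    return (not m.training) and m.dropout == 0.0

def fast_path_ok_after(m: SimpleMHA) -> bool:
    # New gate: eval mode alone suffices, since eval already
    # guarantees dropout is not applied.
    return not m.training

# A model trained with dropout=0.1, now switched to eval mode:
mha = SimpleMHA(dropout=0.1, training=False)
print(fast_path_ok_before(mha))  # False: fast path wrongly rejected
print(fast_path_ok_after(mha))   # True: fast path taken
```

The point of the fix is exactly this gap: the old condition conflated "dropout is configured" with "dropout will be applied", and only the latter matters for fast-path eligibility.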
pytorch-bot bot commented Nov 10, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/88831

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit a7396dc:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

linux-foundation-easycla bot commented Nov 10, 2022

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: ecly / name: Emil Lynegaard (a7396dc)

ecly (Contributor, author) commented Nov 10, 2022

Will sign CLA ASAP.

ecly (Contributor, author) commented Nov 10, 2022

I didn't think this path was unit-test-worthy, but if you want, I can port over the test from the related issue.

ngimel (Collaborator) commented Nov 10, 2022

@drisspg please reassign if someone else from BT should review this

ngimel added the triaged label (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) on Nov 10, 2022
erichan1 (Member) commented Nov 10, 2022

Looks fine to me as per discussion in #88806. Will leave for @drisspg to approve since I've been out of the loop for BT.

Edit: the only possible issue would be a case where we are in eval mode yet still allow dropout. Is there such a case?

ecly (Contributor, author) commented Nov 10, 2022

@erichan1 looking at the implementation in F.multi_head_attention_forward, which I suppose should serve as the behavioral reference for the fast path, this does not seem to be supported, as dropout_p is forced to 0 in eval mode:

pytorch/torch/nn/functional.py (lines 5167 to 5168 at 1ae772a):

    if not training:
        dropout_p = 0.0
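That guard can be sketched in isolation. The helper below is a hypothetical extraction (not a real PyTorch function) that mirrors the two quoted lines from F.multi_head_attention_forward, showing why the module's configured dropout value is irrelevant at inference time.

```python
def effective_dropout_p(dropout_p: float, training: bool) -> float:
    # Mirrors the guard quoted above from F.multi_head_attention_forward:
    # in eval mode (training=False) dropout_p is forced to 0.0, so the
    # configured dropout never reaches the attention computation.
    if not training:
        dropout_p = 0.0
    return dropout_p

print(effective_dropout_p(0.1, training=False))  # 0.0 (eval: dropout suppressed)
print(effective_dropout_p(0.1, training=True))   # 0.1 (train: dropout applied)
```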

drisspg (Contributor) left a comment

Yup, nice catch LGTM!

ngimel (Collaborator) commented Nov 11, 2022

@pytorchbot merge

pytorch-bot bot added the ciflow/trunk label (Trigger trunk jobs on your pull request) on Nov 11, 2022
pytorchmergebot (Collaborator) commented

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team


kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Dec 10, 2022

Pull Request resolved: pytorch#88831
Approved by: https://github.com/drisspg
Labels: ciflow/trunk · Merged · open source · triaged
Development

Successfully merging this pull request may close these issues.

Fast path for MultiheadAttention does not work with instances with dropout despite being in eval mode
6 participants