Enable src_mask in fast path of TransformerEncoderLayer #87377
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/87377
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures, 2 Pending as of commit 5c70c50.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
/easycla
@pytorchbot rebase
You don't have permissions to rebase this PR, only people with write permissions may rebase PRs.
@sgrigory has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
-        elif src.is_nested and src_key_padding_mask is not None:
-            why_not_sparsity_fast_path = "src_key_padding_mask is not supported with NestedTensor input for fastpath"
+        elif src.is_nested and (src_key_padding_mask is not None or src_mask is not None):
+            why_not_sparsity_fast_path = "src_key_padding_mask and src_mask are not supported with NestedTensor input"
Is the message clear? I.e., that either one of the masks causes this to fail. I am hesitant to put too much time into error-message engineering, but helping users get to the fast path is important.
I think it can be made clearer, e.g. "supplying both src_key_padding_mask and src_mask is not supported with NestedTensor input" or "with NestedTensor input, only one of src_key_padding_mask and src_mask can be provided". Since in the next PR I'll be working on making 4D inputs possible from the Python level and modifying this code, I can adjust this message there as well.
@pytorchbot merge
Merge failed. Reason: Approval needed from one of the following (Rule 'superuser'). Details for Dev Infra team: raised by workflow job.
Symbolic review to address this comment: #87377 (comment)
But ideally we should not require this. Michael has already reviewed, and that should be all that is needed.
A semicolon after a closing curly bracket is only needed after class definitions (or if one is assigning an anonymous lambda to a variable).
@sgrigory has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Force-pushed from ebd3a24 to 12979e8
@pytorchbot rebase
@pytorchbot successfully started a rebase job. Check the current status here
Co-authored-by: Nikita Shulga <nikita.shulga@gmail.com>
Successfully rebased
Force-pushed from 12979e8 to 5c70c50
@sgrigory has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@pytorchbot merge (Initiating merge automatically since Phabricator Diff has merged)
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Enable src_mask in fast path of TransformerEncoderLayer (#87377)

## Issues

Fixes pytorch#81129 (comment)

## Description

Passing a 2D attention mask `src_mask` into the fast path of `TransformerEncoderLayer` on CPU was causing an error, so it was disabled in pytorch#81277. This PR rolls that change back, enabling `src_mask` on the fast path:

- Either an attention mask `src_mask` of shape `(L, L)` or a padding mask `src_key_padding_mask` of shape `(B, L)` is now allowed on the CPU fast path. If softmax is applied along the last dimension (as in multi-head attention), these masks are processed without expanding them to 4D: when iterating through the input, `Softmax.cpp::host_softmax` converts the index to match the mask dimensions, depending on the mask type.
- If softmax is applied along a dimension other than the last, `Softmax.cpp::masked_softmax_cpu` expands the masks to 4D, converting them to `mask_type=2`. Theoretically one could also add special optimized cases for `dim=0, 1, 2` and process them without mask expansion, but I don't know how often that is used.

## Tests

- `test_transformerencoderlayer_fast_path` is extended to cover both the attention mask and the padding mask
- `test_masked_softmax_mask_types_0_1` is added to ensure that results from CPU softmax with attention and padding masks match the explicit slow calculation
- `test_masked_softmax_devices_parity` is added to ensure that results from masked softmax on CPU and CUDA match

## Note

I had to replace `float` with `torch.get_default_dtype()` in a couple of tests for the following reason:

- `test_nn.py` [sets the default type to `torch.double`](https://github.com/pytorch/pytorch/blob/master/test/test_nn.py#L24-L26)
- If I execute `test_nn.py` and `test_transformers.py` in one `pytest` run, this default still holds for the transformer tests
- Some tests in `test_transformers.py` that previously followed the slow path now take the fast path, and the hard-coded `float` started clashing with the default `double`

Let me know if there is a better way around it - or maybe I'm not supposed to run tests with `pytest` like this.

Pull Request resolved: pytorch#87377
Approved by: https://github.com/mikekgfb, https://github.com/weiwangmeta, https://github.com/malfet
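For context, here is a minimal sketch (not taken from this PR's tests) of calling `TransformerEncoderLayer` with either mask in inference mode; the model size, mask shapes, and values below are illustrative assumptions:

```python
import torch
import torch.nn as nn

B, L, E, H = 2, 5, 8, 2  # batch size, sequence length, embedding dim, heads (illustrative)

layer = nn.TransformerEncoderLayer(d_model=E, nhead=H, batch_first=True)
layer.eval()  # the fast path is only considered in inference mode

src = torch.randn(B, L, E)

# Attention mask of shape (L, L): True marks query/key pairs that must not attend
src_mask = torch.triu(torch.ones(L, L, dtype=torch.bool), diagonal=1)

# Padding mask of shape (B, L): True marks padded positions
src_key_padding_mask = torch.zeros(B, L, dtype=torch.bool)
src_key_padding_mask[1, -2:] = True

with torch.inference_mode():
    out_with_attn_mask = layer(src, src_mask=src_mask)
    out_with_pad_mask = layer(src, src_key_padding_mask=src_key_padding_mask)
```

Whether the fast path is actually taken for a given call still depends on the other eligibility checks (e.g. no autograd, supported dtypes), not only on which mask is supplied.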
Fixes T135842750 (follow-up for #87377)

## Description

At present, having both `src_key_padding_mask` and `src_mask` at the same time is not supported on the fast path in Transformer and Multi-Head Attention. This PR enables using both masks on the fast path on CPU and GPU: if both masks are passed, we merge them into a 4D mask in Python and change the mask type to 2 before passing it downstream. Downstream processing in native code is not changed, as it already supports a 4D mask. It is handled depending on the device:

- on CUDA, by `SoftMax.cu::masked_softmax_cuda`. When the mask type is 2, it calls either `dispatch_softmax_forward` -> `softmax_warp_forward` or `at::softmax` (depending on the input size). In both cases a 4D mask is supported.
- on CPU, by `SoftMax.cpp::masked_softmax_cpp`. It calls `hosted_softmax`, which supports a 4D mask.

## Tests

- Extended `test_mask_check_fastpath` to check that the fast path is indeed taken in Transformer when two masks are passed
- Added `test_multihead_self_attn_two_masks_fast_path_mock` to check that the fast path is taken in MHA when two masks are passed
- Added `test_multihead_self_attn_two_masks_fast_path` to check that the fast and slow paths give the same result when two masks are passed in MHA
- `test_masked_softmax_mask_types` now covers mask type 2
- `test_transformerencoderlayer_fast_path` (CPU smoke test) is expanded to the case where both masks are provided simultaneously
- `test_masked_softmax_devices_parity` checks that mask type 2 is accepted by the CPU and CUDA paths

Pull Request resolved: #88488
Approved by: https://github.com/mikekgfb
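As a rough sketch of the merging idea described above (not the PR's actual implementation; all names and sizes here are illustrative), two boolean masks can be broadcast and combined into a single 4D mask of shape `(B, num_heads, L, L)`:

```python
import torch

B, H, L = 2, 4, 5  # batch size, number of heads, sequence length (illustrative)

attn_mask = torch.triu(torch.ones(L, L, dtype=torch.bool), diagonal=1)  # (L, L)
key_padding_mask = torch.zeros(B, L, dtype=torch.bool)                  # (B, L)
key_padding_mask[1, -2:] = True

# Broadcast both masks to (B, H, L, L): a key position is masked out if either
# the attention mask or the padding mask marks it as masked.
merged_mask = attn_mask.view(1, 1, L, L) | key_padding_mask.view(B, 1, 1, L)
merged_mask = merged_mask.expand(B, H, L, L)  # 4D mask, i.e. "mask type 2"
```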