Support src_mask and src_key_padding_mask for Better Transformer #88488
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/88488
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures
As of commit 4366984: This comment was automatically generated by Dr. CI and updates every 15 minutes.
@sgrigory has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
3 similar comments
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 2 additional jobs have failed, first few of them are: trunk, trunk / cuda11.6-py3.10-gcc7-sm86 / test (default, 2, 4, linux.g5.4xlarge.nvidia.gpu). Details for Dev Infra team: raised by workflow job.
@malfet The merge above failed, but the errors seem to be unrelated to the PR:
Could you have a look and say whether those are indeed known infra failures?
@pytorchbot rebase
@pytorchbot successfully started a rebase job. Check the current status here.
Successfully rebased f327c24 to 4366984.
@sgrigory has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Support src_mask and src_key_padding_mask for Better Transformer (pytorch#88488)
Pull Request resolved: pytorch#88488
Approved by: https://github.com/mikekgfb
Fixes T135842750 (follow-up for #87377)

## Description

At present, having both `src_key_padding_mask` and `src_mask` at the same time is not supported on the fastpath in Transformer and Multi-Head Attention.

This PR enables using both masks on the fastpath on CPU and GPU: if both masks are passed, we merge them into a 4D mask in Python (see the sketch after the list below) and change the mask type to 2 before passing it downstream. Downstream processing in native code is not changed, as it already supports a 4D mask. Indeed, it is done depending on the device:
- on CUDA, by `SoftMax.cu::masked_softmax_cuda`. When the mask type is 2, it calls either `dispatch_softmax_forward` -> `softmax_warp_forward` or `at::softmax` (depending on the input size). In both cases a 4D mask is supported.
- on CPU, by `SoftMax.cpp::masked_softmax_cpp`. It calls `hosted_softmax`, which supports a 4D mask.
## Tests

- Extended `test_mask_check_fastpath` to check that the fast path is indeed taken in Transformer when two masks are passed
- Added `test_multihead_self_attn_two_masks_fast_path_mock` to check that the fast path is taken in MHA when two masks are passed
- Added `test_multihead_self_attn_two_masks_fast_path` to check that the fast and slow paths give the same result when two masks are passed in MHA (a sketch of such a check follows this list)
- `test_masked_softmax_mask_types` now covers mask type 2
- `test_transformerencoderlayer_fast_path` (CPU smoke test) is expanded to the case where both masks are provided simultaneously
- `test_masked_softmax_devices_parity` checks that mask type 2 is accepted by the CPU and CUDA paths
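As a rough illustration of the fast/slow parity check mentioned above, the snippet below compares `nn.MultiheadAttention` outputs with both masks in training mode (regular path) and in eval mode with grad disabled (where the fastpath can be taken). It is a sketch, not the actual test code: the exact fastpath gating conditions and the tolerance used here are assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
embed_dim, num_heads, N, L = 16, 4, 2, 5
mha = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
x = torch.randn(N, L, embed_dim)

# Causal attention mask plus padding on the last position of the first sequence
attn_mask = torch.triu(torch.ones(L, L), diagonal=1).bool()
key_padding_mask = torch.zeros(N, L, dtype=torch.bool)
key_padding_mask[0, -1] = True

with torch.no_grad():
    # Reference (slow) path: training mode keeps the module off the fastpath
    mha.train()
    ref, _ = mha(x, x, x, attn_mask=attn_mask,
                 key_padding_mask=key_padding_mask, need_weights=False)

    # Fast path candidate: eval mode with grad disabled (assumed fastpath conditions)
    mha.eval()
    fast, _ = mha(x, x, x, attn_mask=attn_mask,
                  key_padding_mask=key_padding_mask, need_weights=False)

print(torch.allclose(ref, fast, atol=1e-5))  # expected: True
```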