Fix gradient shape error for DPMultiheadAttention (issue 650) #651

HuanyuZhang · 2024-05-16T17:31:37Z

Summary:
When batch_first = True, the activations and partial gradients for each linear layer in DPMultiheadAttention still have batch_size in the second dimension, which causes a wrong gradient shape in optimizer.step().

Details in: #650
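To make the shape problem concrete, here is a minimal pure-Python sketch (not Opacus code). It assumes that, for a linear layer, the per-sample weight gradient is the outer product of the backpropagated gradient and the activation, summed over every position except dimension 0, which is read as the batch dimension when batch_first = True. The helper names `per_sample_linear_grads` and `filled` are hypothetical and exist only for this illustration.

```python
def per_sample_linear_grads(activations, backprops):
    """Per-sample weight gradients of a linear layer, treating dim 0 as batch.

    activations: nested list shaped [d0][d1][in_features]
    backprops:   nested list shaped [d0][d1][out_features]
    Returns a list of d0 matrices, each shaped [out_features][in_features].
    """
    grads = []
    for acts_n, bps_n in zip(activations, backprops):
        out_f, in_f = len(bps_n[0]), len(acts_n[0])
        g = [[0.0] * in_f for _ in range(out_f)]
        # Sum the outer products over the remaining (d1) dimension.
        for a, b in zip(acts_n, bps_n):
            for i in range(out_f):
                for j in range(in_f):
                    g[i][j] += b[i] * a[j]
        grads.append(g)
    return grads


def filled(d0, d1, d2, value=1.0):
    """Build a [d0][d1][d2] nested list filled with a constant."""
    return [[[value] * d2 for _ in range(d1)] for _ in range(d0)]


batch_size, seq_len, in_features, out_features = 4, 7, 3, 2

# Batch-first layout: one gradient per sample, as the optimizer expects.
good = per_sample_linear_grads(
    filled(batch_size, seq_len, in_features),
    filled(batch_size, seq_len, out_features),
)

# Sequence-first layout read as if it were batch-first (the case described
# above): the leading dimension is seq_len, so the "per-sample" gradients
# are really per-position gradients.
bad = per_sample_linear_grads(
    filled(seq_len, batch_size, in_features),
    filled(seq_len, batch_size, out_features),
)

print(len(good), len(bad))
```

Under these assumptions, the first call yields batch_size gradients and the second yields seq_len gradients, which is the kind of leading-dimension mismatch that would surface when per-sample gradients from different layers are combined in optimizer.step().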

Differential Revision: D57446245

facebook-github-bot · 2024-05-16T17:31:46Z

This pull request was exported from Phabricator. Differential Revision: D57446245

…h#651)
Summary: When batch_first = True, the activation and partial gradient for each linear layer in DPMultiheadAttention still has batch_size in the second dimension, thus causing wrong gradient shape in optimizer.step().
Details in: pytorch#650
Differential Revision: D57446245

facebook-github-bot · 2024-05-16T19:16:44Z

This pull request was exported from Phabricator. Differential Revision: D57446245

facebook-github-bot · 2024-05-30T00:05:58Z

This pull request was exported from Phabricator. Differential Revision: D57446245

…h#651)
Summary: When batch_first = True, the activation and partial gradient for each linear layer in DPMultiheadAttention still has batch_size in the second dimension, thus causing wrong gradient shape in optimizer.step().
Details in: pytorch#650
Reviewed By: EnayatUllah
Differential Revision: D57446245

facebook-github-bot · 2024-05-31T18:54:26Z

This pull request was exported from Phabricator. Differential Revision: D57446245

facebook-github-bot · 2024-05-31T20:23:47Z

This pull request has been merged in 202c58a.

facebook-github-bot added the CLA Signed label May 16, 2024

facebook-github-bot added the fb-exported label May 16, 2024

HuanyuZhang force-pushed the export-D57446245 branch from cbc61b8 to 0d8ccfd May 16, 2024 19:16

HuanyuZhang force-pushed the export-D57446245 branch from 0d8ccfd to 98af8b9 May 30, 2024 00:05

HuanyuZhang force-pushed the export-D57446245 branch from 98af8b9 to d4bc8c7 May 31, 2024 18:54

facebook-github-bot closed this in 202c58a May 31, 2024

facebook-github-bot added the Merged label May 31, 2024

HuanyuZhang mentioned this pull request May 31, 2024

Error in DPOptimizer: Inconsistency between batch_first argument of PrivacyEngine and DPMultiheadAttention #650

Closed
