MHA: Fix regression and apply bias flag to both in/out proj #52537
Conversation
💊 CI failures summary and remediations

As of commit 75dfb0f (more details on the Dr. CI page):

💚 💚 Looks good so far! There are no failures yet. 💚 💚

This comment was automatically generated by Dr. CI.
I don't think we need to worry about removing `_LinearWithBias`.
@jbschlosser has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Codecov Report

```
@@           Coverage Diff           @@
##           master   #52537   +/-   ##
=======================================
  Coverage   80.64%   80.64%
=======================================
  Files        1969     1969
  Lines      215943   215944       +1
=======================================
+ Hits       174137   174138       +1
  Misses      41806    41806
=======================================
```
```
@@ -65,7 +65,7 @@ def __init__(self, embed_dim: int, num_heads: int,
# TODO: The use of the `_LinearWithBias` increases the quantization noise
```
is this comment still relevant?
I went back and forth on this, but I left it in for the case where an older model is loaded and then quantized. Is this valid? The comment should probably at least be updated to reflect that `out_proj` could or could not be a `_LinearWithBias`.
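For context, a minimal sketch of the situation described above (illustrative only; `_LinearWithBias` is an internal class that this PR deliberately leaves in place, so the import path is valid as of this change but may not survive later releases): a freshly constructed module's `out_proj` is a plain `nn.Linear`, while a fully pickled model saved by an older release can still deserialize with a `_LinearWithBias` instance, so quantization code may encounter either type.

```python
# Sketch only: shows why quantization code may meet either class. Assumes the
# internal torch.nn.modules.linear._LinearWithBias still exists (true as of
# this PR, which intentionally does not remove it).
import torch.nn as nn
from torch.nn.modules.linear import _LinearWithBias

mha = nn.MultiheadAttention(embed_dim=8, num_heads=2)
print(type(mha.out_proj).__name__)                # 'Linear' after this PR
print(isinstance(mha.out_proj, _LinearWithBias))  # False for newly built modules;
                                                  # True for older pickled models
```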
LGTM. @vkuzo, mind taking a look at the quantization code?
@jbschlosser merged this pull request in a39b1c4.
cc @z-a-f as an FYI
…52537)

Summary:
Fixes pytorch#52257

## Background

Reverts MHA behavior for the `bias` flag to that of v1.5: the flag enables or disables both the in and out projection biases. Updates type annotations for both in and out projection biases from `Tensor` to `Optional[Tensor]` for `torch.jit.script` usage.

Note: With this change, `_LinearWithBias` defined in `torch/nn/modules/linear.py` is no longer utilized. Completely removing it would require updates to quantization logic in the following files:

```
test/quantization/test_quantized_module.py
torch/nn/quantizable/modules/activation.py
torch/nn/quantized/dynamic/modules/linear.py
torch/nn/quantized/modules/linear.py
torch/quantization/quantization_mappings.py
```

This PR takes a conservative initial approach and leaves these files unchanged.

**Is it safe to fully remove `_LinearWithBias`?**

Pull Request resolved: pytorch#52537

Test Plan:
```
python test/test_nn.py TestNN.test_multihead_attn_no_bias
```

## BC-Breaking Note

In v1.6, the behavior of `MultiheadAttention`'s `bias` flag was incorrectly changed to affect only the in projection layer. That is, setting `bias=False` would fail to disable the bias for the out projection layer. This regression has been fixed, and the `bias` flag now correctly applies to both the in and out projection layers.

Reviewed By: bdhirsh

Differential Revision: D26583639

Pulled By: jbschlosser

fbshipit-source-id: b805f3a052628efb28b89377a41e06f71747ac5b
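To make the annotation change concrete, here is a minimal sketch (the function name and shapes are illustrative, not taken from the PR): under `torch.jit.script`, an argument annotated as `Tensor` cannot be passed `None`, so the bias-free path requires `Optional[Tensor]`.

```python
from typing import Optional

import torch
from torch import Tensor

@torch.jit.script
def proj(x: Tensor, weight: Tensor, bias: Optional[Tensor]) -> Tensor:
    # With `bias: Tensor`, passing None would be rejected at the TorchScript
    # call boundary; Optional[Tensor] permits the disabled-bias path.
    return torch.nn.functional.linear(x, weight, bias)

out = proj(torch.randn(2, 8), torch.randn(8, 8), None)  # bias disabled
```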
Fixes a post-1.8 regression in nn.MultiheadAttention + quantization scriptability introduced in #52537. Passes the new test introduced in that PR, and fixes the repro found by @ngimel [here](https://gist.github.com/bhosmer/ef517d0774f2f10336b8140116fd6b62). Per comments in #52537 there's definitely a carnal dependency between quantization and the `_LinearWithBias` class by name that I'm reinstating here, but there may be cleaner ways to solve this - I don't really know what I'm doing 😁 . @jbschlosser @z-a-f LMK if you have ideas, happy to change this as desired. It'd be nice to get a fix into 1.9. _[Update: now using a better name instead of `_LinearWithBias`, but this remains a short-term fix to re-suppress a quantization API usage error that should properly be raised upstream. See issue #58969]_ Differential Revision: [D28593830](https://our.internmc.facebook.com/intern/diff/D28593830) [ghstack-poisoned]
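For reference, a hedged sketch of the failure mode described above (this is the assumed shape of the repro; the exact code is in the linked gist): dynamically quantizing a model containing `nn.MultiheadAttention` converts `out_proj` when it is a plain `nn.Linear`, and the resulting module then fails to script until a dedicated out-projection class suppresses that conversion.

```python
# Assumed repro shape, not the gist's exact code: dynamically quantize an MHA,
# then script it. Post-#52537 (plain nn.Linear out_proj) the scripting step
# failed; the follow-up's NonDynamicallyQuantizableLinear keeps out_proj out of
# the {nn.Linear} mapping so scripting works again.
import torch
import torch.nn as nn

mha = nn.MultiheadAttention(embed_dim=16, num_heads=4)
quantized = torch.quantization.quantize_dynamic(mha, {nn.Linear}, dtype=torch.qint8)
scripted = torch.jit.script(quantized)
```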
Fixes #52257

Background

Reverts MHA behavior for the `bias` flag to that of v1.5: the flag enables or disables both the in and out projection biases. Updates type annotations for both in and out projection biases from `Tensor` to `Optional[Tensor]` for `torch.jit.script` usage.

Note: With this change, `_LinearWithBias` defined in `torch/nn/modules/linear.py` is no longer utilized. Completely removing it would require updates to quantization logic in the following files:

```
test/quantization/test_quantized_module.py
torch/nn/quantizable/modules/activation.py
torch/nn/quantized/dynamic/modules/linear.py
torch/nn/quantized/modules/linear.py
torch/quantization/quantization_mappings.py
```

This PR takes a conservative initial approach and leaves these files unchanged.

**Is it safe to fully remove `_LinearWithBias`?**

Test Plan

```
python test/test_nn.py TestNN.test_multihead_attn_no_bias
```

BC-Breaking Note

In v1.6, the behavior of `MultiheadAttention`'s `bias` flag was incorrectly changed to affect only the in projection layer. That is, setting `bias=False` would fail to disable the bias for the out projection layer. This regression has been fixed, and the `bias` flag now correctly applies to both the in and out projection layers.
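As a minimal illustration of the restored behavior (a sketch, not the PR's actual test), both projection biases should now be disabled together:

```python
import torch.nn as nn

# With the fix, bias=False disables both the in and out projection biases.
mha = nn.MultiheadAttention(embed_dim=8, num_heads=2, bias=False)
assert mha.in_proj_bias is None
assert mha.out_proj.bias is None
```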