fix nn.MHA + quantized scriptability #58727

Closed
wants to merge 5 commits

Conversation

@bhosmer (Contributor, Author) commented May 21, 2021

Stack from ghstack:

Fixes a post-1.8 regression in nn.MultiheadAttention + quantization scriptability introduced in #52537. Passes the new test introduced in that PR, and fixes the repro found by @ngimel here.

Per comments in #52537 there's definitely a carnal dependency between quantization and the _LinearWithBias class by name that I'm reinstating here, but there may be cleaner ways to solve this - I don't really know what I'm doing 😁 .

@jbschlosser @z-a-f LMK if you have ideas, happy to change this as desired. It'd be nice to get a fix into 1.9.

[Update: now using a better name instead of _LinearWithBias, but this remains a short-term fix to re-suppress a quantization API usage error that should properly be raised upstream. See issue #58969]

Differential Revision: D28593830

@facebook-github-bot (Contributor) commented May 21, 2021

💊 CI failures summary and remediations

As of commit c4e8358 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚



bhosmer added a commit that referenced this pull request May 21, 2021
ghstack-source-id: 570d2647e3761ca48b415e3f4487b5e4453bbb57
Pull Request resolved: #58727
@bhosmer (Contributor, Author) commented May 21, 2021

@bhosmer has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@bhosmer requested a review from z-a-f May 21, 2021 03:40
@bhosmer changed the title from "fix nn.MHA scriptability" to "fix nn.MHA + quantized scriptability" May 21, 2021
@bhosmer removed the request for review from albanD May 21, 2021 03:43
@jbschlosser (Contributor) left a review comment:

Thanks for tracking this down!

Strategically placed "type: ignore"s work for passing mypy checks. We should also add a test that runs quantization -> scripting (as in Natalia's repro script) to catch a regression like this in the future.
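A minimal sketch of the kind of quantize-then-script regression test being suggested here; the function name, sizes, and the plain shape assertion are illustrative assumptions, not the test that was actually added to the suite:

```python
import torch
import torch.nn as nn

def test_dynamic_quant_then_script_mha():
    # nn.TransformerEncoderLayer contains an nn.MultiheadAttention submodule.
    model = nn.TransformerEncoderLayer(d_model=16, nhead=2).eval()

    # Dynamically quantize the Linear layers, then script the result.
    quantized = torch.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8
    )
    scripted = torch.jit.script(quantized)  # the regression made this raise

    # Smoke test: the scripted module still runs and preserves shape.
    src = torch.randn(4, 1, 16)  # (seq_len, batch, d_model)
    out = scripted(src)
    assert out.shape == src.shape
```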

@@ -109,8 +109,10 @@ def extra_repr(self) -> str:
class _LinearWithBias(Linear):

The comment above isn't quite right anymore w.r.t. bias never being None.

@jbschlosser (Contributor) commented May 21, 2021

Although the fix with some added "type: ignore"s seems to make the problem go away, I really want to understand this more. Running @ngimel's repro gives me:

RuntimeError:
method cannot be used as a value:
  File "/Users/jbschlosser/misc/mish/torch/nn/modules/activation.py", line 1024
                self.in_proj_weight, self.in_proj_bias,
                self.bias_k, self.bias_v, self.add_zero_attn,
                self.dropout, self.out_proj.weight, self.out_proj.bias,
                              ~~~~~~~~~~~~~~~~~~~~ <--- HERE
                training=self.training,
                key_padding_mask=key_padding_mask, need_weights=need_weights,

Note that it's pointing to self.out_proj.weight, not self.out_proj.bias.

After quantization, self.out_proj is a torch.nn.quantized.dynamic.Linear module, which derives from torch.nn.quantized.Linear and has a method named weight(), explaining the error message.

It looks to me like swapping out Linear for _LinearWithBias just breaks the quantization layer swapping for out_proj, incidentally avoiding the above error. Compare post-quantization state when MHA has a _LinearWithBias out_proj:

TransformerEncoderLayer(
  (self_attn): MultiheadAttention(
    (out_proj): _LinearWithBias(in_features=1024, out_features=1024, bias=True)
  )
  (linear1): DynamicQuantizedLinear(in_features=1024, out_features=4096, dtype=torch.qint8, qscheme=torch.per_tensor_affine)
  ...
)

vs. when MHA has a Linear out_proj (note that it successfully swaps in DynamicQuantizedLinear, and this causes the error):

TransformerEncoderLayer(
  (self_attn): MultiheadAttention(
    (out_proj): DynamicQuantizedLinear(in_features=1024, out_features=1024, dtype=torch.qint8, qscheme=torch.per_tensor_affine)
  )
  (linear1): DynamicQuantizedLinear(in_features=1024, out_features=4096, dtype=torch.qint8, qscheme=torch.per_tensor_affine)
  ...
)

@z-a-f Can you or someone else familiar with quantization explain how this is intended to work / what is going wrong here?
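For reference, a sketch approximating the quantize-then-script flow being discussed (module sizes chosen to match the dumps above; this is not @ngimel's gist verbatim):

```python
import torch
import torch.nn as nn

model = nn.TransformerEncoderLayer(d_model=1024, nhead=16, dim_feedforward=4096).eval()

# Dynamic quantization swaps every nn.Linear -- including MHA's out_proj when
# it is a plain Linear -- for a DynamicQuantizedLinear.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
print(quantized)  # shows which submodules were swapped, as in the dumps above

# Pre-fix, scripting then fails with "method cannot be used as a value":
# DynamicQuantizedLinear exposes weight() as a method, while
# MultiheadAttention.forward reads self.out_proj.weight as an attribute.
scripted = torch.jit.script(quantized)
```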

@bhosmer (Contributor, Author) commented May 21, 2021

@jbschlosser thanks for the detailed analysis; both your conclusions make sense to me (the source of the error, and the fact that this change just suppresses it by preventing quantization).

In the interests of expediting a possible "real" 1.9 fix, also ccing @vkuzo @raghuramank100 who participated in @z-a-f's original MHA quantization PR #49866. What do you folks think, given Joel's analysis above - is a real fix quick enough to cherry-pick into 1.9, or should we just go with suppressing the error?

@ngimel added this to the 1.9.0 milestone May 21, 2021
@jbschlosser (Contributor) commented:

@bhosmer @z-a-f @vkuzo @raghuramank100 Note that the deadline to get this in for 1.9.0 is EOD this Thursday (since Friday is a holiday). Seems unlikely we'll be able to get a real fix done in time, so unless we hear otherwise very soon I vote to move forward with a (well-documented) error suppression.

@z-a-f (Contributor) left a review comment:

> @z-a-f Can you or someone else familiar with quantization explain how this is intended to work / what is going wrong here?

The MHA is not supposed to be quantized directly. That's why we introduced the quantizable modules, which should be used instead for MHA.

Comment on lines +114 to +115
super().__init__(in_features, out_features, bias=bias,
                 device=device, dtype=dtype)
z-a-f (Contributor):

I think adding bias as an argument contradicts the module's name.

bhosmer (Author):

@z-a-f not sure what you mean by this, can you elaborate?

Contributor:

I think he is saying it's contradictory that the module is called _LinearWithBias and yet it may or may not have a bias, based on the value of the bias flag.

bhosmer (Author):

ah got it, thanks 😁

bhosmer (Author):

...though I think the module does in fact need to pass the full set of Linear ctor params through for the fix to work (given the call site in MultiheadAttention.__init__).

Contributor:

You're correct about that, since we want to maintain the regression fix by having bias apply to both in / out projection layers.

@z-a-f (Contributor) commented May 25, 2021

> @z-a-f Can you or someone else familiar with quantization explain how this is intended to work / what is going wrong here?
>
> The MHA is not supposed to be quantized directly. That's why we introduced the quantizable modules, which should be used instead for MHA.

To elaborate on this:

  1. The MHA cannot be quantized directly, so we introduced a "quantizable" MHA, which can be used by providing the custom module args during quantization (a sketch of that workflow follows below).
  2. If fixing nn.MHA breaks the quantization of that module, it means the custom module config was not provided, and the nn.Linear was replaced by its quantized version.
  3. The reasoning behind using the "quantizable" MHA is that we can use the parameters of the internal layers without breaking anything -- for example, reusing the weights.
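As a rough illustration of the workflow in point 1, here is a sketch of eager-mode static quantization with a custom module mapping that swaps nn.MultiheadAttention for the quantizable version; the exact config-dict keys and the qconfig choice are assumptions about the API of this era rather than something stated in this thread:

```python
import torch
import torch.nn as nn

model = nn.TransformerEncoderLayer(d_model=64, nhead=4).eval()
model.qconfig = torch.quantization.default_qconfig

# Ask prepare() to replace nn.MultiheadAttention with its "quantizable"
# counterpart instead of letting the default swapping touch its Linears.
prepare_custom_config = {
    "float_to_observed_custom_module_class": {
        nn.MultiheadAttention: torch.nn.quantizable.MultiheadAttention,
    }
}
prepared = torch.quantization.prepare(
    model, prepare_custom_config_dict=prepare_custom_config
)

# ... calibrate `prepared` on representative data here ...

# Depending on the release, a matching "observed_to_quantized_custom_module_class"
# entry may also be needed in a convert_custom_config_dict.
converted = torch.quantization.convert(prepared)
```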

@bhosmer (Contributor, Author) commented May 25, 2021

@jbschlosser if I understand @z-a-f's description correctly, it sounds like the original bug was due to an improper use of the quantization API (assuming @ngimel's repro uses the same pattern). In which case, this PR is basically reinstating the dodge that prevents the improper use from breaking scripting (due to the weight vs weight() typecheck failure).

But if the provoking usage is improper, then arguably we should leave your cleanup work in place and abandon this PR. WDYT?

@ngimel (Collaborator) commented May 25, 2021

If the provoking usage is improper, then there should be a documented suggested workflow for quantizing + scripting MHA models, and the error message should be better. Given it's really easy to stumble upon this usage (just quantize and script your model, which is what many people do), and FAIM apparently did stumble upon this usage, and I couldn't find documentation on how to do it properly, I'd rather go with this fix.

@bhosmer (Contributor, Author) commented May 25, 2021

> If the provoking usage is improper, then there should be a documented suggested workflow for quantizing + scripting MHA models, and the error message should be better. Given it's really easy to stumble upon this usage (just quantize and script your model, which is what many people do), and FAIM apparently did stumble upon this usage, and I couldn't find documentation on how to do it properly, I'd rather go with this fix.

Agreed - it's easy enough to stumble into, and there's no real way for the error to be more specific; it's a vanilla type mismatch on a property.

@z-a-f are there docs that would warn people away from doing this?

@jbschlosser (Contributor) commented:

Should we ideally get an error if quantization is attempted on a module with a (non-quantizable) MHA layer? Adding this check would be BC-breaking but it should probably be in place longer term if we want to dissuade users from trying to quantize MHA directly.

bhosmer added a commit that referenced this pull request May 26, 2021
ghstack-source-id: 42e2e78a8de5106dc1287221acc4ac231e22c44d
Pull Request resolved: #58727
@bhosmer (Contributor, Author) commented May 26, 2021

@bhosmer has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@bhosmer (Contributor, Author) commented May 26, 2021

@jbschlosser @ngimel FYI updated with new name and a comment block pointing to issue #58969 which points back here 😁

@bhosmer (Contributor, Author) commented May 26, 2021

@bhosmer has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

bhosmer added a commit that referenced this pull request May 26, 2021
ghstack-source-id: b967e6c2fe295a31fec28b0a5639085d8d7a0dde
Pull Request resolved: #58727
@bhosmer (Contributor, Author) commented May 26, 2021

@bhosmer has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@jbschlosser (Contributor) left a review comment:

Looks good - thanks for making the changes!

I know it's "technically incorrect", but perhaps a test that does dynamic quantization -> scripting could be added to ensure this regression doesn't happen again, at least until we've cleaned up the story around quantizing MHA?

@facebook-github-bot (Contributor) commented:

@bhosmer merged this pull request in 58d1b36.

driazati pushed a commit that referenced this pull request May 27, 2021
Summary: Pull Request resolved: #58727

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D28593830

Pulled By: bhosmer

fbshipit-source-id: 37dee9efededaea9985a2bf040df1ba4b46f6580
@facebook-github-bot deleted the gh/bhosmer/57/head branch May 30, 2021 14:17
deniskokarev pushed a commit to deniskokarev/pytorch that referenced this pull request Jun 9, 2021
Summary: Pull Request resolved: pytorch#58727

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D28593830

Pulled By: bhosmer

fbshipit-source-id: 37dee9efededaea9985a2bf040df1ba4b46f6580