fix nn.MHA + quantized scriptability #58727

Closed
wants to merge 5 commits

Conversation

@bhosmer (Contributor, Author) commented May 21, 2021

Stack from ghstack:

Fixes a post-1.8 regression in nn.MultiheadAttention + quantization scriptability introduced in #52537. Passes the new test introduced in that PR, and fixes the repro found by @ngimel here.

Per comments in #52537 there's definitely a carnal dependency between quantization and the _LinearWithBias class by name that I'm reinstating here, but there may be cleaner ways to solve this - I don't really know what I'm doing 😁 .

@jbschlosser @z-a-f LMK if you have ideas, happy to change this as desired. It'd be nice to get a fix into 1.9.

[Update: now using a better name instead of _LinearWithBias, but this remains a short-term fix to re-suppress a quantization API usage error that should properly be raised upstream. See issue #58969]

Differential Revision: D28593830

@facebook-github-bot (Contributor) commented May 21, 2021

💊 CI failures summary and remediations

As of commit c4e8358 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚



bhosmer added a commit that referenced this pull request May 21, 2021
ghstack-source-id: 570d2647e3761ca48b415e3f4487b5e4453bbb57
Pull Request resolved: #58727
@bhosmer (Contributor, Author) commented May 21, 2021

@bhosmer has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@bhosmer requested a review from z-a-f May 21, 2021 03:40
@bhosmer changed the title from "fix nn.MHA scriptability" to "fix nn.MHA + quantized scriptability" May 21, 2021
@bhosmer removed the request for review from albanD May 21, 2021 03:43
@jbschlosser (Contributor) left a review comment:

Thanks for tracking this down!

Strategically placed "type: ignore"s work for passing mypy checks. We should also add a test that runs quantization -> scripting (as in Natalia's repro script) to catch a regression like this in the future.
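A minimal sketch of the kind of quantize-then-script regression test being suggested here; the function name, sizes, and the plain shape assertion are illustrative assumptions, not the test that was actually added to the suite:

```python
import torch
import torch.nn as nn

def test_dynamic_quant_then_script_mha():
    # nn.TransformerEncoderLayer contains an nn.MultiheadAttention submodule.
    model = nn.TransformerEncoderLayer(d_model=16, nhead=2).eval()

    # Dynamically quantize the Linear layers, then script the result.
    quantized = torch.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8
    )
    scripted = torch.jit.script(quantized)  # the regression made this raise

    # Smoke test: the scripted module still runs and preserves shape.
    src = torch.randn(4, 1, 16)  # (seq_len, batch, d_model)
    out = scripted(src)
    assert out.shape == src.shape
```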

@@ -109,8 +109,10 @@ def extra_repr(self) -> str:
class _LinearWithBias(Linear):

The comment above isn't quite right anymore w.r.t. bias never being None.

@jbschlosser (Contributor) commented May 21, 2021

Although the fix with some added "type: ignore"s seems to make the problem go away, I really want to understand this more. Running @ngimel's repro gives me:

RuntimeError:
method cannot be used as a value:
  File "/Users/jbschlosser/misc/mish/torch/nn/modules/activation.py", line 1024
                self.in_proj_weight, self.in_proj_bias,
                self.bias_k, self.bias_v, self.add_zero_attn,
                self.dropout, self.out_proj.weight, self.out_proj.bias,
                              ~~~~~~~~~~~~~~~~~~~~ <--- HERE
                training=self.training,
                key_padding_mask=key_padding_mask, need_weights=need_weights,

Note that it's pointing to self.out_proj.weight, not self.out_proj.bias.

After quantization, self.out_proj is a torch.nn.quantized.dynamic.Linear module, which derives from torch.nn.quantized.Linear and has a method named weight(), explaining the error message.

It looks to me like swapping out Linear for _LinearWithBias just breaks the quantization layer swapping for out_proj, incidentally avoiding the above error. Compare post-quantization state when MHA has a _LinearWithBias out_proj:

TransformerEncoderLayer(
  (self_attn): MultiheadAttention(
    (out_proj): _LinearWithBias(in_features=1024, out_features=1024, bias=True)
  )
  (linear1): DynamicQuantizedLinear(in_features=1024, out_features=4096, dtype=torch.qint8, qscheme=torch.per_tensor_affine)
  ...
)

vs. when MHA has a Linear out_proj (note that it successfully swaps in DynamicQuantizedLinear, and this causes the error):

TransformerEncoderLayer(
  (self_attn): MultiheadAttention(
    (out_proj): DynamicQuantizedLinear(in_features=1024, out_features=1024, dtype=torch.qint8, qscheme=torch.per_tensor_affine)
  )
  (linear1): DynamicQuantizedLinear(in_features=1024, out_features=4096, dtype=torch.qint8, qscheme=torch.per_tensor_affine)
  ...
)

@z-a-f Can you or someone else familiar with quantization explain how this is intended to work / what is going wrong here?
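For reference, a sketch approximating the quantize-then-script flow being discussed (module sizes chosen to match the dumps above; this is not @ngimel's gist verbatim):

```python
import torch
import torch.nn as nn

model = nn.TransformerEncoderLayer(d_model=1024, nhead=16, dim_feedforward=4096).eval()

# Dynamic quantization swaps every nn.Linear -- including MHA's out_proj when
# it is a plain Linear -- for a DynamicQuantizedLinear.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
print(quantized)  # shows which submodules were swapped, as in the dumps above

# Pre-fix, scripting then fails with "method cannot be used as a value":
# DynamicQuantizedLinear exposes weight() as a method, while
# MultiheadAttention.forward reads self.out_proj.weight as an attribute.
scripted = torch.jit.script(quantized)
```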

@bhosmer (Contributor, Author) commented May 21, 2021

@jbschlosser thanks for the detailed analysis; both your conclusions make sense to me (the source of the error, and the fact that this change just suppresses it by preventing quantization).

In the interests of expediting a possible "real" 1.9 fix, also ccing @vkuzo @raghuramank100 who participated in @z-a-f's original MHA quantization PR #49866. What do you folks think, given Joel's analysis above - is a real fix quick enough to cherry-pick into 1.9, or should we just go with suppressing the error?

@ngimel added this to the 1.9.0 milestone May 21, 2021
@jbschlosser (Contributor) commented:

@bhosmer @z-a-f @vkuzo @raghuramank100 Note that the deadline to get this in for 1.9.0 is EOD this Thursday (since Friday is a holiday). Seems unlikely we'll be able to get a real fix done in time, so unless we hear otherwise very soon I vote to move forward with a (well-documented) error suppression.

@z-a-f (Contributor) left a review comment:

> @z-a-f Can you or someone else familiar with quantization explain how this is intended to work / what is going wrong here?

The MHA is not supposed to be quantized directly. That's why we introduced the quantizable modules, which should be used instead for MHA.

Comment on lines +114 to +115
super().__init__(in_features, out_features, bias=bias,
                 device=device, dtype=dtype)
z-a-f (Contributor):

I think adding bias as an argument contradicts the module's name.

bhosmer (Author):

@z-a-f not sure what you mean by this, can you elaborate?

Contributor:

I think he is saying it's contradictory that the module is called _LinearWithBias and yet it may or may not have a bias, based on the value of the bias flag.

bhosmer (Author):

ah got it, thanks 😁

bhosmer (Author):

...though I think the module does in fact need to pass the full set of Linear ctor params through for the fix to work (given the call site in MultiheadAttention.__init__).

Contributor:

You're correct about that, since we want to maintain the regression fix by having bias apply to both in / out projection layers.

@z-a-f (Contributor) commented May 25, 2021

> @z-a-f Can you or someone else familiar with quantization explain how this is intended to work / what is going wrong here?
>
> The MHA is not supposed to be quantized directly. That's why we introduced the quantizable modules, which should be used instead for MHA.

To elaborate on this:

  1. The MHA cannot be quantized directly, so we introduced a "quantizable" MHA, which can be used by providing the custom module args during quantization (a sketch of that workflow follows below).
  2. If fixing nn.MHA breaks the quantization of that module, it means the custom module config was not provided, and the nn.Linear was replaced by its quantized version.
  3. The reasoning behind using the "quantizable" MHA is that we can use the parameters of the internal layers without breaking anything -- for example, reusing the weights.
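As a rough illustration of the workflow in point 1, here is a sketch of eager-mode static quantization with a custom module mapping that swaps nn.MultiheadAttention for the quantizable version; the exact config-dict keys and the qconfig choice are assumptions about the API of this era rather than something stated in this thread:

```python
import torch
import torch.nn as nn

model = nn.TransformerEncoderLayer(d_model=64, nhead=4).eval()
model.qconfig = torch.quantization.default_qconfig

# Ask prepare() to replace nn.MultiheadAttention with its "quantizable"
# counterpart instead of letting the default swapping touch its Linears.
prepare_custom_config = {
    "float_to_observed_custom_module_class": {
        nn.MultiheadAttention: torch.nn.quantizable.MultiheadAttention,
    }
}
prepared = torch.quantization.prepare(
    model, prepare_custom_config_dict=prepare_custom_config
)

# ... calibrate `prepared` on representative data here ...

# Depending on the release, a matching "observed_to_quantized_custom_module_class"
# entry may also be needed in a convert_custom_config_dict.
converted = torch.quantization.convert(prepared)
```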

@bhosmer (Contributor, Author) commented May 25, 2021

@jbschlosser if I understand @z-a-f's description correctly, it sounds like the original bug was due to an improper use of the quantization API (assuming @ngimel's repro uses the same pattern). In which case, this PR is basically reinstating the dodge that prevents the improper use from breaking scripting (due to the weight vs weight() typecheck failure).

But if the provoking usage is improper, then arguably we should leave your cleanup work in place and abandon this PR. WDYT?

@ngimel (Collaborator) commented May 25, 2021

If the provoking usage is improper, then there should be a documented suggested workflow for quantizing + scripting MHA models, and the error message should be better. Given it's really easy to stumble upon this usage (just quantize and script your model, which is what many people do), and FAIM apparently did stumble upon this usage, and I couldn't find documentation on how to do it properly, I'd rather go with this fix.

@bhosmer (Contributor, Author) commented May 25, 2021

> If the provoking usage is improper, then there should be a documented suggested workflow for quantizing + scripting MHA models, and the error message should be better. Given it's really easy to stumble upon this usage (just quantize and script your model, which is what many people do), and FAIM apparently did stumble upon this usage, and I couldn't find documentation on how to do it properly, I'd rather go with this fix.

Agreed - it's easy enough to stumble into, and there's no real way for the error to be more specific; it's a vanilla type mismatch on a property.

@z-a-f are there docs that would warn people away from doing this?

@jbschlosser (Contributor) commented:

Should we ideally get an error if quantization is attempted on a module with a (non-quantizable) MHA layer? Adding this check would be BC-breaking but it should probably be in place longer term if we want to dissuade users from trying to quantize MHA directly.

bhosmer added a commit that referenced this pull request May 26, 2021
ghstack-source-id: 42e2e78a8de5106dc1287221acc4ac231e22c44d
Pull Request resolved: #58727
@bhosmer (Contributor, Author) commented May 26, 2021

@bhosmer has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@bhosmer (Contributor, Author) commented May 26, 2021

@jbschlosser @ngimel FYI updated with new name and a comment block pointing to issue #58969 which points back here 😁

@bhosmer (Contributor, Author) commented May 26, 2021

@bhosmer has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

bhosmer added a commit that referenced this pull request May 26, 2021
ghstack-source-id: b967e6c2fe295a31fec28b0a5639085d8d7a0dde
Pull Request resolved: #58727
@bhosmer (Contributor, Author) commented May 26, 2021

@bhosmer has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@jbschlosser (Contributor) left a review comment:

Looks good - thanks for making the changes!

I know it's "technically incorrect", but perhaps a test that does dynamic quantization -> scripting could be added to ensure this regression doesn't happen again, at least until we've cleaned up the story around quantizing MHA?

@facebook-github-bot (Contributor) commented:

@bhosmer merged this pull request in 58d1b36.

driazati pushed a commit that referenced this pull request May 27, 2021
Summary: Pull Request resolved: #58727

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D28593830

Pulled By: bhosmer

fbshipit-source-id: 37dee9efededaea9985a2bf040df1ba4b46f6580
@facebook-github-bot deleted the gh/bhosmer/57/head branch May 30, 2021 14:17
deniskokarev pushed a commit to deniskokarev/pytorch that referenced this pull request Jun 9, 2021
Summary: Pull Request resolved: pytorch#58727

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D28593830

Pulled By: bhosmer

fbshipit-source-id: 37dee9efededaea9985a2bf040df1ba4b46f6580