fix nn.MHA + quantized scriptability #58727
Conversation
[ghstack-poisoned]
💊 CI failures summary and remediations, as of commit c4e8358 (more details on the Dr. CI page): 💚 Looks good so far! There are no failures yet. 💚 (This comment was automatically generated by Dr. CI.)
ghstack-source-id: 570d2647e3761ca48b415e3f4487b5e4453bbb57 Pull Request resolved: #58727
@bhosmer has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Thanks for tracking this down!
Strategically-placed "type: ignore"s work for passing mypy checks. Should also add a test doing quantization -> script (as done in Natalia's repro script) to catch a regression like this in the future.
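A minimal sketch of what such a test could look like (the test name and sizes are illustrative, not from the PR):

```python
import torch
import torch.nn as nn

def test_quantized_mha_scriptability():
    # nn.MultiheadAttention containing the out_proj under discussion.
    mha = nn.MultiheadAttention(embed_dim=16, num_heads=2).eval()

    # Dynamic quantization followed by scripting: the sequence that
    # regressed after #52537.
    quantized = torch.quantization.quantize_dynamic(
        mha, {nn.Linear}, dtype=torch.qint8)
    scripted = torch.jit.script(quantized)

    # Smoke-check that the scripted module still runs.
    x = torch.randn(3, 1, 16)  # (seq_len, batch, embed_dim)
    out, _ = scripted(x, x, x)
    assert out.shape == (3, 1, 16)
```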
torch/nn/modules/linear.py (Outdated)

```diff
@@ -109,8 +109,10 @@ def extra_repr(self) -> str:
 class _LinearWithBias(Linear):
```
The comment above isn't quite right anymore, w.r.t. bias never being None.
Although the fix with some added "type: ignore"s seems to make the problem go away, I really want to understand this more. Running @ngimel's repro gives me a scripting error: a type mismatch pointing at MHA's `out_proj`. After quantization, `out_proj` has been swapped out for a dynamically quantized linear module. It looks to me like swapping out `_LinearWithBias` for plain `Linear` in #52537 made `out_proj` match the default dynamic quantization mappings, so it now gets replaced during quantization and no longer satisfies the type TorchScript expects, vs. when MHA has a `_LinearWithBias` `out_proj`, which those mappings skip.
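For reference, a condensed sketch of the failing sequence (my reconstruction of the repro's shape, not @ngimel's actual script):

```python
import torch
import torch.nn as nn

class Attn(nn.Module):
    def __init__(self):
        super().__init__()
        self.mha = nn.MultiheadAttention(embed_dim=16, num_heads=2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.mha(x, x, x)
        return out

model = Attn().eval()
# With out_proj being a plain nn.Linear (post-#52537), dynamic
# quantization swaps it for a dynamically quantized linear...
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8)
# ...and scripting then fails with a type mismatch on out_proj.
scripted = torch.jit.script(quantized)
```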
@z-a-f Can you or someone else familiar with quantization explain how this is intended to work / what is going wrong here?
@jbschlosser thanks for the detailed analysis, both your conclusions make sense to me (the source of the error, and the fact that this change just suppresses it by preventing quantization). In the interests of expediting a possible "real" 1.9 fix, also ccing @vkuzo @raghuramank100 who participated in @z-a-f's original MHA quantization PR #49866. What do you folks think, given Joel's analysis above: is a real fix quick enough to cherry-pick into 1.9, or should we just go with suppressing the error?
@bhosmer @z-a-f @vkuzo @raghuramank100 Note that the deadline to get this in for 1.9.0 is EOD this Thursday (since Friday is a holiday). Seems unlikely we'll be able to get a real fix done in time, so unless we hear otherwise very soon I vote to move forward with a (well-documented) error suppression.
> @z-a-f Can you or someone else familiar with quantization explain how this is intended to work / what is going wrong here?

The MHA is not supposed to be quantized. That's why we introduced quantizable modules, which are supposed to be used in the case of MHA.
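For reference, a hedged sketch of that intended path, swapping in the quantizable variant up front (the module path is `torch.nn.quantizable` in the 1.9-era codebase; the dynamic-quantization call is my illustration, not a workflow prescribed in this thread):

```python
import torch
import torch.nn as nn

embed_dim, num_heads = 16, 2

# The quantizable variant mirrors nn.MultiheadAttention's interface but
# decomposes attention into submodules that quantization workflows can
# handle, instead of the raw projection parameters nn.MultiheadAttention uses.
mha = torch.nn.quantizable.MultiheadAttention(embed_dim, num_heads).eval()

# Example: dynamically quantize its internal Linear projections.
quantized = torch.quantization.quantize_dynamic(
    mha, {nn.Linear}, dtype=torch.qint8)
```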
```python
super().__init__(in_features, out_features, bias=bias,
                 device=device, dtype=dtype)
```
I think adding `bias` as an argument defies the whole name of the module.
@z-a-f not sure what you mean by this, can you elaborate?
I think he is saying it's contradictory that the module is called `_LinearWithBias` and yet it may or may not have a bias, based on the value of the `bias` flag.
ah got it, thanks 😁
...though the module does in fact need to pass the full set of Linear ctor params through for the fix to work, I think (given the call site in MultiheadAttention.__init__).
You're correct about that, since we want to maintain the regression fix by having `bias` apply to both in/out projection layers. To elaborate on this:
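A sketch of the resulting pass-through subclass (my reconstruction from the diff hunk above and the rename mentioned later in the thread; the `device`/`dtype` defaults are assumptions):

```python
from torch.nn import Linear

class NonDynamicallyQuantizableLinear(Linear):
    # Behaviorally identical to Linear; it exists only as a distinct
    # type so that quantize_dynamic's default Linear mapping skips
    # MHA's out_proj, while still passing the full ctor through.
    def __init__(self, in_features: int, out_features: int,
                 bias: bool = True, device=None, dtype=None) -> None:
        super().__init__(in_features, out_features, bias=bias,
                         device=device, dtype=dtype)
```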
@jbschlosser if I understand @z-a-f's description correctly, it sounds like the original bug was due to an improper use of the quantization API (assuming @ngimel's repro uses the same pattern). In which case, this PR is basically reinstating the dodge that prevents the improper use from breaking scripting (due to the name-based dependency between quantization and the out-projection class). But if the provoking usage is improper, then arguably we should leave your cleanup work in place and abandon this PR. WDYT?
If the provoking usage is improper, then there should be a documented suggested workflow for quantizing + scripting MHA models, and the error message should be better. Given it's really easy to stumble upon this usage (just quantize and script your model, which is what many people do), and FAIM apparently did stumble upon this usage, and I couldn't find documentation on how to do it properly, I'd rather go with this fix.
Agreed, it's easy enough and there's no real way for the error to be more specific; it's a vanilla type mismatch on a property. @z-a-f are there docs that would warn people away from doing this?
Should we ideally get an error if quantization is attempted on a module with a (non-quantizable) MHA layer? Adding this check would be BC-breaking but it should probably be in place longer term if we want to dissuade users from trying to quantize MHA directly.
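One hedged sketch of what that check could look like (purely hypothetical; `check_no_plain_mha` is not an existing PyTorch API):

```python
import torch.nn as nn

def check_no_plain_mha(model: nn.Module) -> None:
    # Hypothetical pre-quantization guard: reject models that still
    # contain a vanilla nn.MultiheadAttention and point users at the
    # quantizable variant instead.
    for name, module in model.named_modules():
        if type(module) is nn.MultiheadAttention:
            raise RuntimeError(
                f"{name or 'model'} is nn.MultiheadAttention, which is not "
                "directly quantizable; use "
                "torch.nn.quantizable.MultiheadAttention instead.")
```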
Fixes a post-1.8 regression in nn.MultiheadAttention + quantization scriptability introduced in #52537. Passes the new test introduced in that PR, and fixes the repro found by @ngimel [here](https://gist.github.com/bhosmer/ef517d0774f2f10336b8140116fd6b62). Per comments in #52537 there's definitely a carnal dependency between quantization and the `_LinearWithBias` class by name that I'm reinstating here, but there may be cleaner ways to solve this - I don't really know what I'm doing 😁 . @jbschlosser @z-a-f LMK if you have ideas, happy to change this as desired. It'd be nice to get a fix into 1.9. Differential Revision: [D28593830](https://our.internmc.facebook.com/intern/diff/D28593830) [ghstack-poisoned]
ghstack-source-id: 42e2e78a8de5106dc1287221acc4ac231e22c44d Pull Request resolved: #58727
@bhosmer has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@jbschlosser @ngimel FYI updated with new name and a comment block pointing to issue #58969 which points back here 😁
Fixes a post-1.8 regression in nn.MultiheadAttention + quantization scriptability introduced in #52537. Passes the new test introduced in that PR, and fixes the repro found by @ngimel [here](https://gist.github.com/bhosmer/ef517d0774f2f10336b8140116fd6b62). Per comments in #52537 there's definitely a carnal dependency between quantization and the `_LinearWithBias` class by name that I'm reinstating here, but there may be cleaner ways to solve this - I don't really know what I'm doing 😁 . @jbschlosser @z-a-f LMK if you have ideas, happy to change this as desired. It'd be nice to get a fix into 1.9. _[Update: now using a better name instead of `_LinearWithBias`, but this remains a short-term fix to re-suppress a quantization API usage error that should properly be raised upstream. See issue #58969]_ Differential Revision: [D28593830](https://our.internmc.facebook.com/intern/diff/D28593830) [ghstack-poisoned]
@bhosmer has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
ghstack-source-id: b967e6c2fe295a31fec28b0a5639085d8d7a0dde Pull Request resolved: #58727
@bhosmer has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Looks good - thanks for making the changes!
I know it's "technically incorrect", but perhaps a test that does dynamic quantization -> scripting could be added to ensure this regression doesn't happen again until we've cleaned up the story around quantizing MHA?
Summary: Pull Request resolved: pytorch#58727 Test Plan: Imported from OSS Reviewed By: ngimel Differential Revision: D28593830 Pulled By: bhosmer fbshipit-source-id: 37dee9efededaea9985a2bf040df1ba4b46f6580
Stack from ghstack:

Fixes a post-1.8 regression in nn.MultiheadAttention + quantization scriptability introduced in #52537. Passes the new test introduced in that PR, and fixes the repro found by @ngimel here.

Per comments in #52537 there's definitely a carnal dependency between quantization and the `_LinearWithBias` class by name that I'm reinstating here, but there may be cleaner ways to solve this - I don't really know what I'm doing 😁. @jbschlosser @z-a-f LMK if you have ideas, happy to change this as desired. It'd be nice to get a fix into 1.9.

[Update: now using a better name instead of `_LinearWithBias`, but this remains a short-term fix to re-suppress a quantization API usage error that should properly be raised upstream. See issue #58969]

Differential Revision: D28593830