
Improper use of quantization API for MHA should fail fast #58969

Open

bhosmer opened this issue May 26, 2021 · 5 comments
Assignees
Labels
oncall: quantization Quantization support in PyTorch triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@bhosmer
Contributor

bhosmer commented May 26, 2021

Currently, direct quantization of a torch.nn.MultiheadAttention module is not supported via torch.quantization.quantize_dynamic() - a conversion must be performed first (reference). However, an erroneous attempt to quantize directly does not produce an error: certain layers silently fail to quantize, yet the returned model is callable, so there's no overt sign that something has gone wrong. See #58727 for full details.
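
As a minimal sketch of the misuse (dimensions are arbitrary; exact behavior depends on the PyTorch version):

```python
import torch
import torch.nn as nn

# Quantize an nn.MultiheadAttention module directly, skipping the required
# conversion step. As described above, the call returns without error.
mha = nn.MultiheadAttention(embed_dim=64, num_heads=4)
qmha = torch.quantization.quantize_dynamic(mha, {nn.Linear}, dtype=torch.qint8)

# The returned module is callable, but some projection layers were silently
# left unquantized -- there is no overt sign that anything went wrong.
x = torch.randn(10, 1, 64)  # (seq_len, batch, embed_dim)
out, attn_weights = qmha(x, x, x)
```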

A recent cleanup of torch.nn introduced a downstream error when scripting such an improperly quantized module, by rationalizing module names in a way that allowed the problematic quantization to go forward. The error is a vanilla TorchScript type error due to a naming collision, and offers no clues that would lead one back to the original problem without nontrivial sleuthing. Also, the error was reported by users, meaning the quantization API is in fact being inadvertently misused in this way. The repro used to track down the issue is here, derived from a bug report.

In the short term we've reinstated the name difference that defeats quantization when the API is used improperly, restoring the original behavior. But a proper fix would be for quantize_dynamic (or something in the quantization API) to give the caller a signal that an unsupported model has been specified.
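
One possible shape for that signal, as a hypothetical sketch (the `_NEEDS_CONVERSION` registry and `check_quantizable` helper below are illustrative, not existing API):

```python
import warnings
import torch.nn as nn

# Hypothetical registry of module types that quantize_dynamic cannot handle
# directly and that require a manual conversion step first.
_NEEDS_CONVERSION = (nn.MultiheadAttention,)

def check_quantizable(model: nn.Module) -> None:
    """Warn (or raise) on modules that would otherwise be silently skipped."""
    for name, module in model.named_modules():
        if isinstance(module, _NEEDS_CONVERSION):
            warnings.warn(
                f"{type(module).__name__} at '{name}' is not supported by "
                "quantize_dynamic; convert it first or it will be silently "
                "left unquantized."
            )
```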

cc @jerryzh168 @jianyuh @raghuramank100 @jamesr66a @vkuzo

bhosmer added a commit that referenced this issue May 26, 2021
Fixes a post-1.8 regression in nn.MultiheadAttention + quantization scriptability introduced in #52537. Passes the new test introduced in that PR, and fixes the repro found by @ngimel [here](https://gist.github.com/bhosmer/ef517d0774f2f10336b8140116fd6b62). 

Per comments in #52537 there's definitely a carnal dependency between quantization and the `_LinearWithBias` class by name that I'm reinstating here, but there may be cleaner ways to solve this - I don't really know what I'm doing 😁 . 

@jbschlosser @z-a-f LMK if you have ideas, happy to change this as desired. It'd be nice to get a fix into 1.9.

_[Update: now using a better name instead of `_LinearWithBias`, but this remains a short-term fix to re-suppress a quantization API usage error that should properly be raised upstream. See issue #58969]_
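
For context on how a name/class difference can defeat the swap, a hedged sketch (illustrative only, not PyTorch's actual code): dynamic quantization replaces modules via an exact type lookup, so a distinctly named `nn.Linear` subclass falls through the mapping and is silently skipped.

```python
import torch.nn as nn

# Illustrative mapping: the real one lives in the quantization code and maps
# float module types to their dynamically quantized replacements.
MAPPING = {nn.Linear: "dynamic quantized Linear"}

class _DistinctLinear(nn.Linear):
    """Stand-in for a separately named Linear subclass, like MHA's out_proj."""

def would_swap(module: nn.Module) -> bool:
    # Exact type match: subclasses with their own class object are excluded.
    return type(module) in MAPPING

print(would_swap(nn.Linear(4, 4)))        # True
print(would_swap(_DistinctLinear(4, 4)))  # False -- silently left as float
```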

Differential Revision: [D28593830](https://our.internmc.facebook.com/intern/diff/D28593830)

[ghstack-poisoned]
@anjali411 anjali411 added oncall: quantization Quantization support in PyTorch triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module labels May 26, 2021
@github-actions github-actions bot added this to Need Triage in Quantization Triage May 26, 2021
@supriyar
Contributor

supriyar commented Jun 2, 2021

cc @z-a-f

@supriyar supriyar moved this from Need Triage to Triage Review in Quantization Triage Jun 2, 2021
@supriyar supriyar moved this from Triage Review to High Priority in Quantization Triage Jul 6, 2021
@z-a-f
Contributor

z-a-f commented Jul 19, 2021

On it!

@z-a-f
Contributor

z-a-f commented Aug 29, 2022

I think this one was landed.

@HDCharles
Contributor

@bhosmer is this still an issue? The MHA proj linear class is now a NonDynamicallyQuantizableLinear.

Normally, if you apply quantization to a model, only the supported ops are quantized, so the fact that it doesn't error and remains callable should be expected, right?

If this is still an issue, I can either make quantize_dynamic actually quantize MHA, or add a warning like you suggested. But having the API say "this op can't be quantized" seems a bit unintuitive, since that's true for most things and they are just ignored (so I'm leaning away from that).
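
To illustrate the default behavior described here, a small sketch (layer shapes are arbitrary): only module types named in the qconfig spec are swapped, and everything else is passed through untouched, with no error.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),  # not in the qconfig spec: left as-is
    nn.Flatten(),
    nn.Linear(8 * 30 * 30, 10),      # in the spec: swapped for a dynamic Linear
)
qmodel = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

print(type(qmodel[0]))  # torch.nn.Conv2d -- silently ignored
print(type(qmodel[2]))  # a dynamically quantized Linear (class path varies by version)
```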

@HDCharles
Contributor

I'm just going to make it work automatically.

Labels
oncall: quantization Quantization support in PyTorch triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
Projects
Quantization Triage
High Priority
Development

No branches or pull requests

6 participants