
Improper use of quantization API for MHA should fail fast #58969

Open

bhosmer opened this issue May 26, 2021 · 5 comments
Assignees
Labels
oncall: quantization Quantization support in PyTorch triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@bhosmer
Contributor

bhosmer commented May 26, 2021

Currently, direct quantization of a torch.nn.MultiheadAttention module is not supported via torch.quantization.quantize_dynamic() - a conversion must be performed first (reference). However, an erroneous attempt to quantize directly does not produce an error: certain layers silently fail to quantize, yet the returned model is callable, so there's no overt sign that something has gone wrong. See #58727 for full details.
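
As a minimal sketch of the misuse (dimensions are arbitrary; exact behavior depends on the PyTorch version):

```python
import torch
import torch.nn as nn

# Quantize an nn.MultiheadAttention module directly, skipping the required
# conversion step. As described above, the call returns without error.
mha = nn.MultiheadAttention(embed_dim=64, num_heads=4)
qmha = torch.quantization.quantize_dynamic(mha, {nn.Linear}, dtype=torch.qint8)

# The returned module is callable, but some projection layers were silently
# left unquantized -- there is no overt sign that anything went wrong.
x = torch.randn(10, 1, 64)  # (seq_len, batch, embed_dim)
out, attn_weights = qmha(x, x, x)
```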

A recent cleanup of torch.nn introduced a downstream error when scripting such an improperly quantized module, by rationalizing module names in a way that allowed the problematic quantization to go forward. The error is a vanilla TorchScript type error due to a naming collision, and offers no clues that would lead one back to the original problem without nontrivial sleuthing. Also, the error was reported by users, meaning the quantization API is in fact being inadvertently misused in this way. The repro used to track down the issue is here, derived from a bug report.

In the short term we've reinstated the name difference that defeats quantization when the API is used improperly, restoring the original behavior. But a proper fix would be for quantize_dynamic (or something in the quantization API) to give the caller a signal that an unsupported model has been specified.
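
One possible shape for that signal, as a hypothetical sketch (the `_NEEDS_CONVERSION` registry and `check_quantizable` helper below are illustrative, not existing API):

```python
import warnings
import torch.nn as nn

# Hypothetical registry of module types that quantize_dynamic cannot handle
# directly and that require a manual conversion step first.
_NEEDS_CONVERSION = (nn.MultiheadAttention,)

def check_quantizable(model: nn.Module) -> None:
    """Warn (or raise) on modules that would otherwise be silently skipped."""
    for name, module in model.named_modules():
        if isinstance(module, _NEEDS_CONVERSION):
            warnings.warn(
                f"{type(module).__name__} at '{name}' is not supported by "
                "quantize_dynamic; convert it first or it will be silently "
                "left unquantized."
            )
```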

cc @jerryzh168 @jianyuh @raghuramank100 @jamesr66a @vkuzo

bhosmer added a commit that referenced this issue May 26, 2021
Fixes a post-1.8 regression in nn.MultiheadAttention + quantization scriptability introduced in #52537. Passes the new test introduced in that PR, and fixes the repro found by @ngimel [here](https://gist.github.com/bhosmer/ef517d0774f2f10336b8140116fd6b62). 

Per comments in #52537 there's definitely a carnal dependency between quantization and the `_LinearWithBias` class by name that I'm reinstating here, but there may be cleaner ways to solve this - I don't really know what I'm doing 😁 . 

@jbschlosser @z-a-f LMK if you have ideas, happy to change this as desired. It'd be nice to get a fix into 1.9.

_[Update: now using a better name instead of `_LinearWithBias`, but this remains a short-term fix to re-suppress a quantization API usage error that should properly be raised upstream. See issue #58969]_
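
For context on how a name/class difference can defeat the swap, a hedged sketch (illustrative only, not PyTorch's actual code): dynamic quantization replaces modules via an exact type lookup, so a distinctly named `nn.Linear` subclass falls through the mapping and is silently skipped.

```python
import torch.nn as nn

# Illustrative mapping: the real one lives in the quantization code and maps
# float module types to their dynamically quantized replacements.
MAPPING = {nn.Linear: "dynamic quantized Linear"}

class _DistinctLinear(nn.Linear):
    """Stand-in for a separately named Linear subclass, like MHA's out_proj."""

def would_swap(module: nn.Module) -> bool:
    # Exact type match: subclasses with their own class object are excluded.
    return type(module) in MAPPING

print(would_swap(nn.Linear(4, 4)))        # True
print(would_swap(_DistinctLinear(4, 4)))  # False -- silently left as float
```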

Differential Revision: [D28593830](https://our.internmc.facebook.com/intern/diff/D28593830)

[ghstack-poisoned]
@anjali411 anjali411 added oncall: quantization Quantization support in PyTorch triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module labels May 26, 2021
@github-actions github-actions bot added this to Need Triage in Quantization Triage May 26, 2021
@supriyar
Contributor

supriyar commented Jun 2, 2021

cc @z-a-f

@supriyar supriyar moved this from Need Triage to Triage Review in Quantization Triage Jun 2, 2021
@supriyar supriyar moved this from Triage Review to High Priority in Quantization Triage Jul 6, 2021
@z-a-f
Contributor

z-a-f commented Jul 19, 2021

On it!

@z-a-f
Contributor

z-a-f commented Aug 29, 2022

I think this one was landed.

@HDCharles
Contributor

@bhosmer is this still an issue? The MHA proj linear class is now a NonDynamicallyQuantizableLinear.

Normally, if you apply quantization to a model, only the supported ops are quantized, so the fact that it doesn't error and remains callable should be expected, right?

If this is still an issue, I can either make quantize_dynamic actually quantize MHA, or add a warning like you suggested. But having the API say "this op can't be quantized" seems a bit unintuitive, since that's true for most things and they are just ignored (so I'm leaning away from that).
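
To illustrate the default behavior described here, a small sketch (layer shapes are arbitrary): only module types named in the qconfig spec are swapped, and everything else is passed through untouched, with no error.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),  # not in the qconfig spec: left as-is
    nn.Flatten(),
    nn.Linear(8 * 30 * 30, 10),      # in the spec: swapped for a dynamic Linear
)
qmodel = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

print(type(qmodel[0]))  # torch.nn.Conv2d -- silently ignored
print(type(qmodel[2]))  # a dynamically quantized Linear (class path varies by version)
```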

@HDCharles
Contributor

I'm just going to make it work automatically.

Labels
oncall: quantization Quantization support in PyTorch triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
Projects
Quantization Triage
High Priority
Development

No branches or pull requests

6 participants