[Quant][bc-breaking] Remove overwrite_output_observer #88620
Conversation
Summary: When the BackendConfig was first introduced, `overwrite_output_observer` and `overwrite_output_fake_quantize` were added to ensure fixed qparams ops like `torch.nn.Sigmoid` and `torch.nn.Tanh` used the correct observers and fake quantizes. However, this is hacky because the BackendConfig should only specify requirements on observers, not the observer constructors themselves. Later, #80184 added the correct observers to `get_default_qconfig_mapping` along with validation logic that throws an error if incorrect observers were specified. With this change, we no longer need to overwrite the observers specified by the user.

This commit removes these observer settings in the BackendConfig. Instead, we represent the observer constraints for fixed qparams ops through the existing DTypeWithConstraints mechanism. To be consistent with other DTypeWithConstraints checks, we no longer throw an error if an incorrect observer is specified, but simply ignore the offending QConfig and log a warning instead. This is the BC-breaking part of the change.

BC-breaking notes:

```
from torch.ao.quantization.qconfig import default_qconfig
from torch.ao.quantization.quantize_fx import prepare_fx

model = ModelWithFixedQParamsOps()
qconfig_mapping = QConfigMapping().set_global(default_qconfig)
example_inputs = ...
prepare_fx(model, qconfig_mapping, example_inputs)
```

Before this commit, running the above leads to an exception because the wrong observers are used for fixed qparams ops. After this commit, the above will only encounter a warning, and the fixed qparams ops will not be quantized. In both cases, switching to `get_default_qconfig_mapping` will cause the fixed qparams ops to be quantized.

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Reviewers: jerryzh168, vkuzo

Subscribers: jerryzh168, vkuzo
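For reference, the fix mentioned at the end of the summary is a one-line change on the user side. A minimal sketch, where the model below is a hypothetical stand-in for `ModelWithFixedQParamsOps`:

```python
import torch
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.quantize_fx import prepare_fx

# Hypothetical example model containing a fixed qparams op (torch.nn.Sigmoid).
class ModelWithFixedQParamsOps(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 4)
        self.sigmoid = torch.nn.Sigmoid()

    def forward(self, x):
        return self.sigmoid(self.linear(x))

model = ModelWithFixedQParamsOps().eval()
example_inputs = (torch.randn(1, 4),)

# get_default_qconfig_mapping already attaches the required fixed qparams
# observers for ops like Sigmoid and Tanh, so these ops are quantized
# without triggering the warning described above.
qconfig_mapping = get_default_qconfig_mapping()
prepared = prepare_fx(model, qconfig_mapping, example_inputs)
```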
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/88620
Note: Links to docs will display an error until the docs builds have been completed.
✅ No failures as of commit 59a8060.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Not sure I understand this yet. Could you provide a specific example? As a user, I would either expect things to work as I configured or to see an exception thrown; a warning and a behavior change seem surprising.
torch/ao/quantization/backend_config/_common_operator_config_utils.py
Sure. Today we check whether a qconfig satisfies the dtype constraints set by the backend here when we insert observers. The contract is: if the qconfig doesn't satisfy these constraints, we simply log a warning and do not insert the observers. We do this today for quant range and scale range constraints. This PR in its current state reuses this mechanism to ensure fixed qparams ops have the right observers inserted. We have three options here:

1. Keep fixed qparams ops consistent with the other dtype constraint checks: log a warning and skip quantizing the op (what this PR currently does).
2. Throw an error when an incorrect observer is specified for these ops (the previous behavior).
@jerryzh168 and I had a brief discussion about this and we thought maybe making the dtype constraints consistent (Option 1) is more important than throwing the error (Option 2). Another option (3) is to introduce some other mechanism to do this validation, but it seems more complicated. What are your thoughts on this @vkuzo?
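To make the "warn and skip" contract described above concrete, here is a minimal sketch of the idea; the function and argument names are illustrative, not the actual internal helpers:

```python
import warnings

def qconfig_satisfies_dtype_constraints(observer, quant_min_lower_bound, quant_max_upper_bound):
    # Illustrative only: if the user's observer violates the backend's quant
    # range constraints, warn and return False so the op is simply left
    # unquantized, instead of raising an exception.
    quant_min = getattr(observer, "quant_min", None)
    quant_max = getattr(observer, "quant_max", None)
    if quant_min is not None and quant_min < quant_min_lower_bound:
        warnings.warn("QConfig quant_min is below the backend's lower bound; ignoring this QConfig")
        return False
    if quant_max is not None and quant_max > quant_max_upper_bound:
        warnings.warn("QConfig quant_max is above the backend's upper bound; ignoring this QConfig")
        return False
    return True
```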
If I'm a user and I specify a qconfig with some constraints, I would expect my configuration to either be honored or to see an exception. I wouldn't expect to have to look at warnings to see if the framework did something different from what I asked. Can we just always throw an exception and remove the "silently change behavior and log a warning" cases?
OK, that seems like a broader change, since it also changes the existing behavior for other dtype constraints (e.g. quant range). I think that was also the original proposal in #85200, but we ended up deciding to do the warning thing instead. Maybe we should first align on how dtype constraints should be handled. @jerryzh168 should we change the checks here to throw exceptions?
I think it comes down to how we view the relationship between backend_config and qconfig_mapping. We can start with a simple example, e.g. QConfigMapping().set_global(default_qconfig). This means the user is requesting to quantize the whole model (all operators in the model) with default_qconfig, but today what we do is look at the supported configs in backend_config and only quantize the operators that support default_qconfig. Erroring out when the requested operator + qconfig is not supported in backend_config would be a big change in semantics and might break a lot of models, since we would then also need to define what "all the operators in the model" means. We are operating in torch op IR, so "all operators" might be very broad; there will be operators in the model that are not in backend_config, and we also don't want to quantize them. Maybe we can check a few models first to see if this would make sense?
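As a concrete version of the example above; `QConfigMapping`, `set_global`, `set_object_type`, and `set_module_name` are the real APIs, while the module name in the last line is made up for illustration:

```python
import torch
from torch.ao.quantization import QConfigMapping
from torch.ao.quantization.qconfig import default_qconfig

# The user asks to quantize the whole model with default_qconfig...
qconfig_mapping = QConfigMapping().set_global(default_qconfig)

# ...but only the ops that the backend_config declares support for (with
# compatible dtype constraints) are actually observed and quantized; other
# ops are left in fp32. Per-op and per-module overrides look like this:
qconfig_mapping = (
    QConfigMapping()
    .set_global(default_qconfig)
    .set_object_type(torch.nn.Sigmoid, None)  # explicitly skip an op type
    .set_module_name("features.0", None)      # skip a specific (made-up) submodule name
)
```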
I think this is reasonable if it's applied consistently. Would we feel okay with removing the exception thrown from the existing validation as well?
In summary, let's say we have:

- case 1: (op, qconfig) in qconfig_mapping matches the (op, dtype_constraints) in backend_config
- case 2: (op, qconfig) in qconfig_mapping does not match the dtype_constraints in backend_config
- case 3: op is not configured in backend_config at all

I'm afraid that there will be many ops in case 3, and it will break a lot of ops if we error out. I agree that it's better to make case 2 consistent; I'm OK with warning/ignore/error, and we can also discuss which one might make more sense.
Yeah, agreed, we should do this consistently. I think we can remove the exception for now to be consistent, and this is also what this PR is trying to do?
I think we should document both https://pytorch.org/docs/master/generated/torch.ao.quantization.qconfig_mapping.QConfigMapping.html#torch.ao.quantization.qconfig_mapping.QConfigMapping and https://pytorch.org/docs/master/generated/torch.ao.quantization.backend_config.BackendConfig.html#torch.ao.quantization.backend_config.BackendConfig to clarify the behavior. I think what a user would assume is "honor or exception"; let's clearly explain if this isn't the case. Not quantizing when the qconfig and backend_config don't align sounds fine. I think we should do the same thing for all ops; it's surprising to vary the behavior depending on the op.
Ok, that sounds good. Just to confirm, what we decided on is the following behavior, correct?

- case 1: (op, qconfig) in qconfig_mapping matches the (op, dtype_constraints) in backend_config: quantize
- case 2: (op, qconfig) in qconfig_mapping does not match the dtype_constraints in backend_config: don't quantize, log a warning
- case 3: op is not configured in backend_config: don't quantize

I will add some more documentation to QConfigMapping and BackendConfig as well to clarify this relationship.
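A small pseudocode-style sketch of that decision, purely for illustration (this is not the actual prepare_fx internals; `satisfies` stands in for a dtype constraint check like the one sketched earlier):

```python
import warnings

def decide(op_type, qconfig, backend_dtype_constraints, satisfies):
    # backend_dtype_constraints: maps op types to their dtype constraints.
    # satisfies(qconfig, constraints): returns True if the qconfig meets them.
    if qconfig is None or op_type not in backend_dtype_constraints:
        return "skip"       # case 3: op is not configured in the backend_config
    if satisfies(qconfig, backend_dtype_constraints[op_type]):
        return "quantize"   # case 1: qconfig matches the backend's dtype constraints
    # case 2: op is supported, but the qconfig violates the backend's constraints
    warnings.warn(f"QConfig for {op_type} does not satisfy the backend's dtype constraints; skipping")
    return "skip"
```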
How about not separating case 2 and case 3? It doesn't have to be in this PR. I think a warning which is off by default but that the user could turn on to debug why things aren't getting quantized could make sense here. We already have a lot of log spew from quantization workflow APIs, so I don't think we should expect the user to pay attention to warnings unless they are actively trying to debug something.
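If someone does want to surface these messages while debugging why an op was not quantized, the standard library's warnings filters are enough. A minimal sketch, assuming `model`, `qconfig_mapping`, and `example_inputs` are defined as in the summary above:

```python
import warnings

from torch.ao.quantization.quantize_fx import prepare_fx

# Record every warning emitted during prepare_fx so it can be inspected,
# instead of relying on it being noticed among other log output.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    prepared = prepare_fx(model, qconfig_mapping, example_inputs)

for w in caught:
    print(w.category.__name__, w.message)
```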
```diff
         matched_pattern,
         is_qat)
-    observer = act_post_process_ctr()
+    observer = qconfig.activation()
```
nice!
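For context on the diff above: a QConfig stores observer constructors, so the observer instance now comes from the user's own qconfig rather than from an overwrite constructor supplied by the BackendConfig. A rough illustration; the observer choices here are just examples:

```python
import torch
from torch.ao.quantization.observer import MinMaxObserver, default_weight_observer
from torch.ao.quantization.qconfig import QConfig

# A QConfig holds *constructors* (factories) for the activation and weight observers.
qconfig = QConfig(
    activation=MinMaxObserver.with_args(dtype=torch.quint8),
    weight=default_weight_observer,
)

# What the changed line does: instantiate the activation observer directly
# from the user's qconfig instead of from an overwritten constructor.
observer = qconfig.activation()
print(type(observer).__name__)  # MinMaxObserver
```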
```python
        )
        if not isinstance(activation_post_process, FixedQParamsObserver) and \
                not isinstance(activation_post_process, FixedQParamsFakeQuantize):
            warnings.warn(("QConfig must specify a FixedQParamsObserver or a FixedQParamsFakeQuantize "
```
Just to confirm, in future PRs do we want to print a warning for all branches that return `False`?
Yes exactly
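For completeness, a QConfig that would pass the isinstance check in the snippet above might look roughly like this; the scale and zero_point values are just the usual 0-to-1 fixed range, shown for illustration:

```python
import torch
from torch.ao.quantization.observer import FixedQParamsObserver, default_weight_observer
from torch.ao.quantization.qconfig import QConfig

# Fixed qparams ops such as sigmoid expect a FixedQParamsObserver (or a
# FixedQParamsFakeQuantize) as the activation observer; with this change,
# any other observer is ignored with a warning instead of raising an error.
fixed_qparams_qconfig = QConfig(
    activation=FixedQParamsObserver.with_args(
        scale=1.0 / 256.0, zero_point=0, dtype=torch.quint8, quant_min=0, quant_max=255),
    weight=default_weight_observer,
)
```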
LGTM, this makes our API cleaner and is also a great cleanup of the codebase, thanks a lot
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Stack from ghstack (oldest at bottom):
Summary: When the BackendConfig was first introduced, `overwrite_output_observer` and `overwrite_output_fake_quantize` were added to ensure fixed qparams ops like `torch.nn.Sigmoid` and `torch.nn.Tanh` used the correct observers and fake quantizes.

However, this is hacky because the BackendConfig should not set the observer constructors themselves, but should instead specify only requirements on the observers.

Later, #80184 added the correct observers to `get_default_qconfig_mapping` along with validation logic that throws an error if incorrect observers were specified. With this change, we no longer need to overwrite the observers from the BackendConfig, since we expect the user to pass in the correct observers for these ops.

This commit removes these overwrite observer settings in the BackendConfig. Instead, we represent the observer constraints for fixed qparams ops through the existing DTypeWithConstraints mechanism. Note that, however, to be consistent with other DTypeWithConstraints checks, we no longer throw an error if an incorrect observer is specified, but simply ignore the offending QConfig and log a warning instead. This is the BC-breaking part of the change.

BC-breaking notes:

```
from torch.ao.quantization.qconfig import default_qconfig
from torch.ao.quantization.quantize_fx import prepare_fx

model = ModelWithFixedQParamsOps()
qconfig_mapping = QConfigMapping().set_global(default_qconfig)
example_inputs = ...
prepare_fx(model, qconfig_mapping, example_inputs)
```

Before this commit, running the above leads to an exception because the wrong observers are used for fixed qparams ops. After this commit, the above will only encounter a warning, and the fixed qparams ops will not be quantized. In both cases, switching to `get_default_qconfig_mapping` will cause the fixed qparams ops to be quantized.

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Reviewers: jerryzh168, vkuzo

Subscribers: jerryzh168, vkuzo

cc @jerryzh168 @jianyuh @raghuramank100 @jamesr66a @vkuzo @jgong5 @Xia-Weiwen @leslie-fang-intel