[Quant][bc-breaking] Remove overwrite_output_observer #88620

Closed
andrewor14 wants to merge 8 commits

Conversation

andrewor14 (Contributor) commented Nov 7, 2022

Stack from ghstack (oldest at bottom):

Summary: When the BackendConfig was first introduced,
`overwrite_output_observer` and `overwrite_output_fake_quantize`
were added to ensure fixed qparams ops like `torch.nn.Sigmoid`
and `torch.nn.Tanh` used the correct observers and fake quantizes.
However, this is hacky because the BackendConfig should not set
the observer constructors themselves, but should instead specify
only requirements on the observers.

Later, #80184 added the
correct observers to `get_default_qconfig_mapping`, along with
validation logic that throws an error if incorrect observers
are specified. With this change, we no longer need to overwrite
the observers from the BackendConfig, since we expect the user to
pass in the correct observers for these ops.

This commit removes these overwrite observer settings from the
BackendConfig. Instead, we represent the observer constraints for
fixed qparams ops through the existing DTypeWithConstraints
mechanism. Note, however, that to be consistent with other
DTypeWithConstraints checks, we no longer throw an error if an
incorrect observer is specified, but simply ignore the offending
QConfig and log a warning instead. This is the BC-breaking part
of the change.
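
As a rough sketch (not the exact code added in this PR), a fixed qparams constraint for an op like `torch.nn.Sigmoid` could be expressed through DTypeWithConstraints along these lines; the field names `scale_exact_match` / `zero_point_exact_match` and the 1/256 scale below are assumptions for illustration:

```
# Rough sketch only: how a backend might express the fixed qparams
# requirement for Sigmoid via DTypeWithConstraints. The field names
# `scale_exact_match` / `zero_point_exact_match` and the 1/256 scale are
# assumptions for illustration, not necessarily the exact values in this PR.
import torch
from torch.ao.quantization.backend_config import (
    BackendConfig,
    BackendPatternConfig,
    DTypeConfig,
    DTypeWithConstraints,
    ObservationType,
)

sigmoid_output_dtype = DTypeWithConstraints(
    dtype=torch.quint8,
    scale_exact_match=1.0 / 256.0,  # assumed fixed scale for sigmoid outputs
    zero_point_exact_match=0,       # assumed fixed zero point
)

sigmoid_config = (
    BackendPatternConfig(torch.nn.Sigmoid)
    .set_observation_type(ObservationType.OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT)
    .add_dtype_config(
        DTypeConfig(input_dtype=torch.quint8, output_dtype=sigmoid_output_dtype)
    )
)

example_backend_config = BackendConfig("example_backend").set_backend_pattern_config(sigmoid_config)
```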

BC-breaking notes:

```
from torch.ao.quantization import QConfigMapping
from torch.ao.quantization.qconfig import default_qconfig
from torch.ao.quantization.quantize_fx import prepare_fx

model = ModelWithFixedQParamsOps()
qconfig_mapping = QConfigMapping().set_global(default_qconfig)
example_inputs = ...
prepare_fx(model, qconfig_mapping, example_inputs)
```

Before this commit, running the above leads to an exception
because the wrong observers are used for fixed qparams ops.
After this commit, the above will only encounter a warning,
and the fixed qparams ops will not be quantized. In both cases,
switching to get_default_qconfig_mapping will cause the
fixed qparams ops to be quantized.
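
For reference, a minimal sketch of the recommended flow after this change; `ModelWithFixedQParamsOps`, the `"fbgemm"` backend string, and the input shape are placeholders:

```
# Minimal sketch of the recommended workflow: get_default_qconfig_mapping
# supplies observers that satisfy the fixed qparams constraints, so ops like
# Sigmoid/Tanh are quantized. Model class and input shape are placeholders.
import torch
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.quantize_fx import convert_fx, prepare_fx

model = ModelWithFixedQParamsOps().eval()
qconfig_mapping = get_default_qconfig_mapping("fbgemm")
example_inputs = (torch.randn(1, 3, 224, 224),)  # placeholder input shape
prepared = prepare_fx(model, qconfig_mapping, example_inputs)
# ... run representative calibration data through `prepared` ...
quantized = convert_fx(prepared)
```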

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Reviewers: jerryzh168, vkuzo

Subscribers: jerryzh168, vkuzo

cc @jerryzh168 @jianyuh @raghuramank100 @jamesr66a @vkuzo @jgong5 @Xia-Weiwen @leslie-fang-intel

pytorch-bot bot added the `release notes: quantization` (release notes category) label Nov 7, 2022
pytorch-bot bot commented Nov 7, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/88620

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 59a8060:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

andrewor14 added a commit that referenced this pull request Nov 7, 2022
github-actions bot added the `oncall: quantization` (Quantization support in PyTorch) label Nov 7, 2022
andrewor14 requested a review from vkuzo on November 7, 2022 at 23:41
andrewor14 added a commit that referenced this pull request Nov 8, 2022
andrewor14 added a commit that referenced this pull request Nov 8, 2022
vkuzo (Contributor) commented Nov 9, 2022

> Note, however, that to be consistent with other DTypeWithConstraints checks, we no longer throw an error if an incorrect observer is specified, but simply ignore the offending QConfig and log a warning instead.

Not sure I understand this yet. Could you provide a specific example? As a user, I would either expect things to work as I configured or to see an exception thrown; a warning and a behavior change seems surprising.

andrewor14 (Contributor, Author) commented Nov 10, 2022

> > Note, however, that to be consistent with other DTypeWithConstraints checks, we no longer throw an error if an incorrect observer is specified, but simply ignore the offending QConfig and log a warning instead.
>
> Not sure I understand this yet. Could you provide a specific example? As a user, I would either expect things to work as I configured or to see an exception thrown; a warning and a behavior change seems surprising.

Sure. Today, when we insert observers, we check whether a qconfig satisfies the dtype constraints set by the backend. The contract is: if the qconfig doesn't satisfy these constraints, we simply log a warning and do not insert the observers. We do this today for quant range and scale range constraints.
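
For illustration only, a simplified sketch of that contract (this is not the actual prepare.py code; the helper below and the attributes it checks are placeholders):

```
# Illustrative sketch of the "warn and ignore" contract, not the real
# implementation in torch/ao/quantization/fx/prepare.py.
import warnings

def _matches_fixed_qparams(observer, required_scale, required_zero_point):
    # Placeholder check: compare the observer's qparams against the
    # backend's required fixed scale and zero point.
    scale = getattr(observer, "scale", None)
    zero_point = getattr(observer, "zero_point", None)
    return bool(scale == required_scale) and bool(zero_point == required_zero_point)

def maybe_insert_output_observer(qconfig, required_scale, required_zero_point):
    observer = qconfig.activation()
    if not _matches_fixed_qparams(observer, required_scale, required_zero_point):
        warnings.warn(
            "QConfig does not satisfy the backend's dtype constraints; "
            "ignoring this QConfig (the op will not be quantized)"
        )
        return None  # skip observer insertion instead of raising an exception
    return observer
```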

This PR in its current state reuses this mechanism to ensure fixed qparams ops have the right observers inserted. We have three options here:

1. To be consistent with how other dtype constraints are handled, ignore the QConfig if the observers don't have matching fixed qparams. This breaks BC and no longer throws an exception if the user provides a bad observer. (This is what the PR currently does.)

2. Keep BC and continue to throw an exception if the user provides a bad observer. However, this complicates how DTypeWithConstraints are validated, since we would log warnings for some attributes (e.g. quant range) but throw errors for others (e.g. fixed scale/zero point).

3. Do not use DTypeWithConstraints to validate the observers of fixed qparams ops, and introduce some other mechanism for this validation instead.

@jerryzh168 and I had a brief discussion about this, and we thought that making the dtype constraint handling consistent (Option 1) is more important than throwing the error (Option 2). Option 3 seems more complicated. What are your thoughts on this, @vkuzo?

vkuzo (Contributor) commented Nov 10, 2022

> Sure. Today, when we insert observers, we check whether a qconfig satisfies the dtype constraints set by the backend. The contract is: if the qconfig doesn't satisfy these constraints, we simply log a warning and do not insert the observers. We do this today for quant range and scale range constraints.

If I'm a user and I specify a qconfig with some constraints, I would expect my configuration either to be honored or to see an exception. I wouldn't expect to have to look at warnings to see if the framework did something different from what I asked. Can we just always throw an exception and remove the "silently change behavior and log a warning" cases?

andrewor14 (Contributor, Author) commented Nov 10, 2022

> If I'm a user and I specify a qconfig with some constraints, I would expect my configuration either to be honored or to see an exception. I wouldn't expect to have to look at warnings to see if the framework did something different from what I asked. Can we just always throw an exception and remove the "silently change behavior and log a warning" cases?

OK, that seems like a broader change, since it also changes the existing behavior for other dtype constraints (e.g. quant range). I think that was also the original proposal in #85200 but we ended up deciding to do the warning thing instead. Maybe we should first align on how dtype constraints should be handled. @jerryzh168 Should we change the checks here to throw exceptions?

jerryzh168 (Contributor) commented Nov 10, 2022

I think it comes down to how we view the relationship between the backend_config and the qconfig_mapping. Start with a simple example: QConfigMapping().set_global(default_qconfig) means the user is requesting to quantize the whole model (all operators in the model) with default_qconfig. Today, however, we look at what is supported in the backend_config and only quantize the operators that support default_qconfig.

Erroring out when a requested operator + qconfig is not supported in the backend_config would be a big change in semantics and might break a lot of models, since we would also need to define what "all operators in the model" means. We operate on the torch op IR, so "all operators" could be very broad; there will be operators in the model that are not in the backend_config, and we also don't want to quantize them. Maybe we can check a few models first to see whether this would make sense?

vkuzo (Contributor) commented Nov 10, 2022

> we look at what is supported in the backend_config and only quantize the operators that support default_qconfig.

I think this is reasonable if it's applied consistently. Would we feel okay with removing the exception thrown from prepare.py::_validate_fixed_qparams_qconfigs then, to ensure this works the same way for all ops?

jerryzh168 (Contributor) commented Nov 10, 2022

In summary, let's say we have:

- qconfig_mapping: specifies (op, qconfig)
- backend_config: has a config entry for (op, dtype_constraints)

Current behavior:

- case 1: (op, qconfig) in qconfig_mapping matches (op, dtype_constraints): quantize
- case 2: (op, qconfig) in qconfig_mapping doesn't match (op, dtype_constraints): ignore or error, depending on the op
- case 3: (op, qconfig) in qconfig_mapping, but no config for the op in backend_config: ignore

Proposed:

- case 1: (op, qconfig) in qconfig_mapping matches (op, dtype_constraints): quantize
- case 2: (op, qconfig) in qconfig_mapping doesn't match (op, dtype_constraints): error
- case 3: (op, qconfig) in qconfig_mapping, but no config for the op in backend_config: error

I'm afraid there will be many ops in case 3, and erroring out would break a lot of models.

I agree that it's better to make case 2 consistent. I'm OK with warning/ignore/error; we can also discuss which one makes more sense.

jerryzh168 (Contributor) commented:

> > we look at what is supported in the backend_config and only quantize the operators that support default_qconfig.
>
> I think this is reasonable if it's applied consistently. Would we feel okay with removing the exception thrown from prepare.py::_validate_fixed_qparams_qconfigs then, to ensure this works the same way for all ops?

Yeah, agreed, we should do this consistently. I think we can remove the exception for now to be consistent, which is also what this PR is trying to do, right?

vkuzo (Contributor) commented Nov 10, 2022

I think we should document both https://pytorch.org/docs/master/generated/torch.ao.quantization.qconfig_mapping.QConfigMapping.html#torch.ao.quantization.qconfig_mapping.QConfigMapping and https://pytorch.org/docs/master/generated/torch.ao.quantization.backend_config.BackendConfig.html#torch.ao.quantization.backend_config.BackendConfig to clarify the behavior. I think what a user would assume is "honor or exception"; let's clearly explain it if this isn't the case.

Not quantizing when the qconfig and the backend_config don't align sounds fine. I think we should do the same thing for all ops; it's surprising to vary the behavior depending on the op.

andrewor14 (Contributor, Author) commented:

Ok, that sounds good. Just to confirm, what we decided on is the following behavior, correct?

- case 1: (op, qconfig) in qconfig_mapping matches (op, dtype_constraints): quantize
- case 2: (op, qconfig) in qconfig_mapping doesn't match (op, dtype_constraints): ignore (+ warning)
- case 3: (op, qconfig) in qconfig_mapping, but no config for the op in backend_config: ignore

I will also add some more documentation to QConfigMapping and BackendConfig to clarify this relationship.

vkuzo (Contributor) commented Nov 11, 2022

> - case 1: (op, qconfig) in qconfig_mapping matches (op, dtype_constraints): quantize
> - case 2: (op, qconfig) in qconfig_mapping doesn't match (op, dtype_constraints): ignore (+ warning)
> - case 3: (op, qconfig) in qconfig_mapping, but no config for the op in backend_config: ignore

How about not separating case 2 and case 3? It doesn't have to be in this PR. A warning that is off by default, but that the user can turn on to debug why things aren't getting quantized, could make sense here. We already have a lot of log spew from the quantization workflow APIs, so I don't think we should expect users to pay attention to warnings unless they are actively trying to debug something.

andrewor14 added a commit that referenced this pull request Nov 14, 2022
andrewor14 added a commit that referenced this pull request Nov 14, 2022
andrewor14 added a commit that referenced this pull request Nov 14, 2022
andrewor14 added a commit that referenced this pull request Nov 15, 2022
```
matched_pattern,
is_qat)
observer = act_post_process_ctr()
observer = qconfig.activation()
```
A Contributor commented:

nice!

```
)
if not isinstance(activation_post_process, FixedQParamsObserver) and \
        not isinstance(activation_post_process, FixedQParamsFakeQuantize):
    warnings.warn(("QConfig must specify a FixedQParamsObserver or a FixedQParamsFakeQuantize "
```
A Contributor commented:

just to confirm, in future PRs do we want to print warning for all branches that return False?

andrewor14 (Contributor, Author) replied:

Yes exactly

jerryzh168 (Contributor) left a review comment:

LGTM, this makes our API cleaner and is also a great cleanup of the code base. Thanks a lot!

andrewor14 added a commit that referenced this pull request Nov 15, 2022
andrewor14 (Contributor, Author) commented:

@pytorchbot merge

pytorch-bot bot added the `ciflow/trunk` (Trigger trunk jobs on your pull request) label Nov 16, 2022
andrewor14 added the `topic: improvements` (topic category) label Nov 16, 2022
pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 hours).

kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Dec 10, 2022
Pull Request resolved: pytorch#88620
Approved by: https://github.com/jerryzh168
facebook-github-bot deleted the gh/andrewor14/33/head branch June 8, 2023 15:12
Labels

- ciflow/trunk (Trigger trunk jobs on your pull request)
- Merged
- oncall: quantization (Quantization support in PyTorch)
- release notes: quantization (release notes category)
- topic: improvements (topic category)
4 participants