eager quant: remove fake_quant after add/mul nodes during QAT #49213
Conversation
Summary: Changes behavior of FX graph mode quantization to insert a fake_quant after add and mul nodes. This is useful to model numerics better, since these nodes will be quantized during inference, and adding/multiplying by a scalar can result in values in between quantization bins during training.

Test Plan:
```
python test/test_quantization.py TestQuantizeFxOps.test_quantized_add_qat
python test/test_quantization.py TestQuantizeFxOps.test_quantized_mul_qat
```

Reviewers:
Subscribers:
Tasks:
Tags:

[ghstack-poisoned]
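For context, a minimal sketch of how the inserted fake_quants can be seen with FX graph mode QAT. This assumes the torch.quantization API of the release this PR targeted (roughly 1.8); newer releases take an example_inputs argument and live under torch.ao.quantization:

```python
import torch
import torch.quantization as tq
from torch.quantization.quantize_fx import prepare_qat_fx

class M(torch.nn.Module):
    def forward(self, x):
        # add with a scalar: quantized during inference, so QAT should
        # fake-quantize its output to model the quantization bins
        return torch.add(x, 1.0)

m = M().train()  # prepare_qat_fx expects a model in training mode
qconfig_dict = {"": tq.get_default_qat_qconfig("fbgemm")}
prepared = prepare_qat_fx(m, qconfig_dict)
# With this change, the graph gains an activation_post_process
# (FakeQuantize) node after the add.
print(prepared.graph)
```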
torch/quantization/fx/quantize.py (outdated)

```python
else:
    # for QAT, we always insert a fake_quant after add/mul
    new_observer = qconfig.activation()
    insert_observer(
```
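As a side note for readers: `qconfig.activation` is a factory, so each call constructs a fresh observer/fake quantize module, which is exactly what the reviewers flag below. A minimal illustration (the qconfig choice here is just an example):

```python
import torch.quantization as tq

qconfig = tq.get_default_qat_qconfig("fbgemm")
new_observer = qconfig.activation()  # constructs a fresh FakeQuantize module
print(type(new_observer))  # torch.quantization.fake_quantize.FakeQuantize
```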
I think we'll need to use the same fake quantize module as the input here to properly simulate quantization, since the output of quantized::add with a scalar input inherits its quantization parameters from the input.
Not sure I follow: the current logic inserts observers as long as one of the inputs is a tensor. Why is this a problem for QAT?
We can match eager now and improve later. Currently eager mode is incorrect, because the fake quantize is ignored, and the QAT model would have different numerics before and after convert. The output of add_scalar/mul_scalar needs to use the same fake quantize module as the input so that the fake quantize numerics match the quantized op.
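To make that concrete, a hedged eager-style sketch (not this PR's code) of reusing the input's fake quantize module on the output, which is what matching quantized::add_scalar requires:

```python
import torch
import torch.quantization as tq

class AddScalar(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # a single FakeQuantize instance, shared by input and output
        self.fq = tq.FakeQuantize()

    def forward(self, x):
        x = self.fq(x)
        x = x + 1.0
        # reuse the input's fake quantize, rather than a fresh
        # qconfig.activation(), so the output is fake-quantized with the
        # input's scale/zero_point, matching quantized::add_scalar
        return self.fq(x)
```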
```python
        self.conv2 = torch.nn.Conv2d(1, 1, 1)

    def forward(self, x):
        x = torch.add(x, 1.0)
```
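From this hunk and the comments below, the test module plausibly has the following overall shape; this is a reconstruction for readability, and the lines not shown in the hunk are assumptions:

```python
import torch

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = torch.nn.Conv2d(1, 1, 1)
        self.conv2 = torch.nn.Conv2d(1, 1, 1)

    def forward(self, x):
        x = torch.add(x, 1.0)  # scalar add, the op under discussion
        x = self.conv1(x)
        x = torch.add(x, 1.0)  # assumed: paired with the relu below ("add-relu")
        x = torch.relu(x)
        x = self.conv2(x)
        return x
```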
Are the inputs and outputs quantized? In that case, shouldn't we have 5 fake_quants (add, conv1, add-relu, conv2, output)?
The output of add / input of conv1 is considered to be observed here, I think.
The ideal state would be for the output of add / input of conv1 to be fake quantized with the same fake quantize module as the input of the add.
Looks like this breaks some of the numeric_suite tests. We may need to keep the …
…g QAT" Summary: Changes behavior of Eager mode quantization to remove observation after `add_scalar/mul_scalar`. This is not used, and it removes one difference between Eager and FX modes. Test Plan: ``` python test/test_quantization.py TestQuantizeFxOps.test_quantized_add_qat python test/test_quantization.py TestQuantizeFxOps.test_quantized_mul_qat python test/test_quantization.py TestQuantizationAwareTraining.test_add_scalar_uses_input_qparams python test/test_quantization.py TestQuantizationAwareTraining.test_mul_scalar_uses_input_qparams ``` Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D25486276](https://our.internmc.facebook.com/intern/diff/D25486276) [ghstack-poisoned]
Thanks, the fix was to do the same change on …
…QAT" Summary: Changes behavior of Eager mode quantization to remove observation after `add_scalar/mul_scalar`. This is not used, and it removes one difference between Eager and FX modes. Test Plan: ``` python test/test_quantization.py TestQuantizeFxOps.test_quantized_add_qat python test/test_quantization.py TestQuantizeFxOps.test_quantized_mul_qat python test/test_quantization.py TestQuantizationAwareTraining.test_add_scalar_uses_input_qparams python test/test_quantization.py TestQuantizationAwareTraining.test_mul_scalar_uses_input_qparams ``` Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D25486276](https://our.internmc.facebook.com/intern/diff/D25486276) [ghstack-poisoned]
This pull request has been merged in 36b2092.
eager quant: remove fake_quant after add/mul nodes during QAT (pytorch#49213)

Summary: Pull Request resolved: pytorch#49213

Changes behavior of Eager mode quantization to remove observation after add_scalar/mul_scalar. This is not used, and it removes one difference between Eager and FX modes.

Test Plan:
```
python test/test_quantization.py TestQuantizeFxOps.test_quantized_add_qat
python test/test_quantization.py TestQuantizeFxOps.test_quantized_mul_qat
python test/test_quantization.py TestQuantizationAwareTraining.test_add_scalar_uses_input_qparams
python test/test_quantization.py TestQuantizationAwareTraining.test_mul_scalar_uses_input_qparams
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D25486276

fbshipit-source-id: 34a5d6ce0d08739319ec0f8b197cfc1309d71040
Stack from ghstack:

Summary: Changes behavior of Eager mode quantization to remove observation after `add_scalar/mul_scalar`. This is not used, and it removes one difference between Eager and FX modes.

Test Plan:

Reviewers:
Subscribers:
Tasks:
Tags:

Differential Revision: D25486276
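In eager mode, scalar add/mul are quantized through torch.nn.quantized.FloatFunctional, so the landed behavior can be sketched as follows (a hedged sketch; the key point is that add_scalar no longer observes its output):

```python
import torch

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.ff = torch.nn.quantized.FloatFunctional()

    def forward(self, x):
        # After this PR, add_scalar does not run activation_post_process:
        # the quantized op's output inherits the input's qparams, so
        # observing or fake-quantizing the output is unnecessary.
        return self.ff.add_scalar(x, 1.0)
```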