
eager quant: remove fake_quant after add/mul nodes during QAT #49213

Closed
wants to merge 4 commits

Conversation

vkuzo (Contributor) commented Dec 11, 2020

Stack from ghstack:

Summary:

Changes the behavior of Eager mode quantization to remove observation after `add_scalar`/`mul_scalar`.
This observation is not used, and removing it eliminates one difference between Eager and FX modes.

Test Plan:

```
python test/test_quantization.py TestQuantizeFxOps.test_quantized_add_qat
python test/test_quantization.py TestQuantizeFxOps.test_quantized_mul_qat
python test/test_quantization.py TestQuantizationAwareTraining.test_add_scalar_uses_input_qparams
python test/test_quantization.py TestQuantizationAwareTraining.test_mul_scalar_uses_input_qparams
```

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: D25486276
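
For illustration, a minimal sketch of the eager mode QAT pattern this change affects. It uses the standard eager mode workflow (QuantStub, FloatFunctional, prepare_qat); the before/after comments restate the behavior change described above, not the exact implementation.

```
import torch
import torch.nn as nn
from torch.nn.quantized import FloatFunctional

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.ff = FloatFunctional()
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        # eager mode routes the scalar add through FloatFunctional so it can
        # be swapped for a quantized op at convert time
        x = self.ff.add_scalar(x, 1.0)
        return self.dequant(x)

m = M().train()
m.qconfig = torch.quantization.get_default_qat_qconfig('fbgemm')
m_prepared = torch.quantization.prepare_qat(m)
# before this PR: the FloatFunctional's activation_post_process (a fake_quant
# during QAT) also observed the add_scalar output, even though the converted
# quantized op ignores those qparams
# after this PR: that unused observation is removed, matching FX graph mode
```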

Summary:

Changes the behavior of FX graph mode quantization to insert a fake_quant
after add and mul nodes. This models numerics more faithfully, since these
nodes will be quantized during inference, and adding or multiplying by a
scalar can produce values that fall between quantization bins during training.

Test Plan:

```
python test/test_quantization.py TestQuantizeFxOps.test_quantized_add_qat
python test/test_quantization.py TestQuantizeFxOps.test_quantized_mul_qat
```

Reviewers:

Subscribers:

Tasks:

Tags:

[ghstack-poisoned]
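
As an illustration (not part of this PR's diff), a minimal FX graph mode QAT setup that exercises this path; it assumes the prepare_qat_fx(model, qconfig_dict) signature from this era of the API.

```
import torch
import torch.nn as nn
from torch.quantization import get_default_qat_qconfig
from torch.quantization.quantize_fx import prepare_qat_fx

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 1, 1)

    def forward(self, x):
        x = self.conv(x)
        # scalar add: lowered to a quantized add at convert time, so during
        # QAT its output should be fake quantized to match inference numerics
        x = torch.add(x, 1.0)
        return x

qconfig = get_default_qat_qconfig('fbgemm')
m_prepared = prepare_qat_fx(M().train(), {'': qconfig})
# the prepared graph now contains an activation_post_process (fake_quant)
# after the add node, so training sees values snapped to quantization bins
```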
vkuzo added a commit that referenced this pull request Dec 11, 2020
ghstack-source-id: 2d662539da6de9d94c8707c73cfaff7eadb4f143
Pull Request resolved: #49213

facebook-github-bot commented Dec 11, 2020

💊 CI failures summary and remediations

As of commit a620fe7 (more details on the Dr. CI page):


  • 1/1 failures possibly* introduced in this PR
    • 1/1 non-CircleCI failure(s)

This comment was automatically generated by Dr. CI.


```
else:
    # for QAT, we always insert a fake_quant after add/mul
    new_observer = qconfig.activation()
    insert_observer(
```
Contributor:
I think we'll need to use the same fake quantize module as the input here to properly simulate quantization, since the output of quantized::add with a scalar input inherits its quantization parameters from the input.

Contributor:
Not sure I follow: the current logic inserts observers as long as one of the inputs is a tensor. Why is this a problem for QAT?

jerryzh168 (Contributor) commented Dec 14, 2020

We can match eager mode now and improve later. Currently eager mode is incorrect, because the fake quantize is ignored, so the QAT model has different numerics before and after convert. The output of add_scalar/mul_scalar needs to use the same fake quantize module as the input so that the fake quantize numerics match the quantized op.
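
To make that concrete, a small illustrative sketch (the scale/zero_point values are made up) of fake quantizing the result of a scalar add with the input's qparams:

```
import torch

scale, zero_point = 0.1, 128  # hypothetical qparams produced by the input fake_quant
x = torch.randn(4)

# fake quantize the input, as the input's fake_quant module would during QAT
x_fq = torch.fake_quantize_per_tensor_affine(x, scale, zero_point, 0, 255)

# add the scalar, then fake quantize the result with the *same* qparams;
# this mirrors the inference behavior described above, where the output of
# the quantized scalar add inherits its quantization parameters from the input
y_fq = torch.fake_quantize_per_tensor_affine(x_fq + 1.0, scale, zero_point, 0, 255)
```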

vkuzo changed the title from "fx quant: insert fake_quant after add/mul nodes during QAT" to "eager quant: remove fake_quant after add/mul nodes during QAT" on Dec 15, 2020
```
        self.conv2 = torch.nn.Conv2d(1, 1, 1)

    def forward(self, x):
        x = torch.add(x, 1.0)
```
Contributor:
Are the inputs and outputs quantized? In this case, shouldn't we have 5 fake-quants? (add, conv1, add-relu, conv2, output)

Contributor:
The output of add / input of conv1 is considered to be observed here, I think.

Contributor:
The ideal state would be for the output of add / input of conv1 to be fake quantized with the same fake quantize module as the input of add.
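
For reference, a hypothetical reconstruction of the test module under discussion (only conv2 and the first scalar add are visible in the hunk above; conv1 and the add-relu are inferred from the comments):

```
import torch
import torch.nn as nn

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = torch.nn.Conv2d(1, 1, 1)
        self.conv2 = torch.nn.Conv2d(1, 1, 1)

    def forward(self, x):
        x = torch.add(x, 1.0)  # the output of add / input of conv1 discussed above
        x = self.conv1(x)
        x = torch.relu(torch.add(x, 1.0))  # a guess at the "add-relu" mentioned in the comment
        x = self.conv2(x)
        return x
```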

jerryzh168 (Contributor):
Looks like this breaks some of the numeric_suite tests. We may need to keep the activation_post_process attribute (and just use Identity).

vkuzo (Author) commented Dec 15, 2020

> looks like this breaks some of the numeric_suite tests. we may need to keep the activation_post_process attribute (and just use Identity).

Thanks, the fix was to make the same change on QFunctional. add_scalar and mul_scalar will no longer be supported by Eager mode Numeric Suite after this PR, which is intended, since Eager NS only works on activations. Theoretically someone could hack it back in with a module swap, but that should probably be outside of this PR.
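
Continuing the eager mode sketch from the PR description above (reusing m_prepared from there), a short illustration of where the converted model ends up; the Numeric Suite remark restates the point above rather than describing new behavior.

```
import torch

m_prepared(torch.randn(1, 4))  # one pass so the observers have data
m_converted = torch.quantization.convert(m_prepared.eval())
# convert swaps FloatFunctional for torch.nn.quantized.QFunctional; with this
# change, add_scalar/mul_scalar no longer observe their outputs through
# activation_post_process, so Eager mode Numeric Suite (which compares
# activations) no longer covers these two ops
print(m_converted(torch.randn(1, 4)))
```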


This pull request has been merged in 36b2092.

facebook-github-bot deleted the gh/vkuzo/185/head branch December 20, 2020 15:18
hwangdeyu pushed a commit to hwangdeyu/pytorch that referenced this pull request Jan 6, 2021
eager quant: remove fake_quant after add/mul nodes during QAT (pytorch#49213)

Summary:
Pull Request resolved: pytorch#49213

Changes behavior of Eager mode quantization to remove observation after add_scalar/mul_scalar.
This is not used, and it removes one difference between Eager and FX modes.

Test Plan:
```
python test/test_quantization.py TestQuantizeFxOps.test_quantized_add_qat
python test/test_quantization.py TestQuantizeFxOps.test_quantized_mul_qat
python test/test_quantization.py TestQuantizationAwareTraining.test_add_scalar_uses_input_qparams
python test/test_quantization.py TestQuantizationAwareTraining.test_mul_scalar_uses_input_qparams
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D25486276

fbshipit-source-id: 34a5d6ce0d08739319ec0f8b197cfc1309d71040