
[Quant] Add fused LinearLeakyReLU module for onednn backend #88661

Closed
wants to merge 16 commits into from

Conversation

Xia-Weiwen
Collaborator

@Xia-Weiwen Xia-Weiwen commented Nov 8, 2022

Stack from ghstack (oldest at bottom):

Summary
Post-op fusion can reduce data movement overhead and improve inference performance. This PR adds a fused `QLinearLeakyReLU` module for the onednn backend, which will be used for int8 inference with the onednn backend. This module cannot be called with other quantization backends; otherwise an error is thrown.

Test plan
python test_quantization.py TestStaticQuantizedModule

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10
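As background on the post-op-fusion claim in the summary: a fused kernel applies the activation while the linear output is still in registers or cache, instead of writing the full output tensor and re-reading it for the activation pass. The plain-Python sketch below illustrates that idea only; it is not the PR's actual onednn kernel, and the function names are invented for illustration.

```python
# Illustrative sketch only -- NOT the PR's onednn implementation.
# Shows why fusing a post op (LeakyReLU) into the linear kernel
# saves a full pass over the output buffer.

def linear(x, w, b):
    # y[i] = sum_j(w[i][j] * x[j]) + b[i]
    return [sum(wi * xj for wi, xj in zip(row, x)) + bi
            for row, bi in zip(w, b)]

def leaky_relu(y, negative_slope=0.01):
    # Separate second pass: re-reads every element of y.
    return [v if v > 0 else negative_slope * v for v in y]

def fused_linear_leaky_relu(x, w, b, negative_slope=0.01):
    # Apply the activation while the accumulator is still "hot",
    # instead of writing the linear output and re-reading it.
    out = []
    for row, bi in zip(w, b):
        acc = sum(wi * xj for wi, xj in zip(row, x)) + bi
        out.append(acc if acc > 0 else negative_slope * acc)
    return out

x = [1.0, -2.0]
w = [[0.5, 0.5], [1.0, 1.0]]
b = [0.0, 0.5]
assert fused_linear_leaky_relu(x, w, b) == leaky_relu(linear(x, w, b))
```

Both paths compute identical values; the fused version simply touches memory once, which is where the inference speedup comes from.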

@pytorch-bot

pytorch-bot bot commented Nov 8, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/88661

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 08a2e13:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Xia-Weiwen added a commit that referenced this pull request Nov 8, 2022
ghstack-source-id: ae263fe5175dfac481e98e40ec31dc3b54585530
Pull Request resolved: #88661
@Xia-Weiwen Xia-Weiwen marked this pull request as draft November 8, 2022 09:02
@Xia-Weiwen Xia-Weiwen added the intel This tag is for PR from Intel label Nov 8, 2022
Xia-Weiwen added a commit that referenced this pull request Nov 8, 2022
ghstack-source-id: 2f46d71f686e94140c1a9242bbeb49efde77bf1a
Pull Request resolved: #88661
@jerryzh168
Contributor

@Xia-Weiwen
Collaborator Author

should we add a test in https://github.com/pytorch/pytorch/blob/master/test/quantization/eager/test_quantize_eager_ptq.py#L76

Thanks for the suggestion, but I did not see tests for fused modules there.
Shall we add it here instead?

class TestStaticQuantizedModule(QuantizationTestCase):

If so, do you suggest adding a new test implementation for LinearLeakyReLU, or modifying `_test_linear_api_impl` below so that it supports more fusions?
def _test_linear_api_impl(self, batch_size, in_features, out_features, use_bias, use_fused, per_channel):

Collaborator

@jgong5 jgong5 left a comment


LGTM on the changes. Consider adding a test as Jerry suggested.

@Xia-Weiwen
Collaborator Author

Hi @jerryzh168 I have added a test here: test/quantization/core/test_quantized_module.py. Please take a look.

@Xia-Weiwen Xia-Weiwen marked this pull request as ready for review November 11, 2022 05:30
Comment on lines 89 to 98
class_map = {
    True: nniq.LinearReLU,
    False: nnq.Linear,
    'none': nnq.Linear,
    'relu': nniq.LinearReLU,
    'leaky_relu': nniq.LinearLeakyReLU,
}
q_module_name_map = {
    'none': 'QuantizedLinear',
    'relu': 'QuantizedLinearReLU',
    'leaky_relu': 'QuantizedLinearLeakyReLU',
}
Contributor


I think we should probably pass these in as arguments; it looks a bit odd that we hardcode them in this helper function.

Contributor


Also, for naming, I think we can rename test_linear_api to test_linear as well, to make it clearer.

Contributor


And we can split test_linear into test_linear and test_linear_relu, to make sure each test only tests one thing.

Collaborator Author


OK I will do that.

Collaborator Author


I have made these arguments of the helper function and split test_qlinear into two tests, for qlinear and qlinear_relu respectively.
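The shape of that split might look like the following hypothetical sketch (a toy stub module, not the actual TestStaticQuantizedModule code): a shared parameterized helper, with one thin public test per variant so each test exercises exactly one module.

```python
import unittest

def quantized_linear_stub(x, post_op='none'):
    # Stand-in for the real quantized module under test:
    # a toy "linear" (scale by 2) with an optional fused ReLU post op.
    y = x * 2.0
    return max(y, 0.0) if post_op == 'relu' else y

class TestStaticQuantizedModuleSketch(unittest.TestCase):
    def _test_qlinear_impl(self, post_op, expected):
        # Shared parameterized helper; each public test passes one post op.
        self.assertEqual(quantized_linear_stub(-1.5, post_op), expected)

    def test_qlinear(self):
        # Plain linear only.
        self._test_qlinear_impl('none', -3.0)

    def test_qlinear_relu(self):
        # Fused variant only.
        self._test_qlinear_impl('relu', 0.0)
```

Run with `python -m unittest` as usual; a failure then points directly at the one fusion variant that broke.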

@jerryzh168
Contributor

Please add the summary and test plan as suggested in the previous PR as well. Also, are we planning to use this in eager mode or just FX graph mode? And how urgent is this change? Can it wait for PyTorch 2.0, where we integrate quantization into the PyTorch 2.0 export path and these changes may not be needed?

def from_reference(cls, ref_mod, output_scale, output_zero_point):
    linear = ref_mod[0]
    leaky_relu = ref_mod[1]
    qlinear = cls(
Contributor


nit: qlinear -> qlinear_leakyrelu to make it clearer

Collaborator Author


It's fixed.

@Xia-Weiwen
Collaborator Author

Please add the summary and test plan as suggested in the previous PR as well.

Ok. I will add them later.

Also, are we planning to use this in eager mode or just FX graph mode?

It's just for FX mode.

And how urgent is this change? Can it wait for PyTorch 2.0, where we integrate quantization into the PyTorch 2.0 export path and these changes may not be needed?

Not that urgent, but it would be nice to land it ASAP. I wonder what the quantization path will look like in PyTorch 2.0. Do you mean that there will be a different mechanism for fusion, so these PRs are not needed? If so, we would like to align with 2.0 to support more fusion patterns. How can we do that? Thanks.

@jerryzh168
Contributor

yeah, looks good, please feel free to land

@Xia-Weiwen
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge failed

Reason: Approval needed from one of the following (Rule 'superuser'):
teng-li, anj-s, vtsyvina, hwangjeff, clee2000, ...

Details for Dev Infra team Raised by workflow job

@Xia-Weiwen
Collaborator Author

Xia-Weiwen commented Dec 14, 2022

yeah, looks good, please feel free to land

Hi @jerryzh168, you have not approved this yet. Could you approve it? Thanks!

Contributor

@jerryzh168 jerryzh168 left a comment


Sorry, I missed this before. Please revert the changes to the torch/nn/intrinsic/ folder; we want to deprecate this namespace and should not add new things to it.

@Xia-Weiwen
Collaborator Author

Sorry, I missed this before. Please revert the changes to the torch/nn/intrinsic/ folder; we want to deprecate this namespace and should not add new things to it.

I have removed them.

@Xia-Weiwen
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@Xia-Weiwen
Collaborator Author

Xia-Weiwen commented Dec 16, 2022

@pytorchbot revert -m="This is breaking tests. Need to rebase." -c=nosignal

@pytorch-bot

pytorch-bot bot commented Dec 16, 2022

❌ 🤖 pytorchbot command failed:

@pytorchbot revert: error: the following arguments are required: -c/--classification

usage: @pytorchbot revert -m MESSAGE -c
                          {nosignal,ignoredsignal,landrace,weird,ghfirst}

Try @pytorchbot --help for more info.

@Xia-Weiwen
Collaborator Author

@pytorchbot revert -m="This is breaking tests. Need to rebase." -c=nosignal

@pytorchmergebot
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

@pytorchmergebot
Collaborator

@Xia-Weiwen your PR has been successfully reverted.

pytorchmergebot added a commit that referenced this pull request Dec 16, 2022
…88661)"

This reverts commit 353c2e7.

Reverted #88661 on behalf of https://github.com/Xia-Weiwen due to This is breaking tests. Need to rebase.
@Xia-Weiwen Xia-Weiwen reopened this Dec 16, 2022
@Xia-Weiwen
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@facebook-github-bot facebook-github-bot deleted the gh/Xia-Weiwen/2/head branch June 8, 2023 14:57
Labels: ciflow/trunk, intel, Merged, open source, release notes: AO frontend, release notes: quantization, Reverted
5 participants