[Quant] Add fused LinearLeakyReLU module for onednn backend #88661
Conversation
🔗 Helpful links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/88661
Note: links to docs will display an error until the doc builds have completed. ✅ No failures as of commit 08a2e13. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Thanks for the suggestion, but I did not see tests for the fused module here.
Do you suggest adding a new test implementation for LinearLeakyReLU, or modifying `_test_linear_api_impl` below so that it supports more fusions?
LGTM on the changes. Consider adding a test as Jerry suggested.
Hi @jerryzh168, I have added a test here: test/quantization/core/test_quantized_module.py. Please take a look.
```diff
 class_map = {
-    True: nniq.LinearReLU,
-    False: nnq.Linear,
+    'none': nnq.Linear,
+    'relu': nniq.LinearReLU,
+    'leaky_relu': nniq.LinearLeakyReLU,
 }
+q_module_name_map = {
+    'none': 'QuantizedLinear',
+    'relu': 'QuantizedLinearReLU',
+    'leaky_relu': 'QuantizedLinearLeakyReLU',
+}
```
I think we should probably pass these in as arguments; it looks a bit weird to hardcode them in this helper function.
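A minimal sketch of the refactor suggested above: the helper receives the post-op-to-module maps as arguments instead of hardcoding them. Module classes are stood in by strings here (the real code would pass `nnq.Linear`, `nniq.LinearReLU`, etc.), and the helper name is illustrative, not the actual test method.

```python
def check_quantized_linear(post_op, class_map, q_module_name_map):
    """Look up the expected fused module and its printable name for a post op."""
    cls = class_map[post_op]           # which quantized module class to expect
    name = q_module_name_map[post_op]  # expected name reported by the module
    return cls, name

# The caller now owns the maps, so new fusions only touch the call site.
class_map = {
    'none': 'nnq.Linear',
    'relu': 'nniq.LinearReLU',
    'leaky_relu': 'nniq.LinearLeakyReLU',
}
q_module_name_map = {
    'none': 'QuantizedLinear',
    'relu': 'QuantizedLinearReLU',
    'leaky_relu': 'QuantizedLinearLeakyReLU',
}

cls, name = check_quantized_linear('leaky_relu', class_map, q_module_name_map)
```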
Also, for naming, I think we can rename `test_linear_api` to `test_linear` as well, to make it clearer.
And we can split `test_linear` into `test_linear` and `test_linear_relu`, to make sure each test only tests one thing.
OK, I will do that.
I have made these arguments of the helper function and split `test_qlinear` into two tests, for qlinear and qlinear_relu respectively.
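The split described above can be sketched as follows. This is a hypothetical skeleton, not the code in test/quantization/core/test_quantized_module.py: the helper here just records which path was exercised, where the real `_test_qlinear_impl` builds and checks quantized modules.

```python
class TestStaticQuantizedModule:
    def _test_qlinear_impl(self, post_op):
        # Shared parameterized helper; the real one constructs the fused
        # quantized module for `post_op` and verifies its numerics.
        return post_op

    def test_qlinear(self):
        # Covers only the plain quantized linear path.
        assert self._test_qlinear_impl('none') == 'none'
        return 'ok'

    def test_qlinear_relu(self):
        # Covers only the linear + relu fusion, so a failure here
        # unambiguously points at the fused path.
        assert self._test_qlinear_impl('relu') == 'relu'
        return 'ok'

t = TestStaticQuantizedModule()
```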
Please add the summary and test plan as suggested in the previous PR as well. Also, are we planning to use this in eager mode as well, or just FX graph mode? And how urgent is this change? Can you wait for PyTorch 2.0, where we integrate quantization into the PyTorch 2.0 export path and these changes may not be needed?
```python
def from_reference(cls, ref_mod, output_scale, output_zero_point):
    linear = ref_mod[0]
    leaky_relu = ref_mod[1]
    qlinear = cls(
```
nit: `qlinear` -> `qlinear_leakyrelu` to make it clearer
It's fixed.
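A hedged sketch of how the `from_reference` classmethod might read after the rename. The class body, attribute names, and the `SimpleNamespace` stand-ins for the reference modules are assumptions for illustration; only the `qlinear` -> `qlinear_leakyrelu` rename and the `ref_mod[0]`/`ref_mod[1]` unpacking come from the discussion above.

```python
from types import SimpleNamespace

class QuantizedLinearLeakyReLU:
    """Illustrative stand-in for the fused quantized module."""

    def __init__(self, in_features, out_features, negative_slope):
        self.in_features = in_features
        self.out_features = out_features
        self.negative_slope = negative_slope

    @classmethod
    def from_reference(cls, ref_mod, output_scale, output_zero_point):
        linear = ref_mod[0]       # reference float linear
        leaky_relu = ref_mod[1]   # reference leaky_relu, carries the slope
        # Renamed qlinear -> qlinear_leakyrelu per the review nit.
        qlinear_leakyrelu = cls(
            linear.in_features,
            linear.out_features,
            leaky_relu.negative_slope,
        )
        qlinear_leakyrelu.scale = output_scale
        qlinear_leakyrelu.zero_point = output_zero_point
        return qlinear_leakyrelu

# Tiny fake reference pair standing in for (Linear, LeakyReLU).
ref_mod = (SimpleNamespace(in_features=4, out_features=8),
           SimpleNamespace(negative_slope=0.01))
mod = QuantizedLinearLeakyReLU.from_reference(ref_mod, 0.5, 0)
```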
OK, I will add them later.
It's just for FX mode.
Not that urgent, but it would be nice to land ASAP. I wonder what the quantization path will look like in PyTorch 2.0. Do you mean that there will be a different mechanism for fusion, so these PRs are not needed? If so, we would like to align with 2.0 to support more fusion patterns. How do we do that? Thanks.
Yeah, looks good, please feel free to land.
**Summary** Post-op fusion can reduce data movement overhead and improve inference performance. This PR adds a fused `QLinearLeakyReLU` module for the onednn backend, which will be used for int8 inference with the onednn backend. Calling this module with other quantization backends throws an error. **Test plan** `python test_quantization.py TestStaticQuantizedModule` cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10
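To illustrate why post-op fusion cuts data movement, here is a pure-Python sketch (not the onednn kernel, and not quantized): the leaky_relu is applied to each accumulator before the output is written, so the activation needs no second pass over the output buffer. All function names here are illustrative.

```python
def leaky_relu(x, negative_slope=0.01):
    # LeakyReLU(x) = x if x >= 0 else negative_slope * x
    return x if x >= 0 else negative_slope * x

def fused_linear_leaky_relu(x, weight, bias, negative_slope=0.01):
    """Compute leaky_relu(x @ weight.T + bias) in a single pass."""
    out = []
    for w_row, b in zip(weight, bias):
        acc = sum(xi * wi for xi, wi in zip(x, w_row)) + b
        # Post-op fused in: activation applied before the store,
        # instead of a separate element-wise kernel afterwards.
        out.append(leaky_relu(acc, negative_slope))
    return out

# x @ W.T + b = 1*1 + 2*(-2) + 0 = -3, then leaky_relu gives -0.03.
y = fused_linear_leaky_relu([1.0, 2.0], [[1.0, -2.0]], [0.0])
```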
@pytorchbot merge
Merge failed. Reason: Approval needed from one of the following (Rule 'superuser'). Details for Dev Infra team: raised by workflow job.
Hi @jerryzh168, you did not approve this. Could you approve it? Thanks!
Sorry, missed this before. Please revert the changes to the torch/nn/intrinsic/ folder; we want to deprecate this namespace and should not add new things to it.
I have removed them.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
@pytorchbot revert -m="This is breaking tests. Need to rebase." -c=nosignal
❌ 🤖 pytorchbot command failed.
@pytorchbot revert -m="This is breaking tests. Need to rebase." -c=nosignal
@pytorchbot successfully started a revert job. Check the current status here.
@Xia-Weiwen your PR has been successfully reverted.
Revert "[Quant] Add fused LinearLeakyReLU module for onednn backend (#88661)". This reverts commit 353c2e7. Reverted #88661 on behalf of https://github.com/Xia-Weiwen due to: This is breaking tests. Need to rebase.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Stack from ghstack (oldest at bottom):

**Summary**
Post-op fusion can reduce data movement overhead and improve inference performance. This PR adds a fused `QLinearLeakyReLU` module for the onednn backend, which will be used for int8 inference with the onednn backend. Calling this module with other quantization backends throws an error.

**Test plan**
`python test_quantization.py TestStaticQuantizedModule`

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10