[Quant][PT2E] Enable qconv for quantization 2.0 export #104580
Conversation
✅ No failures as of commit e67e9bb (merge base 97a291f). Artifacts and rendered test results at hud.pytorch.org/pr/104580.
ghstack-source-id: e61ce96dea3554d80afba1785a11634c036482b7 Pull Request resolved: #104580
**Summary** Enable the `qconv1d/2d/3d`, `qconv2d_relu`, `qconv2d_add`, and `qconv2d_add_relu` operators for quantization 2.0 export with the oneDNN library.

**Test Plan**
```
python -u -m pytest -s -v test_quantized_op.py -k test_qconv1d_pt2e
python -u -m pytest -s -v test_quantized_op.py -k test_qconv2d_pt2e
python -u -m pytest -s -v test_quantized_op.py -k test_qconv3d_pt2e
python -u -m pytest -s -v test_quantized_op.py -k test_qconv2d_relu_pt2e
python -u -m pytest -s -v test_quantized_op.py -k test_qconv2d_add_pt2e
python -u -m pytest -s -v test_quantized_op.py -k test_qconv2d_add_relu_pt2e
```

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10
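For context, the qconv ops consume affine-quantized tensors. Below is a minimal pure-Python sketch of the quantize/dequantize round trip that underlies these operators; the helper names are hypothetical illustrations, not the PyTorch API:

```python
def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Affine quantization: q = clamp(round(x / scale) + zero_point)."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))

def dequantize(q, scale, zero_point):
    """Inverse mapping: x_hat = (q - zero_point) * scale."""
    return (q - zero_point) * scale

scale, zp = 0.05, 10
x = 1.23
q = quantize(x, scale, zp)          # int8-range representation of x
x_hat = dequantize(q, scale, zp)    # approximate reconstruction
# Round-trip error is bounded by half a quantization step.
assert abs(x - x_hat) <= scale / 2
```

The fused qconv variants (`qconv2d_relu`, `qconv2d_add`, ...) operate on the integer representation directly and fold the post-op into the requantization step, avoiding intermediate float tensors.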
ghstack-source-id: 3b60dda2f7fdcf0fbda54936efbb615dab5e212f Pull Request resolved: pytorch#104580
A few questions. In general, though, I think we should aim to make this a custom op rather than an op in the quantized ATen library. cc: @jerryzh168
@pytorchbot merge
Merge started: the change will be merged once all checks pass (ETA 0-4 hours).
…r constant folding (#104581) **Summary** Enable quantization conv weight prepack inside inductor constant folding. **Test Plan** ``` python -m pytest test_mkldnn_pattern_matcher.py -k test_qconv2d_unary ``` Pull Request resolved: #104581 Approved by: https://github.com/jgong5, https://github.com/eellison ghstack dependencies: #104580
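Conceptually, constant folding evaluates any operation whose inputs are all compile-time constants once, ahead of execution, so a weight-prepack op runs at compile time instead of on every forward pass. A toy sketch under that assumption (the node representation and `prepack` transform are hypothetical, not Inductor's):

```python
def constant_fold(nodes, env):
    """Evaluate nodes whose inputs are all known constants.

    `nodes` is a list of (name, fn, input_names); `env` maps names of
    already-known constants (e.g. a conv weight) to their values.
    Folded results are stored back into `env`; nodes that still depend
    on runtime inputs are returned unchanged.
    """
    remaining = []
    for name, fn, inputs in nodes:
        if all(i in env for i in inputs):
            env[name] = fn(*(env[i] for i in inputs))   # fold at compile time
        else:
            remaining.append((name, fn, inputs))        # keep for runtime
    return remaining

# A hypothetical "prepack" that transposes a weight into a blocked layout.
prepack = lambda w: list(zip(*w))
env = {"weight": [[1, 2], [3, 4]]}
nodes = [
    ("packed_w", prepack, ["weight"]),                  # foldable: weight is constant
    ("out", lambda x, w: x, ["input", "packed_w"]),     # not foldable: needs runtime input
]
leftover = constant_fold(nodes, env)
```

After the pass, `env["packed_w"]` holds the prepacked weight and only the runtime-dependent node remains in the graph.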
…ctor (#104588) **Summary** Enable the `dequant-conv2d-quant` pattern fusion and lowering inside inductor. **Test Plan** ``` python -m pytest test_mkldnn_pattern_matcher.py -k test_qconv2d_unary ``` Pull Request resolved: #104588 Approved by: https://github.com/jgong5, https://github.com/eellison ghstack dependencies: #104580, #104581
**Summary** Enable the `dequant pattern` promotion pass in inductor. The qconv weight prepack pass matches the `dequant->conv2d` pattern; if the dequant node has multiple user nodes, the match fails. Take the example of
```
    conv1
   /     \
conv2   conv3
```
After the quantization flow, the generated pattern is
```
 dequant1
    |
  conv1
    |
  quant2
    |
 dequant2
  /     \
conv2   conv3
```
We need to duplicate `dequant2` into `dequant2` and `dequant3` so that both the `dequant2->conv2` and `dequant3->conv3` patterns can be matched. **Test Plan** ``` python -m pytest test_mkldnn_pattern_matcher.py -k test_dequant_promotion ``` Pull Request resolved: #104590 Approved by: https://github.com/jgong5, https://github.com/eellison ghstack dependencies: #104580, #104581, #104588
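The duplication step can be sketched on a toy graph: a dequant node with several users is cloned so that each consumer ends up with its own single-user copy, which the pattern matcher can then match. The graph representation below is a hypothetical simplification, not Inductor's FX-based one:

```python
def promote_dequant(graph):
    """Clone any 'dequant' node with multiple users so that each user
    reads from its own single-user copy.

    `graph` maps node name -> (op, [input names]); users are inferred.
    """
    # Collect the users of every node, in graph order.
    users = {}
    for name, (op, inputs) in graph.items():
        for i in inputs:
            users.setdefault(i, []).append(name)

    for name in list(graph):                 # snapshot: we add nodes below
        op, inputs = graph[name]
        if op == "dequant" and len(users.get(name, [])) > 1:
            # The first user keeps the original; each later user gets a clone.
            for k, user in enumerate(users[name][1:], start=2):
                clone = f"{name}_{k}"
                graph[clone] = (op, list(inputs))
                uop, uinputs = graph[user]
                graph[user] = (uop, [clone if i == name else i for i in uinputs])
    return graph

g = {
    "dequant2": ("dequant", ["quant2"]),
    "conv2": ("conv", ["dequant2"]),
    "conv3": ("conv", ["dequant2"]),
}
promote_dequant(g)
# conv2 keeps dequant2; conv3 now reads from a clone, so both
# dequant->conv pairs are single-user and can be pattern-matched.
```

This is semantics-preserving because dequant is a pure elementwise op: duplicating it changes only the graph shape, not the computed values.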
… inside inductor (#105455) **Summary** Enable the `dequant-conv2d-unary_postop(relu)-quant` pattern fusion and lowering inside inductor. **Test Plan** ``` clear && python -m pytest test_mkldnn_pattern_matcher.py -k test_qconv2d_unary ``` Pull Request resolved: #105455 Approved by: https://github.com/jgong5, https://github.com/eellison ghstack dependencies: #104580, #104581, #104588, #104590
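The fusion rewrites a `dequant -> conv2d -> relu -> quant` chain into one fused node so the whole computation stays in the integer domain. A toy rewrite over a linear chain of op names gives the flavor (this is a hypothetical simplification, not Inductor's pattern matcher, which works on graphs):

```python
def fuse_chain(ops, pattern, fused_name):
    """Replace every contiguous occurrence of `pattern` in `ops`
    with the single fused op `fused_name`."""
    out, i = [], 0
    while i < len(ops):
        if ops[i:i + len(pattern)] == pattern:
            out.append(fused_name)      # collapse the matched window
            i += len(pattern)
        else:
            out.append(ops[i])
            i += 1
    return out

chain = ["quant", "dequant", "conv2d", "relu", "quant", "dequant"]
fused = fuse_chain(chain, ["dequant", "conv2d", "relu", "quant"], "qconv2d_relu")
```

After the rewrite the chain reads `["quant", "qconv2d_relu", "dequant"]`: the dequant/quant pair at the boundary is absorbed into the fused kernel, which applies relu as a post-op during requantization.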
…rn fusion inside inductor (#105456) **Summary** Enable the `dequant-conv2d-binary_postop(add)-unary_postop(relu)-quant` pattern fusion and lowering inside inductor. **Test Plan** ``` clear && python -m pytest test_mkldnn_pattern_matcher.py -k test_qconv2d_binary ``` Pull Request resolved: #105456 Approved by: https://github.com/jgong5, https://github.com/eellison ghstack dependencies: #104580, #104581, #104588, #104590, #105455
…ol2d) (#105639) **Summary** This PR mainly enables two things: - Enable the skeleton of the quantization recipe for single quantizable operators in `X86InductorQuantizer`. - Add the quantization recipe for `maxpool2d` and annotate it so that its input and output share an observer. **Test Plan** ``` python -m pytest test_x86inductor_quantizer.py -k test_maxpool2d_recipe ``` Pull Request resolved: #105639 Approved by: https://github.com/jgong5, https://github.com/jerryzh168 ghstack dependencies: #104580, #104581, #104588, #104590, #105455, #105456
**Summary** Enable the `dq-maxpool2d-q` pattern match and lower into `torch.ops.quantized.max_pool2d`. **Test Plan** ``` python -m pytest test_mkldnn_pattern_matcher.py -k test_qmaxpool2d python -m pytest test_quantized_op.py -k test_max_pool2d_pt2e ``` Pull Request resolved: #105906 Approved by: https://github.com/jgong5, https://github.com/eellison ghstack dependencies: #104580, #104581, #104588, #104590, #105455, #105456, #105639
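This lowering is valid because max-pooling commutes with the (monotone) affine quantization map when input and output share the same scale and zero point: pooling the integer values directly gives the same result as dequantize, pool, requantize. A pure-Python check of that identity, using 1-D pooling for brevity (helper names hypothetical):

```python
def dequant(q, scale, zp):
    return (q - zp) * scale

def quant(x, scale, zp):
    return round(x / scale) + zp

def maxpool1d(xs, k=2):
    # Non-overlapping window max (stride == kernel size).
    return [max(xs[i:i + k]) for i in range(0, len(xs) - k + 1, k)]

scale, zp = 0.1, 3
q_in = [5, -7, 12, 0, 8, 8]

# Path A: pool the quantized ints directly (what the fused op does).
path_a = maxpool1d(q_in)

# Path B: dequantize, pool in float, requantize (reference semantics).
path_b = [quant(v, scale, zp)
          for v in maxpool1d([dequant(q, scale, zp) for q in q_in])]

assert path_a == path_b  # identical because quantization is monotone
```

The shared-observer annotation from #105639 is what guarantees the same scale/zero point on both sides, which is exactly the precondition this identity needs.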
**Summary** After the oneDNN 3.1 upgrade, the weight scale reciprocal calculation is no longer needed. This PR removes the redundant reciprocal calculation to optimize QConv performance and uses the IDeep version API to implement it: - This QConv implementation works functionally both with the current IDeep version and with the following IDeep upgrade in PR #107565. - With the IDeep upgrade in PR #107565, QConv performs better since the redundant reciprocal calculations are removed. Pull Request resolved: #105996 Approved by: https://github.com/jgong5, https://github.com/jerryzh168 ghstack dependencies: #104580, #104581, #104588, #104590, #105455, #105456, #105639, #105906
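The two formulations are numerically equivalent; the change only removes the extra pass that materialized a reciprocal for every weight scale before handing it to the library. A toy scalar illustration (values hypothetical; in practice the weight scale is per output channel):

```python
# Requantizing an int32 conv accumulator to the int8 output range uses
# the factor (input_scale * weight_scale / output_scale).
acc = 1234                                   # hypothetical int32 accumulator
x_scale, w_scale, y_scale = 0.02, 0.005, 0.04

# Old path: first materialize the reciprocal of the weight scale,
# then divide by it inside the requantization.
w_scale_inv = 1.0 / w_scale
old = acc * x_scale / w_scale_inv / y_scale

# New path: use the weight scale directly, no reciprocal pass needed.
new = acc * x_scale * w_scale / y_scale

assert abs(old - new) < 1e-9   # same math, one fewer traversal of the scales
```

The saving is not in the arithmetic per element but in skipping the extra loop over all weight scales at prepack time.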
…ht scale reciprocal calculation (#107565) **Summary** Upgrade IDeep to pick up one IDeep change, intel/ideep#226, which does two things: - Removes the redundant QConv weight scale reciprocal calculation. - Bumps IDEEP_VERSION_REVISION from 0 to 1. Only QConv-related calculation is affected, and #105996 already uses the IDeep version API to make the corresponding change in PyTorch. Pull Request resolved: #107565 Approved by: https://github.com/jgong5, https://github.com/jerryzh168 ghstack dependencies: #104580, #104581, #104588, #104590, #105455, #105456, #105639, #105906, #105996
Pull Request resolved: #104580 Approved by: https://github.com/jgong5, https://github.com/jerryzh168
Stack from ghstack (oldest at bottom):
**Summary**

Enable the `qconv1d/2d/3d`, `qconv2d_relu`, `qconv2d_add`, and `qconv2d_add_relu` operators for quantization 2.0 export with the oneDNN library.

**Test Plan**

```
python -u -m pytest -s -v test_quantized_op.py -k test_qconv1d_pt2e
python -u -m pytest -s -v test_quantized_op.py -k test_qconv2d_pt2e
python -u -m pytest -s -v test_quantized_op.py -k test_qconv3d_pt2e
python -u -m pytest -s -v test_quantized_op.py -k test_qconv2d_relu_pt2e
python -u -m pytest -s -v test_quantized_op.py -k test_qconv2d_add_pt2e
python -u -m pytest -s -v test_quantized_op.py -k test_qconv2d_add_relu_pt2e
```
cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10