
Changes to enable per channel requant. #37620

Closed
wants to merge 22 commits into from

Conversation

kimishpatel
Contributor

@kimishpatel kimishpatel commented Apr 30, 2020

Stack from ghstack:

Summary:
Channel-wise quantization is now supported for linear/conv. Depthwise convs are still pending.
The approach is the same as for zero points: a pointer to the requantization scale array is passed in and indexed with output_channel_index.
All the kernels are modified accordingly except the depthwise ones.
Tests are altered to generate per-channel zero points and requant scales.
The conv unit tests are replicated to exercise per_channel conv. This was not strictly needed, since the conv kernels were changed to always compute per channel; when per-channel quantization is not requested, the zero point and requant scale are simply the same across all channels.
However, depthwise convolutions will use a separate set of kernels for the per-channel path, which required introducing a per_channel member in the conv_param structure so that we know which kernels to use for depthwise conv.
The remaining modifications keep regular conv and depthwise conv in sync, so the reader is not left wondering why depthwise has a separate per-channel test while non-depthwise conv does not.
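
For illustration, here is a minimal C sketch of the lookup described above. It is not the actual QNNPACK code; the struct layout and the names (requant_params, requantization_scales, per_channel) are assumptions that mirror the description.

```c
#include <math.h>     /* lrintf */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical quantization params: when per_channel is false, the scale
 * array holds a single value shared by every output channel. */
struct requant_params {
  const float* requantization_scales; /* one per output channel (or length 1) */
  uint8_t output_zero_point;
  bool per_channel;
};

/* Requantize one int32 accumulator for a given output channel.  The kernel
 * picks the scale by output_channel_index, mirroring how the per-channel
 * kernel zero-point pointer is already looked up. */
static inline uint8_t requantize(int32_t acc, size_t output_channel_index,
                                 const struct requant_params* params) {
  const size_t idx = params->per_channel ? output_channel_index : 0;
  long q = lrintf((float)acc * params->requantization_scales[idx]);
  q += params->output_zero_point;
  if (q < 0) q = 0;
  if (q > 255) q = 255;
  return (uint8_t)q;
}
```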

Test Plan:
Via tests inside qnnpack, i.e., q8gemm-test, the q8conv/q8dwconv tests,
fully-connected-test, and convolution-test.

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: D21339041

Summary:
Channel-wise quantization is now supported for linear/conv.
Depthwise convs are still pending.
Tests are altered to generate per-channel zero points and requant scales.
All the kernels are fixed appropriately.
Added a per_channel member to the conv_param structure,
and replicated the conv tests to exercise per_channel conv.
This was not strictly needed, since the conv kernels were changed such
that they always compute per channel; when per-channel quantization is
not needed, the zero point and scale are simply the same across channels.
This was done to minimize code duplication, as the perf impact is
estimated (though still to be measured) to be low.
However, this is likely not the case for depthwise convs, so they will
have separate kernels, which required introducing the per_channel member
in the conv_param structure to know which kernels to apply for depthwise.
The remaining modifications keep everything in sync for both regular conv
and depthwise, so the reader is not left wondering why depthwise has a
separate per-channel test while non-depthwise conv does not.

Test Plan:
Via tests inside qnnpack, i.e., q8gemm-test, the q8conv/q8dwconv tests,
fully-connected-test, and convolution-test.

Reviewers:

Subscribers:

Tasks:

Tags:

[ghstack-poisoned]
@dr-ci

dr-ci bot commented Apr 30, 2020

💊 CI failures summary and remediations

As of commit dcc721f (more details on the Dr. CI page):


  • 2/2 failures possibly* introduced in this PR
    • 1/2 non-CircleCI failure(s)

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_windows_vs2019_py36_cuda10.1_build (1/1)

Step: "Build" (full log | diagnosis details | 🔁 rerun)

FAILED: caffe2/python/operator_test/elementwise_linear_op_test.py
cmd.exe /C "cd /D C:\Users\circleci\project\build\caffe2 && C:\Jenkins\Miniconda3\Library\bin\cmake.exe -E copy C:/Users/circleci/project/caffe2/python/operator_test/elementwise_linear_op_test.py C:/Users/circleci/project/build/caffe2/python/operator_test/elementwise_linear_op_test.py"

ci.pytorch.org: 1 failed



@kimishpatel kimishpatel mentioned this pull request May 5, 2020
xuezhou1998 pushed a commit to xuezhou1998/new_pytorch that referenced this pull request May 9, 2020
Contributor

@dreiss dreiss left a comment

Looked at all but the microkernels and it looks good except for minor comments.

@@ -505,8 +505,10 @@ at::Tensor PackedConvWeightsQnnp<kSpatialDim>::apply_impl(

double act_input_scale = act_nhwc.q_scale();


Contributor

Revert?

@@ -5,6 +5,61 @@
#include <cstdlib>

namespace qnnpack {
// For runtime quantization packing.
Contributor

Is runtime quantization the same as dynamic quantization? If so, maybe a follow-up diff to unify the naming.

Contributor Author

Dynamic quantization is when you quantize your input data on every call and dequantize the output back to fp32. Runtime quantization is the other mode. The only reason it is called runtime is the way QNNPACK was integrated into PyTorch.
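
To make the contrast concrete, here is a hedged sketch in plain C; none of these helpers are QNNPACK's actual API, and the names are illustrative only.

```c
#include <math.h>
#include <stdint.h>

/* Dynamic quantization: float activations in, float result out.  The
 * activations are quantized with a scale computed on every call, and the
 * int32 accumulator is dequantized back to fp32. */
static float dot_dynamic(const float* x, const uint8_t* w_q, int32_t w_zp,
                         float w_scale, int n) {
  float xmax = 1e-6f;
  for (int i = 0; i < n; i++) xmax = fmaxf(xmax, fabsf(x[i]));
  const float x_scale = xmax / 127.0f;      /* per-call activation scale */
  int32_t acc = 0;
  for (int i = 0; i < n; i++) {
    const int32_t xq = (int32_t)lrintf(x[i] / x_scale);
    acc += xq * ((int32_t)w_q[i] - w_zp);
  }
  return (float)acc * x_scale * w_scale;    /* dequantize to fp32 */
}

/* "Runtime" quantization in the sense used here: activations arrive already
 * quantized and the output stays quantized; the name only reflects how
 * QNNPACK was integrated into PyTorch. */
static uint8_t dot_runtime(const uint8_t* x_q, int32_t x_zp,
                           const uint8_t* w_q, int32_t w_zp,
                           float requant_scale, int32_t out_zp, int n) {
  int32_t acc = 0;
  for (int i = 0; i < n; i++)
    acc += ((int32_t)x_q[i] - x_zp) * ((int32_t)w_q[i] - w_zp);
  long q = lrintf((float)acc * requant_scale) + out_zp;
  if (q < 0) q = 0;
  if (q > 255) q = 255;
  return (uint8_t)q;
}
```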

if (kzp != 0) {
// This part fills the packed wights with zero points for output channels
// when they are not divisble by nr blocking parameter.
// In that case
Contributor

Missing the end of this comment?
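
For context on the quoted comment, a hedged sketch of the idea it describes (not the actual packing routine; the block layout and the scalar kzp parameter are assumptions): when the number of output channels is not a multiple of the NR blocking parameter, the unused rows of the packed block are filled with the kernel zero point, so the padded lanes hold weights equal to the zero point and contribute nothing after zero-point subtraction.

```c
#include <stddef.h>
#include <stdint.h>

/* Hedged sketch: pad the unused output-channel rows of one NR x KR packed
 * block with the (scalar) kernel zero point kzp.  The real packing code is
 * more involved; this only illustrates the padding the comment refers to. */
static void pad_packed_block(uint8_t* packed_block, size_t nr, size_t kr,
                             size_t valid_rows, uint8_t kzp) {
  for (size_t oc = valid_rows; oc < nr; oc++) { /* rows past the last real channel */
    for (size_t k = 0; k < kr; k++) {
      packed_block[oc * kr + k] = kzp;          /* weight == zero point */
    }
  }
}
```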

Contributor

@AshkanAliabadi AshkanAliabadi left a comment

Got half way through, will continue tomorrow.

@@ -57,8 +57,12 @@ BEGIN_FUNCTION pytorch_q8conv_ukernel_8x8__aarch64_neon
LDR x9, [sp]
# Load pointer to per channel zero points array
LDR x10, [x8]
# To go to a_zero_point
ADD x8, x8, 8
Contributor

LDR x10, [x8], 8

Contributor Author

Doesn't compile on Android.

ADD x8, x8, 4
# Load pointer to per channel requant scale
LDR x10, [x8, 8]!
ADD x8, x8, 8
Contributor

LD1R {v24.8b}, [x8], 8
LDR x10, [x8], 8

Contributor Author

Same problem as before with LDRs.

// - v26 = requantization_scale channels 0-3
// - v31 = requantization_scale channels 4-7
LD1 {v26.4s}, [x10], 16
LD1 {v30.4s}, [x10]
Contributor

This is a long shot, but I am wondering if you can do this and remove LSL x9, x9, 2 and ADD x10, x10, x9 above, in case the values of x9 and x10 do not matter after this point:

LD1 {v26.4s}, [x10], x9, lsl 2
LD1 {v30.4s}, [x10, 16]

Contributor Author

Does that form exist? I checked, but it does not seem to me that a variant of LD1 exists where the shift can be folded in.

Contributor

It does, at least for LDR and STR. Check section 3, titled "Offset form: Scaled register as the offset" of this document here.

Contributor Author

Yes, for LDR there is, but not for LD1. I think I can use this in the base + offset calculation done via ADD, though.

Contributor

@AshkanAliabadi AshkanAliabadi left a comment

LGTM. thanks.

@@ -100,14 +108,10 @@ BEGIN_FUNCTION pytorch_q8conv_ukernel_4x8__aarch32_neon
# Load a_zero_point:
# - d14 = a_zero_point
VLD1.8 {d14[]}, [r9]
ADD r9, r9, 4
# add 8 bytes to get to vfmax
Contributor

12?

Contributor Author

Sorry, I did not fix it here. It gets fixed by the perf-related PR.


LD1R {v26.4s}, [x8], 4
// - v26 = requantization_scale channels 0-3
// - v27 = requantization_scale channels 4-7
LD1 {v26.4s}, [x17], 16

Contributor

At a glance it seems to me that you can save a couple of instructions in the block above if interested. It basically boils down to modifying the pointer arithmetic to take advantage of the free offsetting in the loads.

Contributor Author

Yes, the pointer arithmetic can save the LSL by folding it in. For LD1 I don't think so, since it only has no-offset and post-index variants.

Contributor Author

I am going to make these changes in the perf-opt PR.

@facebook-github-bot
Contributor

This pull request has been merged in 1f16d4c.

@facebook-github-bot facebook-github-bot deleted the gh/kimishpatel/14/head branch May 24, 2020 14:15