Changes to enable per channel requant. #37620
Conversation
Summary: Channel-wise quantization is now supported for linear/conv. Depthwise convs are still pending. Tests are altered to generate per-channel zero points and requant scales. All the kernels are fixed appropriately. Added a per_channel member to the conv_param structure, and replicated conv tests to exercise per-channel conv. This was not strictly needed, since the conv kernels were changed such that they did per-channel anyway; when per-channel is not needed, the zero point and scale are simply the same across channels. This was done to minimize code duplication, as the perf impact is estimated (though still to be measured) to be low. However, this is likely not the case for depthwise convs, so they will have separate kernels, which required us to introduce the per_channel member to the conv_param structure to know which kernels to apply for depthwise. The ensuing modifications were to keep everything in sync for both regular and depthwise conv, so that when reading the code there is no caveat about why depthwise has a separate per-channel test while non-depthwise conv does not. Test Plan: Via tests inside qnnpack, i.e., q8gemm-test, q8conv/dwconv-test, fully-connected-test, convolution-test. Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]
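As a reader's aid, the requantization step the summary describes can be sketched as a small helper that looks up the scale and zero point by output channel. This is a hypothetical illustration, not QNNPACK's actual API; all names here are invented for the sketch.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>

// Illustrative sketch: requantize one int32 accumulator back to uint8,
// indexing the per-channel requant scale and output zero point by
// output channel. Names are hypothetical, not QNNPACK's symbols.
inline std::uint8_t requantize_per_channel(
    std::int32_t acc,
    const float* requant_scales,        // one requant scale per output channel
    const std::uint8_t* output_zps,     // one zero point per output channel
    std::size_t output_channel_index) {
  float scaled = static_cast<float>(acc) * requant_scales[output_channel_index];
  long rounded = std::lrintf(scaled) +
                 static_cast<long>(output_zps[output_channel_index]);
  if (rounded < 0) rounded = 0;         // clamp to the uint8 range
  if (rounded > 255) rounded = 255;
  return static_cast<std::uint8_t>(rounded);
}
```

When the per-channel path is not needed, the same arrays simply hold one repeated value, which is why a single kernel can serve both modes.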
💊 CI failures summary and remediations. As of commit dcc721f (more details on the Dr. CI page):
🕵️ 1 new failure recognized by patterns. The following CI failure does not appear to be due to upstream breakages: pytorch_windows_vs2019_py36_cuda10.1_build (1/1). Step: "Build" (full log | diagnosis details | 🔁 rerun)
Summary: Channel-wise quantization is now supported for linear/conv. Depthwise convs are still pending. The approach is the same as with zero points: a pointer to the requantization array is passed, and it is looked up using output_channel_index. All the kernels are appropriately modified except for the depthwise ones. Tests are altered to generate per-channel zero points and requant scales. Unit tests are replicated for conv to exercise per-channel conv. This was not strictly needed, since the conv kernels were changed such that they did per-channel anyway; when per-channel is not needed, the zero point and requant scale are simply the same across channels. However, for depthwise convolutions we will be using a different set of kernels to do per-channel, which required us to introduce the per_channel member to the conv_param structure to know which kernels to use for depthwise conv. The ensuing modifications were to keep everything in sync for both regular and depthwise conv, so that when reading the code there is no caveat about why depthwise has a separate per-channel test while non-depthwise conv does not. Test Plan: Via tests inside qnnpack, i.e., q8gemm-test, q8conv/dwconv-test, fully-connected-test, convolution-test. Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D21339041](https://our.internmc.facebook.com/intern/diff/D21339041) [ghstack-poisoned]
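The role of the per_channel flag in the conv parameters can be illustrated with a toy dispatch sketch: regular conv kernels handle both modes in one code path, while depthwise needs the flag to pick a dedicated kernel variant. The struct and kernel names below are hypothetical, not the actual QNNPACK symbols.

```cpp
#include <string>

// Toy sketch of kernel selection driven by a per_channel member.
struct ConvParamsSketch {
  bool depthwise;
  bool per_channel;
};

std::string pick_kernel(const ConvParamsSketch& p) {
  if (!p.depthwise) {
    // One kernel covers both modes: with uniform quantization, every
    // channel just shares the same zero point and requant scale.
    return "q8conv";
  }
  // Depthwise gets a separate per-channel kernel variant.
  return p.per_channel ? "q8dwconv_per_channel" : "q8dwconv";
}
```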
Looked at all but the microkernels and it looks good except for minor comments.
@@ -505,8 +505,10 @@ at::Tensor PackedConvWeightsQnnp<kSpatialDim>::apply_impl(
double act_input_scale = act_nhwc.q_scale();
Revert?
@@ -5,6 +5,61 @@
#include <cstdlib>

namespace qnnpack {
// For runtime quantization packing.
Is runtime quantization the same as dynamic quantization? If so, maybe a follow-up diff to unify the naming.
Dynamic quantization is when you quantize your input data every single time and dequantize the output to fp32. Runtime quantization is the other mode. The only reason it is called "runtime" is the way QNNPACK was integrated into PyTorch.
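The "dynamic" mode described above can be sketched as choosing the input's quantization parameters at every call from its observed min/max (the output is later dequantized to fp32). This is a toy illustration under that description, not QNNPACK's API; the struct and function names are invented.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical sketch: pick asymmetric uint8 quantization parameters
// for an fp32 input at call time, as dynamic quantization does.
struct QuantParams {
  float scale;
  std::uint8_t zero_point;
};

QuantParams choose_dynamic_params(const std::vector<float>& x) {
  // Include 0 in the range so real zero is exactly representable.
  float lo = std::min(0.0f, *std::min_element(x.begin(), x.end()));
  float hi = std::max(0.0f, *std::max_element(x.begin(), x.end()));
  float scale = (hi - lo) / 255.0f;
  if (scale == 0.0f) scale = 1.0f;  // degenerate all-zero input
  float zp = std::round(-lo / scale);
  zp = std::min(255.0f, std::max(0.0f, zp));
  return {scale, static_cast<std::uint8_t>(zp)};
}
```

In the "runtime" mode, by contrast, these parameters are fixed ahead of time and reused across calls.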
if (kzp != 0) {
// This part fills the packed weights with zero points for output channels
// when they are not divisible by the nr blocking parameter.
// In that case
Missing the end of this comment?
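For context, the padding idea the truncated comment gestures at can be sketched with a toy packing layout (not the real packing routine; names are illustrative): when the output-channel count is not a multiple of the nr blocking parameter, the tail of the packed buffer is filled with the kernel zero point so the microkernel's fixed-width loads read neutral values.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy sketch: round the channel count up to a multiple of nr and fill
// the padded tail with the kernel zero point.
std::vector<std::uint8_t> pack_with_zp_padding(
    const std::vector<std::uint8_t>& per_channel_values,
    std::size_t nr,
    std::uint8_t kernel_zero_point) {
  std::size_t padded = (per_channel_values.size() + nr - 1) / nr * nr;
  std::vector<std::uint8_t> packed(padded, kernel_zero_point);
  std::copy(per_channel_values.begin(), per_channel_values.end(),
            packed.begin());
  return packed;
}
```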
Got halfway through, will continue tomorrow.
@@ -57,8 +57,12 @@ BEGIN_FUNCTION pytorch_q8conv_ukernel_8x8__aarch64_neon
LDR x9, [sp]
# Load pointer to per channel zero points array
LDR x10, [x8]
# To go to a_zero_point
ADD x8, x8, 8
LDR x10, [x8], 8
Doesn't compile on Android.
ADD x8, x8, 4
# Load pointer to per channel requant scale
LDR x10, [x8, 8]!
ADD x8, x8, 8
LD1R {v24.8b}, [x8], 8
LDR x10, [x8], 8
Same problem as before with LDRs.
// - v26 = requantization_scale channels 0-3
// - v31 = requantization_scale channels 4-7
LD1 {v26.4s}, [x10], 16
LD1 {v30.4s}, [x10]
This is a long shot, but I am wondering if you can do this and remove the LSL x9, x9, 2 and ADD x10, x10, x9 above, in case the values of x9 and x10 do not matter after this point:
LD1 {v26.4s}, [x10], x9, lsl 2
LD1 {v30.4s}, [x10, 16]
Does that form exist? I checked, but it does not seem to me that a variant of LD1 exists where the shift can be folded in.
It does, at least for LDR and STR. Check section 3, titled "Offset form: Scaled register as the offset" of this document here.
Yes, for LDR there is, but not for LD1. I think I can still use this in the base + offset calculation done via ADD.
LGTM. thanks.
@@ -100,14 +108,10 @@ BEGIN_FUNCTION pytorch_q8conv_ukernel_4x8__aarch32_neon
# Load a_zero_point:
# - d14 = a_zero_point
VLD1.8 {d14[]}, [r9]
ADD r9, r9, 4
# add 8 bytes to get to vfmax
12?
Sorry, I did not fix it here. It gets fixed by the perf-related PR.
LD1R {v26.4s}, [x8], 4
// - v26 = requantization_scale channels 0-3
// - v27 = requantization_scale channels 4-7
LD1 {v26.4s}, [x17], 16
At a glance it seems to me that you can save a couple of instructions in the block above if interested. It basically boils down to modifying the pointer arithmetic to take advantage of the free offsetting in the loads.
Yes, the pointer arithmetic can save the LSL by folding it in. For LD1 I don't think so, since it has only no-offset and post-index variants.
I am going to make these changes in the perf-optimization PR.
Summary: Now channel wise quantization is supported for linear/conv. Depthwise convs are still pending. Approach is same as with zero points. Pointer to requantization array is passed and it is looked up using output_channel_index. All the kernels are appropriately modified except for the depthwise ones. Tests are altered to generate per channel zero points and requant scales. Unit tests are replicated for conv tests to exercise per_channel conv. This was not strictly needed since conv kernels were changed such that they did per channel anyway. When per channels is not needed zero point and requant scale were same across channels. However for depthwise convolutions we will be using different set of kernels to do per channel, which required us to introduce per_channel member to conv_param structure, to know which kernels to use for depthwise conv. Ensuing modifications were to keep everything in sync for both regular conv and depthwise so that we dont have caveat when reading the code, that why does depthwise have separate test for per channel and non-depthwise conv does not. Test Plan: Via tests inside qnnpack, i.e., q8gemm-test, q8conv/dwconv test. fully-conntected-test, convolution-test. Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D21339041](https://our.internmc.facebook.com/intern/diff/D21339041) [ghstack-poisoned]
Summary: Now channel wise quantization is supported for linear/conv. Depthwise convs are still pending. Approach is same as with zero points. Pointer to requantization array is passed and it is looked up using output_channel_index. All the kernels are appropriately modified except for the depthwise ones. Tests are altered to generate per channel zero points and requant scales. Unit tests are replicated for conv tests to exercise per_channel conv. This was not strictly needed since conv kernels were changed such that they did per channel anyway. When per channels is not needed zero point and requant scale were same across channels. However for depthwise convolutions we will be using different set of kernels to do per channel, which required us to introduce per_channel member to conv_param structure, to know which kernels to use for depthwise conv. Ensuing modifications were to keep everything in sync for both regular conv and depthwise so that we dont have caveat when reading the code, that why does depthwise have separate test for per channel and non-depthwise conv does not. Test Plan: Via tests inside qnnpack, i.e., q8gemm-test, q8conv/dwconv test. fully-conntected-test, convolution-test. Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D21339041](https://our.internmc.facebook.com/intern/diff/D21339041) [ghstack-poisoned]
This pull request has been merged in 1f16d4c. |
Stack from ghstack:
Summary:
Channel-wise quantization is now supported for linear and conv. Depthwise convs are still pending.
The approach is the same as with zero points: a pointer to the requantization-scale array is passed into the kernel and indexed with output_channel_index.
All the kernels are modified appropriately except the depthwise ones.
Tests are altered to generate per-channel zero points and requantization scales.
The conv unit tests are replicated to exercise per_channel conv. This was not strictly needed, since the conv kernels were changed to operate per channel anyway; when per-channel quantization is not needed, the zero point and requantization scale are simply identical across channels.
Depthwise convolutions, however, will use a separate set of kernels for the per-channel case, which required introducing a per_channel member in the conv_param structure so that the correct depthwise kernels can be selected.
The remaining modifications keep regular conv and depthwise conv in sync, so a reader is not left wondering why depthwise has a separate per-channel test while non-depthwise conv does not.
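The kernel selection that the new per_channel member enables can be sketched roughly as follows. This is a hypothetical illustration, not QNNPACK code: ConvParam and the kernel names are placeholders; only the per_channel flag itself comes from this change.

```python
from dataclasses import dataclass

@dataclass
class ConvParam:
    groups: int
    per_channel: bool  # new member introduced by this change

def select_dwconv_kernel(p: ConvParam) -> str:
    # Placeholder kernel names, not real QNNPACK symbols: with per_channel
    # set, setup picks the depthwise kernel that looks up a scale and zero
    # point per output channel; otherwise the original kernel is used.
    return "q8dwconv_per_channel" if p.per_channel else "q8dwconv"
```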
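The per-channel requantization described above can be sketched as follows. This is a minimal NumPy illustration of the idea, not the actual fixed-point QNNPACK kernel: the function name and shapes are assumptions, but the per-output-channel scale/zero-point lookup mirrors the output_channel_index indexing the change introduces.

```python
import numpy as np

def requantize(acc, scales, zero_points, qmin=0, qmax=255):
    # acc: (out_channels, n) int32 accumulators from the GEMM/conv.
    # scales / zero_points: one entry per output channel.
    out = np.empty(acc.shape, dtype=np.uint8)
    for c in range(acc.shape[0]):  # the output_channel_index lookup
        q = np.rint(acc[c] * scales[c]) + zero_points[c]
        out[c] = np.clip(q, qmin, qmax).astype(np.uint8)
    return out

acc = np.array([[100, -50], [400, 30]], dtype=np.int32)
# Distinct per-channel parameters...
per_ch = requantize(acc, scales=[0.5, 0.1], zero_points=[10, 128])
# ...versus identical parameters across channels, which reproduces
# per-tensor behavior -- matching the note above that uniform
# zero point and scale make the per-channel path equivalent.
uniform = requantize(acc, scales=[0.5, 0.5], zero_points=[10, 10])
```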
Test Plan:
Via the tests inside QNNPACK, i.e., q8gemm-test, q8conv-test, q8dwconv-test,
fully-connected-test, and convolution-test.
Reviewers:
Subscribers:
Tasks:
Tags:
Differential Revision: D21339041