[Quant][PT2E] Enable qconv for quantization 2.0 export (#104580)
**Summary**
Enable the `qconv1d/2d/3d`, `qconv2d_relu`, `qconv2d_add`, and `qconv2d_add_relu` operators for quantization 2.0 export with the oneDNN library.
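The fused variants combine convolution with a binary add and/or a ReLU post-op in a single kernel. A minimal fp32 reference of the fused pattern is sketched below; the function name and flat-list representation are assumptions for illustration only, not the oneDNN kernel:

```python
def fuse_add_relu(conv_out, accum):
    """Reference for the qconv2d_add_relu pattern: out = relu(conv(x, w) + accum).

    conv_out: flattened fp32 convolution output (already computed)
    accum:    flattened fp32 tensor added element-wise before the ReLU
    """
    return [max(0.0, c + a) for c, a in zip(conv_out, accum)]
```

Dropping the `max` gives the `qconv2d_add` pattern, and dropping the `accum` term gives `qconv2d_relu`.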

**Test Plan**
```
python -u -m pytest -s -v test_quantized_op.py -k test_qconv1d_pt2e
python -u -m pytest -s -v test_quantized_op.py -k test_qconv2d_pt2e
python -u -m pytest -s -v test_quantized_op.py -k test_qconv3d_pt2e
python -u -m pytest -s -v test_quantized_op.py -k test_qconv2d_relu_pt2e
python -u -m pytest -s -v test_quantized_op.py -k test_qconv2d_add_pt2e
python -u -m pytest -s -v test_quantized_op.py -k test_qconv2d_add_relu_pt2e
```

Pull Request resolved: #104580
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
leslie-fang-intel authored and pytorchmergebot committed Aug 25, 2023
1 parent 679e8e9 commit 8ef0572
Showing 5 changed files with 1,093 additions and 0 deletions.
36 changes: 36 additions & 0 deletions aten/src/ATen/native/quantized/cpu/OnednnUtils.h
@@ -379,4 +379,40 @@ static bool should_use_onednn_quant(

} // onednn_utils

at::Tensor _qconv_prepack_onednn(
at::Tensor weight, // from CPU backend instead of QuantizedCPU
at::Tensor weight_scales, // Weight zero points must be 0 for onednn
double input_scale,
int64_t input_zero_point,
torch::List<int64_t> stride,
torch::List<int64_t> padding,
torch::List<int64_t> dilation,
int64_t groups,
c10::optional<torch::List<int64_t>> input_shape=c10::nullopt);

static at::Tensor _quantized_convolution_onednn(
at::Tensor act, // contains quantized values but not QTensor
double act_scale,
int64_t act_zero_point,
at::Tensor weight, // MKLDNN tensor with quantized values
at::Tensor weight_scales,
at::Tensor weight_zero_points,
c10::optional<at::Tensor> bias, // Bias is packed if not None
torch::List<int64_t> stride,
torch::List<int64_t> padding,
torch::List<int64_t> dilation,
bool transposed,
int64_t groups,
double inv_output_scale,
int64_t output_zero_point,
c10::optional<at::Tensor> accum=c10::nullopt, // accum to be fused with conv add
double accum_scale=1.0,
int64_t accum_zero_point=0,
bool fp32_output=false,
c10::optional<c10::string_view> binary_attr=c10::nullopt,
c10::optional<at::Scalar> binary_alpha=c10::nullopt,
c10::optional<c10::string_view> unary_attr=c10::nullopt,
torch::List<c10::optional<at::Scalar>> unary_scalars=torch::List<c10::optional<at::Scalar>>(),
c10::optional<c10::string_view> unary_algorithm=c10::nullopt);

#endif // #if AT_MKLDNN_ENABLED()
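The `_quantized_convolution_onednn` signature takes the activation and weight scales plus an `inv_output_scale` (the reciprocal of the output scale), which implies a standard requantization step on the int32 accumulator. A minimal Python sketch of that arithmetic for a single output element follows; the function name, per-tensor (rather than per-channel) scaling, and int8 clamp range are assumptions for illustration:

```python
def requantize_output(int32_acc, act_scale, weight_scale,
                      inv_output_scale, output_zero_point,
                      qmin=-128, qmax=127):
    """Map an int32 conv accumulator back to a quantized int8 value.

    int32_acc is sum((q_act - act_zp) * q_w) over the receptive field
    (weight zero points are 0 for onednn, per the header comment).
    Multiplying by act_scale * weight_scale recovers the fp32 value;
    inv_output_scale (1 / output_scale) then requantizes it.
    """
    fp32_val = int32_acc * act_scale * weight_scale
    q = round(fp32_val * inv_output_scale) + output_zero_point
    return max(qmin, min(qmax, q))
```

Passing the inverse of the output scale lets the kernel use a multiply instead of a divide in the hot loop.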
