[AOTI] add C shim for QConvPointWise #138540
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/138540
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (1 Unrelated Failure) As of commit 8177bed with merge base 07b0d63.
BROKEN TRUNK - The following job failed but was also present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Some of the APIs definitely felt clunky when I worked on adding ABI compatibility support for them. Thanks for making them better.
// Conv2D with binary postop
m.def(TORCH_SELECTIVE_SCHEMA("onednn::qconv2d_pointwise.binary(Tensor qx, float x_scale, int x_zero_point, Tensor qaccum, float accum_scale, int accum_zero_point, Tensor qw, Tensor w_scale, Tensor w_zero_point, Tensor? bias, int[] stride, int[] padding, int[] dilation, int groups, float output_scale, int output_zero_point, ScalarType? output_dtype, str binary_attr, Scalar? alpha, str? unary_attr, Scalar?[] unary_scalars, str? unary_algorithm) -> Tensor"));
m.def(TORCH_SELECTIVE_SCHEMA("onednn::qconv2d_pointwise.binary(Tensor qx, float x_scale, int x_zero_point, Tensor qw, Tensor w_scale, Tensor w_zero_point, Tensor qaccum, Tensor? bias, int[] stride, int[] padding, int[] dilation, int groups, float output_scale, int output_zero_point, ScalarType? output_dtype, float accum_scale, int accum_zero_point, str binary_attr, Scalar? alpha, str? unary_attr, Scalar?[] unary_scalars, str? unary_algorithm) -> Tensor"));
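// Note: per the PR description, the first schema above is the previous one and the
// second is the updated one: qaccum now follows the weight arguments (qw, w_scale,
// w_zero_point), and accum_scale / accum_zero_point now follow output_dtype,
// mirroring the parameter order of qlinear_pointwise.binary.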
Is it safe to directly change this one?
I guess my question is, what is the BC policy for onednn ops?
Another option is to add something like a v2 version of this op to avoid the BC break.
Regarding `qconv2d_pointwise`: since the CPU quantization support in Inductor is still a prototype feature, I guess it should be fine to make an API change since it's not yet stable. May I know if the current way looks fine to you, or would you prefer adding a v2 version?
Since this is an onednn-specific op, I am OK with your decision.
ghstack-source-id: ae23a41
Pull Request resolved: pytorch#138540
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
This PR adds a C shim for `QConvPointWisePT2E` and `QConvPointWiseBinaryPT2E`, similar to pytorch#138439. Besides that, we aligned the implementation of `qconv_pointwise` with `qlinear_pointwise` in the following aspects:
1. The parameter orders of `qconv_pointwise` and `qlinear_pointwise` were quite different, so we aligned the schema of `qconv_pointwise` to use a parameter order similar to `qlinear_pointwise`, making the two more consistent.
2. We now always convert `x_scale` and `x_zero_point` to Tensors, just like in the lowering of `qlinear_pointwise`. This avoids the need for two separate C APIs (one for `double x_scale` and `int64_t x_zero_point`, and another for `Tensor` versions); instead, a single API taking `Tensor`-based `x_scale` and `x_zero_point` suffices. If we later add dynamic quantization for qconv (which will use `Tensor` for `x_scale` and `x_zero_point`), we can reuse the code from this PR without changing the C shim layer API.

Pull Request resolved: pytorch#138540
Approved by: https://github.com/jgong5, https://github.com/desertfire
ghstack dependencies: pytorch#138691, pytorch#138806
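To illustrate the second point above, here is a minimal sketch of the scalar-to-Tensor idea; the variable names and dtypes are placeholders chosen for illustration and are not taken from this PR:

```python
import torch

# Illustrative only: wrap statically known scalar quantization parameters into
# 0-dim tensors, so the C shim only ever sees Tensor-typed x_scale / x_zero_point.
x_scale = 0.05        # hypothetical static scale
x_zero_point = 128    # hypothetical static zero point

x_scale_t = torch.tensor(x_scale, dtype=torch.float32)
x_zero_point_t = torch.tensor(x_zero_point, dtype=torch.int64)

# With dynamic quantization, x_scale_t / x_zero_point_t would instead be computed
# at runtime from the activation, and the same Tensor-based shim signature would
# still apply, with no change to the C ABI.
print(x_scale_t.item(), x_zero_point_t.item())
```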
Stack from ghstack (oldest at bottom):
`len(serialized_weights)` when calculating `consts_size` #139054

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov