Add operator support for dynamic quant on mobile #32479
Conversation
Summary: Run dynamic quantization on mobile (similar to FBGEMM). Currently only implemented on linear operator Test Plan: python test/test_quantized.py TestDynamicQuantizedLinear.test_qlinear Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]
Should we also add benchmarks under r/benchmarks/operator_benchmark/pt/?
```cpp
double scale =
    (std::max(max, 0.f) - std::min(min, 0.f)) / ((double)qmax - qmin);
if (scale == 0) {
  scale = 0.1;
}
```
Is there a reason for this magic number?
FBGEMM uses this to avoid divide-by-zero errors.
The question is why 0.1? Why not 1.0, or `qmax - qmin`? I don't have a preference one way or another, but this case happens only for all-zeros tensors, and I wonder if we should standardize it everywhere?
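To make the case concrete, here is a minimal Python sketch of the scale computation quoted above (the function name and defaults are hypothetical, not the actual mobile kernel):

```python
def choose_scale(minimum: float, maximum: float, qmin: int = 0, qmax: int = 255) -> float:
    """Sketch of the quoted scale computation: extend the observed range
    to include zero, then divide by the quantized range."""
    minimum = min(minimum, 0.0)  # zero must be exactly representable
    maximum = max(maximum, 0.0)
    scale = (maximum - minimum) / float(qmax - qmin)
    if scale == 0.0:
        # All-zeros tensor: any positive value avoids division by zero
        # downstream; 0.1 is the magic number under discussion.
        scale = 0.1
    return scale
```

Only an all-zeros tensor hits the fallback, which is why the choice of 0.1 versus 1.0 is cosmetic as long as it is standardized.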
```cpp
int32_t nudged_zero_point = 0;
if (initial_zero_point < qmin) {
  nudged_zero_point = qmin;
} else if (initial_zero_point > qmax) {
  nudged_zero_point = qmax;
} else {
  nudged_zero_point = nearbyint(initial_zero_point);
}
```
Can you give an example of when this part is necessary? I cannot see the necessity of nudging the ZP if we picked the scale properly.
It is possible for the computed `initial_zero_point` to be non-integral. Let's say min is 5.5 and max is 555.5; in this case the computed zero point will have to be nudged to get an integral value.
Not really; the nudging cases would only be necessary if we didn't have guarantees of zero being representable, but we do (kinda). Here is my logic: suppose `initial_zero_point = qmin - min / scale`. We guarantee that `min` is either negative or zero, which makes `initial_zero_point < qmin` always false. If we instead suppose `initial_zero_point = qmax - max / scale`, the logic is the same: `max` is either positive or zero, hence `initial_zero_point > qmax` is always false, so the clamping branches of the if statement will never execute.
We can discuss it offline :)
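Both points can be checked with a small Python sketch (hypothetical names; this mirrors the quoted snippet, it is not the actual kernel). With the range extended to include zero, the clamping branches are indeed unreachable, but the `nearbyint` rounding step still matters because the initial zero point can be fractional:

```python
def choose_zero_point(minimum: float, maximum: float, qmin: int = 0, qmax: int = 255) -> int:
    """Sketch of zero-point selection mirroring the quoted C++ snippet."""
    minimum = min(minimum, 0.0)  # range extended to include zero
    maximum = max(maximum, 0.0)
    scale = (maximum - minimum) / float(qmax - qmin)
    if scale == 0.0:
        scale = 0.1
    initial_zero_point = qmin - minimum / scale
    # Clamp, then round. With min <= 0 <= max the two clamping branches
    # below are unreachable, as argued above; only the rounding fires.
    if initial_zero_point < qmin:
        return qmin
    elif initial_zero_point > qmax:
        return qmax
    return int(round(initial_zero_point))  # stands in for nearbyint
```

For example, min = -1.0 and max = 9.0 gives an initial zero point of 25.5, which must be rounded; while min = 5.5 and max = 555.5 (the example above) is first extended to min = 0, so the zero point lands exactly on `qmin`.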
It wouldn't run them on mobile, so I don't see the point, unless you meant to benchmark the mobile operators on the server.
```python
return
use_channelwise = False
use_multi_dim_input = False
use_relu = False
```
Can we add support for LinearRelu to be consistent with the server implementation? Also, why is there a restriction on multi_dim_input support?
I'll add it in a subsequent PR. We will need to modify the kernels as well to add this.
Thanks for working on this!
```cpp
  int precision;
};

inline TensorQuantizationParams ChooseQuantizationParams(
```
Want to point out that `ChooseQuantizationParams` in FBGEMM (https://github.com/pytorch/FBGEMM/blob/master/src/QuantUtils.cc#L19) is originally from gemmlowp: https://github.com/google/gemmlowp/blob/master/doc/quantization_example.cc#L71
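For reference, the gemmlowp/FBGEMM recipe can be condensed into one Python sketch (hypothetical function name; a simplification of the linked C++ sources, not a drop-in replacement): pick a scale covering [min, max] extended to include 0, then a zero point that maps real 0.0 to an exact integer.

```python
def choose_quantization_params(minimum: float, maximum: float,
                               qmin: int = 0, qmax: int = 255):
    """Condensed sketch of the gemmlowp-style parameter selection."""
    minimum = min(minimum, 0.0)
    maximum = max(maximum, 0.0)
    scale = (maximum - minimum) / float(qmax - qmin)
    if scale == 0.0:
        scale = 0.1  # guard against all-zeros tensors
    zero_point = int(round(qmin - minimum / scale))
    zero_point = max(qmin, min(qmax, zero_point))  # defensive clamp
    return scale, zero_point
```

With this choice, dequantize(zero_point) is exactly 0.0, which is the property the whole nudging discussion above revolves around.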
Looks good to me!
This pull request has been merged in 1695418.
Summary: Pull Request resolved: pytorch#32479 Run dynamic quantization on mobile (similar to FBGEMM). Currently only implemented on linear operator Test Plan: python test/test_quantized.py TestDynamicQuantizedLinear.test_qlinear Imported from OSS Differential Revision: D19542980 fbshipit-source-id: c9f6e5e8ded4d62ae0f2ed99e478c8307dde22ed
```cpp
int32_t nudged_zero_point = 0;
if (initial_zero_point < qmin) {
  nudged_zero_point = qmin;
} else if (initial_zero_point > qmax) {
  nudged_zero_point = qmax;
} else {
  nudged_zero_point = nearbyint(initial_zero_point);
}
```
Are we still nudging? I thought I explained why this is not necessary.
Stack from ghstack:
Summary:
Run dynamic quantization on mobile (similar to FBGEMM). Currently only implemented on linear operator
Test Plan:
python test/test_quantized.py TestDynamicQuantizedLinear.test_qlinear
Reviewers:
Subscribers:
Tasks:
Tags:
Differential Revision: D19542980