Add operator support for dynamic quant on mobile #32479
Conversation
Summary: Run dynamic quantization on mobile (similar to FBGEMM). Currently only implemented on linear operator Test Plan: python test/test_quantized.py TestDynamicQuantizedLinear.test_qlinear Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]
Should we also add benchmarks under r/benchmarks/operator_benchmark/pt/?
```cpp
double scale =
    (std::max(max, 0.f) - std::min(min, 0.f)) / ((double)qmax - qmin);
if (scale == 0) {
  scale = 0.1;
}
```
Is there a reason for this magic number?
FBGEMM uses this to avoid divide-by-zero errors.
The question is why 0.1? Why not 1.0, or `qmax - qmin`? I don't have a preference one way or another, but this case happens only for all-zeros tensors, and I wonder if we should standardize it everywhere?
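To make the case concrete, here is a minimal Python sketch of the scale computation quoted above (the function name and defaults are hypothetical, not the actual mobile kernel):

```python
def choose_scale(minimum: float, maximum: float, qmin: int = 0, qmax: int = 255) -> float:
    """Sketch of the quoted scale computation: extend the observed range
    to include zero, then divide by the quantized range."""
    minimum = min(minimum, 0.0)  # zero must be exactly representable
    maximum = max(maximum, 0.0)
    scale = (maximum - minimum) / float(qmax - qmin)
    if scale == 0.0:
        # All-zeros tensor: any positive value avoids division by zero
        # downstream; 0.1 is the magic number under discussion.
        scale = 0.1
    return scale
```

Only an all-zeros tensor hits the fallback, which is why the choice of 0.1 versus 1.0 is cosmetic as long as it is standardized.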
```cpp
int32_t nudged_zero_point = 0;
if (initial_zero_point < qmin) {
  nudged_zero_point = qmin;
} else if (initial_zero_point > qmax) {
  nudged_zero_point = qmax;
} else {
  nudged_zero_point = nearbyint(initial_zero_point);
}
```
Can you give an example of when this part is necessary? I cannot see the necessity of nudging the ZP if we picked the scale properly.
It is possible for the computed `initial_zero_point` to be non-integral. Let's say min is 5.5 and max is 555.5; in this case the computed zero point will have to be nudged to get an integral value.
Not really; the nudging cases would only be necessary if we didn't have guarantees of zero being representable, but we do (kinda). Here is my logic: suppose `initial_zero_point = qmin - min / scale`. We guarantee that `min` is either negative or zero, which makes `initial_zero_point < qmin` always false. If we instead suppose `initial_zero_point = qmax - max / scale`, the logic is the same: `max` is either positive or zero, hence `initial_zero_point > qmax` is always false, so the clamping branches of the if statement will never execute.
We can discuss it offline :)
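Both points can be checked with a small Python sketch (hypothetical names; this mirrors the quoted snippet, it is not the actual kernel). With the range extended to include zero, the clamping branches are indeed unreachable, but the `nearbyint` rounding step still matters because the initial zero point can be fractional:

```python
def choose_zero_point(minimum: float, maximum: float, qmin: int = 0, qmax: int = 255) -> int:
    """Sketch of zero-point selection mirroring the quoted C++ snippet."""
    minimum = min(minimum, 0.0)  # range extended to include zero
    maximum = max(maximum, 0.0)
    scale = (maximum - minimum) / float(qmax - qmin)
    if scale == 0.0:
        scale = 0.1
    initial_zero_point = qmin - minimum / scale
    # Clamp, then round. With min <= 0 <= max the two clamping branches
    # below are unreachable, as argued above; only the rounding fires.
    if initial_zero_point < qmin:
        return qmin
    elif initial_zero_point > qmax:
        return qmax
    return int(round(initial_zero_point))  # stands in for nearbyint
```

For example, min = -1.0 and max = 9.0 gives an initial zero point of 25.5, which must be rounded; while min = 5.5 and max = 555.5 (the example above) is first extended to min = 0, so the zero point lands exactly on `qmin`.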
It wouldn't run them on mobile, so I don't see the point, unless you meant to benchmark the mobile operators on the server.
```python
return
use_channelwise = False
use_multi_dim_input = False
use_relu = False
```
Can we add support for LinearRelu to be consistent with the server implementation? Also, why is there a restriction on multi_dim_input support?
I'll add it in a subsequent PR. We will need to modify the kernels as well to add this.
Thanks for working on this!
```cpp
  int precision;
};

inline TensorQuantizationParams ChooseQuantizationParams(
```
Want to point out that `ChooseQuantizationParams` in FBGEMM (https://github.com/pytorch/FBGEMM/blob/master/src/QuantUtils.cc#L19) is originally from gemmlowp: https://github.com/google/gemmlowp/blob/master/doc/quantization_example.cc#L71
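For reference, the gemmlowp/FBGEMM recipe can be condensed into one Python sketch (hypothetical function name; a simplification of the linked C++ sources, not a drop-in replacement): pick a scale covering [min, max] extended to include 0, then a zero point that maps real 0.0 to an exact integer.

```python
def choose_quantization_params(minimum: float, maximum: float,
                               qmin: int = 0, qmax: int = 255):
    """Condensed sketch of the gemmlowp-style parameter selection."""
    minimum = min(minimum, 0.0)
    maximum = max(maximum, 0.0)
    scale = (maximum - minimum) / float(qmax - qmin)
    if scale == 0.0:
        scale = 0.1  # guard against all-zeros tensors
    zero_point = int(round(qmin - minimum / scale))
    zero_point = max(qmin, min(qmax, zero_point))  # defensive clamp
    return scale, zero_point
```

With this choice, dequantize(zero_point) is exactly 0.0, which is the property the whole nudging discussion above revolves around.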
Looks good to me!
This pull request has been merged in 1695418.
Summary: Pull Request resolved: pytorch#32479 Run dynamic quantization on mobile (similar to FBGEMM). Currently only implemented on linear operator Test Plan: python test/test_quantized.py TestDynamicQuantizedLinear.test_qlinear Imported from OSS Differential Revision: D19542980 fbshipit-source-id: c9f6e5e8ded4d62ae0f2ed99e478c8307dde22ed
```cpp
int32_t nudged_zero_point = 0;
if (initial_zero_point < qmin) {
  nudged_zero_point = qmin;
} else if (initial_zero_point > qmax) {
  nudged_zero_point = qmax;
} else {
  nudged_zero_point = nearbyint(initial_zero_point);
}
```
Are we still nudging? I thought I explained why this is not necessary.
Stack from ghstack:
Summary:
Run dynamic quantization on mobile (similar to FBGEMM). Currently only implemented on linear operator
Test Plan:
python test/test_quantized.py TestDynamicQuantizedLinear.test_qlinear
Reviewers:
Subscribers:
Tasks:
Tags:
Differential Revision: D19542980