
quantization support in onnx #1872

Merged

Conversation

linkerzhang
Member

@linkerzhang linkerzhang commented Mar 18, 2019

ONNX quantization support:
Requirements:

  1. Interoperability MUST be ensured.
    ONLY widely accepted quantization schemas can be standardized in ONNX. In this design, 8-bit linear (scale/zero_point) quantization will be standardized (a short sketch follows the goals list below).
  2. Customized quantization schemas should be allowed.
    ONNX should be able to represent a customized quantization schema (one that has not yet been standardized in ONNX) as a subgraph consisting of primitive operators.
  3. All ONNX operators must define a mathematical function of the following form:
    outputs = OP(inputs, attrs)
    It means the data needed for mathematical calculation defined by an op must be either an input or an attribute.
  4. Enable both static and dynamic quantization.
    Quantization parameters used in defining an op will be defined as inputs/outputs. Static quantization will be a special case of the dynamic one, where the quantization parameter inputs come from either initializers or constant nodes.
    NOTE: as a best practice, weights in an inference model should be statically quantized.
  5. Support model verification for static quantization models. The verification includes:
    a. The same tensor should have the same real-value representation.
    This is ensured if it uses the same static quantization parameters everywhere it appears.
    b. Any other checks on quantization parameter values before sending a model to a hardware vendor.

Goals of this design/PR:

  1. Add a small set of operators to standardize 8-bit linear (scale/zero_point) quantization.
    QLinearMatMul/QLinearConv are added in this PR, more ops may be added in separate PRs as needed later.
  2. Add a small set of operators to further enable ONNX to represent other quantization schemas.
    MatmulInteger/ConvInteger are added in this PR, more ops may be added in separate PRs as needed later.
  3. Add quantization information as model level annotation for easy model verification.
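
For readers new to the scheme, here is a minimal NumPy sketch of the 8-bit linear (scale/zero_point) quantization described above. It is illustrative only and assumes asymmetric uint8 quantization; the function names are hypothetical, not the ONNX ops themselves.

import numpy as np

def quantize_linear(x, scale, zero_point):
    # y = saturate(round(x / scale) + zero_point); np.round uses ties-to-even, matching the op doc below.
    q = np.round(x / scale) + zero_point
    return np.clip(q, 0, 255).astype(np.uint8)

def dequantize_linear(y, scale, zero_point):
    # x_hat = (y - zero_point) * scale; recovers x up to quantization error.
    return (y.astype(np.int32) - zero_point) * scale

x = np.array([-1.0, 0.0, 0.5, 2.0], dtype=np.float32)
scale, zero_point = 2.0 / 255.0, 128            # maps roughly [-1, 1] onto [0, 255]; the 2.0 saturates
y = quantize_linear(x, scale, zero_point)
x_hat = dequantize_linear(y, scale, zero_point)

Static quantization corresponds to scale/zero_point coming from initializers or constant nodes; dynamic quantization computes them at runtime from the tensor being quantized.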

@linkerzhang
Member Author

Test cases for all ops will be added soon.

Contributor

@diyessi diyessi left a comment


I added a couple of wording changes to make it easier to read.
These quantization additions look like what we agreed to at Intel.

namespace ONNX_NAMESPACE {

static const char* QuantizeLinear_ver10_doc = R"DOC(
The linear quantization operator. It consumes a high precision tensor, a scale, a zero point and computes the low precision / quantized tensor.
Contributor


Change to "and a zero point to compute the ..."

}));

static const char* DequantizeLinear_ver10_doc = R"DOC(
The linear dequantization operator. It consumes a quantized tensor, a scale, a zero point and computes the full precision tensor.
Contributor


Change to "and a zero point to compute the ..."


@darrenscrews darrenscrews left a comment


I looked at the PR; it matches what we reviewed in the document, so I'm good for a sign-off.

.Input(
2,
"y_zero_point",
"Zero point for doing quantization to get 'y'. It's a scalar, which means a per-tensor/layer quantization.",


Better to have y_zero_point as optional in the quantizer if it is optional in the dequantizer so that we are consistent.

Member Author


Sounds good.

2,
"a_zero_point",
"Zero point tensor for input 'A'. It's optional and default value is 0. It could be a scalar or a 1-D tensor, "
"which means a per-tensor or per-row quantization. If it's a 1-D tensor, its number of elements "


We should explicitly pass the axis as an argument here to be consistent with the definition of per-row/per-column/per-channel quantization.

Member Author


I don't think we need this flexibility. For per-channel quantization of weights, it has to be per output channel (for conv). For matmul, if it's not per-tensor, then per-row for the first input and per-column for the second is the useful combination; per-column for the first input and per-row for the second is not that useful (the math is not straightforward). So I'd suggest keeping this specification.


I see what you're saying, but it might be good to be consistent across the spec. For Quantize_Linear, we specify axis as an attribute:
.Attr(
"axis",
"The axis along which same quantization parameters are applied. It's optional. If it's not specified, it means per-tensor quantization and input 'x_scale' and 'x_zero_point' must be scalars. If it's specified, it means per 'axis' quantization and input 'x_scale' and 'x_zero_point' must be 1-D tensors.",
AttributeProto::INT,
false)

Might be good to make this consistent across all ops. We could still enforce that the op itself only supports per-row for the first tensor and per column for the second one.

Member Author


My bad. "axis" for quant should be removed.
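
For context on the convention discussed above, here is a small NumPy sketch of an integer matmul with a per-row zero point for the first input and a per-column zero point for the second, accumulating in int32. It is illustrative only, not the ONNX reference implementation, and the function name is hypothetical.

import numpy as np

def matmul_integer(A, B, a_zero_point=0, b_zero_point=0):
    # A: (M, K) uint8, a_zero_point: scalar or shape (M,)  -> per-row
    # B: (K, N) uint8, b_zero_point: scalar or shape (N,)  -> per-column
    A32 = A.astype(np.int32) - np.asarray(a_zero_point, dtype=np.int32).reshape(-1, 1)
    B32 = B.astype(np.int32) - np.asarray(b_zero_point, dtype=np.int32).reshape(1, -1)
    return A32 @ B32                                        # int32 output

If the zero point instead varied along the reduction axis K, it could not be factored out of the accumulation so easily, which is the "math is not straightforward" point above.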

.Input(
1,
"x_scale",
"Scale tensor for input 'x'. It's a scalar, which means a per-tensor/layer quantization.",


Let's explicitly provide the axis as a parameter here. For example, if I have a 3x3x3x32 weight tensor (3 input channels, a 3x3 kernel, and 32 output channels), I would specify axis=3 and use a vector of length 32 for the per-channel scale and zero point.

Member Author


For conv, the most frequently used schemes are per-layer (per-tensor) and per output channel (for weights). I don't see much usage for the other cases. I'd suggest keeping the spec as is and extending it when we need the extra flexibility. Sounds good?
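
To illustrate the per-output-channel case for conv weights mentioned above, here is a hedged NumPy sketch assuming the output-channel axis is last (as in the 3x3x3x32 example) and symmetric int8 quantization; the helper name is hypothetical.

import numpy as np

def quantize_per_output_channel(W, axis=3):
    # One scale (and zero point) per output channel; symmetric int8, so zero_point is 0.
    reduce_axes = tuple(i for i in range(W.ndim) if i != axis)
    max_abs = np.abs(W).max(axis=reduce_axes)               # shape: (num_output_channels,)
    scale = np.maximum(max_abs, 1e-8) / 127.0
    zero_point = np.zeros_like(scale, dtype=np.int8)
    bshape = [1] * W.ndim
    bshape[axis] = -1
    q = np.clip(np.round(W / scale.reshape(bshape)), -128, 127).astype(np.int8)
    return q, scale, zero_point

W = np.random.randn(3, 3, 3, 32).astype(np.float32)         # 3x3 kernel, 3 in, 32 out channels
q, scale, zp = quantize_per_output_channel(W, axis=3)       # scale and zp have length 32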

@linkerzhang
Member Author

I am going to have all test cases covered in a separate PR soon.

@linkerzhang
Member Author

@raghuramank100 I'm going to check it in today. Please let me know if you have more comments.

@spandantiwari please help drive the PyTorch test failure to resolution. I'm not treating the failure as a check-in blocker.

Thank you!

DequantizeLinear,
10,
OpSchema()
.Input(0, "x", "N-D quantized input tensor to be de-quantized.", "T")


Should we also have an axis argument, like we have with the quantizer?

Member Author


Clarified above: "axis" in quant should be removed, given that only weights have per-output-channel quantization (which is static).

Contributor


Well it's certainly simpler to implement DequantizeLinear with zero-point and scale being scalars. We originally included the axis attribute because it was a compromise between space savings and accuracy, and with channel-wise weight packing, @youngkim93 noticed significant accuracy improvements across the various image recognition models. Are you dropping it because QLinearConv doesn't have it? (granted, getting QLinearConv to support it would greatly complicate things) This should be compatible with our current code because the axis was optional anyway.

.Output(0, "Y", "Matrix multiply results from A * B", "T3")
.TypeConstraint(
"T1",
{"tensor(int8)", "tensor(uint8)"},

Please consider supporting int16 as well. There's cblas_gemm_s16s16s32 for int16 inputs to generate int32 outputs.

Member Author


@darrenscrews is going to add that (support for more types) in a separate PR.

@houseroad
Member

CI side is okay. Waiting for @raghuramank100's final approval :-)

@linkerzhang
Member Author

As clarified in the mail thread, the "add axis" change requested by @raghuramank100 may be added when we see use cases and want to support per-axis quantization for activations. I'm merging this PR now. @raghuramank100, please feel free to share more comments if you have any; we can keep tuning it.

static const char* QuantizeLinear_ver10_doc = R"DOC(
The linear per-tensor/layer quantization operator. It consumes a high precision tensor, a scale, a zero point to compute the low precision / quantized tensor.
The quantization formula is y = saturate ((x / y_scale) + y_zero_point). For saturation, it saturates to [0, 255] if it's uint8, or [-128, 127] if it's int8.
For (x / y_scale), it's rounding to nearest ties to even. Refer to https://en.wikipedia.org/wiki/Rounding for details. 'y_zero_point' and 'y' must have same type.
Contributor


Guessing this was input from Intel to change from round-half-away-from-zero (which was our earlier choice, and which is TensorFlow's default and C++ std::round's default)?
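
For reference, the two rounding modes discussed here differ only on halfway values; a quick illustration (not code from this PR), with np.round giving ties-to-even as specified in the doc above, and the floor-based expression giving ties-away-from-zero as in C++ std::round:

import numpy as np

x = np.array([0.5, 1.5, 2.5, -0.5, -1.5])
print(np.round(x))                               # [ 0.  2.  2. -0. -2.]  ties to even
print(np.sign(x) * np.floor(np.abs(x) + 0.5))    # [ 1.  2.  3. -1. -2.]  ties away from zero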

@linkerzhang linkerzhang merged commit 4cfa542 into onnx:master Apr 4, 2019
bool require_kernel_shape,
int input1Idx,
int input2Idx) {
// propagateElemTypeFromInputToOutput(ctx, 0, 0);
Member


what is this?

@@ -872,6 +893,298 @@ ONNX_OPERATOR_SET_SCHEMA(
1,
OpSchema().FillUsing(ConvOpSchemaGenerator("a filter")));

static const char* QLinearConv_ver10_doc = R"DOC(
The convolution operator consumes a quantized input tensor, its scale and zero point,
a quantized filter, its scale and zero point, and output�s scale and zero point,
Member


The apostrophe in "output's" is a non-UTF-8 character.
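
As a rough illustration of how the three scale/zero_point pairs in QLinearConv fit together: the convolution itself accumulates zero-point-adjusted int8/uint8 values into int32, and the result is then requantized with the output's scale and zero point. The sketch below shows only that requantization step (a hedged example, not code from this PR; the function name is hypothetical):

import numpy as np

def requantize(acc_int32, x_scale, w_scale, y_scale, y_zero_point):
    # acc_int32: int32 accumulator of the integer convolution over (x - x_zero_point) and (w - w_zero_point)
    real = acc_int32 * (x_scale * w_scale)       # back to real-valued results
    y = np.round(real / y_scale) + y_zero_point  # onto the output's quantization grid
    return np.clip(y, 0, 255).astype(np.uint8)   # saturate to uint8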

Member

@houseroad houseroad left a comment


This diff didn't correctly register the ops in the operator set and contains a non-UTF-8 character... it needs more polish.

houseroad added a commit that referenced this pull request Apr 5, 2019
@houseroad
Member

The merged version is premature. Please address my comments and resubmit the PR. Thanks

houseroad added a commit that referenced this pull request Apr 5, 2019
hariharans29 pushed a commit to hariharans29/onnx that referenced this pull request Aug 15, 2019
* add quantized ops

* shape inference for matmul ops.

* update

* add quantization parameter annotation in graph proto.

* update IR version

* update proto files

* update operator.md

* update

* fix build break.

* fix build break.

* sync and resolve comments

* revert the change by mistake.

* fix shape inference test failure.

* fix comments
hariharans29 pushed a commit to hariharans29/onnx that referenced this pull request Aug 15, 2019
jcwchen pushed a commit to jcwchen/onnx that referenced this pull request Sep 23, 2020