
Commit

add test case for qlinearconv
Signed-off-by: Xavier Dupre <xadupre@microsoft.com>
xadupre committed Aug 4, 2023
1 parent efbc9fc commit 51a0ac8
Showing 86 changed files with 603 additions and 307 deletions.
59 changes: 59 additions & 0 deletions docs/Changelog.md
@@ -24092,6 +24092,65 @@ This version of the operator has been available since version 20 of the default
<dd>Constrain grid types to float tensors.</dd>
</dl>

### <a name="QLinearMatMul-20"></a>**QLinearMatMul-20**

Matrix product that behaves like numpy.matmul: https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.matmul.html.
It consumes two quantized input tensors, their scales and zero points, and the scale and zero point of the output,
and computes the quantized output. The quantization formula is y = saturate((x / y_scale) + y_zero_point).
The division (x / y_scale) is rounded to the nearest integer, ties to even; refer to https://en.wikipedia.org/wiki/Rounding for details.
Scale and zero point must have the same shape. They must be either a scalar (per-tensor quantization) or an N-D tensor
(per-row for 'a' and per-column for 'b'). If the input is 2D with shape [M, K], then the zero point and scale may be
an M-element vector [v_1, v_2, ..., v_M] for per-row quantization, or a K-element vector [v_1, v_2, ..., v_K]
for per-column quantization. If the input is an N-D tensor with shape [D1, D2, M, K], then the zero point and scale may
have shape [D1, D2, M, 1] for per-row quantization, or shape [D1, D2, 1, K] for per-column quantization.
Production must never overflow, and accumulation may overflow only if performed in 32 bits.
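The dequantize → matmul → requantize pipeline described above can be sketched in NumPy. The helper name `qlinear_matmul_ref` and the per-tensor uint8 setup are illustrative assumptions, not part of the ONNX API:

```python
import numpy as np

def qlinear_matmul_ref(a, a_scale, a_zp, b, b_scale, b_zp, y_scale, y_zp):
    # Dequantize both inputs: x_real = (x_quant - zero_point) * scale.
    a_real = (a.astype(np.int32) - a_zp) * a_scale
    b_real = (b.astype(np.int32) - b_zp) * b_scale
    # Real-valued matrix product.
    y_real = a_real @ b_real
    # Requantize: np.rint rounds to nearest, ties to even, matching the
    # rounding rule above; np.clip saturates to the uint8 range.
    return np.clip(np.rint(y_real / y_scale) + y_zp, 0, 255).astype(np.uint8)

# If every element of `a` equals its zero point, a_real is all zeros,
# so the output is uniformly y_zero_point.
a = np.full((2, 3), 113, dtype=np.uint8)
b = np.array([[1, 2], [3, 4], [5, 6]], dtype=np.uint8)
y = qlinear_matmul_ref(a, 0.0066, 113, b, 0.00705, 114, 0.0107, 118)
# y is a (2, 2) array filled with 118
```

This is a per-tensor sketch; per-row or per-column parameters would enter through ordinary NumPy broadcasting against the corresponding axis.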

#### Version

This version of the operator has been available since version 20 of the default ONNX operator set.

#### Inputs

<dl>
<dt><tt>a</tt> (non-differentiable) : T1</dt>
<dd>N-dimensional quantized matrix a</dd>
<dt><tt>a_scale</tt> (non-differentiable) : TS</dt>
<dd>scale of quantized input a</dd>
<dt><tt>a_zero_point</tt> (non-differentiable) : T1</dt>
<dd>zero point of quantized input a</dd>
<dt><tt>b</tt> (non-differentiable) : T2</dt>
<dd>N-dimensional quantized matrix b</dd>
<dt><tt>b_scale</tt> (non-differentiable) : TS</dt>
<dd>scale of quantized input b</dd>
<dt><tt>b_zero_point</tt> (non-differentiable) : T2</dt>
<dd>zero point of quantized input b</dd>
<dt><tt>y_scale</tt> (non-differentiable) : TS</dt>
<dd>scale of quantized output y</dd>
<dt><tt>y_zero_point</tt> (non-differentiable) : T3</dt>
<dd>zero point of quantized output y</dd>
</dl>

#### Outputs

<dl>
<dt><tt>y</tt> (non-differentiable) : T3</dt>
<dd>Quantized matrix multiply results from a * b</dd>
</dl>

#### Type Constraints

<dl>
<dt><tt>TS</tt> : tensor(float), tensor(float16), tensor(bfloat16)</dt>
<dd>Constrain scales.</dd>
<dt><tt>T1</tt> : tensor(int8), tensor(uint8), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz)</dt>
<dd>Constrain the data type of input a and its zero point to 8-bit integer or 8-bit float tensors.</dd>
<dt><tt>T2</tt> : tensor(int8), tensor(uint8), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz)</dt>
<dd>Constrain the data type of input b and its zero point to 8-bit integer or 8-bit float tensors.</dd>
<dt><tt>T3</tt> : tensor(int8), tensor(uint8), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz)</dt>
<dd>Constrain the data type of output y and its zero point to 8-bit integer or 8-bit float tensors.</dd>
</dl>
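The per-row scale and zero-point shapes described above reduce to ordinary NumPy broadcasting when dequantizing. A small sketch with made-up values:

```python
import numpy as np

# 2D input of shape [M, K] with per-row quantization: the scale and
# zero point are M-element vectors, one entry per row of `a`.
a = np.array([[130, 128], [121, 120]], dtype=np.uint8)   # M=2, K=2
a_scale = np.array([0.5, 0.25], dtype=np.float32)        # shape [M]
a_zero_point = np.array([128, 120], dtype=np.uint8)      # shape [M]

# Reshape to (M, 1) so each row is dequantized with its own parameters.
a_real = (a.astype(np.int32) - a_zero_point[:, None].astype(np.int32)) \
    * a_scale[:, None]
# a_real == [[1.0, 0.0], [0.25, 0.0]]
```

Per-column quantization for 'b' works the same way, with the vectors broadcast along the last axis instead.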

### <a name="StringConcat-20"></a>**StringConcat-20**

StringConcat concatenates string tensors elementwise (with NumPy-style broadcasting support)
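The elementwise-with-broadcasting behaviour can be mimicked in NumPy with `np.char.add`; this is a sketch of the semantics, not the operator's implementation:

```python
import numpy as np

x = np.array([["ab"], ["cd"]])     # shape (2, 1)
y = np.array(["_1", "_2", "_3"])   # shape (3,)

# np.char.add concatenates elementwise and applies NumPy broadcasting,
# so the (2, 1) and (3,) operands combine to shape (2, 3).
z = np.char.add(x, y)
# z == [["ab_1", "ab_2", "ab_3"], ["cd_1", "cd_2", "cd_3"]]
```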
242 changes: 140 additions & 102 deletions docs/Operators.md
@@ -105,7 +105,7 @@ For an operator input/output's differentiability, it can be differentiable,
|<a href="#Pad">Pad</a>|<a href="Changelog.md#Pad-19">19</a>, <a href="Changelog.md#Pad-18">18</a>, <a href="Changelog.md#Pad-13">13</a>, <a href="Changelog.md#Pad-11">11</a>, <a href="Changelog.md#Pad-2">2</a>, <a href="Changelog.md#Pad-1">1</a>|
|<a href="#Pow">Pow</a>|<a href="Changelog.md#Pow-15">15</a>, <a href="Changelog.md#Pow-13">13</a>, <a href="Changelog.md#Pow-12">12</a>, <a href="Changelog.md#Pow-7">7</a>, <a href="Changelog.md#Pow-1">1</a>|
|<a href="#QLinearConv">QLinearConv</a>|<a href="Changelog.md#QLinearConv-10">10</a>|
|<a href="#QLinearMatMul">QLinearMatMul</a>|<a href="Changelog.md#QLinearMatMul-10">10</a>|
|<a href="#QLinearMatMul">QLinearMatMul</a>|<a href="Changelog.md#QLinearMatMul-20">20</a>, <a href="Changelog.md#QLinearMatMul-10">10</a>|
|<a href="#QuantizeLinear">QuantizeLinear</a>|<a href="Changelog.md#QuantizeLinear-19">19</a>, <a href="Changelog.md#QuantizeLinear-13">13</a>, <a href="Changelog.md#QuantizeLinear-10">10</a>|
|<a href="#RNN">RNN</a>|<a href="Changelog.md#RNN-14">14</a>, <a href="Changelog.md#RNN-7">7</a>, <a href="Changelog.md#RNN-1">1</a>|
|<a href="#RandomNormal">RandomNormal</a>|<a href="Changelog.md#RandomNormal-1">1</a>|
@@ -19355,24 +19355,26 @@ expect(

#### Version

This version of the operator has been available since version 10 of the default ONNX operator set.
This version of the operator has been available since version 20 of the default ONNX operator set.

Other versions of this operator: <a href="Changelog.md#QLinearMatMul-10">10</a>

#### Inputs

<dl>
<dt><tt>a</tt> (non-differentiable) : T1</dt>
<dd>N-dimensional quantized matrix a</dd>
<dt><tt>a_scale</tt> (non-differentiable) : tensor(float)</dt>
<dt><tt>a_scale</tt> (non-differentiable) : TS</dt>
<dd>scale of quantized input a</dd>
<dt><tt>a_zero_point</tt> (non-differentiable) : T1</dt>
<dd>zero point of quantized input a</dd>
<dt><tt>b</tt> (non-differentiable) : T2</dt>
<dd>N-dimensional quantized matrix b</dd>
<dt><tt>b_scale</tt> (non-differentiable) : tensor(float)</dt>
<dt><tt>b_scale</tt> (non-differentiable) : TS</dt>
<dd>scale of quantized input b</dd>
<dt><tt>b_zero_point</tt> (non-differentiable) : T2</dt>
<dd>zero point of quantized input b</dd>
<dt><tt>y_scale</tt> (non-differentiable) : tensor(float)</dt>
<dt><tt>y_scale</tt> (non-differentiable) : TS</dt>
<dd>scale of quantized output y</dd>
<dt><tt>y_zero_point</tt> (non-differentiable) : T3</dt>
<dd>zero point of quantized output y</dd>
@@ -19388,11 +19390,13 @@ This version of the operator has been available since version 10 of the default
#### Type Constraints

<dl>
<dt><tt>T1</tt> : tensor(int8), tensor(uint8)</dt>
<dt><tt>TS</tt> : tensor(float), tensor(float16), tensor(bfloat16)</dt>
<dd>Constrain scales.</dd>
<dt><tt>T1</tt> : tensor(int8), tensor(uint8), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz)</dt>
<dd>Constrain the data type of input a and its zero point to 8-bit integer or 8-bit float tensors.</dd>
<dt><tt>T2</tt> : tensor(int8), tensor(uint8)</dt>
<dt><tt>T2</tt> : tensor(int8), tensor(uint8), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz)</dt>
<dd>Constrain the data type of input b and its zero point to 8-bit integer or 8-bit float tensors.</dd>
<dt><tt>T3</tt> : tensor(int8), tensor(uint8)</dt>
<dt><tt>T3</tt> : tensor(int8), tensor(uint8), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz)</dt>
<dd>Constrain the data type of output y and its zero point to 8-bit integer or 8-bit float tensors.</dd>
</dl>

@@ -19403,114 +19407,148 @@ This version of the operator has been available since version 10 of the default
<summary>qlinearmatmul</summary>

```python
node = onnx.helper.make_node(
"QLinearMatMul",
inputs=[
"a",
"a_scale",
"a_zero_point",
"b",
"b_scale",
"b_zero_point",
"y_scale",
"y_zero_point",
],
outputs=["y"],
)
for quant_type_name in ["uint8", "int8"]:
quant_type = getattr(np, quant_type_name)
for dtype_name in ["float32", "float16"]:
dtype = getattr(np, dtype_name)
node = onnx.helper.make_node(
"QLinearMatMul",
inputs=[
"a",
"a_scale",
"a_zero_point",
"b",
"b_scale",
"b_zero_point",
"y_scale",
"y_zero_point",
],
outputs=["y"],
)

# 2D
a = np.array(
[
[208, 236, 0, 238],
[3, 214, 255, 29],
],
dtype=np.uint8,
)
# 2D
a = np.array([[208, 236, 0, 238], [3, 214, 255, 29]])
if quant_type == np.int8:
a -= 127
a = a.astype(quant_type)

a_scale = np.array([0.0066], dtype=np.float32)
a_zero_point = np.array([113], dtype=np.uint8)
a_scale = np.array([0.0066], dtype=dtype)
a_zero_point = np.array(
[113 - 127] if quant_type == np.int8 else [113], dtype=quant_type
)

b = np.array(
[[152, 51, 244], [60, 26, 255], [0, 127, 246], [127, 254, 247]],
dtype=np.uint8,
)
b = np.array(
[[152, 51, 244], [60, 26, 255], [0, 127, 246], [127, 254, 247]]
)
if quant_type == np.int8:
b -= 127
b = b.astype(quant_type)

b_scale = np.array([0.00705], dtype=np.float32)
b_zero_point = np.array([114], dtype=np.uint8)
b_scale = np.array([0.00705], dtype=dtype)
b_zero_point = np.array(
[114 - 127] if quant_type == np.int8 else [114], dtype=quant_type
)

y_scale = np.array([0.0107], dtype=np.float32)
y_zero_point = np.array([118], dtype=np.uint8)
y_scale = np.array([0.0107], dtype=np.float32)
y_zero_point = np.array(
[118 - 127] if quant_type == np.int8 else [118], dtype=quant_type
)

output = np.array(
[
[168, 115, 255],
[1, 66, 151],
],
dtype=np.uint8,
)
if quant_type == np.int8:
output = np.array([[41, -12, -9], [1, -75, 20]])
else:
output = np.array([[168, 115, 255], [1, 66, 151]])
output = output.astype(quant_type)

expect(
node,
inputs=[
a,
a_scale,
a_zero_point,
b,
b_scale,
b_zero_point,
y_scale,
y_zero_point,
],
outputs=[output],
name="test_qlinearmatmul_2D",
)
expect(
node,
inputs=[
a,
a_scale,
a_zero_point,
b,
b_scale,
b_zero_point,
y_scale,
y_zero_point,
],
outputs=[output],
name=f"test_qlinearmatmul_2D_{quant_type_name}_{dtype_name}",
)

# 3D
a = np.array(
[
[[208, 236, 0, 238], [3, 214, 255, 29]],
[[208, 236, 0, 238], [3, 214, 255, 29]],
],
dtype=np.uint8,
)
# 3D
a = np.array(
[
[[208, 236, 0, 238], [3, 214, 255, 29]],
[[208, 236, 0, 238], [3, 214, 255, 29]],
],
)
if quant_type == np.int8:
a -= 127
a = a.astype(quant_type)

a_scale = np.array([0.0066], dtype=np.float32)
a_zero_point = np.array([113], dtype=np.uint8)
a_scale = np.array([0.0066], dtype=dtype)
a_zero_point = np.array(
[113 - 127] if quant_type == np.int8 else [113], dtype=quant_type
)

b = np.array(
[
[[152, 51, 244], [60, 26, 255], [0, 127, 246], [127, 254, 247]],
[[152, 51, 244], [60, 26, 255], [0, 127, 246], [127, 254, 247]],
],
dtype=np.uint8,
)
b = np.array(
[
[[152, 51, 244], [60, 26, 255], [0, 127, 246], [127, 254, 247]],
[[152, 51, 244], [60, 26, 255], [0, 127, 246], [127, 254, 247]],
],
)
if quant_type == np.int8:
b -= 127
b = b.astype(quant_type)

b_scale = np.array([0.00705], dtype=np.float32)
b_zero_point = np.array([114], dtype=np.uint8)
b_scale = np.array([0.00705], dtype=dtype)
b_zero_point = np.array([114], dtype=quant_type)

y_scale = np.array([0.0107], dtype=np.float32)
y_zero_point = np.array([118], dtype=np.uint8)
y_scale = np.array([0.0107], dtype=dtype)
y_zero_point = np.array(
[118 - 127] if quant_type == np.int8 else [118], dtype=quant_type
)

output = np.array(
[[[168, 115, 255], [1, 66, 151]], [[168, 115, 255], [1, 66, 151]]],
dtype=np.uint8,
)
if quant_type == np.int8:
if dtype == np.float32:
output = np.array(
[
[[-86, 117, 120], [115, 39, -121]],
[[-86, 117, 120], [115, 39, -121]],
]
)
else:
output = np.array(
[
[[-86, 116, 119], [115, 39, -121]],
[[-86, 116, 119], [115, 39, -121]],
]
)
else:
output = np.array(
[
[[168, 115, 255], [1, 66, 151]],
[[168, 115, 255], [1, 66, 151]],
]
)
output = output.astype(quant_type)

expect(
node,
inputs=[
a,
a_scale,
a_zero_point,
b,
b_scale,
b_zero_point,
y_scale,
y_zero_point,
],
outputs=[output],
name="test_qlinearmatmul_3D",
)
expect(
node,
inputs=[
a,
a_scale,
a_zero_point,
b,
b_scale,
b_zero_point,
y_scale,
y_zero_point,
],
outputs=[output],
name=f"test_qlinearmatmul_3D_{quant_type_name}_{dtype_name}",
)
```

</details>
