
Add INT4, UINT4 types #5811

Merged
merged 7 commits on Jan 8, 2024

Conversation

galagam
Contributor

@galagam galagam commented Dec 18, 2023

Description

  • Add INT4 and UINT4 quantized data types
  • Support for packing and unpacking int4x2->byte
  • Implementation of operators: Cast, CastLike, DequantizeLinear, QuantizeLinear (a usage sketch follows after this list)
  • Type support for non-compute operators Constant, ConstantOfShape, Identity, Reshape, Shape, Size, If, Loop, Scan, Flatten, Pad, Squeeze, Unsqueeze, Transpose.
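
For illustration, here is a minimal sketch of a QuantizeLinear/DequantizeLinear round trip through INT4. This is a hypothetical usage example, assuming an onnx build that includes this change (opset 21) and assuming helper.make_tensor accepts unpacked int4 values:

import onnx
from onnx import TensorProto, helper

# Quantize float32 -> INT4, then dequantize back to float32.
node_q = helper.make_node("QuantizeLinear", ["x", "scale", "zero_point"], ["xq"])
node_dq = helper.make_node("DequantizeLinear", ["xq", "scale", "zero_point"], ["y"])

graph = helper.make_graph(
    [node_q, node_dq],
    "int4_roundtrip",
    inputs=[helper.make_tensor_value_info("x", TensorProto.FLOAT, [5])],
    outputs=[helper.make_tensor_value_info("y", TensorProto.FLOAT, [5])],
    initializer=[
        helper.make_tensor("scale", TensorProto.FLOAT, [], [0.5]),
        # Scalar INT4 zero point; the value is given unpacked.
        helper.make_tensor("zero_point", TensorProto.INT4, [], [0]),
    ],
)

model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 21)])
onnx.checker.check_model(model)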

Motivation and Context

See details in issue #5776

@galagam galagam requested review from a team as code owners December 18, 2023 16:54

codecov bot commented Dec 18, 2023

Codecov Report

Attention: 104 lines in your changes are missing coverage. Please review.

Comparison is base (75c6892) 56.39% compared to head (5bed7e1) 56.45%.

Files Patch % Lines
onnx/backend/test/case/node/cast.py 0.00% 37 Missing ⚠️
onnx/backend/test/case/node/quantizelinear.py 0.00% 18 Missing ⚠️
onnx/backend/test/case/node/dequantizelinear.py 0.00% 16 Missing ⚠️
onnx/reference/ops/op_dequantize_linear.py 47.05% 4 Missing and 5 partials ⚠️
onnx/reference/ops/op_quantize_linear.py 72.00% 3 Missing and 4 partials ⚠️
onnx/reference/op_run.py 66.66% 5 Missing and 1 partial ⚠️
onnx/reference/ops/op_cast_like.py 0.00% 2 Missing and 2 partials ⚠️
onnx/helper.py 88.88% 1 Missing and 1 partial ⚠️
onnx/reference/ops/op_cast.py 86.66% 1 Missing and 1 partial ⚠️
onnx/subbyte.py 92.85% 0 Missing and 2 partials ⚠️
... and 1 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5811      +/-   ##
==========================================
+ Coverage   56.39%   56.45%   +0.05%     
==========================================
  Files         503      504       +1     
  Lines       29620    29865     +245     
  Branches     4426     4484      +58     
==========================================
+ Hits        16704    16860     +156     
- Misses      12109    12188      +79     
- Partials      807      817      +10     


Contributor

@justinchuby justinchuby left a comment

Thank you for adding int4 support to ONNX! I would personally suggest breaking the operator support into a separate PR, so that we:

  1. Add the types to the protos and implement any helper functions to work with these types
  2. Create opset 21 definitions

This would make it easier to review and help us move faster.

@justinchuby justinchuby added this to the 1.16 milestone Dec 18, 2023
Contributor Author

@galagam galagam left a comment

First round of review fixes in commit bdffb14.

@galagam
Contributor Author

galagam commented Dec 20, 2023

It looks like there's a CI pipeline issue in MacOS-CI.
I'm getting "Invalid tensor data type 22" errors even though the type was added and the Linux-CI is passing. The same goes for the error "Opset 20 is under development and support for this is limited. The operator schemas and or other functionality may change before next ONNX release and in this case ONNX Runtime will not guarantee backward compatibility".
Perhaps the code is not compiled in the MacOS-CI prior to testing? This would only affect PRs that include changes to the C++ code.

EDIT:
@justinchuby can you take a look or tag someone relevant?

@justinchuby
Contributor

I will take a look. Thanks!

@justinchuby
Contributor

@liqunfu do you know why Linux is succeeding but macOS is failing?

I will also update the test to account for the max supported opset version environment variable

@liqunfu
Contributor

liqunfu commented Dec 28, 2023

@liqunfu do you know why Linux is succeeding but macOS is failing?

I will also update the test to account for the max supported opset version environment variable

It is not clear from the CI run. I have a PR which was passing the same CI afterwards, so it is probably related to your change. I noticed release-MacOS fails some reference implementation tests and I had to disable them. The only way to find out is to work on macOS :)

Further, onnxruntime is installed during CI, and it could be that onnxruntime for macOS is behind Linux and Windows. Also, this line is interesting: export ORT_MAX_ONNX_OPSET_SUPPORTED_VERSION=19. The next ORT release will support opset 20, but I am not sure whether the current ORT supports opset 19 or just opset 18. In any case, it is confusing that only MacOS shows this error with this PR.

@justinchuby justinchuby added the "review needed: operators approvers" (require reviews from members of operators-approvers) label Dec 28, 2023
@galagam
Contributor Author

galagam commented Jan 2, 2024

It is not clear from the CI run. I have a PR which was passing the same CI afterwards, so it is probably related to your change. I noticed release-MacOS fails some reference implementation tests and I had to disable them. The only way to find out is to work on macOS :)

@liqunfu @justinchuby In the interest of moving this along, perhaps you can assist with a specific issue on the macOS CI.

onnxruntime attempts to run OnnxBackendNodeModelTest.test_cast_FLOAT16_to_INT4_cpu and fails.
However, this test should be excluded from backend testing: test_backend_reference.py#L126
Same goes for the other cast_*to*INT4 tests.

- Import modules instead of objects/functions
- docstrings
- Type annotations
- Added cast tests U/INT4->U/INT8
- Refactored abbreviations in variable names

Signed-off-by: Gal Hubara Agam <ghubaraagam@nvidia.com>
@justinchuby
Contributor

@xadupre could you help with the reference evaluator tests? Thanks!

@galagam
Contributor Author

galagam commented Jan 2, 2024

Can you pull/rebase main?

I just rebased a couple of hours ago

@justinchuby
Contributor

Judging from https://dev.azure.com/onnx-pipelines/onnx/_build/results?buildId=55095&view=logs&jobId=825fcbdb-febe-56c2-0b31-e8b200b321eb&j=825fcbdb-febe-56c2-0b31-e8b200b321eb&t=46007089-1c35-5679-e243-e8ae35eaeec6, it looks like the failing tests are from the onnxruntime tests, not the reference runtime tests. I would disable them in https://github.com/onnx/onnx/blob/main/onnx/test/test_backend_onnxruntime.py

@galagam
Contributor Author

galagam commented Jan 8, 2024

@justinchuby please let me know if there's anything else I can do to help move this PR along.

@justinchuby
Contributor

justinchuby commented Jan 8, 2024

LGTM! @gramalingam or @xadupre can approve to get it merged

@xadupre xadupre added this pull request to the merge queue Jan 8, 2024
Merged via the queue into onnx:main with commit d2ac757 Jan 8, 2024
37 checks passed
@lutzroeder
Member

@galagam @xadupre can you please share a sample ONNX file using the UINT4 and INT4 types?

@justinchuby
Contributor

@lutzroeder
Member

Thank you, @justinchuby. Do any of these files contain UINT4 or INT4 weight initializers?

@galagam
Contributor Author

galagam commented Jan 9, 2024

@lutzroeder see for example test_dequantizelinear_int4

@lutzroeder
Member

@galagam I tried this file, but it doesn't contain weight initializers?

@galagam
Contributor Author

galagam commented Jan 10, 2024

@lutzroeder See https://github.com/onnx/onnx/blob/main/onnx/backend/test/data/node/test_dequantizelinear_int4/test_data_set_0/input_0.pb

>>> import onnx
>>> from onnx import TensorProto
>>> from google.protobuf import text_format
>>> 
>>> proto = TensorProto()
>>> with open('onnx/backend/test/data/node/test_dequantizelinear_int4/test_data_set_0/input_0.pb', 'rb') as f:
...     proto.ParseFromString(f.read())
... 
21
>>> print(proto)
dims: 5
data_type: 22
int32_data: 16
int32_data: -57
int32_data: 8
name: "x"

As you can see, data_type is 22, which corresponds to TensorProto.INT4.
We see that dims = 5, but there are only 3 int32_data elements. That is because the int4 elements are packed in pairs (and because of the odd dimension, a padding nibble is added).

The unpacked data array is (0, 1, 7, -4, -8).
The first packed element, 16 = 00010000b, holds the first two elements (0, 1) in little-endian order.
The second packed element, -57 = 11000111b, holds 7 (0111b) and -4 (1100b in two's complement) in little-endian order.
The last packed element, 8 = 00001000b, contains -8 (1000b) and a zero nibble added for padding.

I hope this helps to clarify things.
You can take a look at subbyte.py to understand the packing and unpacking methods; a small sketch follows below.
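
For illustration, here is a minimal numpy sketch of this packing scheme (two signed 4-bit values per byte, low nibble first). The function names here are hypothetical; the actual implementation lives in onnx/subbyte.py:

import numpy as np

def pack_int4(values):
    """Pack a flat array of int4 values (range [-8, 7]) into bytes."""
    v = np.asarray(values, dtype=np.int8).reshape(-1)
    if v.size % 2:
        v = np.append(v, np.int8(0))  # pad an odd-length input with a zero nibble
    low = (v[0::2] & 0x0F).astype(np.uint8)   # first of each pair -> low nibble
    high = (v[1::2] & 0x0F).astype(np.uint8)  # second of each pair -> high nibble
    return (high << 4) | low

def unpack_int4(packed, count):
    """Unpack `count` signed int4 values from a packed byte array."""
    b = np.asarray(packed, dtype=np.uint8).reshape(-1)
    low = (b & 0x0F).astype(np.int8)
    high = (b >> 4).astype(np.int8)
    # Sign-extend the nibbles: raw values 8..15 represent -8..-1.
    low = np.where(low >= 8, low - 16, low)
    high = np.where(high >= 8, high - 16, high)
    return np.stack([low, high], axis=-1).reshape(-1)[:count]

print(pack_int4([0, 1, 7, -4, -8]))   # [ 16 199   8]  (199 is -57 as a signed byte)
print(unpack_int4([16, 199, 8], 5))   # [ 0  1  7 -4 -8]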

lutzroeder added a commit to lutzroeder/netron that referenced this pull request Jan 11, 2024
@yufenglee
Contributor

@galagam, thanks for adding the int4 type to ONNX. I think QuantizeLinear and DequantizeLinear need to be updated to support blockwise quantization for AWQ/GPTQ. Currently, QuantizeLinear and DequantizeLinear only support per-tensor and per-channel quantization.

@galagam
Contributor Author

galagam commented Jan 11, 2024

@galagam, thanks for adding the int4 type to ONNX. I think QuantizeLinear and DequantizeLinear need to be updated to support blockwise quantization for AWQ/GPTQ. Currently, QuantizeLinear and DequantizeLinear only support per-tensor and per-channel quantization.

@yufenglee You're absolutely right! I have a PR in review for blocked quantization - see #5812

lutzroeder added a commit to lutzroeder/netron that referenced this pull request Jan 12, 2024
@liqunfu
Contributor

liqunfu commented Jan 14, 2024

@galagam, thank you for this PR! I wonder if you have the bandwidth to take a look at the release CI failures of the related tests? https://github.com/onnx/onnx/actions/runs/7503096973/job/20427133324 Thanks again.

@galagam
Contributor Author

galagam commented Jan 15, 2024

@liqunfu I'm having trouble setting up the environment to reproduce these errors.
I can offer this speculative fix (see patch below) based on the log file you've linked to.
If you can refer me to instructions for setting up an environment that reproduces the errors, I can verify and push a fix.
Alternatively, if you have an environment set up where you can apply the patch below and test it - please do.

diff --git a/onnx/reference/ops/op_quantize_linear.py b/onnx/reference/ops/op_quantize_linear.py
index c8866ebb..97484e74 100644
--- a/onnx/reference/ops/op_quantize_linear.py
+++ b/onnx/reference/ops/op_quantize_linear.py
@@ -117,7 +117,8 @@ class _CommonQuantizeLinear(OpRun):
                 return (f8.astype(float8e5m2fnuz),)  # type: ignore[attr-defined]

             if tensor_type in (TensorProto.UINT4, TensorProto.INT4):
-                xi = np.rint(x).astype(np.int32)
+                int_type_map = {TensorProto.UINT4: np.uint8, TensorProto.INT4: np.int8}
+                xi = np.rint(x).astype(int_type_map[tensor_type])
                 if len(y_scale.shape) > 0:
                     xi += zero_point.reshape(new_shape)
                 else:

@liqunfu
Contributor

liqunfu commented Jan 16, 2024

@liqunfu I'm having trouble setting up the environment to reproduce these errors. I can offer this speculative fix (see patch below) based on the log file you've linked to. If you can refer me to instructions for setting up an environment that reproduces the errors, I can verify and push a fix. Alternatively, if you have an environment set up where you can apply the patch below and test it - please do.

Thanks @galagam - we need to skip these tests for older numpy versions: #5858

@cjvolzka
Contributor

While preparing the 1.16 release notes, I noticed this PR dropped tensor(bool) from ConstantOfShape. @galagam was that intentional or a cut-and-paste error?

@galagam
Contributor Author

galagam commented Feb 23, 2024

While preparing the 1.16 release notes, I noticed this PR dropped tensor(bool) from ConstantOfShape. @galagam was that intentional or a cut-and-paste error?

Not intentional. Thanks for pointing that out. Let me issue a quick fix.

galagam added a commit to galagam/onnx that referenced this pull request Feb 23, 2024
Boolean type was unintentionally dropped in onnx#5811.
Revert to previous version and add the new types uint4, int4.
Modify type constraint comment to mention boolean is allowed.

Signed-off-by: Gal Hubara Agam <ghubaraagam@nvidia.com>
github-merge-queue bot pushed a commit that referenced this pull request Feb 25, 2024