
Add INT4, UINT4 types #5811

Merged
merged 7 commits on Jan 8, 2024

Conversation

galagam
Contributor

@galagam galagam commented Dec 18, 2023

Description

  • Add INT4 and UINT4 quantized data types
  • Support for packing and unpacking int4x2->byte
  • Implementation of operators: Cast, CastLike, DequantizeLinear, QuantizeLinear (a usage sketch follows after this list)
  • Type support for non-compute operators Constant, ConstantOfShape, Identity, Reshape, Shape, Size, If, Loop, Scan, Flatten, Pad, Squeeze, Unsqueeze, Transpose.
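
For illustration, here is a minimal sketch of a QuantizeLinear/DequantizeLinear round trip through INT4. This is a hypothetical usage example, assuming an onnx build that includes this change (opset 21) and assuming helper.make_tensor accepts unpacked int4 values:

import onnx
from onnx import TensorProto, helper

# Quantize float32 -> INT4, then dequantize back to float32.
node_q = helper.make_node("QuantizeLinear", ["x", "scale", "zero_point"], ["xq"])
node_dq = helper.make_node("DequantizeLinear", ["xq", "scale", "zero_point"], ["y"])

graph = helper.make_graph(
    [node_q, node_dq],
    "int4_roundtrip",
    inputs=[helper.make_tensor_value_info("x", TensorProto.FLOAT, [5])],
    outputs=[helper.make_tensor_value_info("y", TensorProto.FLOAT, [5])],
    initializer=[
        helper.make_tensor("scale", TensorProto.FLOAT, [], [0.5]),
        # Scalar INT4 zero point; the value is given unpacked.
        helper.make_tensor("zero_point", TensorProto.INT4, [], [0]),
    ],
)

model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 21)])
onnx.checker.check_model(model)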

Motivation and Context

See details in issue #5776

@galagam galagam requested review from a team as code owners December 18, 2023 16:54

codecov bot commented Dec 18, 2023

Codecov Report

Attention: 104 lines in your changes are missing coverage. Please review.

Comparison is base (75c6892) 56.39% compared to head (5bed7e1) 56.45%.

Files Patch % Lines
onnx/backend/test/case/node/cast.py 0.00% 37 Missing ⚠️
onnx/backend/test/case/node/quantizelinear.py 0.00% 18 Missing ⚠️
onnx/backend/test/case/node/dequantizelinear.py 0.00% 16 Missing ⚠️
onnx/reference/ops/op_dequantize_linear.py 47.05% 4 Missing and 5 partials ⚠️
onnx/reference/ops/op_quantize_linear.py 72.00% 3 Missing and 4 partials ⚠️
onnx/reference/op_run.py 66.66% 5 Missing and 1 partial ⚠️
onnx/reference/ops/op_cast_like.py 0.00% 2 Missing and 2 partials ⚠️
onnx/helper.py 88.88% 1 Missing and 1 partial ⚠️
onnx/reference/ops/op_cast.py 86.66% 1 Missing and 1 partial ⚠️
onnx/subbyte.py 92.85% 0 Missing and 2 partials ⚠️
... and 1 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5811      +/-   ##
==========================================
+ Coverage   56.39%   56.45%   +0.05%     
==========================================
  Files         503      504       +1     
  Lines       29620    29865     +245     
  Branches     4426     4484      +58     
==========================================
+ Hits        16704    16860     +156     
- Misses      12109    12188      +79     
- Partials      807      817      +10     


Contributor

@justinchuby justinchuby left a comment

Thank you for adding int4 support to ONNX! I would personally suggest breaking the operator support into a separate PR, so that we:

  1. Add the types to the protos and implement any helper functions to work with these types
  2. Create opset 21 definitions

This would make it easier to review and help us move faster.

@justinchuby justinchuby added this to the 1.16 milestone Dec 18, 2023
Contributor Author

@galagam galagam left a comment

First round of review fixes in commit bdffb14.

@galagam
Contributor Author

galagam commented Dec 20, 2023

It looks like there's a CI pipeline issue in MacOS-CI.
I'm getting "Invalid tensor data type 22" errors even though the type was added and the Linux-CI is passing. The same goes for the error "Opset 20 is under development and support for this is limited. The operator schemas and or other functionality may change before next ONNX release and in this case ONNX Runtime will not guarantee backward compatibility".
Perhaps the code is not compiled in the MacOS-CI prior to testing? This would only affect PRs that include changes to the C++ code.

EDIT:
@justinchuby can you take a look or tag someone relevant?

@justinchuby
Contributor

I will take a look. Thanks!

@justinchuby
Contributor

@liqunfu do you know why Linux is succeeding but macOS is failing?

I will also update the test to account for the max supported opset version environment variable

@liqunfu
Contributor

liqunfu commented Dec 28, 2023

@liqunfu do you know why Linux is succeeding but macOS is failing?

I will also update the test to account for the max supported opset version environment variable

It is not clear from the CI run. I have a PR which was passing the same CI afterwards, so it is probably related to your change. I noticed release-MacOS fails some reference implementation tests and I had to disable them. The only way to find out is to work on macOS :)

Further, onnxruntime is installed during CI, and it could be that onnxruntime for macOS is behind Linux and Windows. Also, this line is interesting: export ORT_MAX_ONNX_OPSET_SUPPORTED_VERSION=19. The next ORT release will support opset 20, but I am not sure whether the current ORT supports opset 19 or just opset 18. In any case, it is confusing that only MacOS shows this error with this PR.

@justinchuby justinchuby added the "review needed: operators approvers" (require reviews from members of operators-approvers) label Dec 28, 2023
@galagam
Contributor Author

galagam commented Jan 2, 2024

It is not clear from the CI run. I have a PR which was passing the same CI afterwards, so it is probably related to your change. I noticed release-MacOS fails some reference implementation tests and I had to disable them. The only way to find out is to work on macOS :)

@liqunfu @justinchuby In the interest of moving this along, perhaps you can assist with a specific issue on the macOS CI.

onnxruntime attempts to run OnnxBackendNodeModelTest.test_cast_FLOAT16_to_INT4_cpu and fails.
However, this test should be excluded from backend testing: test_backend_reference.py#L126
Same goes for the other cast_*to*INT4 tests.

- Import modules instead of objects/functions
- docstrings
- Type annotations
- Added cast tests U/INT4->U/INT8
- Refactored abbreviations in variable names

Signed-off-by: Gal Hubara Agam <ghubaraagam@nvidia.com>
@justinchuby
Contributor

@xadupre could you help with the reference evaluator tests? Thanks!

@galagam
Contributor Author

galagam commented Jan 2, 2024

Can you pull/rebase main?

I just rebased a couple of hours ago

@justinchuby
Contributor

Judging from https://dev.azure.com/onnx-pipelines/onnx/_build/results?buildId=55095&view=logs&jobId=825fcbdb-febe-56c2-0b31-e8b200b321eb&j=825fcbdb-febe-56c2-0b31-e8b200b321eb&t=46007089-1c35-5679-e243-e8ae35eaeec6, it looks like the failing tests are from the onnxruntime tests, not the reference runtime tests. I would disable them in https://github.com/onnx/onnx/blob/main/onnx/test/test_backend_onnxruntime.py

@galagam
Contributor Author

galagam commented Jan 8, 2024

@justinchuby please let me know if there's anything else I can do to help move this PR along.

@justinchuby
Contributor

justinchuby commented Jan 8, 2024

LGTM! @gramalingam or @xadupre can approve to get it merged

@xadupre xadupre added this pull request to the merge queue Jan 8, 2024
Merged via the queue into onnx:main with commit d2ac757 Jan 8, 2024
37 checks passed
@lutzroeder
Member

@galagam @xadupre can you please share a sample ONNX file using the UINT4 and INT4 types?

@justinchuby
Contributor

@lutzroeder
Member

Thank you, @justinchuby. Do any of these files contain UINT4 or INT4 weight initializers?

@galagam
Contributor Author

galagam commented Jan 9, 2024

@lutzroeder see for example test_dequantizelinear_int4

@lutzroeder
Member

@galagam I tried this file, but it doesn't contain weight initializers?

@galagam
Contributor Author

galagam commented Jan 10, 2024

@lutzroeder See https://github.com/onnx/onnx/blob/main/onnx/backend/test/data/node/test_dequantizelinear_int4/test_data_set_0/input_0.pb

>>> import onnx
>>> from onnx import TensorProto
>>> from google.protobuf import text_format
>>> 
>>> proto = TensorProto()
>>> with open('onnx/backend/test/data/node/test_dequantizelinear_int4/test_data_set_0/input_0.pb', 'rb') as f:
...     proto.ParseFromString(f.read())
... 
21
>>> print(proto)
dims: 5
data_type: 22
int32_data: 16
int32_data: -57
int32_data: 8
name: "x"

As you can see, data_type is 22, which corresponds to TensorProto.INT4.
We see that dims = 5, but there are only 3 int32_data elements. That is because the int4 elements are packed in pairs (and because of the odd dimension, a padding nibble is added).

The unpacked data array is (0, 1, 7, -4, -8).
The first packed element, 16 = 00010000b, holds the first two elements (0, 1) in little-endian order.
The second packed element, -57 = 11000111b, holds 7 (0111b) and -4 (1100b in two's complement) in little-endian order.
The last packed element, 8 = 00001000b, contains -8 (1000b) and a zero nibble added for padding.

I hope this helps to clarify things.
You can take a look at subbyte.py to understand the packing and unpacking methods; a small sketch follows below.
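
For illustration, here is a minimal numpy sketch of this packing scheme (two signed 4-bit values per byte, low nibble first). The function names here are hypothetical; the actual implementation lives in onnx/subbyte.py:

import numpy as np

def pack_int4(values):
    """Pack a flat array of int4 values (range [-8, 7]) into bytes."""
    v = np.asarray(values, dtype=np.int8).reshape(-1)
    if v.size % 2:
        v = np.append(v, np.int8(0))  # pad an odd-length input with a zero nibble
    low = (v[0::2] & 0x0F).astype(np.uint8)   # first of each pair -> low nibble
    high = (v[1::2] & 0x0F).astype(np.uint8)  # second of each pair -> high nibble
    return (high << 4) | low

def unpack_int4(packed, count):
    """Unpack `count` signed int4 values from a packed byte array."""
    b = np.asarray(packed, dtype=np.uint8).reshape(-1)
    low = (b & 0x0F).astype(np.int8)
    high = (b >> 4).astype(np.int8)
    # Sign-extend the nibbles: raw values 8..15 represent -8..-1.
    low = np.where(low >= 8, low - 16, low)
    high = np.where(high >= 8, high - 16, high)
    return np.stack([low, high], axis=-1).reshape(-1)[:count]

print(pack_int4([0, 1, 7, -4, -8]))   # [ 16 199   8]  (199 is -57 as a signed byte)
print(unpack_int4([16, 199, 8], 5))   # [ 0  1  7 -4 -8]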

lutzroeder added a commit to lutzroeder/netron that referenced this pull request Jan 11, 2024
@yufenglee
Contributor

@galagam, thanks for adding the int4 type to ONNX. I think QuantizeLinear and DequantizeLinear need to be updated to support blockwise quantization for AWQ/GPTQ. Currently, QuantizeLinear and DequantizeLinear only support per-tensor and per-channel quantization.

@galagam
Contributor Author

galagam commented Jan 11, 2024

@galagam, thanks for adding the int4 type to ONNX. I think QuantizeLinear and DequantizeLinear need to be updated to support blockwise quantization for AWQ/GPTQ. Currently, QuantizeLinear and DequantizeLinear only support per-tensor and per-channel quantization.

@yufenglee You're absolutely right! I have a PR in review for blocked quantization - see #5812

lutzroeder added a commit to lutzroeder/netron that referenced this pull request Jan 12, 2024
@liqunfu
Contributor

liqunfu commented Jan 14, 2024

@galagam, thank you for this PR! I wonder if you have the bandwidth to take a look at the release CI failures of the related tests? https://github.com/onnx/onnx/actions/runs/7503096973/job/20427133324 Thanks again.

@galagam
Contributor Author

galagam commented Jan 15, 2024

@liqunfu I'm having trouble setting up the environment to reproduce these errors.
I can offer this speculative fix (see patch below) based on the log file you've linked to.
If you can refer me to instructions for setting up an environment that reproduces the errors, I can verify and push a fix.
Alternatively, if you have an environment set up where you can apply the patch below and test it - please do.

diff --git a/onnx/reference/ops/op_quantize_linear.py b/onnx/reference/ops/op_quantize_linear.py
index c8866ebb..97484e74 100644
--- a/onnx/reference/ops/op_quantize_linear.py
+++ b/onnx/reference/ops/op_quantize_linear.py
@@ -117,7 +117,8 @@ class _CommonQuantizeLinear(OpRun):
                 return (f8.astype(float8e5m2fnuz),)  # type: ignore[attr-defined]

             if tensor_type in (TensorProto.UINT4, TensorProto.INT4):
-                xi = np.rint(x).astype(np.int32)
+                int_type_map = {TensorProto.UINT4: np.uint8, TensorProto.INT4: np.int8}
+                xi = np.rint(x).astype(int_type_map[tensor_type])
                 if len(y_scale.shape) > 0:
                     xi += zero_point.reshape(new_shape)
                 else:

@liqunfu
Contributor

liqunfu commented Jan 16, 2024

@liqunfu I'm having trouble setting up the environment to reproduce these errors. I can offer this speculative fix (see patch below) based on the log file you've linked to. If you can refer me to instructions for setting up an environment that reproduces the errors, I can verify and push a fix. Alternatively, if you have an environment set up where you can apply the patch below and test it - please do.

Thanks @galagam - we need to skip these tests for older numpy versions: #5858

@cjvolzka
Contributor

While preparing the 1.16 release notes, I noticed this PR dropped tensor(bool) from ConstantOfShape. @galagam was that intentional or a cut-and-paste error?

@galagam
Contributor Author

galagam commented Feb 23, 2024

While preparing the 1.16 release notes, I noticed this PR dropped tensor(bool) from ConstantOfShape. @galagam was that intentional or a cut-and-paste error?

Not intentional. Thanks for pointing that out. Let me issue a quick fix.

galagam added a commit to galagam/onnx that referenced this pull request Feb 23, 2024
Boolean type was unintentionally dropped in onnx#5811.
Revert to previous version and add the new types uint4, int4.
Modify type constraint comment to mention boolean is allowed.

Signed-off-by: Gal Hubara Agam <ghubaraagam@nvidia.com>
github-merge-queue bot pushed a commit that referenced this pull request Feb 25, 2024