Add INT4, UINT4 types #5811
Conversation
Codecov Report

@@ Coverage Diff @@
## main #5811 +/- ##
==========================================
+ Coverage 56.39% 56.45% +0.05%
==========================================
Files 503 504 +1
Lines 29620 29865 +245
Branches 4426 4484 +58
==========================================
+ Hits 16704 16860 +156
- Misses 12109 12188 +79
- Partials 807 817 +10
Thank you for adding int4 support to ONNX! I would personally suggest breaking the operator support into a separate PR:
- Add the types to the protos and implement any helper functions to work with these types
- Create opset 21 definitions
That would make this easier to review and help us move faster.
First round of review fixes in commit bdffb14.
It looks like there's a CI pipeline issue while opset 20 is under development. EDIT:
I will take a look. Thanks!
Force-pushed from 0e573f2 to 0b49912 (Compare)
@liqunfu do you know why Linux is succeeding but macOS fails? I will also update the test to account for the max supported opset version environment variable.
It is not clear from the CI run. I have a PR which was passing the same CI afterwards, so it is probably related to your change. I noticed release-macOS fails some reference implementation tests, and I had to disable them; the only way to find out is to work on a macOS machine :) Furthermore, onnxruntime is installed during CI, and it could be that onnxruntime for macOS is behind the Linux and Windows builds. Also, this line is interesting: export ORT_MAX_ONNX_OPSET_SUPPORTED_VERSION=19. The next ORT release will support opset 20, but I am not sure whether the current ORT supports opset 19 or just opset 18. In any case, it is confusing that only macOS shows the error with this PR.
@liqunfu @justinchuby In the interest of moving this along, perhaps you can assist with a specific issue on the macOS CI: onnxruntime attempts to run OnnxBackendNodeModelTest.test_cast_FLOAT16_to_INT4_cpu and fails.
- Add INT4 and UINT4 quantized data types
- Support for packing and unpacking int4x2 -> byte
- Implementation of operators: Cast, CastLike, DequantizeLinear, QuantizeLinear
- Type support for non-compute operators: Constant, ConstantOfShape, Identity, Reshape, Shape, Size, If, Loop, Scan, Flatten, Pad, Squeeze, Unsqueeze, Transpose

Signed-off-by: Gal Hubara Agam <ghubaraagam@nvidia.com>
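The int4x2 packing mentioned above (two 4-bit elements stored per byte) can be sketched in NumPy. This is an illustrative sketch only, assuming the low nibble holds the first element; `pack_int4` and `unpack_int4` are hypothetical helper names, not the PR's actual API:

```python
import numpy as np

def pack_int4(values: np.ndarray) -> bytes:
    """Pack signed int4 values (range [-8, 7]) two per byte, low nibble first.
    An odd-length input leaves the high nibble of the last byte zero."""
    nibbles = (np.asarray(values, dtype=np.int8) & 0x0F).astype(np.uint8)
    if nibbles.size % 2:  # pad to an even element count
        nibbles = np.append(nibbles, np.uint8(0))
    return (nibbles[0::2] | (nibbles[1::2] << 4)).tobytes()

def unpack_int4(data: bytes, count: int) -> np.ndarray:
    """Unpack `count` signed int4 values from packed bytes."""
    raw = np.frombuffer(data, dtype=np.uint8)
    nibbles = np.empty(raw.size * 2, dtype=np.uint8)
    nibbles[0::2] = raw & 0x0F   # first element of each pair
    nibbles[1::2] = raw >> 4     # second element of each pair
    vals = nibbles[:count].astype(np.int8)
    # sign-extend: nibbles >= 8 represent negative two's-complement values
    return np.where(vals < 8, vals, vals - 16).astype(np.int8)

packed = pack_int4(np.array([0, 1, 7, -4, -8]))
print(unpack_int4(packed, 5).tolist())  # → [0, 1, 7, -4, -8]
```

The round trip recovers the original values, including the odd trailing element.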
- Import modules instead of objects/functions
- Docstrings
- Type annotations
- Added cast tests U/INT4 -> U/INT8
- Refactored abbreviations in variable names

Signed-off-by: Gal Hubara Agam <ghubaraagam@nvidia.com>
Signed-off-by: Gal Hubara Agam <ghubaraagam@nvidia.com>
Force-pushed from 0b49912 to c0e0e17 (Compare)
@xadupre could you help with the reference evaluator tests? Thanks!
I just rebased a couple of hours ago.
It looks like the failing tests are from the onnxruntime tests, not the reference runtime tests. I would disable them in https://github.com/onnx/onnx/blob/main/onnx/test/test_backend_onnxruntime.py
Signed-off-by: Gal Hubara Agam <ghubaraagam@nvidia.com>
Signed-off-by: Gal Hubara Agam <ghubaraagam@nvidia.com>
Signed-off-by: galagam <ghubaraagam@nvidia.com>
@justinchuby please let me know if there's anything else I can do to help move this PR along.
LGTM! @gramalingam or @xadupre can approve to get it merged.
Thank you, @justinchuby. Do any of these files contain weight initializers?
@lutzroeder see for example test_dequantizelinear_int4
@galagam I tried this file, but it doesn't contain weight initializers?
@lutzroeder See https://github.com/onnx/onnx/blob/main/onnx/backend/test/data/node/test_dequantizelinear_int4/test_data_set_0/input_0.pb
As you can see, data_type is 22, which corresponds to TensorProto.INT4. The unpacked data array is (0, 1, 7, -4, -8). I hope this helps to clarify things.
@galagam, thanks for adding the int4 type to ONNX. I think QuantizeLinear and DequantizeLinear need to be updated to support blockwise quantization for AWQ/GPTQ. Currently, QuantizeLinear and DequantizeLinear only support per-tensor and per-channel quantization.
@yufenglee You're absolutely right! I have a PR in review for blocked quantization - see #5812
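For context on the per-tensor case discussed above, quantizing to signed int4 means rounding by the scale and saturating to the int4 range [-8, 7]. A minimal sketch, with `quantize_linear_int4` as a hypothetical helper (the authoritative semantics are in the QuantizeLinear operator spec, and zero_point handling is simplified here):

```python
import numpy as np

def quantize_linear_int4(x: np.ndarray, scale: float, zero_point: int = 0) -> np.ndarray:
    """Per-tensor quantization to signed int4: divide by scale, round
    half-to-even (np.rint), shift by zero_point, saturate to [-8, 7]."""
    q = np.rint(x / scale) + zero_point
    return np.clip(q, -8, 7).astype(np.int8)

x = np.array([0.0, 0.55, 3.2, -10.0])
print(quantize_linear_int4(x, scale=0.5).tolist())  # → [0, 1, 6, -8]
```

Note how -10.0 saturates to -8, the smallest representable int4 value; blockwise quantization would apply a separate scale per block of elements rather than one scale for the whole tensor.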
@galagam, thank you for this PR! I wonder if you have the bandwidth to take a look at the release CI failures of related tests? https://github.com/onnx/onnx/actions/runs/7503096973/job/20427133324 Thanks again.
@liqunfu I'm having trouble setting up the environment to reproduce these errors.
Thanks @galagam, we need to skip these tests for older numpy versions: #5858
While preparing the 1.16 release notes, I noticed this dropped the boolean type.
Not intentional. Thanks for pointing that out. Let me issue a quick fix.
Boolean type was unintentionally dropped in #5811. Revert to the previous version and add the new types uint4, int4. Modify the type constraint comment to mention that boolean is allowed.

Signed-off-by: Gal Hubara Agam <ghubaraagam@nvidia.com>
Description
Motivation and Context
See details in issue #5776