Extend supported types by QLinearMatMul (float16, float 8 types) #5473
Conversation
Signed-off-by: Xavier Dupre <xadupre@microsoft.com>
At some point, we should try to turn this into a function "Dequantize => MatMul => Quantize". But fine if that is done separately. (There may be some questions about precision of intermediate values etc. there.)
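A minimal numpy sketch of that decomposition, assuming the uint8 case with float32 intermediates; the function name, rounding, and saturation details here are illustrative rather than the spec's exact semantics, and the precision of the intermediate matmul is exactly the open question raised above:

```python
import numpy as np

def qlinear_matmul_reference(a, a_scale, a_zero_point,
                             b, b_scale, b_zero_point,
                             y_scale, y_zero_point):
    """Illustrative QLinearMatMul as Dequantize => MatMul => Quantize."""
    # Dequantize both uint8 inputs to float32.
    a_fp = (a.astype(np.float32) - np.float32(a_zero_point)) * np.float32(a_scale)
    b_fp = (b.astype(np.float32) - np.float32(b_zero_point)) * np.float32(b_scale)
    # MatMul with float32 accumulation (the intermediate precision choice).
    y_fp = a_fp @ b_fp
    # Quantize back to uint8: round, shift by the zero point, saturate.
    y = np.rint(y_fp / np.float32(y_scale)) + np.float32(y_zero_point)
    return np.clip(y, 0, 255).astype(np.uint8)
```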
Thanks @xadupre. Does this mean int4 is not yet supported?
Int4 is not defined yet in onnx. Defining it would be the first step before adding it to the list of supported types. Maybe it would be worth discussing during one of the SIG meetings.
(As per offline discussion, Xavier suggests holding off on this PR's changes until better float 8 quantization support is available. Adding this comment to avoid an accidental merge.)
Moved to 1.16
That's unfortunate. What about reducing the scope and at least adding uint8/int8 (no float8)?
Codecov Report
Attention:
Additional details and impacted files

@@ Coverage Diff @@
##             main    #5473      +/-   ##
==========================================
- Coverage   56.06%   56.04%   -0.02%
==========================================
  Files         501      501
  Lines       29366    29409     +43
  Branches     4404     4413      +9
==========================================
+ Hits        16463    16482     +19
- Misses      12091    12115     +24
  Partials      812      812

☔ View full report in Codecov by Sentry.
…hts (#18043)

### Description

Whenever a QuantizeLinear or DequantizeLinear node is created, the type of the weights before quantization must be known so that the scale is created with the expected type. Another option would be to add many CastLike operators, but that would push the burden onto the onnxruntime optimizer. This PR tries to avoid changing the signature. To do so, it modifies the scale computation to store the result in a numpy array rather than a Python float; the numpy array must have the same dtype as the weights being quantized. The PR adds many `assert` statements to check that the scale is not a Python type or a float64. These were added to make sure all the code follows the same logic, and the lines were kept for the first review. DequantizeLinear and QuantizeLinear cannot be tested with onnx==1.15: PR onnx/onnx#5709 is needed to fix shape inference, and PR onnx/onnx#5473 is needed to support QLinearMatMul with float16. That explains why some tests are disabled with float16.

### Motivation and Context

The current quantization tool assumes every weight is float32. For large models such as LLAMA, weights are usually float16, and the quantization tool needs to be able to quantize them.
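To illustrate the scale-dtype point, here is a hypothetical sketch (compute_scale is not the actual quantizer helper) of computing a scale as a numpy scalar that inherits the weight dtype, instead of a Python float, which would silently be float64:

```python
import numpy as np

def compute_scale(weights: np.ndarray, qmin: int, qmax: int) -> np.ndarray:
    """Illustrative only: keep the scale in the same dtype as the weights."""
    rmin = min(float(weights.min()), 0.0)
    rmax = max(float(weights.max()), 0.0)
    # Store the result as a numpy scalar with the weight dtype,
    # not as a Python float.
    scale = np.array((rmax - rmin) / (qmax - qmin), dtype=weights.dtype)
    assert scale.dtype == weights.dtype  # mirrors the asserts added by the PR
    return scale

w16 = np.random.rand(4, 4).astype(np.float16)
print(compute_scale(w16, 0, 255).dtype)  # float16, not float64
```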
Description
QLinearMatMul operates on quantized types. This PR extends the list of supported quantized types with the float 8 types, and the list of supported input scale types with float16 and bfloat16.
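As a rough sketch of what the extension enables, the model below uses float16 scales with uint8 data in a QLinearMatMul node; the shapes are illustrative, and opset 21 is assumed to be the version that carries this change:

```python
from onnx import TensorProto, helper

node = helper.make_node(
    "QLinearMatMul",
    inputs=["a", "a_scale", "a_zero_point",
            "b", "b_scale", "b_zero_point",
            "y_scale", "y_zero_point"],
    outputs=["y"],
)
graph = helper.make_graph(
    [node], "qlinear_matmul_fp16",
    [helper.make_tensor_value_info("a", TensorProto.UINT8, [2, 3]),
     # float16 scales are what this PR adds to the allowed types.
     helper.make_tensor_value_info("a_scale", TensorProto.FLOAT16, []),
     helper.make_tensor_value_info("a_zero_point", TensorProto.UINT8, []),
     helper.make_tensor_value_info("b", TensorProto.UINT8, [3, 2]),
     helper.make_tensor_value_info("b_scale", TensorProto.FLOAT16, []),
     helper.make_tensor_value_info("b_zero_point", TensorProto.UINT8, []),
     helper.make_tensor_value_info("y_scale", TensorProto.FLOAT16, []),
     helper.make_tensor_value_info("y_zero_point", TensorProto.UINT8, [])],
    [helper.make_tensor_value_info("y", TensorProto.UINT8, [2, 2])],
)
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 21)])
```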