Support SmoothQuant for ORT static quantization #16288
Conversation
Signed-off-by: Mengni Wang <mengni.wang@intel.com>
Hi @yufenglee, please review the PR. Thanks.
FYI - Intel Neural Compressor v2.2 was released with enhanced SmoothQuant. This PR enables the SmoothQuant feature in ORT.
@yufenglee hi, could you take a look and review this PR? Thanks a lot 😄
/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux Nuphar CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, Windows CPU CI Pipeline, Windows GPU CI Pipeline
/azp run Windows GPU TensorRT CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, onnxruntime-python-checks-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed
Azure Pipelines successfully started running 9 pipeline(s).
Azure Pipelines successfully started running 5 pipeline(s).
/azp run Linux DNNL CI Pipeline
Azure Pipelines successfully started running 1 pipeline(s).
@pranavsharma, could you please help review? This PR adds a package to requirements-dev.txt.
/azp run Linux QNN CI Pipeline, Windows ARM64 QNN CI Pipeline
Azure Pipelines successfully started running 2 pipeline(s).
This is a new dep. Have you followed the process of adding a new dep? See #15797 (comment)
Hi @pranavsharma, it seems the new dep neural-compressor isn't installed during the CI test. Could you point me to where this dep should be added for the CI tests?
@tianleiwu Hi, we released a new version which uses opencv-python-headless instead of opencv, and we validated that opencv-python-headless avoids the libGL.so.1 issue. Could you help to trigger the tests again?
Pull main branch
Hi, could someone help to trigger the test?
/azp run Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline, Windows ARM64 QNN CI Pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed, ONNX Runtime React Native CI Pipeline
/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, Linux QNN CI Pipeline
Azure Pipelines successfully started running 8 pipeline(s).
No commit pushedDate could be found for PR 16288 in repo microsoft/onnxruntime |
/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, Linux QNN CI Pipeline
No commit pushedDate could be found for PR 16288 in repo microsoft/onnxruntime |
/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux OpenVINO CI Pipeline, Linux QNN CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline
No commit pushedDate could be found for PR 16288 in repo microsoft/onnxruntime |
/azp run Linux CPU CI Pipeline
Azure Pipelines successfully started running 1 pipeline(s).
/azp run Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux OpenVINO CI Pipeline, Linux QNN CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline
Azure Pipelines successfully started running 7 pipeline(s).
/azp run onnxruntime-binary-size-checks-ci-pipeline
Azure Pipelines successfully started running 1 pipeline(s).
Hi @yufenglee, I am not sure why the ONNX Runtime React Native CI Pipeline failed. Could you help take a look?
/azp run ONNX Runtime React Native CI Pipeline
Azure Pipelines successfully started running 1 pipeline(s).
Thanks @yihonglyu @tianleiwu @yufenglee! We appreciate all the help during our first PR contribution to ORT.
### Description
The Python Package Pipeline failed because an exception is raised in test_smooth_quant (from #16288):

```
File "/home/cloudtest/.local/lib/python3.8/site-packages/onnxruntime/quantization/quantize.py", line 384, in quantize_static
    importlib.import_module("neural_compressor.adaptor.ox_utils.smooth_quant")
File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/__init__.py", line 24, in <module>
    from .contrib import *
File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/contrib/__init__.py", line 19, in <module>
    from .strategy import *
File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/contrib/strategy/__init__.py", line 26, in <module>
    __import__(basename(f)[:-3], globals(), locals(), level=1)
File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/contrib/strategy/sigopt.py", line 22, in <module>
    from neural_compressor.strategy.strategy import strategy_registry, TuneStrategy
File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/strategy/__init__.py", line 20, in <module>
    from .strategy import STRATEGIES
File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/strategy/strategy.py", line 41, in <module>
    from ..algorithm import AlgorithmScheduler, ALGORITHMS
File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/algorithm/__init__.py", line 20, in <module>
    from .algorithm import ALGORITHMS, Algorithm, AlgorithmScheduler, algorithm_registry
File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/algorithm/algorithm.py", line 21, in <module>
    from neural_compressor.utils.create_obj_from_config import get_algorithm
File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/utils/create_obj_from_config.py", line 20, in <module>
    from neural_compressor.metric import METRICS
File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/metric/__init__.py", line 30, in <module>
    __import__(basename(f)[:-3], globals(), locals(), level=1)
File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/metric/coco_tools.py", line 54, in <module>
    from pycocotools import coco
File "/usr/local/lib/python3.8/dist-packages/pycocotools/coco.py", line 52, in <module>
    from . import mask as maskUtils
File "/usr/local/lib/python3.8/dist-packages/pycocotools/mask.py", line 3, in <module>
    import pycocotools._mask as _mask
File "pycocotools/_mask.pyx", line 1, in init pycocotools._mask
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
```

The cause is that the pycocotools package uses "oldest-supported-numpy", which can pull an older numpy version into the pycocotools build: https://github.com/ppwwyyxx/cocoapi/blob/9e9164f979fe4265c6f387f10e234f8697a15922/PythonAPI/pyproject.toml#L4

Related issue: cocodataset/cocoapi#248
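The first frame of the trace shows that `quantize_static` loads SmoothQuant lazily via `importlib.import_module`. As an illustration only (this is a sketch, not the actual ORT code), such a lazy import can be wrapped so that low-level failures in transitive dependencies surface as one actionable error instead of a cryptic `ValueError`:

```python
import importlib


def load_smooth_quant():
    """Lazily import the SmoothQuant helper (module path taken from the
    traceback above). Hypothetical wrapper: it converts low-level failures
    in transitive deps (such as the pycocotools/numpy ABI mismatch) into a
    single clear error."""
    try:
        return importlib.import_module("neural_compressor.adaptor.ox_utils.smooth_quant")
    except (ImportError, ValueError) as exc:
        # ValueError covers "numpy.ndarray size changed", raised by compiled
        # extensions built against a different numpy than the one installed.
        raise RuntimeError(
            "neural-compressor (or one of its dependencies, e.g. pycocotools) "
            "failed to import; check that the installed numpy matches the "
            "version the binary wheels were built against"
        ) from exc
```

If neural-compressor is installed and healthy, the module is returned as-is; otherwise the caller sees one message naming the likely culprit.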
### Description
Support SmoothQuant for ORT static quantization via Intel Neural Compressor.

> Note: Please use neural-compressor==2.2 to try the SmoothQuant function.

### Motivation and Context
For large language models (LLMs) with gigantic parameter counts, systematic outliers make quantization of activations difficult. As a training-free post-training quantization (PTQ) solution, SmoothQuant offline migrates this difficulty from activations to weights with a mathematically equivalent transformation. Integrating SmoothQuant into ORT quantization can benefit the accuracy of INT8 LLMs.

---------

Signed-off-by: Mengni Wang <mengni.wang@intel.com>
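The "mathematically equivalent transformation" can be sketched in a few lines of numpy. This illustrates the idea (per-channel scales following SmoothQuant's α formula), not the neural-compressor implementation:

```python
import numpy as np


def smooth(X, W, alpha=0.5):
    """Migrate quantization difficulty from activations X (n, c) to weights
    W (c, m): compute per-input-channel scales
    s_j = max|X_j|**alpha / max|W_j|**(1 - alpha)
    and return (X / s, diag(s) @ W), which leaves the product unchanged."""
    act_max = np.abs(X).max(axis=0)   # per-input-channel activation range
    w_max = np.abs(W).max(axis=1)     # per-input-channel weight range
    s = act_max ** alpha / w_max ** (1 - alpha)
    return X / s, W * s[:, None], s


rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
X[:, 0] *= 50.0                       # inject a systematic outlier channel
W = rng.normal(size=(8, 3))

X_s, W_s, s = smooth(X, W)
assert np.allclose(X_s @ W_s, X @ W)  # equivalence: the output is unchanged
```

The outlier channel's activation range shrinks by its (large) scale at the cost of a larger weight range in that channel, so quantizing `X_s` to INT8 loses less precision than quantizing `X` directly.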
### Description Python Package Pipeline failed since there is exception raised in test_smooth_quant (from #16288): ``` File "/home/cloudtest/.local/lib/python3.8/site-packages/onnxruntime/quantization/quantize.py", line 384, in quantize_static importlib.import_module("neural_compressor.adaptor.ox_utils.smooth_quant") File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/__init__.py", line 24, in <module> from .contrib import * File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/contrib/__init__.py", line 19, in <module> from .strategy import * File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/contrib/strategy/__init__.py", line 26, in <module> __import__(basename(f)[:-3], globals(), locals(), level=1) File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/contrib/strategy/sigopt.py", line 22, in <module> from neural_compressor.strategy.strategy import strategy_registry, TuneStrategy File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/strategy/__init__.py", line 20, in <module> from .strategy import STRATEGIES File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/strategy/strategy.py", line 41, in <module> from ..algorithm import AlgorithmScheduler, ALGORITHMS File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/algorithm/__init__.py", line 20, in <module> from .algorithm import ALGORITHMS, Algorithm, AlgorithmScheduler, algorithm_registry File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/algorithm/algorithm.py", line 21, in <module> from neural_compressor.utils.create_obj_from_config import get_algorithm File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/utils/create_obj_from_config.py", line 20, in <module> from neural_compressor.metric import METRICS File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/metric/__init__.py", line 30, in 
<module> __import__(basename(f)[:-3], globals(), locals(), level=1) File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/metric/coco_tools.py", line 54, in <module> from pycocotools import coco File "/usr/local/lib/python3.8/dist-packages/pycocotools/coco.py", line 52, in <module> from . import mask as maskUtils File "/usr/local/lib/python3.8/dist-packages/pycocotools/mask.py", line 3, in <module> import pycocotools._mask as _mask File "pycocotools/_mask.pyx", line 1, in init pycocotools._mask ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject ``` The cause is pycocotools package uses "oldest-supported-numpy", which might cause older version numpy in build pycocotools: https://github.com/ppwwyyxx/cocoapi/blob/9e9164f979fe4265c6f387f10e234f8697a15922/PythonAPI/pyproject.toml#L4 Related issue: cocodataset/cocoapi#248 ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->
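Based on the PR description, the feature is driven through `quantize_static`'s `extra_options` dictionary. The option names below are assumptions drawn from this PR and should be verified against your onnxruntime version; `model.onnx`, `model_int8.onnx`, and `data_reader` are placeholders:

```python
# Hypothetical configuration sketch; the option names are assumptions based
# on this PR and must be checked against your installed onnxruntime version.
smooth_quant_options = {
    "SmoothQuant": True,      # run the SmoothQuant smoothing pass before quantization
    "SmoothQuantAlpha": 0.5,  # how much difficulty migrates from activations to weights
}

# Usage sketch (requires onnxruntime with this PR and neural-compressor==2.2):
# from onnxruntime.quantization import quantize_static
# quantize_static("model.onnx", "model_int8.onnx", data_reader,
#                 extra_options=smooth_quant_options)
```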