
Support SmoothQuant for ORT static quantization #16288

Merged
merged 26 commits into from Jul 27, 2023

mengniwang95
Contributor

@mengniwang95 mengniwang95 commented Jun 8, 2023

Description

Support SmoothQuant for ORT static quantization via Intel Neural Compressor.

Note:
Please use neural-compressor==2.2 to try the SmoothQuant feature.
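
For orientation, enabling the feature might look like the sketch below. It assumes SmoothQuant is requested through the `extra_options` dictionary of `onnxruntime.quantization.quantize_static`, using keys named `SmoothQuant`, `SmoothQuantAlpha`, and `SmoothQuantFolding`; please verify those names against the `quantize_static` docstring of your installed version. The helper names and their arguments are hypothetical.

```python
def smooth_quant_options(alpha=0.5, folding=True):
    """Build extra_options that request SmoothQuant.

    The key names are assumptions; check them against quantize_static's docs.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return {
        "SmoothQuant": True,
        "SmoothQuantAlpha": alpha,
        "SmoothQuantFolding": folding,
    }


def quantize_with_smooth_quant(model_input, model_output, calibration_reader, alpha=0.5):
    """Run ORT static quantization with SmoothQuant requested (hypothetical helper)."""
    # Imported lazily so this file loads even when onnxruntime is not installed.
    from onnxruntime.quantization import quantize_static

    quantize_static(
        model_input,
        model_output,
        calibration_reader,
        extra_options=smooth_quant_options(alpha),
    )
```

Here `calibration_reader` would be a `CalibrationDataReader` that yields representative inputs, and neural-compressor must be installed for the SmoothQuant step to run.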

Motivation and Context

For large language models (LLMs) with very large parameter counts, systematic outliers make quantization of activations difficult. As a training-free post-training quantization (PTQ) solution, SmoothQuant migrates this difficulty offline from activations to weights with a mathematically equivalent transformation. Integrating SmoothQuant into ORT quantization can improve the accuracy of INT8 LLMs.
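
The equivalence can be illustrated with a small NumPy sketch. It is not the code added by this PR; it assumes the per-channel scale from the SmoothQuant paper, s_j = max|X_j|^alpha / max|W_j|^(1 - alpha), and the function and variable names are illustrative only.

```python
import numpy as np


def smooth(X, W, alpha=0.5):
    """Return (X_s, W_s) such that X_s @ W_s equals X @ W.

    X: activations, shape (tokens, in_channels)
    W: weights, shape (in_channels, out_channels)
    """
    act_max = np.abs(X).max(axis=0)  # largest magnitude per input channel
    w_max = np.abs(W).max(axis=1)    # largest magnitude per input channel
    s = act_max**alpha / w_max ** (1.0 - alpha)
    # Divide activations by s and multiply the matching weight rows by s.
    return X / s, W * s[:, None]


rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))
X[:, 2] *= 50.0                      # simulate one outlier channel
W = rng.normal(size=(4, 3))

X_s, W_s = smooth(X, W)
print("max |X| before:", np.abs(X).max(), "after:", np.abs(X_s).max())
print("outputs match:", np.allclose(X @ W, X_s @ W_s))
```

The product is unchanged because the scaling cancels inside the matrix multiplication, while the activation range that the quantizer has to cover shrinks.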

Signed-off-by: Mengni Wang <mengni.wang@intel.com>
@mengniwang95 mengniwang95 requested a review from a team as a code owner June 8, 2023 11:53
@mengniwang95 mengniwang95 changed the title [WIP] Support SmoothQuant for ORT static quantization Support SmoothQuant for ORT static quantization Jun 15, 2023
@hshen14

hshen14 commented Jun 15, 2023

Hi @yufenglee , please review the PR. Thanks.

Signed-off-by: Mengni Wang <mengni.wang@intel.com>
@hshen14

hshen14 commented Jun 26, 2023

FYI: Intel Neural Compressor v2.2 was released with enhanced SmoothQuant. This PR enables the SmoothQuant feature in ORT.

@mengniwang95
Contributor Author

@yufenglee hi, could you take a look and review this PR? thanks a lot 😄

yufenglee
yufenglee previously approved these changes Jul 3, 2023
@yufenglee
Member

/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux Nuphar CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, Windows CPU CI Pipeline, Windows GPU CI Pipeline

@yufenglee
Member

/azp run Windows GPU TensorRT CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, onnxruntime-python-checks-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed

@azure-pipelines

Azure Pipelines successfully started running 9 pipeline(s).

@azure-pipelines

Azure Pipelines successfully started running 5 pipeline(s).

@yufenglee
Member

/azp run Linux DNNL CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@yufenglee
Member

@pranavsharma, could you please help review? The PR adds a package in requirements-dev.txt

@yufenglee
Member

/azp run Linux QNN CI Pipeline, Windows ARM64 QNN CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 2 pipeline(s).

Contributor

@pranavsharma pranavsharma left a comment

This is a new dep. Have you followed the process of adding a new dep? See #15797 (comment)

@mengniwang95
Contributor Author

Hi @pranavsharma, it seems the new dep neural-compressor isn't installed during the CI tests. Could you point me to where this dep should be added for CI?

Signed-off-by: Mengni Wang <mengni.wang@intel.com>
@mengniwang95
Contributor Author

@tianleiwu Hi, we released a new version that uses opencv-python-headless instead of opencv, and we validated that opencv-python-headless avoids the libGL.so.1 issue. Could you help to trigger the tests again?

@mengniwang95
Contributor Author

Hi, could someone help to trigger the test?

@yufenglee
Member

/azp run Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline, Windows ARM64 QNN CI Pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed, ONNX Runtime React Native CI Pipeline

@yufenglee
Member

/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, Linux QNN CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 8 pipeline(s).

@azure-pipelines

No commit pushedDate could be found for PR 16288 in repo microsoft/onnxruntime

@yufenglee
Member

/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, Linux QNN CI Pipeline

@azure-pipelines

No commit pushedDate could be found for PR 16288 in repo microsoft/onnxruntime

@yufenglee
Member

/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux OpenVINO CI Pipeline, Linux QNN CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline

@azure-pipelines

No commit pushedDate could be found for PR 16288 in repo microsoft/onnxruntime

@yufenglee
Member

/azp run Linux CPU CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@yufenglee
Member

/azp run Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux OpenVINO CI Pipeline, Linux QNN CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 7 pipeline(s).

@yufenglee
Member

/azp run onnxruntime-binary-size-checks-ci-pipeline

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mengniwang95
Contributor Author

Hi @yufenglee , I am not sure why ONNX Runtime React Native CI Pipeline failed, could you help to take a look?

@yihonglyu
Contributor

/azp run ONNX Runtime React Native CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@tianleiwu
Contributor

> Hi @yufenglee , I am not sure why ONNX Runtime React Native CI Pipeline failed, could you help to take a look?

The error does not seem to be related to this change, so I think it is fine. Merging main should resolve the pipeline failure.

@yihonglyu yihonglyu merged commit fe463d4 into microsoft:main Jul 27, 2023
64 of 67 checks passed
@hshen14

hshen14 commented Jul 27, 2023

Thanks @yihonglyu @tianleiwu @yufenglee! Appreciate all the help during our first PR contribution to ORT.

tianleiwu added a commit that referenced this pull request Jul 29, 2023
### Description
The Python Package Pipeline failed because an exception was raised in
test_smooth_quant (from #16288):
```
File "/home/cloudtest/.local/lib/python3.8/site-packages/onnxruntime/quantization/quantize.py", line 384, in quantize_static
    importlib.import_module("neural_compressor.adaptor.ox_utils.smooth_quant")
  File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/__init__.py", line 24, in <module>
    from .contrib import *
  File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/contrib/__init__.py", line 19, in <module>
    from .strategy import *
  File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/contrib/strategy/__init__.py", line 26, in <module>
    __import__(basename(f)[:-3], globals(), locals(), level=1)
  File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/contrib/strategy/sigopt.py", line 22, in <module>
    from neural_compressor.strategy.strategy import strategy_registry, TuneStrategy
  File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/strategy/__init__.py", line 20, in <module>
    from .strategy import STRATEGIES
  File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/strategy/strategy.py", line 41, in <module>
    from ..algorithm import AlgorithmScheduler, ALGORITHMS
  File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/algorithm/__init__.py", line 20, in <module>
    from .algorithm import ALGORITHMS, Algorithm, AlgorithmScheduler, algorithm_registry
  File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/algorithm/algorithm.py", line 21, in <module>
    from neural_compressor.utils.create_obj_from_config import get_algorithm
  File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/utils/create_obj_from_config.py", line 20, in <module>
    from neural_compressor.metric import METRICS
  File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/metric/__init__.py", line 30, in <module>
    __import__(basename(f)[:-3], globals(), locals(), level=1)
  File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/metric/coco_tools.py", line 54, in <module>
    from pycocotools import coco
  File "/usr/local/lib/python3.8/dist-packages/pycocotools/coco.py", line 52, in <module>
    from . import mask as maskUtils
  File "/usr/local/lib/python3.8/dist-packages/pycocotools/mask.py", line 3, in <module>
    import pycocotools._mask as _mask
  File "pycocotools/_mask.pyx", line 1, in init pycocotools._mask
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
```
The cause is that the pycocotools package builds against
"oldest-supported-numpy", so it may be compiled against an older NumPy
than the one installed at runtime:

https://github.com/ppwwyyxx/cocoapi/blob/9e9164f979fe4265c6f387f10e234f8697a15922/PythonAPI/pyproject.toml#L4

Related issue: cocodataset/cocoapi#248
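
One way to keep such a failure from surfacing as an unhandled exception is to guard the optional import. This is a minimal sketch and not the fix applied in this commit; `load_optional` and `smooth_quant_module` are hypothetical helper names.

```python
import importlib


def load_optional(module_name):
    """Import an optional dependency, returning (module, error_message).

    ValueError is caught alongside ImportError because a NumPy ABI mismatch
    in a compiled extension is raised as ValueError, as in the traceback above.
    """
    try:
        return importlib.import_module(module_name), None
    except (ImportError, ValueError) as exc:
        return None, f"{type(exc).__name__}: {exc}"


def smooth_quant_module():
    """Return the neural_compressor SmoothQuant module, or None if unusable."""
    module, error = load_optional("neural_compressor.adaptor.ox_utils.smooth_quant")
    if module is None:
        print(f"SmoothQuant unavailable: {error}")
    return module
```

A test could call `smooth_quant_module()` and skip itself when the result is `None`, rather than failing on an environment problem.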

### Motivation and Context
jchen351 pushed a commit that referenced this pull request Aug 12, 2023
jchen351 pushed a commit that referenced this pull request Aug 12, 2023

6 participants