Support SmoothQuant for ORT static quantization #16288
Conversation
Signed-off-by: Mengni Wang <mengni.wang@intel.com>
Hi @yufenglee, please review the PR. Thanks.
FYI - Intel Neural Compressor v2.2 was released with enhanced SmoothQuant. This PR enables the SmoothQuant feature in ORT.
@yufenglee hi, could you take a look and review this PR? Thanks a lot 😄
/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux Nuphar CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, Windows CPU CI Pipeline, Windows GPU CI Pipeline
/azp run Windows GPU TensorRT CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, onnxruntime-python-checks-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed
Azure Pipelines successfully started running 9 pipeline(s).
Azure Pipelines successfully started running 5 pipeline(s).
/azp run Linux DNNL CI Pipeline
Azure Pipelines successfully started running 1 pipeline(s).
@pranavsharma, could you please help review? This PR adds a package to requirements-dev.txt.
/azp run Linux QNN CI Pipeline, Windows ARM64 QNN CI Pipeline
Azure Pipelines successfully started running 2 pipeline(s).
This is a new dep. Have you followed the process of adding a new dep? See #15797 (comment)
Hi @pranavsharma, it seems the new dep neural-compressor isn't installed during the CI test. Could you point me to where this dep should be added for the CI tests?
@tianleiwu Hi, we released a new version which uses opencv-python-headless instead of opencv, and we validated that opencv-python-headless avoids the libGL.so.1 issue. Could you help to trigger the tests again?
Pull main branch
Hi, could someone help to trigger the test?
/azp run Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline, Windows ARM64 QNN CI Pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed, ONNX Runtime React Native CI Pipeline
/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, Linux QNN CI Pipeline
Azure Pipelines successfully started running 8 pipeline(s).
No commit pushedDate could be found for PR 16288 in repo microsoft/onnxruntime |
/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, Linux QNN CI Pipeline
No commit pushedDate could be found for PR 16288 in repo microsoft/onnxruntime |
/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux OpenVINO CI Pipeline, Linux QNN CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline
No commit pushedDate could be found for PR 16288 in repo microsoft/onnxruntime |
/azp run Linux CPU CI Pipeline
Azure Pipelines successfully started running 1 pipeline(s).
/azp run Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux OpenVINO CI Pipeline, Linux QNN CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline
Azure Pipelines successfully started running 7 pipeline(s).
/azp run onnxruntime-binary-size-checks-ci-pipeline
Azure Pipelines successfully started running 1 pipeline(s).
Hi @yufenglee, I am not sure why the ONNX Runtime React Native CI Pipeline failed. Could you help take a look?
/azp run ONNX Runtime React Native CI Pipeline
Azure Pipelines successfully started running 1 pipeline(s).
Thanks @yihonglyu @tianleiwu @yufenglee! We appreciate all the help during our first PR contribution to ORT.
### Description
The Python Package Pipeline failed because an exception is raised in test_smooth_quant (from #16288):

```
File "/home/cloudtest/.local/lib/python3.8/site-packages/onnxruntime/quantization/quantize.py", line 384, in quantize_static
    importlib.import_module("neural_compressor.adaptor.ox_utils.smooth_quant")
File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/__init__.py", line 24, in <module>
    from .contrib import *
File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/contrib/__init__.py", line 19, in <module>
    from .strategy import *
File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/contrib/strategy/__init__.py", line 26, in <module>
    __import__(basename(f)[:-3], globals(), locals(), level=1)
File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/contrib/strategy/sigopt.py", line 22, in <module>
    from neural_compressor.strategy.strategy import strategy_registry, TuneStrategy
File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/strategy/__init__.py", line 20, in <module>
    from .strategy import STRATEGIES
File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/strategy/strategy.py", line 41, in <module>
    from ..algorithm import AlgorithmScheduler, ALGORITHMS
File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/algorithm/__init__.py", line 20, in <module>
    from .algorithm import ALGORITHMS, Algorithm, AlgorithmScheduler, algorithm_registry
File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/algorithm/algorithm.py", line 21, in <module>
    from neural_compressor.utils.create_obj_from_config import get_algorithm
File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/utils/create_obj_from_config.py", line 20, in <module>
    from neural_compressor.metric import METRICS
File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/metric/__init__.py", line 30, in <module>
    __import__(basename(f)[:-3], globals(), locals(), level=1)
File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/metric/coco_tools.py", line 54, in <module>
    from pycocotools import coco
File "/usr/local/lib/python3.8/dist-packages/pycocotools/coco.py", line 52, in <module>
    from . import mask as maskUtils
File "/usr/local/lib/python3.8/dist-packages/pycocotools/mask.py", line 3, in <module>
    import pycocotools._mask as _mask
File "pycocotools/_mask.pyx", line 1, in init pycocotools._mask
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
```

The cause is that the pycocotools package uses "oldest-supported-numpy", which can pull an older numpy version into the pycocotools build: https://github.com/ppwwyyxx/cocoapi/blob/9e9164f979fe4265c6f387f10e234f8697a15922/PythonAPI/pyproject.toml#L4

Related issue: cocodataset/cocoapi#248
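The first frame of the trace shows that `quantize_static` loads SmoothQuant lazily via `importlib.import_module`. As an illustration only (this is a sketch, not the actual ORT code), such a lazy import can be wrapped so that low-level failures in transitive dependencies surface as one actionable error instead of a cryptic `ValueError`:

```python
import importlib


def load_smooth_quant():
    """Lazily import the SmoothQuant helper (module path taken from the
    traceback above). Hypothetical wrapper: it converts low-level failures
    in transitive deps (such as the pycocotools/numpy ABI mismatch) into a
    single clear error."""
    try:
        return importlib.import_module("neural_compressor.adaptor.ox_utils.smooth_quant")
    except (ImportError, ValueError) as exc:
        # ValueError covers "numpy.ndarray size changed", raised by compiled
        # extensions built against a different numpy than the one installed.
        raise RuntimeError(
            "neural-compressor (or one of its dependencies, e.g. pycocotools) "
            "failed to import; check that the installed numpy matches the "
            "version the binary wheels were built against"
        ) from exc
```

If neural-compressor is installed and healthy, the module is returned as-is; otherwise the caller sees one message naming the likely culprit.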
### Description
Support SmoothQuant for ORT static quantization via Intel Neural Compressor.

> Note: Please use neural-compressor==2.2 to try the SmoothQuant function.

### Motivation and Context
For large language models (LLMs) with gigantic parameter counts, systematic outliers make quantization of activations difficult. As a training-free post-training quantization (PTQ) solution, SmoothQuant offline migrates this difficulty from activations to weights with a mathematically equivalent transformation. Integrating SmoothQuant into ORT quantization can benefit the accuracy of INT8 LLMs.

---------

Signed-off-by: Mengni Wang <mengni.wang@intel.com>
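The "mathematically equivalent transformation" can be sketched in a few lines of numpy. This illustrates the idea (per-channel scales following SmoothQuant's α formula), not the neural-compressor implementation:

```python
import numpy as np


def smooth(X, W, alpha=0.5):
    """Migrate quantization difficulty from activations X (n, c) to weights
    W (c, m): compute per-input-channel scales
    s_j = max|X_j|**alpha / max|W_j|**(1 - alpha)
    and return (X / s, diag(s) @ W), which leaves the product unchanged."""
    act_max = np.abs(X).max(axis=0)   # per-input-channel activation range
    w_max = np.abs(W).max(axis=1)     # per-input-channel weight range
    s = act_max ** alpha / w_max ** (1 - alpha)
    return X / s, W * s[:, None], s


rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
X[:, 0] *= 50.0                       # inject a systematic outlier channel
W = rng.normal(size=(8, 3))

X_s, W_s, s = smooth(X, W)
assert np.allclose(X_s @ W_s, X @ W)  # equivalence: the output is unchanged
```

The outlier channel's activation range shrinks by its (large) scale at the cost of a larger weight range in that channel, so quantizing `X_s` to INT8 loses less precision than quantizing `X` directly.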
### Description Python Package Pipeline failed since there is exception raised in test_smooth_quant (from #16288): ``` File "/home/cloudtest/.local/lib/python3.8/site-packages/onnxruntime/quantization/quantize.py", line 384, in quantize_static importlib.import_module("neural_compressor.adaptor.ox_utils.smooth_quant") File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/__init__.py", line 24, in <module> from .contrib import * File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/contrib/__init__.py", line 19, in <module> from .strategy import * File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/contrib/strategy/__init__.py", line 26, in <module> __import__(basename(f)[:-3], globals(), locals(), level=1) File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/contrib/strategy/sigopt.py", line 22, in <module> from neural_compressor.strategy.strategy import strategy_registry, TuneStrategy File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/strategy/__init__.py", line 20, in <module> from .strategy import STRATEGIES File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/strategy/strategy.py", line 41, in <module> from ..algorithm import AlgorithmScheduler, ALGORITHMS File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/algorithm/__init__.py", line 20, in <module> from .algorithm import ALGORITHMS, Algorithm, AlgorithmScheduler, algorithm_registry File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/algorithm/algorithm.py", line 21, in <module> from neural_compressor.utils.create_obj_from_config import get_algorithm File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/utils/create_obj_from_config.py", line 20, in <module> from neural_compressor.metric import METRICS File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/metric/__init__.py", line 30, in 
<module> __import__(basename(f)[:-3], globals(), locals(), level=1) File "/home/cloudtest/.local/lib/python3.8/site-packages/neural_compressor/metric/coco_tools.py", line 54, in <module> from pycocotools import coco File "/usr/local/lib/python3.8/dist-packages/pycocotools/coco.py", line 52, in <module> from . import mask as maskUtils File "/usr/local/lib/python3.8/dist-packages/pycocotools/mask.py", line 3, in <module> import pycocotools._mask as _mask File "pycocotools/_mask.pyx", line 1, in init pycocotools._mask ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject ``` The cause is pycocotools package uses "oldest-supported-numpy", which might cause older version numpy in build pycocotools: https://github.com/ppwwyyxx/cocoapi/blob/9e9164f979fe4265c6f387f10e234f8697a15922/PythonAPI/pyproject.toml#L4 Related issue: cocodataset/cocoapi#248 ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->
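Based on the PR description, the feature is driven through `quantize_static`'s `extra_options` dictionary. The option names below are assumptions drawn from this PR and should be verified against your onnxruntime version; `model.onnx`, `model_int8.onnx`, and `data_reader` are placeholders:

```python
# Hypothetical configuration sketch; the option names are assumptions based
# on this PR and must be checked against your installed onnxruntime version.
smooth_quant_options = {
    "SmoothQuant": True,      # run the SmoothQuant smoothing pass before quantization
    "SmoothQuantAlpha": 0.5,  # how much difficulty migrates from activations to weights
}

# Usage sketch (requires onnxruntime with this PR and neural-compressor==2.2):
# from onnxruntime.quantization import quantize_static
# quantize_static("model.onnx", "model_int8.onnx", data_reader,
#                 extra_options=smooth_quant_options)
```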