
🅰️ Auto Awq in Olive with QuantLinear which provides the capability for onnx conversion #1080

Merged

@trajepl merged 8 commits into main from jiapli/auto_awq on May 7, 2024

Conversation

@trajepl (Contributor) commented Apr 15, 2024

Describe your changes

  1. Enable AutoAwq in Olive via a new AwqQuantizer pass, which not only calls AutoAwq to quantize the model but also uses a customized QuantLinear so that the quantized torch model can be converted to onnx (see the sketch after this list).
    Background: AutoAwq rewrites the quantization utilities with its own kernels, which may not be convertible to onnx.
  2. Add a facebook/opt-125m example with AwqQuantizer (a config sketch follows below).
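
As a rough illustration of point 1, here is a minimal sketch of an export-friendly QuantLinear. The class name, buffer names, and packing layout are assumed for illustration and are not Olive's actual quant_utils code; the idea is that dequantization is written with plain tensor ops, so torch.onnx.export can trace it into standard ONNX operators instead of hitting AutoAwq's custom kernels:

```python
import torch
import torch.nn as nn


def unpack_int4(qweight: torch.Tensor) -> torch.Tensor:
    """Unpack int32 words into eight 4-bit values each along dim 0."""
    shifts = torch.arange(0, 32, 4, dtype=torch.int32, device=qweight.device)
    vals = (qweight.unsqueeze(1) >> shifts.view(1, -1, 1)) & 0xF
    return vals.reshape(-1, qweight.shape[1])  # (in_features, out_features)


class QuantLinearSketch(nn.Module):
    """4-bit, group-quantized linear layer built only from exportable ops."""

    def __init__(self, in_features: int, out_features: int, group_size: int = 128):
        super().__init__()
        self.group_size = group_size
        # Eight 4-bit weights are packed into each int32 word along dim 0.
        self.register_buffer(
            "qweight", torch.zeros(in_features // 8, out_features, dtype=torch.int32)
        )
        # Zero points are kept unpacked here for brevity; real implementations
        # usually pack them into int32 words as well.
        self.register_buffer("scales", torch.ones(in_features // group_size, out_features))
        self.register_buffer("zeros", torch.zeros(in_features // group_size, out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w_q = unpack_int4(self.qweight).to(x.dtype)                     # (in, out)
        scales = self.scales.repeat_interleave(self.group_size, dim=0)  # (in, out)
        zeros = self.zeros.repeat_interleave(self.group_size, dim=0)
        w = (w_q - zeros) * scales  # dequantize per input-channel group
        return x @ w
```

AutoAwq's own quantized linear modules route the forward pass through custom CUDA kernels, which is exactly what the ONNX exporter cannot trace; the point of the customized module is to keep the packed storage but express the compute in exportable ops.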

Checklist before requesting a review

  • Add unit tests for this change.
  • Make sure all tests can pass.
  • Update documents if necessary.
  • Lint and apply fixes to your code by running lintrunner -a
  • Is this a user-facing change? If yes, give a description of this change to be included in the release notes.
  • Is this PR including examples changes? If yes, please remember to update example documentation in a follow-up PR.

(Optional) Issue link
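
For context on the example in point 2 of the description, a workflow config driving the new pass might look roughly like this. Field names and the model-type schema are assumptions based on Olive's general config style; the exact schema is in the example added by this PR:

```python
from olive.workflows import run as olive_run

config = {
    "input_model": {
        # Assumed schema: a Hugging Face model loaded as a PyTorch model.
        "type": "PyTorchModel",
        "config": {"hf_config": {"model_name": "facebook/opt-125m", "task": "text-generation"}},
    },
    "passes": {
        # AwqQuantizer is the pass added by this PR; OnnxConversion then
        # exports the quantized torch model, which works because the
        # customized QuantLinear uses only exportable ops.
        "awq": {"type": "AwqQuantizer"},
        "conversion": {"type": "OnnxConversion"},
    },
}

olive_run(config)
```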

```diff
@@ -126,7 +126,7 @@ def _run_for_config(
 from auto_gptq.modeling import BaseGPTQForCausalLM
 from auto_gptq.modeling.auto import GPTQ_CAUSAL_LM_MODEL_MAP

-from olive.passes.pytorch.gptq_utils import QuantLinearORT
+from olive.passes.pytorch.quant_utils import QuantLinearORT
```
Collaborator:

What does ORT mean here? The torch module doesn't seem to have anything to do with ORT?

@trajepl (author):

quant_utils is written as a PyTorch module, but it is customized for the ONNX exporter.

Contributor:

Does the packing need a custom ORT operator at runtime? If the exported graph is a standard ONNX graph (one that needs no custom ORT operator), then the ORT in the name is unclear.

@trajepl (author) commented Apr 30, 2024:

Yes, it is packed. But to align the implementations (the custom ORT operator you mention is actually a customized torch module), I have already removed the ort_xxx naming.
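
(For reference, the packing itself is plain bit manipulation and needs no runtime support. A sketch, assuming eight consecutive input rows per 32-bit word, the inverse of the unpack in the earlier sketch:)

```python
import torch


def pack_int4(w_q: torch.Tensor) -> torch.Tensor:
    """Pack a (rows, cols) tensor of 4-bit values (0..15) into int32 words,
    eight consecutive rows per word along dim 0."""
    w_q = w_q.to(torch.int32)
    rows, cols = w_q.shape
    packed = torch.zeros(rows // 8, cols, dtype=torch.int32)
    for j in range(8):
        # OR row 8*i + j into bits [4*j, 4*j + 4) of word i.
        packed |= w_q[j::8] << (4 * j)
    return packed
```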

Contributor:

QuantLinearORT

@trajepl (author):

Updated.
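
(Context: once the model's linear layers are swapped for the customized QuantLinear, a plain export is enough; no custom ORT operator or special session options are involved at runtime. A sketch with a hypothetical model handle and placeholder input shape:)

```python
import torch

# Hypothetical handle: the torch model produced by the AwqQuantizer pass.
quantized_model = ...
dummy_input = torch.randint(0, 100, (1, 8))  # placeholder token ids

torch.onnx.export(quantized_model, (dummy_input,), "opt125m-awq.onnx", opset_version=14)
```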

@guotuofeng (Collaborator) commented:

It seems to live under the pytorch passes, but the title says OrtQuantLinear?

@trajepl changed the title from "🅰️ Auto Awq in Olive with OrtQuantLinear" to "🅰️ Auto Awq in Olive with QuantLinear which provides the capability for onnx conversion" on Apr 18, 2024.
@trajepl (author) commented Apr 18, 2024

> It seems to live under the pytorch passes, but the title says OrtQuantLinear?

Updated title.

@trajepl (author) commented Apr 29, 2024

/azp run


Azure Pipelines successfully started running 1 pipeline(s).

@trajepl merged commit f991288 into main on May 7, 2024
35 checks passed
@trajepl deleted the jiapli/auto_awq branch on May 7, 2024 at 10:55