
🅰️ Auto Awq in Olive with QuantLinear which provides the capability for onnx conversion #1080

Merged

@trajepl merged 8 commits into main from jiapli/auto_awq on May 7, 2024

Conversation

@trajepl (Contributor) commented Apr 15, 2024

Describe your changes

  1. Enable AutoAwq in Olive via a new AwqQuantizer pass, which not only calls AutoAwq to quantize the model but also uses a customized QuantLinear so that the quantized torch model can be converted to onnx (see the sketch after this list).
    Background: AutoAwq rewrites the quantization utilities with its own kernels, which may not be convertible to onnx.
  2. Add a facebook/opt-125m example with AwqQuantizer (a config sketch follows below).
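
As a rough illustration of point 1, here is a minimal sketch of an export-friendly QuantLinear. The class name, buffer names, and packing layout are assumed for illustration and are not Olive's actual quant_utils code; the idea is that dequantization is written with plain tensor ops, so torch.onnx.export can trace it into standard ONNX operators instead of hitting AutoAwq's custom kernels:

```python
import torch
import torch.nn as nn


def unpack_int4(qweight: torch.Tensor) -> torch.Tensor:
    """Unpack int32 words into eight 4-bit values each along dim 0."""
    shifts = torch.arange(0, 32, 4, dtype=torch.int32, device=qweight.device)
    vals = (qweight.unsqueeze(1) >> shifts.view(1, -1, 1)) & 0xF
    return vals.reshape(-1, qweight.shape[1])  # (in_features, out_features)


class QuantLinearSketch(nn.Module):
    """4-bit, group-quantized linear layer built only from exportable ops."""

    def __init__(self, in_features: int, out_features: int, group_size: int = 128):
        super().__init__()
        self.group_size = group_size
        # Eight 4-bit weights are packed into each int32 word along dim 0.
        self.register_buffer(
            "qweight", torch.zeros(in_features // 8, out_features, dtype=torch.int32)
        )
        # Zero points are kept unpacked here for brevity; real implementations
        # usually pack them into int32 words as well.
        self.register_buffer("scales", torch.ones(in_features // group_size, out_features))
        self.register_buffer("zeros", torch.zeros(in_features // group_size, out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w_q = unpack_int4(self.qweight).to(x.dtype)                     # (in, out)
        scales = self.scales.repeat_interleave(self.group_size, dim=0)  # (in, out)
        zeros = self.zeros.repeat_interleave(self.group_size, dim=0)
        w = (w_q - zeros) * scales  # dequantize per input-channel group
        return x @ w
```

AutoAwq's own quantized linear modules route the forward pass through custom CUDA kernels, which is exactly what the ONNX exporter cannot trace; the point of the customized module is to keep the packed storage but express the compute in exportable ops.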

Checklist before requesting a review

  • Add unit tests for this change.
  • Make sure all tests can pass.
  • Update documents if necessary.
  • Lint and apply fixes to your code by running lintrunner -a
  • Is this a user-facing change? If yes, give a description of this change to be included in the release notes.
  • Is this PR including examples changes? If yes, please remember to update example documentation in a follow-up PR.

(Optional) Issue link
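
For context on the example in point 2 of the description, a workflow config driving the new pass might look roughly like this. Field names and the model-type schema are assumptions based on Olive's general config style; the exact schema is in the example added by this PR:

```python
from olive.workflows import run as olive_run

config = {
    "input_model": {
        # Assumed schema: a Hugging Face model loaded as a PyTorch model.
        "type": "PyTorchModel",
        "config": {"hf_config": {"model_name": "facebook/opt-125m", "task": "text-generation"}},
    },
    "passes": {
        # AwqQuantizer is the pass added by this PR; OnnxConversion then
        # exports the quantized torch model, which works because the
        # customized QuantLinear uses only exportable ops.
        "awq": {"type": "AwqQuantizer"},
        "conversion": {"type": "OnnxConversion"},
    },
}

olive_run(config)
```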

```diff
@@ -126,7 +126,7 @@ def _run_for_config(
 from auto_gptq.modeling import BaseGPTQForCausalLM
 from auto_gptq.modeling.auto import GPTQ_CAUSAL_LM_MODEL_MAP

-from olive.passes.pytorch.gptq_utils import QuantLinearORT
+from olive.passes.pytorch.quant_utils import QuantLinearORT
```
Collaborator:

What does ORT mean here? The torch module doesn't seem to have anything to do with ORT?

@trajepl (author):

quant_utils is written as a PyTorch module, but it is customized for the ONNX exporter.

Contributor:

Does the packing need a custom ORT operator at runtime? If the exported graph is a standard ONNX graph (one that needs no custom ORT operator), then the ORT in the name is unclear.

@trajepl (author) commented Apr 30, 2024:

Yes, it is packed. But to align the implementations (the custom ORT operator you mention is actually a customized torch module), I have already removed the ort_xxx naming.
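
(For reference, the packing itself is plain bit manipulation and needs no runtime support. A sketch, assuming eight consecutive input rows per 32-bit word, the inverse of the unpack in the earlier sketch:)

```python
import torch


def pack_int4(w_q: torch.Tensor) -> torch.Tensor:
    """Pack a (rows, cols) tensor of 4-bit values (0..15) into int32 words,
    eight consecutive rows per word along dim 0."""
    w_q = w_q.to(torch.int32)
    rows, cols = w_q.shape
    packed = torch.zeros(rows // 8, cols, dtype=torch.int32)
    for j in range(8):
        # OR row 8*i + j into bits [4*j, 4*j + 4) of word i.
        packed |= w_q[j::8] << (4 * j)
    return packed
```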

Contributor:

QuantLinearORT

@trajepl (author):

Updated.
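
(Context: once the model's linear layers are swapped for the customized QuantLinear, a plain export is enough; no custom ORT operator or special session options are involved at runtime. A sketch with a hypothetical model handle and placeholder input shape:)

```python
import torch

# Hypothetical handle: the torch model produced by the AwqQuantizer pass.
quantized_model = ...
dummy_input = torch.randint(0, 100, (1, 8))  # placeholder token ids

torch.onnx.export(quantized_model, (dummy_input,), "opt125m-awq.onnx", opset_version=14)
```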

@guotuofeng (Collaborator) commented:

It seems to live under the pytorch passes, but the title says OrtQuantLinear?

@trajepl changed the title from "🅰️ Auto Awq in Olive with OrtQuantLinear" to "🅰️ Auto Awq in Olive with QuantLinear which provides the capability for onnx conversion" on Apr 18, 2024.
@trajepl (author) commented Apr 18, 2024

> It seems to live under the pytorch passes, but the title says OrtQuantLinear?

Updated title.

@trajepl (author) commented Apr 29, 2024

/azp run


Azure Pipelines successfully started running 1 pipeline(s).

@trajepl merged commit f991288 into main on May 7, 2024
35 checks passed
@trajepl deleted the jiapli/auto_awq branch on May 7, 2024 at 10:55