🅰️ AutoAWQ in Olive with QuantLinear, which provides the capability for ONNX conversion #1080
Conversation
olive/passes/pytorch/gptq.py (Outdated)
@@ -126,7 +126,7 @@ def _run_for_config(
     from auto_gptq.modeling import BaseGPTQForCausalLM
     from auto_gptq.modeling.auto import GPTQ_CAUSAL_LM_MODEL_MAP

-    from olive.passes.pytorch.gptq_utils import QuantLinearORT
+    from olive.passes.pytorch.quant_utils import QuantLinearORT
What does the ORT suffix mean? The torch module doesn't seem to have anything to do with ORT.
The quant_utils module is written with PyTorch modules, but it is customized for the ONNX exporter.
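To illustrate the idea discussed here, a minimal sketch of an export-friendly quantized linear layer: a plain `torch.nn.Module` whose forward pass uses only standard tensor ops, so `torch.onnx.export` can trace it into a standard ONNX graph with no custom runtime operators. The class and buffer names below are hypothetical, not Olive's actual `QuantLinearORT` implementation.

```python
import torch


class ExportFriendlyQuantLinear(torch.nn.Module):
    """Hypothetical sketch: quantized linear that exports to standard ONNX."""

    def __init__(self, in_features, out_features):
        super().__init__()
        # int8 storage standing in for packed low-bit weights
        self.register_buffer(
            "qweight", torch.zeros(out_features, in_features, dtype=torch.int8)
        )
        self.register_buffer("scales", torch.ones(out_features, 1))
        self.register_buffer("zeros", torch.zeros(out_features, 1))

    def forward(self, x):
        # Dequantize with ordinary ops (Cast/Sub/Mul in ONNX terms),
        # then run a standard matmul -- nothing ORT-specific here.
        w = (self.qweight.float() - self.zeros) * self.scales
        return x @ w.t()


layer = ExportFriendlyQuantLinear(8, 4)
out = layer(torch.randn(2, 8))
print(out.shape)  # torch.Size([2, 4])
```

Because every op in `forward` has a standard ONNX mapping, exporting this module needs no custom ORT operator at runtime, which is the point of contention in the naming discussion.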
Does the packing need a custom ORT operator at runtime? If the exported graph is a standard ONNX graph (which needs no custom ORT operator), then the ORT in the name is unclear.
Yes, it does the packing. But to align the implementations (the custom ORT operator you mentioned is actually a customized torch module), I have already removed the ort_xxx
QuantLinearORT
Updated.
It seems to be under the pytorch passes, but the title says OrtQuantLinear?
Updated title.
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
Describe your changes
Background: AutoAWQ quantization rewrites the quantization utils with its own code, which may not be convertible to ONNX.
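The fix described above amounts to swapping vendor-specific quantized linear modules for an export-friendly equivalent before calling `torch.onnx.export`. A hedged sketch of that module-replacement step follows; `AwqQuantLinear` and `replace_quant_linears` are stand-in names for illustration, not AutoAWQ or Olive APIs.

```python
import torch


class AwqQuantLinear(torch.nn.Module):
    """Stand-in for a vendor quantized linear that the exporter cannot handle."""

    def __init__(self, in_f, out_f):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(out_f, in_f))

    def forward(self, x):
        return x @ self.weight.t()


def replace_quant_linears(model, make_replacement):
    """Recursively swap every AwqQuantLinear child for an exportable module."""
    for name, child in model.named_children():
        if isinstance(child, AwqQuantLinear):
            setattr(model, name, make_replacement(child))
        else:
            replace_quant_linears(child, make_replacement)
    return model


model = torch.nn.Sequential(AwqQuantLinear(4, 4), torch.nn.ReLU())
# Replace with a standard module that torch.onnx.export handles natively.
replace_quant_linears(model, lambda old: torch.nn.Linear(4, 4))
print(type(model[0]).__name__)  # Linear
```

After the swap, the model contains only modules with standard ONNX mappings, so the subsequent conversion step does not hit the non-exportable AWQ code paths.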
Checklist before requesting a review
lintrunner -a
(Optional) Issue link