Quantization of T5-XL Model leads to "exceeds maximum protobuf" #7974
Comments
@CodyLDA, could you share the model or the method to generate the model?
The PyTorch model is exported with:

```python
torch.onnx.export(
    model, args=(input_ids, attention_mask), f=model_path,
    export_params=True, opset_version=12, do_constant_folding=True,
    input_names=["input_ids", "attention_mask"], output_names=["hidden_states"],
    dynamic_axes={"input_ids": {0: "batch", 1: "sequence"},
                  "attention_mask": {0: "batch", 1: "sequence"},
                  "hidden_states": {0: "batch", 1: "sequence"}},
    use_external_data_format=True,
)
```

For the actual quantization step:

```python
quantize_dynamic(
    model_input=onnx_model_name, model_output=output_model_name,
    per_channel=True, activation_type=QuantType.QUInt8, weight_type=QuantType.QUInt8,
    optimize_model=False, use_external_data_format=False,
)
```

Then I get the "exceeds maximum protobuf" error.
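One detail worth flagging in the snippet above (an observation, not a confirmed fix): the export writes the weights as external data, but the quantization call sets `use_external_data_format=False`, so the quantizer tries to serialize the entire quantized graph into a single protobuf. A minimal sketch of the same call with the flag flipped, assuming the installed onnxruntime build accepts it for `quantize_dynamic`:

```python
from onnxruntime.quantization import quantize_dynamic, QuantType

# Keep the quantized output in external-data format as well, so the
# serialized graph can stay under the 2 GB protobuf limit.
quantize_dynamic(
    model_input=onnx_model_name,      # path to the exported .onnx file
    model_output=output_model_name,   # path for the quantized model
    per_channel=True,
    activation_type=QuantType.QUInt8,
    weight_type=QuantType.QUInt8,
    optimize_model=False,
    use_external_data_format=True,    # changed from False
)
```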
Did you find a solution for this? I have the same problem.
Not yet. Right now I have to use the "workaround" mentioned in #7017, but I am not sure if this is the recommended way.
Thanks for the workaround!
Hello,
I would like to quantize the T5-XL model (>2 GB model size). This is basically the same problem as the one described in #7017.
A solution based on a nightly build was proposed there, but I would really appreciate it if this were supported natively.
As also mentioned in the above issue, shape inference was removed in https://github.com/microsoft/onnxruntime/pull/5210/files, but it seems that this change was later reverted.
Describe the solution you'd like
The ONNX documentation at https://github.com/onnx/onnx/blob/master/docs/PythonAPIOverview.md#shape-inference-a-large-onnx-model-2gb describes a currently supported variant of shape inference for large models. The current implementation could therefore wrap the call in a try/except statement, so that if `onnx.shape_inference.infer_shapes(model)` fails, `onnx.shape_inference.infer_shapes_path('path/to/the/model.onnx')` is used instead, even though using exceptions for control flow like this is not considered good practice. Furthermore, I am not sure whether shape inference is the only problem for large models.
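For illustration, a minimal sketch of such a fallback (the wrapper function is hypothetical; only `infer_shapes` and `infer_shapes_path` come from the ONNX API):

```python
import onnx
from onnx import shape_inference

def infer_shapes_with_fallback(model_path: str) -> None:
    """Hypothetical helper: try in-memory shape inference first,
    fall back to the path-based variant for models over 2 GB."""
    model = onnx.load(model_path)
    try:
        # Works as long as the model protobuf stays under the 2 GB limit.
        inferred = shape_inference.infer_shapes(model)
        onnx.save(inferred, model_path)
    except ValueError:
        # Serializing an oversized model raises ValueError; the
        # path-based variant reads from and writes back to disk instead.
        shape_inference.infer_shapes_path(model_path, model_path)
```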
Thank you.