
Quantization of T5-XL Model leads to "exceeds maximum protobuf" #7974

Open
DA-L3 opened this issue Jun 7, 2021 · 5 comments


DA-L3 commented Jun 7, 2021

Hello,

I would like to quantize the T5-XL model (>2GB model size). It is basically the same problem as the one mentioned in #7017.
A workaround based on a nightly build is proposed there, but I would really appreciate it if this were supported natively.
As also mentioned in that issue, the shape inference call was removed in https://github.com/microsoft/onnxruntime/pull/5210/files, but it seems that this change was later reverted.

Describe the solution you'd like
The ONNX documentation (https://github.com/onnx/onnx/blob/master/docs/PythonAPIOverview.md#shape-inference-a-large-onnx-model-2gb) describes a currently supported variant of shape inference for large models. The quantizer's shape-inference step could therefore be wrapped in a try/except statement so that, if onnx.shape_inference.infer_shapes(model) fails, onnx.shape_inference.infer_shapes_path('path/to/the/model.onnx') is used instead, although using exceptions for control flow like this is generally not considered good practice. A minimal sketch of that fallback is shown below.
Furthermore, I am not sure whether shape inference is the only problem for large models.
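A rough sketch of the suggested fallback, assuming a placeholder model path and that infer_shapes raises a ValueError for models larger than 2GB:

import onnx

try:
    # Works while the model fits into a single protobuf message (< 2GB).
    model = onnx.shape_inference.infer_shapes(model)
except ValueError:
    # Fallback for large models: run shape inference on the serialized file.
    # infer_shapes_path writes the inferred model back to disk (by default
    # to the original path), so it has to be reloaded afterwards.
    onnx.shape_inference.infer_shapes_path("path/to/the/model.onnx")
    model = onnx.load("path/to/the/model.onnx")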

Thank you.

yufenglee (Member)

@CodyLDA, could you share the model or the method used to generate it?


DA-L3 commented Jun 9, 2021

The PyTorch model is t5-3b from Hugging Face.
To export it to ONNX format, I use:

torch.onnx.export(
    model, args=(input_ids, attention_mask), f=model_path,
    export_params=True, opset_version=12, do_constant_folding=True,
    input_names=["input_ids", "attention_mask"], output_names=["hidden_states"],
    dynamic_axes={"input_ids": {0: "batch", 1: "sequence"},
                  "attention_mask": {0: "batch", 1: "sequence"},
                  "hidden_states": {0: "batch", 1: "sequence"}},
    use_external_data_format=True,
)

where input_ids and attention_mask are dummy inputs generated from a random text sequence with the T5Tokenizer.
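For context, such dummy inputs could be produced roughly as follows; the exact text and checkpoint name are assumptions, not taken from the original report:

from transformers import T5Tokenizer

# Any short text works; the export only needs example tensors for tracing.
tokenizer = T5Tokenizer.from_pretrained("t5-3b")
encoding = tokenizer("translate English to German: The house is wonderful.",
                     return_tensors="pt")
input_ids = encoding["input_ids"]
attention_mask = encoding["attention_mask"]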

For the actual quantization step:

quantize_dynamic(
    model_input=onnx_model_name, model_output=output_model_name,
    per_channel=True, activation_type=QuantType.QUInt8, weight_type=QuantType.QUInt8,
    optimize_model=False, use_external_data_format=False,
)

I then get the "exceeds maximum protobuf" error, because quantize_dynamic uses ONNXQuantizer, which in turn calls model = onnx.shape_inference.infer_shapes(model); that call causes at least the shape-inference part of the problem.
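For reference, the failing call can be reproduced outside the quantizer roughly like this (the model path is a placeholder):

import onnx

# onnx.load pulls the external weight files back into the in-memory proto,
# so the ModelProto is again larger than 2GB.
model = onnx.load("t5-3b.onnx")

# This is the call made inside ONNXQuantizer; for a >2GB model it fails
# because the model exceeds the 2GB protobuf size limit.
model = onnx.shape_inference.infer_shapes(model)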


Riccorl commented Jun 16, 2021

Did you find a solution for this? I have the same problem with xlm-roberta-large. Even setting use_external_data_format=True doesn't work.


DA-L3 commented Jun 16, 2021

Not yet. Right now I have to use the "workaround" mentioned in #7017, but I am not sure whether that is the recommended way.


Riccorl commented Jun 16, 2021

Thanks for the workaround!
