
Quantization of T5-XL Model leads to "exceeds maximum protobuf" #7974

Open
DA-L3 opened this issue Jun 7, 2021 · 5 comments


DA-L3 commented Jun 7, 2021

Hello,

I would like to quantize the T5-XL model (>2GB model size). It is basically the same problem as the one mentioned in #7017.
A workaround based on a nightly build is proposed there, but I would really appreciate it if this were supported natively.
As also mentioned in that issue, the shape inference call was removed in https://github.com/microsoft/onnxruntime/pull/5210/files, but it seems that this change was later reverted.

Describe the solution you'd like
The ONNX documentation (https://github.com/onnx/onnx/blob/master/docs/PythonAPIOverview.md#shape-inference-a-large-onnx-model-2gb) describes a currently supported variant of shape inference for large models. The quantizer's shape-inference step could therefore be wrapped in a try/except statement so that, if onnx.shape_inference.infer_shapes(model) fails, onnx.shape_inference.infer_shapes_path('path/to/the/model.onnx') is used instead, although using exceptions for control flow like this is generally not considered good practice. A minimal sketch of that fallback is shown below.
Furthermore, I am not sure whether shape inference is the only problem for large models.
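A rough sketch of the suggested fallback, assuming a placeholder model path and that infer_shapes raises a ValueError for models larger than 2GB:

import onnx

try:
    # Works while the model fits into a single protobuf message (< 2GB).
    model = onnx.shape_inference.infer_shapes(model)
except ValueError:
    # Fallback for large models: run shape inference on the serialized file.
    # infer_shapes_path writes the inferred model back to disk (by default
    # to the original path), so it has to be reloaded afterwards.
    onnx.shape_inference.infer_shapes_path("path/to/the/model.onnx")
    model = onnx.load("path/to/the/model.onnx")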

Thank you.

yufenglee (Member)

@CodyLDA, could you share the model or the method used to generate it?


DA-L3 commented Jun 9, 2021

The PyTorch model is t5-3b from Hugging Face.
To export it to ONNX format, I use:

torch.onnx.export(
    model, args=(input_ids, attention_mask), f=model_path,
    export_params=True, opset_version=12, do_constant_folding=True,
    input_names=["input_ids", "attention_mask"], output_names=["hidden_states"],
    dynamic_axes={"input_ids": {0: "batch", 1: "sequence"},
                  "attention_mask": {0: "batch", 1: "sequence"},
                  "hidden_states": {0: "batch", 1: "sequence"}},
    use_external_data_format=True,
)

where input_ids and attention_mask are dummy inputs generated from a random text sequence with the T5Tokenizer.
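For context, such dummy inputs could be produced roughly as follows; the exact text and checkpoint name are assumptions, not taken from the original report:

from transformers import T5Tokenizer

# Any short text works; the export only needs example tensors for tracing.
tokenizer = T5Tokenizer.from_pretrained("t5-3b")
encoding = tokenizer("translate English to German: The house is wonderful.",
                     return_tensors="pt")
input_ids = encoding["input_ids"]
attention_mask = encoding["attention_mask"]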

For the actual quantization step:

quantize_dynamic(
    model_input=onnx_model_name, model_output=output_model_name,
    per_channel=True, activation_type=QuantType.QUInt8, weight_type=QuantType.QUInt8,
    optimize_model=False, use_external_data_format=False,
)

I then get the "exceeds maximum protobuf" error, because quantize_dynamic uses ONNXQuantizer, which in turn calls model = onnx.shape_inference.infer_shapes(model); that call causes at least the shape-inference part of the problem.
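For reference, the failing call can be reproduced outside the quantizer roughly like this (the model path is a placeholder):

import onnx

# onnx.load pulls the external weight files back into the in-memory proto,
# so the ModelProto is again larger than 2GB.
model = onnx.load("t5-3b.onnx")

# This is the call made inside ONNXQuantizer; for a >2GB model it fails
# because the model exceeds the 2GB protobuf size limit.
model = onnx.shape_inference.infer_shapes(model)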


Riccorl commented Jun 16, 2021

Did you find a solution for this? I have the same problem with xlm-roberta-large. Even setting use_external_data_format=True doesn't work.


DA-L3 commented Jun 16, 2021

Not yet. Right now I have to use the "workaround" mentioned in #7017, but I am not sure whether that is the recommended way.


Riccorl commented Jun 16, 2021

Thanks for the workaround!
