How to convert an int8-quantized ONNX model to TensorRT? #20

lucasjinreal · 2022-03-16T06:33:40Z

Hi, I followed the example and successfully converted a GPT2 model to int8.
However, the generated model contains some custom ONNX ops:

Shape,Gather,Range,Unsqueeze,Concat,Reshape,Add,LayerNormalization,DynamicQuantizeLinear,Slice,Mul,MatMulInteger,Squeeze,Cast,Split,Sub,Transpose,MatMul,Pow,Div,Where,Softmax,FastGelu,SkipLayerNormalization

For ops such as DynamicQuantizeLinear and FastGelu, how can they be converted to TensorRT?
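One way to see which of the listed ops are non-standard is to group the graph's nodes by operator domain. The following is only a minimal sketch: it assumes the nodes expose `op_type` and `domain` attributes, as `NodeProto` objects from the `onnx` package do, and the helper name `group_ops_by_domain` and the file name in the usage comment are hypothetical. Ops reported under a non-default domain such as `com.microsoft` (the domain ONNX Runtime uses for its contrib operators) are likely the ones a TensorRT conversion would not recognize out of the box.

```python
from collections import defaultdict


def group_ops_by_domain(nodes):
    """Map each operator domain to the sorted list of op types that use it.

    `nodes` is any iterable of objects with `op_type` and `domain`
    attributes (for example, `model.graph.node` from the onnx package).
    Nodes nested inside subgraphs (If/Loop bodies) are not visited.
    """
    by_domain = defaultdict(set)
    for node in nodes:
        # An empty domain string denotes the default ONNX operator set.
        domain = node.domain or "ai.onnx"
        by_domain[domain].add(node.op_type)
    return {domain: sorted(ops) for domain, ops in by_domain.items()}


# Usage with the onnx package (the file name is a hypothetical placeholder):
#   import onnx
#   model = onnx.load("gpt2_int8.onnx")
#   for domain, ops in group_ops_by_domain(model.graph.node).items():
#       print(domain, ops)
```

Working from the node domain rather than a hand-written list of op names avoids guessing which ops are custom, since the exported model records this itself.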

The int8 model is 400M, compared with 1.2G for the original, so it is much smaller. If it could run inference via TensorRT, it could be accelerated substantially.


ykim362 · 2022-05-18T22:13:41Z

In this repository, we demonstrate CPU quantization only. For the GPU quantization, please visit https://github.com/Microsoft/onnxruntime. Thank you for your interest.

ykim362 closed this as completed May 18, 2022
