
Inference slower than PyTorch model for long sequence length #3

Open
jasontian6666 opened this issue Mar 2, 2022 · 1 comment

@jasontian6666

Hi @siddharth-sharma7

Thank you for providing fast-bart. It has made my life much easier.

I find the bart-onnx-quantized model 2-3x faster than the PyTorch model for short inputs. However, when the sequence length is long (~500 tokens), the ONNX-based model becomes 1.5-2x slower.

I also see a similar problem with the T5 ONNX model, which has been discussed at microsoft/onnxruntime#6835.

Just wondering if we're facing the same issue here.
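For anyone who wants to reproduce the comparison, here is a minimal timing sketch (assuming the transformers library and PyTorch are installed; the ONNX side is left as a comment because the exact fast-bart loading call is not shown here, and the reuse of the same helper for it assumes the exported model exposes a generate()-style wrapper):

```python
# Minimal latency sketch for comparing sequence lengths.
import time

import torch
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
pt_model = BartForConditionalGeneration.from_pretrained("facebook/bart-base").eval()

def mean_generate_latency(model, n_tokens, repeats=5):
    # Build an input of roughly n_tokens tokens by repeating a word.
    text = "hello " * n_tokens
    inputs = tokenizer(text, truncation=True, max_length=n_tokens,
                       return_tensors="pt")
    with torch.no_grad():
        model.generate(**inputs, max_length=64)  # warm-up run, not timed
        start = time.perf_counter()
        for _ in range(repeats):
            model.generate(**inputs, max_length=64)
    return (time.perf_counter() - start) / repeats

for length in (64, 128, 256, 512):
    print(f"seq_len={length}: pytorch {mean_generate_latency(pt_model, length):.3f}s")
    # Repeat with the fast-bart ONNX/quantized model for the same lengths, e.g.:
    # print(f"seq_len={length}: onnx {mean_generate_latency(onnx_model, length):.3f}s")
```

Averaging several runs after a warm-up call helps separate the per-sequence-length cost from one-time model/session initialization.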

@sidsharma72

In my experiments with longer input sequences (~500 tokens), the ONNX performance is only slightly slower than that of the PyTorch model, if not comparable.
The performance gains of ONNX over PyTorch do diminish for longer sequences, especially above ~400 tokens.
