Hi @siddharth-sharma7
Thank you for providing fast-bart. It has made my life much easier.
I find the bart-onnx-quantized model 2-3x faster than the PyTorch model. However, when the sequence length is long (~500 tokens), the ONNX-based model is 1.5-2x slower.
I also see a similar problem with the T5 ONNX model, which has been discussed at microsoft/onnxruntime#6835.
Just wondering if we're facing the same issue here.
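For context, a "quantized ONNX" model like this is typically produced with onnxruntime's dynamic quantization. A minimal sketch of that step, assuming an already-exported FP32 graph (the file names are placeholders, and fast-bart's own export pipeline may wrap this differently):

```python
from onnxruntime.quantization import quantize_dynamic, QuantType

# Dynamic (weight-only) INT8 quantization of an exported BART ONNX graph.
# Paths below are placeholders, not fast-bart's actual file layout.
quantize_dynamic(
    "bart-encoder.onnx",          # exported FP32 graph
    "bart-encoder-quant.onnx",    # quantized output
    weight_type=QuantType.QInt8,  # quantize weights to signed 8-bit
)
```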
In my experiments with longer input sequences (~500 tokens), ONNX performance is only slightly slower than that of the PyTorch model, if not comparable. The performance gains of ONNX over PyTorch do diminish for longer sequences, especially above 400 tokens.
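To check whether the crossover reproduces on your hardware, a minimal latency sketch along these lines can compare the two models across input lengths. The model name, `max_length`, and the ONNX loader are assumptions here, not fast-bart's actual API:

```python
import time

import torch
from transformers import BartForConditionalGeneration, BartTokenizer

def mean_latency(model, tokenizer, text, n_runs=5, max_length=64):
    """Average wall-clock seconds per generate() call."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    model.generate(**inputs, max_length=max_length)  # warm-up run
    start = time.perf_counter()
    for _ in range(n_runs):
        with torch.no_grad():
            model.generate(**inputs, max_length=max_length)
    return (time.perf_counter() - start) / n_runs

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
pt_model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")
# onnx_model = ...  # load the exported/quantized model here (loader omitted;
#                   # assumed to expose a Hugging Face-style generate())

for n_words in (50, 250, 500):  # short vs long inputs
    text = " ".join(["test"] * n_words)
    print(f"{n_words:>4} words | pytorch: {mean_latency(pt_model, tokenizer, text):.2f}s")
```

Timing both models with the same harness at several lengths should make it clear where (or whether) the ONNX advantage flips.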