
Inference slower than PyTorch model for long sequence length #3

Open
jasontian6666 opened this issue Mar 2, 2022 · 1 comment

@jasontian6666

Hi @siddharth-sharma7

Thank you for providing fast-bart. It has made my life much easier.

I find the bart-onnx-quantized model 2-3x faster than the PyTorch model for short inputs. However, when the sequence length is long (~500 tokens), the ONNX-based model becomes 1.5-2x slower.

I also see a similar problem with the T5 ONNX model, which has been discussed at microsoft/onnxruntime#6835.

Just wondering if we're facing the same issue here.
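For anyone who wants to reproduce the comparison, here is a minimal timing sketch (assuming the transformers library and PyTorch are installed; the ONNX side is left as a comment because the exact fast-bart loading call is not shown here, and the reuse of the same helper for it assumes the exported model exposes a generate()-style wrapper):

```python
# Minimal latency sketch for comparing sequence lengths.
import time

import torch
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
pt_model = BartForConditionalGeneration.from_pretrained("facebook/bart-base").eval()

def mean_generate_latency(model, n_tokens, repeats=5):
    # Build an input of roughly n_tokens tokens by repeating a word.
    text = "hello " * n_tokens
    inputs = tokenizer(text, truncation=True, max_length=n_tokens,
                       return_tensors="pt")
    with torch.no_grad():
        model.generate(**inputs, max_length=64)  # warm-up run, not timed
        start = time.perf_counter()
        for _ in range(repeats):
            model.generate(**inputs, max_length=64)
    return (time.perf_counter() - start) / repeats

for length in (64, 128, 256, 512):
    print(f"seq_len={length}: pytorch {mean_generate_latency(pt_model, length):.3f}s")
    # Repeat with the fast-bart ONNX/quantized model for the same lengths, e.g.:
    # print(f"seq_len={length}: onnx {mean_generate_latency(onnx_model, length):.3f}s")
```

Averaging several runs after a warm-up call helps separate the per-sequence-length cost from one-time model/session initialization.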

@sidsharma72

In my experiments with longer input sequences (~500 tokens), the ONNX performance is only slightly slower than that of the PyTorch model, if not comparable.
The performance gains of ONNX over PyTorch do diminish for longer sequences, especially above ~400 tokens.
