Why is gpt2-xl (based on Transformer-XL) slower in ONNX than in the original PyTorch? #11293
Comments
It looks like you're using version 1.8.1. Have you tried the latest ORT version? Also, please attach the model and the repro code.
@lileilai, to get a fully optimized model, you will need a custom Attention operator. The current Attention operator only applies to the self-attention in BERT and GPT-2, and it cannot be applied to Transformer-XL. See our guide if you would like to create a custom operator and fusion: https://github.com/microsoft/onnxruntime/blob/master/onnxruntime/python/tools/transformers/Dev_Guide.md
Thanks for your reply. I will try the latest ORT version later.
I have tried ORT 1.11, but I got the same results. My confusion is that when I export with `torch.onnx.export(opset_version=12)`, the ONNX model has slower inference performance than the original PyTorch model. Since this is the baseline ONNX model, without any additional kernel fusion (LayerNorm, Attention, FastGelu), the result seems abnormal.
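For anyone reproducing this comparison, here is a minimal timing sketch and not the poster's actual benchmark. It assumes each backend is wrapped in a zero-argument callable that blocks until its result is ready (on GPU, for example, by calling `torch.cuda.synchronize()` inside the callable); the name `benchmark` and its parameters are hypothetical. Warm-up iterations are discarded so that one-time setup cost does not skew either side.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(fn: Callable[[], object], warmup: int = 5, runs: int = 20) -> Dict[str, float]:
    """Time fn() and return latency statistics in milliseconds.

    fn is assumed to block until its result is ready; asynchronous GPU
    work must be synchronized inside fn, or the timings are meaningless.
    """
    for _ in range(warmup):
        fn()  # discarded: lazy initialization, caches, kernel selection

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)

    return {
        "mean_ms": statistics.mean(samples),
        "median_ms": statistics.median(samples),
        "min_ms": min(samples),
        "max_ms": max(samples),
    }
```

Running both models through the same helper with identical inputs, batch size, and sequence length (for instance `benchmark(lambda: torch_forward(x))` against `benchmark(lambda: ort_forward(x))`, both hypothetical wrappers) makes the reported gap easier to compare.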
Hi, could you share how you used IOBinding for this model, and did it give a speedup? I am trying to implement something similar myself.
Ah, thanks. I was using C#, and I guess most of these functions haven't been implemented there yet. That would explain it.
Describe the bug
I have a Transformer-XL based GPT-XL model (41 layers; see "Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context"), and the code is my own implementation. I converted it to ONNX and optimized it with gpt2_optimizer (LayerNormalization kernel fusion, FastGelu kernel fusion). Even with IOBinding, inference is still slower than with the original PyTorch model.
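The IOBinding code used here was not posted. As a rough sketch of the general pattern in the ONNX Runtime Python API, assuming `session` is an `onnxruntime.InferenceSession` and the inputs are NumPy arrays in CPU memory, a single bound run might look like the following. The helper name `run_with_io_binding` is hypothetical.

```python
from typing import Dict, List, Sequence


def run_with_io_binding(session, inputs: Dict[str, object],
                        output_names: Sequence[str]) -> List[object]:
    """Run `session` once through an IOBinding and return CPU outputs.

    Assumptions: `session` is an onnxruntime.InferenceSession and
    `inputs` maps input names to NumPy arrays held in CPU memory.
    """
    binding = session.io_binding()

    # Bind each CPU-resident input by name.
    for name, array in inputs.items():
        binding.bind_cpu_input(name, array)

    # Bind outputs by name only, so ONNX Runtime allocates them
    # (on the default device, which is CPU).
    for name in output_names:
        binding.bind_output(name)

    session.run_with_iobinding(binding)
    return binding.copy_outputs_to_cpu()
```

Binding CPU inputs this way still copies data to the device on each run when a GPU execution provider is used. IOBinding generally pays off when inputs and outputs (for a model like this, plausibly the recurrent memory tensors) stay on the device between calls, for example through `OrtValue.ortvalue_from_numpy(array, "cuda", 0)` with `bind_ortvalue_input`, and `bind_output(name, "cuda")`.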