Since 1.15, ORT TRT supports explicit input shapes, meaning users can provide a shape range for every dynamic-shape input.
Please see the PR as well as the doc for usage details.
Let us know if you have further questions or other feedback on ORT TRT. We want to make ORT TRT easier to use.
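As a minimal sketch of what passing explicit shape ranges to the ORT TRT EP looks like: the option names `trt_profile_min_shapes` / `trt_profile_opt_shapes` / `trt_profile_max_shapes` follow the ONNX Runtime TensorRT EP documentation (1.15+), but the input names and dimensions below are illustrative, not tied to any specific model.

```python
# Sketch: explicit shape ranges for the ORT TensorRT EP (ONNX Runtime >= 1.15).
# Option names come from the ORT TRT EP docs; input names/dims are illustrative.

def format_shapes(shapes):
    """Render {input_name: (d0, d1, ...)} as the 'name:d0xd1,...' string
    expected by the TRT EP profile options."""
    return ",".join(
        f"{name}:" + "x".join(str(d) for d in dims)
        for name, dims in shapes.items()
    )

provider_options = {
    "trt_profile_min_shapes": format_shapes({"input_ids": (1, 1), "attention_mask": (1, 1)}),
    "trt_profile_opt_shapes": format_shapes({"input_ids": (1, 128), "attention_mask": (1, 128)}),
    "trt_profile_max_shapes": format_shapes({"input_ids": (4, 512), "attention_mask": (4, 512)}),
    # Cache the built engine so later sessions skip the expensive build step.
    "trt_engine_cache_enable": True,
}

# With onnxruntime-gpu + TensorRT installed, these would be passed as:
#   import onnxruntime as ort
#   sess = ort.InferenceSession(
#       "model.onnx",
#       providers=[("TensorrtExecutionProvider", provider_options)],
#   )
```

The engine is then built once for the whole min/opt/max range instead of being re-built whenever an input shape changes.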
@chilo-ms Thanks a lot, that looks great! It is not at the top of our to-do list for now, but we welcome community contributions to better interface the TensorrtExecutionProvider with the ORTModel classes!
Feature request
For decoder models with cache, it can be painful to manually compile the TensorRT engine, as ONNX Runtime does not expose options to specify shapes. The engine build could perhaps be automated.
The current doc covers only use_cache=False, which is not very interesting. It could be improved to show how to pre-build the TensorRT engine with use_cache=True.
References:
https://huggingface.co/docs/optimum/main/en/onnxruntime/usage_guides/gpu#tensorrt-engine-build-and-warmup
microsoft/onnxruntime#13559
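For the use_cache=True case requested above, the shape profile would also have to cover the past key/value inputs. A hedged sketch of what those profile strings could look like — the input names (`past_key_values.N.key` / `.value`), the layout (batch x heads x past_len x head_dim), and the model config values are all assumptions that depend on how the decoder was exported:

```python
# Sketch: shape-range strings covering KV-cache inputs so a TRT engine could be
# pre-built once for a decoder exported with use_cache=True. Input names and
# dimension layout are illustrative and depend on the ONNX export.

NUM_LAYERS, NUM_HEADS, HEAD_DIM = 2, 12, 64  # hypothetical model config

def profile(batch, past_len):
    # One new token per decoding step, plus one KV pair per layer.
    parts = [f"input_ids:{batch}x1"]
    for layer in range(NUM_LAYERS):
        for kv in ("key", "value"):
            parts.append(
                f"past_key_values.{layer}.{kv}:"
                f"{batch}x{NUM_HEADS}x{past_len}x{HEAD_DIM}"
            )
    return ",".join(parts)

provider_options = {
    "trt_profile_min_shapes": profile(batch=1, past_len=0),
    "trt_profile_opt_shapes": profile(batch=1, past_len=128),
    "trt_profile_max_shapes": profile(batch=4, past_len=512),
}
```

Automating exactly this kind of profile generation from the exported model's input signature is what the feature request is asking for.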
Motivation
TensorRT is fast
Your contribution
Will work on it sometime.