
[Help wanted] Support TensorRT #40

Open · 1 task
csukuangfj opened this issue Feb 20, 2023 · 10 comments
Labels: help wanted (Extra attention is needed)
@csukuangfj
Collaborator

TODO

  • Support GPU via TensorRT

See https://onnxruntime.ai/docs/execution-providers/TensorRT-ExecutionProvider.html
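
For context, the TensorRT EP is configured through string key/value options via the onnxruntime C API. A minimal sketch of enabling it (illustrative, not existing sherpa-onnx code; see the docs linked above for the full option list):

#include <vector>

#include <onnxruntime_cxx_api.h>

// Sketch: build session options with the TensorRT EP appended.
Ort::SessionOptions MakeTensorrtSessionOptions() {
  Ort::SessionOptions session_options;

  const OrtApi &api = Ort::GetApi();
  OrtTensorRTProviderOptionsV2 *trt_options = nullptr;
  Ort::ThrowOnError(api.CreateTensorRTProviderOptions(&trt_options));

  // All TensorRT EP options are plain strings; the key names come from
  // the documentation linked above.
  std::vector<const char *> keys = {"device_id", "trt_fp16_enable"};
  std::vector<const char *> values = {"0", "1"};
  Ort::ThrowOnError(api.UpdateTensorRTProviderOptions(
      trt_options, keys.data(), values.data(), keys.size()));

  Ort::ThrowOnError(api.SessionOptionsAppendExecutionProvider_TensorRT_V2(
      session_options, trt_options));
  api.ReleaseTensorRTProviderOptions(trt_options);
  return session_options;
}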

csukuangfj added the help wanted label Feb 20, 2023
@yuekaizhang
Contributor

I would like to take this on.

  • Support the Onnxruntime CUDA provider (a minimal sketch follows).
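
For reference, appending the CUDA provider with the onnxruntime C++ API looks roughly like this (illustrative, not the actual sherpa-onnx code):

#include <onnxruntime_cxx_api.h>

// Sketch: build session options with the CUDA EP appended (GPU 0).
Ort::SessionOptions MakeCudaSessionOptions() {
  Ort::SessionOptions session_options;
  OrtCUDAProviderOptions cuda_options;  // device_id defaults to 0
  session_options.AppendExecutionProvider_CUDA(cuda_options);
  return session_options;
}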

@manickavela29
Contributor

manickavela29 commented Mar 14, 2024

Hi @csukuangfj , @yuekaizhang

I observed that currently only the CUDA EP is supported for onnxruntime, and there is no TensorRT EP support.
Is there any active development going on for a TensorRT GPU backend?

@csukuangfj
Collaborator Author

Is there any active development going on for a TensorRT GPU backend?

We don't have a plan to support it in the near future. Would you like to contribute?

@manickavela29
Contributor

I tried enabling onnxruntime's TensorRT EP for zipformer, but the model performance was very bad.
I am debugging further with standalone onnxruntime in Python for the encoder models and will update if I see good results.

@manickavela29
Contributor

Hi @csukuangfj,
TensorRT has several parameters, and they are only valid when the TensorRT provider is chosen,
so I need your suggestion on one of the two options below:

  1. Put the TRT configs in the existing model-config.cc file.
  2. Create a new config for TRT and expose the required parameters from it.

Thank you

@csukuangfj
Collaborator Author

Could you create a new config for tensorrt and add this config as a member field of OnlineModelConfig and OfflineModelConfig?

You can set the default values of this config to the ones used in:

// Default values for the TensorRT EP options (each entry corresponds to
// one option key, e.g. device_id, trt_max_workspace_size, and the
// engine/timing cache settings):
std::vector<const char*> option_values = {
"0",
"2147483648",
"10",
"5",
"0",
"0",
"0",
"1",
"1",
"1",
".",
"1",
".", // can be the same as the engine cache folder
};

@manickavela29
Contributor

Yes, I will send a separate PR for the configs in some time.

@manickavela29
Contributor

Current perf, CUDA vs TRT:

csrc/online-zipformer2-transducer-model.cc:RunEncoder:445 Encoder Duration : 1.930044 ms
csrc/online-zipformer2-transducer-model.cc:RunEncoder:445 Encoder Duration : 0.034984 ms
csrc/online-zipformer2-transducer-model.cc:RunEncoder:445 Encoder Duration : 0.034912 ms
csrc/online-websocket-server-impl.cc:Run:256 Warm up completed : 3 times.
csrc/online-websocket-server.cc:main:79 Started!
csrc/online-websocket-server.cc:main:80 Listening on: 6007
csrc/online-websocket-server.cc:main:81 Number of work threads: 8

csrc/online-zipformer2-transducer-model.cc:RunEncoder:445 Encoder Duration : 0.535651 ms
csrc/online-zipformer2-transducer-model.cc:RunEncoder:445 Encoder Duration : 0.187492 ms
csrc/online-zipformer2-transducer-model.cc:RunEncoder:445 Encoder Duration : 0.187698 ms

Apart from this, with TRT there is a huge session creation time,
which is expected; the only way to handle it is to cache the engine files (see the sketch below).
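
For reference, engine caching is exposed through the TensorRT EP's string options; the relevant keys, as documented for the onnxruntime TensorRT EP, look like this (the cache path is an example):

#include <vector>

// Cache built TensorRT engines on disk so that later session creations
// can load them instead of rebuilding ("./trt_cache" is an example path).
std::vector<const char *> keys = {
    "trt_engine_cache_enable",
    "trt_engine_cache_path",
    "trt_timing_cache_enable",
};
std::vector<const char *> values = {"1", "./trt_cache", "1"};
// Pass these to UpdateTensorRTProviderOptions(), as in the sketch earlier
// in this thread.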

@yuekaizhang
Contributor

May I know the results for the CPU provider, if you have them? Also, could you explain why there are three lines in each block, e.g., 0.535651 ms, 0.187492 ms, 0.187698 ms? @manickavela29

@manickavela29
Contributor

I can try to get CPU numbers, but I don't have a high-performance CPU.

(in between, someone can add support for the dnnl EP 🙂)

But since the focus here is GPU, CUDA vs TRT, is CPU benchmarking relevant?

The code blocks are just performance logs that I added for zipformer; the three lines are the three warm-up runs. Those logs are not part of the patch.
