When using libtorch, GPU decoding is slower than CPU #1643
Comments
Same symptoms when tested on CPU.
The slow decoding at the beginning may be caused by libtorch; the libtorch model requires a warmup.
What number should I specify for the warmup?
100 seems okay, given your testing.
Thank you. I'll give it a try.
Sorry, one additional question: why does libtorch need a warmup?
Because during the first forward passes the TorchScript JIT profiles the model and searches for the best path to execute the forward calculation.
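To illustrate, here is a minimal sketch (not the wenet implementation) of warming up a TorchScript model before real decoding: load the module, then run a number of dummy forward passes so the JIT's profiling and optimization happen before the first real utterance. It assumes an encoder that takes a single float tensor of shape [batch, time, feature_dim]; the iteration count (100), the 200-frame length, and the 80-dim features are illustrative values, not project defaults.

```cpp
#include <torch/script.h>
#include <torch/torch.h>

#include <iostream>
#include <vector>

int main(int argc, char* argv[]) {
  if (argc < 2) {
    std::cerr << "usage: warmup <model.zip>" << std::endl;
    return 1;
  }
  torch::DeviceType device =
      torch::cuda::is_available() ? torch::kCUDA : torch::kCPU;

  // Load the TorchScript model and move it to the target device.
  torch::jit::script::Module module = torch::jit::load(argv[1]);
  module.to(device);
  module.eval();

  torch::NoGradGuard no_grad;
  const int kWarmupIters = 100;  // "100 seems okay" per the discussion above
  for (int i = 0; i < kWarmupIters; ++i) {
    // Dummy input: 1 utterance, 200 frames, 80-dim features (assumed shape).
    torch::Tensor feats = torch::randn({1, 200, 80}).to(device);
    std::vector<torch::jit::IValue> inputs{feats};
    module.forward(inputs);  // result discarded; we only want the JIT to optimize
  }
  std::cout << "warmup done" << std::endl;
  return 0;
}
```

Running such a loop once at startup shifts the slow first iterations out of the decoding path, so the real utterances run at the post-warmup speed from the start.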
When decoding on the GPU, GPU memory gets allocated right away, but GPU utilization only rises after a long time.
For example, when decoding 600 utterances, it progresses very slowly until about the 100th, and then speeds up from the point where GPU utilization rises.
Increasing the number of threads in decoder_main.cc makes it faster, but I'd like to fix the problem for the single-threaded case.
What should I do?
CPU: 24 cores
GPU: RTX A5000 (24 GB) x 2
OS: Ubuntu 20.04.4