
Fix extremely long inference time when using CUDA with short sentences. #172

Merged — 1 commit, Nov 23, 2023

Conversation

@marty1885 (Contributor) commented Aug 9, 2023

Hi,

I found that running piper on the GPU was extremely slow, far slower than CPU-only, but only when the input text is short. This patch fixes it.

Before (on current HEAD), it took me ~30 s to synthesize the sentence "Okay then. Have a great day and I hope this has been helpful" on a GPU.
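The diff itself isn't shown in this conversation, but a plausible mechanism for this kind of short-input GPU slowdown is ONNX Runtime's default exhaustive cuDNN convolution-algorithm search, which re-tunes for every new input shape. Below is a hedged sketch (an assumption, not the confirmed patch) of how per-provider options are passed to ONNX Runtime to switch that search to a cheaper heuristic; the helper name `build_providers` is hypothetical.

```python
def build_providers(use_cuda: bool):
    """Build an ONNX Runtime `providers` list (hypothetical helper).

    Entries may be plain provider names or (name, options) tuples,
    which is the standard way to pass provider-specific options to
    onnxruntime.InferenceSession.
    """
    if use_cuda:
        # "HEURISTIC" avoids cuDNN's exhaustive per-shape convolution
        # autotuning; with short, variable-length text each sentence
        # produces a new shape, so exhaustive search can dominate the
        # total runtime (assumed cause, not confirmed by this PR text).
        return [("CUDAExecutionProvider",
                 {"cudnn_conv_algo_search": "HEURISTIC"})]
    return ["CPUExecutionProvider"]


# Usage (sketch): pass the list when creating the session, e.g.
# onnxruntime.InferenceSession(model_path, providers=build_providers(True))
```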

❯ echo 'Okay then Have a great day and I hope this has been helpful' | time python3 -m piper -m /home/hentaku/piper-models/0807_2693_1341612.onnx --cuda -f out.wav
2023-08-09 16:11:31.451160092 [W:onnxruntime:, session_state.cc:1169 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2023-08-09 16:11:31.451189218 [W:onnxruntime:, session_state.cc:1171 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
python3 -m piper -m /home/hentaku/piper-models/0807_2693_1341612.onnx --cuda   4.78s user 3.66s system 28% cpu 29.645 total

After the fix, the total runtime is down to 3.0 s, including model load time.

❯ echo 'Okay then. Have a great day and I hope this has been helpful' | time python3 -m piper -m /home/hentaku/piper-models/0807_2693_1341612.onnx --cuda -f out.wav
2023-08-09 16:08:24.660402671 [W:onnxruntime:, session_state.cc:1169 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2023-08-09 16:08:24.660429803 [W:onnxruntime:, session_state.cc:1171 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
python3 -m piper -m /home/hentaku/piper-models/0807_2693_1341612.onnx --cuda   2.61s user 1.63s system 138% cpu 3.048 total

For the record, the patch was co-developed with @dic1911.

@lvscar commented Nov 23, 2023

In my environment, the fix is working wonderfully. Environment:
Ubuntu 22.04.2 LTS
CUDA Version 12.2
piper-tts 1.2.0
nvidia-cublas-cu11 11.11.3.6
nvidia-cuda-runtime-cu11 11.8.89
onnxruntime-gpu 1.15.1
onnxruntime 1.16.3

@synesthesiam synesthesiam merged commit 0bb4cb9 into rhasspy:master Nov 23, 2023
@synesthesiam (Contributor) commented Nov 23, 2023

Thanks!
