When using libtorch, GPU decoding is slower than CPU #1643
Comments
Same symptoms when tested on CPU.
The slow decoding at the beginning may be caused by libtorch; the libtorch model requires a warmup.
What number should I specify for the warmup?
100 seems okay, given your testing.
Thank you. I'll give it a try.
Sorry, one additional question: why does libtorch need a warmup?
Because during the first forward passes the TorchScript JIT profiles the model and searches for the best path to execute the forward calculation.
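To illustrate, here is a minimal sketch (not the wenet implementation) of warming up a TorchScript model before real decoding: load the module, then run a number of dummy forward passes so the JIT's profiling and optimization happen before the first real utterance. It assumes an encoder that takes a single float tensor of shape [batch, time, feature_dim]; the iteration count (100), the 200-frame length, and the 80-dim features are illustrative values, not project defaults.

```cpp
#include <torch/script.h>
#include <torch/torch.h>

#include <iostream>
#include <vector>

int main(int argc, char* argv[]) {
  if (argc < 2) {
    std::cerr << "usage: warmup <model.zip>" << std::endl;
    return 1;
  }
  torch::DeviceType device =
      torch::cuda::is_available() ? torch::kCUDA : torch::kCPU;

  // Load the TorchScript model and move it to the target device.
  torch::jit::script::Module module = torch::jit::load(argv[1]);
  module.to(device);
  module.eval();

  torch::NoGradGuard no_grad;
  const int kWarmupIters = 100;  // "100 seems okay" per the discussion above
  for (int i = 0; i < kWarmupIters; ++i) {
    // Dummy input: 1 utterance, 200 frames, 80-dim features (assumed shape).
    torch::Tensor feats = torch::randn({1, 200, 80}).to(device);
    std::vector<torch::jit::IValue> inputs{feats};
    module.forward(inputs);  // result discarded; we only want the JIT to optimize
  }
  std::cout << "warmup done" << std::endl;
  return 0;
}
```

Running such a loop once at startup shifts the slow first iterations out of the decoding path, so the real utterances run at the post-warmup speed from the start.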
When decoding on the GPU, GPU memory gets allocated right away, but GPU utilization only rises after a long time.
For example, when decoding 600 utterances, it progresses very slowly until about the 100th, and then speeds up from the point where GPU utilization rises.
Increasing the number of threads in decoder_main.cc makes it faster, but I'd like to fix the problem for the single-threaded case.
What should I do?
CPU: 24 cores
GPU: RTX A5000 (24 GB) x 2
OS: Ubuntu 20.04.4