Will decreasing the model's output space lead to faster prediction times? #711
Unanswered
Dyllanjrusher
asked this question in Q&A
Replies: 1 comment
Hey whisper community!
I've seen some great work over at this issue on decreasing prediction latency for real-time text prediction. However, the 174 ms encode time isn't quite fast enough for the idea I have: I'm looking to get latency down to sub-15 ms for a voice-to-MIDI application for live music performance!
My idea is to take a small set of voice commands, say "ta" and "dum", and map them to a snare sound and a bass sound.
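As a sketch of the mapping side (assuming a hypothetical recognizer that hands back the command string; the function and dictionary names here are made up for illustration), the command-to-MIDI step could be as simple as a lookup into the General MIDI percussion map:

```python
# Hypothetical sketch: map recognized voice commands to General MIDI
# percussion notes. The General MIDI drum map assigns note 38 to the
# acoustic snare and note 36 to bass drum 1; the command names and this
# mapping are assumptions for illustration, not part of any Whisper API.

COMMAND_TO_MIDI_NOTE = {
    "ta": 38,   # acoustic snare
    "dum": 36,  # bass drum 1
}

def command_to_note_on(command: str, velocity: int = 100) -> bytes:
    """Build a raw MIDI note-on message for the drum channel.

    Status byte 0x99 = note-on, channel 10 (0-indexed channel 9),
    which is the standard General MIDI percussion channel.
    """
    note = COMMAND_TO_MIDI_NOTE[command]
    return bytes([0x99, note, velocity])

if __name__ == "__main__":
    print(command_to_note_on("ta").hex())  # prints "992664"
```

The resulting bytes could be handed to any MIDI output library; the latency budget would then be dominated by the recognition step, not this lookup.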
I've learned from this discussion that it's very possible to fine-tune the model to a different output space, say the set of Japanese words. I'm wondering whether fine-tuning to a small vocabulary like {"ta", "dum"} would decrease the encoding time?
Thanks for your time & input!