Will decreasing the model's output space lead to faster prediction times? #711
Unanswered
Dyllanjrusher
asked this question in Q&A
Replies: 1 comment
Hey whisper community!
I've seen some great work over at this issue on decreasing prediction latency for real-time text prediction. However, the 174 ms encode time isn't quite fast enough for the idea I have: I'm looking to get latency down to sub-15 ms for a voice-to-MIDI application for live music performance!
My idea is to take a small set of voice commands, say "ta" and "dum", and map them to a snare sound and a bass sound.
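As a sketch of the mapping side (assuming a hypothetical recognizer that hands back the command string; the function and dictionary names here are made up for illustration), the command-to-MIDI step could be as simple as a lookup into the General MIDI percussion map:

```python
# Hypothetical sketch: map recognized voice commands to General MIDI
# percussion notes. The General MIDI drum map assigns note 38 to the
# acoustic snare and note 36 to bass drum 1; the command names and this
# mapping are assumptions for illustration, not part of any Whisper API.

COMMAND_TO_MIDI_NOTE = {
    "ta": 38,   # acoustic snare
    "dum": 36,  # bass drum 1
}

def command_to_note_on(command: str, velocity: int = 100) -> bytes:
    """Build a raw MIDI note-on message for the drum channel.

    Status byte 0x99 = note-on, channel 10 (0-indexed channel 9),
    which is the standard General MIDI percussion channel.
    """
    note = COMMAND_TO_MIDI_NOTE[command]
    return bytes([0x99, note, velocity])

if __name__ == "__main__":
    print(command_to_note_on("ta").hex())  # prints "992664"
```

The resulting bytes could be handed to any MIDI output library; the latency budget would then be dominated by the recognition step, not this lookup.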
I've learned from this discussion that it's very possible to fine-tune the model to a different output space, say the set of Japanese words. I'm wondering whether fine-tuning to a small vocabulary like {"ta", "dum"} would decrease the encoding time?
Thanks for your time & input!