Replies: 3 comments 2 replies
-
It's an interesting idea. Though not directly supported, one way to do this is to replace each predicted non-timestamp token with the next token of your actual transcript at the decoding stage. This way, you tell the model the words and have it predict only the timestamps.
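A rough sketch of that substitution loop, in case it helps. This is not Whisper's API: `forced_decode`, `next_logits`, and `toy_next_logits` are hypothetical names, and the toy model only stands in for a real decoder so the example runs on its own. The one assumption about Whisper is that timestamp tokens are the ids at or above a `timestamp_begin` boundary.

```python
from typing import Callable, List, Sequence

VOCAB_SIZE = 8        # toy vocabulary size (assumption, for illustration only)
TIMESTAMP_BEGIN = 4   # toy boundary: ids >= 4 are treated as timestamp tokens


def forced_decode(
    next_logits: Callable[[List[int]], Sequence[float]],
    transcript_tokens: Sequence[int],
    timestamp_begin: int,
    max_steps: int = 1000,
) -> List[int]:
    """Greedy decoding that keeps predicted timestamps but forces the text.

    next_logits is a hypothetical callback: given the tokens decoded so far,
    it returns one score per vocabulary id for the next token.
    """
    output: List[int] = []
    pos = 0  # index of the next transcript token to force
    for _ in range(max_steps):
        logits = next_logits(output)
        predicted = max(range(len(logits)), key=logits.__getitem__)
        if predicted >= timestamp_begin:
            # The model chose a timestamp: keep its prediction.
            output.append(predicted)
            if pos == len(transcript_tokens):
                break  # closing timestamp after the last forced word
        elif pos < len(transcript_tokens):
            # The model chose a text token: override it with the known text.
            output.append(transcript_tokens[pos])
            pos += 1
        else:
            break  # transcript used up and no closing timestamp predicted
    return output


def toy_next_logits(prefix: List[int]) -> List[float]:
    """Stand-in model: predicts a timestamp at every third position."""
    logits = [0.0] * VOCAB_SIZE
    if len(prefix) % 3 == 0:
        stamp = min(TIMESTAMP_BEGIN + len(prefix) // 3, VOCAB_SIZE - 1)
        logits[stamp] = 1.0
    else:
        logits[0] = 1.0  # some text token; it gets overridden anyway
    return logits


aligned = forced_decode(toy_next_logits, [1, 2, 3, 1], TIMESTAMP_BEGIN)
```

With a real model, the forced tokens would come from tokenizing your own transcript, and the timestamps left in the output would mark where each stretch of words starts and ends.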
-
@Nisekoi-1, in case it is of interest, Subsync (https://github.com/sc0ty/subsync) does this today with a different speech detection algo. Maybe you could ask sc0ty to consider Whisper instead of PocketSphinx as the speech-to-text algo, since Whisper is more accurate than PocketSphinx. Of course, there are advantages to being able to feed some known text into something like Whisper, since it could then probably work well even with the tiny model.
-
Hi, is there a way to have Whisper skip creating a transcript, so that we supply the transcript ourselves and Whisper only syncs it to the audio?