The accuracy of the "large" model is pretty good. On my corpus it gets about 80% of words accurately transcribed. The large Vosk model gets 84%, but Whisper gets some things correct that Vosk misses, so still not bad.
The big problem is that Whisper, even this C++ implementation, is still considerably slower. On average, Vosk can transcribe large audio files in around 2 seconds. With Whisper, it takes on average 2 minutes.
Since I'm running Whisper on the command line for each sample file, is this slowness mainly due to loading the large 3GB ggml-large.bin model file for every run?
If so, is there any way to mitigate this by launching Whisper in some sort of "streaming" mode, so it keeps the model loaded into memory, accepts filenames via stdin, and dumps transcriptions via stdout? I don't see any option like that in the command line docs.
Yes, it would only need to load the model once, but that requires reworking the program, so we should first check whether there is already an option for that before contributing. Hopefully ggerganov will reply and we can see what can be done.
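For reference, a minimal sketch of the kind of rework this would mean, using the whisper.cpp C API: load the model once, then read file names from stdin and print each transcription to stdout. This is not an existing mode of the `main` example; the `load_audio_f32` helper is a placeholder you would have to fill in (the bundled examples decode 16 kHz mono float PCM with dr_wav), and the parameter choices are just defaults.

```cpp
// Hypothetical "keep the model loaded" loop, assuming the whisper.cpp C API.
#include "whisper.h"

#include <cstdio>
#include <iostream>
#include <string>
#include <vector>

// Placeholder: decode `path` into 16 kHz mono float PCM samples.
// whisper.cpp's own examples do this with the bundled dr_wav header.
static bool load_audio_f32(const std::string & path, std::vector<float> & pcm) {
    (void) path;
    (void) pcm;
    return false; // replace with real WAV decoding
}

int main(int argc, char ** argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s ggml-large.bin < filelist.txt\n", argv[0]);
        return 1;
    }

    // Load the ~3 GB model exactly once, instead of once per invocation.
    struct whisper_context * ctx = whisper_init_from_file(argv[1]);
    if (!ctx) {
        fprintf(stderr, "failed to load model %s\n", argv[1]);
        return 1;
    }

    whisper_full_params params = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);

    // One audio file name per line on stdin; transcriptions go to stdout.
    std::string path;
    while (std::getline(std::cin, path)) {
        std::vector<float> pcm;
        if (!load_audio_f32(path, pcm)) {
            fprintf(stderr, "could not read %s\n", path.c_str());
            continue;
        }

        if (whisper_full(ctx, params, pcm.data(), (int) pcm.size()) != 0) {
            fprintf(stderr, "transcription failed for %s\n", path.c_str());
            continue;
        }

        for (int i = 0; i < whisper_full_n_segments(ctx); ++i) {
            printf("%s", whisper_full_get_segment_text(ctx, i));
        }
        printf("\n");
        fflush(stdout);
    }

    whisper_free(ctx);
    return 0;
}
```

With something like this, the per-file cost would be only audio decoding plus inference, so the model-loading overhead of running the CLI once per sample file would disappear; whether the remaining 2-minute gap is mostly load time or mostly inference would still need to be measured.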
The accuracy of the "large" model is pretty good. On my corpus it gets about 80% of words accurately transcribed. The large Vosk model gets 84%, but Whisper gets some things correct that Vosk misses, so still not bad.
The big problem is that Whisper, even this C++ implementation, is still considerably slow. On average, Vosk can transcribe large audio files in around 2 seconds. With Whisper, it takes on average 2 minutes.
Since I'm running Whisper on the command line for each sample file, is this slowness mainly due to loading the large 3GB
ggml-large.bin
model file for every run?If so, is there any way to mitigate this by launching Whisper in some sort of "streaming" mode, so it keeps the model loaded into memory, accepts filenames via stdin, and dumps transcriptions via stdout? I don't see any option like that in the command line docs.
The text was updated successfully, but these errors were encountered: