
Streaming or Daemon Mode? #1428

Open
chrisspen opened this issue Nov 4, 2023 · 3 comments

@chrisspen

The accuracy of the "large" model is pretty good. On my corpus it transcribes about 80% of words correctly. The large Vosk model gets 84%, but Whisper gets some things right that Vosk misses, so still not bad.

The big problem is that Whisper, even this C++ implementation, is still quite slow. On average, Vosk transcribes my large audio files in around 2 seconds, whereas Whisper takes around 2 minutes.

Since I'm running Whisper on the command line for each sample file, is this slowness mainly due to loading the large 3GB ggml-large.bin model file for every run?

If so, is there any way to mitigate this by launching Whisper in some sort of "streaming" mode, so it keeps the model loaded into memory, accepts filenames via stdin, and dumps transcriptions via stdout? I don't see any option like that in the command line docs.

@UniversalTechno

Yes, it would need to load the model only once, but that requires reworking the program, so we should first check whether an option for that already exists before contributing. Hopefully ggerganov will reply and we can see what can be done.

@ggerganov
Owner

You can use the server example.
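For reference, the server example (examples/server) loads the model once and then serves transcription requests over HTTP, which covers the load-once workflow asked about above. Below is a minimal client sketch in Python, assuming the server was started with something like `./server -m models/ggml-large.bin` on the default 127.0.0.1:8080, and that the `/inference` endpoint accepts a multipart `file` field and returns JSON with a `text` key, as documented in the example's README (flags and field names may differ between versions):

```python
import sys
import requests  # third-party HTTP client (pip install requests)

# Default address of the whisper.cpp server example; adjust if your
# server listens on a different host or port.
SERVER_URL = "http://127.0.0.1:8080/inference"

def transcribe(path: str) -> str:
    """POST one audio file to the running server and return its transcript."""
    with open(path, "rb") as f:
        resp = requests.post(
            SERVER_URL,
            files={"file": f},                 # multipart upload field
            data={"response_format": "json"},  # ask for a JSON response body
        )
    resp.raise_for_status()
    return resp.json()["text"]

if __name__ == "__main__":
    # Read filenames from stdin and print transcriptions to stdout --
    # the workflow described in the question, with the model held in
    # memory by the server process instead of being reloaded per file.
    for line in sys.stdin:
        path = line.strip()
        if path:
            print(transcribe(path))
```

Piping filenames via stdin then works as described in the original question, e.g. `ls samples/*.wav | python transcribe_client.py` (the script name here is arbitrary).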

@UniversalTechno

UniversalTechno commented Jan 5, 2024 via email
