Inference on long files #44

Hello,
Thank you for this great library!
Is there any way to chunk the initial audio into shorter samples, say 50 seconds each, run inference on those chunks, and reconstruct a final transcript from the results? I came across this article and wonder whether the same approach could work here.
Any ideas on whether this is possible?
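(For reference, a minimal sketch of the chunking idea described above, assuming pydub for slicing; the chunk length and file names are placeholders, and, as the first reply below explains, this is not actually needed with this library.)

```python
# Hypothetical sketch of manual chunking (the reply below explains
# why this is unnecessary here). Assumes pydub; names are placeholders.
from pydub import AudioSegment

CHUNK_MS = 50 * 1000  # 50-second chunks, as proposed above

audio = AudioSegment.from_file("long_audio.mp3")
chunks = [audio[i:i + CHUNK_MS] for i in range(0, len(audio), CHUNK_MS)]

for n, chunk in enumerate(chunks):
    chunk.export(f"chunk_{n:03d}.wav", format="wav")
    # ...run inference on each exported chunk, then concatenate
    # the per-chunk transcripts into the final text.
```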
Hi, the Whisper transcription loop already handles long files using a sliding 30-second window while keeping the context, so you don't need to do anything special to transcribe long files.
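(A minimal sketch of what this looks like in practice; the model size and file name are placeholders.)

```python
from faster_whisper import WhisperModel

model = WhisperModel("large-v2", device="cuda")

# transcribe() applies the sliding 30-second window internally and
# returns a generator of segments plus transcription metadata.
segments, info = model.transcribe("long_audio.mp3")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```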
Thank you. So is it normal that the transcription time is considerably long for long files?
Yes, the transcription time depends on the audio file duration. Long files will take longer.
Sorry, I closed and reopened the issue. I just have one last question about longer files.
What is your GPU, and what model size are you running?
It's an NVIDIA GeForce GTX 1070 Ti (8 GB). I was running the large-v2 model on an 18-minute file, but even with a 4-minute file I get OOM errors.
Try running the model with 8-bit quantization:

```python
model = WhisperModel(model_path, device="cuda", compute_type="int8")
```
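(Put together with the transcription loop above, a sketch of the quantized setup might look like the following; the file name is again a placeholder, and CTranslate2 also supports other compute types such as "float16", depending on the GPU.)

```python
from faster_whisper import WhisperModel

# compute_type="int8" quantizes the weights to 8 bits, cutting
# memory use enough to fit large-v2 on an 8 GB GPU.
model = WhisperModel("large-v2", device="cuda", compute_type="int8")

segments, info = model.transcribe("long_audio.mp3")
print("".join(segment.text for segment in segments))
```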
Wow, just like that! It's a lot faster, and no OOM!!!