Long-form (including timestamps) for whisper #19887
Comments
cc @sanchit-gandhi and @ArthurZucker
Hey @JeffreyWardman! I believe @ArthurZucker has started looking into this, see #19490 (comment) for context!
Thanks @sanchit-gandhi! By the looks of it, that would still be missing the timestamps, which are quite an important feature for me. I'm not very familiar with the underlying Hugging Face code. How does the chunking work? Does it look for the first break between words after a given duration?
cc @ArthurZucker, who knows more about timestamp generation! This blog post explains quite nicely how chunking works in Transformers: https://huggingface.co/blog/asr-chunking
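For anyone skimming the thread, here is a rough sketch of the idea in that blog post as I understand it: the audio is cut into fixed-length windows that overlap by a stride on each side, rather than at detected word breaks. The function name `chunk_bounds` and its parameters are hypothetical and only for illustration; this is not the Transformers implementation.

```python
def chunk_bounds(total_s, chunk_s, stride_s):
    """Return (start, end) windows in seconds covering total_s of audio.

    Hypothetical helper for illustration only. Each window is chunk_s long
    (the last one may be shorter) and neighbouring windows overlap by
    2 * stride_s, so the unreliable edges of one window fall inside the
    well-contextualised middle of its neighbour.
    """
    if chunk_s <= 2 * stride_s:
        raise ValueError("chunk_s must be larger than 2 * stride_s")

    step = chunk_s - 2 * stride_s  # how far each new window advances
    bounds = []
    start = 0.0
    while start < total_s:
        end = min(start + chunk_s, total_s)
        bounds.append((start, end))
        if end >= total_s:
            break  # the audio is fully covered
        start += step
    return bounds


if __name__ == "__main__":
    # 70 s of audio, 30 s windows, 5 s stride on each side
    for start, end in chunk_bounds(70, 30, 5):
        print(f"{start:5.1f} s to {end:5.1f} s")
```

The cut points depend only on durations, so a window boundary can land in the middle of a word; the overlap is what lets the per-window outputs be stitched back together.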
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Does the Hugging Face implementation of Whisper support timestamps for generating SRT files, like the openai/whisper implementation does? https://github.com/openai/whisper/blob/main/whisper/utils.py#L64
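To make the request concrete, this is roughly what SRT generation needs from the model: a start time, an end time, and the text for each segment. The sketch below is standalone Python and assumes the segments are already available as `(start, end, text)` tuples in seconds. The names `srt_timestamp` and `to_srt` are hypothetical and are not part of Transformers or openai/whisper.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Build SRT text from (start_s, end_s, text) tuples.

    The tuple layout is an assumption made for this example, not the
    output format of any particular library.
    """
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    example = [(0.0, 2.5, " Hello there."), (2.5, 5.0, " This is a test.")]
    print(to_srt(example))
```

Without per-segment timestamps from the model, there is nothing to fill the `start` and `end` fields with, which is why the timestamp support matters here.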
Not yet! I'm working on this; you can follow #20620!
Feature request
What is the ETA for this?
Motivation
Without this, Whisper is not usable for long speech audio, or for chunking audio based on timestamps determined by the ASR model.
Your contribution
Guidance or a PR in the longer term, if this is not picked up by others in the next month or so.