Long-form (including timestamps) for whisper #19887

Closed
JeffreyWardman opened this issue Oct 26, 2022 · 7 comments

Comments

@JeffreyWardman

Feature request

504cd71

  • Inference is currently only implemented for short-form i.e. audio is pre-segmented into <=30s segments. Long-form (including timestamps) will be implemented in a future release.

What is the ETA for this?

Motivation

Without this, Whisper is not usable for long-form speech audio, or for chunking audio based on timestamps produced by the ASR model.

Your contribution

Guidance, or a PR in the longer term if this is not picked up by others in the next month or so.

@sgugger
Collaborator

sgugger commented Oct 26, 2022

cc @sanchit-gandhi and @ArthurZucker

@sanchit-gandhi
Contributor

Hey @JeffreyWardman! I believe @ArthurZucker has started looking into this, see #19490 (comment) for context!

@JeffreyWardman
Author

Thanks @sanchit-gandhi! By the looks of it, it would still be missing the timestamps. This is quite an important feature for me. I'm not completely familiar with the underlying Hugging Face code. How does the chunking work? Does it find the first break between words after a given duration?

@sanchit-gandhi
Contributor

cc @ArthurZucker who knows more about timestamp generation!

This blog highlights quite nicely how chunking works in Transformers: https://huggingface.co/blog/asr-chunking
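
For readers who want the gist without opening the blog: the approach it describes splits the audio into fixed-length overlapping chunks, runs the model on each chunk, and discards the output from the overlapping ("stride") edges so that the kept regions tile the audio. It does not search for a break between words. Below is a minimal sketch of just the window arithmetic in plain Python. The function name `iter_chunks` and its parameters are hypothetical and are not the Transformers API, and this says nothing about how Whisper support will end up being implemented.

```python
from typing import Iterator, Tuple


def iter_chunks(
    n_samples: int, chunk_len: int, stride: int
) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (start, end, keep_start, keep_end) sample indices.

    [start, end) is the window fed to the model; [keep_start, keep_end)
    is the part of that window whose output is kept. Hypothetical helper
    for illustration only.
    """
    if chunk_len <= 2 * stride:
        raise ValueError("chunk_len must be larger than 2 * stride")
    step = chunk_len - 2 * stride  # how far each window advances
    start = 0
    while True:
        end = min(start + chunk_len, n_samples)
        is_first = start == 0
        is_last = end == n_samples
        # The first chunk has no left context to drop, the last has no right.
        keep_start = start if is_first else start + stride
        keep_end = end if is_last else end - stride
        yield start, end, keep_start, keep_end
        if is_last:
            break
        start += step


# Toy numbers: 100 samples, 40-sample windows, 5-sample stride on each side.
for window in iter_chunks(100, 40, 5):
    print(window)
```

With these toy numbers the kept regions are contiguous and cover the whole input, while each window still sees a little context on both sides.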

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@kurianbenoy-sentient

Does the Hugging Face implementation of Whisper support timestamps for generating SRT files, like the openai/whisper implementation does?

https://github.com/openai/whisper/blob/main/whisper/utils.py#L64
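
To make the request concrete: once segment-level timestamps are available, writing an SRT file is a small formatting step. The sketch below assumes each segment is a dict with `start` and `end` in seconds plus a `text` field. That input shape and the names `format_srt_time` and `to_srt` are assumptions made for illustration; this is not the code from the linked file.

```python
from typing import Dict, Iterable, Union

Segment = Dict[str, Union[float, str]]


def format_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Segment]) -> str:
    """Render segments (assumed keys: start, end, text) as SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_srt_time(float(seg["start"]))
        end = format_srt_time(float(seg["end"]))
        text = str(seg["text"]).strip()
        # Each SRT cue: counter, time range, text, then a blank line.
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n\n")
    return "".join(blocks)


# Made-up example segments, not real model output.
print(to_srt([
    {"start": 0.0, "end": 2.5, "text": " Hello there."},
    {"start": 2.5, "end": 5.0, "text": " This is a test."},
]))
```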

@ArthurZucker
Collaborator

Not yet! I'm working on this; you can follow #20620!
