Long-form (including timestamps) for whisper #19887

JeffreyWardman · 2022-10-26T05:03:55Z

Feature request

504cd71

Inference is currently only implemented for short-form i.e. audio is pre-segmented into <=30s segments. Long-form (including timestamps) will be implemented in a future release.
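To make "pre-segmented into <=30s segments" concrete, here is a minimal sketch of the naive pre-segmentation a user would otherwise have to do by hand. It assumes 16 kHz mono audio held in a plain Python sequence; the function name `split_into_segments` is hypothetical and not part of Transformers. Hard cuts like these can split a word across two segments, which is one reason proper long-form support matters.

```python
from typing import List, Sequence, Tuple

SAMPLING_RATE = 16_000  # Whisper models expect 16 kHz input audio
MAX_SEGMENT_S = 30      # short-form limit quoted in the issue


def split_into_segments(
    audio: Sequence[float],
    sampling_rate: int = SAMPLING_RATE,
    max_segment_s: float = MAX_SEGMENT_S,
) -> List[Tuple[float, Sequence[float]]]:
    """Cut audio into consecutive, non-overlapping segments of at most max_segment_s.

    Returns (start_time_in_seconds, samples) pairs, so that timestamps predicted
    inside a segment could later be shifted by that segment's start time.
    """
    step = int(max_segment_s * sampling_rate)
    if step <= 0:
        raise ValueError("max_segment_s * sampling_rate must be positive")
    return [
        (start / sampling_rate, audio[start:start + step])
        for start in range(0, len(audio), step)
    ]
```

For 75 s of audio this yields three segments starting at 0.0 s, 30.0 s and 60.0 s, the last one holding only 15 s of samples.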

What is the ETA for this?

Motivation

Whisper is not usable for long speech recordings, or for chunking audio based on the timestamps produced by the ASR model.

Your contribution

Guidance or a PR in the longer term, if no one else picks this up in the next month or so.


sgugger · 2022-10-26T13:26:31Z

cc @sanchit-gandhi and @ArthurZucker

sanchit-gandhi · 2022-10-28T08:31:23Z

Hey @JeffreyWardman! I believe @ArthurZucker has started looking into this, see #19490 (comment) for context!

JeffreyWardman · 2022-10-28T08:42:31Z

Thanks @sanchit-gandhi! By the looks of it, that would still be missing the timestamps, which are quite an important feature for me. I'm not completely familiar with the underlying Hugging Face code. How does the chunking work? Does it split at the first break between words after a given duration?
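Once timestamps are available per chunk, they still have to be mapped back onto the full recording. A minimal sketch of that step, assuming non-overlapping chunks and a hypothetical `(start_s, end_s, text)` segment format (not a Transformers API), simply adds each chunk's start offset:

```python
from typing import List, Tuple

# Hypothetical segment format: (start_seconds, end_seconds, text),
# with times measured relative to the start of the chunk.
Segment = Tuple[float, float, str]


def merge_chunk_segments(
    chunk_results: List[Tuple[float, List[Segment]]],
) -> List[Segment]:
    """Shift per-chunk segment times by each chunk's offset (in seconds).

    Assumes the chunks do not overlap; with overlapping (strided) chunks,
    duplicated words in the overlap would also need to be removed.
    """
    merged: List[Segment] = []
    for offset, segments in chunk_results:
        for start, end, text in segments:
            merged.append((offset + start, offset + end, text))
    # Order by global start time in case chunks were processed out of order.
    merged.sort(key=lambda seg: seg[0])
    return merged
```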

sanchit-gandhi · 2022-11-02T17:23:56Z

cc @ArthurZucker who knows more about timestamp generation!

This blog post explains nicely how chunking works in Transformers: https://huggingface.co/blog/asr-chunking
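As a rough illustration of the idea in that post (a sketch, not the library's code): the audio is cut into fixed-length windows at fixed positions rather than at word boundaries, neighbouring windows overlap, and the overlapping "stride" on each side serves only as context whose predictions are discarded when the outputs are stitched together. The `Chunk` class and `chunk_with_stride` function below are hypothetical names; in the Transformers ASR pipeline the corresponding knobs are the `chunk_length_s` and `stride_length_s` arguments.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    start: int         # first sample fed to the model
    end: int           # one past the last sample fed to the model
    left_stride: int   # leading samples used only as context
    right_stride: int  # trailing samples used only as context


def chunk_with_stride(
    num_samples: int, chunk_len: int, stride_left: int, stride_right: int
) -> List[Chunk]:
    """Split [0, num_samples) into overlapping windows of chunk_len samples.

    The non-stride ("kept") parts of consecutive chunks tile the input
    exactly, so stitching the kept predictions covers every sample once.
    """
    step = chunk_len - stride_left - stride_right
    if step <= 0:
        raise ValueError("chunk_len must exceed stride_left + stride_right")
    chunks: List[Chunk] = []
    for start in range(0, num_samples, step):
        end = min(start + chunk_len, num_samples)
        chunks.append(
            Chunk(
                start=start,
                end=end,
                # No left context exists before the first chunk.
                left_stride=0 if start == 0 else stride_left,
                # No right context exists after the last chunk.
                right_stride=0 if end == num_samples else stride_right,
            )
        )
        if end == num_samples:
            break
    return chunks
```

With 100 samples, 40-sample windows and 5-sample strides on each side, the kept regions are [0, 35), [35, 65) and [65, 100).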

github-actions · 2022-11-27T15:02:00Z

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

kurianbenoy-sentient · 2022-12-09T03:13:20Z

Does the Hugging Face implementation of Whisper support timestamps for generating SRT files, like the openai/whisper implementation?

https://github.com/openai/whisper/blob/main/whisper/utils.py#L64
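For context on what an SRT export needs: each cue is an index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range and the text, separated by blank lines. The sketch below is written in the spirit of the linked openai/whisper utility but is not a copy of it; it assumes segments arrive as hypothetical `(start_s, end_s, text)` tuples, since Transformers did not yet return timestamps at this point in the thread.

```python
from typing import Iterable, Tuple


def format_srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, e.g. 01:01:01,250."""
    if seconds < 0:
        raise ValueError("timestamps must be non-negative")
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[Tuple[float, float, str]]) -> str:
    """Render (start_s, end_s, text) segments as the contents of an .srt file."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_srt_timestamp(start)} --> {format_srt_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(blocks)
```

For example, `format_srt_timestamp(3661.25)` returns `"01:01:01,250"`.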

ArthurZucker · 2022-12-09T09:18:58Z

Not yet! I'm working on this; you can follow #20620!

This was referenced Nov 4, 2022:

Timestamps in Whisper processor #20057 (Closed)

[README] Add section on 🤗 Transformers openai/whisper#468 (Closed)

JeffreyWardman closed this as completed Nov 28, 2022
