Timecode per/ word a future option? #26

samelie · 2022-09-22T04:20:44Z

Right now we get a range printed, e.g. [00:00.000 --> 00:05.720]. It would be great to have a way to print the timecode next to each word. This would unlock lots of interactive behavior.
Thank you for releasing this achievement.

RaulKite · 2022-09-22T07:13:52Z

It would be a killer option in many, many cases.

3 replies

kospl Sep 25, 2022

name one use case for this feature

DAVIDSystems Sep 25, 2022

Audio editing with text, subtitling with word highlighting, and many more.

DAVIDSystems Sep 25, 2022

jongwook · 2022-09-22T07:15:51Z

Maintainer

(Duplicate of #3) Getting word-level timestamps is not directly supported, but it could be possible using the predicted distribution over the timestamp tokens or the cross-attention weights.

Currently, the predicted timestamps tend to be biased towards integers, and there are some failure modes where the timestamps can be constantly shifted, making reliable word-level timestamp prediction difficult. Once this is solved by us or the community, I agree that it'd be a great addition to this repo.
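
To make the cross-attention idea concrete, here is a minimal sketch in plain Python. It assumes a token-by-frame attention matrix is already available (the toy values below are made up, not real Whisper weights), aligns tokens to audio frames monotonically with dynamic time warping, and converts frame indices to seconds using the 20 ms spacing of the encoder output frames. The names `dtw_path`, `token_times` and `SECONDS_PER_FRAME`, and the choice of `1 - weight` as the cost, are illustrative assumptions and not part of the repo; a real pipeline would also need to choose attention heads and normalize the weights.

```python
SECONDS_PER_FRAME = 0.02  # encoder output frames are 20 ms apart (1500 per 30 s)


def dtw_path(cost):
    """Return the minimum-cost monotonic path through a token x frame cost matrix."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    # acc[i][j] = cheapest cost of any path ending at cell (i-1, j-1)
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i][j] = cost[i - 1][j - 1] + min(
                acc[i - 1][j - 1],  # next token, next frame
                acc[i - 1][j],      # next token, same frame
                acc[i][j - 1],      # same token, next frame
            )
    # Walk back from the last cell, always taking the cheapest predecessor.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (acc[i - 1][j - 1], i - 1, j - 1),
            (acc[i - 1][j], i - 1, j),
            (acc[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


def token_times(attention):
    """Map each token to a (start, end) time in seconds from its aligned frames."""
    cost = [[1.0 - w for w in row] for row in attention]  # high attention = low cost
    first, last = {}, {}
    for tok, frame in dtw_path(cost):
        first.setdefault(tok, frame)
        last[tok] = frame
    return [
        (round(first[t] * SECONDS_PER_FRAME, 2),
         round((last[t] + 1) * SECONDS_PER_FRAME, 2))
        for t in range(len(attention))
    ]


# Toy attention: 3 tokens x 6 frames, each token attending to two frames.
TOY_ATTENTION = [
    [0.9, 0.8, 0.1, 0.0, 0.0, 0.0],
    [0.1, 0.1, 0.8, 0.9, 0.1, 0.0],
    [0.0, 0.1, 0.1, 0.1, 0.9, 0.9],
]
print(token_times(TOY_ATTENTION))
```

Token times would still need to be merged into word times, since one word can span several tokens.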

2 replies

melindadevins Sep 22, 2022

When a transcript is displayed on a video, the word being spoken is often highlighted. Word-level timestamps make that possible. I'd really appreciate it if the timestamps could be obtained.
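
As a rough illustration of that highlighting use case, the sketch below looks up the word being spoken at a given playback time with a binary search over word start times. The `WORDS` list is hypothetical data, and `active_word_index` and `render` are made-up helper names; real timings would come from whichever alignment method is used.

```python
from bisect import bisect_right

# Hypothetical word timings: (word, start_seconds, end_seconds), sorted by start.
WORDS = [
    ("Thank", 0.00, 0.32),
    ("you", 0.32, 0.50),
    ("for", 0.50, 0.68),
    ("releasing", 0.90, 1.45),
]
STARTS = [start for _, start, _ in WORDS]


def active_word_index(t):
    """Return the index of the word spoken at time t, or None during silence."""
    i = bisect_right(STARTS, t) - 1  # last word that starts at or before t
    if i < 0:
        return None
    _, start, end = WORDS[i]
    return i if start <= t < end else None


def render(t):
    """Return the transcript with the active word wrapped in brackets."""
    idx = active_word_index(t)
    return " ".join(
        f"[{word}]" if i == idx else word
        for i, (word, _, _) in enumerate(WORDS)
    )


print(render(0.4))   # Thank [you] for releasing
print(render(0.75))  # Thank you for releasing  (gap between words)
```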

jayn1985 Sep 22, 2022

Word-level timestamps would be really helpful in many cases. I hope they can be supported in a later version.

edgartaor · 2022-09-25T07:11:00Z

Could something like this work?

https://nvidia.github.io/OpenSeq2Seq/html/speech-recognition/speech-to-text-align.html

0 replies

sandalwoodsh · 2022-11-02T20:22:50Z

I agree that such a feature would be tremendously awesome!

0 replies

jongwook · 2023-01-04T01:27:18Z

Maintainer

I've made a demo of obtaining word-level timestamps using the cross-attention patterns in the multilingual ASR notebook.

3 replies

octimot Feb 27, 2023

@jongwook Thanks for this update! It seems to work pretty well, but I'm having trouble scaling it up to audio longer than 30 seconds:

The call whisper_model(mel.unsqueeze(0), tokens.unsqueeze(0)) triggers IndexError: index out of range in self in
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)

What would be the best approach to make it process one 30-second window at a time? Would you mind giving a rough example?

Cheers!

RaulKite Feb 27, 2023

Have you tried this repo?

https://github.com/linto-ai/whisper-timestamped

It is based on @jongwook's notebook and has taken that approach further.

octimot Feb 27, 2023

@RaulKite
Yes, thank you!

I believe whisper-timestamped is doing the word-alignment on the fly which is cool, but I'd like to understand how to scale the basic implementation that @jongwook proposed, just to have a bit more flexibility.
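
For reference, one way to scale a per-window aligner to longer audio can be sketched as follows. This is only an outline under stated assumptions: the audio is a sequence of 16 kHz samples, and `align_window` is a hypothetical callable (standing in for the notebook's per-window alignment) that returns `(word, start_seconds, end_seconds)` tuples relative to the start of the window it receives. It does not diagnose the IndexError above.

```python
SAMPLE_RATE = 16000      # Whisper operates on 16 kHz audio
WINDOW_SECONDS = 30      # and on 30-second windows
WINDOW_SAMPLES = SAMPLE_RATE * WINDOW_SECONDS


def iter_windows(samples):
    """Yield (offset_seconds, window) pairs covering the audio in 30 s steps."""
    for start in range(0, len(samples), WINDOW_SAMPLES):
        yield start / SAMPLE_RATE, samples[start:start + WINDOW_SAMPLES]


def align_long_audio(samples, align_window):
    """Run a per-window aligner and shift its word times to absolute time.

    `align_window` is a hypothetical callable: it takes one window of samples
    and returns [(word, start_seconds, end_seconds)] relative to that window.
    """
    words = []
    for offset, window in iter_windows(samples):
        for word, start, end in align_window(window):
            words.append((word, round(offset + start, 3), round(offset + end, 3)))
    return words
```

A fixed 30-second stride can cut a word in half at a window boundary, so in practice the windows would probably need to overlap or to end at a detected pause.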
