Even when the speaker starts talking after 10 sec, Whisper makes the first timestamp start at sec 0. How can I change that? #1130

romain130492 · 2023-03-21T10:59:06Z


Hello

I'm using Whisper. When the speaker in a video starts talking at sec 10, the first timestamp I get is at sec 1 instead of sec 10.
Here is my config:

Config
POST v1/audio/transcriptions

{
  "model": "whisper-1",
  "file": "...mp3",
  "response_format": "srt",
  "prompt": "Hello, welcome to my lecture"
}

Output:

1
00:00:01,000 --> 00:00:14,000
Why are there both successful and struggling entrepreneurs? 

2
00:00:15,000 --> 00:00:23,000
Many customers prefer to watch videos to enjoy online content.

3
00:00:24,000 --> 00:00:32,000
Another sentence.

I believe cue 1 should be 00:00:10,000 --> 00:00:14,000, since no one is talking at all for the first 10 seconds.
Similarly, in cue 3 the speaker starts talking again at sec 28, but the timestamp starts at sec 24. Whisper simply includes the silence in the timestamp.

Any idea how I could fix that, maybe using a prompt?

Thanks!

mayeaux · 2023-03-26T16:24:02Z


You can accomplish this by using word-level timestamps and then rebuilding the file yourself. I just finished that code; I will publish it pretty soon.
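The code mentioned above is not included in this thread, so here is a minimal sketch of the same idea, not mayeaux's implementation. It assumes the open-source `openai-whisper` package run locally, whose `model.transcribe(path, word_timestamps=True)` returns segments with a `"words"` list of `{"word", "start", "end"}` dicts. The function names are hypothetical. Each cue's start and end are taken from its first and last word, so leading and trailing silence drops out of the cue.

```python
from typing import Iterable


def format_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[dict]) -> str:
    """Rebuild SRT cues, taking each cue's bounds from its first/last word.

    Assumes segments shaped like openai-whisper's transcribe(...,
    word_timestamps=True) output: "start", "end", "text", and "words".
    """
    cues = []
    for seg in segments:
        words = seg.get("words") or []
        if words:
            # Word timings exclude the silence padded into the segment bounds.
            start, end = words[0]["start"], words[-1]["end"]
        else:
            # No word timings: fall back to the segment's own bounds.
            start, end = seg["start"], seg["end"]
        index = len(cues) + 1
        timing = f"{format_srt_time(start)} --> {format_srt_time(end)}"
        cues.append(f"{index}\n{timing}\n{seg['text'].strip()}\n")
    # A blank line separates consecutive cues.
    return "\n".join(cues)


def transcribe_to_srt(path: str, model_name: str = "base") -> str:
    """Hypothetical helper: run local whisper (not the hosted API)."""
    import whisper  # pip install openai-whisper; imported lazily

    model = whisper.load_model(model_name)
    result = model.transcribe(path, word_timestamps=True)
    return segments_to_srt(result["segments"])
```

For the audio in the question, `transcribe_to_srt("lecture.mp3")` (file name assumed) should give cue 1 a start near sec 10 instead of sec 1, since the first word, not the segment, sets the start time.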

4 replies

romain130492 Mar 27, 2023
Author

Can you do that with the API too? Let me know if you publish that code, thanks!

mayeaux Mar 29, 2023

> Can you do that with the API too? Let me know if you publish that code, thanks!

It's not possible with the API because it doesn't return word-level timestamps.

With word timestamps, it's pretty trivial to rewrite the SRT/VTT files. I'll publish it soon, once I get back to working on the related code.
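To illustrate how small the rewrite can be, here is a hedged sketch (not the code promised above) that builds a WebVTT file directly from word timings. It assumes word dicts shaped like openai-whisper's `{"word", "start", "end"}` output and groups words into cues by pause length; the `max_gap` threshold and the function names are hypothetical choices. Because a cue ends at its last word and a long pause starts a new cue, silence like the gap before sec 28 in the question never falls inside a cue.

```python
from typing import Iterable


def format_vtt_time(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp, HH:MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def words_to_vtt(words: Iterable[dict], max_gap: float = 1.0) -> str:
    """Group word timings into VTT cues, starting a new cue at each pause.

    A pause longer than max_gap seconds (an assumed threshold) closes the
    current cue, so cue times hug the actual speech.
    """
    cues: list[list[dict]] = []
    for w in words:
        if cues and w["start"] - cues[-1][-1]["end"] <= max_gap:
            cues[-1].append(w)  # short pause: same cue
        else:
            cues.append([w])  # first word or long pause: new cue
    lines = ["WEBVTT", ""]
    for cue in cues:
        start = format_vtt_time(cue[0]["start"])
        end = format_vtt_time(cue[-1]["end"])
        lines.append(f"{start} --> {end}")
        lines.append("".join(w["word"] for w in cue).strip())
        lines.append("")  # blank line terminates the cue
    return "\n".join(lines)
```

With local whisper output, the word list can be flattened with `[w for seg in result["segments"] for w in seg["words"]]` before calling `words_to_vtt`.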

romain130492 Mar 30, 2023
Author

I see, so you need to deploy the model on your own server, right?

st-nickkimer Apr 1, 2023

@mayeaux, any chance you've gotten around to this code snippet?
