When does Whisper decide to split the input audio ? #629

Ca-ressemble-a-du-fake · 2022-12-02T08:44:57Z

Hi,

I need to split a big audio file into small chunks ranging from 1 to 10 s. I first tried applying VAD (voice activity detection) to the big audio, normalizing each chunk to -27 dB, and then calling Whisper on each chunk. The transcription was good, but it contained some errors.
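For reference, the per-chunk normalization step could look roughly like the sketch below. It assumes the samples are floats in [-1.0, 1.0] and that "-27 dB" means an RMS level of -27 dB relative to full scale. The names `rms_db` and `normalize_rms` are made up for illustration, and the gain is capped so that peaks do not clip.

```python
import math


def rms_db(samples):
    """Return the RMS level of the samples in dB relative to full scale (1.0)."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")
    return 10.0 * math.log10(mean_square)


def normalize_rms(samples, target_db=-27.0, peak_limit=1.0):
    """Scale the samples so their RMS level reaches target_db, without clipping."""
    current_db = rms_db(samples)
    if current_db == float("-inf"):
        # Empty or silent chunk: there is nothing to amplify.
        return list(samples)
    # Convert the level difference in dB into a linear amplitude gain.
    gain = 10.0 ** ((target_db - current_db) / 20.0)
    peak = max(abs(s) for s in samples)
    if peak * gain > peak_limit:
        # Cap the gain so the loudest sample stays within the peak limit.
        gain = peak_limit / peak
    return [s * gain for s in samples]
```

When the cap kicks in, the chunk ends up quieter than the target, which seems preferable to distorting it before transcription.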

I noticed that when I feed Whisper the whole audio file, the transcription is much better. The timestamps are rounded to the second, so the chunks generated from them are less precise, but that can be remedied with the stable_ts patch.

Now I have speech recorded in a noisy environment that I want to transcribe. I first denoise it and then pass the denoised audio to Whisper. The results are not as good as with audio that has no background noise. Therefore I would like to test whether amplifying each chunk before passing it to Whisper could improve the overall transcription quality.

That's why I would like to split the big denoised audio file at the same timestamps Whisper would use, normalize the resulting chunks, and then feed them to Whisper.
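To make the idea concrete, here is a rough sketch of cutting the audio at the segment boundaries that Whisper reports. It relies on `transcribe()` returning a `"segments"` list whose entries carry `"start"` and `"end"` times in seconds, and on `whisper.load_audio()` returning 16 kHz mono samples. The helper name `split_by_segments`, the file name `denoised.wav`, and the `"base"` model size are placeholders, not anything prescribed by Whisper.

```python
SAMPLE_RATE = 16000  # sample rate of the audio returned by whisper.load_audio()


def split_by_segments(samples, segments, sample_rate=SAMPLE_RATE):
    """Slice samples at the start/end times (in seconds) of each segment."""
    chunks = []
    for segment in segments:
        # Convert seconds to sample indices and clamp them to the audio length.
        start = max(0, int(round(segment["start"] * sample_rate)))
        end = min(len(samples), int(round(segment["end"] * sample_rate)))
        if end > start:
            chunks.append(samples[start:end])
    return chunks


# Possible usage with Whisper installed (not executed here):
#   import whisper
#   model = whisper.load_model("base")
#   audio = whisper.load_audio("denoised.wav")
#   result = model.transcribe(audio)
#   chunks = split_by_segments(audio, result["segments"])
```

This needs one full transcription pass just to obtain the boundaries, and the boundaries are only as precise as the timestamps Whisper reports.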

I thought Whisper would output timestamps spanning 30 s each (since it pads or trims the audio input to 30 s chunks), but this is not the case. Nor does it provide sentence-bound timestamps.
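For context, the pad-or-trim step can be pictured as in the sketch below, which mimics in plain Python what `whisper.pad_or_trim` does for a 1-D signal at 16 kHz. It is an illustration, not the library's code. My understanding is that `transcribe()` then moves its 30 s window forward based on the timestamps it predicts, which would explain segments shorter than 30 s, but that part is an assumption worth checking against the source.

```python
SAMPLE_RATE = 16000                      # Whisper works on 16 kHz audio
CHUNK_SECONDS = 30                       # length of one model input window
N_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS  # 480000 samples per window


def pad_or_trim(samples, length=N_SAMPLES):
    """Return exactly `length` samples: cut the excess or pad with zeros."""
    if len(samples) >= length:
        return list(samples[:length])
    return list(samples) + [0.0] * (length - len(samples))
```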

So my question is: how can I split the input audio the way Whisper would?

Hope my question is clear 😃

Thanks in advance for your help

FurkanGozukara · 2022-12-04T21:50:04Z

Whisper is context-aware, so feeding it the entire audio works better. Merge the audio into a single file and provide it that way.
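A minimal sketch of that merge, assuming the chunks are already in memory as sequences of 16 kHz samples. The helper `merge_chunks` is hypothetical; it also records where each chunk lands in the merged audio, so segment timestamps can be mapped back to the original chunks afterwards.

```python
SAMPLE_RATE = 16000  # assumed sample rate of every chunk


def merge_chunks(chunks, sample_rate=SAMPLE_RATE):
    """Concatenate chunks and return (merged, spans), with spans in seconds."""
    merged = []
    spans = []
    for chunk in chunks:
        start = len(merged) / sample_rate
        merged.extend(chunk)
        spans.append((start, len(merged) / sample_rate))
    return merged, spans


# Possible usage with Whisper and NumPy installed (not executed here):
#   import numpy as np
#   import whisper
#   merged, spans = merge_chunks(chunks)
#   model = whisper.load_model("base")
#   result = model.transcribe(np.asarray(merged, dtype=np.float32))
```

`transcribe()` also takes a `condition_on_previous_text` argument (on by default), which is what lets earlier output inform later windows.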

1 reply

Ca-ressemble-a-du-fake Dec 5, 2022
Author

Thank you @FurkanGozukara for this piece of advice. Gonna follow it !
