Few last seconds of audio not transcribed #282

mbelcen · 2022-10-09T16:02:57Z


Hi,

I tested Whisper on the attached audio using the Python script with lower-level access to the model (below), but the model doesn't transcribe a big chunk of the audio (almost half). The same thing happened with a few other audio files.
However, when I use the simple script, I get the full transcript.
Any idea why that happens?
Thanks

audio.zip


```python
import whisper

model = whisper.load_model("base")

# load audio and pad/trim it to fit 30 seconds
audio = whisper.load_audio("audio.mp3")
audio = whisper.pad_or_trim(audio)

# make log-Mel spectrogram and move to the same device as the model
mel = whisper.log_mel_spectrogram(audio).to(model.device)

# detect the spoken language
_, probs = model.detect_language(mel)
print(f"Detected language: {max(probs, key=probs.get)}")

# decode the audio
options = whisper.DecodingOptions(fp16=False)
result = whisper.decode(model, mel, options)

# print the recognized text
print(result.text)
```

Answered by jianfch


jianfch · 2022-10-10T15:56:02Z


It's not transcribing most of it because you're trimming about 50 seconds of audio down to 30 seconds, so roughly 20 seconds of the audio gets thrown out in this line:

audio = whisper.pad_or_trim(audio)

By default, pad_or_trim() trims (or pads) the audio to 480000 samples, i.e. 30 seconds at Whisper's 16 kHz sample rate, which is what the model expects as input.
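If you want to stay with the lower-level API, one possible workaround is to walk over the audio in 30-second windows and decode each one separately. The following is only a rough sketch under some assumptions: `window_bounds` and `transcribe_in_windows` are hypothetical helper names (not part of whisper), the file is assumed to load via `whisper.load_audio`, and the fixed windows may cut a word in half at a boundary.

```python
SAMPLE_RATE = 16000                # Whisper resamples audio to 16 kHz
CHUNK_SAMPLES = 30 * SAMPLE_RATE   # 480000 samples, the length pad_or_trim() targets


def window_bounds(total_samples, window=CHUNK_SAMPLES):
    """Return (start, end) sample indices that cover the whole clip.

    Hypothetical helper for illustration; the last window may be shorter.
    """
    return [
        (start, min(start + window, total_samples))
        for start in range(0, total_samples, window)
    ]


def transcribe_in_windows(path, model_name="base"):
    """Decode every 30-second window instead of only the first one.

    Hypothetical helper; assumes the openai-whisper package is installed.
    """
    import whisper  # imported here so window_bounds() has no dependencies

    model = whisper.load_model(model_name)
    audio = whisper.load_audio(path)
    options = whisper.DecodingOptions(fp16=False)

    texts = []
    for start, end in window_bounds(len(audio)):
        # pad_or_trim() now only pads the final, shorter window
        chunk = whisper.pad_or_trim(audio[start:end])
        mel = whisper.log_mel_spectrogram(chunk).to(model.device)
        texts.append(whisper.decode(model, mel, options).text)
    return " ".join(texts)
```

For a 50-second clip (800000 samples), `window_bounds` gives two windows, so the last 20 seconds are decoded as well. In most cases the simple route, `model.transcribe("audio.mp3")["text"]`, is probably the better choice, since it already slides over the whole file.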
