Result Segment with wrong duration of audio file. #357

tcl8273 · 2022-10-18T04:24:14Z


I am using Whisper to get segments for uncoverV2.mp4, but the end time of the final segment is wrong:

{'text': " Nobody sees, nobody knows We are a secret, can't be exposed That's how it is, that's how it goes",
 'segments': [{'id': 0, 'seek': 0, 'start': 0.0, 'end': 4.5600000000000005,
               'text': ' Nobody sees, nobody knows',
               'tokens': [50364, 9297, 8194, 11, 5079, 3255, 50592, 50592, 492, 366, 257, 4054, 11, 393, 380, 312, 9495, 50864, 50864, 663, 311, 577, 309, 307, 11, 300, 311, 577, 309, 1709, 51140],
               'temperature': 0.0, 'avg_logprob': -0.39771583676338196, 'compression_ratio': 1.1428571428571428, 'no_speech_prob': 0.1816209852695465},
              {'id': 1, 'seek': 0, 'start': 4.5600000000000005, 'end': 10.0,
               'text': " We are a secret, can't be exposed",
               'tokens': [50364, 9297, 8194, 11, 5079, 3255, 50592, 50592, 492, 366, 257, 4054, 11, 393, 380, 312, 9495, 50864, 50864, 663, 311, 577, 309, 307, 11, 300, 311, 577, 309, 1709, 51140],
               'temperature': 0.0, 'avg_logprob': -0.39771583676338196, 'compression_ratio': 1.1428571428571428, 'no_speech_prob': 0.1816209852695465},
              {'id': 2, 'seek': 1000, 'start': 10.0, 'end': 31.0,
               'text': " That's how it is, that's how it goes",
               'tokens': [50364, 663, 311, 577, 309, 307, 11, 300, 311, 577, 309, 1709, 51414],
               'temperature': 0.0, 'avg_logprob': -0.247916613306318, 'compression_ratio': 1.0909090909090908, 'no_speech_prob': 0.004447056911885738}],
 'language': 'en'}

The audio is only 15 s long.
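For reference, the 31.0 s can be reproduced from the token ids in the output. The sketch below assumes the multilingual Whisper vocabulary, where the first timestamp token <|0.00|> has id 50364 and each timestamp step is 0.02 s, and that `seek` counts mel frames at 100 frames per second (hop length 160 at 16 kHz). `timestamp_to_seconds` is a hypothetical helper, not part of Whisper's API.

```python
# Assumptions (not stated in the thread): multilingual Whisper vocabulary,
# where <|0.00|> has token id 50364 and each timestamp step is 0.02 s;
# `seek` counts mel frames at 100 frames per second (hop 160 @ 16 kHz).
TIMESTAMP_BEGIN = 50364
TIME_PRECISION = 0.02  # seconds per timestamp token
FRAMES_PER_SECOND = 100


def timestamp_to_seconds(token: int, seek: int) -> float:
    """Hypothetical helper: convert a timestamp token to absolute seconds."""
    if token < TIMESTAMP_BEGIN:
        raise ValueError(f"{token} is a text token, not a timestamp token")
    window_offset = seek / FRAMES_PER_SECOND  # start of the 30 s window
    return window_offset + (token - TIMESTAMP_BEGIN) * TIME_PRECISION


# Last segment from the output above: seek=1000, closing token 51414.
print(timestamp_to_seconds(51414, seek=1000))
```

Under these assumptions, the closing token 51414 means 21.0 s into the window that starts at 10.0 s, i.e. 31.0 s, which is past the end of the 15 s file; the rest of that window is presumably padding. The end times of segments 0 and 1 (4.56 s and 10.0 s, from tokens 50592 and 50864 at seek 0) decode the same way.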

Answered by jianfch


jianfch · 2022-10-18T05:39:03Z


Empirically, I found that this tends to go away with the large model. You can also add a few lines to clamp the maximum timestamp to the duration of the audio. Another way is to suppress, at the decoding stage, any timestamp tokens that are greater than the audio duration.
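A minimal sketch of the clamping option, assuming `result` has the shape shown in the question (a dict with a "segments" list whose items carry "start" and "end" in seconds) and that the true duration is already known. `clamp_segments` is a hypothetical helper. One way to get the duration is `len(audio) / 16000` for audio loaded with `whisper.load_audio`, which resamples to 16 kHz.

```python
from typing import Any


def clamp_segments(result: dict[str, Any], duration: float) -> dict[str, Any]:
    """Hypothetical helper: clamp every segment's times to [0, duration].

    Assumes `result` looks like the dict returned by whisper's transcribe():
    a "segments" list whose items have float "start" and "end" keys.
    Returns a new dict; the input is left unchanged.
    """
    clamped = []
    for seg in result["segments"]:
        start = min(max(seg["start"], 0.0), duration)
        end = min(max(seg["end"], 0.0), duration)
        # Keep end >= start so no segment has a negative length.
        clamped.append({**seg, "start": start, "end": max(start, end)})
    return {**result, "segments": clamped}


result = {"segments": [{"id": 2, "start": 10.0, "end": 31.0}]}
print(clamp_segments(result, duration=15.0)["segments"][0]["end"])
```

With `duration=15.0`, the last segment from the question would end at 15.0 s instead of 31.0 s. Clamping only fixes the reported end time; it does not re-align the text.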

3 replies

tcl8273 Oct 18, 2022
Author

I used the large model for the transcription.

tcl8273 Oct 18, 2022
Author

When I run it on Ubuntu, I get a different duration (24 s) instead of 15 s.

Could you help me resolve this error?
Thank you.

jianfch Oct 18, 2022

You can implement the latter options I mentioned, or just use stable-ts.
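The decoding-stage option amounts to masking timestamp tokens whose time would fall past the end of the audio before the next token is picked. A minimal sketch on a plain list of logits, assuming the multilingual Whisper vocabulary (timestamp tokens start at id 50364 and every id at or above it is a timestamp), 0.02 s per timestamp step, and `seek` measured in mel frames at 100 frames per second. `suppress_late_timestamps` is hypothetical. In the Whisper versions I am aware of, `whisper.decoding.DecodingTask` applies a list of `LogitFilter` objects at each step; hooking this logic in there would mean modifying internal code rather than setting a public option.

```python
import math


def suppress_late_timestamps(
    logits: list[float],
    seek: int,
    duration: float,
    timestamp_begin: int = 50364,   # assumed id of <|0.00|>
    time_precision: float = 0.02,   # assumed seconds per timestamp step
    frames_per_second: int = 100,   # assumed mel frames per second
) -> list[float]:
    """Hypothetical filter: mask timestamp tokens later than `duration`.

    Returns a copy of `logits` where every timestamp token whose absolute
    time (window offset + step * precision) exceeds `duration` is -inf.
    """
    window_offset = seek / frames_per_second
    # Largest in-window timestamp step that still lies inside the audio;
    # the small epsilon guards against float error at exact boundaries.
    max_step = math.floor((duration - window_offset) / time_precision + 1e-9)
    first_banned = timestamp_begin + max(max_step, 0) + 1
    masked = list(logits)
    for token in range(first_banned, len(masked)):
        masked[token] = -math.inf
    return masked
```

For the second window in this thread (seek=1000, 15 s of audio), everything from token 50615 (5.02 s into the window) upward would be masked, including the 51414 that produced the 31.0 s end time.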

r4gor · 2022-10-18T12:31:21Z


I am having the same problem.

0 replies

Replies: 2 comments · 3 replies

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

Result Segment with wrong duration of audio file. #357

Uh oh!

tcl8273 Oct 18, 2022

Replies: 2 comments · 3 replies

Uh oh!

jianfch Oct 18, 2022

Uh oh!

tcl8273 Oct 18, 2022 Author

Uh oh!

Uh oh!

tcl8273 Oct 18, 2022 Author

Uh oh!

Uh oh!

jianfch Oct 18, 2022

Uh oh!

r4gor Oct 18, 2022

tcl8273
Oct 18, 2022

Replies: 2 comments 3 replies

jianfch
Oct 18, 2022

tcl8273 Oct 18, 2022
Author

tcl8273 Oct 18, 2022
Author

r4gor
Oct 18, 2022