whisper: Use correct seek_end when offset is used #833

ThijsRay · 2023-04-29T12:41:36Z

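Whenever an `offset_ms` is provided, `seek_end` is calculated incorrectly: `seek_start` is added to it even when no `duration_ms` is given, so `seek_end` ends up `offset_ms` past the end of the file. As a result, Whisper keeps transcribing after the audio has ended.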

The current behavior looks like:

```
[00:34:40.000 --> 00:34:47.000]   This is an example audio file.
[00:34:47.000 --> 00:34:49.000]   The text has been redacted
[00:34:49.000 --> 00:34:51.000]   This is the end of the audio.
[00:34:51.000 --> 00:34:52.000]   ***
[00:34:52.000 --> 00:34:53.000]   ***
[00:34:53.000 --> 00:34:54.000]   ***
[00:34:55.000 --> 00:34:56.000]   ***
...
```

The expected behavior is:

```
[00:34:40.000 --> 00:34:47.000]   This is an example audio file.
[00:34:47.000 --> 00:34:49.000]   The text has been redacted
[00:34:49.000 --> 00:34:51.000]   This is the end of the audio.
- end of program -
```

This commit changes the calculation of `seek_end` so that `seek_start` is added only when a custom `duration_ms` is provided. Otherwise, `seek_end` defaults to the end of the file.
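The difference between the old and new calculation can be sketched in C. This is a minimal illustration, not the actual whisper.cpp code: it assumes seek positions are counted in 10 ms frames (so `offset_ms / 10` is the start frame) and that `n_len` is the total number of frames in the input. The helpers `ms_to_frames`, `seek_end_buggy`, and `seek_end_fixed` are hypothetical names.

```c
#include <assert.h>

/* Hypothetical helper. ASSUMPTION: one seek unit = one 10 ms frame. */
static int ms_to_frames(int ms) { return ms / 10; }

/* Behavior described in this PR: seek_start is added even when
 * duration_ms == 0, so seek_end lands offset_ms past the end of the file. */
static int seek_end_buggy(int offset_ms, int duration_ms, int n_len) {
    assert(offset_ms >= 0 && duration_ms >= 0 && n_len >= 0);
    const int seek_start = ms_to_frames(offset_ms);
    return seek_start + (duration_ms == 0 ? n_len : ms_to_frames(duration_ms));
}

/* Fixed behavior: add seek_start only when a custom duration_ms is given;
 * otherwise stop at the end of the file (n_len). */
static int seek_end_fixed(int offset_ms, int duration_ms, int n_len) {
    assert(offset_ms >= 0 && duration_ms >= 0 && n_len >= 0);
    const int seek_start = ms_to_frames(offset_ms);
    return duration_ms == 0 ? n_len : seek_start + ms_to_frames(duration_ms);
}
```

As a hypothetical worked example, take a 2091 s file (`n_len = 209100` frames) with `offset_ms = 2080000` and no `duration_ms`: the buggy form returns 208000 + 209100 = 417100 frames, far past the end of the audio, while the fixed form returns 209100. When `duration_ms` is set, both forms agree.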

Signed-off-by: Thijs Raymakers <thijs@raymakers.nl>

ggerganov approved these changes Apr 29, 2023

ggerganov merged commit 6108d3c into ggerganov:master Apr 29, 2023

ThijsRay deleted the fix_seek_end_with_offset branch April 30, 2023 20:44