
Implement max line width and max line count, and make word highlighting optional #1184

Merged (4 commits) on Apr 11, 2023

Conversation

ryanheise (Contributor) commented Apr 2, 2023

This implementation is based on word_timestamps and so it requires that option to be turned on. Word highlighting has also been made optional and turned off by default.

Examples:

# Each subtitle has a maximum of 2 lines wrapped to 47 characters:
whisper --word_timestamps True --max_line_width 47 --max_line_count 2 audio.mp3
# Same with word highlighting:
whisper --word_timestamps True --max_line_width 47 --max_line_count 2 --highlight_words True audio.mp3
# Each subtitle preserves the original Whisper segment but wrapped to 47 character lines:
whisper --word_timestamps True --max_line_width 47 audio.mp3
# Same with word highlighting:
whisper --word_timestamps True --max_line_width 47 --highlight_words True audio.mp3
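
For anyone using the Python API instead of the CLI, a rough equivalent of the first example is sketched below (a sketch only; the writer options follow this PR, but the exact call signature of the writer may differ between versions):

import whisper
from whisper.utils import get_writer

model = whisper.load_model("small")

# word_timestamps must be True for the line-wrapping options to have any effect
result = model.transcribe("audio.mp3", word_timestamps=True)

# write an .srt file wrapped to at most 2 lines of 47 characters per subtitle
writer = get_writer("srt", ".")
writer(result, "audio.mp3", {
    "max_line_width": 47,
    "max_line_count": 2,
    "highlight_words": False,
})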

When --max_line_count is specified, subtitles will be segmented at the line limit or when there is a pause in the speech. This overcomes segmentation artifacts that can occur at window boundaries mid-sentence.

This segmentation approach works better in conjunction with #1114, because that PR fixes some bugs with timestamp accuracy near segment boundaries, where boundary words are stretched to cover pauses and make those pauses harder for the current PR to detect. In the meantime, the current PR detects pauses by measuring the distance between the start timestamps of successive words, rather than between the end timestamp of the previous word and the start timestamp of the current word.
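
Illustratively, the pause check described above amounts to comparing the start timestamps of consecutive words against a gap threshold, roughly like this (a sketch only, not the actual code; the word dict format and the 0.5 s threshold are assumptions for illustration):

# words: e.g. [{"word": " Hello", "start": 0.00, "end": 0.42}, ...]
PAUSE_THRESHOLD = 0.5  # seconds (illustrative value)

def pause_before(prev_word, word, threshold=PAUSE_THRESHOLD):
    # Use the start-to-start distance rather than prev_word["end"] -> word["start"],
    # because the previous word's end timestamp may be stretched over the pause.
    return (word["start"] - prev_word["start"]) > threshold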

@ryanheise ryanheise marked this pull request as draft April 2, 2023 07:54
@ryanheise ryanheise marked this pull request as ready for review April 2, 2023 11:55
g-i-o-r-g-i-o commented Apr 12, 2023

Whisper transcribe suddenly started to misbehave; is this perhaps related to line length?

My transcribed text used to look like this, which is good:

New line here.
And another new line.
And another,
which continues here

Now instead it looks like this, which is very bad:

New line here. And 
another new line. And 
another, which continues 
here

What option do I need to add to this minimal code to bring back the old behaviour?

I think the default behaviour should stay the same as before, since the output is now all messed up.

Thanks for any help.

# load the entire audio file
audio = whisper.load_audio(audio_in)

options = {
    "language": "en", # input language, if omitted is auto detected
    "task": "transcribe", # "translate" or "transcribe" if you just want transcription
    'fp16': True, 
    'verbose': True
}

result = whisper.transcribe(model, audio, **options)


gGedgs645

I agree with @GianniGi; it would be great to have an option to force a line break at the end of a sentence.

ryanheise (Contributor, Author)

This pull request is not about sentence segmentation; it is about wrapping subtitles to a maximum line width and line count.

If you are interested in the discussion about sentence segmentation, it is over here: #1243

zackees pushed a commit to zackees/whisper that referenced this pull request on May 5, 2023

Implement max line width and max line count, and make word highlighting optional (openai#1184)

* Add highlight_words, max_line_width, max_line_count
* Refactor subtitle generator

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
ovidb commented May 10, 2023

Do we know when this will make it into the API? I can't see it in the docs yet

ilanit1997 pushed a commit to ilanit1997/whisper that referenced this pull request on May 16, 2023
makseem77

Hello, sorry in advance if this is not the right place to ask, but I'm writing a Python script that takes an mp4 file as input and outputs a WebVTT file of the transcription. I managed to make it work, but now I'm trying to reduce the length of each subtitle line and get closer to word-level transcriptions in WebVTT, and I'm having trouble understanding how to set the word_timestamps parameter to True when using Whisper from a Python script.

I understand from this snippet of code (from ilanit1997@819074f):

if not args["word_timestamps"]:
for option in word_options:
if args[option]:
parser.error(f"--{option} requires --word_timestamps True")

that you can set it using its command-line argument, but I can't figure out how to do it in my basic Python script (pasted below for reference).

import whisper
from whisper.utils import get_writer

model = whisper.load_model('base.en')

whisper.DecodingOptions(language='en', fp16='false')
audio = 'final_video.mp4'
result = model.transcribe(audio)
output_directory = "./"
word_options = {
    "highlight_words": True,
    "max_line_count": 50,
    "max_line_width": 3
}

srt_writer = get_writer("srt", output_directory)
srt_writer(result, audio, word_options)

Sorry again if it's not the place to ask or if it's something I should be able to figure out myself, but I'm kind of stuck.
Kindly,

abyesilyurt pushed a commit to abyesilyurt/whisper that referenced this pull request on Nov 13, 2023
@ryanheise ryanheise deleted the line-char-limits branch November 18, 2023 03:45