
Implement max line width and max line count, and make word highlighting optional #1184

Merged (4 commits) on Apr 11, 2023

Conversation

ryanheise (Contributor) commented Apr 2, 2023

This implementation is based on word_timestamps and so it requires that option to be turned on. Word highlighting has also been made optional and turned off by default.

Examples:

# Each subtitle has a maximum of 2 lines wrapped to 47 characters:
whisper --word_timestamps True --max_line_width 47 --max_line_count 2 audio.mp3
# Same with word highlighting:
whisper --word_timestamps True --max_line_width 47 --max_line_count 2 --highlight_words True audio.mp3
# Each subtitle preserves the original Whisper segment but wrapped to 47 character lines:
whisper --word_timestamps True --max_line_width 47 audio.mp3
# Same with word highlighting:
whisper --word_timestamps True --max_line_width 47 --highlight_words True audio.mp3
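
For anyone using the Python API instead of the CLI, a rough equivalent of the first example is sketched below (a sketch only; the writer options follow this PR, but the exact call signature of the writer may differ between versions):

import whisper
from whisper.utils import get_writer

model = whisper.load_model("small")

# word_timestamps must be True for the line-wrapping options to have any effect
result = model.transcribe("audio.mp3", word_timestamps=True)

# write an .srt file wrapped to at most 2 lines of 47 characters per subtitle
writer = get_writer("srt", ".")
writer(result, "audio.mp3", {
    "max_line_width": 47,
    "max_line_count": 2,
    "highlight_words": False,
})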

When --max_line_count is specified, subtitles will be segmented at the line limit or when there is a pause in the speech. This overcomes segmentation artifacts that can occur at window boundaries mid-sentence.

This segmentation approach works better in conjunction with #1114, because that PR fixes some bugs with timestamp accuracy near segment boundaries, where boundary words are stretched to cover pauses and make those pauses harder for the current PR to detect. In the meantime, the current PR detects pauses by measuring the distance between the start timestamps of successive words, rather than between the end timestamp of the previous word and the start timestamp of the current word.
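
Illustratively, the pause check described above amounts to comparing the start timestamps of consecutive words against a gap threshold, roughly like this (a sketch only, not the actual code; the word dict format and the 0.5 s threshold are assumptions for illustration):

# words: e.g. [{"word": " Hello", "start": 0.00, "end": 0.42}, ...]
PAUSE_THRESHOLD = 0.5  # seconds (illustrative value)

def pause_before(prev_word, word, threshold=PAUSE_THRESHOLD):
    # Use the start-to-start distance rather than prev_word["end"] -> word["start"],
    # because the previous word's end timestamp may be stretched over the pause.
    return (word["start"] - prev_word["start"]) > threshold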

@ryanheise ryanheise marked this pull request as draft April 2, 2023 07:54
@ryanheise ryanheise marked this pull request as ready for review April 2, 2023 11:55
g-i-o-r-g-i-o commented Apr 12, 2023

Whisper transcribe suddenly started to misbehave; is this perhaps related to line length?

My transcribed text used to look like this, which is good:

New line here.
And another new line.
And another,
which continues here

Now instead it looks like this, which is very bad:

New line here. And 
another new line. And 
another, which continues 
here

What option do I need to add to this minimal code to bring back the old behaviour?

I think the default behaviour should stay the same as before, since the output is now all messed up.

Thanks for any help.

# load the entire audio file
audio = whisper.load_audio(audio_in)

options = {
    "language": "en", # input language, if omitted is auto detected
    "task": "transcribe", # "translate" or "transcribe" if you just want transcription
    'fp16': True, 
    'verbose': True
}

result = whisper.transcribe(model, audio, **options)


gGedgs645

I agree with @GianniGi; it would be great to have an option to force a line break at the end of a sentence.

ryanheise (Contributor, Author)

This pull request is not about sentence segmentation; it is about wrapping subtitles to a maximum line width and line count.

If you are interested in the discussion about sentence segmentation, it is over here: #1243

zackees pushed a commit to zackees/whisper that referenced this pull request on May 5, 2023

Implement max line width and max line count, and make word highlighting optional (openai#1184)

* Add highlight_words, max_line_width, max_line_count
* Refactor subtitle generator

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
ovidb commented May 10, 2023

Do we know when this will make it into the API? I can't see it in the docs yet

ilanit1997 pushed a commit to ilanit1997/whisper that referenced this pull request on May 16, 2023
makseem77

Hello, sorry in advance if this is not the right place to ask, but I'm writing a Python script that takes an mp4 file as input and outputs a WebVTT file of the transcription. I managed to make it work, but now I'm trying to reduce the length of each subtitle line and get closer to word-level transcriptions in WebVTT, and I'm having trouble understanding how to set the word_timestamps parameter to True when using Whisper from a Python script.

I understand from this snippet of code (from ilanit1997@819074f):

if not args["word_timestamps"]:
for option in word_options:
if args[option]:
parser.error(f"--{option} requires --word_timestamps True")

that you can set it using its command-line argument, but I can't figure out how to do it in my basic Python script (pasted below for reference).

import whisper
from whisper.utils import get_writer

model = whisper.load_model('base.en')

whisper.DecodingOptions(language='en', fp16='false')
audio = 'final_video.mp4'
result = model.transcribe(audio)
output_directory = "./"
word_options = {
    "highlight_words": True,
    "max_line_count": 50,
    "max_line_width": 3
}

srt_writer = get_writer("srt", output_directory)
srt_writer(result, audio, word_options)

Sorry again if it's not the place to ask or if it's something I should be able to figure out myself, but I'm kind of stuck.
Kindly,

abyesilyurt pushed a commit to abyesilyurt/whisper that referenced this pull request on Nov 13, 2023
@ryanheise ryanheise deleted the line-char-limits branch November 18, 2023 03:45