
Add new option to generate subtitles by a specific number of words #1729

Merged: 7 commits, merged Nov 6, 2023


@amolinasalazar (Contributor) commented Oct 22, 2023

*Updated according to jongwook's code review

Added a new word option, --max_words_per_line, that generates subtitles with a maximum number of words per subtitle segment. This may sound similar to the --max_line_width option, but IMHO the results are more pleasant for readers. Here are a couple of comparisons using .SRT files:

[Image: max_words_per_line]
Note that --max_words_per_line works as an upper bound on the number of words, but it still respects segment boundaries: the last line of a segment can have fewer words when fewer than max_words_per_line words remain in that segment.
For example, with Segment = [word1, word2, word3, word4, word5] and max_words_per_line = 3
=> Result = [word1, word2, word3] and [word4, word5]
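The splitting rule in the example can be sketched in a few lines of Python. This is a minimal illustration that assumes each segment is a plain list of word strings; `split_segment` and `split_segments` are hypothetical helpers, not Whisper's actual implementation, which works on timestamped words.

```python
from typing import List


def split_segment(words: List[str], max_words_per_line: int) -> List[List[str]]:
    """Split one segment's words into lines of at most max_words_per_line words.

    Hypothetical helper: the last line keeps whatever words remain, so it may be
    shorter than max_words_per_line. Words are never moved into another segment.
    """
    if max_words_per_line < 1:
        raise ValueError("max_words_per_line must be a positive integer")
    return [
        words[i : i + max_words_per_line]
        for i in range(0, len(words), max_words_per_line)
    ]


def split_segments(segments: List[List[str]], max_words_per_line: int) -> List[List[str]]:
    """Split each segment independently, so no line mixes words from two segments."""
    lines: List[List[str]] = []
    for segment in segments:
        lines.extend(split_segment(segment, max_words_per_line))
    return lines


segment = ["word1", "word2", "word3", "word4", "word5"]
print(split_segment(segment, 3))
# [['word1', 'word2', 'word3'], ['word4', 'word5']]
```

Because each segment is chunked on its own, a short trailing line such as [word4, word5] is accepted rather than borrowing words from the next segment.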

This differs from --max_line_width, which can join the end of one segment with the beginning of the next and therefore leave bigger time gaps inside a single subtitle:
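To see how width-based wrapping can produce such gaps, here is a deliberately simplified, hypothetical model (not Whisper's actual --max_line_width code): words from all segments are flattened into one stream and packed greedily up to a character width, so one line can hold the end of one segment and the start of the next. The `Word` class, the timings, and the width of 20 are illustrative assumptions, and the model ignores any pause-based line breaking the real writer may apply.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float  # seconds


def wrap_by_width(segments: List[List[Word]], max_line_width: int) -> List[List[Word]]:
    """Greedily pack words from all segments into lines of at most max_line_width chars.

    Segment boundaries are ignored on purpose, which lets a line span two segments.
    """
    lines: List[List[Word]] = []
    current: List[Word] = []
    width = 0
    for segment in segments:
        for word in segment:
            # Count one separating space unless the word starts a new line.
            needed = len(word.text) + (1 if current else 0)
            if current and width + needed > max_line_width:
                lines.append(current)
                current, width = [], 0
                needed = len(word.text)
            current.append(word)
            width += needed
    if current:
        lines.append(current)
    return lines


def largest_gap(line: List[Word]) -> float:
    """Longest silence (in seconds) between consecutive words on the same line."""
    return max((b.start - a.end for a, b in zip(line, line[1:])), default=0.0)


segments = [
    [Word("Hello", 0.0, 0.4), Word("there.", 0.5, 0.9)],
    [Word("How", 2.5, 2.7), Word("are", 2.8, 3.0), Word("you?", 3.1, 3.4)],
]
for line in wrap_by_width(segments, max_line_width=20):
    print(" ".join(w.text for w in line), f"(largest gap: {largest_gap(line):.1f}s)")
```

In this toy run the first subtitle, `Hello there. How are`, spans both segments and contains a 1.6 s silence, whereas splitting each segment by word count would end a subtitle at `there.`.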

[Image: comparison]

Subtitles generated with --max_words_per_line look similar to what we see in Shorts, Reels, and other short-form videos.

This is my first contribution, so feel free to change, comment on, or improve anything.

Additional notes

  • Using --max_line_width disables the effect of --max_words_per_line.
  • Manually tested using Python and the CLI, and checked the results in .srt and .vtt files (.txt and .tsv files are not affected).
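As a rough picture of what ends up in an .srt file, the sketch below turns word-count chunks into numbered SRT cues, taking each cue's start from its first word and its end from its last word. The word dictionaries (`word`, `start`, `end` keys, times in seconds) and the helper names are assumptions for illustration; this is not Whisper's writer code.

```python
from typing import List


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: List[List[dict]], max_words_per_line: int) -> str:
    """Emit one SRT cue per chunk of at most max_words_per_line words.

    Each segment is chunked on its own, so a cue never mixes words from two segments.
    A cue starts at its first word's start time and ends at its last word's end time.
    """
    if max_words_per_line < 1:
        raise ValueError("max_words_per_line must be a positive integer")
    cues: List[str] = []
    for words in segments:
        for i in range(0, len(words), max_words_per_line):
            chunk = words[i : i + max_words_per_line]
            # Strip surrounding whitespace from each word before joining.
            text = " ".join(w["word"].strip() for w in chunk)
            start = srt_timestamp(chunk[0]["start"])
            end = srt_timestamp(chunk[-1]["end"])
            cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    # Joining with "\n" leaves a blank line between consecutive cues.
    return "\n".join(cues)


# Hypothetical word timestamps for a single five-word segment.
segment = [
    {"word": " word1", "start": 0.0, "end": 0.4},
    {"word": " word2", "start": 0.4, "end": 0.8},
    {"word": " word3", "start": 0.8, "end": 1.2},
    {"word": " word4", "start": 1.2, "end": 1.6},
    {"word": " word5", "start": 1.6, "end": 2.0},
]
print(segments_to_srt([segment], max_words_per_line=3))
```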

@FurkanGozukara

amazing

@amolinasalazar marked this pull request as ready for review October 22, 2023 16:17
@khaledbkheet

```python
from pydub import AudioSegment

song = AudioSegment.from_mp3("good_morning.mp3")

# PyDub handles time in milliseconds
ten_minutes = 10 * 60 * 1000

# Slicing an AudioSegment by milliseconds returns a new, shorter AudioSegment
first_10_minutes = song[:ten_minutes]

first_10_minutes.export("good_morning_10.mp3", format="mp3")
```

@jongwook merged commit 6ed314f into openai:main Nov 6, 2023
8 checks passed
@FurkanGozukara

@amolinasalazar which word count do you suggest for YouTube?

@amolinasalazar (Contributor, Author)


Actually, I think that's a very personal choice, and it can depend on several things.

In the end, the main reason Reels and Shorts usually show only a couple of words on screen at a time is the aspect ratio of the videos. Long subtitle lines on vertical videos watched on mobile phones would fill the whole screen with words.

Other factors include the font size, the speed of the speech, and even the complexity of the content. Fewer words usually make videos dynamic and impactful, which is ideal for simple, strong messages, but this can become tiring over a long video. For example, I wouldn't show only 1-3 words at a time when explaining a difficult topic, as it could make it harder to follow.

So in my opinion, you need to find a comfortable number yourself, but something between 3 and 6 words is generally pleasant.
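One way to reason about this trade-off is to estimate how long each subtitle stays on screen: at a steady speaking rate, a line of N words lasts roughly N × 60 / (words per minute) seconds. The 150 words-per-minute rate below is an assumed, illustrative figure, not something measured in this thread, and real cues also stretch or shrink with pauses.

```python
def seconds_on_screen(words_per_line: int, words_per_minute: float) -> float:
    """Approximate display time of one subtitle line at a constant speaking rate."""
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    return words_per_line * 60 / words_per_minute


ASSUMED_WPM = 150  # illustrative speaking rate, not a measurement
for n in (1, 3, 6):
    print(f"{n} word(s) per line -> about {seconds_on_screen(n, ASSUMED_WPM):.1f} s per subtitle")
```

Under that assumption, 1-3 words per line means a new subtitle every 0.4-1.2 s, while 6 words gives viewers about 2.4 s to read each one.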

abyesilyurt pushed a commit to abyesilyurt/whisper that referenced this pull request Nov 13, 2023
…penai#1729)

* ADD parser for new argument --max_words_count

* ADD max_words_count in words_options
ADD warning for max_line_width compatibility

* ADD logic for max_words_count

* rename to max_words_per_line

* make them kwargs

* allow specifying file path by --model

* black formatting

---------

Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
yaomingamd pushed a commit to ROCm/whisper that referenced this pull request Nov 27, 2023
@demoskalifi

Can I use this with the OpenAI Whisper API? If so, how?

@Francoyy commented Apr 10, 2024

The option name is --max_words_per_line, not --max_words_count (https://github.com/openai/whisper/pull/1729/files#diff-f6accbbb4ebcd3dd6815bf012490d9ba37eb89a65f2124adc95c2a39bc6941b7R422).
An example command:

```shell
whisper file.mp4 --language English --model large-v3 --output_format srt --word_timestamps True --max_words_per_line 6
```
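The interaction between these flags can be sketched with argparse. This is a hypothetical, trimmed-down parser that declares only the options appearing in the example command; the rule that --max_words_per_line has no effect alongside --max_line_width comes from the PR notes, while the requirement for `--word_timestamps True` is an assumption based on the example. The error and warning texts and the `str2bool` helper are illustrative, and none of this is Whisper's own CLI code.

```python
import argparse
import warnings
from typing import List, Optional


def str2bool(value: str) -> bool:
    """Parse 'True'/'False' strings, as used by --word_timestamps True."""
    if value.lower() in ("true", "1", "yes"):
        return True
    if value.lower() in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected True or False, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="whisper-sketch")  # hypothetical name
    parser.add_argument("audio", nargs="+")
    parser.add_argument("--language", default=None)
    parser.add_argument("--model", default=None)
    parser.add_argument("--output_format", default=None)
    parser.add_argument("--word_timestamps", type=str2bool, default=False)
    parser.add_argument("--max_line_width", type=int, default=None)
    parser.add_argument("--max_words_per_line", type=int, default=None)
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumption: word-level subtitle options need word timestamps (exits with status 2).
    if args.max_words_per_line is not None and not args.word_timestamps:
        parser.error("--max_words_per_line requires --word_timestamps True")
    # From the PR notes: --max_line_width disables --max_words_per_line.
    if args.max_words_per_line is not None and args.max_line_width is not None:
        warnings.warn("--max_words_per_line has no effect with --max_line_width")
    return args


example = (
    "file.mp4 --language English --model large-v3 --output_format srt "
    "--word_timestamps True --max_words_per_line 6"
)
args = parse(example.split())
print(args.max_words_per_line)  # 6
```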

@amolinasalazar (Contributor, Author)


True, jongwook renamed this option before merging the PR. I'll update the first comment so there's no confusion.
