Transcript: Support distribute crowded words in timeline #163

Closed
winlinvip opened this issue Mar 10, 2024 · 3 comments

Dwayne:

Image

Winlin:

This is not a bug in FFmpeg. The issue arises because Whisper recognized too many words in a single segment and did not distribute them evenly across the timeline, so they all appear on screen at once.

This video reproduces the issue: https://youtu.be/NONRDS7Rpjg

Image

A 15-second segment that reproduces the issue:

rapid-speech.mp4

This type of interview program is quite common. When multiple people speak without pauses, the AI can recognize the audio as one continuous utterance lasting more than ten seconds.


winlinvip commented Mar 10, 2024

First of all, SRS Stack inserts line feeds (LF) when a subtitle is too long. For example, if the OpenAI Whisper response is:

0
00:00:00,550 --> 00:00:15,839
For today's tech check. So tell us about the details of this report. I know Huawei is obviously a very big competitor. Yeah. And that's small but growing. Let's get it that way. But the headline here is counterpart research looked at the first six weeks of smartphone sales in China compared it to a

SRS Stack will convert it to:

0
00:00:00,550 --> 00:00:15,839
For today's tech check. So tell us about the
details of this report. I know Huawei is
obviously a very big competitor. Yeah. And
that's small but growing. Let's get it that
way. But the headline here is counterpart
research looked at the first six weeks of
smartphone sales in China compared it to a

This produces a subtitle with many lines that covers much of the picture. The result is below:

output-LF-by-SRS-Stack.mp4
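The line-feed conversion described above can be approximated with a greedy word wrap. This is only a minimal sketch, not SRS Stack's actual code: `wrap_subtitle` is a hypothetical helper, and the 45-character width is an assumption inferred from the line lengths in the example.

```python
import textwrap

# Assumed width; inferred from the example above, not taken from SRS Stack.
MAX_LINE_CHARS = 45


def wrap_subtitle(text: str, width: int = MAX_LINE_CHARS) -> str:
    """Greedily wrap subtitle text at word boundaries and join lines with LF."""
    return "\n".join(textwrap.wrap(text, width=width))


whisper_text = (
    "For today's tech check. So tell us about the details of this report. "
    "I know Huawei is obviously a very big competitor. Yeah. And that's "
    "small but growing. Let's get it that way. But the headline here is "
    "counterpart research looked at the first six weeks of smartphone "
    "sales in China compared it to a"
)

wrapped = wrap_subtitle(whisper_text)
print(wrapped)
```

Wrapping like this stacks every line of a 15-second cue on screen at the same moment, which is why the rendered subtitle becomes so tall.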

Actually, libass in FFmpeg wraps long lines itself, so we can simply use the Whisper output as is. An example is below:

output-1subtitle.mp4

I think it should fix almost all common cases.


winlinvip commented Mar 10, 2024

Input file:

rapid-speech.mp4

Render the subtitles with FFmpeg:

ffmpeg -i input.mp4 -vf "subtitles=input.srt:force_style='Alignment=2,MarginV=20'" \
    -vcodec libx264 -profile:v main -preset:v medium -tune zerolatency  -bf 0  \
    -acodec aac -copyts -y output.mp4

Sometimes, OpenAI Whisper responds with a single subtitle:

0
00:00:00,550 --> 00:00:15,839
For today's tech check. So tell us about the details of this report. I know Huawei is obviously a very big competitor. Yeah. And that's small but growing. Let's get it that way. But the headline here is counterpart research looked at the first six weeks of smartphone sales in China compared it to a

The result is below:

output-1subtitle.mp4

Sometimes, it responds with multiple subtitles:

0
00:00:00,550 --> 00:00:06,629
For today's tech check. So tell us about the details of this report. I know Huawei is obviously a very big competitor. Yeah. And that's

1
00:00:07,350 --> 00:00:13,829
small but growing. Let's get it that way. But the headline here is counterpart research looked at the first six weeks of smartphone

2
00:00:13,829 --> 00:00:15,789
sales in China compared it to. 

The result is below:

output-3subtitles.mp4

In most situations, OpenAI Whisper will generate multiple subtitles. If it doesn't, we might have to create them ourselves, which could be risky due to the potential for introducing bugs. Therefore, I would avoid doing this unless absolutely necessary.
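For reference, "creating them ourselves" could look roughly like the sketch below. It is not what SRS Stack does, and it is exactly the kind of extra logic this comment suggests avoiding. The `Cue` type, `split_cue`, and the 130-character limit are all hypothetical, and time is divided in proportion to character count, which is only a rough guess at when each word is actually spoken.

```python
import textwrap
from dataclasses import dataclass


@dataclass
class Cue:
    start_ms: int
    end_ms: int
    text: str


def split_cue(cue: Cue, max_chars: int = 130) -> list[Cue]:
    """Split one long cue into several, sharing its duration by text length."""
    chunks = textwrap.wrap(cue.text, width=max_chars)
    if len(chunks) <= 1:
        return [cue]
    total_chars = sum(len(chunk) for chunk in chunks)
    duration = cue.end_ms - cue.start_ms
    cues, start, consumed = [], cue.start_ms, 0
    for chunk in chunks:
        consumed += len(chunk)
        # Cumulative share keeps cues contiguous; the last one ends at end_ms.
        end = cue.start_ms + duration * consumed // total_chars
        cues.append(Cue(start, end, chunk))
        start = end
    return cues


def format_ts(ms: int) -> str:
    """Format milliseconds as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


long_cue = Cue(550, 15839, "word " * 79 + "word")
for index, part in enumerate(split_cue(long_cue)):
    print(f"{index}\n{format_ts(part.start_ms)} --> {format_ts(part.end_ms)}\n{part.text}\n")
```

Unlike Whisper's own segmentation, this cannot see pauses or speaker changes, so the split points may land mid-sentence.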


winlinvip commented Mar 10, 2024

Also add a Segments parameter in the Fix Queue:

image

Users can clear a subtitle if it is too long.

image

Also show the data in the overlay queue.
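As a rough illustration of the data such a view could show, the sketch below counts the cues in an SRT response and flags the ones that look too long. It assumes "Segments" means the number of subtitle cues. `parse_srt`, `is_too_long`, and both thresholds are hypothetical and not taken from SRS Stack.

```python
import re

TIME_RE = re.compile(
    r"(\d+):(\d+):(\d+),(\d+)\s*-->\s*(\d+):(\d+):(\d+),(\d+)"
)


def to_ms(hours: str, minutes: str, seconds: str, millis: str) -> int:
    """Convert SRT timestamp fields to milliseconds."""
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def parse_srt(srt: str) -> list[tuple[int, int, str]]:
    """Parse SRT text into (start_ms, end_ms, text) tuples, skipping bad blocks."""
    cues = []
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue
        match = TIME_RE.match(lines[1].strip())
        if not match:
            continue
        fields = match.groups()
        text = " ".join(line.strip() for line in lines[2:])
        cues.append((to_ms(*fields[:4]), to_ms(*fields[4:]), text))
    return cues


def is_too_long(cue: tuple[int, int, str], max_ms: int = 10_000, max_chars: int = 150) -> bool:
    """Flag a cue that stays on screen too long or carries too much text."""
    start_ms, end_ms, text = cue
    return end_ms - start_ms > max_ms or len(text) > max_chars


single = "0\n00:00:00,550 --> 00:00:15,839\nFor today's tech check.\n"
cues = parse_srt(single)
print(len(cues), [is_too_long(cue) for cue in cues])
```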
