Generated subtitles are too long #52
There isn't any logic that I added for determining the length of the segments; they are whatever the model produces. I have a few guesses about what could be causing the difference that you observe:
I think I've seen reports that the original implementation also sometimes produces very long segments, so it could just be a matter of chance; not much we can do there. Additionally, I have tried adding logic to prevent segments from becoming very long by forcing a timestamp token to be sampled when the segment grows long. In general it helps, but I sometimes observed errors, so I decided not to include it. Hope this helps.
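The workaround described above could be sketched roughly as follows. This is a minimal illustration, not the actual whisper.cpp code: `timestamp_begin` stands for the first timestamp token id in Whisper's vocabulary, and `max_text_tokens` is a hypothetical cutoff chosen for the example.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Sketch: once the current segment has accumulated more than
// `max_text_tokens` text tokens, suppress all non-timestamp logits so that
// the next sampled token must be a timestamp token, which closes the segment.
// Token ids at or above `timestamp_begin` are timestamp tokens.
void force_timestamp_if_long(std::vector<float> & logits,
                             int timestamp_begin,
                             int n_text_tokens,
                             int max_text_tokens) {
    if (n_text_tokens <= max_text_tokens) {
        return; // segment is still short enough - leave logits untouched
    }
    for (int i = 0; i < timestamp_begin; ++i) {
        logits[i] = -std::numeric_limits<float>::infinity();
    }
}
```

As the comment in the thread notes, a hard cutoff like this can cut a segment mid-phrase, which may be the source of the errors that were observed.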
@ggerganov, @R4ZZ3 I'm having problems with long subtitles too. I am transcribing French and German, and it happens in those languages as well, especially if the person is speaking very fast.
I will run more tests with different dithering parameters and report back.
I have this issue too: the subtitles are far too long with the C++ version of Whisper. @Topping1 I played with this as well; is there a way to make the text shorter?
Does it help if you change line 2383 in b81a81d to: `int n_take = std::min(16, int(prompt_past.size()));`? Make sure to try the latest version.
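For context, the suggested change limits how many tokens from the previously decoded segments are fed back to the model as context, on the theory that a shorter prompt may reduce the model's tendency to continue very long segments. A standalone sketch of the idea (the helper name and the surrounding structure are illustrative, not taken from whisper.cpp):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Sketch: keep only the most recent `limit` tokens of the accumulated
// prompt history (`prompt_past`) to use as context for the next segment.
std::vector<int> take_recent_prompt(const std::vector<int> & prompt_past, int limit) {
    const int n_take = std::min(limit, (int) prompt_past.size());
    return std::vector<int>(prompt_past.end() - n_take, prompt_past.end());
}
```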
@ggerganov I've tried the above and it doesn't make a difference; the text is still the same length.
Force timestamp token to be sampled if the probability sum over all timestamp tokens is above the probability of any other token
I just implemented the timestamp probability rule used in the original Whisper implementation. For my audio samples, the segments have become significantly shorter.
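The rule named in the commit title can be sketched as follows. This is a simplified illustration working on probabilities rather than the actual whisper.cpp decoding code; `timestamp_begin` again stands for the first timestamp token id.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Sketch of the rule: if the total probability mass over all timestamp
// tokens exceeds the probability of the single most likely non-timestamp
// token, zero out the non-timestamp probabilities so that a timestamp
// token must be sampled next.
void apply_timestamp_rule(std::vector<float> & probs, int timestamp_begin) {
    float sum_ts   = 0.0f; // total mass of timestamp tokens
    float max_text = 0.0f; // best single non-timestamp probability
    for (size_t i = 0; i < probs.size(); ++i) {
        if ((int) i >= timestamp_begin) {
            sum_ts += probs[i];
        } else {
            max_text = std::max(max_text, probs[i]);
        }
    }
    if (sum_ts > max_text) {
        for (int i = 0; i < timestamp_begin; ++i) {
            probs[i] = 0.0f;
        }
    }
}
```

Unlike the hard length cutoff discussed earlier, this rule lets the model's own confidence decide when a segment should end, which is likely why it shortens segments without introducing errors.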
Just fetched the updates and recompiled. |
This one works better and doesn't generate long subtitles (at least for my video). Great job @ggerganov!
Change get_logits to return a single slice
Hi, and first of all, thanks for creating this implementation.
I am trying to create a new version of this Space: https://huggingface.co/spaces/Finnish-NLP/Whisper-ASR-youtube-subtitles
I have this working locally now, but I noticed that the generated subtitles are far too long.
![image](https://user-images.githubusercontent.com/25264037/195977600-b30e9801-6d0a-4820-aca8-d50a111b4942.png)
On the right is this C++ implementation; the original PyTorch implementation (small model) is on the left.
Is it a "feature" of this implementation or what is going on here?