Subtitles sometimes go out of sync #89

Answered by jongwook
athu16 asked this question in Q&A

This is one of the failure modes of the hacky long-form heuristics (in transcribe.py and discussed in Section 4.5), where the timestamp offsets sometimes accumulate over time because the transcription from the previous 30-second window, including its timestamps, is fed to the model as conditioning input. This is currently controlled by a hard-coded constant here:

if result.temperature > 0.5:
    # do not feed the prompt tokens if a high temperature was used
    prompt_reset_since = len(all_tokens)

and you can modify this block to always reset the context to mitigate the tendency of going out of sync.
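
To make the effect of that change concrete, here is a minimal, self-contained sketch of the reset logic. It is not the code from transcribe.py: the helper names (`update_prompt_reset`, `simulate`), the `always_reset` flag, and the toy token data are all hypothetical and exist only for illustration. Only the `temperature > 0.5` check and the `prompt_reset_since = len(all_tokens)` assignment mirror the snippet above.

```python
from typing import List, Tuple


def update_prompt_reset(
    all_tokens: List[int],
    prompt_reset_since: int,
    temperature: float,
    always_reset: bool = False,  # hypothetical flag, not part of the original code
) -> int:
    """Return the index in all_tokens where the next window's prompt begins."""
    if always_reset or temperature > 0.5:
        # Drop everything decoded so far from the next window's prompt.
        return len(all_tokens)
    return prompt_reset_since


def simulate(windows: List[Tuple[List[int], float]], always_reset: bool = False) -> List[int]:
    """Return the prompt length each window would be conditioned on."""
    all_tokens: List[int] = []
    prompt_reset_since = 0
    prompt_lengths = []
    for tokens, temperature in windows:
        # The prompt is whatever has accumulated since the last reset.
        prompt = all_tokens[prompt_reset_since:]
        prompt_lengths.append(len(prompt))
        all_tokens.extend(tokens)
        prompt_reset_since = update_prompt_reset(
            all_tokens, prompt_reset_since, temperature, always_reset
        )
    return prompt_lengths


# Toy data: (tokens decoded in the window, temperature that was used).
windows = [([1, 2, 3], 0.0), ([4, 5], 0.0), ([6], 0.8), ([7, 8], 0.0)]

default_lengths = simulate(windows)                    # context accumulates until temperature > 0.5
reset_lengths = simulate(windows, always_reset=True)   # context is always dropped

print(default_lengths)  # [0, 3, 5, 0]
print(reset_lengths)    # [0, 0, 0, 0]
```

With the default behaviour the prompt keeps growing across windows until a high-temperature decode clears it, which is how timestamp drift can be carried forward. Always resetting means each window is decoded without the earlier text as context, so an offset in one window is not passed on to the next.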

In…
