
Empirically, I found that this tends to go away with the large model. You can also add a few lines of post-processing to clamp the maximum timestamp to the duration of the audio. Another option is to suppress, at the decoding stage, any timestamp tokens that are greater than the audio duration.
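A minimal sketch of the last two suggestions follows. It assumes that segments are dicts with `start`/`end` in seconds (like the `segments` list returned by `transcribe`). It also assumes that timestamp tokens are consecutive ids beginning at a `timestamp_begin` id, spaced 0.02 s apart, and measured relative to the start of the current 30 s window. The function names `clamp_segments` and `suppress_late_timestamps` are hypothetical and not part of Whisper's API. Wiring the second function into the decoder, for example as a logit filter, is left out.

```python
import math
from typing import Dict, List

# Assumption: adjacent timestamp tokens differ by 0.02 s (Whisper-style).
TIME_PRECISION = 0.02


def clamp_segments(segments: List[Dict], audio_duration: float) -> List[Dict]:
    """Post-processing fix (hypothetical helper): clamp start/end to the audio length."""
    clamped = []
    for seg in segments:
        seg = dict(seg)  # copy so the caller's segments are not mutated
        seg["start"] = min(seg["start"], audio_duration)
        seg["end"] = min(seg["end"], audio_duration)
        clamped.append(seg)
    return clamped


def suppress_late_timestamps(
    logits: List[float],
    timestamp_begin: int,
    audio_duration: float,
    time_offset: float = 0.0,
) -> List[float]:
    """Decoding-stage fix (hypothetical helper): mask timestamps past the audio end.

    logits: next-token scores over the whole vocabulary.
    timestamp_begin: assumed id of the first timestamp token (0.00 s).
    time_offset: start time in seconds of the current window within the file.
    """
    remaining = audio_duration - time_offset
    # Id of the last timestamp token that still falls inside the audio;
    # the small epsilon guards against float error such as 2.9999999.
    last_allowed = timestamp_begin + math.floor(remaining / TIME_PRECISION + 1e-6)
    out = list(logits)
    # Every timestamp token after last_allowed gets -inf, so it can never be sampled.
    # If the window starts past the end of the audio, all timestamps are masked.
    for token_id in range(max(last_allowed + 1, timestamp_begin), len(out)):
        out[token_id] = -math.inf
    return out
```

Clamping is the simpler change, but it can leave a segment with `start == end` if the model placed it entirely past the end. Suppressing at decode time prevents such timestamps from being generated at all, at the cost of hooking into the decoding loop.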

Answer selected by jongwook
Category
Q&A