Whisper is generally not very accurate on singing voices. Using the large model and passing an initial prompt (the words preceding the audio) gets pretty close, except for the last line, where it hallucinated and got the timestamp wrong.

root@devbox-0:~$ whisper --model large 196658015-54bed2d2-218b-414a-8010-43c2021fe8fa.mp4 --language ja --initial_prompt "ふわふわる"
[00:00.000 --> 00:06.000] ふわふわり あなたが笑っている それだけで笑いになる
[00:06.000 --> 00:14.000] 神様ありがとう 運命のイタズラでも
[00:14.000 --> 00:31.000] 眩暮らし
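
For anyone who wants to do the same thing from Python instead of the CLI, here is a minimal sketch. It assumes the `openai-whisper` package is installed; `load_model` and `transcribe` (with `language` and `initial_prompt`) are the library's calls, while the helper names `format_timestamp`, `format_segment`, and `transcribe_with_prompt` are made up for this example. The formatting only mimics the `[MM:SS.mmm --> MM:SS.mmm]` lines shown above and is not the CLI's own code.

```python
def format_timestamp(seconds: float) -> str:
    """Render seconds as MM:SS.mmm (minutes are not wrapped into hours)."""
    total_ms = round(seconds * 1000)
    minutes, rem_ms = divmod(total_ms, 60_000)
    secs, ms = divmod(rem_ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def format_segment(segment: dict) -> str:
    """Format one segment dict with 'start', 'end', and 'text' keys."""
    start = format_timestamp(segment["start"])
    end = format_timestamp(segment["end"])
    return f"[{start} --> {end}] {segment['text'].strip()}"


def transcribe_with_prompt(path: str, prompt: str,
                           model_name: str = "large",
                           language: str = "ja") -> list[str]:
    """Transcribe `path`, priming the decoder with `prompt` (hypothetical helper)."""
    import whisper  # imported lazily so the formatting helpers work without it

    model = whisper.load_model(model_name)
    result = model.transcribe(path, language=language, initial_prompt=prompt)
    return [format_segment(seg) for seg in result["segments"]]


if __name__ == "__main__":
    lines = transcribe_with_prompt(
        "196658015-54bed2d2-218b-414a-8010-43c2021fe8fa.mp4",
        "ふわふわる",
    )
    print("\n".join(lines))
```

The prompt only biases decoding toward the given words, so a hallucinated last segment like the one above can still happen.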

Answer selected by jongwook