Can't currently support long audio? #25
Hi, I think I made a typo there. What I meant is that it tends to make more mistakes when the audio is longer than 15 seconds, but I typed 15 ms instead. The reason is that the training data are sentences shorter than 15 seconds, so the model might not generalize well to long audio. It's hard for me to pinpoint the problem in your code without more information. Does the example we give work for you? Could you share the actual input text and audio?
Test code:
Audio file link: https://file.viggo.site/temp/4.wav
Thank you for your reply.
Thank you for sharing. I think there are two problems here. Second, the audio is too long. Both the speech model and the alignment algorithm have quadratic complexity, O(n^2), in the length of the audio, so it is very difficult for them to handle very long audio. Processing long audio is still a challenge for speech research, so I do not have a good solution. Segmenting the audio into shorter clips will help.
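As a rough illustration of the segmentation idea, the sketch below computes clip boundaries of at most 15 seconds each. The function name `split_into_clips`, the 16 kHz sample rate, and the fixed-length strategy are assumptions made for this example and are not part of Charsiu. Cutting at fixed positions can split a word in half, and the transcript would have to be divided to match each clip, so splitting at pauses is probably the better choice in practice.

```python
def split_into_clips(num_samples, sample_rate=16000, max_seconds=15.0):
    """Return (start, end) sample-index pairs that cover the audio.

    Hypothetical helper: every clip is at most `max_seconds` long,
    and the last clip holds whatever samples remain.
    """
    if num_samples < 0 or sample_rate <= 0 or max_seconds <= 0:
        raise ValueError("num_samples must be >= 0; rate and length must be > 0")
    clip_len = int(sample_rate * max_seconds)  # samples per clip
    return [
        (start, min(start + clip_len, num_samples))
        for start in range(0, num_samples, clip_len)
    ]


# A 40-second recording at 16 kHz becomes two 15 s clips and one 10 s clip.
boundaries = split_into_clips(40 * 16000)
```

Each pair can be used to slice the sample array, and the clip's start time (`start / sample_rate`) can be added back to the timestamps of that clip's alignment to recover positions in the full recording.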
“Charsiu works the best when your files are shorter than 15 ms. Test whether your files are longer than 15ms”
I saw this hint in the description and tested it.
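For reference, a length check of this kind could look like the sketch below. It assumes an uncompressed PCM WAV file and uses the 15-second limit mentioned in the maintainer's reply; the helper names `wav_duration_seconds` and `is_too_long` are made up for this example and are not part of Charsiu.

```python
import wave


def wav_duration_seconds(path):
    """Duration of a PCM WAV file in seconds (frames / frame rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_too_long(path, limit_seconds=15.0):
    """True if the file is longer than the assumed 15-second limit."""
    return wav_duration_seconds(path) > limit_seconds
```

Python's built-in `wave` module only reads uncompressed WAV, so other formats would need a different loader.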
When I force-align long audio, the following error message appears:
Traceback (most recent call last):
  File "test.py", line 31, in <module>
    charsiu.align(audio=audio, text=text)
  File "E:\***/python/charsiu/charsiu/src\Charsiu.py", line 157, in align
    pred_words = self.charsiu_processor.align_words(pred_phones,phones,words)
  File "E:\***/python/charsiu/charsiu/src\processors.py", line 417, in align_words
    word_dur.append((dur,words_rep[count])) #((start,end,phone),word)
IndexError: list index out of range