Problems in tokenizing LibriTTS #5
datasets fix a23f7c9
Looks like the command is fixed, but the "words count mismatch" warning still appears. All I could find is that it might be caused by some inconsistency between the text files and the phonemes. Will it actually influence the result?
I haven't figured out these warnings yet. I checked the results and there is nothing wrong.
@jry-king Hi, I get the same warning. I assume the reason is that the espeak tokenizer frequently merges several words into one phonemized token, e.g. ' of the' becomes 'ʌvðə'.
Can you provide more cases? We need to dig into https://github.com/bootphon/phonemizer.
It's hard to count them all, but I can give you some cases.
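To make the merging case above easier to spot, here is a minimal sketch that compares word counts between the original text and its phonemized form. It is not phonemizer's own check; the function name `find_word_count_mismatches` is hypothetical, it assumes phonemized words are separated by spaces, and the second sample pair is a hand-written illustration rather than real espeak output.

```python
def find_word_count_mismatches(texts, phonemized, word_sep=" "):
    """Return (index, n_text_words, n_phoneme_words) for each pair whose counts differ."""
    mismatches = []
    for i, (text, phones) in enumerate(zip(texts, phonemized)):
        # Words in the original text, split on any whitespace.
        n_text = len(text.split())
        # Words in the phonemized string, split on the assumed word separator.
        n_phones = len([w for w in phones.split(word_sep) if w])
        if n_text != n_phones:
            mismatches.append((i, n_text, n_phones))
    return mismatches


# "of the" -> "ʌvðə" is the case reported in this thread;
# the second pair is a hand-written illustration, not real espeak output.
texts = ["of the", "hello world"]
phonemized = ["ʌvðə", "həloʊ wɜːld"]

mismatches = find_word_count_mismatches(texts, phonemized)
for index, n_text, n_phones in mismatches:
    print(f"line {index}: {n_text} words in text, {n_phones} after phonemizing")
```

Running something like this over the transcripts and their phonemized output would list the lines where two or more words collapsed into one token, which is probably the quickest way to gather cases.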
The errors should be fixed.
I tried to fix it but failed. If training goes well, does that mean the warning doesn't affect training or inference?
Thanks for your reproduction of the VALL-E paper! When I tried to prepare LibriTTS data with prepare.sh I encountered this problem:
I'm only using 4 tar files (train-clean-100, train-clean-360, test-clean and dev-clean) out of 7 in LibriTTS. Could you give me some suggestions about what's going on? Thanks!