text embedding dimension #23
Comments
Hi, you can try the trick of adding a blank token between any two input tokens. My experiment in Chinese shows that this trick can improve pronunciation significantly.
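For reference, the blank-insertion trick usually looks something like the sketch below; the helper name, the blank symbol id (0), and whether glow-tts exposes this through a config flag are assumptions rather than the repository's confirmed API.

```python
def intersperse(sequence, blank_id=0):
    # Build a list of 2*len(sequence)+1 blanks, then place the original
    # tokens at the odd positions, so a blank sits between (and around)
    # every pair of input tokens: [5, 3, 7] -> [0, 5, 0, 3, 0, 7, 0].
    result = [blank_id] * (len(sequence) * 2 + 1)
    result[1::2] = sequence
    return result

token_ids = [12, 7, 33]          # hypothetical symbol ids
print(intersperse(token_ids))    # [0, 12, 0, 7, 0, 33, 0]
```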
Hi @shahuzi, I'm very glad that you are also interested in training glow-tts on Mandarin datasets, and thank you very much for your suggestion. I will try it! By the way, I encountered some alignment problems before (e.g. some words are always missing at inference time), and I'm not sure whether I gave the model the right input sequences. Could you kindly tell me what your input sequences look like? Are you also using phonemes, as I mentioned above? Do you use prosodic labels (e.g. "#1 #3 #4 #5", which stand for pauses in a sentence)?
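To make the question concrete, an input sequence with phoneme symbols plus prosodic boundary labels might look something like the sketch below; the pinyin initial/final split, the tone encoding, and the boundary symbols are hypothetical and not taken from either setup.

```python
# Hypothetical Mandarin input for "今天天气真好" ("the weather is really nice today"):
# pinyin initials/finals with tone numbers, plus prosodic boundary labels
# (#1 = prosodic word, #4 = intonational phrase) kept as extra symbols.
input_sequence = [
    "j", "in1", "t", "ian1", "#1",
    "t", "ian1", "q", "i4",  "#1",
    "zh", "en1", "h", "ao3", "#4",
]
```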
@Charlottecuc Sorry for not replying to you sooner.
@shahuzi Hello, would you mind sharing a few audio samples you synthesized with glow-tts? Thank you very much.
Due to data security concerns I can't provide you with a demo, sorry. My conclusion so far: for announcer-style (broadcast) voice corpora, synthesis works fine; for highly expressive voice corpora, synthesis runs into problems.
Hi. I tried to train Glow-TTS on Mandarin datasets, and there are about 300 symbols in symbols.py. Therefore, it seems that I need to increase the text embedding depth. I noticed that your paper mentions the Embedding Dimension. Does the Embedding Dimension there stand for the "text embedding dimension"? If it does, which parameter here should I modify: hidden_channels or hidden_channels_enc? Thank you very much!
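For what it's worth, the relationship between the symbol count and the embedding dimension can be sketched as below; the parameter names and the value 192 are assumptions, so please check models.py to confirm which hyperparameter is actually passed to the text encoder's embedding layer.

```python
import torch
import torch.nn as nn

# Minimal sketch, not the repository's actual code. The number of entries in
# symbols.py (~300 here) only sets the number of rows of the embedding table;
# the "Embedding Dimension" from the paper is the width of each token vector
# (the columns). In this sketch the width is taken from hidden_channels_enc,
# but whether glow-tts uses hidden_channels or hidden_channels_enc for the
# encoder embedding is an assumption to verify in models.py.
n_symbols = 300            # rows: one vector per symbol in symbols.py
hidden_channels_enc = 192  # columns: text embedding dimension (assumed default)

emb = nn.Embedding(n_symbols, hidden_channels_enc)

token_ids = torch.LongTensor([[12, 0, 7, 0, 33]])  # toy, blank-interspersed ids
token_vectors = emb(token_ids)
print(token_vectors.shape)  # torch.Size([1, 5, 192])
```

Note that adding more symbols by itself only grows the embedding table's rows; whether you also need a wider embedding is a separate modeling choice.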