
Inconsistent results when converting the same text multiple times #45

Closed

yannier912 opened this issue Dec 15, 2020 · 4 comments

Comments

@yannier912

Hello, author!
While testing, I took a Tacotron model previously trained on d8 and synthesized speech directly through tacotron_synthesize.py.
I started with trim_top_db=22 and synthesized audio A; then changed it to 25 and got audio B; then changed it back to 22 and got audio C, which is clearly different from A; then back to 25 and got audio D, which is also clearly different from B. I went back and forth like this several times, and the result was different every time.
What could be the reason for this?
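
A quick way to confirm that the outputs really differ under identical settings is to synthesize the same text twice without changing anything and diff the resulting wavs. This is only an illustrative check, assuming numpy and soundfile are installed; the file names are made up:

```python
import numpy as np
import soundfile as sf  # assumed available; any wav reader works

# Illustrative file names: two runs of tacotron_synthesize.py on the same
# text with identical settings (trim_top_db unchanged).
a, sr_a = sf.read("run1.wav")
b, sr_b = sf.read("run2.wav")

n = min(len(a), len(b))
print("length difference (samples):", abs(len(a) - len(b)))
print("max abs sample difference:", np.max(np.abs(a[:n] - b[:n])))
```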

@lturing
Owner

lturing commented Dec 15, 2020

When running tacotron_synthesize.py, the synthesized wav has nothing to do with trim_top_db.
The cause is the prenet in the Tacotron decoder: the prenet contains dropout, and dropout is not turned off in either the training or the inference stage (see the corresponding code), so the same input produces slightly different mel spectrograms each time.
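
For reference, here is a minimal PyTorch-style sketch of what such a decoder prenet typically looks like. It is an illustrative reimplementation, not the actual code of this repo; the layer sizes and dropout rate are assumptions:

```python
import torch
from torch import nn
import torch.nn.functional as F


class Prenet(nn.Module):
    """Illustrative Tacotron-style decoder prenet: fully connected layers,
    each followed by ReLU and a dropout that stays active at inference."""

    def __init__(self, in_dim=80, sizes=(256, 256), p=0.5):
        super().__init__()
        dims = [in_dim] + list(sizes)
        self.layers = nn.ModuleList(
            nn.Linear(d_in, d_out) for d_in, d_out in zip(dims[:-1], dims[1:])
        )
        self.p = p

    def forward(self, x):
        for linear in self.layers:
            # training=True is hard-coded, so a fresh dropout mask is drawn on
            # every forward pass, in training and in inference alike.
            x = F.dropout(F.relu(linear(x)), p=self.p, training=True)
        return x
```

Because the dropout mask is re-sampled on every call, feeding the decoder the same input twice yields slightly different prenet activations, and therefore slightly different mel spectrograms and wavs.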

@lturing lturing closed this as completed Dec 15, 2020
@yannier912
Author

@lturing
Oh, I see now. Thanks!~
I'll try setting dropout to 1 (i.e., turning it off) at inference and see how it sounds.
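
Sticking with the hypothetical Prenet sketch from the comment above, that experiment would amount to passing training=self.training instead of the hard-coded True, so that model.eval() disables dropout. A quick way to see the current behaviour:

```python
import torch

torch.manual_seed(0)
prenet = Prenet()            # the illustrative sketch above
prenet.eval()                # has no effect while training=True is hard-coded
x = torch.randn(1, 10, 80)   # dummy decoder input

y1 = prenet(x)
y2 = prenet(x)
# False today (fresh dropout masks each call); would become True if dropout
# used training=self.training and the module were put in eval() mode.
print(torch.allclose(y1, y2))
```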

@lturing
Owner

lturing commented Dec 15, 2020

Dropout can't be turned off at the inference stage either: dropout makes the prenet's output lossy, which forces the model to rely on the context from attention (giving good alignment). But you can give it a try.
If you also turn off the prenet's dropout during the training stage, the model will most likely fail to learn alignment.

@yannier912
Author

@lturing
Yeah, that's true... then dropout will have to stay on.
If the fine-tuned model performs well, the effect of dropout is not noticeable; if the model doesn't perform well, dropout has quite a noticeable impact.
Thanks for clearing that up!
