
severe metallic sound #3

Open
GuangChen2016 opened this issue May 18, 2022 · 9 comments

Comments

GuangChen2016 commented May 18, 2022

Hi, thanks for your nice work. I used your code on my own dataset, and the synthesized voices still don't sound normal at 160K steps. Although we can still make out what is being said, the spectrum is abnormal (especially the high-frequency part, as you can see from the figure below) and there is a severe metallic sound. I have double-checked the feature extraction process and the training process, and both look normal. Do you know what might cause this? BTW, how many steps are required to train the LJSpeech model?
[attached image: spectrogram of the synthesized speech]

Thanks again.

@keonlee9420 (Owner)

Hi @GuangChen2016, thanks for your interest. Yes, I hit the same issue and I'm still figuring out what's going on. My guess is that it comes from the feature-matching loss, since it drops to an extremely low value as training goes on, which is not normal under GAN training. Of course, some configuration or architecture choices may cause the issue as well. I'll update the project once I can fix it, but it would be great if you could contribute from your side too! Thanks.
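
For reference, a feature-matching loss in GAN vocoder training is usually the L1 distance between the discriminator's intermediate feature maps for real and generated audio (as in HiFi-GAN). A minimal sketch, with illustrative names rather than the exact code in this repo:

```python
import torch

def feature_matching_loss(real_fmaps, fake_fmaps):
    """Sum of L1 distances between discriminator feature maps.

    real_fmaps / fake_fmaps: one list per sub-discriminator, each containing
    the intermediate feature tensors for real and generated audio.
    """
    loss = 0.0
    for disc_real, disc_fake in zip(real_fmaps, fake_fmaps):
        for feat_real, feat_fake in zip(disc_real, disc_fake):
            loss = loss + torch.mean(torch.abs(feat_real - feat_fake))
    return loss
```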

BridgetteSong commented Jun 9, 2022

@keonlee9420 In your code, the GAN's input is always the predicted mels, which I think causes unstable training. My suggestion: in the initial training stage, feed the ground-truth mels to the vocoder (while still computing the loss between mel_true and mel_predict), and only switch to the predicted mels after some steps. Maybe this can help.
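
A minimal sketch of that schedule, assuming a joint training loop where mel_predict comes from the acoustic model and vocoder is the GAN generator (names and the step threshold are illustrative, not from this repo):

```python
import torch.nn.functional as F

GT_MEL_WARMUP_STEPS = 50_000  # illustrative threshold; tune per dataset

def vocoder_forward(step, mel_true, mel_predict, vocoder):
    # The acoustic model is always supervised against the ground-truth mel.
    mel_loss = F.l1_loss(mel_predict, mel_true)

    # Warm-up: feed ground-truth mels to the GAN vocoder at first,
    # then switch to the predicted mels once the acoustic model is stable.
    vocoder_input = mel_true if step < GT_MEL_WARMUP_STEPS else mel_predict
    wav_fake = vocoder(vocoder_input)
    return mel_loss, wav_fake
```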

mayfool commented Jul 6, 2022

I tried training only the generator for the first 50k steps, but it didn't work. I will try training the way you suggested. Hopefully that solves the problem.

BridgetteSong commented Jul 6, 2022

In my recent experiments, I found that what I suggested above does not solve the problem. During training, the vocoder's mel_loss stays very large; I think using the acoustic model's outputs as the vocoder's inputs increases the difficulty of training. So I have now added a normalizing flow, the same as in VITS, and I get amazing results.
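
For context, the VITS flow is a stack of residual coupling layers between the acoustic model's latent and the vocoder side. A heavily simplified sketch of one mean-only coupling layer (VITS uses a WaveNet-style module inside each layer; all names here are illustrative, not actual code from this thread):

```python
import torch
import torch.nn as nn

class MeanOnlyCoupling(nn.Module):
    """One mean-only affine coupling layer, in the spirit of VITS's
    residual coupling block (heavily simplified)."""

    def __init__(self, channels, hidden):
        super().__init__()
        self.half = channels // 2
        # Stand-in for the WaveNet-style module VITS uses here.
        self.net = nn.Sequential(
            nn.Conv1d(self.half, hidden, 5, padding=2),
            nn.ReLU(),
            nn.Conv1d(hidden, self.half, 5, padding=2),
        )

    def forward(self, x, reverse=False):
        # x: (batch, channels, time); transform one half conditioned on the other.
        xa, xb = x.split(self.half, dim=1)
        shift = self.net(xa)
        xb = xb - shift if reverse else xb + shift  # invertible, volume-preserving
        return torch.cat([xa, xb], dim=1)
```

In VITS, several such layers are stacked with channel flips in between so that both halves get transformed.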

mayfool commented Aug 26, 2022

@BridgetteSong Did you only add the normalizing flow like a postnet, or did you also add a posterior encoder as in VITS?

@skyler14

@BridgetteSong Can you post your checkpoint so we can see what these amazing results look like?

@keonlee9420 (Owner)

Hey guys, thank you all for your great efforts and discussion.

I've been working on that issue and finally made it work! Currently, I'm building a new general-purpose open-source TTS project that is much improved and much easier to use, and I will share it soon, including what @BridgetteSong suggested. Please stay tuned!

@skyler14

@keonlee9420 I was wondering if you had any general advice for RAD-TTS. It seems you implemented it in your codebase, but getting it to work, even with well-trained models, has been a daunting task.

@15755841658

@keonlee9420 How can this problem be solved? I'm running into the same synthesis results.
