Cannot reproduce as good audios as the demo shows #45
Comments
Just to make sure, this should be 256 in your case, right? I'm wondering whether exponential parameter averaging matters. It may improve the quality of the final model, but it takes more training time to converge than when it is disabled. Can you try with the following parameters?
Also, if you want to reproduce my experimental settings, try:
@r9y9, thank you very much! Yes, I did use out_channels=10*3; sorry for leaving that out.
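For readers unfamiliar with the exponential parameter averaging mentioned above, here is a minimal generic sketch. It is not the repository's implementation: the function names, the single scalar parameter, and the decay value are all assumptions made for illustration. It shows why the averaged weights lag behind the raw weights, which is one way to picture the slower convergence.

```python
def ema_update(shadow, params, decay):
    """Blend the current parameter values into their running averages."""
    return {name: decay * shadow[name] + (1.0 - decay) * value
            for name, value in params.items()}


def averaged_after(steps, decay=0.99):
    """Track a toy parameter that jumps from 0.0 to 1.0 and stays there.

    Returns the averaged value after `steps` updates (hypothetical setup).
    """
    params = {"w": 0.0}
    shadow = dict(params)      # averaged copy starts at the initial weights
    for _ in range(steps):
        params["w"] = 1.0      # stand-in for an optimizer update
        shadow = ema_update(shadow, params, decay)
    return shadow["w"]


if __name__ == "__main__":
    for steps in (10, 100, 1000):
        print(steps, round(averaged_after(steps), 4))
```

With a decay close to 1, the averaged copy needs many steps to catch up with the raw parameter, so a checkpoint evaluated with averaged weights can look undertrained early on.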
Ah, you need another important change:
Yes, that's right! I guess I should go through all of your code first. Thank you very much!
@r9y9 Apologies for having so many questions. I used your code, your hparams settings, and the checkpoints to synthesise LJSpeech wavs. However, the quality is again not as good as your demo shows. Attached are some generated wavs. Eager to hear your suggestions.
You might want to try fine-tuning your model. As I mentioned at #1 (comment), I fine-tuned the model many times, i.e., train 200k steps -> (change some hyperparameters and see how it works) -> train 200k steps (lr starts from its initial value) -> ... repeated. This might lead to faster convergence. As far as I remember, I trained the model for over 1000k steps.
I retrained. The performance is relatively good now. Thank you!
@r9y9 So if you stop and then start again from a checkpoint, the learning rate resets to its initial schedule?
Yes, exactly.
Ah okay, that explains why the loss can jump up and then settle back down again. Thanks!
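The restart behaviour described above can be pictured with a small sketch. The schedule below (exponential decay, halving every 100k steps) is purely hypothetical and is not taken from the repository; the only point is that when the step counter fed to the schedule restarts at zero on resume, the learning rate jumps back to its initial value.

```python
def learning_rate(step, initial_lr=1e-3, decay_rate=0.5, decay_steps=100_000):
    """Hypothetical exponential-decay schedule (an assumption, not the repo's)."""
    return initial_lr * decay_rate ** (step / decay_steps)


def lr_with_restarts(global_step, run_length=200_000):
    """Learning rate when the schedule's step counter restarts every run."""
    local_step = global_step % run_length   # counter resets on each resume
    return learning_rate(local_step)


if __name__ == "__main__":
    for step in (0, 100_000, 199_999, 200_000, 300_000):
        print(step, f"{lr_with_restarts(step):.2e}")
```

At step 199,999 the rate has decayed to roughly a quarter of its initial value; at step 200,000 (the first step of the resumed run) it is back at the initial value, which is consistent with the loss briefly jumping up after a resume.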
@r9y9 Can you give an example of what changes you made, i.e., what kind of lr made good progress?
I was trying to find a good value for Line 51 in 1813ea8
I'm closing this. Feel free to reopen if you still see the problem.
Hi @r9y9, thanks for sharing this great repo. I have run your code on the CMU ARCTIC clb data to train a single-speaker WaveNet. I think I have set all the model hparams to the same values you show on your demo website, but the test audio generated from the 170k-step checkpoint still contains a lot of noise. Below are the hparams; I have left the rest of the code untouched.
Hope you can help me, thanks.
input_type="mulaw",
quantize_channels=256, # 65536 or 256
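As background for the two settings quoted above, here is a minimal sketch of mu-law companding and quantization to 256 levels. It is a generic illustration of the standard mu-law formula, not the repository's preprocessing code; the function names are made up for this example.

```python
import math


def mulaw_encode(x, mu=255):
    """Compress a sample in [-1, 1] with the mu-law curve; output is in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mulaw_decode(y, mu=255):
    """Invert mulaw_encode."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


def mulaw_quantize(x, quantize_channels=256):
    """Map a sample in [-1, 1] to an integer class in [0, quantize_channels - 1]."""
    mu = quantize_channels - 1
    y = mulaw_encode(x, mu)
    return int((y + 1) / 2 * mu + 0.5)


if __name__ == "__main__":
    for sample in (-1.0, -0.1, 0.0, 0.1, 1.0):
        print(sample, mulaw_quantize(sample))
```

Companding spends more of the 256 levels on small amplitudes, which is why 8-bit mu-law audio sounds better than 8-bit linear audio, though some quantization noise remains.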