Effects on WaveNet predicted wavs #57
I think it's normal that the WaveNet loss is shaky. Log wavs sound better than eval wavs simply because the model makes predictions conditioned on ground truth during training, while eval wavs are synthesized sequentially, meaning the model is conditioned on its own previous outputs. Since the WaveNet is still at an early stage of training, conditioning on previously sampled outputs causes error accumulation and thus trash wavs.
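The gap described above between teacher-forced training and free-running evaluation can be shown with a toy autoregressive model (not the repo's code; the coefficients and noise scale below are made up for the sketch):

```python
import numpy as np

# Toy illustration: an AR(1) signal and a "model" with a slightly wrong
# coefficient. Teacher-forced prediction error stays bounded, while
# free-running (autoregressive) generation drifts -- which is why wavs
# under logs-WaveNet/wavs sound fine but eval wavs do not.
rng = np.random.default_rng(0)
true_coef, model_coef = 0.9, 0.85  # hypothetical values for the sketch

x = np.zeros(200)
x[0] = 1.0
for t in range(1, 200):
    x[t] = true_coef * x[t - 1] + rng.normal(scale=0.1)

# Teacher forcing: every prediction is conditioned on the ground truth.
tf_err = np.abs(model_coef * x[:-1] - x[1:]).mean()

# Free running: every prediction is conditioned on the previous prediction.
fr = np.zeros(200)
fr[0] = x[0]
for t in range(1, 200):
    fr[t] = model_coef * fr[t - 1]
fr_err = np.abs(fr[1:] - x[1:]).mean()
# fr_err comes out noticeably larger than tf_err: small per-step errors
# compound once the model feeds on its own outputs.
```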
Should the hyper parameter
@begeekmyfriend That plot in r9y9/wavenet_vocoder#79 was trained on real wavs, not GTA, and the GTA training wavplot for me is all a mess at 50K steps.
It seems the WaveNet model fails to converge within 36K steps on a 10h dataset. @Rayhane-mamah You have found ways to reduce the amount of training data needed for Tacotron. I still need to find similar ways to get WaveNet to converge on a small dataset...
I have used 34h
@begeekmyfriend What's the step and loss in your latest post?
@begeekmyfriend What mode do you use, raw or mu-law-quantize?
I am just using the raw mode and I have not looked into the code closely.
The same happened to my WaveNet training (tested up to step 250000). Note that I used the ground-truth mels to train the vocoder instead of the force-aligned mels synthesized via Tacotron. Under logs-WaveNet/plots, the target waveform is almost identical to the predicted waveform, which is also confirmed by listening to the audio clips under logs-WaveNet/wavs. Yet the results under logs-WaveNet/eval-dir/plots have not converged, though the envelope of the predicted signal does look similar to the target's. The predicted audio clips under logs-WaveNet/eval-dir/wavs sound like a total mess: pure noise.
@QLQL Do you use the raw mode?
@QLQL Please use r9y9's original repo.
In my case, I am training WaveNet on mel spectrograms from SpectrogramNet with GTA always on. I found that the mel spectrograms from SpectrogramNet are really bad when GTA is turned off for synthesis. This affects WaveNet's quality, too.
@a3626a STFT-generated ground truth, for both this repo and r9y9's, on a 10h dataset, and the evaluation results are quite different, as you can see in #57 (comment).
@azraelkuan Yes, I was using raw mode. But I will also test with r9y9's repo as suggested by @begeekmyfriend. BTW, @begeekmyfriend, did you manage to train the WaveNet vocoder part with the Rayhane-mamah repo? If so, how many steps did you use, and what was your batch size? I can only cope with a batch size of 2 instead of the default 4 due to an OOM problem.
@QLQL It failed to converge with Rayhane's WaveNet vocoder even at 360K steps. I trained it with batch size 3 on an 11GB GTX 1080 Ti.
Indeed, there is a serious problem in the incremental step (Tacotron-2/wavenet_vocoder/models/modules.py, lines 162 to 166 at e2f9780).
In the while_loop, when we call incremental_step, the queue is defined as None, so we don't get the correct convolution queue. Any way to solve this problem?
Hope somebody can raise some better solutions!
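A minimal numpy sketch of why the queue matters (this is illustrative, not the repo's TensorFlow code): in incremental inference, each dilated causal conv keeps a queue of past inputs, and the caller has to thread that queue through the generation loop, exactly the state that is lost when the loop recreates it as None each iteration.

```python
import numpy as np

def incremental_step(x_t, queue, weights):
    """One step of a causal dilated conv (kernel size 2). The queue is an
    explicit argument and an explicit return value: the caller must carry
    it between steps, analogous to state in a tf.while_loop."""
    past = queue[0]                                # this is x[t - dilation]
    new_queue = np.concatenate([queue[1:], [x_t]])  # shift in the new sample
    y_t = weights[0] * past + weights[1] * x_t
    return y_t, new_queue

# Incremental generation reproduces the full causal dilated convolution
# only because the queue is carried from step to step.
dilation, w = 2, (0.5, -0.25)  # hypothetical weights for the sketch
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
queue, ys = np.zeros(dilation), []
for x_t in x:
    y_t, queue = incremental_step(x_t, queue, w)
    ys.append(y_t)
# Reference: the same conv applied to the whole (zero-padded) sequence.
full = w[0] * np.concatenate([np.zeros(dilation), x[:-dilation]]) + w[1] * x
```

If the queue were reset to zeros (or None) on every call, `ys` would no longer match `full`, which is the symptom described above.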
@begeekmyfriend I waited three days with another ~200k steps, and there is no improvement in the loss, still between 6 and 7, as also shown in the following example under eval-dir/plots. I assume that may have something to do with the issue mentioned by @azraelkuan?
@QLQL I think you can predict a wav just using the
@azraelkuan Yes. As quoted in @Rayhane-mamah's reply earlier: the train mode is
In real synthesis applications (eval mode), we don't have ground-truth samples. We have to rely on previously predicted samples.
I have tested the value of
@QLQL
@a3626a Thank you very much for the nice suggestion! I didn't think about that earlier!
@a3626a I found a much better way to handle this problem: we can pass an input_buffer list into the while_loop and return the input_buffer from the incremental step. I tested this and it works well.
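The idea of carrying a buffer per layer through the loop can be sketched in plain Python (illustrative only; `synthesize` and the kernel-size-2 toy layers are made up for this sketch, not the repo's API):

```python
import numpy as np

def synthesize(x0, layers, n_steps):
    """Free-running generation where every layer's convolution queue is
    part of the loop state -- analogous to putting the input_buffer list
    into tf.while_loop's loop_vars and returning the updated buffers from
    incremental_step. `layers` is a list of (dilation, (w_past, w_now))
    pairs for toy kernel-size-2 causal convs."""
    queues = [np.zeros(d) for d, _ in layers]  # one queue per dilation layer
    x_t, out = x0, []
    for _ in range(n_steps):
        h, new_queues = x_t, []
        for (d, w), q in zip(layers, queues):
            past = q[0]
            new_queues.append(np.concatenate([q[1:], [h]]))
            h = w[0] * past + w[1] * h
        queues = new_queues   # carried to the next step, never reset to None
        x_t = h               # feed the prediction back in
        out.append(x_t)
    return np.array(out)
```

This also answers the question below: the buffer list holds one convolution queue for every dilation layer, and all of them are returned from each step.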
@azraelkuan Have you gotten acceptable results after using input_buffer? And does the input_buffer list contain the convolution_queue for every dilation layer? Thanks.
@JK1532
@azraelkuan When using tf.layers.Conv1D, are you able to get its kernel and bias without problems in synthesis mode?
Ah yes, this is very clever @azraelkuan! Really well thought out! I will try it out for train+eval and synthesis time; if everything goes well and the results are also correct, I will most probably switch to this approach. If you get samples in the meantime using your code, feel free to share and suggest a PR ;)
@Rayhane-mamah Yeah, I have finished my exams, so I will fix the bugs in my code in the coming days. :)
@Rayhane-mamah Hi, I'm training the WaveNet with your revised code, feeding raw audio input, and the eval results don't seem good at 20k steps. I wonder if more training steps will help? Meanwhile, I have trained ibab's WaveNet with local conditioning, mu-law-quantized audio, and mel-spec input, and it can generate correct audio after a day's training. I'm not satisfied with the acoustic quality, so I'm trying your code with raw input and a bigger net. I'm confused about the eval results: is the revised code still not right, or do I just need to train more steps? Thanks!
@HyperGD1994 Yes, there is a problem in the eval code. I am trying to fix the bugs, but like @a3626a, I still cannot get good results.
@azraelkuan Have you tried ibab's generate method? Would that be simpler?
I found it gives good result wavs for the training samples but gets stuck when running synthesis with a pretrained model. Not sure whether that's related to the problem you found in synthesizer.py in wavenet_vocoder.
@azraelkuan @butterl Can you open your fork to everyone interested in it?
@azraelkuan Wow, that's wonderful! May I ask how many steps you trained to get this? Do you input only the mel spec, or the audio file too?
@HyperGD1994 This result uses test inputs, only 1500 steps. I am testing the real evaluation step.
@begeekmyfriend @HyperGD1994 I used the head of the repo, just adapted for the THCHS30 dataset. The waveplot is from the eval during WaveNet training; Tacotron was trained for 100k steps and WaveNet for 160K. @azraelkuan Any suggestion on the real evaluation step modification? I can only see it stuck at tqdm 0%.
@butterl I have the same problem as you. Have you solved it?
@azraelkuan @begeekmyfriend @HyperGD1994 I also encountered the same problem. Do you have a final solution? Looking forward to your help.
Has anyone found a solution? |
@UESTCgan What's your problem, tqdm 0%? I do not use the whole code, just part of it, so I did not encounter this problem, but I think this bug will not be difficult to fix; you can try to debug it. As for the true evaluation problem, has @azraelkuan fixed it? The raw input is very hard to train, and the mu-law input seems to have a lot of bugs. However, I modified ibab's code with local conditioning and a bigger net, and I finally got wonderful results with mu-law input. I suggest you guys try that.
Indeed, the answer has been given in https://github.com/Rayhane-mamah/Tacotron-2/files/2145382/models.tar.gz. Training a MoL model needs much time, about two weeks, so I haven't tested that, but in my tests mu-law works well.
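For anyone unsure what the mu-law mode does: it compands the waveform before quantizing, so the 256 classes are spent mostly on small amplitudes, which is what makes it easier to train than raw. A minimal standalone sketch (standard mu-law formulas, not the repo's exact implementation):

```python
import numpy as np

def mulaw_encode(x, mu=255):
    """Compand audio in [-1, 1] with the mu-law curve, then quantize to
    integer classes in [0, mu]."""
    y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    return ((y + 1) / 2 * mu + 0.5).astype(np.int64)

def mulaw_decode(q, mu=255):
    """Invert the quantization and the companding."""
    y = 2 * (q.astype(np.float64) / mu) - 1
    return np.sign(y) * (np.power(1.0 + mu, np.abs(y)) - 1) / mu
```

Round-tripping a signal through encode/decode loses only quantization noise, and the error is smallest near zero amplitude, where most speech samples live.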
@azraelkuan Thank you for your answer. |
@HyperGD1994 Thanks for your help! On my current 20h Chinese dataset, I have trained for 240K steps. The predicted wavs produced during training sound good, but I only get noise when synthesizing. I will try the latest code next.
@azraelkuan Did you test on the LJSpeech dataset? In my case, using 'mulaw', it produces bad wave files with noise.
I made a mistake. I'm testing with the parameters in the images above. @azraelkuan Thank you for your advice.
Thank you all for your valuable contributions, this issue has been fixed with latest commit. If any further problems appear, feel free to open new issues :) |
It seems the decrease of loss during WaveNet training is unsteady. Is this all right, or should I wait more steps? The predicted wavs under logs-WaveNet/wavs sound OK, but the ones under logs-WaveNet/eval-dir/wavs sound like a mess...