Exponents calculation in positional encoding #12
@enhuiz thanks for another piece of feedback. I agree with you that it should be
Hi @ivanvovk, I have tried to fix the positional encoding and retrain the model. Although the grad_norm gets lower, the test results seem much worse (judging from both the test loss curve and the generated samples). I guessed there could be some other issues, so I checked the other parts of the code and found a mismatch between the implementation and the paper (formula 11) here: WaveGrad/model/diffusion_process.py Line 103 in d230621
A sqrt() seems to be missing here. I'll fix it and try again.
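For reference, formula (11) diffuses the clean waveform using the *square roots* of the cumulative products. A minimal sketch of the corrected term follows; the function and argument names here are mine for illustration, not the repo's actual identifiers:

```python
import numpy as np

def q_sample(y_0, eps, alpha_bar):
    """Diffuse y_0 to noise level alpha_bar, as in WaveGrad Eq. (11):
        y_n = sqrt(alpha_bar_n) * y_0 + sqrt(1 - alpha_bar_n) * eps
    The bug discussed above: the noise term was scaled by
    (1 - alpha_bar) without the square root."""
    return np.sqrt(alpha_bar) * y_0 + np.sqrt(1.0 - alpha_bar) * eps
```

Both coefficients get the sqrt, so the marginal stays variance-preserving for unit-variance y_0 and eps.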
@enhuiz yeah, I fixed the PE and got the same problems. And yeah, you're right, sqrt() is missing here, need to fix that as well.
@enhuiz seems like for me the sqrt() update solves the problem, and now the test samples look good. How does it work for you?
The loss curve looks better than the previous one: l1_spec_test_batch_loss and the total loss are lower, while l1_test_batch_loss is higher, which is acceptable since it is measured on the audio waveform. Training grad norm and total loss are both lower. I think I'm still at an early stage of training. I use niters=1000 to train and niters=50 to test; the audio quality of the fixed version does not seem significantly better than the previous one.
WaveGrad/model/diffusion_process.py Line 116 in d230621
I find that changing this t to t+1 helps remove the noise in the generated samples after fixing the PE and the sqrt(); you may check the following samples. I guess we need the current sqrt cumprod here instead of the previous one.
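A toy illustration of the off-by-one described above; the variable names and the array layout are assumptions for illustration, not the repo's actual code. If the schedule array is prepended with 1.0 (the clean-signal level), then the noise level reached after step t lives at index t + 1:

```python
import numpy as np

# Hypothetical schedule: prepend 1.0 so index 0 means "no noise applied yet".
betas = np.linspace(1e-4, 5e-2, 10)
sqrt_alpha_cumprod = np.concatenate([[1.0], np.sqrt(np.cumprod(1.0 - betas))])

def noise_level(t, fixed=True):
    # fixed=True  -> the *current* level, sqrt_alpha_cumprod[t + 1]
    # fixed=False -> the buggy *previous* level, sqrt_alpha_cumprod[t]
    return sqrt_alpha_cumprod[t + 1] if fixed else sqrt_alpha_cumprod[t]
```

With the buggy indexing, step 0 conditions the network on a noise level of exactly 1.0 (i.e. "clean signal"), which mismatches the noise actually present in the sample.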
@enhuiz yes, I agree! For me this change improved the quality even more! Somehow I missed it when I made the implementation... Thanks for revealing all these bugs, man, I really appreciate it. Now it seems to work fine.
@ivanvovk Good to know, you're welcome!
Closing the issue since it is solved.
WaveGrad/model/linear_modulation.py
Lines 21 to 24 in f59d4bd
At line 22, the exponents are calculated as:
`exponents = exponents ** 1e-4`
instead of
`exponents = 1e-4 ** exponents`
from the original transformer paper. This makes the values very close to each other at different dimensions. I plotted an example using `exponents = exponents ** 1e-4`, with noise level `linspace(0, 1, 250)` and `n_channels=512`. After changing to `exponents = 1e-4 ** exponents`, the positional encoding looks fine. The strange thing is that even with the current positional encoding, the model seems to train well. I tried training on LibriTTS, and the generated speech sounds okay to me. I'll switch to the latter and see whether there is an improvement.
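The numerical effect of the swapped operands is easy to check directly. A minimal sketch, with an assumed exponent layout (values in [0, 1)) and a smaller n_channels than the 512 used in the plots above:

```python
import numpy as np

n_channels = 8  # the issue uses n_channels=512; smaller here for brevity
exponents = np.arange(n_channels) / n_channels  # assumed layout: values in [0, 1)

wrong = exponents ** 1e-4  # x ** 1e-4 is ~1 for any x > 0, so nearly every
                           # dimension ends up with almost the same frequency
right = 1e-4 ** exponents  # geometric decay from 1 down toward 1e-4, matching
                           # the transformer's 10000 ** (-2i/d) style scaling
```

With the corrected form the channel frequencies span several orders of magnitude, which is what gives the encoding its resolution over different noise levels; with the buggy form the channels are nearly indistinguishable.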