
Exponents calculation in positional encoding #12

Closed
enhuiz opened this issue Nov 12, 2020 · 9 comments


enhuiz commented Nov 12, 2020

exponents = torch.arange(half_dim, dtype=torch.float32).to(noise_level) / float(half_dim)
exponents = exponents ** 1e-4
exponents = LINEAR_SCALE * noise_level.unsqueeze(1) * exponents.unsqueeze(0)
return torch.cat([exponents.sin(), exponents.cos()], dim=-1)

At line 22, the exponents are calculated as exponents = exponents ** 1e-4 instead of exponents = 1e-4 ** exponents, as in the original Transformer paper.

This makes the values at different dimensions very close to each other. I plotted an example using exponents = exponents ** 1e-4, with noise level linspace(0, 1, 250) and n_channels=512.

[Figure_1: positional encoding computed with exponents = exponents ** 1e-4]

After changing to exponents = 1e-4 ** exponents, the positional encoding looks fine:

[Figure_2: positional encoding computed with exponents = 1e-4 ** exponents]

The strange thing is that even with the current positional encoding, the model still seems to train well. I tried training on LibriTTS, and the generated speech sounds okay to me. I'll switch to the latter and see whether there is an improvement.
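For reference, here is a minimal sketch of the corrected encoding in PyTorch; the function name and the LINEAR_SCALE value below are placeholders, not necessarily the repository's exact code:

import torch

LINEAR_SCALE = 5000  # placeholder value; the repository defines its own constant

def encode_noise_level(noise_level: torch.Tensor, n_channels: int = 512) -> torch.Tensor:
    # Sinusoidal encoding of a batch of continuous noise levels in [0, 1].
    half_dim = n_channels // 2
    exponents = torch.arange(half_dim, dtype=torch.float32, device=noise_level.device) / half_dim
    exponents = 1e-4 ** exponents  # corrected: base ** exponent, as in the Transformer paper
    args = LINEAR_SCALE * noise_level.unsqueeze(1) * exponents.unsqueeze(0)
    return torch.cat([args.sin(), args.cos()], dim=-1)

encoding = encode_noise_level(torch.linspace(0, 1, 250), n_channels=512)  # shape (250, 512), as in the plots above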

ivanvovk (Owner) commented:

@enhuiz thanks for another piece of feedback. I agree with you that it should be exponents = 1e-4 ** exponents! However, I think the model can still interpret it to some extent, since we concatenate sin and cos, which gives a more or less distinct (local) encoding for noise levels in the range [0, 1]. But obviously, what happens currently is not good. Please report on your experiment with the second approach! I will also try.

ivanvovk added the bug and update labels on Nov 12, 2020
enhuiz (Author) commented Nov 13, 2020

Hi @ivanvovk, I have tried fixing the positional encoding and retraining the model. Although the grad_norm gets lower, the test results look much worse (judging from both the test loss curve and the generated samples).

[Image: test loss curves after the PE fix]

I guess there could be some other issues, so I checked the rest of the code and found a mismatch between the implementation and the paper (formula 11).

outputs = continuous_sqrt_alpha_cumprod * y_0 + (1 - continuous_sqrt_alpha_cumprod**2) * eps

A sqrt() seems to be missing here. I'll fix it and try again.
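For reference, a minimal sketch of the corrected corruption step (formula 11), assuming the same variable names as in the line above; the helper name q_sample is hypothetical:

import torch

def q_sample(y_0: torch.Tensor, continuous_sqrt_alpha_cumprod: torch.Tensor,
             eps: torch.Tensor) -> torch.Tensor:
    # Forward diffusion q(y_t | y_0): the noise term should be scaled by
    # sqrt(1 - alpha_bar), not by (1 - alpha_bar) directly.
    return (continuous_sqrt_alpha_cumprod * y_0
            + torch.sqrt(1.0 - continuous_sqrt_alpha_cumprod ** 2) * eps)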

ivanvovk (Owner) commented:

@enhuiz yeah, I fixed the PE and got the same problems. And yeah, you're right, sqrt() is missing here; we need to fix that as well.

ivanvovk (Owner) commented Nov 13, 2020

@enhuiz it seems like the sqrt() update solves the problem for me, and the test samples now look good. How does it work for you?

enhuiz (Author) commented Nov 14, 2020

[Image: loss curves after the sqrt() fix]

The loss curves look better than the previous ones: l1_spec_test_batch_loss and the total loss are lower, while l1_test_batch_loss is higher, which is acceptable since it is measured on the raw waveform. Training grad norm and total loss are both lower.

I think I'm still in the early stages of training. I use niters=1000 for training and niters=50 for testing; the audio quality of the fixed version does not seem significantly better than the previous one.

samples-at-12k.zip

enhuiz (Author) commented Nov 14, 2020

noise_level = torch.FloatTensor([self.sqrt_alphas_cumprod_prev[t]]).repeat(batch_size, 1).to(mels)

I find that changing this t to t+1 helps remove the noise in the generated samples after fixing the PE and the sqrt(); you may check the following samples:

samples-at-12k.zip

I guess we need the current sqrt cumprod here instead of the previous one.
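As an illustrative sketch (not the repository's exact code), assuming sqrt_alphas_cumprod_prev stores the square roots of the cumulative products with an extra leading 1.0, so that index t is the previous level and index t + 1 is the current one:

import torch

# Dummy schedule standing in for the repository's sqrt_alphas_cumprod_prev.
alphas = torch.linspace(0.99, 0.9, steps=50)
sqrt_alphas_cumprod_prev = torch.cat([torch.ones(1), torch.cumprod(alphas, dim=0).sqrt()])

batch_size, t = 4, 10
# Before: conditions the model on the previous cumulative product.
noise_level_before = sqrt_alphas_cumprod_prev[t].view(1, 1).repeat(batch_size, 1)
# Proposed fix: condition on the current cumulative product instead.
noise_level_fixed = sqrt_alphas_cumprod_prev[t + 1].view(1, 1).repeat(batch_size, 1)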

ivanvovk (Owner) commented:

@enhuiz yes, I agree! For me this change improved the quality even more! Somehow I missed it when writing the implementation... Thanks for uncovering all these bugs, man, I really appreciate it. Now it seems to work fine.

enhuiz (Author) commented Nov 14, 2020

@ivanvovk Good to know, you're welcome!

ivanvovk (Owner) commented:

Closing the issue since it is solved.
