Exponents calculation in positional encoding #12
@enhuiz thanks for another piece of feedback. I agree with you that it should be
Hi @ivanvovk, I have tried to fix the positional encoding and retrain the model. Although the grad_norm gets lower, the test results seem much worse (judging from both the test loss curve and the generated samples). I guessed there could be some other issues, so I checked the other parts of the code and found a mismatch between the implementation and the paper (formula 11) here: WaveGrad/model/diffusion_process.py Line 103 in d230621
A sqrt() seems to be missing here. I'll fix it and try again.
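For reference, formula (11) diffuses the clean waveform using the *square roots* of the cumulative products. A minimal sketch of the corrected term follows; the function and argument names here are mine for illustration, not the repo's actual identifiers:

```python
import numpy as np

def q_sample(y_0, eps, alpha_bar):
    """Diffuse y_0 to noise level alpha_bar, as in WaveGrad Eq. (11):
        y_n = sqrt(alpha_bar_n) * y_0 + sqrt(1 - alpha_bar_n) * eps
    The bug discussed above: the noise term was scaled by
    (1 - alpha_bar) without the square root."""
    return np.sqrt(alpha_bar) * y_0 + np.sqrt(1.0 - alpha_bar) * eps
```

Both coefficients get the sqrt, so the marginal stays variance-preserving for unit-variance y_0 and eps.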
@enhuiz yeah, I fixed the PE and got the same problems. And yeah, you're right, sqrt() is missing here, need to fix that as well.
@enhuiz seems like for me the sqrt() update solves the problem, and now the test samples look good. How does it work for you?
The loss curve looks better than the previous one: l1_spec_test_batch_loss and the total loss are lower, while l1_test_batch_loss is higher, which is acceptable since it is measured on the audio waveform. Training grad norm and total loss are both lower. I think I'm still at an early stage of training. I use niters=1000 to train and niters=50 to test; the audio quality of the fixed version does not seem significantly better than the previous one.
WaveGrad/model/diffusion_process.py Line 116 in d230621
I find that changing this t to t+1 helps remove the noise in the generated samples after fixing the PE and the sqrt(); you may check the following samples. I guess we need the current sqrt cumprod here instead of the previous one.
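A toy illustration of the off-by-one described above; the variable names and the array layout are assumptions for illustration, not the repo's actual code. If the schedule array is prepended with 1.0 (the clean-signal level), then the noise level reached after step t lives at index t + 1:

```python
import numpy as np

# Hypothetical schedule: prepend 1.0 so index 0 means "no noise applied yet".
betas = np.linspace(1e-4, 5e-2, 10)
sqrt_alpha_cumprod = np.concatenate([[1.0], np.sqrt(np.cumprod(1.0 - betas))])

def noise_level(t, fixed=True):
    # fixed=True  -> the *current* level, sqrt_alpha_cumprod[t + 1]
    # fixed=False -> the buggy *previous* level, sqrt_alpha_cumprod[t]
    return sqrt_alpha_cumprod[t + 1] if fixed else sqrt_alpha_cumprod[t]
```

With the buggy indexing, step 0 conditions the network on a noise level of exactly 1.0 (i.e. "clean signal"), which mismatches the noise actually present in the sample.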
@enhuiz yes, I agree! For me this change improved the quality even more! Somehow I missed it when I made the implementation... Thanks for revealing all these bugs, man, I really appreciate it. Now it seems to work fine.
@ivanvovk Good to know, you're welcome!
Closing the issue since it is solved.
WaveGrad/model/linear_modulation.py
Lines 21 to 24 in f59d4bd
At line 22, the exponents are calculated as:
`exponents = exponents ** 1e-4`
instead of
`exponents = 1e-4 ** exponents`
from the original transformer paper. This makes the values very close to each other at different dimensions. I plotted an example using `exponents = exponents ** 1e-4`, with noise level `linspace(0, 1, 250)` and `n_channels=512`. After changing to `exponents = 1e-4 ** exponents`, the positional encoding looks fine. The strange thing is that even with the current positional encoding, the model seems to train well. I tried training on LibriTTS, and the generated speech sounds okay to me. I'll switch to the latter and see whether there is an improvement.
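The numerical effect of the swapped operands is easy to check directly. A minimal sketch, with an assumed exponent layout (values in [0, 1)) and a smaller n_channels than the 512 used in the plots above:

```python
import numpy as np

n_channels = 8  # the issue uses n_channels=512; smaller here for brevity
exponents = np.arange(n_channels) / n_channels  # assumed layout: values in [0, 1)

wrong = exponents ** 1e-4  # x ** 1e-4 is ~1 for any x > 0, so nearly every
                           # dimension ends up with almost the same frequency
right = 1e-4 ** exponents  # geometric decay from 1 down toward 1e-4, matching
                           # the transformer's 10000 ** (-2i/d) style scaling
```

With the corrected form the channel frequencies span several orders of magnitude, which is what gives the encoding its resolution over different noise levels; with the buggy form the channels are nearly indistinguishable.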