fix in implementation of S-DTW backward #15

Open
taras-sereda wants to merge 2 commits into main

Conversation

taras-sereda

Hey, I've found that in your implementation of the S-DTW backward pass, the E matrices are not used; instead you are using the G matrices, and their entries ignore the scaling factors `a, b, c`.
What's the reason for this?
My guess is that you are doing this in order to preserve and propagate gradients, because they vanish due to the small values of `a, b, c`. But I might be wrong, so I'd be glad to hear your motivation for doing this.
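For reference (using the notation of Algorithm 2 in the original Soft-DTW paper, not this repo's variable names), the scaling factors and the exact E recursion are:

$$
a = e^{(R_{i+1,j} - R_{i,j} - \delta_{i+1,j})/\gamma}, \qquad
b = e^{(R_{i,j+1} - R_{i,j} - \delta_{i,j+1})/\gamma}, \qquad
c = e^{(R_{i+1,j+1} - R_{i,j} - \delta_{i+1,j+1})/\gamma}
$$

$$
E_{i,j} = a\,E_{i+1,j} + b\,E_{i,j+1} + c\,E_{i+1,j+1}
$$

Since each exponent is non-positive, $a, b, c \in (0, 1]$, so long products of them can indeed shrink the propagated gradients.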

Playing with your code, I also found that gradients are vanishing, especially when `bandwidth=None`.
I'm solving this problem by normalizing the distance matrix by `n_mel_channel`. With this normalization and the exact implementation of the S-DTW backward pass, I'm able to converge on overfit experiments quicker than with the non-exact computation.
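Concretely, the normalization I mean looks roughly like this (a sketch using a plain squared-L2 `torch.cdist` distance; the actual distance function used in the repo may differ):

```
import torch

def normalized_distance_matrix(pred, target):
    # pred, target: (batch, time, n_mel_channels)
    n_mel_channels = pred.size(-1)
    # pairwise squared-L2 distances, shape (batch, time_pred, time_target)
    dist = torch.cdist(pred, target, p=2) ** 2
    # divide by the number of mel channels so the entries (and the gradients
    # flowing through them) do not scale with the feature dimension
    return dist / n_mel_channels
```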
I'm using these S-DTW hparams:
```
gamma = 0.05
warp = 256
bandwidth = 50
```


Here is a small test I'm using for checks:
```
import numpy as np
import torch
from torch.optim import Adam

# fit a randomly initialized "prediction" directly to one target mel-spectrogram
target_spectro = np.load('')  # path to a saved mel-spectrogram (omitted)
target_spectro = torch.from_numpy(target_spectro)
target_spectro = target_spectro.unsqueeze(0).cuda()
pred_spectro = torch.randn_like(target_spectro, requires_grad=True)

optimizer = Adam([pred_spectro])

# the prediction fits the target in ~3k iterations
n_iter = 4_000
for i in range(n_iter):
    # self.numba_soft_dtw is the repo's Soft-DTW loss
    loss = self.numba_soft_dtw(pred_spectro, target_spectro)
    # normalize the loss by the size of dim 1 of the spectrogram
    loss = loss / pred_spectro.size(1)
    loss.backward()

    if i % 1_000 == 0:
        print(f'iter: {i}, loss: {loss.item():.6f}')
        print(f'd_loss_pred {pred_spectro.grad.mean()}')

    optimizer.step()
    optimizer.zero_grad()
```

Curious to hear how your training is going!
Best,
Taras
@keonlee9420 (Owner)

Hi @taras-sereda, thank you very much for your effort! I think what you claimed seems worth considering, and I'm training the model with your update, but unfortunately it shows no evidence of convergence so far (it has been training for about 9 hours).
The reason for G comes from the derivation of the backward pass following the original Soft-DTW paper (please refer to Algorithm 2), applied to the version introduced in the Parallel Tacotron 2 paper (please refer to Section 4.2). G is already expected to utilize the calculated E, where each coefficient a, b, c is involved. But it was a while ago, so let me double-check.
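For reference, here is a minimal NumPy sketch of the exact backward recursion from Algorithm 2 of the Soft-DTW paper being discussed. Variable names follow the paper; the shapes of D and R and the padding conventions are assumptions about the forward pass, not this repo's code:

```
import numpy as np

def soft_dtw_backward(D, R, gamma):
    """Exact Soft-DTW backward (Algorithm 2, Cuturi & Blondel 2017).

    D: (m, n) pairwise distance matrix.
    R: (m + 2, n + 2) padded DP table from the forward pass.
    Returns E, the gradient of the soft-DTW value w.r.t. D.
    """
    m, n = D.shape
    D_pad = np.zeros((m + 2, n + 2))
    D_pad[1:m + 1, 1:n + 1] = D

    R = R.copy()
    R[:, n + 1] = -np.inf
    R[m + 1, :] = -np.inf
    R[m + 1, n + 1] = R[m, n]

    E = np.zeros((m + 2, n + 2))
    E[m + 1, n + 1] = 1.0

    # reverse sweep: each step rescales the incoming E entries by the
    # coefficients a, b, c, each of which lies in (0, 1]
    for j in range(n, 0, -1):
        for i in range(m, 0, -1):
            a = np.exp((R[i + 1, j] - R[i, j] - D_pad[i + 1, j]) / gamma)
            b = np.exp((R[i, j + 1] - R[i, j] - D_pad[i, j + 1]) / gamma)
            c = np.exp((R[i + 1, j + 1] - R[i, j] - D_pad[i + 1, j + 1]) / gamma)
            E[i, j] = a * E[i + 1, j] + b * E[i, j + 1] + c * E[i + 1, j + 1]

    return E[1:m + 1, 1:n + 1]
```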
