Zero grad in residual vq #25

Open
npuichigo opened this issue Dec 26, 2022 · 8 comments
Labels
bug Something isn't working

Comments

@npuichigo

🐛 Bug Report

Zero grad in second residual vq as mentioned here (lucidrains/vector-quantize-pytorch#33)

residual = residual - quantized

The fix link is lucidrains/vector-quantize-pytorch@ecf2f7c
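For context, here is a minimal standalone sketch of the problem with made-up tensors (not the actual encodec code): after the straight-through trick, subtracting the quantized output cancels the gradient path back to the encoder output, while subtracting its detached version preserves it.

import torch

x = torch.randn(1, 5, requires_grad=True)   # stand-in for the encoder output
code = torch.randn(1, 5)                    # stand-in for the nearest codebook entry
quantized = x + (code - x).detach()         # straight-through output: d quantized / d x = Id

residual = x - quantized                    # current code: equals -(code - x).detach()
residual.sum().backward()
print(x.grad)                               # all zeros: no gradient reaches x through the residual

x.grad = None
residual = x - quantized.detach()           # the proposed fix: detach before subtracting
residual.sum().backward()
print(x.grad)                               # all ones: gradient flows back to x again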

npuichigo added the bug label Dec 26, 2022
@npuichigo
Author

@adefossez

@adefossez
Contributor

Thanks for bringing that up!

It seems like this won't impact the Straight-Through-Estimator gradient for the encoder, but it will kill the commitment loss for all residual VQ layers except the first one, right?
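A quick sketch of the commitment-loss half of that claim, again with made-up tensors rather than the actual core_vq.py code: a commitment-style loss computed on the second-level residual gets zero gradient with respect to the encoder output.

import torch

x = torch.randn(1, 5, requires_grad=True)       # encoder output
c1 = torch.randn(1, 5)                          # hypothetical nearest code at level 1
c2 = torch.randn(1, 5)                          # hypothetical nearest code at level 2

q1 = x + (c1 - x).detach()                      # level-1 straight-through output
residual = x - q1                               # current code (no detach): constant w.r.t. x

commit_2 = (residual - c2.detach()).pow(2).mean()   # level-2 commitment-style loss
commit_2.backward()
print(x.grad)                                   # all zeros: this loss cannot pull the encoder toward codebook 2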

@npuichigo
Author

It seems so. But I'm not sure how much it affects the final result.

@adefossez
Contributor

I'm a bit reluctant to introduce a change we haven't tested in this codebase, as it could change the best hyperparameters, etc. However, I can add a warning pointing to this issue when the model is used in training mode.

adefossez added a commit that referenced this issue Jan 24, 2023
@cantabile-kwok

cantabile-kwok commented Apr 4, 2023

@adefossez @npuichigo Could you please explain in more detail why "this won't impact the Straight-Through-Estimator gradient for the Encoder"? I think that if the residual is computed in a way that doesn't pass its real gradients, the gradient estimator may also be affected. The following code snippet may illustrate this:

import torch
def quantize(x, codebook):
    diff = codebook - x  # (n_code, dim)
    mse = (diff**2).sum(1)
    idx = torch.argmin(mse)
    return codebook[idx]

dim = 5
x = torch.randn(1, dim, requires_grad=True)
codebook1 = torch.randn(10, dim)
codebook2 = torch.randn(10, dim)

q1 = quantize(x, codebook1)  # quantize x with first codebook
q1 = x + (q1 - x).detach()  # transplant q1's gradient to x
residual = x - q1  # detach q1 or not may make a difference. Compute residual for next level quantizing
q2 = quantize(residual, codebook2)  # quantize residual with second codebook
q2 = residual + (q2 - residual).detach()  # transplant q2's gradient to residual

loss = 0*q1.sum() + 1*q2.sum()  # loss is a function of q1 and q2, now it is independent of q1.
loss.backward()
print(x.grad)

The printed gradient is all zero, but if we replace residual = x - q1 with residual = x - q1.detach(), the gradient will be non-zero.

@adefossez
Contributor

Why did you put 0 * q1.sum()? That is what is breaking the STE gradient. With the current code, d q1 / d x = Id and d q_i / d x = 0 for all i > 1, which is okay since the overall gradient d (sum q_i) / d x = Id, which is what we want. The only thing that is impacted is the commitment loss.
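For what it's worth, that identity can be checked numerically with toy tensors (not the actual core_vq.py code): with residual = x - q1, the summed straight-through outputs still carry an identity gradient back to x.

import torch

x = torch.randn(1, 5, requires_grad=True)       # encoder output
c1 = torch.randn(1, 5)                          # hypothetical nearest code at level 1
c2 = torch.randn(1, 5)                          # hypothetical nearest code at level 2

q1 = x + (c1 - x).detach()                      # d q1 / d x = Id
residual = x - q1                               # d residual / d x = 0
q2 = residual + (c2 - residual).detach()        # d q2 / d x = 0

(q1 + q2).sum().backward()
print(x.grad)                                   # all ones: d (q1 + q2) / d x = Id, the STE path is intact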

@cantabile-kwok

cantabile-kwok commented Apr 4, 2023

Oh, I think I over-complicated the problem here. In the model, all the quantization outputs q_i are simply summed and fed to the decoder, so the relation d (sum q_i) / d x = Id keeps the STE working. In my code snippet, I assumed the loss could be an arbitrary function of q1 and q2; in that case, the gradient from q2 would never reach the preceding networks, which may not be good.

Still, if we replace residual = x - q1 with residual = x - q1.detach(), it seems that d (sum q_i) / d x = n*Id, so the scale of the gradients may be affected. Thanks for the clarification!
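A toy check of the n*Id observation for n = 2, with the same kind of made-up tensors as above:

import torch

x = torch.randn(1, 5, requires_grad=True)       # encoder output
c1 = torch.randn(1, 5)                          # hypothetical nearest code at level 1
c2 = torch.randn(1, 5)                          # hypothetical nearest code at level 2

q1 = x + (c1 - x).detach()                      # d q1 / d x = Id
residual = x - q1.detach()                      # with the detach: d residual / d x = Id
q2 = residual + (c2 - residual).detach()        # d q2 / d x = Id

(q1 + q2).sum().backward()
print(x.grad)                                   # all twos: d (q1 + q2) / d x = 2 * Id, i.e. n * Id for n = 2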

@DingWeiPeng

DingWeiPeng commented Jan 9, 2024

@adefossez @cantabile-kwok

If residual = residual - quantized, then the second codebook can be updated with its gradient, but it cannot affect the first codebook.
If residual = residual - quantized.detach(), then the second codebook's gradient will affect the first codebook.

In core_vq.py, there is the following code in the VectorQuantization class:
[screenshot of the relevant VectorQuantization code]

Now there is the following code in the ResidualVectorQuantization class:
[screenshot of the relevant ResidualVectorQuantization code]

So this problem reduces to the following one. The code snippet below may illustrate it:

import torch
def quantize(x, codebook):
    diff = codebook - x  # (n_code, dim)
    mse = (diff**2).sum(1)
    idx = torch.argmin(mse)
    return codebook[idx]

dim = 5
x = torch.randn(1, dim, requires_grad=True)
codebook1 = torch.randn(10, dim)
codebook2 = torch.randn(10, dim)

q1 = quantize(x, codebook1)  # quantize x with first codebook
q1 = x + (q1 - x).detach()  # transplant q1's gradient to x
residual = x - q1.detach()  # detaching q1 or not makes the difference; compute residual for next level quantizing
q2 = quantize(residual, codebook2)  # quantize residual with second codebook
q2 = residual + (q2 - residual).detach()  # transplant q2's gradient to residual

loss = 1*q2.sum()  # loss depends only on q2
loss.backward()
print(x.grad)

If residual = x - q1, then x.grad is all zeros;
if residual = x - q1.detach(), then x.grad = tensor([[1., 1., 1., 1., 1.]]).

Dinglet pushed a commit to Dinglet/encodec that referenced this issue Jan 25, 2024
thatsvenyouknow pushed a commit to thatsvenyouknow/neuro-encodec that referenced this issue Jun 20, 2024