About the stability of the VQ-based approach for codec #7

syang1993 · 2022-11-01T11:46:59Z

❓ Questions

Thanks for sharing this amazing speech codec. Since Encodec and SoundStream use RVQ to quantize the latent representations, I'm worried about the stability of (R)VQ.
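To make the question concrete, here is a minimal sketch of residual vector quantization (RVQ) in plain Python. It is not Encodec's or SoundStream's code. The codebooks here are fitted with plain k-means on each stage's residuals, which is an assumption made for illustration (the real models learn their codebooks during training), and the helper names `train_rvq`, `rvq_encode` and `rvq_decode` are hypothetical. The sketch shows that each stage quantizes whatever the previous stages missed, so the reconstruction error shrinks as stages (and bitrate) are added but does not reach zero.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook, v):
    """Index of the codeword closest to v."""
    return min(range(len(codebook)), key=lambda k: sq_dist(codebook[k], v))


def kmeans(data, k, iters, rng):
    """Plain Lloyd's k-means (assumed codebook fitting, for illustration only)."""
    centroids = [list(p) for p in rng.sample(data, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in data:
            clusters[nearest(centroids, v)].append(v)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return centroids


def train_rvq(data, n_stages, k, iters=10, seed=0):
    """Fit one codebook per stage on the residuals left by the earlier stages."""
    rng = random.Random(seed)
    residuals = [list(v) for v in data]
    codebooks = []
    for _ in range(n_stages):
        cb = kmeans(residuals, k, iters, rng)
        codebooks.append(cb)
        residuals = [[r - c for r, c in zip(v, cb[nearest(cb, v)])] for v in residuals]
    return codebooks


def rvq_encode(codebooks, v):
    """Greedy RVQ: each stage picks the codeword nearest to the current residual."""
    codes, residual = [], list(v)
    for cb in codebooks:
        idx = nearest(cb, residual)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, cb[idx])]
    return codes


def rvq_decode(codebooks, codes):
    """Sum the selected codewords; passing a prefix of codes decodes fewer stages."""
    out = [0.0] * len(codebooks[0][0])
    for cb, idx in zip(codebooks, codes):
        out = [o + c for o, c in zip(out, cb[idx])]
    return out


# Usage: quantize random 4-dim vectors with 4 stages of 8 codewords each.
rng = random.Random(1)
data = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(200)]
books = train_rvq(data, n_stages=4, k=8)
codes = [rvq_encode(books, v) for v in data]
for n in range(1, 5):
    mse = sum(sq_dist(v, rvq_decode(books, c[:n])) for v, c in zip(data, codes)) / len(data)
    print(f"{n} stage(s): MSE = {mse:.4f}")
```

Decoding with a prefix `codes[:n]` mimics a lower bandwidth setting: the printed error should fall as `n` grows but stay above zero, which is one source of the residual distortion discussed in this thread.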

I evaluated Lyra2 at 9.2 kbps and Encodec at 12 kbps on high-quality data and found irregular harmonics in the output (an example). I suspect they are caused by the VQ process. What is your view on this?

adefossez · 2022-11-17T16:32:52Z

You cannot expect artifact-free outputs at any bandwidth level with either Lyra or Encodec. This is not only due to the RVQ part, but also to the way the model is trained with adversarial losses. While adversarial losses remove some artifacts, others remain, and it is unclear how to improve things further to the level of perfection. Distortion is to be expected; however, at extremely low bitrates (below 12 kbps), the distortion from Lyra or Encodec will be much lower than that of traditional codecs such as Opus.
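For context on what "below 12 kbps" means for an RVQ codec, the raw bitrate of the code indices is frames per second × number of codebooks × bits per index. The sketch below assumes the settings we understand the 24 kHz Encodec model to use (75 latent frames per second, 1024-entry codebooks, so 10 bits per index); those constants and the helper names are assumptions for illustration, and Lyra's settings are not covered here.

```python
import math


def rvq_bitrate_bps(frame_rate_hz, n_codebooks, codebook_size):
    """Raw bits per second of RVQ indices: frames/s * stages * bits per index."""
    return frame_rate_hz * n_codebooks * math.log2(codebook_size)


def codebooks_for_bitrate(target_bps, frame_rate_hz, codebook_size):
    """Largest number of RVQ stages whose indices fit within target_bps."""
    per_stage = frame_rate_hz * math.log2(codebook_size)
    return int(target_bps // per_stage)


# Assumed settings for the 24 kHz Encodec model (hypothetical constants here).
FRAME_RATE = 75
CODEBOOK_SIZE = 1024

for kbps in (1.5, 3, 6, 12, 24):
    n_q = codebooks_for_bitrate(kbps * 1000, FRAME_RATE, CODEBOOK_SIZE)
    print(f"{kbps} kbps -> {n_q} codebooks")
```

Under these assumptions each codebook costs 750 bps, so 12 kbps corresponds to 16 RVQ stages and lower bitrates simply drop the later, finer stages.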

syang1993 added the "question" (Further information is requested) label on Nov 1, 2022.
