About the stability of the VQ based approach for codec #7

Open
syang1993 opened this issue Nov 1, 2022 · 1 comment
Labels
question Further information is requested

Comments

@syang1993

❓ Questions

Thanks for sharing this amazing speech codec. Since both Encodec and SoundStream use RVQ to quantize the latent representations, I'm worried about the stability of (R)VQ.

I evaluated Lyra2 at 9.2 kbps and Encodec at 12 kbps on high-quality data and found irregular harmonics in the output (an example). I guess this is caused by the VQ process. Do you have any view on this?

@adefossez
Contributor

You cannot expect artifact-free output at any bandwidth level with either Lyra or Encodec. This is due not only to the RVQ part, but also to the way the model is trained with adversarial losses. While adversarial losses remove some artifacts, others remain, and it is unclear how to improve things further to the level of perfection. Distortion is to be expected; however, at extremely low bitrates (less than 12 kbps), the distortion from Lyra or Encodec will be much lower than that of traditional codecs like Opus.
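
For readers less familiar with the RVQ step discussed in this thread, here is a minimal sketch of residual vector quantization in plain Python. It is not the Encodec or Lyra implementation: the codebooks are random rather than learned, and the names `make_codebooks`, `rvq_encode` and `rvq_decode` are hypothetical, chosen only for illustration. It shows that each stage quantizes whatever the previous stages left over, so some residual error always remains after a finite number of stages.

```python
import random


def make_codebooks(rng, n_stages, n_entries, dim):
    """Build random (untrained) codebooks; later stages use smaller scales.

    Assumption for illustration only: real codecs learn their codebooks.
    """
    return [
        [[rng.gauss(0.0, 1.0 / (2 ** s)) for _ in range(dim)]
         for _ in range(n_entries)]
        for s in range(n_stages)
    ]


def nearest(codebook, vec):
    """Index of the codebook entry closest to vec (squared L2 distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )


def rvq_encode(vec, codebooks):
    """Quantize vec stage by stage; each stage codes the previous residual."""
    residual = list(vec)
    indices = []
    for codebook in codebooks:
        i = nearest(codebook, residual)
        indices.append(i)
        residual = [r - c for r, c in zip(residual, codebook[i])]
    return indices, residual


def rvq_decode(indices, codebooks):
    """Reconstruct by summing the selected entry from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for i, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[i])]
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    codebooks = make_codebooks(rng, n_stages=4, n_entries=16, dim=8)
    x = [rng.gauss(0.0, 1.0) for _ in range(8)]
    indices, residual = rvq_encode(x, codebooks)
    print("indices:", indices)
    print("residual energy:", sum(r * r for r in residual))
```

The decoder only sees the indices, so the final residual is exactly the information lost to quantization. The adversarial training mentioned above is a separate source of artifacts and is not modeled here.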
