Thanks for sharing this amazing speech codec. Since Encodec and SoundStream both use RVQ (residual vector quantization) to quantize the latent representations, I'm worried about the stability of (R)VQ.
I evaluated Lyra2 at 9.2 kbps and Encodec at 12 kbps on high-quality data and found irregular harmonics in the output (an example). I suspect they are caused by the VQ process. Do you have any view on this?
You cannot expect artifact-free output at any bandwidth with either Lyra or Encodec. This is not only due to the RVQ stage but also to the way the model is trained with adversarial losses. Adversarial losses remove some artifacts, but others remain, and it is unclear how to improve things further to the point of perfection. Distortion is to be expected; however, at extremely low bitrates (below 12 kbps), the distortion from Lyra or Encodec will be much lower than that from traditional codecs such as Opus.
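To make the RVQ part of this concrete, here is a minimal toy sketch of residual vector quantization in plain Python. It is not the Encodec or Lyra implementation: the codebooks are random rather than learned, and the helper names (`make_toy_codebooks`, `rvq_encode`, `rvq_decode`), the sizes, and the zero codeword added to every stage are all assumptions made for illustration. It only shows the general idea that each stage quantizes the residual left by the previous one, so the error shrinks as stages are added but generally does not reach zero.

```python
import random


def nearest(codebook, vec):
    """Return the index of the codeword closest to vec (squared Euclidean distance)."""
    def dist2(codeword):
        return sum((a - b) ** 2 for a, b in zip(codeword, vec))
    return min(range(len(codebook)), key=lambda i: dist2(codebook[i]))


def rvq_encode(codebooks, vec):
    """Quantize vec stage by stage; each stage encodes the residual of the previous one."""
    residual = list(vec)
    indices = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        indices.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return indices


def rvq_decode(codebooks, indices):
    """Reconstruct the vector by summing the selected codeword from each stage."""
    out = [0.0] * len(codebooks[0][0])
    for codebook, idx in zip(codebooks, indices):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


def make_toy_codebooks(num_stages, size, dim, seed=0):
    """Hypothetical random codebooks (real codecs learn them during training).

    Each stage contains a zero codeword, so a stage can never increase the
    residual norm; later stages use smaller codewords for finer corrections.
    """
    rng = random.Random(seed)
    codebooks = []
    for stage in range(num_stages):
        scale = 0.5 ** stage
        book = [[0.0] * dim]
        book += [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(size - 1)]
        codebooks.append(book)
    return codebooks


def l2_error(vec, recon):
    """Euclidean distance between the original vector and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(vec, recon)) ** 0.5


codebooks = make_toy_codebooks(num_stages=4, size=16, dim=4)
latent = [0.8, -0.3, 0.5, 0.1]  # stand-in for one latent frame
indices = rvq_encode(codebooks, latent)

# Reconstruction error when decoding with only the first n stages.
errors = [
    l2_error(latent, rvq_decode(codebooks[:n], indices[:n]))
    for n in range(1, len(codebooks) + 1)
]
print(indices, errors)
```

In this toy setup each stage costs log2(16) = 4 bits per frame, so decoding with fewer stages corresponds to a lower bitrate and a larger leftover residual. That leftover residual is only one possible source of artifacts; as noted above, the adversarial training also plays a role.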