Audio codec used for training in the original paper - very low bandwidth/quality?

First of all, great project!

One question though: in the original paper, you mention training MusicGen on a four-quantizer EnCodec with a fairly large stride (a 50 Hz frame rate). That gives fairly low-quality output, and it is also mono and 32 kHz only.
Have you done any ablation studies with larger bandwidths? For instance, in the EnCodec paper you trained a stereo 48 kHz model at 24 kbit/s. What were the issues with using it in MusicGen?
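
For reference, here is a rough sketch of the bitrate arithmetic behind the question. The four quantizers and the 50 Hz frame rate are the figures mentioned above; the codebook size of 2048 entries per quantizer is my assumption and is not stated in this post, and `codec_bitrate_bps` is just a helper name made up for illustration.

```python
import math


def codec_bitrate_bps(num_codebooks: int, codebook_size: int, frame_rate_hz: float) -> float:
    """Bitrate of a residual-VQ codec: codebooks x bits per code x frames per second."""
    bits_per_code = math.log2(codebook_size)
    return num_codebooks * bits_per_code * frame_rate_hz


# Four quantizers at a 50 Hz frame rate, as in the question.
# ASSUMPTION: 2048 entries per codebook (11 bits per code).
musicgen_bps = codec_bitrate_bps(num_codebooks=4, codebook_size=2048, frame_rate_hz=50)

# The stereo 48 kHz EnCodec configuration mentioned above runs at 24 kbit/s.
encodec_48k_bps = 24_000

print(f"MusicGen codec: {musicgen_bps / 1000:.1f} kbit/s")
print(f"48 kHz stereo EnCodec is about {encodec_48k_bps / musicgen_bps:.1f}x higher")
```

Under that assumption the training codec sits at roughly a tenth of the 24 kbit/s setting, which is the gap I am asking about.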

@adefossez hopefully you can shed some light here. Thanks!

Audio codec used for training in the original paper - very low bandwidth/quality? #282

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Audio codec used for training in the original paper - very low bandwidth/quality? #282

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions