Quality problems with fine-tuning the musicgen model #447

Open
MajaSoure opened this issue Apr 12, 2024 · 1 comment
MajaSoure commented Apr 12, 2024

I would like to fine-tune a small MusicGen model. My dataset consists of various short sounds and is not large. I previously trained successfully on melodies played by a single instrument, using an extremely small dataset of only 3 hours, and got interesting results. With short sounds, however, things are different: in 80% of cases or more, the generated audio contains a crackling sound after the main attack.

I've tried several ways to adjust training, such as reducing the learning rate or dropout, but nothing has clearly helped so far. My logs also look very suspicious from the very beginning of training:
Train Summary | Epoch 1 | lr=1.00E+00 | grad_norm=INF | grad_scale=45645.824 | ce=0.962 | ppl=2.650 | duration=2472.758
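
One quick way to check a summary line like this for non-finite values is to parse it into numbers. The following is a minimal sketch, not part of audiocraft: the helper names `parse_summary` and `non_finite` are made up for illustration, and it assumes the `key=value` layout of the line above.

```python
import math


def parse_summary(line: str) -> dict:
    """Split a 'Train Summary | ... | key=value' line into a dict of floats."""
    metrics = {}
    for part in line.split("|"):
        part = part.strip()
        if "=" not in part:
            continue  # skips labels such as "Train Summary" and "Epoch 1"
        key, value = part.split("=", 1)
        metrics[key] = float(value)  # float() accepts "INF" and "1.00E+00"
    return metrics


def non_finite(metrics: dict) -> list:
    """Return the names of metrics whose value is inf or NaN."""
    return [name for name, value in metrics.items() if not math.isfinite(value)]


line = (
    "Train Summary | Epoch 1 | lr=1.00E+00 | grad_norm=INF | "
    "grad_scale=45645.824 | ce=0.962 | ppl=2.650 | duration=2472.758"
)
metrics = parse_summary(line)
print(non_finite(metrics))
```

For the line above this prints `['grad_norm']`. Whether an infinite gradient norm in the first epoch is actually a problem depends on the setup: with mixed-precision training and a dynamic gradient scale, some early steps can overflow while the scale is still being adjusted.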

It's also interesting that the same word gives different results in the prompt depending on whether it starts with a lowercase or a capital letter. How much this matters depends on the word. In one case the result is as expected, while in the other I get a jumble of random sounds, as if the model had not been trained at all. (I checked the original models, and they behave similarly for uppercase and lowercase versions of the same word.)

I'd also be very interested to know more about how the text tags stored in JSON format for each sample are merged. I'm new to working with your model. Thanks in advance for your answer and help!

@yocontra

These parameters might be useful to look at within the musicgen codebase, if you want to understand how it goes from json -> text:

dataset.train.merge_text_p
dataset.train.drop_desc_p
dataset.train.drop_other_p
conditioners.description.t5.word_dropout
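
As a rough illustration of what probabilities like these could control, here is a minimal sketch of merging a free-text description with metadata tags. It is not the audiocraft implementation: the function names, the field list, and especially the reading of `drop_other_p` as "probability of dropping each metadata field" are assumptions based on the parameter names, so check the audiocraft source for the actual behavior.

```python
import random

# Hypothetical set of metadata fields read from each sample's JSON file.
METADATA_FIELDS = ("key", "bpm", "genre", "moods", "instrument", "keywords")


def build_text(sample, merge_text_p, drop_desc_p, drop_other_p, rng=random):
    """Build the conditioning text for one sample (illustrative only)."""
    description = sample.get("description")
    if rng.random() >= merge_text_p:
        return description  # no merging: use the plain description

    pairs = []
    for name in METADATA_FIELDS:
        value = sample.get(name)
        if value is None or value == "" or value == []:
            continue  # field missing or empty in the JSON
        if rng.random() < drop_other_p:
            continue  # ASSUMPTION: each metadata field is dropped with this probability
        if isinstance(value, list):
            value = ", ".join(str(v) for v in value)
        pairs.append(f"{name}: {value}")
    rng.shuffle(pairs)  # tag order is randomized
    metadata_text = ". ".join(pairs)

    if rng.random() < drop_desc_p:
        description = None  # keep only the merged tags
    if description is None:
        return metadata_text or None
    if not metadata_text:
        return description
    return f"{description.rstrip('.')}. {metadata_text}"


def word_dropout(text, p, rng=random):
    """ASSUMPTION: drop each space-separated word independently with probability p."""
    return " ".join(word for word in text.split(" ") if rng.random() >= p)


sample = {"description": "short percussive hit.", "genre": "foley"}
print(build_text(sample, merge_text_p=1.0, drop_desc_p=0.0, drop_other_p=0.0))
print(build_text(sample, merge_text_p=1.0, drop_desc_p=1.0, drop_other_p=0.0))
```

With these extreme settings the output is deterministic: the first call prints `short percussive hit. genre: foley` and the second prints `genre: foley`. With intermediate probabilities the model sees a random mix of description-only, tags-only, and combined prompts during training, which is one reason the exact wording of a prompt can matter at generation time.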

Check out this project for a working example on fine tuning: https://github.com/sakemin/cog-musicgen-fine-tuner
