
Configuration for training with CLAP embeddings #441

Open
jbm-composer opened this issue Mar 26, 2024 · 8 comments

Comments

@jbm-composer

I'm wondering if anyone has any configuration info they could share on training with CLAP embeddings?
I want to try the laion/larger_clap_music model from Huggingface, but it's really unclear to me how the project is supposed to be configured.

Any help greatly appreciated.

@jbm-composer
Author

jbm-composer commented Mar 26, 2024

Just adding a bit more info, I managed to at least get to an attempt to load larger_clap_music using this config:

conditioners:
  description:
    model: clap
    clap: # based on
      checkpoint: //reference/clap/larger_clap_music/pytorch_model.bin
      name: laion/larger_clap_music
      model_arch: 'HTSAT-base'
      enable_fusion: false
      sample_rate: 32000
      max_audio_length: 10
      audio_stride: 1
      dim: 512
      attribute: description
      normalize: true
      quantize: true  # use RVQ quantization
      n_q: 12
      bins: 1024
      kmeans_iters: 50
      text_p: 0.  # probability of using text embed at train time
      cache_path: null

But loading the state_dict fails with a laundry list of "Unexpected key(s)"

I also tried just pointing it to the folder (it complained that it was not a file) and the config.json inside the HF download (which gave some kind of parse error).
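For anyone who wants to see the mismatch concretely, a rough diagnostic sketch like this should show it (assuming the laion_clap package that the Audiocraft conditioner builds on; the checkpoint path is the one from my config above):

# Diff the HF checkpoint keys against the keys the laion_clap model expects.
import torch
import laion_clap

# State dict downloaded from laion/larger_clap_music on the Hugging Face Hub
hf_state = torch.load(
    "reference/clap/larger_clap_music/pytorch_model.bin", map_location="cpu"
)

# The model that laion_clap builds for this architecture / fusion setting
clap = laion_clap.CLAP_Module(enable_fusion=False, amodel="HTSAT-base")
laion_keys = set(clap.model.state_dict().keys())
hf_keys = set(hf_state.keys())

print("keys only in the HF checkpoint:", len(hf_keys - laion_keys))
print("keys only in the laion_clap model:", len(laion_keys - hf_keys))
print("shared keys:", len(hf_keys & laion_keys))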

@jbm-composer
Author

Okay, I can load larger_clap_music using the ClapModel (and ClapProcessor) from Huggingface, but not in Audiocraft. I see that Audiocraft is based on CLAP from the Laion repo... Does anybody know if there's a way to load the HF weights into the Laion model? Or has anybody hacked the HF ClapModel into Audiocraft, by any chance?
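For reference, the Hugging Face route is something like the following (a minimal sketch, not exactly what I ran; the example prompt is made up, and the 48 kHz rate is my understanding of what the CLAP processor expects, so double-check the model card):

import numpy as np
import torch
from transformers import ClapModel, ClapProcessor

model = ClapModel.from_pretrained("laion/larger_clap_music")
processor = ClapProcessor.from_pretrained("laion/larger_clap_music")

# Text embedding (512-dim, matching the `dim: 512` in the config above)
text_inputs = processor(text=["calm piano with soft strings"], return_tensors="pt")
with torch.no_grad():
    text_emb = model.get_text_features(**text_inputs)

# Audio embedding from one second of placeholder silence at 48 kHz
audio = np.zeros(48000, dtype=np.float32)
audio_inputs = processor(audios=audio, sampling_rate=48000, return_tensors="pt")
with torch.no_grad():
    audio_emb = model.get_audio_features(**audio_inputs)

print(text_emb.shape, audio_emb.shape)  # both should be [1, 512]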

@jbm-composer
Author

I worked out a way around loading the HF weights. Now what I'm wondering about is how to configure a text prompt for running test generations during training. My goal is to test the performance of training on CLAP audio embeddings and using text embeddings for inference.

Any help greatly appreciated.

@yukara-ikemiya

yukara-ikemiya commented Mar 28, 2024

In Audiocraft, 'test generation' during training is a little bit tricky; it is done in the following part of the code:

def generate_audio(self) -> dict:

We may have to prepare a dataset for the generate stage in the same way as the training data.
As you may know, metadata can be attached to each audio file with a .json file, as shown in the example here:
https://github.com/facebookresearch/audiocraft/tree/main/dataset/example

If you don't need to do 'continuation generation' during training, dummy audio should be enough.
In that case, you would have to:

  1. Prepare dummy audio and a metadata file for test generation.
  2. Add the descriptions you want to use for test generation to the metadata file (.json) under the "description" tag (see the sketch below).
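A minimal sketch of what 1. and 2. could look like (the paths are placeholders, and the metadata fields other than "description" are loosely copied from dataset/example, so adjust them to your dataset config):

import json
import os

import torch
import torchaudio

out_dir = "dataset/generate_dummy"  # placeholder location
os.makedirs(out_dir, exist_ok=True)

# 1. Write 10 seconds of silence as the dummy audio
sample_rate = 32000
dummy = torch.zeros(1, sample_rate * 10)
torchaudio.save(os.path.join(out_dir, "dummy_0.wav"), dummy, sample_rate)

# 2. Write the sidecar .json whose "description" holds the generation prompt
meta = {
    "title": "dummy_0",
    "artist": "",
    "description": "calm piano with soft strings",  # prompt used at generation time
    "sample_rate": sample_rate,
    "duration": 10.0,
}
with open(os.path.join(out_dir, "dummy_0.json"), "w") as f:
    json.dump(meta, f, indent=2)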

@jbm-composer
Author

Thanks so much for the reply!

Digging around the solver code (as you pointed out), it did seem like the joint embedding might want a prompt, so I added some super simple metadata files. I haven't run it to the point of a test generation yet, but hopefully it works as expected. I haven't added any dummy audio at this point; I think in the past it just used the audio from the dataset (I think...?).

Another "gotcha" that wasn't obvious to me at first is that dataset.valid.num_samples has to be >= the number of GPUs on the system. Makes sense, of course, but I crashed a few times before figuring it out.

@jbm-composer
Author

Actually, though... what determines when it will generate a sample output? I can see it running through train and valid steps, and it's saving checkpoints, but I don't seem to be getting any audio. I'd also like the audio sent to wandb, ideally... I do have wandb: with_media_logging: true set.

@yukara-ikemiya

It seems that 'test generation' runs at the end of every epoch, the same as evaluation; this is defined in the BaseSolver class (the base class of every solver class).

def run_epoch(self):

As shown in that method, you can first check whether your run gets past the self.should_run_stage('generate') statement. If it doesn't, 'test generation' is being skipped there, and you can trace which configuration setting causes it.

Then, the audio is finally saved in the aforementioned generate_audio method, after the samples are generated, at the following line:

sample_manager.add_samples(

@jbm-composer
Author

Yes, I saw from another issue/comment that the "every" in the "generate" config refers to epochs, not steps. I had it set to 1000, thinking it meant steps, so I would have been waiting a while... heh...
It's not always super clear when steps (or "updates") are meant and when epochs are.
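For anyone else landing here, the settings that seem to matter are roughly these (key names as I understand them from the default solver configs, so verify against your own):

generate:
  every: 1                   # epochs between test generations (epochs, not updates)
logging:
  log_wandb: true            # enable the wandb logger
wandb:
  with_media_logging: true   # attach generated audio to the wandb run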
