-
Hi @m-pana, here are the steps you need to follow:
    json_dict[uttid] = {
        "wav": relative_path,
        "spk_id": spk_id,
        "label": original_text,
        "segment": "train" in json_file,
        "fea_path": fea_path,  # fill in the path to the pre-extracted feature file for this audio
    }
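To make the manifest step concrete, here is a minimal, self-contained sketch assuming the features are stored as `.npy` files; all names below (`uttid`, the speaker ID, the transcript, the paths) are hypothetical placeholders, not values from the actual recipe:

```python
import json

import numpy as np

# Hypothetical utterance metadata; in practice this comes from your dataset loop.
uttid = "1089_134686_000001_000001"
relative_path = "{data_root}/1089_134686_000001_000001.wav"
fea_path = "1089_134686_000001_000001_fea.npy"

# Simulate the offline extraction step: save a (time x feature_dim) matrix.
np.save(fea_path, np.zeros((120, 256), dtype=np.float32))

json_file = "train.json"
json_dict = {
    uttid: {
        "wav": relative_path,
        "spk_id": "1089",
        "label": "some transcript",
        "segment": "train" in json_file,
        "fea_path": fea_path,
    }
}

# Write the manifest that the recipe's dataset loader will read.
with open(json_file, "w") as f:
    json.dump(json_dict, f, indent=2)
```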
    @sb.utils.data_pipeline.takes("wav", "segment", "fea_path")
    @sb.utils.data_pipeline.provides("fea", "sig")
    def audio_pipeline(wav, segment, fea_path):
        fea = np.load(fea_path)  # read the feature from your feature path
        yield fea
        sig = sb.dataio.dataio.read_audio(wav)  # keep producing the waveform, as in the original recipe
        yield sig

Then you can start training with your own features. Are you using EnCodec? Feel free to keep us updated on how the performance compares with mel spectrograms if you manage to train a model, thanks.
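To see the pipeline's mechanics in isolation (a dynamic item with multiple `provides` keys is a generator that yields one value per key, in order), here is a framework-free sketch; `fake_read_audio` and all paths are hypothetical stand-ins, since only the feature branch changes relative to the recipe:

```python
import numpy as np

def audio_pipeline(wav, segment, fea_path, read_audio):
    """Generator mimicking provides("fea", "sig"): yields fea first, then sig."""
    fea = np.load(fea_path)  # load the pre-extracted feature instead of computing a mel spectrogram
    yield fea
    sig = read_audio(wav)    # the waveform is still needed as the vocoder's target
    yield sig

def fake_read_audio(path):
    # Hypothetical stand-in for the recipe's audio reader: a 1 s silent waveform at 16 kHz.
    return np.zeros(16000, dtype=np.float32)

# Simulate one pre-extracted feature matrix on disk.
np.save("utt1_fea.npy", np.ones((50, 80), dtype=np.float32))

# The two yielded values map to the two provided keys, "fea" and "sig".
fea, sig = audio_pipeline("utt1.wav", True, "utt1_fea.npy", fake_read_audio)
```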
-
Hi all,
I would like to train the HiFi-GAN model on features that are not based on spectrograms (namely, deep features extracted from an encoder network). I have already extracted them offline, before training.
The training dataset happens to be LibriTTS (the train-clean-100 partition), so this recipe seems like a very good fit. However, the recipe uses Mel spectrograms by default. What would be the best way to change it so that, instead of computing Mel spectrograms from the audio, it uses a different kind of feature?
I have managed to figure out that this is where the mel computation happens, so I guess I could just replace that with my own code to perform feature extraction on the fly.
However, it would be best to simply load some pre-extracted feature matrix. How can I do that?
Thank you.