-
Hi @m-pana, here are the steps you need to follow:
    json_dict[uttid] = {
        "wav": relative_path,
        "spk_id": spk_id,
        "label": original_text,
        "segment": "train" in json_file,
        "fea_path": fea_path,  # fill in the path to the pre-extracted feature file for this audio
    }
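To make the manifest step concrete, here is a minimal, self-contained sketch assuming the features are stored as `.npy` files; all names below (`uttid`, the speaker ID, the transcript, the paths) are hypothetical placeholders, not values from the actual recipe:

```python
import json

import numpy as np

# Hypothetical utterance metadata; in practice this comes from your dataset loop.
uttid = "1089_134686_000001_000001"
relative_path = "{data_root}/1089_134686_000001_000001.wav"
fea_path = "1089_134686_000001_000001_fea.npy"

# Simulate the offline extraction step: save a (time x feature_dim) matrix.
np.save(fea_path, np.zeros((120, 256), dtype=np.float32))

json_file = "train.json"
json_dict = {
    uttid: {
        "wav": relative_path,
        "spk_id": "1089",
        "label": "some transcript",
        "segment": "train" in json_file,
        "fea_path": fea_path,
    }
}

# Write the manifest that the recipe's dataset loader will read.
with open(json_file, "w") as f:
    json.dump(json_dict, f, indent=2)
```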
    @sb.utils.data_pipeline.takes("wav", "segment", "fea_path")
    @sb.utils.data_pipeline.provides("fea", "sig")
    def audio_pipeline(wav, segment, fea_path):
        fea = np.load(fea_path)  # read the feature from your feature path
        yield fea
        sig = sb.dataio.dataio.read_audio(wav)  # keep producing the waveform, as in the original recipe
        yield sig

Then you can start training with your own features. Are you using EnCodec? Feel free to keep us updated on how the performance compares with mel spectrograms if you manage to train a model, thanks.
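To see the pipeline's mechanics in isolation (a dynamic item with multiple `provides` keys is a generator that yields one value per key, in order), here is a framework-free sketch; `fake_read_audio` and all paths are hypothetical stand-ins, since only the feature branch changes relative to the recipe:

```python
import numpy as np

def audio_pipeline(wav, segment, fea_path, read_audio):
    """Generator mimicking provides("fea", "sig"): yields fea first, then sig."""
    fea = np.load(fea_path)  # load the pre-extracted feature instead of computing a mel spectrogram
    yield fea
    sig = read_audio(wav)    # the waveform is still needed as the vocoder's target
    yield sig

def fake_read_audio(path):
    # Hypothetical stand-in for the recipe's audio reader: a 1 s silent waveform at 16 kHz.
    return np.zeros(16000, dtype=np.float32)

# Simulate one pre-extracted feature matrix on disk.
np.save("utt1_fea.npy", np.ones((50, 80), dtype=np.float32))

# The two yielded values map to the two provided keys, "fea" and "sig".
fea, sig = audio_pipeline("utt1.wav", True, "utt1_fea.npy", fake_read_audio)
```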
-
Hi all,
I would like to train the HiFi-GAN model on features that are not based on spectrograms (namely, deep features extracted from an encoder network). I have already extracted them offline, before training.
The training dataset happens to be LibriTTS (the train-clean-100 partition), so this recipe seems like a very good fit. However, the recipe uses Mel spectrograms by default. What would be the best way to change it so that, instead of computing Mel spectrograms from the audio, it uses a different kind of feature?
I have managed to figure out that this is where the mel computation happens, so I guess I could just replace that with my own code to perform feature extraction on the fly.
However, it would be best to simply load some pre-extracted feature matrix. How can I do that?
Thank you.