
Required amount of data and iterations to train the model #12

Open · Alexey322 opened this issue Sep 6, 2022 · 5 comments

@Alexey322

Alexey322 commented Sep 6, 2022

Hi, I'm training your model from scratch on 60 voices, each with 3-15 minutes of data. Surprisingly, the model already starts to overfit at 26k iterations with batch size 12, even though the total duration of all audio files is about 7-8 hours. Unfortunately, I got unsatisfactory results: the speech of many speakers is completely unintelligible. I attach screenshots of the decoder training.

[screenshot: decoder training curves from TensorBoard]
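
As a rough sanity check on the overfitting claim (the average clip length below is an assumption, not a figure from this thread), 26k iterations at batch size 12 already corresponds to many passes over a 7-8 hour dataset:

```python
# Back-of-the-envelope estimate of how many epochs 26k iterations covers.
# total_hours and avg_clip_seconds are assumed values for illustration.
total_hours = 7.5          # reported total audio duration (~7-8 h)
avg_clip_seconds = 7.0     # assumed average utterance length
batch_size = 12
iterations = 26_000

n_clips = total_hours * 3600 / avg_clip_seconds   # ~3.9k clips
steps_per_epoch = n_clips / batch_size            # ~320 steps per epoch
epochs_seen = iterations / steps_per_epoch        # ~80 passes over the data

print(f"{n_clips:.0f} clips, {steps_per_epoch:.0f} steps/epoch, "
      f"~{epochs_seen:.0f} epochs after {iterations} iterations")
```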

@szprytny

szprytny commented Sep 6, 2022

Hi @Alexey322,
I trained from scratch for Polish - about 14 hours of data in total; about 9 hours of that is one speaker, and the other speakers' durations vary a lot.

Comparing your TensorBoard to mine, I see a higher loss_ctc - about 1.8 vs. my 1.3 - and binarization_loss values above 0.4, whereas for me it stayed between 0.25 and 0.35.

train/mel_loss headed toward -2.0, reaching it around step 200k; at step 60k it was around -1.7.
For val/mel_loss I had a peak near step 30k at -1.52, then at step 200k it was -0.75.

[screenshot: TensorBoard mel_loss curves from my run]

@Alexey322

Thank you for sharing the results, @szprytny. Why did you try to overfit the model, and what synthesis results did you get before and after overfitting?

@szprytny

szprytny commented Sep 7, 2022

I cannot answer regarding synthesis with a non-overfitted model, because I used that 600k checkpoint for training the second step of the RADTTS++ model.
I can only say that some of the speakers sound quite biased compared to the training samples, but for most of them you can still recognize who is who :D

What is important - pronunciation is very good; there is no problem understanding spoken sentences, even very long "tongue twisters".
e.g. w gąszczu.zip

The TensorBoard screenshot is from step 1 - training the decoder with config_ljs_decoder.json.
Then, in the 2nd step, I used config_ljs_dap.json to get the model for synthesis.
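
For readers unfamiliar with the two-stage setup: the second stage is typically warm-started from the stage-1 decoder checkpoint. A minimal, generic PyTorch sketch of that pattern (this is not the repo's actual loading code; the checkpoint path and `build_radtts_model` are placeholders):

```python
import torch

# Placeholder path and model constructor for illustration only; in practice the
# warm start is configured in the stage-2 JSON config rather than hand-coded.
ckpt = torch.load("outdir_decoder/model_600000.pt", map_location="cpu")
state_dict = ckpt.get("state_dict", ckpt)

model = build_radtts_model(config)  # hypothetical helper standing in for model setup

# strict=False restores the shared/decoder weights from stage 1 while leaving
# any newly added attribute-predictor parameters randomly initialized.
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print("newly initialized parameters:", missing)
```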

@unilight

Hi @szprytny, thank you for the insights! Just wondering: in your experience, what would be a sufficient number of training steps? It's not described in the original paper, and as I am still doing initial experiments with LJSpeech, the config (https://github.com/NVIDIA/radtts/blob/main/configs/config_ljs_decoder.json) sets the total number of epochs to 10,000,000, which seems to be way too much.

@szprytny

That probably depends very much on the dataset, but I can say that the model was producing intelligible utterances pretty quickly for me - after about 30k steps with 8 samples per batch.
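
To put the 10,000,000-epoch config value in perspective: it effectively means "train until you stop it", and a concrete step target can be converted into epochs from the dataset size. The clip-length value below is an assumption, not a number from this thread:

```python
# Convert a target number of optimizer steps into an approximate epoch count.
# total_hours matches the ~14 h Polish dataset mentioned above; avg_clip_seconds
# is an assumed average utterance length.
target_steps = 30_000
batch_size = 8
total_hours = 14.0
avg_clip_seconds = 7.0

n_clips = total_hours * 3600 / avg_clip_seconds   # ~7200 clips
steps_per_epoch = n_clips / batch_size            # ~900 steps per epoch
epochs_needed = target_steps / steps_per_epoch    # ~33 epochs

print(f"~{steps_per_epoch:.0f} steps/epoch -> ~{epochs_needed:.0f} epochs "
      f"for {target_steps} steps (vs. 10,000,000 epochs in the default config)")
```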

I don't train the model with pitch and energy conditioning anymore. I noticed that for my multispeaker data the results are much worse than with the basic RADTTS model.
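
If you want to see exactly which settings separate the basic decoder setup from a pitch/energy-conditioned one, a quick way is to diff the flattened JSON configs. The second filename below is a placeholder; substitute whichever conditioned config you trained with:

```python
import json

def flatten(d, prefix=""):
    """Flatten nested config dicts into dotted keys for easy comparison."""
    out = {}
    for k, v in d.items():
        key = f"{prefix}{k}"
        if isinstance(v, dict):
            out.update(flatten(v, key + "."))
        else:
            out[key] = v
    return out

with open("configs/config_ljs_decoder.json") as f:
    base = flatten(json.load(f))
with open("configs/config_ljs_conditioned.json") as f:  # placeholder filename
    cond = flatten(json.load(f))

for key in sorted(set(base) | set(cond)):
    if base.get(key) != cond.get(key):
        print(f"{key}: {base.get(key)!r} -> {cond.get(key)!r}")
```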
