
Inference

Marco BARNIG edited this page Jul 3, 2024 · 31 revisions

1. Inference Command

The following command synthesizes a text with the best model or with a specific checkpoint:
tts --text "bla bla bla" --model_path your-path/model_checkpoint.pth --config_path your-path/config.json --out_path your-path/audio-name.wav
Here is an example of the command I used to generate an audio file of the fable "De Nordwand an d'Sonn" in Luxembourgish:

tts --text "an der zäit hunn sech den nordwand an d'sonn gestridden, wie vun hinnen zwee wuel méi staark wier, wéi e wanderer, deen an ee waarme mantel agepak war, iwwert de wee koum. si goufen sech eens, datt deejéinege fir dee stäerkste gëlle sollt, deen de wanderer forcéiere géif, säi mantel auszedoen. dunn huet d'sonn d'loft mat hire frëndleche strale gewiermt, a schonn no kuerzer zäit huet de wanderer säi mantel ausgedoen.  do huet den nordwand missen zouginn, datt d'sonn vun hinnen zwee dee stäerkste wier." --model_path inference-female/checkpoint_10000.pth --config_path inference-female/config.json --out_path output-female/female-speech-10000-nordwand.wav
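To compare several checkpoints, as in the examples further down, the same command can be repeated in a loop. The following is only a sketch under assumptions that are not stated on this page: the checkpoints are assumed to be named checkpoint_<step>.pth inside one model folder (as in the female example above), and the helper names run and synthesize_checkpoints, as well as the DRY_RUN variable, are hypothetical. Only the tts flags shown above are used.

```shell
#!/bin/sh
# Sketch: run the inference command shown above for several checkpoints.
# Assumption: checkpoints are stored as checkpoint_<step>.pth in MODEL_DIR.

MODEL_DIR="inference-female"   # folder with checkpoint_<step>.pth and config.json
OUT_DIR="output-female"        # folder for the generated wav files
TEXT="bla bla bla"             # replace with the text to synthesize
DRY_RUN="${DRY_RUN:-0}"        # 1 = only print the commands (hypothetical switch)

# Fall back to a dry run when the tts command is not installed.
command -v tts >/dev/null 2>&1 || DRY_RUN=1

# Print the command in dry-run mode, otherwise execute it.
# Note: the printed form drops the quotes around the text.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Call tts once per checkpoint step passed as argument.
synthesize_checkpoints() {
    if [ "$DRY_RUN" != "1" ]; then
        mkdir -p "$OUT_DIR"
    fi
    for step in "$@"; do
        run tts --text "$TEXT" \
            --model_path "$MODEL_DIR/checkpoint_${step}.pth" \
            --config_path "$MODEL_DIR/config.json" \
            --out_path "$OUT_DIR/female-speech-${step}-nordwand.wav"
    done
}

synthesize_checkpoints 10000 20000 30000 40000
```

Running the script with DRY_RUN=1 prints the four commands without executing them, which is a convenient way to check the paths before starting a long synthesis.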

2. Male Speech Examples

Below are speech examples generated with the TTS-LOD model of Max Kuborn at five different checkpoints:

kuborn-speech-10000-nordwand.mp4
kuborn-speech-20000-nordwand.mp4
kuborn-speech-30000-nordwand.mp4
kuborn-speech-40000-nordwand.mp4
kuborn-speech-50000-nordwand.mp4

The quality of the male LOD voice is lower than expected. One reason is the presence of female voices labelled as M in the dataset of Max Kuborn.

I retrained the male model with a clean dataset. Here are the results for checkpoint 30,000 and later checkpoints:

new-male-speech-30000-nordwand.mp4
male-speech-40000-nordwand.mp4
male-speech-50000-nordwand.mp4
male-speech-53442-nordwand.mp4

3. Female Speech Examples

Below are speech examples generated with the female TTS-LOD model:

female-speech-10000-nordwand.mp4
female-speech-20000-nordwand.mp4
female-speech-30000-nordwand.mp4
female-speech-40000-nordwand.mp4