Base model for zero shot speech generation #263

cjohn001 · 2024-06-06T00:04:46Z

Hello everyone,
I am currently trying to use OpenVoice to generate German speech, but I have not been able to figure out how zero-shot speech synthesis is supposed to work. Is some kind of multilingual base model missing? When I use one of the language-specific base models, the output sounds weird.

It would also be helpful if someone could explain how the different emotions and speech styles can be controlled. The API documentation could benefit from more examples.

Vicopem01 · 2024-07-05T01:22:17Z

The text-to-speech synthesis in v1 is powered by the OpenAI TTS system; in v2 it is done via MeloTTS. In my experience, v2 sounds better.

On the first run, the models are loaded onto your system automatically, and OpenVoice then performs tone color conversion on the synthesized audio. Here is the demo setup.
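The two steps described above (base TTS first, then tone color conversion) could look roughly like the sketch below. It is written from memory of the v2 demo notebook, so treat the checkpoint layout under `checkpoints_v2/`, the call signatures, and the helper names `base_speaker_embedding_path` and `clone_voice` as assumptions to verify against your installed version, not as the official API. Whether a given language such as German is available depends on what the base TTS model supports.

```python
from pathlib import Path


def base_speaker_embedding_path(speaker_key: str, ckpt_root: str = "checkpoints_v2") -> Path:
    """Hypothetical helper: locate the base speaker's tone color embedding.

    Assumes the v2 checkpoint layout base_speakers/ses/<key>.pth with
    lowercase, hyphenated keys (e.g. "EN_US" -> "en-us").
    """
    key = speaker_key.lower().replace("_", "-")
    return Path(ckpt_root) / "base_speakers" / "ses" / f"{key}.pth"


def clone_voice(text: str, reference_audio: str, language: str = "EN",
                out_dir: str = "outputs_v2") -> Path:
    """Synthesize `text` with a base speaker, then convert it to the reference voice."""
    # Heavy imports live inside the function so the sketch can be read
    # (and the helper above used) without the packages installed.
    import torch
    from melo.api import TTS
    from openvoice import se_extractor
    from openvoice.api import ToneColorConverter

    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Load the tone color converter (paths are assumptions about the checkpoint layout).
    converter = ToneColorConverter("checkpoints_v2/converter/config.json", device=device)
    converter.load_ckpt("checkpoints_v2/converter/checkpoint.pth")

    # Extract the target tone color embedding from the reference recording.
    target_se, _ = se_extractor.get_se(reference_audio, converter)

    # Step 1: the base TTS model speaks the text with one of its own speakers.
    tts = TTS(language=language, device=device)
    speaker_ids = tts.hps.data.spk2id
    speaker_key = list(speaker_ids.keys())[0]
    src_path = out / "tmp.wav"
    tts.tts_to_file(text, speaker_ids[speaker_key], str(src_path), speed=1.0)

    # Step 2: convert the base speaker's tone color to the reference speaker's.
    source_se = torch.load(base_speaker_embedding_path(speaker_key), map_location=device)
    save_path = out / f"output_{speaker_key}.wav"
    converter.convert(audio_src_path=str(src_path), src_se=source_se,
                      tgt_se=target_se, output_path=str(save_path))
    return save_path
```

Calling something like `clone_voice("Hello there.", "my_reference.mp3")` would then write the converted audio into `outputs_v2/`, assuming the checkpoints have been downloaded and `my_reference.mp3` is your own recording.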
