Base model for zero shot speech generation #263

Open
cjohn001 opened this issue Jun 6, 2024 · 1 comment

Comments


cjohn001 commented Jun 6, 2024

Hello everyone,
I am currently trying to use OpenVoice for German speech generation, but I have not been able to figure out how zero-shot speech synthesis is supposed to work. Is some kind of multilingual base model missing? When I use one of the language-specific base models, the output sounds weird.

It would also be interesting if someone could explain how the different emotions/speech styles can be controlled. The documentation of the API could benefit from some more examples.

@Vicopem01

The text-to-speech synthesis in v1 is powered by the OpenAI TTS system; v2 uses MeloTTS. In my experience, v2 sounds better.

On first run, the models are loaded onto your system automatically, and OpenVoice then performs tone color conversion on the synthesized audio. Here is the demo setup
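
For readers without the demo at hand, the two-stage flow described above (base TTS first, then tone color conversion) can be sketched roughly as follows. This is a minimal sketch, not the maintainers' reference code. It assumes the call pattern from the OpenVoice v2 demo notebook (`ToneColorConverter`, `se_extractor.get_se`, MeloTTS `TTS.tts_to_file`) and a `checkpoints_v2` directory that has already been downloaded and unpacked. The helper names `base_speaker_se_path` and `clone_voice` are made up for illustration, and `"EN"` is used only as an example language code; this thread does not establish whether a German base speaker exists.

```python
import os


def base_speaker_se_path(speaker_key, ckpt_root="checkpoints_v2"):
    """Return the path of a base speaker's precomputed tone color embedding.

    Assumed layout (from the v2 checkpoints): <root>/base_speakers/ses/<key>.pth,
    where <key> is the MeloTTS speaker name lowercased with '_' replaced by '-'.
    """
    key = speaker_key.lower().replace("_", "-")
    return f"{ckpt_root}/base_speakers/ses/{key}.pth"


def clone_voice(text, reference_audio, language="EN",
                ckpt_root="checkpoints_v2", out_dir="outputs_v2"):
    """Synthesize `text` with a MeloTTS base speaker, then convert its tone color.

    Hypothetical helper. Heavy imports are deferred so that merely defining
    this function does not require torch, MeloTTS, or OpenVoice to be installed.
    """
    import torch
    from melo.api import TTS
    from openvoice import se_extractor
    from openvoice.api import ToneColorConverter

    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    os.makedirs(out_dir, exist_ok=True)

    # Stage 0: load the tone color converter and embed the reference voice.
    converter = ToneColorConverter(f"{ckpt_root}/converter/config.json", device=device)
    converter.load_ckpt(f"{ckpt_root}/converter/checkpoint.pth")
    target_se, _ = se_extractor.get_se(reference_audio, converter, vad=False)

    # Stage 1: base TTS with MeloTTS (models are fetched on first use).
    tts = TTS(language=language, device=device)
    speaker_ids = tts.hps.data.spk2id

    results = []
    for speaker_key in speaker_ids.keys():
        tmp_path = f"{out_dir}/tmp.wav"
        out_path = f"{out_dir}/output_{speaker_key}.wav"
        tts.tts_to_file(text, speaker_ids[speaker_key], tmp_path, speed=1.0)

        # Stage 2: map the base speaker's tone color onto the reference voice.
        source_se = torch.load(base_speaker_se_path(speaker_key, ckpt_root),
                               map_location=device)
        converter.convert(audio_src_path=tmp_path, src_se=source_se,
                          tgt_se=target_se, output_path=out_path)
        results.append(out_path)
    return results
```

Calling `clone_voice("Hello there.", "reference.mp3")` would then write one converted file per base speaker into `outputs_v2`, assuming the checkpoints and a reference recording are in place. Note that the converter only changes tone color; accent and pronunciation come from the base TTS model, which may be why a mismatched language-specific base model sounds odd.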
