Please make a simple gradio app that supports text to speech, 0 shot voice cloning, and true training for voice cloning #2

Open
FurkanGozukara opened this issue Feb 6, 2024 · 12 comments
Labels
feature request New feature or request

Comments

@FurkanGozukara

It shouldn't be hard for you. It can be ugly-looking and badly coded; it just needs to work.

@vatsalaggarwal
Member

vatsalaggarwal commented Feb 6, 2024 via email

Have you tried ttsdemo.themetavoice.xyz ?

@FurkanGozukara
Author

FurkanGozukara commented Feb 6, 2024

looking nice but i need source code to run locally

by the way 0 shot is bad as i expected

@sidroopdaska
Member

by the way 0 shot is bad as i expected

Can you share more details - what text(s) did you try, and what voice did you use (preset / custom upload) ?

looking nice but i need source code to run locally

You can run locally by doing the following:

  1. Set up the environment, as outlined here
  2. Run the script for local execution, as outlined here

@FurkanGozukara
Author

by the way 0 shot is bad as i expected

Can you share more details - what text(s) did you try, and what voice did you use (preset / custom upload) ?

looking nice but i need source code to run locally
You can run locally by doing the following:

  1. setup env, outlined here
  2. run script for local execution, outline here

hello where is gradio?

i gave this 5 min reference file

http://sndup.net/p2ct

I got much better results with coqui voice cloning

also this is the file it generated with that 5 min reference file

https://sndup.net/r99n/

I hate that we still can't attach .wav files to GitHub replies

@INF800

INF800 commented Feb 7, 2024

Hi @sidroopdaska, thanks for the amazing project.

I tried zero-shot voice cloning with my Indian accent, but I could not get the accent right; the output sounded more foreign.

Can you please tell me more about how to get it right for Indian accents?

Thanks,
Rakesh

@sidroopdaska
Member

Hey @INF800, we presently support zero shot voice cloning for American & British speakers only.
For an Indian accent, you will need to finetune. I would recommend 1-5 mins of your voice + LoRA.
Let us know if you need any help on getting started with this implementation

@sidroopdaska
Member

@FurkanGozukara

gradio

https://ttsdemo.themetavoice.xyz/
reference implementation: https://github.com/metavoiceio/metavoice-src/tree/main/fam/ui

could you share the result with xTTS so I can compare?

what do you find lacking in the speech with MetaVoice?

@INF800

INF800 commented Feb 10, 2024

Hey @INF800, we presently support zero shot voice cloning for American & British speakers only. For an indian accent, you will need to finetune. I would recommend 1-5 mins of your voice + LoRA. Let us know if you need any help on getting started with this implementation.

Definitely yes! If you can tell me how to get started it would be helpful.

@platform-kit

@sidroopdaska I'd love to train a LORA as well. Please share any relevant pointers on how to get started.

@paliacci

paliacci commented Feb 19, 2024

@sidroopdaska I'd love to train a LORA as well. Can't wait to integrate it into our projects. How can I get more help? My email: paliacci@aliyun.com

@vatsalaggarwal
Member

I've added some initial pointers to this here: #70 (comment)

vatsalaggarwal added the feature request label on Mar 12, 2024

@lucapericlp
Contributor

lucapericlp commented Mar 14, 2024

Hey @platform-kit / @paliacci / @INF800, we just published an initial approach for finetuning the last N transformer blocks of the first-stage LLM. Note that it'd be best to experiment with the hyperparameters in finetune_params.py, as we didn't determine optimal values (some people from the community were keen to contribute this portion). Let us know if you have any issues or if you're up for contributing to improving the finetuning (via a parameter sweep or otherwise)!

The next step to improve finetuning effectiveness is to add LoRA adapters for the first-stage LLM, which is being worked on here.
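
As a rough illustration of what "finetuning the last N transformer blocks" means, the sketch below picks out which parameters would stay trainable while everything else is frozen. The function name `select_trainable` and the `transformer.h.<index>.` parameter naming scheme are assumptions made for this example and are not taken from the MetaVoice code.

```python
import re

# Hypothetical naming scheme: "<prefix>.h.<block index>.<rest>".
# Real models may name their blocks differently.
BLOCK_PATTERN = re.compile(r"\.h\.(\d+)\.")


def select_trainable(param_names, num_blocks, last_n):
    """Return the parameter names that belong to the last `last_n` blocks."""
    first_trainable = max(num_blocks - last_n, 0)
    trainable = []
    for name in param_names:
        match = BLOCK_PATTERN.search(name)
        # Parameters outside any block (embeddings, final norm) stay frozen.
        if match and int(match.group(1)) >= first_trainable:
            trainable.append(name)
    return trainable


# Toy parameter list for a 4-block model (illustrative names only).
names = (
    ["transformer.wte.weight"]
    + [f"transformer.h.{i}.attn.weight" for i in range(4)]
    + ["transformer.ln_f.weight"]
)

print(select_trainable(names, num_blocks=4, last_n=2))
```

In a real training loop, the selected names would typically be the only parameters passed to the optimizer, with gradients disabled for the rest.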
