Male voice #11

Closed
loretoparisi opened this issue Jul 28, 2020 · 7 comments

Comments

@loretoparisi

First, thank you: I solved the issue I opened thanks to your support. As I understand it, both MelGAN (which I have tried) and WaveGlow (which I have not run yet) produce a female voice. To get a male voice, is it necessary to train the model from scratch, or to add support for a specific vocoder?

Thank you.

@ming024
Owner

ming024 commented Jul 29, 2020

@loretoparisi I think Universal Vocoding meets your requirement. I plan to support multi-speaker TTS, but it is not my top priority right now. You are welcome to fork this repository for a male-TTS implementation. I think both the f0_min for the PyWorld vocoder and some of the preprocessing parameters would have to be modified for male speakers.
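
To illustrate why the F0 floor matters for lower-pitched speakers, here is a minimal sketch in plain Python. It is a simplification, not how PyWorld works internally: it only counts how many frames of a contour fall inside a given search range. The function name `voiced_fraction`, the contour values, and both ranges are hypothetical and are not this repository's settings.

```python
def voiced_fraction(f0_hz, f0_min, f0_max):
    """Fraction of frames whose F0 lies inside [f0_min, f0_max]."""
    if not f0_hz:
        return 0.0
    kept = sum(1 for f in f0_hz if f0_min <= f <= f0_max)
    return kept / len(f0_hz)


# Hypothetical F0 contour (Hz) for a low-pitched speaker; not measured data.
male_f0 = [92.0, 98.5, 105.0, 110.0, 118.0, 125.0, 131.0, 140.0]

# Both search ranges are illustrative assumptions.
high_floor = voiced_fraction(male_f0, f0_min=120.0, f0_max=800.0)
low_floor = voiced_fraction(male_f0, f0_min=60.0, f0_max=400.0)

print(high_floor, low_floor)
```

With a floor set above much of the speaker's range, most frames fall outside the search window, which is why the floor would likely need to be lowered before preprocessing male speech.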

@shoegazerstella

shoegazerstella commented Jul 29, 2020

Hi @ming024,
I am trying what you suggested: using Universal Vocoder to synthesize audio from the output of FastSpeech2.

The output of FS2 has size:
torch.Size([1, 80, x])

while UV wants something like this as input:
torch.Size([1, x, 80])

I tried swapping the axes, but of course it didn't work; all I got was noise or silence.
So this confirms what you suggested: it may be worth retraining UV with the same parameters as FS2.
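
For reference, the axis swap between the two shapes quoted above can be sketched as follows. This is only an illustration with a random stand-in tensor; the helper name `to_time_major` is hypothetical and is not part of either project. Note that `transpose` keeps each frame's 80 mel values together, whereas a `reshape` or `view` to the same shape would scramble them. A correct swap still cannot compensate for mel spectrograms computed with different parameters than the vocoder was trained on.

```python
import torch

N_MELS = 80  # mel channels, as in the shapes quoted above


def to_time_major(mel: torch.Tensor) -> torch.Tensor:
    """Convert a (batch, n_mels, frames) tensor to (batch, frames, n_mels)."""
    if mel.dim() != 3 or mel.size(1) != N_MELS:
        raise ValueError(
            f"expected (batch, {N_MELS}, frames), got {tuple(mel.shape)}"
        )
    # transpose swaps the two axes; contiguous() copies the data into
    # memory laid out for the new shape.
    return mel.transpose(1, 2).contiguous()


# Random stand-in for a FastSpeech2 output with 123 frames (assumed length).
fs2_mel = torch.randn(1, N_MELS, 123)
uv_input = to_time_major(fs2_mel)

print(uv_input.shape)  # torch.Size([1, 123, 80])
```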

Another thing I tried was playing with the UV params, using the mel_spec computed from the wav file generated by FS2. I got some interesting changes in pitch, but nothing I can really use for my purpose of changing the speaker's voice.

If you have any other advice, please let me know. Thanks a lot!

[EDIT]
Also, do you have any thoughts on which approach fits FastSpeech best, voice cloning or voice conversion, either integrated into the model or used as a post-processing step?

@ming024
Owner

ming024 commented Jul 30, 2020

@shoegazerstella Actually, I haven't tried Universal Vocoding before, so I am not sure where the error comes from.

I think the decoder of FastSpeech is similar to the decoder of a voice conversion model. Some VC models use vector quantization or other tricks to learn a discrete embedding space, and find that it contains phonetic information. Maybe there is some way to combine the training of both tasks, whether through joint training, pretraining, or something else.

@carankt

carankt commented Dec 1, 2020

@ming024 Any idea what specific changes one needs to make for male voice cloning?

@joseluismoreira

@carankt Any updates on the male voice, or on how to do it, please?

@ming024
Owner

ming024 commented Feb 26, 2021

Guys, multi-speaker synthesis is supported now.

@ming024
Owner

ming024 commented May 26, 2021

closed #11

@ming024 ming024 closed this as completed May 26, 2021