Synthesis with other person out of RAVDESS #11

Open
hathubkhn opened this issue Aug 22, 2022 · 6 comments

Comments

@hathubkhn

Hello author,
First of all, thank you for sharing this repo; it is really nice.
I have two questions:

  1. I downloaded CMU data for a single speaker (100 audio clips), built a speaker embedding vector from it, and synthesized speech with that embedding. The quality is poor: I cannot make out any words.
  2. Do I need to fine-tune the deep-speaker model on my data to generate the speaker embedding?

Thank you

@keonlee9420
Owner

Hi @hathubkhn, thanks for your interest.

  1. There could be various reasons for that. Could you please share the TensorBoard logs and some audio samples with their mel-spectrograms?
  2. It might be needed, but it depends on the number of speakers and their characteristics.

@hathubkhn
Author

Hi,

  1. While waiting for your response, I tried fine-tuning on the LJSpeech data. I can now synthesize the sentence, but the quality is not high. I will attach my mel-spectrogram below; please help me find out how to improve it.
  2. I want to use your repo for voice cloning, but I am not sure whether it can do that, so, based on YourTTS, I added another loss for speaker similarity and am training from scratch. Is that possible?
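
For readers unfamiliar with the speaker-similarity loss mentioned in point 2, here is a minimal PyTorch sketch. It assumes the loss is the negative mean cosine similarity, scaled by a weight `alpha`, between speaker embeddings of the ground-truth and generated audio, which is how the YourTTS paper describes its speaker consistency loss. The function name, argument names, and tensor shapes are hypothetical and are not taken from this repo.

```python
import torch
import torch.nn.functional as F


def speaker_consistency_loss(ref_emb, gen_emb, alpha=1.0):
    """Negative mean cosine similarity between two batches of speaker embeddings.

    ref_emb: (batch, emb_dim) embeddings of the ground-truth audio (assumed shape)
    gen_emb: (batch, emb_dim) embeddings of the generated audio (assumed shape)
    alpha:   hypothetical weight controlling the strength of the loss
    """
    cos = F.cosine_similarity(ref_emb, gen_emb, dim=-1)  # (batch,)
    return -alpha * cos.mean()
```

For this loss to affect the synthesizer, gradients have to flow from `gen_emb` back through the speaker encoder to the generated audio, so the encoder must be differentiable even if its own weights are frozen.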

@hathubkhn
Author

[Attached image: kids_are_sitting_on_the_door_and_today_is_very_nice_I_want_to_go_out_Actor_001_neutral]

@hathubkhn
Author

Here is my training run, from scratch on LJSpeech with an added speaker loss (SCL, speaker consistency loss):
[Attached image: Screenshot 2022-09-02 at 9 09 47 AM]

@keonlee9420
Owner

Ah, so sorry for the late response. I thought I had replied to your comments.

  1. It might be due to the lightweight convolution. Replacing it with a standard transformer block should resolve the quality issue.
  2. Yes, provided the lambda (the weight of each loss) is tuned carefully through experiments.
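
The loss weighting in point 2 can be sketched as a weighted sum of the individual loss terms. This is only an illustration, not the repo's training code: the `combine_losses` helper, the loss names (`mel`, `scl`), and the weight values are all hypothetical.

```python
def combine_losses(losses, weights):
    """Return sum(lambda_i * loss_i) over all named loss terms.

    losses:  dict mapping a loss name to its value (float or tensor)
    weights: dict mapping the same names to their lambda weights
    """
    missing = set(losses) - set(weights)
    if missing:
        raise KeyError(f"no weight given for: {sorted(missing)}")
    return sum(weights[name] * value for name, value in losses.items())


# Hypothetical values: a reconstruction loss plus a speaker loss.
total = combine_losses({"mel": 2.0, "scl": -0.5}, {"mel": 1.0, "scl": 9.0})
```

In practice each lambda would be chosen by running a few experiments and checking that no single term dominates the total.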

@rsandx

rsandx commented Jul 30, 2023

@keonlee9420,

Regarding point 1 in your last comment, the potential fix for the quality issue: could you provide an example of replacing the lightweight convolution with a standard transformer block? Thanks
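
The thread has no answer from the owner, so the following is only a rough sketch of what such a replacement might look like, not the author's method. It wraps PyTorch's built-in `nn.TransformerEncoderLayer` (self-attention plus feed-forward) in a block that keeps the `(batch, time, d_model)` shape, so it could stand in wherever a shape-preserving lightweight-convolution block is used. The class name, the default sizes, and the masking convention are assumptions; the repo's actual layer interface may differ.

```python
import torch
import torch.nn as nn


class TransformerBlock(nn.Module):
    """Hypothetical shape-preserving block: (batch, time, d_model) in and out."""

    def __init__(self, d_model=256, n_heads=2, d_ff=1024, dropout=0.1):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(
            d_model=d_model,
            nhead=n_heads,
            dim_feedforward=d_ff,
            dropout=dropout,
            batch_first=True,  # inputs are (batch, time, d_model)
        )

    def forward(self, x, padding_mask=None):
        # padding_mask: bool tensor (batch, time); True marks padded frames
        out = self.layer(x, src_key_padding_mask=padding_mask)
        if padding_mask is not None:
            # Zero out padded frames so they do not leak into later layers
            out = out.masked_fill(padding_mask.unsqueeze(-1), 0.0)
        return out
```

Any checkpoint trained with the original layers would not load into the new block, so the model would need to be retrained after such a swap.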
