
Sharing Korean Glow-TTS Samples #27

Open
Joovvhan opened this issue Jul 14, 2020 · 5 comments

Comments

@Joovvhan

Dear contributors,

Thank you for sharing your great works.

I have successfully reproduced your result with the LJSpeech Dataset.

In addition, I have trained your model with the Korean Single Speaker Speech (KSS) dataset, using the G2PK grapheme-to-phoneme conversion module as a Korean cleaner.
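For readers unfamiliar with the setup, here is a minimal sketch of how a grapheme-to-phoneme converter can be wired in as a text cleaner. The `korean_cleaner` wrapper below is hypothetical; the actual cleaner interface in the repository may differ, and the `g2p` argument stands in for a callable such as g2pk's `G2p()`:

```python
# Sketch: wiring a grapheme-to-phoneme (G2P) converter into a TTS text
# cleaner. The g2p argument stands in for a callable such as g2pk's G2p();
# this wrapper is illustrative, not the repository's actual cleaner code.
def korean_cleaner(text, g2p):
    """Convert raw Korean graphemes to a phoneme string before tokenization."""
    phonemes = g2p(text)      # grapheme string -> phoneme string
    return phonemes.strip()   # normalize surrounding whitespace

# With the real package, usage would look roughly like:
#   from g2pk import G2p
#   cleaned = korean_cleaner("감사합니다", G2p())
```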

This is the link to the demo page.

I would be glad if you introduced my demo page in your README.

Thanks again for your great code.

@dathudeptrai

@Joovvhan Hi, I also trained on the Korean KSS dataset and made a Colab here (https://colab.research.google.com/drive/1ybWwOS5tipgPFttNulp77P6DAB5MtiuN?usp=sharing). It uses FastSpeech2 + MB-MelGAN. I'm not a native speaker; could you compare Glow-TTS + WaveGlow with our FastSpeech2 + MB-MelGAN? Thanks.

@Joovvhan
Author

Joovvhan commented Aug 8, 2020

@dathudeptrai Yes, I would be glad to.
From a quick look through your samples, the pre-trained model on the Colab is quite impressive.

Yet, since I am not an original author of Glow-TTS, I have not tuned any hyperparameters or introduced any audio-processing techniques to improve the audio quality.

I think it would make more sense to first apply the same techniques used in training and synthesizing your samples (FastSpeech2 + MB-MelGAN) to the Glow-TTS model, and then compare samples.

In addition, your pretrained MB-MelGAN seems better than the officially provided universal WaveGlow model at generating Korean speech. I found that WaveGlow produces a screeching sound when regenerating audio from ground-truth spectrograms; I have not heard any such artifacts in your samples.

I will share the link to the audio comparison page when I am done.

It would be better to continue any further discussion on your issue page.

Thanks.

@Joovvhan
Author

Dear authors,

I have improved my demo page by replacing the WaveGlow vocoder with the Multi-band MelGAN (MB-MelGAN) vocoder provided by the TensorFlowTTS authors.

I found out that the official universal WaveGlow vocoder is not so universal for the Korean language.

This is the link to the webpage.

I will leave the poorer sample page unchanged for anyone who would like to compare the effect of the vocoder.

@Joovvhan
Author

Dear contributors,

I have applied G2PK, a grapheme-to-phoneme conversion package, and achieved improved Korean TTS results.

This is the link to the demo page.

Since the original paper used phoneme tokens as inputs, I believe this result is closer to the intention of your original work.
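As a rough illustration of why phoneme inputs matter, the cleaned phoneme string is then mapped to integer token ids before being fed to the model. The symbol table below is a tiny hypothetical example, not Glow-TTS's actual Korean symbol set:

```python
# Hypothetical miniature phoneme symbol table; a real model's symbol set
# would cover the full Korean jamo inventory plus punctuation.
PHONEME_SYMBOLS = ["<pad>", " ", "ᄀ", "ᅡ", "ᆷ", "ᄉ"]
SYMBOL_TO_ID = {s: i for i, s in enumerate(PHONEME_SYMBOLS)}

def phonemes_to_ids(phoneme_text):
    """Map each phoneme character to its integer id, dropping unknown symbols."""
    return [SYMBOL_TO_ID[ch] for ch in phoneme_text if ch in SYMBOL_TO_ID]
```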

Thanks.

@v-nhandt21


Hi @Joovvhan, your demo sounds very good. But regarding G2PK for your language, how do you handle out-of-vocabulary words, such as English words or names of unknown places?
