VCTK datasets #1

Closed
XXXHUA opened this issue Jun 29, 2021 · 7 comments

Comments

@XXXHUA

XXXHUA commented Jun 29, 2021

Hi, I noticed that your paper evaluates the models' performance on the VCTK dataset, but I don't see the preprocessing files for VCTK. Could you share them? Thank you very much.

@keonlee9420
Owner

Hi @XXXHUA, actually I'm not the author of the paper 😅. Anyway, you can find what you want here. The model uses almost the same preprocessing on the VCTK dataset.

@XXXHUA
Author

XXXHUA commented Jul 13, 2021

Hi, thanks for your reply to my question about the VCTK dataset, but I've run into a new problem with LibriTTS. I loaded the pretrained models 100000.pth.tar and 600000.pth.tar from this repo, but the synthesis quality is very poor. Have you ever gotten better results?

@keonlee9420
Owner

Sorry for the late response. I think it is because the meta-learning part is not perfect yet. I've got to optimize it more and will share the result if I succeed. Stay tuned!

@keonlee9420
Owner

I fixed the optimizer and can now confirm that the meta learner is working. I'll share the pre-trained model soon.

@keonlee9420
Owner

@XXXHUA Also, try downsampling the training data from 22050 Hz to 16 kHz. It will speed up training and also improve the generated speech quality (compared to the same step of the current pre-trained models).
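
For reference, a minimal sketch of the downsampling step suggested above. It assumes the audio is already loaded as a NumPy array and that SciPy is installed. The names `resample_ratio` and `downsample` are hypothetical and not part of this repo; `scipy.signal.resample_poly` is the only library call used.

```python
from math import gcd
from typing import Tuple


def resample_ratio(src_hz: int, dst_hz: int) -> Tuple[int, int]:
    """Return the smallest integer (up, down) factors with dst/src == up/down."""
    g = gcd(src_hz, dst_hz)
    return dst_hz // g, src_hz // g


def downsample(wav, src_hz: int = 22050, dst_hz: int = 16000):
    """Resample a 1-D waveform with a polyphase filter (hypothetical helper)."""
    # Imported here so the ratio helper works even without SciPy installed.
    from scipy.signal import resample_poly

    up, down = resample_ratio(src_hz, dst_hz)
    return resample_poly(wav, up, down)
```

For 22050 Hz to 16000 Hz the ratio reduces to 320/441, so `resample_poly` upsamples by 320 and decimates by 441 with its built-in anti-aliasing filter.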

@XXXHUA
Author

XXXHUA commented Jul 26, 2021

Thanks for your series of detailed replies. I have tried using 16 kHz data with this model, but I guess the poor performance I saw results from the 24 kHz vocoder. Is the latest model trained on 16 kHz data?

@keonlee9420
Owner

keonlee9420 commented Jul 26, 2021

Ah, I didn't know that you used a 24 kHz vocoder, and yes, the mismatch between synthesizer and vocoder is definitely the cause of the poor quality. You may need to match them first. The latest one is also trained on 22050 Hz.
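
A small guard along these lines can catch the mismatch before synthesis. This is only a sketch: the function name and its parameters are hypothetical, and where the two sampling rates are actually stored depends on the synthesizer and vocoder configs in use.

```python
def check_sample_rates(synth_hz: int, vocoder_hz: int) -> None:
    """Raise if the synthesizer and vocoder assume different sampling rates."""
    if synth_hz != vocoder_hz:
        raise ValueError(
            f"Sampling-rate mismatch: synthesizer outputs mels for {synth_hz} Hz "
            f"but the vocoder expects {vocoder_hz} Hz"
        )


# Example: the setup described in this thread would fail the check.
try:
    check_sample_rates(22050, 24000)
except ValueError as err:
    print(err)
```

Matching the sampling rate is necessary but likely not sufficient on its own; other mel-spectrogram settings (for example hop length and number of mel bins) usually need to agree between the two models as well.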

XXXHUA closed this as completed Aug 2, 2021

2 participants