
How many hours of speech data are used? #9

Open
yangdongchao opened this issue Apr 2, 2024 · 5 comments
@yangdongchao

Hi, this is great work.
I want to ask: how many hours of data did you use to get the performance shown in your demo?

@ex3ndr
Owner

ex3ndr commented Apr 2, 2024 via email

Right now it was trained on LibriTTS-R, which is quite small, around 1k hours at most. I am in the process of preparing a 3 TB dataset that will be used for training the next iteration.

@rishikksh20

This model produces good voice quality and prosody for such a small amount of data. If we train this model on a good amount of multi-lingual data, we will get amazing speech quality. I am preparing a 1k Hindi-language dataset for training such a model, along with the already available English and other Latin-script datasets. The only limiting factor is running MFA on such a large speech dataset; maybe I will train the GPT duration predictor on a fairly small subset of the data and the main model on all of it.
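A minimal sketch of that split, assuming the Montreal Forced Aligner 2.x CLI (`mfa align`) and hypothetical corpus paths; the dictionary/acoustic model names are examples only, and the duration-predictor subset is just a random sample of the aligned utterances:

```python
import random
import subprocess
from pathlib import Path

# Hypothetical layout: wav files plus matching transcript files.
CORPUS_DIR = Path("corpus")
ALIGN_DIR = Path("alignments")  # MFA writes one TextGrid per utterance here

# 1) Forced alignment with the Montreal Forced Aligner (the expensive step
#    on a large corpus). Dictionary and acoustic model names are examples.
subprocess.run(
    ["mfa", "align", str(CORPUS_DIR), "english_us_arpa", "english_us_arpa", str(ALIGN_DIR)],
    check=True,
)

# 2) Use a small random subset of aligned utterances for the duration
#    predictor, and the full set for the main acoustic model.
textgrids = sorted(ALIGN_DIR.rglob("*.TextGrid"))
random.seed(0)
duration_subset = random.sample(textgrids, k=min(10_000, len(textgrids)))
main_model_files = textgrids

print(f"{len(duration_subset)} utterances for the duration predictor, "
      f"{len(main_model_files)} for the main model")
```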

@yangdongchao
Author

Right now it was trained on LibriTTS-R, which is quite small, around 1k hours at most. I am in the process of preparing a 3 TB dataset that will be used for training the next iteration.

Good job! Looking forward to your improved model trained on a large-scale dataset.
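For reference on the ~1k-hour figure above, a quick sketch (assuming the `soundfile` package and a hypothetical local directory of `.wav` files, e.g. an extracted LibriTTS-R copy) for tallying how many hours a corpus actually contains:

```python
import soundfile as sf
from pathlib import Path

# Hypothetical path to an extracted corpus, e.g. LibriTTS-R.
CORPUS_DIR = Path("LibriTTS_R")

# Sum per-file durations (in seconds) reported by libsndfile, then convert to hours.
total_seconds = sum(sf.info(str(p)).duration for p in CORPUS_DIR.rglob("*.wav"))
print(f"{total_seconds / 3600:.1f} hours of audio")
```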

@yangdongchao
Author

This model produces good voice quality and prosody for such a small amount of data. If we train this model on a good amount of multi-lingual data, we will get amazing speech quality. I am preparing a 1k Hindi-language dataset for training such a model, along with the already available English and other Latin-script datasets. The only limiting factor is running MFA on such a large speech dataset; maybe I will train the GPT duration predictor on a fairly small subset of the data and the main model on all of it.

Yes, I think the generated voice is good. I am also trying to reproduce it.

@ex3ndr
Owner

ex3ndr commented Apr 2, 2024

This model produces good voice quality and prosody for such a small amount of data. If we train this model on a good amount of multi-lingual data, we will get amazing speech quality. I am preparing a 1k Hindi-language dataset for training such a model, along with the already available English and other Latin-script datasets. The only limiting factor is running MFA on such a large speech dataset; maybe I will train the GPT duration predictor on a fairly small subset of the data and the main model on all of it.

I am doing the opposite: the GPT is already trained on a bigger dataset, but the audio model is not. The GPT is quite easy to train; I didn't even bother to tweak anything.
