I've recently begun experimenting with TTS, and to learn more about it I'd like to add my native language from the ground up. From what I understand, a substantial amount of recorded data is crucial for good results: 7-8 hours would be a good foundation, while around 24 hours seems ideal (that figure is based on LJ Speech; please correct me if I'm mistaken).
Before working with a voice actor, I'm considering using my own voice for a preliminary test. Would recording around 100 sentences be enough to synthesize a single word, regardless of quality (robotic or realistic)? If not, what would you recommend as the minimum dataset needed to generate one word that appears in the dataset? I would record it with Piper Recording Studio.
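To see how far a test recording session is from those hour targets, I'd tally the audio with something like the sketch below. It assumes the recordings are uncompressed WAV files under one directory. The `total_hours` helper and the `my_recordings` path are names I made up for illustration; they are not part of Piper.

```python
import wave
from pathlib import Path


def total_hours(wav_dir):
    """Sum the duration of every .wav file under wav_dir, in hours."""
    total_seconds = 0.0
    for path in sorted(Path(wav_dir).rglob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            # Duration of one file = number of frames / frames per second.
            total_seconds += wav.getnframes() / wav.getframerate()
    return total_seconds / 3600.0


if __name__ == "__main__":
    # "my_recordings" is a hypothetical directory name.
    print(f"Recorded so far: {total_hours('my_recordings'):.2f} hours")
```

Running it after each session gives a rough idea of whether the dataset is closer to 100 sentences' worth of audio or to the multi-hour range.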
I understand that it's generally preferable to train models based on existing ones, but since my native language isn't currently supported, I'm opting to start from scratch.
Thank you for your insights.
I'm not the developer, but just a question: what new language do you want to be supported in Piper? I'd like to know more about it. It looks like a very interesting project.