Change default Sample Rate and Signal to Standard for TTS dataset #2
Comments
Hi @Sadam1195, I have updated the library and added support for changing the default sample rate. Please let me know if you have any other suggestions.
Hi @hetpandya, sorry, I didn't get a chance to submit the PR I had already implemented. It's great that you fixed the punctuation issue in the new update; that was essential for building a TTS dataset. One other thing I found missing in this project: the way audio is split is not very intelligent. You should split the audio based on punctuation, such as a full stop, comma, question mark, or exclamation mark. Audio should be sliced at a pause, not in the middle of speech, where the cut breaks phonemes and vowels. Clips should also be no longer than 10 seconds: the longer the audio, the harder it is for models to align, so chunks of 1-10 seconds are ideal for a TTS dataset. You can add a flag like
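A minimal sketch of the punctuation-based splitting suggested in this comment, assuming the transcript is available as plain text. The helper name `split_on_punctuation` is hypothetical and is not part of the library; it only shows where the text boundaries would fall, not how the audio itself is cut.

```python
import re

# Hypothetical helper for illustration; not part of the library in this issue.
def split_on_punctuation(text):
    """Split text after '.', ',', '?' or '!' so each piece ends on a pause."""
    # Split on the whitespace that follows one of the punctuation marks,
    # keeping the mark attached to the piece that precedes it.
    pieces = re.split(r"(?<=[.,?!])\s+", text.strip())
    # Drop empty strings (for example, when the input is empty).
    return [piece for piece in pieces if piece]


if __name__ == "__main__":
    for piece in split_on_punctuation("Hello there, how are you? I am fine. Great!"):
        print(piece)
```

Each returned piece ends with one of the four marks, except possibly the last one if the text stops without punctuation.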
@Sadam1195 I understand the method for splitting and concatenating, but the approach you mentioned will be difficult, since I only know the beginning and end of each caption. I'll try to find a way, though. Also, as far as I know, YouTube captions are generated on the basis of pauses in the speech.
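One possible way to work with only the start and end time of each caption is to merge consecutive captions until the merged text ends on a punctuation mark, while keeping each chunk at or under 10 seconds. The sketch below assumes captions arrive in order as (start, end, text) records with times in seconds; the `Caption` class and `merge_captions` function are hypothetical names, not the library's API.

```python
from dataclasses import dataclass

# Hypothetical caption record; times are in seconds.
@dataclass
class Caption:
    start: float
    end: float
    text: str

PAUSE_MARKS = (".", ",", "?", "!")

def merge_captions(captions, max_seconds=10.0):
    """Merge consecutive captions into chunks that end on punctuation.

    A chunk is also closed early when adding the next caption would make it
    longer than max_seconds. A single caption that is already longer than
    max_seconds is kept as it is.
    """
    chunks = []
    current = None
    for cap in captions:
        # Close the open chunk if this caption would push it past the limit.
        if current is not None and cap.end - current.start > max_seconds:
            chunks.append(current)
            current = None
        if current is None:
            current = Caption(cap.start, cap.end, cap.text.strip())
        else:
            merged_text = current.text + " " + cap.text.strip()
            current = Caption(current.start, cap.end, merged_text)
        # Close the chunk when its text ends on a pause mark.
        if current.text.endswith(PAUSE_MARKS):
            chunks.append(current)
            current = None
    if current is not None:
        chunks.append(current)
    return chunks
```

Because the cut points are still caption boundaries, a chunk closed by the duration limit may not end on a pause; this approach only reduces mid-speech cuts and does not eliminate them.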
Hi, I have found two more issues in your code and fixed them. I will be sending a PR for those too.