Change default Sample Rate and Signal to Standard for TTS dataset #2

Closed
Sadam1195 opened this issue Apr 2, 2021 · 3 comments

@Sadam1195

Hi, I have found two more issues, which I have fixed in your code. I will be sending PRs for those too.

@hetpandya
Owner

Hi @Sadam1195, I have updated the library and added support for changing the default sample rate. Please let me know if you have any other suggestions.
Thanks

@Sadam1195
Author

Sadam1195 commented Jun 14, 2021

Hi @hetpandya, sorry I didn't get a chance to submit my PR, which I had already implemented. It's great that you fixed the punctuation issue in the new update; that was essential for building a TTS dataset.

One other thing I found missing in this project is that the audio splitting is not very intelligent. The audio should be split at punctuation, such as a full stop, comma, question mark, or exclamation mark, because a cut should fall on a pause rather than in the middle of speech, where it chops off phonemes and vowels. Clips should also be no longer than 10 seconds: the longer the audio, the harder it is for models to learn the alignment, and chunks of 1 to 10 seconds are ideal for a TTS dataset. You could add a flag such as max_time (e.g. 10 seconds); if there is no full stop within that window, it would be better to cut the audio at whichever other punctuation mark it contains, since otherwise the cut would chop off speech. Hope that makes sense.
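A minimal Python sketch of the splitting rule described above, assuming that only caption text and start/end times (in seconds) are available, so every cut falls on a caption boundary. The Caption type, the chunk_captions function, and the max_time parameter are hypothetical illustrations, not part of the library's API. Captions are grouped greedily into chunks of at most max_time seconds; when more audio follows, the chunk is cut after the last caption ending in a full stop, question mark, or exclamation mark, falling back to a comma, semicolon, or colon, and finally to the last caption that fits.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical caption record: caption text plus start/end times in seconds.
@dataclass
class Caption:
    text: str
    start: float
    end: float

SENTENCE_END = (".", "?", "!")  # preferred cut points
CLAUSE_END = (",", ";", ":")    # fallback cut points


def chunk_captions(captions: List[Caption], max_time: float = 10.0) -> List[List[Caption]]:
    """Group consecutive captions into chunks of at most max_time seconds.

    Cuts only at caption boundaries: after sentence-ending punctuation if
    possible, else after clause punctuation, else after the last caption
    that still fits. A single caption longer than max_time becomes its own
    chunk, because splitting inside it would need word-level timings.
    """
    chunks: List[List[Caption]] = []
    i, n = 0, len(captions)
    while i < n:
        # Extend j as far as the chunk captions[i..j] stays within max_time.
        j = i
        while j + 1 < n and captions[j + 1].end - captions[i].start <= max_time:
            j += 1
        cut = j
        if j + 1 < n:  # more audio follows, so pick the best cut point
            for marks in (SENTENCE_END, CLAUSE_END):
                candidates = [k for k in range(i, j + 1)
                              if captions[k].text.rstrip().endswith(marks)]
                if candidates:
                    cut = candidates[-1]  # latest punctuation in the window
                    break
        chunks.append(captions[i:cut + 1])
        i = cut + 1
    return chunks


if __name__ == "__main__":
    caps = [
        Caption("Hello there,", 0.0, 2.0),
        Caption("how are you?", 2.0, 4.0),
        Caption("I am fine and", 4.0, 7.0),
        Caption("doing well", 7.0, 9.0),
        Caption("today.", 9.0, 12.0),
    ]
    for chunk in chunk_captions(caps, max_time=10.0):
        print(chunk[0].start, chunk[-1].end, " ".join(c.text for c in chunk))
```

Here the first chunk ends after "how are you?" (0 to 4 s) rather than running on to 9 s mid-sentence, and the rest of the audio forms a second chunk (4 to 12 s). The start and end times of each chunk would then be used to slice the audio file.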

@hetpandya
Owner

@Sadam1195 I understand the idea of splitting and concatenating, but doing it the way you described will be difficult, since we only know the start and end times of each caption. I'll try to find a way, though. Also, as far as I know, YouTube captions are generated based on pauses in the speech.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants