I am building a dataset to train with and need to ask a few questions before proceeding.
What is the maximum supported or suggested audio length? Are files of several minutes all right, or should the audio be limited to about 20 seconds? Likewise, is there a reasonable limit on the length of the index?
Thank you.
ghost changed the title from "Supported audio/index length? Nested subfolders?" to "Supported audio/index length?" on Jan 1, 2022
Hello, this is a bit case-dependent. In my experience, long files are tricky to handle. For speech data it works best with audio under 20 seconds. The memory usage is O(N^2), where N is the file length, so it will quickly run out of memory for anything longer than a minute. Quality also degrades when there are long silences.
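To make the O(N^2) growth concrete, here is a rough back-of-the-envelope sketch. It assumes a single dense N x N matrix of 4-byte floats and 100 feature frames per second; both numbers are illustrative assumptions, not values taken from this tool, and the function name is hypothetical. The real constants may be considerably larger.

```python
def pairwise_matrix_bytes(duration_s, frames_per_second=100, bytes_per_cell=4):
    """Bytes needed for one dense N x N matrix over the frames of a file.

    frames_per_second and bytes_per_cell are assumed values for illustration.
    """
    n = int(duration_s * frames_per_second)  # N = number of frames
    return n * n * bytes_per_cell            # quadratic in the file length


for seconds in (20, 60, 300):
    megabytes = pairwise_matrix_bytes(seconds) / 1e6
    print(f"{seconds:>4} s -> {megabytes:.0f} MB")
```

Under these assumptions, doubling the file length quadruples the memory, which is why short clips are so much easier to handle than multi-minute recordings.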
There are tricks to tame long audio files. For example, you can do the alignment in two steps. First, drop frames from the acoustic features and do a rough alignment. Then use this initial alignment to cut the file into smaller pieces and do a full-scale alignment on each of them.
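The two-step idea can be sketched roughly as follows. This is not the tool's actual implementation: plain dynamic time warping (DTW) stands in for whatever aligner is really used, and `dtw_path`, `two_step_align`, `stride`, and `n_chunks` are hypothetical names chosen for illustration. Each sequence is a list of equal-length feature tuples.

```python
import math


def dtw_path(a, b):
    """Plain DTW between two feature sequences; returns a list of (i, j) pairs."""
    n, m = len(a), len(b)
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            acc[i][j] = d + min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])
    # Backtrack from the last cell to the first one.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        _, i, j = min((acc[i - 1][j - 1], i - 1, j - 1),
                      (acc[i - 1][j], i - 1, j),
                      (acc[i][j - 1], i, j - 1))
        path.append((i - 1, j - 1))
    return path[::-1]


def two_step_align(a, b, stride=4, n_chunks=4):
    """Coarse alignment on subsampled frames, then full alignment per piece."""
    # Step 1: drop frames (keep every `stride`-th one) and align roughly.
    coarse = dtw_path(a[::stride], b[::stride])
    # Pick evenly spaced anchors on the coarse path as cut points,
    # mapped back to full-resolution frame indices.
    cuts = [(0, 0)]
    for c in range(1, n_chunks):
        ci, cj = coarse[round(c * (len(coarse) - 1) / n_chunks)]
        cut = (ci * stride, cj * stride)
        # Keep only cuts that leave a non-empty piece on both sides.
        if cut[0] > cuts[-1][0] and cut[1] > cuts[-1][1]:
            cuts.append(cut)
    cuts.append((len(a), len(b)))
    # Step 2: full-resolution alignment inside each piece.
    path = []
    for (a0, b0), (a1, b1) in zip(cuts, cuts[1:]):
        piece = dtw_path(a[a0:a1], b[b0:b1])
        path.extend((a0 + i, b0 + j) for i, j in piece)
    return path


# Toy usage: b is a with every frame repeated twice.
a = [(math.sin(0.3 * t), math.cos(0.3 * t)) for t in range(40)]
b = [frame for frame in a for _ in range(2)]
path = two_step_align(a, b)
```

The coarse pass costs roughly 1/stride^2 of the full matrix, and each piece only needs a matrix the size of that piece, so peak memory drops sharply. The trade-off is that the cut points are fixed by the rough alignment, so an error there cannot be corrected in the second step.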
I see. I should have clarified that I was asking about training and not just alignment.
The audio was way too long, several minutes per file, and while I wasn't running out of memory, the training wasn't progressing. I roughly segmented the audio with another tool and it worked as expected.
Thank you very much for the advice, especially the suggested tricks.