Dataset licence #31

C00reNUT · 2022-05-06T19:12:04Z

Hello,
thank you for making this amazing TTS model public. It is by far the best-quality TTS model I have tried so far.

I would like to ask you about the licensing of the dataset you used for training. Am I guessing correctly that you used your own selection of LibriVox recordings?

I'm asking just to be sure that I can use the outputs in a commercial setting, since all LibriVox recordings are in the public domain.

neonbjb · 2022-05-06T20:47:45Z

The dataset consists of thousands of audiobooks and podcasts that were scraped from the web. Many are copyrighted, which is why I am not releasing the dataset.

If you know or believe the laws in your jurisdiction will consider ML models as extensions of their datasets, then you should consider Tortoise license-encumbered and should not use it for commercial purposes.

C00reNUT · 2022-05-06T21:14:46Z

Thank you for the clarification.

Just one more thing. I am asking because I would like to use the train_ voices:

> This repo comes with several pre-packaged voices. Voices prepended with "train_" came from the training set and perform far better than the others. If your goal is high quality speech, I recommend you pick one of them. If you want to see what Tortoise can do for zero-shot mimicing, take a look at the others.

I just want to be sure that they are not an 'exact' 1:1 copy of the original voice. The model's generalization might be fine under the law, but I wouldn't be so sure about an exact voice match.

neonbjb · 2022-05-06T21:20:52Z

This is a good point. You should not use any of the pre-packaged voices for business purposes for the time being. I will re-open this and investigate which voices have copyrights attached to them and remove them.

neonbjb · 2022-05-06T21:22:14Z

FYI: LibriTTS and HiFiTTS datasets were used to train Tortoise. If you are looking for license-free voices that will work very well with this program, use one of those.

C00reNUT · 2022-05-06T21:30:32Z

> FYI: LibriTTS and HiFiTTS datasets were used to train Tortoise. If you are looking for license-free voices that will work very well with this program, use one of those.

Excellent, that is very valuable information. There should be plenty of public domain options; finding a good one will just take a bit of hit-or-miss trial.

Aspie96 · 2022-07-11T03:44:50Z

Just as a (probably) dumb (related) question: is there any reason to favour those datasets over LibriSpeech or some other dataset based on LibriVox (maybe a public domain one, since LibriSpeech is not exactly public domain)?

neonbjb · 2022-07-11T21:19:06Z

Not a dumb question; this is something that took me some pain to figure out. ASR-focused datasets are often poor for TTS because they are missing punctuation and have bad splitting (e.g. not split on sentences). These are both important cues for a TTS system. Both of these apply to LibriSpeech.

I believe LibriSpeech intersects with LibriTTS, so the model should work equally well with voices from either dataset.
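
To illustrate the punctuation point above, here is a minimal sketch of a heuristic that flags transcripts that look ASR-style (no sentence punctuation, no lowercase letters). It is not part of Tortoise: the function name, the threshold, and the sample lines are all hypothetical, and it does not check the sentence-splitting issue.

```python
def looks_asr_style(transcripts, min_cue_ratio=0.5):
    """Return True when most transcripts lack TTS-relevant text cues.

    Hypothetical heuristic: a transcript "has cues" if it contains at least
    one sentence punctuation mark and at least one lowercase letter.
    """
    if not transcripts:
        raise ValueError("need at least one transcript")

    sentence_punct = set(".,!?;:")

    def has_cues(text):
        has_punct = any(ch in sentence_punct for ch in text)
        has_lower = any(ch.islower() for ch in text)
        return has_punct and has_lower

    with_cues = sum(1 for t in transcripts if has_cues(t))
    return with_cues / len(transcripts) < min_cue_ratio


# Made-up sample lines in the two styles (not taken from either dataset).
asr_like = ["HE HOPED THERE WOULD BE STEW FOR DINNER"]
tts_like = ["He hoped there would be stew for dinner."]
print(looks_asr_style(asr_like), looks_asr_style(tts_like))
```

A check like this only says whether a transcript file carries punctuation and casing; whether a given clip makes a good voice still has to be judged by listening.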

neonbjb closed this as completed May 6, 2022

neonbjb reopened this May 6, 2022