Text To Speech With Voice Cloning Feature (by neonsecret)

This repository is an implementation of Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis (SV2TTS), with custom tweaks.

Differences from parent

Cleaned up the TensorFlow requirement
Added Russian language support (plus a guide on how to add other languages)
Code reworks
Bilingual and single-language models available

Setup

0. Delete everything from the directory and clone it from scratch

git clone https://github.com/neonsecret/TTS-With-Voice-Cloning-Multilang

or, if you have already cloned it:

cd TTS-With-Voice-Cloning-Multilang
git pull

1. Install Requirements

Both Windows and Linux are supported. A GPU is not strictly required, but without one training anything will take a very long time.
Install ffmpeg. This is necessary for reading audio files.
Install PyTorch. Pick the latest stable version, your operating system, your package manager (pip by default) and finally pick any of the proposed CUDA versions if you have a GPU, otherwise pick CPU. Run the given command.
Install nvidia-pyindex by running pip install nvidia-pyindex
Install the remaining requirements with pip install -r requirements.txt
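
As a quick sanity check after the install steps above, the following sketch reports whether ffmpeg is on the PATH, whether PyTorch can be imported, and whether PyTorch sees a CUDA device. It is not part of this repository; the function name `check_environment` is made up for illustration.

```python
import importlib.util
import shutil


def check_environment():
    """Return a dict saying which prerequisites were found (hypothetical helper)."""
    report = {
        # ffmpeg has to be on PATH so that audio files can be read
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # PyTorch has to be installed in the current environment
        "torch": importlib.util.find_spec("torch") is not None,
        # Filled in below, only if PyTorch imports cleanly
        "cuda": False,
    }
    if report["torch"]:
        try:
            import torch
            report["cuda"] = bool(torch.cuda.is_available())
        except (ImportError, OSError):
            # A broken install counts as "not available"
            report["torch"] = False
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```

If `cuda` is reported as missing, everything should still run on the CPU, only much more slowly.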

2. Train the models yourself.

No pretrained models are available. If you succeed in training one, please send it to my email (see my GitHub profile).

3. Download Datasets (only if you plan on training the model)

I only recommend downloading LibriSpeech/train-clean-100. Extract the contents as <datasets_root>/LibriSpeech/train-clean-100 where <datasets_root> is a directory of your choosing. Other datasets are supported in the toolbox, see here. You're free not to download any dataset, but then you will need your own data as audio files or you will have to record it with the toolbox. For other language dataset construction, see below.

4. Other languages dataset

To use a custom dataset, unpack all files into <datasets_root> with the following structure:

datasets_root
    * LibriTTS
        * train-clean-100
            * speaker-001
                * book-001
                    * utterance-001.wav
                    * utterance-001.txt
                    * utterance-002.wav
                    * utterance-002.txt
                    * utterance-003.wav
                    * utterance-003.txt

Where each utterance-###.wav is a short utterance (2-10 sec) and the utterance-###.txt contains the corresponding transcript. Then you can process this dataset using:

python synthesizer_preprocess_audio.py datasets_root --datasets_name LibriTTS --subfolders train-clean-100 --no_alignments
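
Before running the preprocessing command, it can help to check that the layout matches the structure above. The sketch below is not part of this repository: `find_dataset_problems` is a hypothetical helper that assumes PCM .wav files and UTF-8 transcripts, and it flags missing or empty transcripts and utterances outside the 2-10 second range.

```python
import wave
from pathlib import Path


def find_dataset_problems(subset_dir, min_sec=2.0, max_sec=10.0):
    """Return a list of problems found under subset_dir (hypothetical helper).

    subset_dir is expected to look like <datasets_root>/LibriTTS/train-clean-100,
    i.e. speaker/book/utterance.wav with a matching utterance.txt next to it.
    """
    problems = []
    for wav_path in sorted(Path(subset_dir).glob("*/*/*.wav")):
        # Every utterance needs a non-empty transcript with the same stem
        txt_path = wav_path.with_suffix(".txt")
        if not txt_path.is_file():
            problems.append(f"missing transcript: {txt_path}")
        elif not txt_path.read_text(encoding="utf-8").strip():
            problems.append(f"empty transcript: {txt_path}")
        # Duration = number of frames / frames per second
        try:
            with wave.open(str(wav_path), "rb") as wav:
                duration = wav.getnframes() / wav.getframerate()
        except (wave.Error, EOFError, ZeroDivisionError) as exc:
            problems.append(f"unreadable wav: {wav_path} ({exc})")
            continue
        if not min_sec <= duration <= max_sec:
            problems.append(f"duration {duration:.1f}s out of range: {wav_path}")
    return problems


if __name__ == "__main__":
    import sys
    for line in find_dataset_problems(sys.argv[1]):
        print(line)
```

Pass it the subset folder, for example datasets_root/LibriTTS/train-clean-100. An empty output means no problems were detected by these checks.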

More info here.

4.1 Using the open_stt datasets (Russian)

If you use the datasets from here, you can preprocess them for synthesizer training with the following command:

python rus_opus_preprocess.py -d <datasets_root>

and then go back to step 4. This just renames the folders and converts the .opus files to .wav. I also tweaked the original preprocessor to detect .opus files, so you can simply put the dataset into the LibriTTS/train-clean-100/ folder and skip the conversion to .wav.
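
For readers who want to convert .opus files by hand, here is a minimal sketch of the conversion step using the ffmpeg command-line tool. It is not the code of rus_opus_preprocess.py and does not rename any folders; the helper names (`wav_target`, `ffmpeg_command`, `convert_tree`) are made up for illustration, and ffmpeg is assumed to be on the PATH.

```python
import subprocess
from pathlib import Path


def wav_target(opus_path):
    """Map some/dir/name.opus to some/dir/name.wav."""
    return Path(opus_path).with_suffix(".wav")


def ffmpeg_command(opus_path):
    """Build the ffmpeg call that decodes one .opus file to .wav.

    -y overwrites an existing output file, -i names the input file.
    """
    return ["ffmpeg", "-y", "-i", str(opus_path), str(wav_target(opus_path))]


def convert_tree(root):
    """Convert every .opus file under root; return how many were converted."""
    count = 0
    for opus_path in sorted(Path(root).rglob("*.opus")):
        # check=True raises CalledProcessError if ffmpeg fails on a file
        subprocess.run(ffmpeg_command(opus_path), check=True)
        count += 1
    return count


if __name__ == "__main__":
    import sys
    print(f"converted {convert_tree(sys.argv[1])} files")
```

The original .opus files are left in place, so the conversion can be repeated safely.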

4.2 Training

Not really necessary, unless you plan to add another language or fine-tune pretrained models.
Example synthesizer train command:

python synthesizer_train.py rusmodel SV2TTS\synthesizer

5. Run the demo

python demo_cli.py

If you only have a CPU:

python demo_cli.py --cpu
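
The choice between the two commands above can also be made automatically. The following launcher is a sketch only and is not shipped with this repository; `gpu_available` and `demo_command` are hypothetical helpers, and the script is assumed to be run from the repository root.

```python
import importlib.util
import subprocess
import sys


def gpu_available():
    """Return True only if PyTorch is installed and reports a CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch
    return bool(torch.cuda.is_available())


def demo_command(has_gpu):
    """Build the demo command, adding --cpu when no GPU is available."""
    cmd = [sys.executable, "demo_cli.py"]
    if not has_gpu:
        cmd.append("--cpu")
    return cmd


if __name__ == "__main__":
    subprocess.run(demo_command(gpu_available()), check=True)
```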

