A place for all things voice cloning. Make a PR!
This is the main Synthesis Colab
This is the simplified Synthesis Colab
This is supposedly a newer version of the simplified Synthesis Colab
For the sake of completeness, this is the training colab
It's worth noting that the cookiePPP training colab has (what I believe is) a major improvement over mine: an integrated grapheme-to-phoneme system, so the model can learn from phonemes instead of stupid nonstandard English spellings. I believe this will only work with English transcripts.
And another link: this is my fully functional Colab notebook for Tacotron2 training and synthesis, with explanatory notes. No hardware required: it trains your model on Google's free GPUs and saves the output to your Google Drive. The most complicated part is prepping your dataset before upload. It's currently set up to train from the LJSpeech-trained model, on 22050 Hz wav files with 16-bit PCM encoding. (See the dataset section for help with this.)
You can run this tensorboard in parallel with the Tacotron2 for Dummies notebook to check the progress of your model. You will have to use "Factory Reset Runtime" every time you want the tensorboard to pick up new progress. This is a GREAT way to visualize what's going on with your model, and much more useful than the alignment charts that the training colab spits out.
Below is a hastily coded Python script to convert graphemes to phonemes in filelists already prepped for TT2 training. Basically, it takes each line of the form <filename.wav|transcription> and converts the transcription segment into IPA characters. This means the model shouldn't get confused by words that don't sound the way they're written, and in general it should learn better.
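The original script is the one linked above; as a rough illustration of the idea, here is a minimal sketch of that filelist conversion. The tiny WORD_TO_IPA dict is a hypothetical stand-in — a real version would call an actual G2P library (e.g. the phonemizer package) instead of a lookup table.

```python
# Sketch of a grapheme-to-phoneme filelist converter for TT2-style
# "filename.wav|transcription" filelists. WORD_TO_IPA is a hypothetical
# stand-in for a real G2P backend.

WORD_TO_IPA = {
    "hello": "h\u0259\u02c8lo\u028a",
    "world": "w\u025c\u02d0ld",
}

def phonemize_word(word: str) -> str:
    """Look up a word's IPA; fall back to the original spelling."""
    return WORD_TO_IPA.get(word.lower(), word)

def convert_line(line: str) -> str:
    """Convert one 'filename.wav|transcription' line to IPA."""
    path, transcription = line.rstrip("\n").split("|", 1)
    ipa = " ".join(phonemize_word(w) for w in transcription.split())
    return f"{path}|{ipa}"

def convert_filelist(in_path: str, out_path: str) -> None:
    """Rewrite a whole filelist, phonemizing each transcription."""
    with open(in_path, encoding="utf-8") as src, \
         open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            if line.strip():
                dst.write(convert_line(line) + "\n")
```

So a line like `audio1.wav|hello world` comes out with the transcription half replaced by IPA, while the wav path is left untouched.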
Noice's Watson Speech To Text Tool
Use ffmpeg to convert your wav files to the right format:
ffmpeg -y -i "$filename" -ac 1 -acodec pcm_s16le -ar 22050 -sample_fmt s16 "converted/$filename"
Or, on a whole directory:
#!/bin/bash
mkdir -p converted
for filename in *.wav; do
    echo "Converting $filename"
    ffmpeg -y -i "$filename" -ac 1 -acodec pcm_s16le -ar 22050 -sample_fmt s16 "converted/$filename"
done
LJSpeech Dataset: Old Reliable
VoxCeleb: 2000+ hours of celebrity utterances from 7000+ speakers. Audio is captured "in the wild," including background noise.
TED-LIUM: 452 hours of audio and aligned transcripts from TED talks.
LibriSpeech: 1000+ hour dataset of read English speech based on public domain audiobooks.