# **MLP Synthesis**
<img src="https://media2.giphy.com/media/Ld82Eprf6gkLe/giphy.gif" width="200px" alt="yay!">


### Works on TensorFlow 2!

-----------------------------------------------------------
Here you can test out the TTS voices. If you use Google Colab, open the notebook in Playground mode (you need to be signed in), follow the instructions in each section, and run each code cell by hovering over it and clicking the play button.

**Note**: The models sometimes struggle to pronounce certain words. When that happens, write the word in ARPAbet. You can also try spelling a word differently to get the TTS to pronounce it correctly.

Make sure to wrap the ARPAbet in curly braces, like this: {T EH1 S T}
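
As a rough illustration of the brace convention, the sketch below swaps known words for their ARPAbet spelling and wraps each one in curly braces. The `wrap_arpabet` helper and the one-entry `EXAMPLE_DICT` are hypothetical names made up for this example; the notebook itself does the conversion with `ARPA` from `colab_utils` and a much larger dictionary.

```python
import re

# Hypothetical one-entry dictionary, for illustration only.
EXAMPLE_DICT = {"test": "T EH1 S T"}

def wrap_arpabet(text, pronunciations):
    """Replace words found in `pronunciations` with their ARPAbet in curly braces."""
    def replace(match):
        word = match.group(0)
        phonemes = pronunciations.get(word.lower())
        # Unknown words are left as plain text.
        return "{" + phonemes + "}" if phonemes else word
    return re.sub(r"[A-Za-z']+", replace, text)

print(wrap_arpabet("This is a test.", EXAMPLE_DICT))  # This is a {T EH1 S T}.
```

Words that are not in the dictionary pass through unchanged, so you can mix plain text and ARPAbet in the same line.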

**ARPAbet Translator (Show Lexical Stress)**: http://www.speech.cs.cmu.edu/cgi-bin/cmudict

OR

**Merged Dictionary**: https://drive.google.com/open?id=13ciybUBArMtk4fBPcQnVIjkFrGzpkN0E

-----------------------------------------------------------

Original notebook made by Cookie. Upgraded by g.d.

## Run cells below

In [None]:
%matplotlib inline
print('Your GPU is:')
!nvidia-smi --query-gpu=gpu_name,memory.total --format=csv
import os
os.getcwd()
if os.path.exists('train.py') and os.path.exists('hparams.py'):
    %cd ..

### Installation

In [3]:
# If you are editing the project source, run this cell. It automatically reloads modules that were modified outside this notebook.
%load_ext autoreload
%autoreload 2

In [4]:
import os
from os.path import exists, join, basename, splitext
import sys
import time
import matplotlib
import matplotlib.pyplot as plt
import gdown
import IPython.display as ipd
import numpy as np
import torch
import soundfile as sf

In [None]:
!pip install -q gdown
!pip install -q matplotlib
!pip install -q librosa
!pip install -q unidecode
!pip install -q mega.py
!pip install -q PySoundFile

In [6]:
git_repo_url = 'https://github.com/ghostdancing/tacotron2-tf2.git'
project_name = splitext(basename(git_repo_url))[0]    
!git clone {git_repo_url}

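# exist_ok=True lets this cell be re-run without failing on existing directories
os.makedirs(join(project_name, 'models'), exist_ok=True)
os.makedirs(join(project_name, 'infer'), exist_ok=True)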

sys.path.append(join(project_name, 'waveglow/'))
sys.path.append(project_name)

gdrive_prefix = 'https://drive.google.com/uc?id='

In [7]:
# install mega, works on windows too
if not exists('mega.py'):
    print('Downloading mega.py repo')
    !git clone https://github.com/ghostdancing/mega.py
os.chdir('mega.py/src/')
from mega import Mega
os.chdir('../../')

In [31]:
from hparams import create_hparams
from model import Tacotron2
from layers import TacotronSTFT
from audio_processing import griffin_lim
from text import text_to_sequence
from denoiser import Denoiser
from unidecode import unidecode
from random import choice
import librosa
from colab_utils import ARPA, plot_data, waveglow_gdrive_ids
import colab_utils
import shutil
import soundfile as sf

mega = Mega()
mega_login = mega.login() # anonymous account
mega_login.download_url(colab_utils.arpa_dictionary_mega_url, project_name)

if not exists(join(project_name, 'merged.dict.txt')):
    colab_utils.download_arpa_dict(project_name)
thisdict = colab_utils.get_arpa_dict(join(project_name, 'merged.dict.txt'))

## Setup Tacotron 2 Model
----

In [None]:
last_model_id = None
def load_tacotron2(model_id = r'1Q4lNU_qwiKZvjKZ4kIalt6DACkZRdNPj'):
    global last_model_id, model, hparams
    if model_id == last_model_id:
        return
    last_model_id = model_id
    # Download Tacotron 2 Model
    force_download_TT2 = False
    tacotron2_pretrained_model = join(project_name, 'models', model_id)
    if not exists(tacotron2_pretrained_model) or force_download_TT2:
        gdown.download(gdrive_prefix + model_id, tacotron2_pretrained_model, quiet=False); print("Tacotron2 Model Downloaded")

    # Setup Parameters
    hparams = create_hparams()
    hparams.sampling_rate = 48000
    hparams.max_decoder_steps = 3000 # how many steps before cutting off generation, too many and you may get CUDA errors.
    hparams.gate_threshold = 0.30 # Model must be 30% sure the clip is over before ending generation
    # Load Tacotron2 model into GPU

    state = torch.load(tacotron2_pretrained_model)
    model = Tacotron2(hparams)
    model.load_state_dict(state['state_dict'])
    _ = model.cuda().eval().half()
    print(f"This Tacotron model has been trained for {state['iteration']} iterations.")

load_tacotron2()

## Setup WaveGlow Model
----
**This section does not need to be modified unless you see "`WaveGlow failed to download on all ID's provided`" on the output.**

---

Google may currently deny the download, presumably because the file has been downloaded too many times.
Go to [this](https://drive.google.com/uc?id=1p-GmnYiSS9UsRjw13kkhbIoD0-9t8ALH) link and make a copy of the file in your own Drive. Click 'Get shareable link' on the copy and extract the ID from the link, for example:

`https://drive.google.com/open?id=1DMyL3RxFqAVhH60VCLnVaDt2YJb2RCfz`

In this example, the ID is `1DMyL3RxFqAVhH60VCLnVaDt2YJb2RCfz`. Add it to the `waveglow_ids` list, or replace one of the IDs already there.
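
If you would rather not pick the ID out by hand, a small helper can do it. This is only a sketch: `extract_drive_id` is a hypothetical function, not part of the notebook or of `gdown`, and it assumes the link is either of the `?id=<ID>` form shown above or of the `/file/d/<ID>/view` form.

```python
from urllib.parse import urlparse, parse_qs

def extract_drive_id(share_url):
    """Return the file ID from a Google Drive link, or None if none is found."""
    parsed = urlparse(share_url)
    # Links of the form .../open?id=<ID> or .../uc?id=<ID>
    query_id = parse_qs(parsed.query).get("id")
    if query_id:
        return query_id[0]
    # Links of the form .../file/d/<ID>/view
    parts = parsed.path.split("/")
    if "d" in parts and parts.index("d") + 1 < len(parts):
        return parts[parts.index("d") + 1] or None
    return None

print(extract_drive_id("https://drive.google.com/open?id=1DMyL3RxFqAVhH60VCLnVaDt2YJb2RCfz"))
```

The returned string is what goes into the ID list used by the download cell.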

In [None]:
# Download WaveGlow Model
waveglow_pretrained_model = join(project_name, 'models/waveglow.pt')
waveglow_ids = list(waveglow_gdrive_ids)  # copy, so failed IDs are not removed from the shared list
while not exists(waveglow_pretrained_model) and waveglow_ids:
    candidate_id = choice(waveglow_ids)  # renamed to avoid shadowing the built-in id()
    gdown.download(gdrive_prefix + candidate_id, waveglow_pretrained_model, quiet=False)
    if not exists(waveglow_pretrained_model):
        print("Download failed, attempting another ID")
        waveglow_ids.remove(candidate_id)

if exists(waveglow_pretrained_model): print("WaveGlow Downloaded")
else: print("WaveGlow failed to download on all ID's provided")

# Load WaveGlow model into GPU
state = torch.load(waveglow_pretrained_model)
waveglow = state['model']
waveglow.cuda().eval().half()
for k in waveglow.convinv:
    k.float()
denoiser = Denoiser(waveglow)
print(f"This WaveGlow model has been trained for {state['iteration']} iterations.")

## Start playing around with the model

Replace `text` with whatever you want.

Output files can be found in the `tacotron2-tf2/infer` directory.

**Note**: Sometimes the model won't generate the text perfectly, or sometimes you won't get the emotion you want. If that happens, try re-generating it.

The model also can't handle very long text. Moderately long lines usually work, but anything longer has to be broken up into parts, one per line.
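
One way to break up long text is to pack whole sentences into chunks and put each chunk on its own line, since the generation cell processes `text` line by line. The sketch below is an assumption-laden helper, not part of the notebook: `split_long_text` is a hypothetical name, and the 200-character default is an arbitrary starting point rather than a known limit of the model.

```python
import re

def split_long_text(text, max_chars=200):
    """Group sentences into chunks of roughly max_chars characters or fewer.

    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

long_text = "One. Two. Three."
print("\n".join(split_long_text(long_text, max_chars=9)))
```

Joining the chunks with newlines gives a string you can assign to `text`. If a chunk still fails to generate cleanly, try a smaller `max_chars`.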

To change the model, copy the Google Drive ID of the model you want into `drive_id` below and continue.

In [None]:
drive_id = '1Q4lNU_qwiKZvjKZ4kIalt6DACkZRdNPj' # gdrive id of a model

text = """
You have to be careful with this stuff or it'll explode. I think. It's like the writer only wrote down the parts of the spell he thought he'd forget.
The FitnessGram Pacer Test is a multistage aerobic capacity test that progressively gets more difficult as it continues.
The 20 meter pacer test will begin in 30 seconds. Line up at the start. 
Ever since Anon arrived in Equestria, Twilight's been getting crazier and crazier.
"""

sigma = 0.75
denoise_strength = 0.01
raw_input_ = False  # disables automatic ARPAbet conversion, useful for inputting your own pronunciation or just for testing

show_graphs = True
colab_utils.graph_scale = 0.5 # literally a zoom factor
colab_utils.alignment_graph_width = 1800
colab_utils.alignment_graph_height = 720
colab_utils.colormap = 'twilight' #inferno # color map for the spectrogram and alignment

save_wavs = 1
counter = 0
text = unidecode(text) # convert unicode punctuation into its normal equivalents (thanks Fimfiction.)
text = text * 1 # how many times to generate each clip
load_tacotron2(drive_id)
with torch.no_grad():
    for i in text.split("\n"):
        if len(i) < 1: continue
        print('text:'.ljust(20), i)
        if raw_input_:
            if i[-1] != "␤": i=i+"␤" 
        else:
            i = ARPA(i)
            print('arpa conversion:'.ljust(20), i)
        sequence = np.array(text_to_sequence(i, ['english_cleaners']))[None, :]
        sequence = torch.from_numpy(sequence).cuda().long()  # torch.autograd.Variable is deprecated; plain tensors behave the same
        mel_outputs, mel_outputs_postnet, _, alignments = model.inference(sequence)
        if show_graphs:
            plot_data((mel_outputs_postnet.float().data.cpu().numpy()[0],
                alignments.float().data.cpu().numpy()[0].T))
        # Vocode the mel spectrogram with WaveGlow and play the raw result
        audio = waveglow.infer(mel_outputs_postnet, sigma=sigma)
        print("")
        ipd.display(ipd.Audio(audio[0].data.cpu().numpy(), rate=hparams.sampling_rate))
        # Remove WaveGlow's background noise and play the denoised result
        audio_denoised = denoiser(audio, strength=denoise_strength)[:, 0]
        print("Denoised")
        ipd.display(ipd.Audio(audio_denoised.cpu().numpy(), rate=hparams.sampling_rate))
        if save_wavs:

            sf.write(join(project_name, 'infer', F'tt2_{time.time()}.wav'), np.swapaxes(audio_denoised.cpu().numpy(),0,1), hparams.sampling_rate)
            #librosa.output.write_wav('tacotron2/infer/Inf_' + str(counter) + '.wav', ,)
        counter+=1

---

The rest is just examples of good alignment. See the graphs on the right?

Those graphs need to look as close as possible to this one:

![Image of alignment graph. Basically perfect alignment; there's no point going above this level](https://i.ibb.co/TKSQz7h/perfect-alignment.png)
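
If you want a rough number to go with the visual check, one simple proxy is the average of the largest attention weight at each decoder step. This is a sketch of a heuristic, not something the notebook or the model provides: `alignment_sharpness` is a hypothetical helper, and it assumes the alignment is a 2-D array of shape (decoder steps, encoder steps) whose rows are attention weights summing to 1.

```python
import numpy as np

def alignment_sharpness(alignment):
    """Mean of the largest attention weight per decoder step (higher is sharper)."""
    alignment = np.asarray(alignment, dtype=np.float64)
    return float(alignment.max(axis=1).mean())

sharp = np.eye(4)                # every step attends to exactly one input
blurry = np.full((4, 4), 0.25)   # attention spread evenly over all inputs
print(alignment_sharpness(sharp), alignment_sharpness(blurry))
```

In the generation cell, a call such as `alignment_sharpness(alignments.float().data.cpu().numpy()[0])` would be the matching input under these assumptions. A high score does not prove the line is diagonal, so treat it as a supplement to the graph, not a replacement.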