# Converting PyTorch models to TensorFlow and TFLite by 🐸 [Coqui TTS](https://github.com/coqui-ai/TTS)

This tutorial demonstrates how 🐸 TTS converts trained PyTorch models to TensorFlow and TFLite.


# Installation

### Download PyTorch Models and configs

In [None]:
!gdown --id 1dntzjWFg7ufWaTaFy80nRz-Tu02xWZos -O tts_model.pth.tar
!gdown --id 18CQ6G6tBEOfvCHlPqP8EBI4xWbrr9dBc -O config.json

Downloading...
From: https://drive.google.com/uc?id=1dntzjWFg7ufWaTaFy80nRz-Tu02xWZos
To: /content/tts_model.pth.tar
347MB [00:03, 113MB/s]
Downloading...
From: https://drive.google.com/uc?id=18CQ6G6tBEOfvCHlPqP8EBI4xWbrr9dBc
To: /content/config.json
100% 9.53k/9.53k [00:00<00:00, 21.5MB/s]


In [None]:
!gdown --id 1Ty5DZdOc0F7OTGj9oJThYbL5iVu_2G0K -O vocoder_model.pth.tar
!gdown --id 1Rd0R_nRCrbjEdpOwq6XwZAktvugiBvmu -O config_vocoder.json
!gdown --id 11oY3Tv0kQtxK_JPgxrfesa99maVXHNxU -O scale_stats.npy

Downloading...
From: https://drive.google.com/uc?id=1Ty5DZdOc0F7OTGj9oJThYbL5iVu_2G0K
To: /content/vocoder_model.pth.tar
82.8MB [00:00, 124MB/s] 
Downloading...
From: https://drive.google.com/uc?id=1Rd0R_nRCrbjEdpOwq6XwZAktvugiBvmu
To: /content/config_vocoder.json
100% 6.76k/6.76k [00:00<00:00, 11.7MB/s]
Downloading...
From: https://drive.google.com/uc?id=11oY3Tv0kQtxK_JPgxrfesa99maVXHNxU
To: /content/scale_stats.npy
100% 10.5k/10.5k [00:00<00:00, 15.7MB/s]
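
Before moving on, it can help to confirm that all five downloads landed in the working directory. The sketch below is not part of the original notebook: `missing_files` is a hypothetical helper, and the file names are the ones passed to `-O` in the `gdown` commands above.

```python
import os

# File names taken from the `-O` arguments of the gdown commands above.
EXPECTED_FILES = [
    "tts_model.pth.tar",
    "config.json",
    "vocoder_model.pth.tar",
    "config_vocoder.json",
    "scale_stats.npy",
]


def missing_files(paths, root="."):
    """Return the subset of `paths` that does not exist under `root`."""
    return [p for p in paths if not os.path.isfile(os.path.join(root, p))]


if __name__ == "__main__":
    missing = missing_files(EXPECTED_FILES)
    if missing:
        print("Missing downloads:", ", ".join(missing))
    else:
        print("All model files are present.")
```

An empty result means the conversion scripts should find every file they are pointed at.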


### Setup Libraries

In [None]:
# need it for char to phoneme conversion
! sudo apt-get install espeak

Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following package was automatically installed and is no longer required:
  libnvidia-common-440
Use 'sudo apt autoremove' to remove it.
The following additional packages will be installed:
  espeak-data libespeak1 libportaudio2 libsonic0
The following NEW packages will be installed:
  espeak espeak-data libespeak1 libportaudio2 libsonic0
0 upgraded, 5 newly installed, 0 to remove and 35 not upgraded.
Need to get 1,219 kB of archives.
After this operation, 3,031 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic/universe amd64 libportaudio2 amd64 19.6.0-1 [64.6 kB]
Get:2 http://archive.ubuntu.com/ubuntu bionic/main amd64 libsonic0 amd64 0.2.0-6 [13.4 kB]
Get:3 http://archive.ubuntu.com/ubuntu bionic/universe amd64 espeak-data amd64 1.48.04+dfsg-5 [934 kB]
Get:4 http://archive.ubuntu.com/ubuntu bionic/universe amd64 libespeak1 amd64 1.48.04+dfsg-5 [145 

In [None]:
!git clone https://github.com/coqui-ai/TTS

Cloning into 'TTS'...
remote: Enumerating objects: 144, done.
remote: Counting objects:  21% (31/144)

In [None]:
%cd TTS
!git checkout dev
!pip install -r requirements.txt
!python setup.py develop
!pip install tensorflow==2.3.0rc0
%cd ..

/content/TTS
Branch 'dev' set up to track remote branch 'dev' from 'origin'.
Switched to a new branch 'dev'
running develop
-- Building version 0.0.3+664f42d
running egg_info
creating tts_namespace/TTS.egg-info
writing tts_namespace/TTS.egg-info/PKG-INFO
writing dependency_links to tts_namespace/TTS.egg-info/dependency_links.txt
writing entry points to tts_namespace/TTS.egg-info/entry_points.txt
writing requirements to tts_namespace/TTS.egg-info/requires.txt
writing top-level names to tts_namespace/TTS.egg-info/top_level.txt
writing manifest file 'tts_namespace/TTS.egg-info/SOURCES.txt'
writing manifest file 'tts_namespace/TTS.egg-info/SOURCES.txt'
running build_ext
Creating /usr/local/lib/python3.6/dist-packages/TTS.egg-link (link to tts_namespace)
TTS 0.0.3+664f42d is already the active version in easy-install.pth
Installing tts-server script to /usr/local/bin

Installed /content/TTS/tts_namespace
Processing dependencies for TTS==0.0.3+664f42d
Searching for gdown==3.6.4
Best match: g

# Model Conversion PyTorch -> TF -> TFLite

## Converting PyTorch to TensorFlow


In [None]:
!pip install fuzzywuzzy

Collecting fuzzywuzzy
  Downloading https://files.pythonhosted.org/packages/43/ff/74f23998ad2f93b945c0309f825be92e04e0348e062026998b5eefef4c33/fuzzywuzzy-0.18.0-py2.py3-none-any.whl
Installing collected packages: fuzzywuzzy
Successfully installed fuzzywuzzy-0.18.0


In [None]:
# convert TTS model to TensorFlow
!python /content/TTS/tf/convert_tacotron2_torch_to_tf.py --config_path config.json --torch_model_path tts_model.pth.tar --output_path tts_model_tf.pkl

2020-07-14 15:34:08.556770: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
 > Using model: Tacotron2
2020-07-14 15:34:17.019296: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-07-14 15:34:17.073688: E tensorflow/stream_executor/cuda/cuda_driver.cc:314] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2020-07-14 15:34:17.073757: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (b05eeb264940): /proc/driver/nvidia/version does not exist
2020-07-14 15:34:17.074461: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropria

In [None]:
# convert Vocoder model to TensorFlow
!python /content/TTS/vocoder/tf/convert_melgan_torch_to_tf.py --config_path config_vocoder.json --torch_model_path vocoder_model.pth.tar --output_path vocoder_model_tf.pkl

2020-07-14 15:35:46.508612: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
 > Generator Model: multiband_melgan_generator
 > Generator Model: multiband_melgan_generator
2020-07-14 15:35:50.941316: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-07-14 15:35:50.945075: E tensorflow/stream_executor/cuda/cuda_driver.cc:314] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2020-07-14 15:35:50.945116: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (b05eeb264940): /proc/driver/nvidia/version does not exist
2020-07-14 15:35:50.945411: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To en

## Converting TensorFlow to TFLite

In [None]:
# convert TTS model to TFLite
!python /content/TTS/tf/convert_tacotron2_tflite.py --config_path config.json --tf_model tts_model_tf.pkl --output_path tts_model.tflite

2020-07-14 16:03:31.329598: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
 > Using model: Tacotron2
2020-07-14 16:03:32.855337: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-07-14 16:03:32.858430: E tensorflow/stream_executor/cuda/cuda_driver.cc:314] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2020-07-14 16:03:32.858471: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (b05eeb264940): /proc/driver/nvidia/version does not exist
2020-07-14 16:03:32.858724: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropria

In [None]:
# convert Vocoder model to TFLite
!python /content/TTS/vocoder/tf/convert_melgan_tflite.py --config_path config_vocoder.json --tf_model vocoder_model_tf.pkl --output_path vocoder_model.tflite

2020-07-14 16:02:19.612450: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
 > Generator Model: multiband_melgan_generator
2020-07-14 16:02:21.418525: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-07-14 16:02:21.421928: E tensorflow/stream_executor/cuda/cuda_driver.cc:314] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2020-07-14 16:02:21.421977: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (b05eeb264940): /proc/driver/nvidia/version does not exist
2020-07-14 16:02:21.422499: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFl
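
The four conversion steps above form a fixed chain: each TFLite step consumes the `.pkl` file written by the matching TensorFlow step. A minimal sketch that gathers them in one place follows. The script paths and flags are copied from the cells above; `conversion_commands` is a hypothetical helper, not part of 🐸 TTS.

```python
# Script paths and flags are copied from the notebook cells above;
# `conversion_commands` itself is a hypothetical helper.
TTS_ROOT = "/content/TTS"


def conversion_commands(tts_root=TTS_ROOT):
    """Return the four conversion steps as argv lists, in execution order."""
    return [
        # 1. Tacotron2: PyTorch checkpoint -> TensorFlow pickle
        ["python", f"{tts_root}/tf/convert_tacotron2_torch_to_tf.py",
         "--config_path", "config.json",
         "--torch_model_path", "tts_model.pth.tar",
         "--output_path", "tts_model_tf.pkl"],
        # 2. Vocoder: PyTorch checkpoint -> TensorFlow pickle
        ["python", f"{tts_root}/vocoder/tf/convert_melgan_torch_to_tf.py",
         "--config_path", "config_vocoder.json",
         "--torch_model_path", "vocoder_model.pth.tar",
         "--output_path", "vocoder_model_tf.pkl"],
        # 3. Tacotron2: TensorFlow pickle -> TFLite
        ["python", f"{tts_root}/tf/convert_tacotron2_tflite.py",
         "--config_path", "config.json",
         "--tf_model", "tts_model_tf.pkl",
         "--output_path", "tts_model.tflite"],
        # 4. Vocoder: TensorFlow pickle -> TFLite
        ["python", f"{tts_root}/vocoder/tf/convert_melgan_tflite.py",
         "--config_path", "config_vocoder.json",
         "--tf_model", "vocoder_model_tf.pkl",
         "--output_path", "vocoder_model.tflite"],
    ]


if __name__ == "__main__":
    # Dry run: print each command instead of executing it.
    for cmd in conversion_commands():
        print(" ".join(cmd))
```

To execute the steps rather than print them, each list could be passed to `subprocess.run(cmd, check=True)`, which stops the chain at the first failing step.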

# Run Inference with TFLite

In [None]:
def run_vocoder(mel_spec):
  vocoder_inputs = mel_spec[None, :, :]
  # get input details
  input_details = vocoder_model.get_input_details()
  # reshape input tensor for the new input shape
  vocoder_model.resize_tensor_input(input_details[0]['index'], vocoder_inputs.shape)
  vocoder_model.allocate_tensors()
  detail = input_details[0]
  vocoder_model.set_tensor(detail['index'], vocoder_inputs)
  # run the model
  vocoder_model.invoke()
  # collect outputs
  output_details = vocoder_model.get_output_details()
  waveform = vocoder_model.get_tensor(output_details[0]['index'])
  return waveform


def tts(model, text, CONFIG, ap):
    t_1 = time.time()
    waveform, alignment, mel_spec, mel_postnet_spec, stop_tokens, inputs = synthesis(model, text, CONFIG, use_cuda, ap, speaker_id, style_wav=None,
                                                                             truncated=False, enable_eos_bos_chars=CONFIG.enable_eos_bos_chars,
                                                                             backend='tflite')
    waveform = run_vocoder(mel_postnet_spec.T)
    waveform = waveform[0, 0]
    rtf = (time.time() - t_1) / (len(waveform) / ap.sample_rate)
    tps = (time.time() - t_1) / len(waveform)
    print(waveform.shape)
    print(" > Run-time: {}".format(time.time() - t_1))
    print(" > Real-time factor: {}".format(rtf))
    print(" > Time per step: {}".format(tps))
    IPython.display.display(IPython.display.Audio(waveform, rate=CONFIG.audio['sample_rate']))
    return alignment, mel_postnet_spec, stop_tokens, waveform
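
`run_vocoder` follows a fixed interpreter sequence: resize the input, allocate tensors, set the input, invoke, then read the output. The sketch below generalizes that sequence to any number of inputs. It assumes the model object exposes the same `tf.lite.Interpreter` methods that `run_vocoder` already calls; `run_tflite` is a hypothetical name and not part of 🐸 TTS.

```python
def run_tflite(interpreter, inputs):
    """Run one forward pass of a TFLite interpreter with dynamically shaped inputs.

    `inputs` is a sequence of arrays, one per model input, in the order
    reported by `get_input_details()`. Returns a list of output tensors.
    """
    input_details = interpreter.get_input_details()
    if len(inputs) != len(input_details):
        raise ValueError(
            "expected {} inputs, got {}".format(len(input_details), len(inputs))
        )

    # Resize every input first; tensors are allocated once, after all resizes.
    for detail, value in zip(input_details, inputs):
        interpreter.resize_tensor_input(detail["index"], value.shape)
    interpreter.allocate_tensors()

    # Copy the input data into the freshly allocated tensors.
    for detail, value in zip(input_details, inputs):
        interpreter.set_tensor(detail["index"], value)

    interpreter.invoke()

    # Collect every output tensor in the order the model reports them.
    return [
        interpreter.get_tensor(detail["index"])
        for detail in interpreter.get_output_details()
    ]
```

With this helper, the body of `run_vocoder` would reduce to `run_tflite(vocoder_model, [mel_spec[None, :, :]])[0]`.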

### Load TFLite Models

In [None]:
import os
import torch
import time
import IPython

from TTS.tf.utils.tflite import load_tflite_model
from TTS.tf.utils.io import load_checkpoint
from TTS.utils.io import load_config
from TTS.utils.text.symbols import symbols, phonemes
from TTS.utils.audio import AudioProcessor
from TTS.utils.synthesis import synthesis

In [None]:
# runtime settings
use_cuda = False

In [None]:
# model paths
TTS_MODEL = "tts_model.tflite"
TTS_CONFIG = "config.json"
VOCODER_MODEL = "vocoder_model.tflite"
VOCODER_CONFIG = "config_vocoder.json"

In [None]:
# load configs
TTS_CONFIG = load_config(TTS_CONFIG)
VOCODER_CONFIG = load_config(VOCODER_CONFIG)

In [None]:
# load the audio processor
ap = AudioProcessor(**TTS_CONFIG.audio)

 > Setting up Audio Processor...
 | > sample_rate:22050
 | > num_mels:80
 | > min_level_db:-100
 | > frame_shift_ms:None
 | > frame_length_ms:None
 | > ref_level_db:0
 | > fft_size:1024
 | > power:1.5
 | > preemphasis:0.0
 | > griffin_lim_iters:60
 | > signal_norm:True
 | > symmetric_norm:True
 | > mel_fmin:50.0
 | > mel_fmax:7600.0
 | > spec_gain:1.0
 | > stft_pad_mode:reflect
 | > max_norm:4.0
 | > clip_norm:True
 | > do_trim_silence:True
 | > trim_db:60
 | > do_sound_norm:False
 | > stats_path:./scale_stats.npy
 | > hop_length:256
 | > win_length:1024
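
The log reports `frame_shift_ms` and `frame_length_ms` as `None` because the frame geometry is given in samples (`hop_length`, `win_length`) instead. The short sketch below converts those values to time units. The constants are copied from the log above, and the function names are hypothetical.

```python
# Values copied from the AudioProcessor log above.
SAMPLE_RATE = 22050  # samples per second
HOP_LENGTH = 256     # samples between the starts of consecutive frames
WIN_LENGTH = 1024    # samples covered by one analysis window


def frame_shift_ms(hop_length=HOP_LENGTH, sample_rate=SAMPLE_RATE):
    """Time step between consecutive spectrogram frames, in milliseconds."""
    return 1000.0 * hop_length / sample_rate


def frames_to_seconds(num_frames, hop_length=HOP_LENGTH, sample_rate=SAMPLE_RATE):
    """Approximate audio duration covered by `num_frames` spectrogram frames."""
    return num_frames * hop_length / sample_rate


print("frame shift:   {:.2f} ms".format(frame_shift_ms()))
print("window length: {:.2f} ms".format(1000.0 * WIN_LENGTH / SAMPLE_RATE))
print("100 frames:    {:.3f} s".format(frames_to_seconds(100)))
```

At these settings each mel frame advances the signal by roughly 11.6 ms, so about 86 frames cover one second of audio.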


In [None]:
# LOAD TTS MODEL
# multi speaker
speaker_id = None
speakers = []

# load the models
model = load_tflite_model(TTS_MODEL)
vocoder_model = load_tflite_model(VOCODER_MODEL)
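
Once the models are loaded, it can be useful to check which tensors each one expects. The sketch below assumes `load_tflite_model` returns an object with the `tf.lite.Interpreter` interface, which is consistent with how `run_vocoder` uses `vocoder_model`. `describe_io` is a hypothetical helper.

```python
def describe_io(interpreter):
    """Return one readable line per input and output tensor of a TFLite model."""
    lines = []
    groups = (
        ("input", interpreter.get_input_details()),
        ("output", interpreter.get_output_details()),
    )
    for kind, details in groups:
        for detail in details:
            shape = tuple(int(dim) for dim in detail["shape"])
            # dtype is usually a type object such as numpy.float32.
            dtype = getattr(detail["dtype"], "__name__", str(detail["dtype"]))
            lines.append(
                "{} {}: {} shape={} dtype={}".format(
                    kind, detail["index"], detail["name"], shape, dtype
                )
            )
    return lines
```

For example, `print("\n".join(describe_io(vocoder_model)))` would list the vocoder's tensors. Shapes reported before `resize_tensor_input` is called are the defaults stored in the `.tflite` file.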

## Run Sample Sentence

In [None]:
sentence = "Bill got in the habit of asking himself “Is that thought true?” and if he wasn’t absolutely certain it was, he just let it go."
align, spec, stop_tokens, wav = tts(model, sentence, TTS_CONFIG, ap)

(185856,)
 > Run-time: 3.4360833168029785
 > Real-time factor: 0.4076327362736641
 > Time per step: 1.8486800570215405e-05
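
The reported metrics can be reproduced from the waveform length and the sample rate. The figures below are copied from the output above and from the `AudioProcessor` log. Because `tts` calls `time.time()` separately for each metric, the values it prints differ slightly from ones derived from the final run-time.

```python
# Figures copied from the cell output and the AudioProcessor log above.
NUM_SAMPLES = 185856            # length of the generated waveform
SAMPLE_RATE = 22050             # samples per second
RUN_TIME_S = 3.4360833168029785  # wall-clock synthesis time in seconds

# Duration of the generated audio in seconds.
audio_seconds = NUM_SAMPLES / SAMPLE_RATE

# Real-time factor: synthesis time divided by audio duration.
# A value below 1 means synthesis is faster than playback.
rtf = RUN_TIME_S / audio_seconds

# Average wall-clock time spent per output sample.
time_per_sample = RUN_TIME_S / NUM_SAMPLES

print("audio duration:   {:.2f} s".format(audio_seconds))
print("real-time factor: {:.3f}".format(rtf))
print("time per sample:  {:.3e} s".format(time_per_sample))
```

In this run the sentence yields roughly 8.4 seconds of audio generated in about 3.4 seconds, so synthesis on CPU runs at a little over twice real time.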
