# Text-to-Speech End-to-End FastSpeech2

FastSpeech2 + Neural Vocoder Generator, End-to-End.

<div class="alert alert-info">

This tutorial is available as an IPython notebook at [malaya-speech/example/tts-e2e-fastspeech2](https://github.com/huseinzol05/malaya-speech/tree/master/example/tts-e2e-fastspeech2).
    
</div>

<div class="alert alert-warning">

This module is not language independent, so it is not safe to use on other languages. The pretrained models were trained on hyperlocal languages.
    
</div>

<div class="alert alert-warning">

This is an application of the malaya-speech Pipeline; read more about the malaya-speech Pipeline at [malaya-speech/example/pipeline](https://github.com/huseinzol05/malaya-speech/tree/master/example/pipeline).
    
</div>

In [1]:
import os

os.environ['CUDA_VISIBLE_DEVICES'] = ''

In [2]:
import malaya_speech
import numpy as np
from malaya_speech import Pipeline
import matplotlib.pyplot as plt
import IPython.display as ipd

 The versions of TensorFlow you are currently using is 2.6.0 and is not supported. 
Some things might work, some things might not.
If you were to encounter a bug, do not file an issue.
If you want to make sure you're using a tested and supported configuration, either change the TensorFlow version or the TensorFlow Addons's version. 
You can find the compatibility matrix in TensorFlow Addon's readme:
https://github.com/tensorflow/addons
TensorFlow Addons has compiled its custom ops against TensorFlow 2.4.0, and there are no compatibility guarantees between the two versions. 
This means that you might get segfaults when loading the custom op, or other kind of low-level errors.
 If you do, do not file an issue on Github. This is a known limitation.

It might help you to fallback to pure Python ops with TF_ADDONS_PY_OPS . To do that, see https://github.com/tensorflow/addons#gpucpu-custom-ops 

You can also change the TensorFlow version installed on your system. You would need a TensorFlow 

### End-to-End FastSpeech2 description

1. Malaya-speech VITS generates audio end-to-end, from text input to waveforms at a 22050 Hz sample rate.
2. It cannot generate a mel-spectrogram longer than 2000 timesteps; exceeding that limit throws an error. Make sure the input texts are not too long.
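One way to stay under the length limit is to split long input at sentence boundaries and synthesize each piece separately. The sketch below is only an illustration: `split_text` and its `max_chars` budget are hypothetical names, and the value 200 is an arbitrary assumption, not a number derived from the 2000-timestep limit.

```python
import re


def split_text(text: str, max_chars: int = 200) -> list:
    """Group sentences into chunks of at most `max_chars` characters.

    `max_chars` is an assumed, arbitrary budget; tune it for your own texts.
    A single sentence longer than `max_chars` is kept whole, not cut.
    """
    # Split after '.', '!' or '?' followed by whitespace, dropping empty pieces.
    sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]
    chunks, current = [], ''
    for sentence in sentences:
        candidate = f'{current} {sentence}'.strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the budget: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be passed to the model on its own, which keeps every individual call short.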

### List available End-to-End FastSpeech2

In [3]:
malaya_speech.tts.available_e2e_fastspeech2()

| Model | Size (MB) | Understand punctuation | Is lowercase |
|---|---|---|---|
| mesolitica/VITS-osman | 145 | True | False |
| mesolitica/VITS-yasmin | 145 | True | False |


### Load End-to-End FastSpeech2 model

FastSpeech2 uses the text normalizer from Malaya; see https://malaya.readthedocs.io/en/latest/load-normalizer.html#Load-normalizer.

Make sure Malaya version > 4.0 is installed for this to work. **For better speech synthesis, make sure the Malaya version is > 4.7.5.**

```bash
pip install malaya -U
```
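To confirm that the installed Malaya satisfies the version requirement, the installed version string can be compared against the minimum. This is a minimal sketch using only the standard library; `version_tuple`, `is_newer_than`, and `check_malaya` are hypothetical helper names, and the comparison deliberately ignores pre-release suffixes.

```python
import re
from importlib import metadata


def version_tuple(version: str) -> tuple:
    """Turn '4.7.5' into (4, 7, 5), stopping at the first non-numeric part."""
    parts = []
    for piece in version.split('.'):
        match = re.match(r'\d+', piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def is_newer_than(installed: str, minimum: str) -> bool:
    """Return True when `installed` is strictly greater than `minimum`."""
    return version_tuple(installed) > version_tuple(minimum)


def check_malaya(minimum: str = '4.7.5'):
    """Return None if Malaya is not installed, else whether it is > minimum."""
    try:
        installed = metadata.version('malaya')
    except metadata.PackageNotFoundError:
        return None
    return is_newer_than(installed, minimum)
```

Calling `check_malaya()` returns `True` when the installed version is strictly above 4.7.5, `False` when it is not, and `None` when the package cannot be found.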

```python
def e2e_fastspeech2(
    model: str = 'osman',
    quantized: bool = False,
    pad_to: int = 8,
    **kwargs,
):
    """
    Load Fastspeech2 Text-to-Mel TTS model.

    Parameters
    ----------
    model : str, optional (default='osman')
        Model architecture supported. Allowed values:

        * ``'yasmin'`` - Fastspeech2 trained on female Yasmin voice.
        * ``'osman'`` - Fastspeech2 trained on male Osman voice.

    quantized : bool, optional (default=False)
        If True, will load the 8-bit quantized model.
        A quantized model is not necessarily faster; it depends entirely on the machine.
    pad_to : int, optional (default=8)
        Size of padding with the 0 pad character. Increasing it can stabilize prediction on short sentences; we trained on 8.

    Returns
    -------
    result : malaya_speech.model.synthesis.E2E_FastSpeech class
    """
```

In [4]:
osman = malaya_speech.tts.e2e_fastspeech2(model = 'osman')

2022-08-27 21:50:22.443530: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-08-27 21:50:22.456927: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2022-08-27 21:50:22.456947: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: huseincomel-desktop
2022-08-27 21:50:22.456951: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: huseincomel-desktop
2022-08-27 21:50:22.457016: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: Not found: was unable to find libcuda.so DSO loaded into this program
2022-08-27 21:50:22.457037: I

In [5]:
# https://www.sinarharian.com.my/article/115216/BERITA/Politik/Syed-Saddiq-pertahan-Dr-Mahathir
string1 = 'Syed Saddiq berkata, mereka seharusnya mengingati bahawa semasa menjadi Perdana Menteri Pakatan Harapan'

### Predict

```python
def predict(
    self,
    string,
    speed_ratio: float = 1.0,
    f0_ratio: float = 1.0,
    energy_ratio: float = 1.0,
    temperature_durator: float = 0.6666,
    **kwargs,
):
    """
    Change string to Mel.

    Parameters
    ----------
    string: str
    speed_ratio: float, optional (default=1.0)
        Increasing this value lengthens the generated speech.
    f0_ratio: float, optional (default=1.0)
        Increasing this value raises the pitch; lower values produce a deeper voice.
    energy_ratio: float, optional (default=1.0)
        Increasing this value increases loudness.
    temperature_durator: float, optional (default=0.6666)
        The durator predicts alignment with random.normal() * temperature_durator.

    Returns
    -------
    result: Dict[string, ids, y]
    """
```

It can only predict one text per feed-forward pass.
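Because each call handles a single text, several texts have to be synthesized in a loop. The sketch below assumes only that the prediction function takes one string and returns a dictionary with a `'y'` waveform, as in the outputs on this page; `predict_many` and `duration_seconds` are hypothetical helpers, not part of malaya-speech.

```python
from typing import Callable, Dict, Iterable, List


def predict_many(predict: Callable[[str], Dict], texts: Iterable[str]) -> List[Dict]:
    """Call `predict` once per text and collect the result dictionaries."""
    results = []
    for text in texts:
        # One feed-forward pass per text.
        results.append(predict(text))
    return results


def duration_seconds(waveform, sample_rate: int = 22050) -> float:
    """Length of a waveform in seconds at the given sample rate."""
    return len(waveform) / sample_rate


# Assumed usage with the model loaded earlier:
# results = predict_many(osman.predict, ['Teks pertama.', 'Teks kedua.'])
# durations = [duration_seconds(r['y']) for r in results]
```

Passing the bound method `osman.predict` keeps the helper independent of the model class, so the same loop works for either voice.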

In [6]:
r_osman = osman.predict(string1)
r_osman.keys()

2022-08-27 21:50:53.284544: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 33751040 exceeds 10% of free system memory.
2022-08-27 21:50:53.536243: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 33751040 exceeds 10% of free system memory.
2022-08-27 21:50:53.686972: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 33751040 exceeds 10% of free system memory.


dict_keys(['string', 'ids', 'y'])

In [7]:
ipd.Audio(r_osman['y'], rate = 22050)

In [8]:
string2 = 'Haqkiem adalah pelajar tahun akhir yang mengambil Ijazah Sarjana Muda Sains Komputer Kecerdasan Buatan utama dari Universiti Teknikal Malaysia Melaka (UTeM) yang kini berusaha untuk latihan industri di mana dia secara praktikal dapat menerapkan pengetahuannya dalam Perisikan Perisian dan Pengaturcaraan ke arah organisasi atau industri yang berkaitan.'

In [9]:
r_osman = osman.predict(string2)
r_osman.keys()

2022-08-27 21:51:08.687323: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 29261824 exceeds 10% of free system memory.
2022-08-27 21:51:09.032592: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 58523648 exceeds 10% of free system memory.


dict_keys(['string', 'ids', 'y'])

In [10]:
ipd.Audio(r_osman['y'], rate = 22050)
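To keep the generated audio outside the notebook, the waveform can be written to a WAV file. The sketch below uses only the standard library and assumes that `r_osman['y']` is a one-dimensional sequence of floats in the range [-1, 1]; that range is an assumption rather than something stated on this page, and `save_wav` is a hypothetical helper.

```python
import struct
import wave


def save_wav(path: str, waveform, sample_rate: int = 22050) -> None:
    """Write a mono float waveform as 16-bit PCM WAV.

    Assumes samples lie in [-1, 1]; values outside that range are clipped.
    """
    frames = bytearray()
    for sample in waveform:
        clipped = max(-1.0, min(1.0, float(sample)))
        # Scale to the signed 16-bit range and pack as little-endian.
        frames += struct.pack('<h', int(round(clipped * 32767)))
    with wave.open(path, 'wb') as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))


# Assumed usage with the prediction above:
# save_wav('osman.wav', r_osman['y'], sample_rate=22050)
```

The pure-Python loop is slow for long clips; it is written this way only to avoid extra dependencies.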