# More FastSpeech2 examples

<div class="alert alert-info">

This tutorial is available as an IPython notebook at [malaya-speech/example/tts-more-fastspeech2](https://github.com/huseinzol05/malaya-speech/tree/master/example/tts-more-fastspeech2).
    
</div>

<div class="alert alert-warning">

This module is not language independent, so it is not safe to use on other languages. The pretrained models were trained on hyperlocal languages.
    
</div>

<div class="alert alert-warning">

This is an application of the malaya-speech Pipeline. Read more about the malaya-speech Pipeline at [malaya-speech/example/pipeline](https://github.com/huseinzol05/malaya-speech/tree/master/example/pipeline).
    
</div>

In [1]:
import malaya_speech
import numpy as np
from malaya_speech import Pipeline
import matplotlib.pyplot as plt
import IPython.display as ipd

### List available FastSpeech2

In [2]:
malaya_speech.tts.available_fastspeech2()

INFO:root:`husein`, `haqkiem` and `female-singlish` combined loss from training set


| Model | Size (MB) | Quantized Size (MB) | Combined loss |
|---|---|---|---|
| male | 125.0 | 31.7 | 1.8 |
| female | 125.0 | 31.7 | 1.932 |
| husein | 125.0 | 31.7 | 0.5832 |
| haqkiem | 125.0 | 31.7 | 0.5663 |
| female-singlish | 125.0 | 31.7 | 0.5112 |


The `husein` voice was contributed by [Husein-Zolkepli](https://www.linkedin.com/in/husein-zolkepli/), recorded using a low-end microphone in a small room with no reverberation absorber.

The `haqkiem` voice was contributed by [Haqkiem Hamdan](https://www.linkedin.com/in/haqkiem-daim/), recorded using a high-end microphone in an audio studio.

The `female-singlish` voice was contributed by [SG National Speech Corpus](https://www.imda.gov.sg/programme-listing/digital-services-lab/national-speech-corpus), recorded using a high-end microphone in an audio studio.

### Load FastSpeech2 model

FastSpeech2 uses the text normalizer from Malaya, see https://malaya.readthedocs.io/en/latest/load-normalizer.html#Load-normalizer.

Make sure you install Malaya version > 4.0 for this to work:

```bash
pip install malaya -U
```
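To check the version requirement from Python, a minimal sketch using the standard-library `importlib.metadata` could look like the following. The helper names `parse_version` and `is_newer_than` are hypothetical, and the parsing assumes Malaya's version strings are plain dotted numbers (anything after the first non-numeric part is ignored).

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn '4.7.1' into (4, 7, 1); stop at the first non-numeric part."""
    parts = []
    for piece in text.split('.'):
        match = re.match(r'\d+', piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def is_newer_than(installed, minimum='4.0'):
    """True when `installed` is strictly greater than `minimum`."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '4.0.0' and '4.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a > b


try:
    installed = version('malaya')
    status = 'ok' if is_newer_than(installed) else 'too old, run pip install malaya -U'
    print('malaya', installed, status)
except PackageNotFoundError:
    print('malaya is not installed')
```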

In [3]:
male = malaya_speech.tts.fastspeech2(model = 'male')
female = malaya_speech.tts.fastspeech2(model = 'female')
husein = malaya_speech.tts.fastspeech2(model = 'husein')
haqkiem = malaya_speech.tts.fastspeech2(model = 'haqkiem')

INFO:root:running tts/fastspeech2-male using device /device:CPU:0


downloading frozen model to /Users/huseinzolkepli/Malaya-Speech/tts/fastspeech2-female/model.pb


120MB [00:17, 6.90MB/s]                          
INFO:root:running tts/fastspeech2-female using device /device:CPU:0
INFO:root:running tts/fastspeech2-husein using device /device:CPU:0


downloading frozen model to /Users/huseinzolkepli/Malaya-Speech/tts/fastspeech2-haqkiem/model.pb


120MB [00:14, 8.08MB/s]                          
INFO:root:running tts/fastspeech2-haqkiem using device /device:CPU:0


### Load Vocoder model

I will use MelGAN in this example. **Make sure the speakers match: if you use the female FastSpeech2, you also need to use the female MelGAN**.

In [4]:
vocoder_male = malaya_speech.vocoder.melgan(model = 'male')
vocoder_female = malaya_speech.vocoder.melgan(model = 'female')
vocoder_husein = malaya_speech.vocoder.melgan(model = 'husein')
vocoder_haqkiem = malaya_speech.vocoder.melgan(model = 'haqkiem')

INFO:root:running vocoder-melgan/male using device /device:CPU:0
INFO:root:running vocoder-melgan/female using device /device:CPU:0
INFO:root:running vocoder-melgan/husein using device /device:CPU:0
INFO:root:running vocoder-melgan/haqkiem using device /device:CPU:0
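One way to make the speaker-matching rule harder to break is to pair each FastSpeech2 model with its vocoder by name. This is only a sketch: `pair_by_speaker` is a hypothetical helper, not part of malaya-speech, and it simply refuses to build a pairing when a speaker has no vocoder under the same key.

```python
def pair_by_speaker(tts_models, vocoders):
    """Return {speaker: (tts, vocoder)}, refusing speakers without a vocoder."""
    missing = sorted(set(tts_models) - set(vocoders))
    if missing:
        raise KeyError(f'no matching vocoder for: {missing}')
    return {name: (tts_models[name], vocoders[name]) for name in tts_models}


# With the objects loaded in this tutorial, the call would look like:
# pairs = pair_by_speaker(
#     {'male': male, 'female': female, 'husein': husein, 'haqkiem': haqkiem},
#     {'male': vocoder_male, 'female': vocoder_female,
#      'husein': vocoder_husein, 'haqkiem': vocoder_haqkiem},
# )
```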


### Predict

In [5]:
string = 'Masa aku kat kuala lumpur, ada main mintak rasuah. Aku cakap kat agensi kerajaan tu. Dia cuma kata tak payah bagi. Tak ambik nama pun. mungkin itu la kot selemah lemah iman, tolak dalam hati tapi tak mampu buat tindakan.'

In [6]:
%%time

r_male = male.predict(string)

CPU times: user 4.06 s, sys: 1.71 s, total: 5.77 s
Wall time: 4.46 s


In [7]:
%%time

r_female = female.predict(string)

CPU times: user 4.22 s, sys: 1.75 s, total: 5.97 s
Wall time: 4.55 s


In [8]:
%%time

r_husein = husein.predict(string)

CPU times: user 4.14 s, sys: 1.76 s, total: 5.91 s
Wall time: 4.62 s


In [9]:
%%time

r_haqkiem = haqkiem.predict(string)

CPU times: user 4.14 s, sys: 1.66 s, total: 5.8 s
Wall time: 4.44 s


In [12]:
y_ = vocoder_male(r_male['postnet-output'])
ipd.Audio(y_, rate = 22050)

In [13]:
y_ = vocoder_female(r_female['postnet-output'])
ipd.Audio(y_, rate = 22050)

In [14]:
y_ = vocoder_husein(r_husein['postnet-output'])
ipd.Audio(y_, rate = 22050)

In [15]:
y_ = vocoder_haqkiem(r_haqkiem['postnet-output'])
ipd.Audio(y_, rate = 22050)
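The cells above repeat the same two steps for every speaker: call `predict` on the FastSpeech2 model, then pass `'postnet-output'` to the matching vocoder. A minimal sketch of a helper that wraps both steps is shown here; `synthesize` is a hypothetical name, and it relies only on the `predict` call and the callable vocoder used in this tutorial.

```python
def synthesize(text, tts, vocoder, key='postnet-output'):
    """Run text -> FastSpeech2 output -> vocoder waveform for one speaker."""
    result = tts.predict(text)   # dict of model outputs, as in the cells above
    return vocoder(result[key])  # waveform, played in this tutorial at 22050 Hz


# Usage with the models loaded earlier (speakers must match):
# y_ = synthesize(string, husein, vocoder_husein)
# ipd.Audio(y_, rate = 22050)
```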

In [16]:
string = 'husein busuk masam ketiak pun masam tapi nasib baik comel'

In [17]:
%%time

r_male = male.predict(string)

CPU times: user 368 ms, sys: 72.7 ms, total: 440 ms
Wall time: 114 ms


In [18]:
%%time

r_female = female.predict(string)

CPU times: user 442 ms, sys: 56.3 ms, total: 498 ms
Wall time: 112 ms


In [19]:
%%time

r_husein = husein.predict(string)

CPU times: user 407 ms, sys: 54.4 ms, total: 461 ms
Wall time: 95.9 ms


In [20]:
%%time

r_haqkiem = haqkiem.predict(string)

CPU times: user 393 ms, sys: 51 ms, total: 444 ms
Wall time: 92.2 ms


In [21]:
y_ = vocoder_male(r_male['postnet-output'])
ipd.Audio(y_, rate = 22050)

In [22]:
y_ = vocoder_female(r_female['postnet-output'])
ipd.Audio(y_, rate = 22050)

In [23]:
y_ = vocoder_husein(r_husein['postnet-output'])
ipd.Audio(y_, rate = 22050)

In [24]:
y_ = vocoder_haqkiem(r_haqkiem['postnet-output'])
ipd.Audio(y_, rate = 22050)

In [25]:
string = 'emel saya ialah husein.zol123456@gmail.com, dan emel ini adalah palsuu'

In [26]:
%%time

r_male = male.predict(string)

CPU times: user 652 ms, sys: 55 ms, total: 707 ms
Wall time: 145 ms


In [27]:
%%time

r_female = female.predict(string)

CPU times: user 802 ms, sys: 47.5 ms, total: 849 ms
Wall time: 163 ms


In [28]:
%%time

r_husein = husein.predict(string)

CPU times: user 867 ms, sys: 63.8 ms, total: 930 ms
Wall time: 171 ms


In [29]:
%%time

r_haqkiem = haqkiem.predict(string)

CPU times: user 744 ms, sys: 52.5 ms, total: 796 ms
Wall time: 157 ms


In [30]:
y_ = vocoder_male(r_male['postnet-output'])
ipd.Audio(y_, rate = 22050)

In [31]:
y_ = vocoder_female(r_female['postnet-output'])
ipd.Audio(y_, rate = 22050)

In [32]:
y_ = vocoder_husein(r_husein['postnet-output'])
ipd.Audio(y_, rate = 22050)

In [33]:
y_ = vocoder_haqkiem(r_haqkiem['postnet-output'])
ipd.Audio(y_, rate = 22050)

In [34]:
# https://www.sinarharian.com.my/article/116460/BERITA/Nasional/Tiada-isu-kartel-daging-ketika-jadi-PM-Najib
string = 'Najib berkata, walaupun media melaporkan ia telah berlaku sejak 40 tahun lalu, kerajaan Barisan Nasional (BN) tidak pernah menerima apa-apa aduan rasmi berhubung perkara itu.'

In [35]:
%%time

r_male = male.predict(string)

CPU times: user 1.12 s, sys: 101 ms, total: 1.22 s
Wall time: 249 ms


In [36]:
%%time

r_female = female.predict(string)

CPU times: user 1.28 s, sys: 88.2 ms, total: 1.36 s
Wall time: 254 ms


In [37]:
%%time

r_husein = husein.predict(string)

CPU times: user 1.32 s, sys: 95.2 ms, total: 1.41 s
Wall time: 257 ms


In [38]:
%%time

r_haqkiem = haqkiem.predict(string)

CPU times: user 1.21 s, sys: 85.7 ms, total: 1.3 s
Wall time: 220 ms


In [39]:
y_ = vocoder_male(r_male['postnet-output'])
ipd.Audio(y_, rate = 22050)

In [40]:
y_ = vocoder_female(r_female['postnet-output'])
ipd.Audio(y_, rate = 22050)

In [41]:
y_ = vocoder_husein(r_husein['postnet-output'])
ipd.Audio(y_, rate = 22050)

In [42]:
y_ = vocoder_haqkiem(r_haqkiem['postnet-output'])
ipd.Audio(y_, rate = 22050)
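To keep the audio outside the notebook, the waveform can be written to a WAV file at the same 22050 Hz rate used for playback. The sketch below uses only the standard library; `save_wav` is a hypothetical helper, and it assumes the vocoder output is a one-dimensional sequence of floats roughly in the range [-1, 1], which this tutorial does not state explicitly.

```python
import struct
import wave


def save_wav(path, samples, rate=22050):
    """Write mono 16-bit PCM; returns the number of samples written."""
    frames = bytearray()
    for sample in samples:
        # Assumption: samples are floats in [-1, 1]; clip anything outside.
        clipped = max(-1.0, min(1.0, float(sample)))
        frames += struct.pack('<h', int(round(clipped * 32767)))
    with wave.open(path, 'wb') as f:
        f.setnchannels(1)      # mono
        f.setsampwidth(2)      # 2 bytes per sample = 16-bit
        f.setframerate(rate)
        f.writeframes(bytes(frames))
    return len(frames) // 2


# Usage with the last waveform from this tutorial:
# save_wav('haqkiem.wav', y_)
```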