# TTS and Voice Cloning
![Coqui TTS](CoquiTTS.png)

This Jupyter notebook uses Coqui TTS to synthesise speech from text, cloning the voice in a sample speaker recording.

Ref: https://github.com/coqui-ai/TTS


***

Setup:<br>
1.> Install TTS<br>
&emsp;&emsp;`pip install TTS`
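Before running the rest of the notebook, it can help to confirm that the package is visible to the kernel's interpreter. The sketch below uses only the standard library and assumes the PyPI distribution name `TTS`. `installed_version` is a hypothetical helper, not part of Coqui TTS.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str = "TTS") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("TTS")
if version is None:
    print("TTS is not installed; run: pip install TTS")
else:
    print(f"TTS {version} is installed")
```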

***

Languages supported by XTTS v2 (17):<br>
| Languages | Languages | Languages | Languages | Languages |
|----------|----------|----------|----------|----------|
|English (en)| Spanish (es)| French (fr)| German (de)| Italian (it)|
|Portuguese (pt)| Polish (pl)| Turkish (tr)| Russian (ru)| Dutch (nl)|
|Czech (cs)| Arabic (ar)| Chinese (zh-cn)| Japanese (ja)| Hungarian (hu)|
|Korean (ko)| Hindi (hi)| | | |
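XTTS v2 only synthesises a fixed set of language codes (the 17 listed on its model card), so validating `targetLanguage` before loading the model catches a typo early. This is a minimal sketch: the set below is copied from the XTTS v2 model card and may differ in other releases, and `check_language` is a hypothetical helper, not part of the TTS API.

```python
# Language codes accepted by XTTS v2, as listed on its model card (assumption: may change).
XTTS_V2_LANGUAGES = {
    "en", "es", "fr", "de", "it", "pt", "pl", "tr", "ru",
    "nl", "cs", "ar", "zh-cn", "ja", "hu", "ko", "hi",
}


def check_language(code: str) -> str:
    """Normalise a language code and fail early if XTTS v2 does not list it."""
    normalised = code.strip().lower()
    if normalised not in XTTS_V2_LANGUAGES:
        raise ValueError(
            f"{code!r} is not an XTTS v2 language; choose one of {sorted(XTTS_V2_LANGUAGES)}"
        )
    return normalised


print(check_language("en"))  # en
print(check_language("hi"))  # hi
```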



### 1.> Let's load the required libraries.

In [18]:
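```python
from TTS.api import TTS   # Coqui TTS high-level API
import torch              # used to pick CPU or GPU
import IPython.display    # inline audio player for the generated file
```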

### 2.> TTS for English

In [19]:
```python
sourceWaveFile = "speaker/MaddyWithMobile.wav"    # sample of the voice to clone
targetGeneratedFile = "EnglishMaddyXttsV2.wav"    # where the synthesised speech is written
#targetLanguage = "hi"
targetLanguage = "en"
textToSynthesise = "Ancient India was a cradle of civilization, known for the Indus Valley Civilization, the Vedas, and the Maurya and Gupta empires, contributing to art, philosophy, and science."

# The model choice now lives in the cell after the list of all models is printed.
#modelToUse = "tts_models/multilingual/multi-dataset/xtts_v2"
#modelToUse = "tts_models/multilingual/multi-dataset/xtts_v1.1"
```
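Cloning works from a short reference recording (the XTTS v2 model card advertises cloning from a clip of about 6 seconds), so it is worth checking that `sourceWaveFile` exists and how long it is before synthesis. A minimal stdlib sketch, assuming the sample is an uncompressed PCM WAV (the `wave` module cannot read other encodings); `wav_duration_seconds` is a hypothetical helper.

```python
import tempfile
import wave
from pathlib import Path


def wav_duration_seconds(path) -> float:
    """Return the length of a PCM WAV file in seconds, using only the stdlib."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"Speaker sample not found: {path}")
    # wave.open expects a str path (or a file object), not a Path
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


# Self-check on a 1-second silent clip (mono, 16-bit, 16 kHz) written to a temp dir.
demo_path = Path(tempfile.gettempdir()) / "wav_duration_demo.wav"
with wave.open(str(demo_path), "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(16000)
    wav.writeframes(b"\x00\x00" * 16000)
print(wav_duration_seconds(demo_path))  # 1.0

# In the notebook: print(wav_duration_seconds(sourceWaveFile))
```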

In [20]:
```python
# Get device
device = "cuda" if torch.cuda.is_available() else "cpu"
print(device)
```
```
cuda
```


In [21]:
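```python
# Create a TTS object without loading a model
tts = TTS()

# List available TTS models. In recent TTS releases list_models() returns a
# ModelManager, whose own list_models() returns the list of model names.
model_list = tts.list_models()
all_models = model_list.list_models()

for model in all_models:
    print(model)
```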

```
tts_models/multilingual/multi-dataset/xtts_v2
tts_models/multilingual/multi-dataset/xtts_v1.1
tts_models/multilingual/multi-dataset/your_tts
tts_models/multilingual/multi-dataset/bark
tts_models/bg/cv/vits
tts_models/cs/cv/vits
tts_models/da/cv/vits
tts_models/et/cv/vits
tts_models/ga/cv/vits
tts_models/en/ek1/tacotron2
tts_models/en/ljspeech/tacotron2-DDC
tts_models/en/ljspeech/tacotron2-DDC_ph
tts_models/en/ljspeech/glow-tts
tts_models/en/ljspeech/speedy-speech
tts_models/en/ljspeech/tacotron2-DCA
tts_models/en/ljspeech/vits
tts_models/en/ljspeech/vits--neon
tts_models/en/ljspeech/fast_pitch
tts_models/en/ljspeech/overflow
tts_models/en/ljspeech/neural_hmm
tts_models/en/vctk/vits
tts_models/en/vctk/fast_pitch
tts_models/en/sam/tacotron-DDC
tts_models/en/blizzard2013/capacitron-t2-c50
tts_models/en/blizzard2013/capacitron-t2-c150_v2
tts_models/en/multi-dataset/tortoise-v2
tts_models/en/jenny/jenny
tts_models/es/mai/tacotron2-DDC
tts_models/es/css10/vits
tts_models/fr/mai/tacotron2-DDC
```
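Each name in the model list follows the pattern `type/language/dataset/model`, so the list can be grouped by language to see which voices exist for a target language. A minimal sketch that relies only on that naming pattern; entries that are not `tts_models` (for example vocoders, if the list contains any) or do not have four parts are skipped. `group_by_language` is a hypothetical helper; in the notebook you would pass `all_models` instead of `sample`.

```python
from collections import defaultdict


def group_by_language(model_names):
    """Group model names of the form type/language/dataset/model by language."""
    groups = defaultdict(list)
    for name in model_names:
        parts = name.split("/")
        if len(parts) != 4:
            continue  # not in the type/language/dataset/model form
        model_type, language, _dataset, _model = parts
        if model_type == "tts_models":
            groups[language].append(name)
    return dict(groups)


sample = [
    "tts_models/multilingual/multi-dataset/xtts_v2",
    "tts_models/en/ljspeech/vits",
    "tts_models/en/vctk/vits",
    "tts_models/fr/mai/tacotron2-DDC",
]
for language, names in sorted(group_by_language(sample).items()):
    print(language, len(names))
# en 2
# fr 1
# multilingual 1
```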

In [32]:
```python
# Female voice with a strong English accent.
# This model takes no language argument; the only parameters it needs are text, file_path and speaker_wav.
#modelToUse = "tts_models/en/ljspeech/tacotron2-DDC"

modelToUse = "tts_models/multilingual/multi-dataset/xtts_v2"
```
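Switching between the single-language tacotron2 model and XTTS v2 changes which arguments `tts_to_file` accepts: the multilingual model needs `language`, the English-only one takes none. A minimal sketch that builds the keyword arguments from the model name, assuming (as in the model list above) that multilingual models contain a `/multilingual/` segment. `build_tts_kwargs` is hypothetical; once a model is loaded, the `TTS` object in recent releases also exposes an `is_multi_lingual` property, which is likely the sturdier check.

```python
def build_tts_kwargs(model_name, text, file_path, speaker_wav=None, language=None):
    """Assemble keyword arguments for tts.tts_to_file based on the model name.

    Assumption: a model is multilingual iff its name has a '/multilingual/' segment.
    """
    kwargs = {"text": text, "file_path": file_path}
    if speaker_wav is not None:
        kwargs["speaker_wav"] = speaker_wav
    if "/multilingual/" in model_name:
        if language is None:
            raise ValueError(f"{model_name} needs a language code")
        kwargs["language"] = language
    # Single-language models get no 'language' key at all.
    return kwargs


# In the notebook:
# tts.tts_to_file(**build_tts_kwargs(modelToUse, textToSynthesise,
#                                    targetGeneratedFile, sourceWaveFile, targetLanguage))
```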


In [33]:
```python
# Download the model (if not already cached) and create the TTS object on the chosen device

#tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2", gpu=True)  # pace is good, quality is acceptable
#tts = TTS("tts_models/multilingual/multi-dataset/bark", gpu=True)     # did not work
tts = TTS(modelToUse).to(device)
```
```
 > tts_models/multilingual/multi-dataset/xtts_v2 is already downloaded.
 > Using model: xtts
```


In [39]:
```python
# Generate speech by cloning the voice in the sample WAV, using default settings
tts.tts_to_file(text=textToSynthesise,
                file_path=targetGeneratedFile,
                speaker_wav=sourceWaveFile,
                language=targetLanguage)
```
```
 > Text splitted to sentences.
['Ancient India was a cradle of civilization, known for the Indus Valley Civilization, the Vedas, and the Maurya and Gupta empires, contributing to art, philosophy, and science.']
 > Processing time: 6.437827110290527
 > Real-time factor: 0.4071091858105416

'EnglishMaddyXttsV2.wav'
```
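The log reports a processing time and a real-time factor (RTF). Assuming RTF is processing time divided by the duration of the generated audio (which is how Coqui's synthesizer computes it), a value below 1 means synthesis ran faster than playback, and rearranging gives the length of the clip. A worked check on the numbers above; `implied_audio_seconds` is a hypothetical helper.

```python
def implied_audio_seconds(processing_seconds: float, real_time_factor: float) -> float:
    """Audio length implied by the log, assuming RTF = processing time / audio duration."""
    return processing_seconds / real_time_factor


audio_seconds = implied_audio_seconds(6.437827110290527, 0.4071091858105416)
print(f"{audio_seconds:.2f} s of audio")  # 15.81 s of audio
```

So roughly 6.4 s of GPU time produced about 15.8 s of speech.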

In [40]:
```python
IPython.display.Audio(targetGeneratedFile)
```