# Gai/Gen: Text-to-Speech (TTS)

## 1. Note

The following examples were tested in this environment:

-   NVIDIA GeForce RTX 2060 6GB
-   Windows 11 + WSL2
-   Ubuntu 22.04
-   Python 3.10
-   CUDA Toolkit 11.8
-   openai 1.6.1
-   TTS 0.22.0
-   deepspeed 0.12.6
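
To compare a local environment against the list above, a minimal sketch using the standard library's `importlib.metadata` can report the installed versions. The `TESTED` mapping and the `report_versions` helper are illustrative names, not part of gai-gen, and the distribution names are assumed to match the package names listed above.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions taken from the tested environment listed above.
TESTED = {"openai": "1.6.1", "TTS": "0.22.0", "deepspeed": "0.12.6"}


def report_versions(expected=TESTED):
    """Return {package: (installed_version_or_None, tested_version)}."""
    report = {}
    for name, tested in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (installed, tested)
    return report


if __name__ == "__main__":
    for name, (installed, tested) in report_versions().items():
        status = "ok" if installed == tested else "differs"
        print(f"{name}: installed={installed}, tested={tested} ({status})")
```

A differing version is not necessarily a problem; it only indicates that the setup departs from the one the examples were tested on.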


## 2. Create Virtual Environment and Install Dependencies

We will create a separate virtual environment to avoid dependency conflicts between the underlying models.

```sh
sudo apt update -y && sudo apt install ffmpeg git git-lfs -y
conda create -n TTS python=3.10.10 -y
conda activate TTS
pip install gai-gen[TTS]
```
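
As a quick sanity check after the install step, the sketch below looks for the system tools installed by `apt` on the current `PATH`. The `missing_tools` helper is a hypothetical name introduced here for illustration, and the executable names are assumed to match the package names.

```python
import shutil


def missing_tools(tools=("ffmpeg", "git", "git-lfs")):
    """Return the names of the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools were found.")
```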

## 3. Examples

```python
# 3.1 OpenAI Text-to-Speech
from IPython.display import Audio
from gai.gen import Gaigen

print("GENERATING:")
gen = Gaigen.GetInstance().load('openai-tts-1')
response = gen.create(
  voice="alloy",
  input="The definition of insanity is doing the same thing over and over and expecting different results."
)
# Play the generated audio inline in the notebook
Audio(response, rate=24000)
```

The following demo uses Coqui AI's xTTS model. Create and run the script `xtts_download.py` below to download the model:

```python
# xtts_download.py
import os
os.environ["COQUI_TOS_AGREED"]="1"

from TTS.utils.manage import ModelManager
print("Downloading...")
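# Expand "~" explicitly so the model lands under the home directory
# rather than in a literal "~" folder relative to the working directory.
mm = ModelManager(output_prefix=os.path.expanduser("~/gai/models/tts"))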
model_name="tts_models/multilingual/multi-dataset/xtts_v2"
mm.download_model(model_name)
print("Downloaded")
```

Note that loading the model for the first time takes a while, because DeepSpeed needs to compile it.

```python
# 3.2 Coqui xTTS Text-to-Speech
from IPython.display import Audio
from gai.gen import Gaigen

print("GENERATING:")
gen = Gaigen.GetInstance().load('xtts-2')
response = gen.create(
  voice="Vjollca Johnnie",
  input="The definition of insanity is doing the same thing over and over and expecting different results."
)
# Play the generated audio inline in the notebook
Audio(response, rate=24000)
```

## 4. Running as a Service

#### Step 1: Start Docker container

```bash
docker run -d \
    --name gai-tts \
    -p 12031:12031 \
    --gpus all \
    -v ~/gai/models:/app/models \
    kakkoii1337/gai-tts:latest
```

#### Step 2: Wait for model to load

```bash
docker logs gai-tts
```

Once loading has completed, the logs should show this:

```bash
INFO:     Started server process [1]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:12031 (Press CTRL+C to quit)
```
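
Instead of re-running `docker logs` by hand, the wait can be scripted. The sketch below polls the container logs until the startup line shown above appears. The helper names, the timeout and the polling interval are assumptions made for illustration. Both output streams are checked because the stream the server logs to is not stated here.

```python
import subprocess
import time

# Startup line taken from the log excerpt above.
READY_MARKER = "Application startup complete."


def is_ready(log_text):
    """Return True if the log text contains the startup marker."""
    return READY_MARKER in log_text


def wait_until_ready(container="gai-tts", timeout=600.0, interval=5.0):
    """Poll `docker logs` until the marker appears; False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = subprocess.run(
            ["docker", "logs", container],
            capture_output=True,
            text=True,
        )
        # Container output may arrive on either stream, so check both.
        if is_ready(result.stdout + result.stderr):
            return True
        time.sleep(interval)
    return False
```

Checking the logs is used here rather than probing the port, since a published Docker port can accept connections before the application inside the container is ready.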

#### Step 3: Test


```bash
curl -X POST http://localhost:12031/gen/v1/audio/speech \
    -H "Content-Type: application/json" \
    -N \
    -d "{\"model\":\"xtts-2\",\"input\":\"I think there is no direct bus. You can take 185 and change to MRT at buona vista. 185 should be arriving in 5 minutes.\", \"stream\":true}" | ffplay -autoexit -nodisp -hide_banner -
```