# TTS

This folder shows an end-to-end AI example using the [Coqui AI TTS](https://github.com/coqui-ai/TTS/) text-to-speech library. The demo also shows how to run a photon with multimedia outputs (in this case, a WAV response).

With this demo, you will be able to run a TTS model and get results like the following:


<audio src="assets/thequickbrownfox.mp3" controls></audio>

First, let's install the necessities.

In [1]:
!pip install -r requirements.txt > /dev/null

## Running the code locally

Note: if you do not have a local GPU, skip to <a href=#remote>the next section</a>.

The code, `tts_main.py`, lives in the same folder as this IPython notebook. Feel free to check it out. We will move on to running it. Let's first see if we have a GPU.

In [2]:
import torch
if torch.cuda.is_available():
    print("Great, we have a GPU.")
else:
    print("Actually, running without a GPU is quite slow and not recommended.")

Great, we have a GPU.


Now, let's run the photon. Since we are in the ipython notebook, we will use the subprocess module to spawn the local deployment. If you are going to run it manually, feel free to just run `python tts_main.py`.

In [3]:
from subprocess import Popen
# Spawn the local deployment in the background; its logs are printed below.
process = Popen(['python', 'tts_main.py'])

2023-11-04 00:14:13.040 | INFO     | __main__:init:52 - Loading the model...
2023-11-04 00:14:13.047 | DEBUG    | __main__:_load_model:64 - Loading model tts_models/en/vctk/vits... use_gpu: True 
2023-11-04 00:14:13.936 | DEBUG    | __main__:_load_model:69 - Loaded model tts_models/en/vctk/vits
2023-11-04 00:14:13.936 | DEBUG    | __main__:_load_model:70 - Model tts_models/en/vctk/vits is_multilingual: False
2023-11-04 00:14:13.936 | DEBUG    | __main__:_load_model:71 - Model tts_models/en/vctk/vits is_multi_speaker: True
2023-11-04 00:14:13.936 | DEBUG    | __main__:_load_model:86 - Model tts_models/en/vctk/vits speakers: ['ED\n', 'p225', 'p226', 'p227', 'p228', 'p229', 'p230', 'p231', 'p232', 'p233', 'p234', 'p236', 'p237', 'p238', 'p239', 'p240', 'p241', 'p243', 'p244', 'p245', 'p246', 'p247', 'p248', 'p249', 'p250', 'p251', 'p252', 'p253', 'p254', 'p255', 'p256', 'p257', 'p258', 'p259', 'p260', 'p261', 'p262', 'p263', 'p264', 'p265', 'p266', 'p267', 'p268', 'p269', 'p270', 'p271', 

Wait for the above process to start. Because it loads the checkpoint and runs its initialization, startup can take quite some time, especially if the checkpoints need to be downloaded first. Towards the end, you will see "Uvicorn running on http://0.0.0.0:8080" (or another port), which means the service is running successfully.
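Instead of watching the logs, the wait can be automated. The following is a minimal sketch, assuming the service listens on `127.0.0.1:8080` (the default port mentioned above) and that an accepted TCP connection is a good enough signal that the server has started. The helper name `wait_for_port` is made up for this example and is not part of the Lepton SDK.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 120.0, interval: float = 1.0) -> bool:
    """Return True once a TCP connection to (host, port) succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (or unreachable); wait a little and retry.
            time.sleep(interval)
    return False
```

In the notebook, this could be called as `wait_for_port("127.0.0.1", 8080, timeout=600)` right after spawning the process. Use a generous timeout if the checkpoints still have to be downloaded.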

Now, let's use the Lepton SDK client to communicate with the service.

In [4]:
from leptonai.client import Client, local
# Note: if the port above is not 8080 (the default), specify the port with local(port=xxxx).
c = Client(local())
print("Possible paths are:")
print(c.paths())

No API token found for 🐸Coqui Studio voices - https://coqui.ai 
Visit 🔗https://app.coqui.ai/account to get one.
Set it as an environment variable `export COQUI_STUDIO_TOKEN=<token>`

 > tts_models/en/vctk/vits is already downloaded.
 > Using model: vits
 > Setting up Audio Processor...
 | > sample_rate:22050
 | > resample:False
 | > num_mels:80
 | > log_func:np.log10
 | > min_level_db:0
 | > frame_shift_ms:None
 | > frame_length_ms:None
 | > ref_level_db:None
 | > fft_size:1024
 | > power:None
 | > preemphasis:0.0
 | > griffin_lim_iters:None
 | > signal_norm:None
 | > symmetric_norm:None
 | > mel_fmin:0
 | > mel_fmax:None
 | > pitch_fmin:None
 | > pitch_fmax:None
 | > spec_gain:20.0
 | > stft_pad_mode:reflect
 | > max_norm:1.0
 | > clip_norm:True
 | > do_trim_silence:False
 | > trim_db:60
 | > do_sound_norm:False
 | > do_amp_to_db_linear:True
 | > do_amp_to_db_mel:True
 | > do_rms_norm:False
 | > db_level:None
 | > stats_path:None
 | > base:10
 | > hop_length:256
 | > win_length:Possib

In [5]:
# The example exposes 4 different paths. Let's look at
# the documentation of each path, which is automatically
# generated by the SDK.
help(c.models)
help(c.languages)
help(c.speakers)
help(c.tts)

Help on function /models in module leptonai.client:

/models(*args, **kwargs)
    Returns a list of available models.
    
    Automatically inferred parameters from openapi:
    
    Input Schema: None
    
    Output Schema:
      output: array[str]

Help on function /languages in module leptonai.client:

/languages(*args, **kwargs)
    Returns a list of languages supported by the current model. Empty list
    if no model is loaded, or the model does not support multiple languages.
    
    Automatically inferred parameters from openapi:
    
    Input Schema: (*=required)
      model: (str | None)
    
    Output Schema:
      output: array[str]

Help on function /speakers in module leptonai.client:

/speakers(*args, **kwargs)
    Returns a list of speakers supported by the model. If the model is an
    XTTS model, this will return empty as you will need to use speaker_wav
    to synthesize speech.
    
    Automatically inferred parameters from openapi:
    
    Input Schema: (*=re

In [6]:
# Let's inspect the current model.
print(f"Models in the deployment are: {c.models()}")
print(f"Supported languages for the default model are: {c.languages()}")
print(f"Supported speakers for the default model are: {c.speakers()}")

2023-11-04 00:14:30,564 - INFO:     127.0.0.1:36360 - "GET /models HTTP/1.1" 200 OK
Models in the deployment are: ['tts_models/en/vctk/vits']
2023-11-04 00:14:30,567 - INFO:     127.0.0.1:36360 - "GET /languages HTTP/1.1" 200 OK
Supported languages for the default model are: []
2023-11-04 00:14:30,570 - INFO:     127.0.0.1:36360 - "GET /speakers HTTP/1.1" 200 OK
Supported speakers for the default model are: ['ED\n', 'p225', 'p226', 'p227', 'p228', 'p229', 'p230', 'p231', 'p232', 'p233', 'p234', 'p236', 'p237', 'p238', 'p239', 'p240', 'p241', 'p243', 'p244', 'p245', 'p246', 'p247', 'p248', 'p249', 'p250', 'p251', 'p252', 'p253', 'p254', 'p255', 'p256', 'p257', 'p258', 'p259', 'p260', 'p261', 'p262', 'p263', 'p264', 'p265', 'p266', 'p267', 'p268', 'p269', 'p270', 'p271', 'p272', 'p273', 'p274', 'p275', 'p276', 'p277', 'p278', 'p279', 'p280', 'p281', 'p282', 'p283', 'p284', 'p285', 'p286', 'p287', 'p288', 'p292', 'p293', 'p294', 'p295', 'p297', 'p298', 'p299', 'p300', 'p301', 'p302', 'p30
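The speaker list above contains one odd entry, `'ED\n'`, with a trailing newline. If the list is going to be shown to users or used to pick a voice, it may help to tidy it first. The sketch below uses two hypothetical helpers (`clean_speakers` and `vctk_speakers`) and assumes that the VCTK voices are the ones named `p` followed by digits, as in the output above.

```python
def clean_speakers(speakers):
    """Strip surrounding whitespace and drop empty names, keeping the order."""
    cleaned = [s.strip() for s in speakers]
    return [s for s in cleaned if s]


def vctk_speakers(speakers):
    """Keep only names of the form 'p<digits>', e.g. 'p225'."""
    return [s for s in clean_speakers(speakers) if s.startswith("p") and s[1:].isdigit()]


# A short sample taken from the output above.
sample = ["ED\n", "p225", "p226", "p227"]
print(clean_speakers(sample))
print(vctk_speakers(sample))
```

Note that the cleaned names are meant for display and selection. The `p...` names are unchanged by the cleanup, so they can be passed to the service as they are; whether the service accepts `ED` without the newline is not something this sketch verifies.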

In [7]:
# Let's actually run a tts example.
text = """
It was the best of times, it was the worst of times,
it was the age of wisdom, it was the age of foolishness,
it was the epoch of belief, it was the epoch of incredulity,
it was the season of light, it was the season of darkness,
it was the spring of hope, it was the winter of despair.
"""
audio = c.tts(text=text, speaker="p225")
import IPython
IPython.display.Audio(audio)

2023-11-04 00:14:32.020 | INFO     | __main__:_tts:103 - Synthesizing '
It was the best of times, it was the worst of times,
it was the age of wisdom, it was the age of foolishness,
it was the epoch of belief, it was the epoch of incredulity,
it was the season of light, it was the season of darkness,
it was the spring of hope, it was the winter of despair.
' with language 'None' and speaker 'p225'


 > Text splitted to sentences.
['It was the best of times, it was the worst of times,', 'it was the age of wisdom, it was the age of foolishness,', 'it was the epoch of belief, it was the epoch of incredulity,', 'it was the season of light, it was the season of darkness,', 'it was the spring of hope, it was the winter of despair.']
 > Processing time: 2.4877710342407227
 > Real-time factor: 0.13087221653483208
2023-11-04 00:14:34,557 - INFO:     127.0.0.1:36360 - "POST /tts HTTP/1.1" 200 OK
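The log above reports a processing time and a real-time factor. Assuming the real-time factor is defined as processing time divided by the duration of the generated audio (the usual definition, and to my knowledge the one Coqui TTS uses), the audio length can be recovered with a little arithmetic:

```python
# Values copied from the log output above.
processing_time_s = 2.4877710342407227
real_time_factor = 0.13087221653483208

# Assumed definition: real_time_factor = processing_time / audio_duration.
audio_duration_s = processing_time_s / real_time_factor
speedup = 1.0 / real_time_factor

print(f"audio duration: about {audio_duration_s:.1f} s")
print(f"synthesis speed: about {speedup:.1f}x real time")
```

Under that assumption, the request produced roughly 19 seconds of speech, synthesized about 7.6 times faster than real time.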


Voilà! Feel free to play with it some more, and when we are done, let's clean up the local execution.
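One way to play with the result before shutting down is to inspect the returned audio. The sketch below assumes that `audio` is a `bytes` object holding a PCM-encoded WAV file, which is what the "WAV response" mentioned in the introduction suggests but is not verified here. `describe_wav` and `make_silence` are hypothetical helpers; `make_silence` only exists so that the example runs without a server.

```python
import io
import wave


def describe_wav(wav_bytes: bytes) -> dict:
    """Read basic properties from in-memory WAV data (PCM only)."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wf.getsampwidth(),
            "duration_s": frames / rate,
        }


def make_silence(seconds: float = 0.5, rate: int = 22050) -> bytes:
    """Build a mono 16-bit WAV of silence, used here as stand-in data."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wf.setframerate(rate)
        wf.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


demo = make_silence()
print(describe_wav(demo))
```

With a live server, `describe_wav(audio)` would take the place of the stand-in data, and the same bytes could be written to disk with `open("out.wav", "wb").write(audio)`. If the service returned a non-PCM WAV, the `wave` module would raise an error instead.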

In [8]:
# Once we are done, let's close up the local process.
process.terminate()

2023-11-04 00:14:39,267 - INFO:     Shutting down
2023-11-04 00:14:39,367 - INFO:     Waiting for application shutdown.
2023-11-04 00:14:39,368 - INFO:     Application shutdown complete.
2023-11-04 00:14:39,368 - INFO:     Finished server process [1017115]
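If a cell fails before `process.terminate()` runs, the server keeps running in the background and holds on to the port and the GPU memory. Outside a notebook, one way to guard against that is a small context manager. This is a sketch only; `managed_process` is a name invented for this example, and the 10-second grace period is an arbitrary choice.

```python
import subprocess
import sys
from contextlib import contextmanager


@contextmanager
def managed_process(argv, grace_period: float = 10.0):
    """Start a child process and make sure it is stopped on exit."""
    proc = subprocess.Popen(argv)
    try:
        yield proc
    finally:
        if proc.poll() is None:  # still running
            proc.terminate()  # ask politely first
            try:
                proc.wait(timeout=grace_period)
            except subprocess.TimeoutExpired:
                proc.kill()  # force it if it does not exit in time
                proc.wait()


# Example usage (commented out so the block has no side effects):
# with managed_process([sys.executable, "tts_main.py"]) as process:
#     ...  # talk to the local service here
```

The child is terminated even if the body of the `with` block raises an exception.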


## Running remotely <a name="remote" />

Now let's run the photon remotely by creating it, pushing it to the workspace, and launching it there. First, let's log in.

Go to [https://dashboard.lepton.ai/credentials](https://dashboard.lepton.ai/credentials), log in, and copy your workspace's credential into the line below, replacing `INSERT_YOUR_CREDENTIAL_HERE`. The credential looks like `jazwwwt0:dsfsdweldhifdsfdsfd`.

In [9]:
!lep login -c <redacted>:<redacted>

    _     _____ ____ _____ ___  _   _       _    ___
   | |   | ____|  _ \_   _/ _ \| \ | |     / \  |_ _|
   | |   |  _| | |_) || || | | |  \| |    / _ \  | |
   | |___| |___|  __/ | || |_| | |\  |   / ___ \ | |
   |_____|_____|_|    |_| \___/|_| \_|  /_/   \_\___|

Logged in to your workspace eyljmess.
        build time: 2023-10-30_16-43-22
           version: 0.13.0
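To avoid pasting a credential into a notebook cell that may later be shared, the login command can be built from an environment variable instead. The sketch below is one possible approach: `LEPTON_CREDENTIAL` is a hypothetical variable name chosen for this example (not one the `lep` CLI is known to read), and the only CLI usage assumed is the `lep login -c` form shown above.

```python
import os


def build_login_command(env=None, var: str = "LEPTON_CREDENTIAL"):
    """Build the `lep login -c <credential>` argv from an environment variable.

    `LEPTON_CREDENTIAL` is a made-up name used only in this sketch.
    """
    env = os.environ if env is None else env
    credential = env.get(var, "").strip()
    # The credential is expected to look like '<workspace_id>:<token>'.
    workspace, sep, token = credential.partition(":")
    if not (workspace and sep and token):
        raise ValueError(f"{var} must look like '<workspace_id>:<token>'")
    return ["lep", "login", "-c", credential]
```

The resulting list can be passed to `subprocess.run(build_login_command(), check=True)`, so the credential never appears in the notebook source.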


Cool, let's run it.

In [10]:
!lep photon run -n tts -m tts_main.py --deployment-name tts --resource-shape gpu.t4

Rebuilding photon with --model tts_main.py.
If you want to run without rebuilding, please remove the --model arg.
No API token found for 🐸Coqui Studio voices - https://coqui.ai 
Visit 🔗https://app.coqui.ai/account to get one.
Set it as an environment variable `export COQUI_STUDIO_TOKEN=<token>`

Photon tts created.
Photon tts pushed to workspace.
Running the most recent version of tts: tts-0uisf9bc

Lepton is currently set to use a default timeout of 1 hour. This means that when
there is no traffic for more than an hour, your deployment will automatically
scale down to zero. This is to assist auto-release of unused debug deployments.
If you would like to run a long-running photon (e.g. for production), set
--no-traffic-timeout to 0.
If you would like to turn off default timeout, set the environment variable
LEPTON_DEFAULT_TIMEOUT=false.

Photon launched as tts. Use `lep deployment status -n tts` to check the sta

In [12]:
# Let's see if the photon is running. If it hasn't finished starting yet, wait a bit and re-check.
!lep deployment status -n tts

Time now:   2023-11-04 00:17:50
Created at: 2023-11-04 00:14:53
Photon ID:  tts-0uisf9bc
State:      Ready
Timeout(s): 3600
Web UI:     
https://dashboard.lepton.ai/workspace/eyljmess/deployments/detail/tts/demo
Is Public:  No
Replicas List:
┏━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━┓
┃ replica id           ┃ status ┃ message ┃
┡━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━┩
│ tts-859c84dbb5-t2g4j │ Ready  │ (empty) │
└──────────────────────┴────────┴─────────┘
1 out of 1 replicas ready.


Let's create the client. Once we have it, we can run the code exactly as if we were accessing the local server above:

In [13]:
from leptonai.client import Client, current
# When you are accessing a deployment in your own workspace, you can use current()
# so you don't have to explicitly pass in your token.
c = Client(current(), "tts")
print("Possible paths are:")
print(c.paths())

Possible paths are:
dict_keys(['/languages', '/models', '/speakers', '/tts'])


In [14]:
# The example exposes 4 different paths. Let's look at
# the documentation of each path, which is automatically
# generated by the SDK.
help(c.models)
help(c.languages)
help(c.speakers)
help(c.tts)

Help on function /models in module leptonai.client:

/models(*args, **kwargs)
    Returns a list of available models.
    
    Automatically inferred parameters from openapi:
    
    Input Schema: None
    
    Output Schema:
      output: array[str]

Help on function /languages in module leptonai.client:

/languages(*args, **kwargs)
    Returns a list of languages supported by the current model. Empty list
    if no model is loaded, or the model does not support multiple languages.
    
    Automatically inferred parameters from openapi:
    
    Input Schema: (*=required)
      model: (str | None)
    
    Output Schema:
      output: array[str]

Help on function /speakers in module leptonai.client:

/speakers(*args, **kwargs)
    Returns a list of speakers supported by the model. If the model is an
    XTTS model, this will return empty as you will need to use speaker_wav
    to synthesize speech.
    
    Automatically inferred parameters from openapi:
    
    Input Schema: (*=re

In [15]:
# Let's actually run a tts example.
text = """
It was the best of times, it was the worst of times,
it was the age of wisdom, it was the age of foolishness,
it was the epoch of belief, it was the epoch of incredulity,
it was the season of light, it was the season of darkness,
it was the spring of hope, it was the winter of despair.
"""
audio = c.tts(text=text, speaker="p225")
import IPython
IPython.display.Audio(audio)

Great! Once we are done, let's clean up the deployment.

In [16]:
# Once we are done, let's shut down the remote service.
!lep deployment remove -n tts

Deployment tts removed.


## Conclusion

That's it! You can find more resources at:
- [the Lepton AI example repo](https://github.com/leptonai/examples)
- [the Lepton AI documentation](https://lepton.ai/docs)

You are also more than welcome to [email us](mailto:info@lepton.ai).