# TTS API tutorial

This tutorial demonstrates how to use the Riva Python API for text-to-speech (TTS).

## <font color="blue">Server</font>

Before running the client part of Riva, set up a server. The simplest
way to do this is to follow the
[quick start guide](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/quick-start-guide.html#local-deployment-using-quick-start-scripts).


## <font color="blue">Authentication</font>

Before using Riva services, you need to establish a connection to the server.

In [1]:
import riva_api

uri = "localhost:50051"  # Default value

auth = riva_api.Auth(uri=uri)

## <font color="blue">Setting up service</font>

To instantiate a service, pass a `riva_api.Auth` instance to its constructor.

In [2]:
tts_service = riva_api.SpeechSynthesisService(auth)

## <font color="blue">Offline synthesis</font>

In offline mode, the whole synthesized result is returned in a single response.

In [3]:
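language_code = 'en-US'
sample_rate_hz = 16000  # samples per second requested from the service
voice = "English-US-Female-1"
nchannels = 1  # mono; used when writing or playing the audio
sampwidth = 2  # bytes per sample (16-bit); used when writing or playing the audio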
text = (
    "The United States of America, commonly known as the United States or America, "
    "is a country primarily located in North America. It consists of 50 states, "
    "a federal district, five major unincorporated territories, 326 Indian reservations, "
    "and nine minor outlying islands."
)

In [4]:
resp = tts_service.synthesize(text, voice, language_code, sample_rate_hz=sample_rate_hz)

In [5]:
audio = resp.audio
meta = resp.meta

In [6]:
print(len(audio))

589416
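
The byte count above can be turned into a playback duration with a little arithmetic. The sketch below assumes the audio is headerless linear PCM with the `nchannels`, `sampwidth`, and `sample_rate_hz` values defined earlier in this tutorial. The helper name `pcm_duration_seconds` is hypothetical and not part of `riva_api`.

```python
def pcm_duration_seconds(num_bytes, sample_rate_hz, nchannels=1, sampwidth=2):
    """Duration of headerless linear PCM audio, in seconds (assumed format)."""
    bytes_per_frame = nchannels * sampwidth  # one frame = one sample per channel
    return num_bytes / (bytes_per_frame * sample_rate_hz)


# 589416 is the byte count printed above; the other values are the tutorial's settings.
duration = pcm_duration_seconds(589416, sample_rate_hz=16000, nchannels=1, sampwidth=2)
print(f"{duration:.2f} s")  # 18.42 s
```

If the service is configured to return a different encoding, this calculation does not apply.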


In [7]:
processed_text = meta.processed_text
predicted_durations = meta.predicted_durations

In [8]:
print(processed_text)

 The United {@S}{@T}{@EY1}{@T}{@S} of {@AH0}{@M}{@EH1}{@R}{@IH0}{@K}{@AH0}, {@K}{@AA1}{@M}{@AH0}{@N}{@L}{@IY0} {@N}{@OW1}{@N} as the United {@S}{@T}{@EY1}{@T}{@S} or {@AH0}{@M}{@EH1}{@R}{@IH0}{@K}{@AH0}, is a {@K}{@AH1}{@N}{@T}{@R}{@IY0} {@P}{@R}{@AY0}{@M}{@EH1}{@R}{@AH0}{@L}{@IY0} located in {@N}{@AO1}{@R}{@TH} {@AH0}{@M}{@EH1}{@R}{@IH0}{@K}{@AH0}. It {@K}{@AH0}{@N}{@S}{@IH1}{@S}{@T}{@S} of {@F}{@IH1}{@F}{@T}{@IY0} {@S}{@T}{@EY1}{@T}{@S}, a federal {@D}{@IH1}{@S}{@T}{@R}{@IH0}{@K}{@T}, {@F}{@AY1}{@V} {@M}{@EY1}{@JH}{@ER0} {@AH2}{@N}{@IH0}{@N}{@K}{@AO1}{@R}{@P}{@ER0}{@EY2}{@T}{@IH0}{@D} {@T}{@EH1}{@R}{@AH0}{@T}{@AO2}{@R}{@IY0}{@Z}, {@TH}{@R}{@IY1} hundred {@T}{@W}{@EH1}{@N}{@T}{@IY0} {@S}{@IH1}{@K}{@S} {@IH1}{@N}{@D}{@IY0}{@AH0}{@N} {@R}{@EH2}{@Z}{@ER0}{@V}{@EY1}{@SH}{@AH0}{@N}{@Z}, and {@N}{@AY1}{@N} {@M}{@AY1}{@N}{@ER0} {@AW1}{@T}{@L}{@AY2}{@IH0}{@NG} {@AY1}{@L}{@AH0}{@N}{@D}{@Z}. 
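
In the output above, some words have been replaced by phoneme tokens of the form `{@NAME}`, where the name is sometimes followed by a stress digit. As a rough sketch, those tokens can be pulled out with a regular expression. The pattern is inferred only from the output shown here and may not cover every token the service can emit. `extract_phonemes` is a hypothetical helper, not part of `riva_api`.

```python
import re

# Pattern inferred from the printed output: {@NAME} with an optional stress digit 0-2.
PHONEME_TOKEN = re.compile(r"\{@([A-Z]+[0-2]?)\}")


def extract_phonemes(processed_text):
    """Return the phoneme symbols found in the text, in order of appearance."""
    return PHONEME_TOKEN.findall(processed_text)


# A short excerpt of the processed text printed above.
sample = " The United {@S}{@T}{@EY1}{@T}{@S} of {@AH0}{@M}{@EH1}{@R}{@IH0}{@K}{@AH0},"
phonemes = extract_phonemes(sample)
print(phonemes[:5])   # ['S', 'T', 'EY1', 'T', 'S']
print(len(phonemes))  # 12
```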


In [9]:
print(len(predicted_durations))
print(predicted_durations[0])

274
743.03857421875


Now we can write the audio to a WAV file.

In [10]:
import wave
offline_output_file = "my_offline_synthesized_speech.wav"
with wave.open(offline_output_file, 'wb') as out_f:
    out_f.setnchannels(nchannels)
    out_f.setsampwidth(sampwidth)
    out_f.setframerate(sample_rate_hz)
    out_f.writeframesraw(resp.audio)
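
The same steps can be wrapped in a small helper and checked by reading the file back. This is a minimal sketch that needs no server: it uses one second of silence as a stand-in for `resp.audio`. The names `write_wav` and `read_wav_info` are hypothetical helpers built on the standard `wave` module.

```python
import os
import tempfile
import wave


def write_wav(path, pcm_bytes, sample_rate_hz, nchannels=1, sampwidth=2):
    """Wrap raw linear PCM bytes in a WAV container."""
    with wave.open(path, "wb") as out_f:
        out_f.setnchannels(nchannels)
        out_f.setsampwidth(sampwidth)
        out_f.setframerate(sample_rate_hz)
        out_f.writeframes(pcm_bytes)


def read_wav_info(path):
    """Return (nchannels, sampwidth, framerate, nframes) read from a WAV file."""
    with wave.open(path, "rb") as in_f:
        return (
            in_f.getnchannels(),
            in_f.getsampwidth(),
            in_f.getframerate(),
            in_f.getnframes(),
        )


# Stand-in for resp.audio: one second of 16-bit mono silence at 16 kHz.
fake_audio = b"\x00\x00" * 16000
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
write_wav(demo_path, fake_audio, sample_rate_hz=16000)
print(read_wav_info(demo_path))  # (1, 2, 16000, 16000)
```

Reading the header back is a quick way to confirm that the channel count, sample width, and frame rate match what was requested from the service.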

In [11]:
import IPython
IPython.display.Audio(offline_output_file)

## <font color="blue">Streaming synthesis</font>

In streaming mode, the audio is returned in several responses. Each response is sent as soon as its audio chunk is ready.

In [12]:
responses = tts_service.synthesize_online(text, voice, language_code, sample_rate_hz=sample_rate_hz)

In [None]:
streaming_audio = b''
for resp in responses:
    streaming_audio += resp.audio
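
For long texts, repeated `+=` on a `bytes` object can copy the accumulated buffer on each iteration. One possible alternative is to join the chunks in a single pass. The sketch below uses hypothetical stand-in responses (`fake_responses`) so that it runs without a server. With a real service, the iterator returned by `synthesize_online` would take their place.

```python
from types import SimpleNamespace


def collect_audio(responses):
    """Concatenate the `audio` field of every response into one bytes object."""
    return b"".join(resp.audio for resp in responses)


def fake_responses():
    """Hypothetical stand-in for the streaming iterator, so no server is needed."""
    for chunk in (b"\x01\x02", b"\x03\x04", b"\x05\x06"):
        yield SimpleNamespace(audio=chunk)


demo_audio = collect_audio(fake_responses())
print(len(demo_audio))  # 6
```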

In [None]:
import wave
streaming_output_file = "my_streaming_synthesized_speech.wav"
with wave.open(streaming_output_file, 'wb') as out_f:
    out_f.setnchannels(nchannels)
    out_f.setsampwidth(sampwidth)
    out_f.setframerate(sample_rate_hz)
    out_f.writeframesraw(streaming_audio)

In [None]:
import IPython
IPython.display.Audio(streaming_output_file)

## <font color="blue">Audio output</font>

To use audio input and output, you need to install PyAudio.

```bash
conda install -c anaconda pyaudio
```

### <font color="green">Playing audio during synthesis</font>

To play audio during synthesis, pass the audio chunks to a `riva_api.audio_io.SoundCallBack` instance as they arrive.

In [None]:
import riva_api.audio_io
# show available output devices
riva_api.audio_io.list_output_devices()

In [None]:
output_device = None  # use default device
sound_stream = riva_api.audio_io.SoundCallBack(
    output_device, nchannels=nchannels, sampwidth=sampwidth, framerate=sample_rate_hz
)
for resp in tts_service.synthesize_online(text, voice, language_code, sample_rate_hz=sample_rate_hz):
    sound_stream(resp.audio)