<img src="http://developer.download.nvidia.com/compute/machine-learning/frameworks/nvidia_logo.png" style="width: 90px; float: right;">

# Getting started with TTS in Python

This notebook walks through the basics of the TTS services in Riva Speech Skills.

## Overview

NVIDIA Riva is a GPU-accelerated SDK for building Speech AI applications that are customized for your use case and deliver real-time performance. <br/>
Riva offers a rich set of speech and natural language understanding services such as:

- Automated speech recognition (ASR)
- Text-to-Speech synthesis (TTS)
- A collection of natural language processing (NLP) services such as named entity recognition (NER), punctuation, and intent classification.

**In this notebook, we will focus on interacting with the Text-to-Speech synthesis (TTS) APIs.**

For more detailed information on Riva, please refer to the [Riva developer documentation](https://developer.nvidia.com/).

## Requirements and setup

To execute this notebook, please follow the setup steps in the [README](README.md).

#### Import Riva client libraries

We first import the required libraries, including the Riva client libraries.

In [1]:
import numpy as np
import IPython.display as ipd
import grpc

import riva_api.riva_tts_pb2 as rtts
import riva_api.riva_tts_pb2_grpc as rtts_srv
import riva_api.riva_audio_pb2 as ra

#### Create Riva clients and connect to Riva Speech API server

The URI below assumes a local deployment of the Riva Speech API server on the default port. If the server is deployed on a different host, or via a Helm chart on Kubernetes, use the appropriate URI instead.

In [2]:
channel = grpc.insecure_channel('localhost:50051')

riva_tts = rtts_srv.RivaSpeechSynthesisStub(channel)
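
When the server is not on `localhost`, one option is to read its address from the environment instead of editing the cell above. The following is a minimal sketch: the variable name `RIVA_SPEECH_API_URL` and the helper `resolve_riva_uri` are hypothetical names chosen for illustration and are not part of the Riva client libraries.

```python
import os

DEFAULT_RIVA_URI = "localhost:50051"  # same default as the cell above


def resolve_riva_uri(env=None):
    """Return the server address as "host:port".

    RIVA_SPEECH_API_URL is a hypothetical variable name used only in this
    sketch; fall back to the local default when it is unset or blank.
    """
    env = os.environ if env is None else env
    uri = env.get("RIVA_SPEECH_API_URL", "").strip()
    return uri or DEFAULT_RIVA_URI


print(resolve_riva_uri({}))
print(resolve_riva_uri({"RIVA_SPEECH_API_URL": "riva.example.com:50051"}))
```

The returned string can be passed to `grpc.insecure_channel(...)` in place of the hard-coded address.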

## Speech Synthesis Markup Language (SSML)

Riva TTS supports a subset of SSML tags and attributes. Notably, it supports:

- ``prosody`` tag
  - ``rate`` attribute
  - ``pitch`` attribute
- ``phoneme`` tag

Please refer to the Riva docs [here](../tts/tts-ssml) for a detailed description of how these SSML tags and attributes interact with the TTS system. 
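
SSML strings are easy to get wrong when written by hand, especially once tags are nested. As a rough sketch, the helpers below (`prosody` and `speak`, both hypothetical names and not part of Riva) assemble a string from Python's standard library and check that it is well-formed XML. The check says nothing about whether the server accepts a given tag or attribute value.

```python
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET


def prosody(body, pitch=None, rate=None):
    """Wrap an SSML fragment in a <prosody> tag with the given attributes."""
    attrs = ""
    if pitch is not None:
        attrs += f" pitch={quoteattr(str(pitch))}"
    if rate is not None:
        attrs += f" rate={quoteattr(str(rate))}"
    return f"<prosody{attrs}>{body}</prosody>"


def speak(body):
    """Wrap a fragment in <speak> and check that the result is well-formed XML."""
    ssml = f"<speak>{body}</speak>"
    ET.fromstring(ssml)  # raises xml.etree.ElementTree.ParseError if malformed
    return ssml


# Plain text is escaped so characters such as "&" or "<" cannot break the markup.
fast = speak(prosody(escape("Fish & chips, quickly"), rate="200%"))
print(fast)
```

Because `prosody` returns a plain string, calls can be nested, for example `speak(prosody("slow, then " + prosody("fast", rate="150%"), rate="75%"))`.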

We provide the following examples as guidance:

In [4]:
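# Build one request and reuse it for every example, changing only the text.
req = rtts.SynthesizeSpeechRequest()
req.language_code = "en-US"
req.encoding = ra.AudioEncoding.LINEAR_PCM  # raw 16-bit PCM samples
req.sample_rate_hz = 44100
req.voice_name = "English-US-Female-1"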

texts = [
  """<speak>This is a normal sentence</speak>""",
  """<speak><prosody pitch="0." rate="100%">This is also a normal sentence</prosody></speak>""",
  """<speak><prosody rate="200%">This is a fast sentence</prosody></speak>""",
  """<speak><prosody pitch="1.0">Now, I'm speaking a bit higher</prosody></speak>""",
  """<speak>You say <phoneme alphabet="x-arpabet" ph="{@T}{@AH0}{@M}{@EY1}{@T}{@OW2}">tomato</phoneme>, I say <phoneme alphabet="x-arpabet" ph="{@T}{@AH0}{@M}{@AA1}{@T}{@OW2}">tomato</phoneme></speak>""",
  """<speak>S S M L supports <prosody pitch="-1">nested tags. So I can speak <prosody rate="150%">faster</prosody>, <prosody rate="75%">or slower</prosody>, as desired.</prosody></speak>""",
]

for t in texts:
    req.text = t
    resp = riva_tts.Synthesize(req)
    # LINEAR_PCM audio arrives as raw bytes; interpret them as 16-bit samples.
    audio_samples = np.frombuffer(resp.audio, dtype=np.int16)
    print(t)
    ipd.display(ipd.Audio(audio_samples, rate=req.sample_rate_hz))

<speak>This is a normal sentence</speak>


<speak><prosody pitch="0." rate="100%">This is also a normal sentence</prosody></speak>


<speak><prosody rate="200%">This is a fast sentence</prosody></speak>


<speak><prosody pitch="1.0">Now, I'm speaking a bit higher</prosody></speak>


<speak>You say <phoneme alphabet="x-arpabet" ph="{@T}{@AH0}{@M}{@EY1}{@T}{@OW2}">tomato</phoneme>, I say <phoneme alphabet="x-arpabet" ph="{@T}{@AH0}{@M}{@AA1}{@T}{@OW2}">tomato</phoneme></speak>


<speak>S S M L supports <prosody pitch="-1">nested tags. So I can speak <prosody rate="150%">faster</prosody>, <prosody rate="75%">or slower</prosody>, as desired.</prosody></speak>
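
The loop above only plays the audio inline. To keep a result on disk, the raw samples can be wrapped in a WAV container with Python's standard `wave` module. This is a sketch under two assumptions: the audio is mono, and it is 16-bit PCM as in the `np.int16` decoding above. The helper name `pcm16_to_wav_bytes` is hypothetical, and a buffer of silence stands in for `resp.audio` so the example runs without a server.

```python
import io
import wave


def pcm16_to_wav_bytes(pcm_bytes, sample_rate_hz, channels=1):
    """Wrap raw 16-bit PCM samples in a WAV container and return the file bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)        # assumption: mono audio
        wav.setsampwidth(2)               # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate_hz)
        wav.writeframes(pcm_bytes)
    return buf.getvalue()


# Stand-in for resp.audio: one second of silence at 44100 Hz.
fake_audio = b"\x00\x00" * 44100
wav_bytes = pcm16_to_wav_bytes(fake_audio, 44100)

# Read the container back to confirm the header matches what was written.
with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
    print(wav.getframerate(), wav.getnframes())
```

With a real response, the same helper would be called as `pcm16_to_wav_bytes(resp.audio, req.sample_rate_hz)` and the returned bytes written to a file opened in binary mode.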


## Go deeper into Riva capabilities

Now that you have a basic introduction to the Riva APIs, you may like to try out:

### Sample apps

Riva comes with various sample apps that demonstrate how to use the APIs to build applications such as a [chatbot](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/samples/weather.html), a domain-specific speech recognition or [keyword (entity) recognition system](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/samples/callcenter.html), or how Riva scales out to handle a large number of simultaneous requests ([SpeechSquad](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/samples/speechsquad.html)).
Have a look at the Sample Application section in the [Riva developer documentation](https://developer.nvidia.com/) for all the sample apps.


### Fine-tune a domain-specific speech model and deploy it with Riva

Train the latest state-of-the-art speech and natural language processing models on your own data using [NeMo](https://github.com/NVIDIA/NeMo) or the [Transfer Learning Toolkit](https://developer.nvidia.com/transfer-learning-toolkit), and deploy them on Riva using the [Riva ServiceMaker tool](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/model-servicemaker.html).


### Further resources

Explore the details of each of the APIs and their functionalities in the [docs](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/protobuf-api/protobuf-api-root.html).