<img src="http://developer.download.nvidia.com/notebooks/dlsw-notebooks/riva_asr_asr-python-advanced-finetune-am-citrinet-tao-deployment/nvidia_logo.png" style="width: 90px; float: right;">

# How to deploy a Riva Speech Synthesis Pipeline
In this tutorial, you will learn how to deploy Riva speech synthesis models, specifically the pre-trained **spectrogram generator model (FastPitch)** and **vocoder model (HiFiGAN)** downloaded from NVIDIA NGC.

This will serve as a primer for customization tutorials in this lab, which require configuring the Riva speech pipeline.

## NVIDIA Riva Overview

NVIDIA Riva is a GPU-accelerated SDK for building Speech AI applications that are customized for your use case and deliver real-time performance. <br/>
Riva offers a rich set of speech and natural language understanding services such as:

- Automatic speech recognition (ASR)
- Text-to-Speech synthesis (TTS)
- A collection of natural language processing (NLP) services, such as named entity recognition (NER), punctuation, and intent classification.

To understand the basics of Riva TTS APIs, refer to [How do I use Riva TTS APIs with out-of-the-box models?](https://github.com/nvidia-riva/tutorials/tree/stable/tts-python-basics.ipynb). <br>

For more information about Riva, refer to the [Riva developer documentation](https://developer.nvidia.com/riva).

---
## Prerequisites

Before we get started, ensure that you have access to [**NVIDIA NGC**](https://ngc.nvidia.com/signin).

---
## Fetch TTS models from NGC
### Download the Spectrogram Generator Model

The FastPitch Spectrogram Generator Model is located on NGC [here](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/models/speechsynthesis_english_fastpitch/files). Let's download it to a local path.

In [None]:
# Imports
import os

# Create a local directory to save models
TTS_MODEL_DIR = os.path.join(os.getcwd(), "tts-models")
!mkdir -p $TTS_MODEL_DIR

In [None]:
# Path where ngc will download the FastPitch Model
SG_DIR = "speechsynthesis_english_fastpitch_vdeployable_v1.1"
SG_PATH = os.path.join(TTS_MODEL_DIR, SG_DIR)

if os.path.exists(SG_PATH):
    print("Spectrogram generator model exists, skipping download")
else:
    print("Downloading the FastPitch Model")
    !ngc registry model download-version "nvidia/tao/speechsynthesis_english_fastpitch:deployable_v1.1" --dest $TTS_MODEL_DIR

In [None]:
# Inspect downloaded files
!ls $SG_PATH
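Each download cell in this section follows the same pattern: compute the folder NGC will create, skip the download if it exists, otherwise call `ngc registry model download-version`. The sketch below factors that pattern into plain Python. It assumes the folder name is `<name>_v<version>`, which is the naming seen in every download cell of this notebook, not a documented NGC guarantee. The helper names are hypothetical.

```python
import os
import shlex
from typing import List, Optional, Tuple


def ngc_download_dir(model_ref: str) -> str:
    """Folder name that `ngc registry model download-version` extracts into.

    Assumption: "nvidia/tao/<name>:<version>" lands in "<name>_v<version>",
    the pattern used by every download cell in this notebook.
    """
    repo, version = model_ref.rsplit(":", 1)
    name = repo.rsplit("/", 1)[-1]
    return f"{name}_v{version}"


def ngc_download_cmd(model_ref: str, dest: str) -> List[str]:
    """argv for the same NGC CLI call the cells above run with `!ngc ...`."""
    return ["ngc", "registry", "model", "download-version", model_ref, "--dest", dest]


def plan_download(model_ref: str, dest: str) -> Tuple[str, Optional[List[str]]]:
    """Return (local_path, argv); argv is None when the folder already exists."""
    local_path = os.path.join(dest, ngc_download_dir(model_ref))
    if os.path.exists(local_path):
        return local_path, None
    return local_path, ngc_download_cmd(model_ref, dest)


# Example for the FastPitch model; "tts-models" is a placeholder destination
path, cmd = plan_download("nvidia/tao/speechsynthesis_english_fastpitch:deployable_v1.1", "tts-models")
print(path)
print(shlex.join(cmd) if cmd else "already downloaded, skipping")
```

To actually download, pass the list to `subprocess.run(cmd, check=True)`. Building an argument list rather than a shell string avoids quoting problems if a path contains spaces.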

### Download the Vocoder Model

The HiFiGAN Vocoder Model is located on NGC [here](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/models/speechsynthesis_hifigan/files). Let's download it to a local path.

In [None]:
VC_DIR = "speechsynthesis_hifigan_vdeployable_v1.0"
VC_PATH = os.path.join(TTS_MODEL_DIR, VC_DIR)

if os.path.exists(VC_PATH):
    print("Vocoder Model exists, skipping download")
else:
    print("Downloading the HiFiGAN Model")
    !ngc registry model download-version "nvidia/tao/speechsynthesis_hifigan:deployable_v1.0" --dest $TTS_MODEL_DIR

In [None]:
# Inspect downloaded files
!ls $VC_PATH

### Download the Auxiliary files

The speech synthesis pipeline also requires a pronunciation dictionary and an abbreviations file. They are located on NGC [here](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/models/speechsynthesis_en_us_auxiliary_files/files). Let's download them to a local path.

In [None]:
AUX_DIR = "speechsynthesis_en_us_auxiliary_files_vdeployable_v1.3"
AUX_PATH = os.path.join(TTS_MODEL_DIR, AUX_DIR)

if os.path.exists(AUX_PATH):
    print("Auxiliary files exist, skipping download")
else:
    print("Downloading the auxiliary files")
    !ngc registry model download-version "nvidia/tao/speechsynthesis_en_us_auxiliary_files:deployable_v1.3" --dest $TTS_MODEL_DIR

In [None]:
# Inspect downloaded files
!ls $AUX_PATH
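The pronunciation dictionary (`cmudict-0.7b_nv22.08`) maps words to ARPABET phonemes. Assuming it follows the standard CMUdict 0.7b layout (`;;;` comment lines, `WORD  PH1 PH2 ...`, and `WORD(1)` for alternate pronunciations), the sketch below parses it into a Python dict. The sample entries are illustrative, not copied from the NVIDIA file.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Alternate pronunciations are written as WORD(1), WORD(2), ...
_VARIANT = re.compile(r"\(\d+\)$")


def parse_cmudict(lines: Iterable[str]) -> Dict[str, List[List[str]]]:
    """Map each word to a list of pronunciations, each a list of ARPABET phonemes.

    Assumption: standard CMUdict 0.7b layout, one entry per line.
    """
    prons = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;;"):
            continue  # skip blank lines and comments
        word, *phones = line.split()
        word = _VARIANT.sub("", word)  # fold WORD(1) into WORD
        prons[word].append(phones)
    return dict(prons)


sample = """;;; illustrative entries
TOMATO  T AH0 M EY1 T OW2
TOMATO(1)  T AH0 M AA1 T OW2
"""
print(parse_cmudict(sample.splitlines())["TOMATO"])
```

To inspect the real file, pass an open file handle, e.g. `parse_cmudict(open(os.path.join(AUX_PATH, "cmudict-0.7b_nv22.08")))`. The original CMUdict 0.7b is not pure UTF-8, so you may need `encoding="latin-1"`.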

### Download the Normalization Grammar

The speech synthesis pipeline uses weighted finite-state transducer (WFST) grammars that map strings in written form to strings in spoken form. They are located on NGC [here](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/models/normalization_en_us/files). Let's download them to a local path.

In [None]:
NG_DIR = "normalization_en_us_vdeployable_v1.1"
NG_PATH = os.path.join(TTS_MODEL_DIR, NG_DIR)

if os.path.exists(NG_PATH):
    print("Normalization grammar exists, skipping download")
else:
    print("Downloading the Normalization Grammar")
    !ngc registry model download-version "nvidia/tao/normalization_en_us:deployable_v1.1" --dest $TTS_MODEL_DIR

In [None]:
# Inspect downloaded files
!ls $NG_PATH
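To make "written form to spoken form" concrete, here is a toy normalizer. It is deliberately **not** a WFST and not how the `.far` grammars work internally; it is a regex plus lookup-table stand-in that only verbalizes integers 0 to 99 and simple dollar amounts, to show the kind of mapping the grammars perform before synthesis.

```python
import re

# Toy stand-in for text normalization (regex + lookup table, NOT a WFST).
_ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
         "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
         "sixteen", "seventeen", "eighteen", "nineteen"]
_TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
         "eighty", "ninety"]


def say_number(n: int) -> str:
    """Spoken form of an integer in 0..99."""
    if n < 20:
        return _ONES[n]
    tens, ones = divmod(n, 10)
    return _TENS[tens] if ones == 0 else f"{_TENS[tens]} {_ONES[ones]}"


def normalize(text: str) -> str:
    """Rewrite "$N" and bare 1-2 digit integers into words; leave everything else."""
    def money(m):
        n = int(m.group(1))
        return f"{say_number(n)} dollar{'' if n == 1 else 's'}"

    text = re.sub(r"\$(\d{1,2})\b", money, text)  # "$5" -> "five dollars"
    return re.sub(r"\b(\d{1,2})\b", lambda m: say_number(int(m.group(1))), text)


print(normalize("I paid $5 for 2 tickets"))
```

Numbers with three or more digits (such as `2024`) are left untouched here; real grammars cover those cases, plus dates, times, units, and more.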

---
## Riva ServiceMaker
Riva ServiceMaker is a set of tools that aggregates all the necessary artifacts (models, files, configurations, and user settings) for Riva deployment to a target environment. It has two main components: `riva-build` and `riva-deploy`.

### Riva-build

This step builds a Riva-ready version of the model. Its only output is an intermediate format (called an RMIR) of an end-to-end pipeline for the supported services within Riva. <br>

`riva-build` combines one or more exported models (`.riva` files) into a single file in an intermediate format called Riva Model Intermediate Representation (`.rmir`). This file contains a deployment-agnostic specification of the whole end-to-end pipeline, along with all the assets required for the final deployment and inference.

In [None]:
# ServiceMaker Docker
RIVA_SM_CONTAINER = "nvcr.io/nvidia/riva/riva-speech:2.4.0-servicemaker"

# Get the ServiceMaker docker
! docker pull $RIVA_SM_CONTAINER

# Encryption key used when the models were exported with TAO
KEY = "tlt_encode"

Below, we run `riva-build` to create a pipeline configured for offline synthesis. The same command is documented in the [pipeline configuration](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/tts/tts-custom.html#riva-build-pipeline-instructions) section of the docs. <br>

First, let's define the model paths as they will appear inside the ServiceMaker container, where we mount the local model directory:

In [None]:
# All model paths inside the Riva ServiceMaker container carry the _SM suffix

TTS_MODEL_DIR_SM = "/data" # Path where we mount the downloaded TTS models in the ServiceMaker container

# Spectrogram generator model
SG_SM = os.path.join(TTS_MODEL_DIR_SM, SG_DIR, "FastPitch_Align_22k_LJS_arpa_PitchDuration.riva")

# Vocoder model
VC_SM = os.path.join(TTS_MODEL_DIR_SM, VC_DIR, "HifiGAN_22k_LJS.riva")

# Auxiliary files: abbreviations and pronunciation dictionary
ABBR_SM = os.path.join(TTS_MODEL_DIR_SM, AUX_DIR, "abbr.txt")
PR_SM = os.path.join(TTS_MODEL_DIR_SM, AUX_DIR, "cmudict-0.7b_nv22.08")

# Normalization grammar
WFST_TOKENIZER_MODEL_SM = os.path.join(TTS_MODEL_DIR_SM, NG_DIR, "tokenize_and_classify.far")
WFST_VERBALIZER_MODEL_SM = os.path.join(TTS_MODEL_DIR_SM, NG_DIR, "verbalize.far")

# Path where the generated .rmir file will be stored
TTS_RMIR_SM = os.path.join(TTS_MODEL_DIR_SM, "tts.rmir")
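The `_SM` paths only make sense because `docker run -v $TTS_MODEL_DIR:$TTS_MODEL_DIR_SM` makes the host directory visible at `/data` inside the container. The hypothetical helper below spells out that translation for any host path under the mounted directory. It assumes POSIX-style paths, as on a Linux host.

```python
from pathlib import PurePosixPath


def to_container_path(host_path: str, host_root: str, container_root: str = "/data") -> str:
    """Where `host_path` appears inside a container started with
    `-v host_root:container_root`.

    Raises ValueError if `host_path` is not under `host_root`, i.e. the file
    would not be visible inside the container at all.
    """
    rel = PurePosixPath(host_path).relative_to(host_root)
    return str(PurePosixPath(container_root) / rel)


# Hypothetical host directory standing in for TTS_MODEL_DIR
print(to_container_path("/home/user/tts-models/tts.rmir", "/home/user/tts-models"))
```

The `ValueError` case is the useful one in practice: a file outside the mounted directory cannot be passed to `riva-build`, however its path is spelled.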

We use the Riva ServiceMaker container to run `riva-build`.

In [None]:
! docker run --rm --gpus 0 -v $TTS_MODEL_DIR:$TTS_MODEL_DIR_SM $RIVA_SM_CONTAINER -- \
             riva-build speech_synthesis $TTS_RMIR_SM:$KEY \
             $SG_SM:$KEY \
             $VC_SM:$KEY \
             --voice_name ljspeech \
             --abbreviations_file=$ABBR_SM \
             --arpabet_file=$PR_SM \
             --wfst_tokenizer_model=$WFST_TOKENIZER_MODEL_SM \
             --wfst_verbalizer_model=$WFST_VERBALIZER_MODEL_SM

The arguments used above are just an example; there are many more optional parameters you can configure. For now, let's look at what the arguments above mean:

* General pipeline parameters:
    * `--voice_name`: the name of the voice. Defaults to `English-US.Female-1`.
    * `--abbreviations_file`: the file containing abbreviations and their corresponding expansions.
    * `--arpabet_file`: the file containing the pronunciation dictionary, which maps words to their phonetic representation in ARPABET.
* Text normalization (TN) parameters:
    * `--wfst_tokenizer_model`: the Sparrowhawk model to use for tokenization and classification; must be in `.far` (finite-state archive) format.
    * `--wfst_verbalizer_model`: the Sparrowhawk model to use for verbalization; must be in `.far` (finite-state archive) format.

This information is also accessible through the `riva-build speech_synthesis -h` command, and more information about additional parameters to `riva-build` can be found in the [riva-build optional parameters](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/tts/tts-custom.html#riva-build-optional-parameters) documentation. 
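If you prefer to drive ServiceMaker from Python instead of `!` shell lines, the sketch below assembles the same `riva-build speech_synthesis` invocation and its `docker run` wrapper as argument lists. It only includes the flags used in this notebook; the function names and the placeholder paths are hypothetical, and `gpus="0"` simply mirrors the value passed in the cell above.

```python
import shlex
from typing import List


def riva_build_tts_cmd(rmir: str, sg_model: str, vocoder: str, key: str,
                       voice_name: str, abbreviations_file: str, arpabet_file: str,
                       wfst_tokenizer_model: str, wfst_verbalizer_model: str) -> List[str]:
    """argv for `riva-build speech_synthesis` with the flags used in this notebook.

    All paths must already be container paths (for example under /data).
    """
    return [
        "riva-build", "speech_synthesis",
        f"{rmir}:{key}",       # output .rmir, encrypted with `key`
        f"{sg_model}:{key}",   # spectrogram generator (.riva)
        f"{vocoder}:{key}",    # vocoder (.riva)
        "--voice_name", voice_name,
        f"--abbreviations_file={abbreviations_file}",
        f"--arpabet_file={arpabet_file}",
        f"--wfst_tokenizer_model={wfst_tokenizer_model}",
        f"--wfst_verbalizer_model={wfst_verbalizer_model}",
    ]


def in_servicemaker(image: str, host_dir: str, container_dir: str,
                    inner: List[str], gpus: str = "0") -> List[str]:
    """Wrap `inner` in the `docker run --rm --gpus ... -v ... IMAGE -- ...` form used above."""
    return ["docker", "run", "--rm", "--gpus", gpus,
            "-v", f"{host_dir}:{container_dir}", image, "--", *inner]


# Placeholder directories; the file names match the ones used in this notebook
inner = riva_build_tts_cmd(
    "/data/tts.rmir", "/data/fastpitch/FastPitch_Align_22k_LJS_arpa_PitchDuration.riva",
    "/data/hifigan/HifiGAN_22k_LJS.riva", "tlt_encode", "ljspeech",
    "/data/aux/abbr.txt", "/data/aux/cmudict-0.7b_nv22.08",
    "/data/grammar/tokenize_and_classify.far", "/data/grammar/verbalize.far",
)
print(shlex.join(in_servicemaker(
    "nvcr.io/nvidia/riva/riva-speech:2.4.0-servicemaker", "/path/to/tts-models", "/data", inner)))
```

Passing the resulting list to `subprocess.run(..., check=True)` raises an error if `riva-build` fails, which a `!` line in a notebook does not do.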

In [None]:
! docker run --rm $RIVA_SM_CONTAINER -- riva-build speech_synthesis -h

In [None]:
# Inspect the .rmir
!ls -lt $TTS_MODEL_DIR/*.rmir

### Riva-deploy

The deployment tool takes as input one or more Riva Model Intermediate Representation (RMIR) files and a target model repository directory. It creates an ensemble configuration specifying the pipeline for the execution and finally writes all those assets to the output model repository directory.

In [None]:
# Path to the model repository inside the ServiceMaker container
MODEL_REPO_SM = os.path.join(TTS_MODEL_DIR_SM, "models")

In [None]:
# Syntax: riva-deploy -f dir-for-rmir/model.rmir:key output-dir-for-repository
! docker run --rm --gpus 0 -v $TTS_MODEL_DIR:$TTS_MODEL_DIR_SM $RIVA_SM_CONTAINER -- \
            riva-deploy -f  $TTS_RMIR_SM:$KEY $MODEL_REPO_SM

In [None]:
# Inspect the models directory
!ls -lt $TTS_MODEL_DIR/models

---
## Start the Riva Server
After the model repository is generated, we are ready to start the Riva server. First, download the Riva Skills Quick Start resources from NGC. 

### Download the Riva Skills Quick Start guide
The [Riva Skills Quick Start](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/riva/resources/riva_quickstart) guide contains easy-to-use scripts to download and deploy models. 

`NOTE:` The scripts in Quick Start can download and deploy the default models. We downloaded the TTS models above just to demonstrate how to use Riva ServiceMaker tools, which will be used during customization tutorials to re-deploy the pipeline.

In [None]:
# Set the Riva Quick Start directory
RIVA_QSG = os.path.join(os.getcwd(), "riva_quickstart_v2.4.0")

# Downloads the quick start directory to a folder in the current directory and uncompresses it
if os.path.exists(RIVA_QSG):
    print("Riva Quick Start guide exists, skipping download")
else:
    print("Downloading the Riva Quick Start guide")
    !ngc registry resource download-version "nvidia/riva/riva_quickstart:2.4.0"

### Configure Riva Quick Start 
Next, configure the Quick Start scripts to deploy the TTS models produced by the Riva ServiceMaker tools in the previous section. <br>
To do this, modify the `config.sh` file to enable only the relevant Riva service (TTS, for the FastPitch/HiFiGAN models), set the encryption key, and set the path to the model repository generated in the previous step (`riva_model_loc`), among other configurations.

In [None]:
!ls $RIVA_QSG/config.sh

For example, since the model repository above was generated at `$TTS_MODEL_DIR/models`, set `riva_model_loc` to `TTS_MODEL_DIR`, the directory that contains `models`. <br>

#### config.sh snippet
```
# Enable or Disable Riva Services
service_enabled_asr=false                                                      ## MAKE CHANGES HERE - SET TO FALSE
service_enabled_nlp=false                                                      ## MAKE CHANGES HERE - SET TO FALSE
service_enabled_tts=true

# Specify one or more GPUs to use
# specifying more than one GPU is currently an experimental feature, and may result in undefined behaviours.
gpus_to_use="device=0"

# Specify the encryption key to use to deploy models
MODEL_DEPLOY_KEY="tlt_encode"

# Locations to use for storing models artifacts
#
# If an absolute path is specified, the data will be written to that location
# Otherwise, a docker volume will be used (default).
#
# riva_init.sh will create a `rmir` and `models` directory in the volume or
# path specified. 
#
# RMIR ($riva_model_loc/rmir)
# Riva uses an intermediate representation (RMIR) for models
# that are ready to deploy but not yet fully optimized for deployment. Pretrained
# versions can be obtained from NGC (by specifying NGC models below) and will be
# downloaded to $riva_model_loc/rmir by `riva_init.sh`
# 
# Custom models produced by NeMo or TAO and prepared using riva-build
# may also be copied manually to this location $(riva_model_loc/rmir).
#
# Models ($riva_model_loc/models)
# During the riva_init process, the RMIR files in $riva_model_loc/rmir
# are inspected and optimized for deployment. The optimized versions are
# stored in $riva_model_loc/models. The riva server exclusively uses these
# optimized versions.
riva_model_loc="<add path>"                              ## MAKE CHANGES HERE (Replace with the path TTS_MODEL_DIR)                      
```

<font color='red'>**ATTENTION:**</font> **Make sure to do the following before moving forward:**
1. In the file navigator in Jupyter Lab, navigate to riva_quickstart_v2.* and open config.sh
2. Configure settings as shown in the snippet above
   - Set asr and nlp services to false
   - Configure the riva_model_loc path to where the models resulting from riva-deploy are stored
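As an alternative to editing `config.sh` by hand, the sketch below rewrites `key=value` lines in its text. It is a hypothetical helper, not part of the Quick Start scripts. It replaces only the value (a double-quoted string, which may contain spaces like `"<add path>"`, or a run of non-space characters) and keeps trailing comments such as `## MAKE CHANGES HERE`.

```python
import re
from typing import Dict


def set_config_values(text: str, values: Dict[str, str]) -> str:
    """Return `text` with each `key=value` assignment in `values` rewritten."""
    for key, value in values.items():
        pattern = rf'^{re.escape(key)}=(?:"[^"]*"|\S*)'
        replacement = f"{key}={value}"
        # A function replacement avoids re's backslash processing of `value`
        text, n = re.subn(pattern, lambda _m: replacement, text, flags=re.MULTILINE)
        if n == 0:
            raise KeyError(f"{key!r} not found in config text")
    return text


example = """service_enabled_asr=true
service_enabled_nlp=true      ## MAKE CHANGES HERE - SET TO FALSE
service_enabled_tts=true
riva_model_loc="<add path>"   ## MAKE CHANGES HERE
"""
print(set_config_values(example, {
    "service_enabled_asr": "false",
    "service_enabled_nlp": "false",
    "riva_model_loc": '"/path/to/tts-models"',  # placeholder: use your TTS_MODEL_DIR
}))
```

To apply it, read `config.sh` from `RIVA_QSG`, pass `{"riva_model_loc": f'"{TTS_MODEL_DIR}"', ...}`, and write the result back. The `KeyError` guards against silently doing nothing if a key is misspelled.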

In [None]:
# Set `riva_model_loc` to where the models resulting from riva-deploy are stored. In our case, it is TTS_MODEL_DIR
!echo $TTS_MODEL_DIR

In [None]:
# Ensure you have permission to execute these scripts
! cd $RIVA_QSG && chmod +x ./riva_start.sh && chmod +x ./riva_stop.sh

In [None]:
# Run Riva Start to start the server. This will deploy your model(s).
! cd $RIVA_QSG && ./riva_start.sh config.sh

---
## Run Inference
Once the Riva server is up and running with the models, you can send inference requests querying the server. 

To send gRPC requests, you can install the Riva Python API bindings for the client. This is available as a `pip` [package](https://pypi.org/project/nvidia-riva-client/). You can read more about the Python client [here](https://github.com/nvidia-riva/python-clients).

In [None]:
# Install the Client API Bindings
! pip install nvidia-riva-client

### Connect to the Riva Server and Run Speech Synthesis
The following cells query the Riva server (using gRPC) with input text to synthesize audio.

In [None]:
import IPython.display as ipd
import numpy as np

# Riva Python client from the nvidia-riva-client package (Riva 2.3.0 and above).
# The cells below use only riva.client, so no fallback to the older riva_api bindings is needed.
import riva.client

The following URI assumes a local deployment of the Riva Speech API server is on the default port. In case the server deployment is on a different host or via a Helm chart on Kubernetes, use an appropriate URI.

In [None]:
auth = riva.client.Auth(uri='localhost:50051')

riva_tts = riva.client.SpeechSynthesisService(auth)
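If the synthesis call below fails with a connection error, a quick first check is whether anything is listening on the server port at all. The hypothetical helper below does a plain TCP connect. It cannot tell whether the listener is a healthy Riva server, and it assumes a simple `host:port` string like the URI above. (At the gRPC level, `grpc.channel_ready_future(channel).result(timeout=...)` serves a similar purpose.)

```python
import socket


def server_reachable(uri: str, timeout: float = 2.0) -> bool:
    """True if a TCP connection to the host:port in `uri` succeeds.

    Only checks that something accepts connections on that port; it does not
    verify that the listener is a healthy Riva server. Assumes a plain
    "host:port" string (no scheme, no IPv6 brackets).
    """
    host, _, port = uri.rpartition(":")
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or unresolvable host
        return False
```

Usage: `server_reachable("localhost:50051")` should return `True` once `riva_start.sh` has finished bringing the server up.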

In [None]:
sample_rate_hz = 44100
resp = riva_tts.synthesize(
    text="Is it recognize speech or wreck a nice beach?",
    language_code="en-US",
    encoding=riva.client.AudioEncoding.LINEAR_PCM,  # Currently only LINEAR_PCM is supported
    sample_rate_hz=sample_rate_hz,                   # Generate 44.1 kHz audio
    voice_name="ljspeech",                           # Must match the --voice_name passed to riva-build
)

audio_samples = np.frombuffer(resp.audio, dtype=np.int16)
ipd.Audio(audio_samples, rate=sample_rate_hz)

You should now hear synthesized audio for the input text. Your speech synthesis pipeline is up and running!
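To keep the result, you can wrap the raw response bytes in a WAV file with Python's standard `wave` module. This sketch assumes the response is mono, 16-bit signed PCM, which matches how the cell above decodes `resp.audio` with `np.int16`; the helper names are hypothetical.

```python
import io
import wave
from typing import Union

import numpy as np


def pcm_to_wav_bytes(pcm: Union[bytes, np.ndarray], sample_rate_hz: int) -> bytes:
    """Wrap raw mono 16-bit PCM in a WAV container and return the file bytes."""
    if isinstance(pcm, np.ndarray):
        pcm = pcm.astype(np.int16).tobytes()
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)               # mono
        wf.setsampwidth(2)               # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate_hz)
        wf.writeframes(pcm)
    return buf.getvalue()                # wave leaves file objects it did not open unclosed


def save_wav(path: str, pcm: Union[bytes, np.ndarray], sample_rate_hz: int) -> float:
    """Write the audio to `path` and return its duration in seconds."""
    with open(path, "wb") as f:
        f.write(pcm_to_wav_bytes(pcm, sample_rate_hz))
    n_samples = len(pcm) if isinstance(pcm, np.ndarray) else len(pcm) // 2
    return n_samples / sample_rate_hz
```

For example, `save_wav("riva_tts.wav", resp.audio, sample_rate_hz)` writes the clip above and returns its length in seconds.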

In the next notebook, you will learn how to customize the pronunciation (phonemes) and prosody of the same synthesized voice.