
# How to deploy custom TTS Models (FastPitch and HiFi-GAN) trained with TAO Toolkit on Riva

This tutorial walks you through deploying custom TTS models (FastPitch and HiFi-GAN) trained with TAO Toolkit on Riva for real-time inference.

---
## Riva ServiceMaker
Riva ServiceMaker is a set of tools that aggregates all the necessary artifacts (models, files, configurations, and user settings)
for Riva deployment to a target environment.

### Riva-build

This step builds a Riva-ready version of the model. Its only output is an intermediate format (called an RMIR)
of an end-to-end pipeline for the supported services within Riva. Let's consider two TTS models:

* [FastPitch](https://ngc.nvidia.com/catalog/models/nvidia:tao:speechsynthesis_english_fastpitch) (spectrogram generator)
* [HiFi-GAN](https://ngc.nvidia.com/catalog/models/nvidia:tao:speechsynthesis_hifigan) (vocoder)

We'll use the customized spectrogram and vocoder models (from the previous notebook) to deploy the Riva TTS pipeline.

Let's set the path to the customized spectrogram generator and vocoder models (`.riva` files), which will be used during `riva-build`.

In [None]:
# IMPORTANT: Update MODEL_LOC with the ABSOLUTE PATH of the directory that holds the `.riva` files
import os
# ServiceMaker Docker
RIVA_SM_CONTAINER = "nvcr.io/nvidia/riva/riva-speech:2.4.0-servicemaker"

# Directory where the .riva models are stored $MODEL_LOC/*.riva
WORKING_DIR = os.path.join(os.getcwd(), "tts_training")
MODEL_LOC = WORKING_DIR + "/results/riva/"

# Name of the .riva files
SPECTRO_GEN_MODEL_NAME = "spectro_gen.riva"
VOCODER_MODEL_NAME = "vocoder.riva"

# Key the models were encrypted with when they were exported with TAO
KEY = "tlt_encode"
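
Before running `riva-build`, it can help to confirm that both `.riva` files are actually present in `MODEL_LOC`, since a wrong path otherwise only surfaces as an error inside the container. The helper below is a minimal sketch; `missing_models` is a hypothetical name introduced here for illustration and is not part of Riva or TAO.

```python
import os


def missing_models(model_loc, file_names):
    """Return the names in file_names that are not regular files under model_loc.

    Hypothetical helper for this notebook, not a Riva or TAO API.
    """
    return [
        name
        for name in file_names
        if not os.path.isfile(os.path.join(model_loc, name))
    ]
```

In this notebook, calling `missing_models(MODEL_LOC, [SPECTRO_GEN_MODEL_NAME, VOCODER_MODEL_NAME])` should return an empty list before you continue.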

In [None]:
# Syntax: riva-build <task-name> output-dir-for-rmir/model.rmir:key dir-for-riva/model.riva:key
! docker run --rm --gpus 0 -v $MODEL_LOC:/data $RIVA_SM_CONTAINER -- \
            riva-build speech_synthesis /data/tts.rmir:$KEY /data/$SPECTRO_GEN_MODEL_NAME:$KEY /data/$VOCODER_MODEL_NAME:$KEY

### Riva-deploy

The deployment tool takes as input one or more Riva Model Intermediate Representation (RMIR) files and
a target model repository directory. It creates an ensemble configuration that specifies the execution
pipeline, then writes all of those assets to the output model repository directory.

In [None]:
# Syntax: riva-deploy -f dir-for-rmir/model.rmir:key output-dir-for-repository
! docker run --rm --gpus 0 -v $MODEL_LOC:/data $RIVA_SM_CONTAINER -- \
            riva-deploy -f  /data/tts.rmir:$KEY /data/models/
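
Because the container mounts `MODEL_LOC` at `/data`, the repository written to `/data/models/` appears on the host at `$MODEL_LOC/models`. As a quick sanity check, the sketch below lists the subdirectories of that folder. It assumes the repository holds one subdirectory per pipeline component; the exact directory names depend on the Riva version and are not specified in this tutorial. `list_model_dirs` is a hypothetical helper name.

```python
import os


def list_model_dirs(repo_dir):
    """Return the sorted subdirectory names of repo_dir.

    Returns an empty list if repo_dir does not exist. Hypothetical helper
    for this notebook, not a Riva API.
    """
    if not os.path.isdir(repo_dir):
        return []
    return sorted(
        entry
        for entry in os.listdir(repo_dir)
        if os.path.isdir(os.path.join(repo_dir, entry))
    )
```

An empty result from `list_model_dirs(os.path.join(MODEL_LOC, "models"))` suggests that `riva-deploy` did not complete, so check its output before starting the server.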

---
## Start the Riva Server
After the model repository is generated, we are ready to start the Riva server. First, download the Riva Quick Start resource from NGC. 
Set the path to the directory here:

In [None]:
# Set the Riva Quick Start directory
RIVA_DIR = os.path.join(os.getcwd(), "riva_quickstart_v2.4.0")

# Checking if the quickstart exists, otherwise download it
if os.path.exists(RIVA_DIR):
    print("Quickstart scripts exist, skipping download")
else:
    print("Quickstart scripts do not exist, downloading")
    ! ngc registry resource download-version "nvidia/riva/riva_quickstart:2.4.0"

Next, we modify the `config.sh` file to enable the relevant Riva services (TTS for the FastPitch/HiFi-GAN models), provide the encryption key, and set the path to the model repository (`riva_model_loc`) generated in the previous step, among other settings.

For example, since the model repository above was generated at `$MODEL_LOC/models`, set `riva_model_loc` to the `MODEL_LOC` directory itself (the server looks for the `models` subdirectory inside it).

#### config.sh snippet
```
# Enable or Disable Riva Services 
service_enabled_asr=false                                                      ## MAKE CHANGES HERE
service_enabled_nlp=false                                                      ## MAKE CHANGES HERE
service_enabled_tts=true                                                     ## MAKE CHANGES HERE

# Specify one or more GPUs to use
# specifying more than one GPU is currently an experimental feature, and may result in undefined behaviours.
gpus_to_use="device=0"

# Specify the encryption key to use to deploy models
MODEL_DEPLOY_KEY="tlt_encode"                                                 

# Locations to use for storing models artifacts
#
# If an absolute path is specified, the data will be written to that location
# Otherwise, a docker volume will be used (default).
#
# riva_init.sh will create a `rmir` and `models` directory in the volume or
# path specified. 
#
# RMIR ($riva_model_loc/rmir)
# Riva uses an intermediate representation (RMIR) for models
# that are ready to deploy but not yet fully optimized for deployment. Pretrained
# versions can be obtained from NGC (by specifying NGC models below) and will be
# downloaded to $riva_model_loc/rmir by `riva_init.sh`
# 
# Custom models produced by NeMo or TAO and prepared using riva-build
# may also be copied manually to this location $(riva_model_loc/rmir).
#
# Models ($riva_model_loc/models)
# During the riva_init process, the RMIR files in $riva_model_loc/rmir
# are inspected and optimized for deployment. The optimized versions are
# stored in $riva_model_loc/models. The riva server exclusively uses these
# optimized versions.
riva_model_loc="<add path>"                              ## MAKE CHANGES HERE (Replace with MODEL_LOC)                      
```

**Make sure to do the following before moving forward:**

1. In the JupyterLab file navigator, go to `riva_quickstart_v2.*` and open `config.sh`.
2. Configure the settings as shown in the snippet above:
    * Set the ASR and NLP services to `false` and the TTS service to `true`
    * Set `riva_model_loc` to the directory that contains the `models` folder produced by riva-deploy (that is, `MODEL_LOC`)
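
If you prefer not to edit the file by hand, the same changes can be scripted. The sketch below rewrites `key=value` lines in the text of a shell config file; `set_config_values` is a hypothetical helper written for this notebook, not part of the Riva Quick Start scripts, and it assumes each setting sits on its own line starting with `key=`, as in the snippet above.

```python
import re


def set_config_values(config_text, updates):
    """Return config_text with each `key=...` line rewritten to `key=value`.

    Only lines that start with `key=` are changed; anything else on such a
    line (for example a trailing comment) is dropped. A KeyError is raised
    for a key that is not found, so typos do not pass silently.
    Hypothetical helper, not a Riva API.
    """
    for key, value in updates.items():
        pattern = re.compile(r"^" + re.escape(key) + r"=.*$", flags=re.MULTILINE)
        # A function replacement avoids backslash-escape handling in `value`.
        config_text, count = pattern.subn(
            lambda _match, k=key, v=value: f"{k}={v}", config_text
        )
        if count == 0:
            raise KeyError(key)
    return config_text
```

A possible usage is to read `config.sh` from `RIVA_DIR`, pass in updates such as `{"service_enabled_asr": "false", "service_enabled_nlp": "false", "service_enabled_tts": "true", "riva_model_loc": '"' + MODEL_LOC + '"'}`, and write the result back. Keep a backup copy of the original file first.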

In [None]:
# Set `riva_model_loc` in config.sh to the directory printed below (MODEL_LOC), which contains the riva-deploy output
!echo $MODEL_LOC

In [None]:
# Ensure you have permission to execute these scripts
! cd $RIVA_DIR && chmod +x ./riva_stop.sh && chmod +x ./riva_start.sh

In [None]:
# Stop existing Riva deployments. 
! cd $RIVA_DIR && ./riva_stop.sh config.sh 
# Run Riva Start. This will deploy your model(s).
! cd $RIVA_DIR && ./riva_start.sh config.sh
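
Before sending requests, you may want to confirm that something is listening on the gRPC port (`localhost:50051` is the address used later in this notebook). The sketch below polls the port with a plain TCP connection; `wait_for_port` is a hypothetical helper. An open port only shows that the server process is accepting connections, not that every model has finished loading.

```python
import socket
import time


def wait_for_port(host, port, timeout_s=60.0, interval_s=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper for this notebook, not a Riva API.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # Open and immediately close a connection to probe the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

For example, `wait_for_port("localhost", 50051)` returns `True` as soon as the port accepts connections, or `False` after about a minute.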

---
## Run Inference
Once the Riva server is up and running with your models, you can send it inference requests.

To send gRPC requests, we will use the Riva Python API bindings.


### Connect to the Riva Server and Run Inference
Now we can query the Riva server. The following cells connect to it over gRPC and synthesize speech from a text prompt.

In [None]:
import os
import riva.client
import IPython.display as ipd
import numpy as np

In [None]:
auth = riva.client.Auth(uri="localhost:50051")
tts_service = riva.client.SpeechSynthesisService(auth)

In [None]:
text = "Is it recognize speech or wreck a nice beach?"
language_code = "en-US"                   # currently required to be "en-US"
sample_rate_hz = 22050                    # the desired sample rate
voice_name = "English-US-Female-1"      # subvoice to generate the audio output.

resp = tts_service.synthesize(text, voice_name=voice_name, language_code=language_code, sample_rate_hz=sample_rate_hz)

audio_samples = np.frombuffer(resp.audio, dtype=np.int16)
print(text)
ipd.Audio(audio_samples, rate=sample_rate_hz)
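
To keep the result outside the notebook, the raw audio can be written to a WAV file with Python's standard `wave` module. This is a minimal sketch that assumes the response holds 16-bit mono PCM, which matches the `np.int16` interpretation used above. `write_wav` is a hypothetical helper name.

```python
import wave


def write_wav(path, pcm_bytes, sample_rate_hz, sample_width_bytes=2, channels=1):
    """Write raw PCM bytes to a WAV file.

    Defaults assume 16-bit mono audio (an assumption, see the text above).
    Hypothetical helper for this notebook, not a Riva API.
    """
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width_bytes)
        wav_file.setframerate(sample_rate_hz)
        wav_file.writeframes(pcm_bytes)
```

In this notebook the call would look like `write_wav("output.wav", resp.audio, sample_rate_hz)`, where `output.wav` is an arbitrary file name.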

The synthesized speech might sound robotic because we trained both the FastPitch and HiFi-GAN models for only a small number of epochs/iterations.

---
### Cleanup

You can stop all Docker containers before shutting down the Jupyter kernel. **Caution: The following command will stop all running containers.**

In [None]:
! docker stop $(docker ps -a -q)