<img src="http://developer.download.nvidia.com/notebooks/dlsw-notebooks/riva_tts_tts-python-advanced-pretrain-tts-tao-deployment/nvidia_logo.png" style="width: 90px; float: right;">

# How to deploy custom TTS Models (FastPitch and HiFi-GAN) trained with TAO Toolkit on Riva

This tutorial walks you through the deployment of custom TTS models (FastPitch and HiFi-GAN) trained with TAO Toolkit on Riva for real-time inference.

The custom TTS models trained in the notebook `3_spectrogen-vocoder-tao-training.ipynb` are used for demonstration.

---
## Riva ServiceMaker
Riva ServiceMaker is a set of tools that aggregates all the necessary artifacts (models, files, configurations, and user settings)
for Riva deployment to a target environment.

### Riva-build

This step builds a Riva-ready version of the model. Its only output is an intermediate format (called an RMIR)
of an end-to-end pipeline for the supported services within Riva. This tutorial uses two TTS models:

* [FastPitch](https://ngc.nvidia.com/catalog/models/nvidia:tao:speechsynthesis_english_fastpitch) (spectrogram generator)
* [HiFi-GAN](https://ngc.nvidia.com/catalog/models/nvidia:tao:speechsynthesis_hifigan) (vocoder)

We'll use the customized spectrogram and vocoder models (from the previous notebook) to deploy the Riva TTS pipeline.

Let's set the path to the customized spectrogram generator and vocoder models (`.riva`) which will be used when running `riva-build`.

In [None]:
# IMPORTANT: Set MODEL_LOC to the ABSOLUTE PATH of the directory that contains the `riva` folder
import os
# ServiceMaker Docker
RIVA_SM_CONTAINER = "nvcr.io/nvidia/riva/riva-speech:2.10.0-servicemaker"

# Directory containing the riva folder, which in turn contains the .riva files
MODEL_LOC = os.path.abspath("tts-models/results")

# Names of the .riva files contained in $MODEL_LOC/riva
SPECTRO_GEN_MODEL_NAME = "FastPitch.riva"
VOCODER_MODEL_NAME = "HiFiGan.riva"

# Key the model was encrypted with when it was exported with TAO
KEY = "tlt_encode"
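
Before running `riva-build`, it can help to confirm that the `.riva` files are where the paths above say they are. The following is a minimal sketch, not part of the Riva tooling; the helper name `find_missing_riva_files` is hypothetical, and it assumes the layout described above (`.riva` files inside a `riva` folder under the model directory).

```python
import os

def find_missing_riva_files(model_loc, file_names):
    """Return the expected paths under <model_loc>/riva that are not regular files.

    Hypothetical helper: it only checks the directory layout this notebook
    assumes and does not validate the contents of the .riva files.
    """
    riva_dir = os.path.join(model_loc, "riva")
    expected = [os.path.join(riva_dir, name) for name in file_names]
    return [path for path in expected if not os.path.isfile(path)]
```

In this notebook it could be called as `find_missing_riva_files(MODEL_LOC, [SPECTRO_GEN_MODEL_NAME, VOCODER_MODEL_NAME])`; an empty list means both files were found.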

Create a directory within `$MODEL_LOC` to store the `.rmir` file. This is most useful if deploying multiple models. Moreover, the Riva Server start script, `riva_start.sh`, assumes that the `.rmir` files you deploy will be contained in `$MODEL_LOC/rmir` rather than `$MODEL_LOC`. 

In [None]:
!mkdir -p $MODEL_LOC/rmir

In [None]:
# Syntax: riva-build <task-name> output-dir-for-rmir/model.rmir:key dir-for-riva/model.riva:key
! docker run --rm --gpus 0 -v $MODEL_LOC:/data $RIVA_SM_CONTAINER -- \
    riva-build speech_synthesis /data/rmir/tts.rmir:$KEY /data/riva/$SPECTRO_GEN_MODEL_NAME:$KEY /data/riva/$VOCODER_MODEL_NAME:$KEY

### Riva-deploy

The deployment tool takes as input one or more Riva Model Intermediate Representation (RMIR) files and
a target model repository directory. It creates an ensemble configuration specifying the pipeline for
the execution and finally writes all those assets to the output model repository directory.

**Note**: This step might take ~10 minutes to complete.

In [None]:
# Syntax: riva-deploy -f dir-for-rmir/model.rmir:key output-dir-for-repository
! docker run --rm --gpus 0 -v $MODEL_LOC:/data $RIVA_SM_CONTAINER -- \
    riva-deploy -f  /data/rmir/tts.rmir:$KEY /data/models/
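
Once `riva-deploy` finishes, the model repository should exist under `$MODEL_LOC/models` (the host side of `/data/models/` in the command above). As a rough sanity check, the sketch below lists the subdirectories found there. The helper name `list_deployed_models` is hypothetical, and the directory names that `riva-deploy` produces depend on the Riva version, so none are assumed here.

```python
import os

def list_deployed_models(model_loc):
    """Return the sorted names of subdirectories in <model_loc>/models.

    Hypothetical helper: returns an empty list when the repository
    directory does not exist yet.
    """
    models_dir = os.path.join(model_loc, "models")
    if not os.path.isdir(models_dir):
        return []
    return sorted(
        entry for entry in os.listdir(models_dir)
        if os.path.isdir(os.path.join(models_dir, entry))
    )
```

An empty result from `list_deployed_models(MODEL_LOC)` suggests that the deploy step did not write anything and its log output is worth rereading.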

---
## Start the Riva Server
After the model repository is generated, we are ready to start the Riva server. First, download the Riva Skills Quick Start resource folder from NGC. 
Set the path to the directory here:

In [None]:
# Set the directory containing the Riva Skills Quick Start resource folder
RIVA_DIR = os.path.abspath("riva_quickstart_v2.10.0")
os.environ['RIVA_DIR'] = RIVA_DIR

# Downloads the Riva Skills Quick Start resource folder to the current working directory and uncompresses it
if os.path.exists(RIVA_DIR):
    print("Riva Skills Quick Start resource folder exists, skipping download")
else:
    print("Downloading the Riva Skills Quick Start resource folder")
    !ngc registry resource download-version "nvidia/riva/riva_quickstart:2.10.0"

Next, we modify the `config.sh` file to enable the relevant Riva services (TTS for the FastPitch/HiFi-GAN models) and to provide the encryption key and the path to the model repository (`riva_model_loc`) generated in the previous step, among other settings.

For example, if the model repository above was generated at `$MODEL_LOC/models`, then we set `riva_model_loc` to the same directory as `MODEL_LOC`.

#### config.sh snippet
```sh
# Enable or Disable Riva Services
service_enabled_asr=true          ## MAKE CHANGES HERE - SET TO FALSE
service_enabled_nlp=true          ## MAKE CHANGES HERE - SET TO FALSE
service_enabled_tts=true
service_enabled_nmt=true          ## MAKE CHANGES HERE - SET TO FALSE

...

# Locations to use for storing models artifacts
#
# If an absolute path is specified, the data will be written to that location
# Otherwise, a docker volume will be used (default).
#
# riva_init.sh will create a `rmir` and `models` directory in the volume or
# path specified.
#
# RMIR ($riva_model_loc/rmir)
# Riva uses an intermediate representation (RMIR) for models
# that are ready to deploy but not yet fully optimized for deployment. Pretrained
# versions can be obtained from NGC (by specifying NGC models below) and will be
# downloaded to $riva_model_loc/rmir by `riva_init.sh`
#
# Custom models produced by NeMo or TLT and prepared using riva-build
# may also be copied manually to this location $(riva_model_loc/rmir).
#
# Models ($riva_model_loc/models)
# During the riva_init process, the RMIR files in $riva_model_loc/rmir
# are inspected and optimized for deployment. The optimized versions are
# stored in $riva_model_loc/models. The riva server exclusively uses these
# optimized versions.
riva_model_loc="riva-model-repo"  ## MAKE CHANGES HERE (Replace with the path assigned to MODEL_LOC)

if [[ $riva_target_gpu_family == "tegra" ]]; then
    riva_model_loc="`pwd`/model_repository"
fi

# The default RMIRs are downloaded from NGC by default in the above $riva_rmir_loc directory
# If you'd like to skip the download from NGC and use the existing RMIRs in the $riva_rmir_loc
# then set the below $use_existing_rmirs flag to true. You can also deploy your set of custom
# RMIRs by keeping them in the riva_rmir_loc dir and use this quickstart script with the
# below flag to deploy them all together.
use_existing_rmirs=false          ## MAKE CHANGES HERE - SET TO TRUE                      
```

<font color='red'>**ATTENTION:**</font> **Make sure to do the following before moving forward:**

**Either** carry out these tasks manually: 
1. In the file navigator in Jupyter Lab, navigate to `riva_quickstart_v2.*` and open `config.sh`
2. Configure settings as shown in the snippet above
   - Set ASR, NLP, and NMT services to `false`
   - Set the `riva_model_loc` path to the path assigned to `MODEL_LOC`
   - Set the variable `use_existing_rmirs` to `true`

**Or** run the cell below: 

In [None]:
ENABLE_ASR = 'false'
ENABLE_NLP = 'false'
ENABLE_TTS = 'true'
ENABLE_NMT = 'false'

!sed -i "s|service_enabled_asr=.*|service_enabled_asr=$ENABLE_ASR|g" $RIVA_DIR/config.sh
!sed -i "s|service_enabled_nlp=.*|service_enabled_nlp=$ENABLE_NLP|g" $RIVA_DIR/config.sh
!sed -i "s|service_enabled_tts=.*|service_enabled_tts=$ENABLE_TTS|g" $RIVA_DIR/config.sh
!sed -i "s|service_enabled_nmt=.*|service_enabled_nmt=$ENABLE_NMT|g" $RIVA_DIR/config.sh

!sed -i "/\sriva_model_loc=.*/! s|riva_model_loc=.*|riva_model_loc=\"$MODEL_LOC\"|g" $RIVA_DIR/config.sh

!sed -i "s|use_existing_rmirs=.*|use_existing_rmirs=true|g" $RIVA_DIR/config.sh
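
Whichever route you take, reading the values back from `config.sh` is a cheap way to confirm that the edits landed. The sketch below is an illustration only: `read_config_values` is a hypothetical helper, it considers only unindented `key=value` lines (so the indented `riva_model_loc` assignment in the `tegra` branch is ignored), and it treats everything after a `#` as a comment, which would mangle a path that contains `#`.

```python
import re

ASSIGNMENT = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)=(.*)$")

def read_config_values(config_path, keys):
    """Return {key: value} for the first unindented assignment of each key.

    Hypothetical helper: trailing `#` comments and surrounding double
    quotes are stripped from the value.
    """
    values = {}
    with open(config_path) as config_file:
        for line in config_file:
            match = ASSIGNMENT.match(line.rstrip("\n"))
            if not match:
                continue
            key, raw_value = match.groups()
            if key in keys and key not in values:
                values[key] = raw_value.split("#", 1)[0].strip().strip('"')
    return values
```

For example, `read_config_values(os.path.join(RIVA_DIR, "config.sh"), {"service_enabled_tts", "riva_model_loc", "use_existing_rmirs"})` should report `true`, the value of `MODEL_LOC`, and `true` respectively if the configuration matches the snippet above.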

In [None]:
# `riva_model_loc` must point to where the models produced by riva-deploy are stored. In our case, that is MODEL_LOC
!echo $MODEL_LOC

In [None]:
# Ensure you have permission to execute these scripts
! cd $RIVA_DIR && chmod +x ./riva_stop.sh && chmod +x ./riva_start.sh

In [None]:
# Stop existing Riva deployments. 
! cd $RIVA_DIR && ./riva_stop.sh config.sh 
# Run Riva Start. This will deploy your model(s).
! cd $RIVA_DIR && ./riva_start.sh config.sh
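
If you want to confirm from the notebook that something is listening on the gRPC port before sending requests, a plain TCP check is one option. This is a sketch using only the Python standard library; `wait_for_port` is a hypothetical helper, and an open port only shows that a process accepts connections, not that the TTS models have finished loading.

```python
import socket
import time

def wait_for_port(host, port, timeout_s=120.0, poll_s=2.0):
    """Poll until a TCP connection to (host, port) succeeds or timeout_s elapses.

    Hypothetical helper: returns True on the first successful connection
    and False if the deadline passes without one.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=poll_s):
                return True
        except OSError:
            time.sleep(poll_s)
    return False
```

With the address used later in this notebook, the call would be `wait_for_port("localhost", 50051)`.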

---
## Run Inference
Once the Riva server is up and running with your models, you can send inference requests to it.

To send gRPC requests, we will use the Riva Python API bindings.


### Connect to the Riva Server and Run Inference
Now we can actually query the Riva server. The following cell queries the Riva server (using gRPC) to yield a result.

In [None]:
! pip install nvidia-riva-client

In [None]:
import os
import riva.client
import IPython.display as ipd
import numpy as np

In [None]:
auth = riva.client.Auth(uri="localhost:50051")
riva_tts = riva.client.SpeechSynthesisService(auth)

In [None]:
sample_rate_hz = 22050
resp = riva_tts.synthesize(
    text = "Is it recognize speech or wreck a nice beach?",
    language_code = "en-US",
    encoding = riva.client.AudioEncoding.LINEAR_PCM,    # Currently only LINEAR_PCM is supported
    sample_rate_hz = sample_rate_hz,                    # Generate 22.05 kHz audio
    voice_name = None         # The name of the voice to generate
)

In [None]:
audio_samples = np.frombuffer(resp.audio, dtype=np.int16)
ipd.Audio(audio_samples, rate=sample_rate_hz)
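
To keep the result outside the notebook, the raw bytes can be written to a WAV file with Python's standard `wave` module. This is a sketch under two assumptions: the response holds 16-bit samples (which matches the `np.int16` interpretation above) and the audio is single-channel. The helper name `save_pcm16_wav` and the output file name are hypothetical.

```python
import wave

def save_pcm16_wav(path, pcm_bytes, sample_rate_hz, num_channels=1):
    """Write raw 16-bit PCM bytes to a WAV file.

    Hypothetical helper: assumes little-endian 16-bit samples and,
    by default, mono audio.
    """
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(num_channels)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav_file.setframerate(sample_rate_hz)
        wav_file.writeframes(pcm_bytes)
```

In this notebook the call would look like `save_pcm16_wav("tts_output.wav", resp.audio, sample_rate_hz)`.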

The synthesized speech might sound robotic because both the FastPitch and HiFi-GAN models were trained for only a small number of epochs/iterations.

---
### Cleanup

You can stop the `riva-speech` container (and thus shut down the Riva server) before shutting down the Jupyter kernel.

In [None]:
! docker container stop riva-speech