# TTS Deploy

This tutorial explains how to generate a TTS RMIR (Riva Model Intermediate Representation). An RMIR is an intermediate file that contains all the artifacts (models, files, configurations, and user settings) required to deploy a Riva service.

## Learning Objectives
In this tutorial, you will learn how to:  
- Use Riva ServiceMaker to take two `.riva` files and convert them to a single `.rmir` file for either an `AMD64` (data center, `x86_64`) or an `ARM64` (embedded, `AArch64`) machine.
  - For users who have `.nemo` files, [`nemo2riva`](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/model-overview.html#export-models-with-nemo2riva) can be used to generate `.riva` files from `.nemo` checkpoints.
- Launch and deploy the `.rmir` locally on the Riva server.
- Send inference requests from a demo client using Riva API bindings.

## Prerequisites
To use this tutorial, ensure that you:
- Have access to NGC through the [NGC Command-Line Interface (CLI)](https://docs.ngc.nvidia.com/cli/index.html).

## Riva ServiceMaker
ServiceMaker is a set of tools that aggregates all the necessary artifacts (models, files, configurations, and user settings) for Riva deployment to a target environment. It has two main components:

* `riva-build`
* `riva-deploy`

The first step is `riva-build`, which can be run on either data center or embedded machines to build an `.rmir` file.

The second step is `riva-deploy`, which should be run on the machine that the Riva server is to be served on.

If you are building an `.rmir` file on a data center machine to target an embedded deployment, follow this tutorial up to and including the [Riva-build section](#Run-riva-build). Copy the built `.rmir` to the target embedded machine, run the [set configs and params section](#Set-the-Configurations-and-Parameters), and continue to the [Riva-deploy section](#Run-riva-deploy).

### Riva-build

This step builds a Riva-ready version of the model. Its only output is an intermediate format, called a Riva Model Intermediate Representation (`.rmir`), of an end-to-end pipeline for the supported services within Riva. Let's consider two TTS models:

* [FastPitch](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/riva/models/speechsynthesis_en_us_fastpitch_ipa) (spectrogram generator)
* [HiFi-GAN](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/riva/models/speechsynthesis_en_us_hifigan_ipa) (vocoder)

`riva-build` is responsible for the combination of one or more exported models (`.riva` files) into a single file
containing an intermediate format called `.rmir`. This file contains a
deployment-agnostic specification of the whole end-to-end pipeline along with all the assets required for the
final deployment and inference. Refer to the [Riva documentation](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/tts/tts-custom.html#fastpitch-and-hifi-gan) for more information.

### Riva-deploy

The deployment tool takes as input one or more `.rmir` files and a target model repository directory. It creates an ensemble configuration specifying the pipeline for
the execution and finally writes all those assets to the output model repository directory.

---
### Set the Configurations and Parameters
Import the necessary modules: 

In [None]:
import os
import pathlib
import logging
import warnings

Set the Riva version. You can use the commands below to set it to the latest version automatically. Alternatively, you can alter the last line in the cell to set the version manually. 

In [None]:
riva_line_list = !wget -qO- https://docs.nvidia.com/deeplearning/riva/user-guide/docs/index.html | grep "NVIDIA Riva Skills"
riva_line_string = riva_line_list[0]
__riva_version__ = riva_line_string.split(' ')[3]
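The `split(' ')[3]` lookup above depends on the exact wording of the matched line. As a hedged alternative, the version can be pulled out with a regular expression. This is a minimal sketch: the helper name `extract_riva_version` and the sample line are hypothetical, and the real line on the documentation page may be worded differently.

```python
import re


def extract_riva_version(line):
    """Return the first dotted version number found in `line`, or None."""
    match = re.search(r"\b(\d+\.\d+(?:\.\d+)?)\b", line)
    return match.group(1) if match else None


# Hypothetical sample line, used only to illustrate the helper.
sample_line = "<title>NVIDIA Riva Skills 2.14.0 documentation</title>"
print(extract_riva_version(sample_line))
```

If the helper returns `None`, set `__riva_version__` manually as described above.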

Update the parameters in the following code block:
- `machine_type`: Type of machine the tutorial is being run on. Acceptable values are `AMD64`, `ARM64_linux`, and `ARM64_l4t`. Defaults to `AMD64`.
- `target_machine`: Type of machine the RMIR will be deployed on. Acceptable values are `AMD64`, `ARM64_linux`, and `ARM64_l4t`. Defaults to `AMD64`.
- `voice`: Voice name of the model. Defaults to `"test"`.
- `key`: Encryption key used in `nemo2riva`. The same key is used to deploy the RMIR generated in this tutorial. Defaults to `tlt_encode`.
- `use_ipa`: Set to `"y"` or `"yes"` (case-insensitive) if the model uses IPA phonemes; any other value, such as `"no"`, selects ARPAbet. Defaults to `"yes"`.
- `lang`: Model language. This is only used for the client and has no effect on generated speech. Defaults to `"en-US"`.
- `num_speakers`: Number of speakers in the model. The cell below sets it to 1; use 2 for the NGC example model.
- `force`: Whether to force-build a new TTS RMIR and replace any existing RMIRs. Defaults to `True`.
- `use_customized_models`: Whether to use the customized models created in the previous tutorial in this lab. Defaults to `True`.
- `use_pretrained_models`: Whether to use pretrained models instead of customized models. Always the opposite of `use_customized_models`.
- `sample_rate`: Sample rate of the generated audio in Hz. Set to 22050 for customized models and 44100 for pretrained models.

In [None]:
machine_type="AMD64" #Change this to `ARM64_linux` or `ARM64_l4t` in case of an ARM64 machine.
target_machine="AMD64" #Change this to `ARM64_linux` or `ARM64_l4t` in case of an ARM64 machine.
voice = "test" ##Voice name
# key = "nemotoriva" ##Encryption key used during nemo2riva # tlt_encode for the standard FastPitch and HiFiGAN RMIRs
key = "tlt_encode" ##Encryption key used during nemo2riva # tlt_encode for the standard FastPitch and HiFiGAN RMIRs
use_ipa = "yes" ##`"y"` or `"yes"` (any case) if the model uses IPA; any other value selects ARPAbet.
lang = "en-US" ##Language
num_speakers = 1 ## Number of speakers
force = True ## Whether to force-build a new TTS RMIR and replace any existing RMIRs
use_customized_models = True ## Whether to use the customized models created in the previous tutorial in this lab
use_pretrained_models = not use_customized_models ## Whether to use pretrained models instead of customized models
if use_customized_models: 
    sample_rate = 22050 ## Audio sample rate in Hz for the customized RMIRs
else:
    sample_rate = 44100 ## Audio sample rate in Hz for the standard FastPitch and HiFiGAN RMIRs
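Because a typo in these parameters only surfaces much later, during the build, it can help to check them up front. The following is a minimal sketch, not part of the original notebook; the helper name `check_parameters` is hypothetical, and the accepted values are the ones listed above.

```python
ALLOWED_MACHINES = {"amd64", "arm64_linux", "arm64_l4t"}


def check_parameters(machine_type, target_machine, use_ipa, num_speakers):
    """Validate the tutorial parameters; return True if IPA phonemes are selected."""
    for name, value in (("machine_type", machine_type), ("target_machine", target_machine)):
        if value.lower() not in ALLOWED_MACHINES:
            raise ValueError(f"{name}={value!r} is not one of {sorted(ALLOWED_MACHINES)}")
    if num_speakers < 1:
        raise ValueError("num_speakers must be at least 1")
    # "y" or "yes" in any letter case selects IPA; anything else selects ARPAbet.
    return use_ipa.lower() in ("y", "yes")


print(check_parameters("AMD64", "ARM64_l4t", "yes", 1))
```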

In [None]:
## Riva NGC, servicemaker image config.
if machine_type.lower() in ["amd64", "arm64_linux"]:
    RIVA_SM_CONTAINER = f"nvcr.io/nvidia/riva/riva-speech:{__riva_version__}-servicemaker"
elif machine_type.lower()=="arm64_l4t":
    RIVA_SM_CONTAINER = f"nvcr.io/nvidia/riva/riva-speech:{__riva_version__}-servicemaker-l4t-aarch64"
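Note that the cell above leaves `RIVA_SM_CONTAINER` undefined when `machine_type` is not one of the three accepted values. A possible variant that fails loudly instead is sketched here; the function name `servicemaker_image` and the version string in the example are hypothetical, while the image names are the ones used above.

```python
def servicemaker_image(machine_type, riva_version):
    """Return the ServiceMaker image name for a machine type, or raise on unknown types."""
    suffixes = {
        "amd64": "servicemaker",
        "arm64_linux": "servicemaker",
        "arm64_l4t": "servicemaker-l4t-aarch64",
    }
    try:
        suffix = suffixes[machine_type.lower()]
    except KeyError:
        raise ValueError(f"Unsupported machine_type: {machine_type!r}") from None
    return f"nvcr.io/nvidia/riva/riva-speech:{riva_version}-{suffix}"


# "2.14.0" is a placeholder version used only for illustration.
print(servicemaker_image("ARM64_l4t", "2.14.0"))
```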

In [None]:
# Create a local directory to save models
TTS_MODEL_DIR = os.path.join(os.getcwd(), "tts-models")
!mkdir -p $TTS_MODEL_DIR

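Define a function for downloading NGC resources. It skips the download if the target folder already exists and returns the folder name, or `None` if the download fails.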

In [None]:
def ngc_download_and_get_dir(ngc_resource_name, resource_description, resource_type="model", parent_dir=TTS_MODEL_DIR):
    default_download_folder = "_v".join(ngc_resource_name.split("/")[-1].split(":"))
    download_path = os.path.join(parent_dir, default_download_folder)
    if os.path.exists(download_path):
        print(f"{resource_description} exists, skipping download")
        return default_download_folder
    ngc_output = !ngc registry $resource_type download-version $ngc_resource_name --dest $parent_dir
    if not os.path.exists(download_path):
        ngc_output_formatted='\n'.join(ngc_output)
        logging.error(
            f"NGC was not able to download the requested model {ngc_resource_name}. "
            "Please check the NGC error message, remove all directories, and re-start the "
            f"notebook. NGC message: {ngc_output_formatted}"
        )
        return None
    print(f"Successfully downloaded {resource_description}")
    return default_download_folder
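The folder name that the function checks for is derived from the NGC resource name: the last path component is taken and the `:` before the version tag is replaced with `_v`. The standalone sketch below reproduces only that naming step; the helper name `ngc_download_folder` is introduced here for illustration.

```python
def ngc_download_folder(ngc_resource_name):
    """Mirror the folder-name expression used in ngc_download_and_get_dir."""
    name_and_version = ngc_resource_name.split("/")[-1]
    return "_v".join(name_and_version.split(":"))


print(ngc_download_folder("nvidia/riva/normalization_en_us:deployable_v1.1"))
```

This is why the downloaded folders end in names such as `..._vdeployable_v1.1`.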

### Download models

Download a pretrained acoustic model, a pretrained vocoder, auxiliary files, and normalization grammar files to the `tts-models` directory. The former two can be skipped if you already possess customized models. 

#### Download FastPitch Acoustic Model (aka Mel Spectrogram Generator)

For consistency with the previous tutorial in this lab, we will download this [FastPitch NeMo checkpoint](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/tts_en_fastpitch) which has been trained on LJSpeech sampled at 22.05 kHz with IPA transcription, then convert it to a `.riva` file with the `nemo2riva` Python module.

A more recent FastPitch model in `.riva` form is available [on NGC](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/riva/models/speechsynthesis_en_us_fastpitch_ipa). However, it was trained on audio sampled at 44.1 kHz.

In [None]:
if use_pretrained_models: 
    AM_DIR = ngc_download_and_get_dir("nvidia/riva/speechsynthesis_en_us_fastpitch_ipa:deployable_v1.1", "Acoustic model")

In [None]:
if use_pretrained_models: 
    import glob
    # AM_DIR is only defined for pretrained models; pick the first .riva file in that folder
    fastpitch_riva_file_path = glob.glob(os.path.join(TTS_MODEL_DIR, AM_DIR, "*.riva"))[0]
    fastpitch_riva_file_name = os.path.basename(fastpitch_riva_file_path)

#### Download HiFiGAN Vocoder Model 

For consistency with the previous tutorial in this lab, and the previous few cells of this tutorial, we will download this [HiFiGAN NeMo checkpoint](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/tts_hifigan) which has been trained on LJSpeech sampled at 22.05 kHz with IPA transcription, then convert it to a `.riva` file with the `nemo2riva` Python module.

A more recent HiFiGAN model in `.riva` form is available [on NGC](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/riva/models/speechsynthesis_en_us_hifigan_ipa). However, it was trained on audio sampled at 44.1 kHz.

In [None]:
if use_pretrained_models: 
    VC_DIR = ngc_download_and_get_dir("nvidia/riva/speechsynthesis_en_us_hifigan_ipa:deployable_v1.1", "Vocoder model")

In [None]:
if use_pretrained_models: 
    import glob
    # VC_DIR is only defined for pretrained models; pick the first .riva file in that folder
    hifigan_riva_file_path = glob.glob(os.path.join(TTS_MODEL_DIR, VC_DIR, "*.riva"))[0]
    hifigan_riva_file_name = os.path.basename(hifigan_riva_file_path)
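Indexing the `glob` result with `[0]` raises a bare `IndexError` when the folder contains no `.riva` file, for example after a failed download. A hedged sketch of a lookup with a clearer error follows; `find_riva_file` is a hypothetical helper, and the temporary folder only stands in for a downloaded model directory.

```python
import tempfile
from pathlib import Path


def find_riva_file(directory):
    """Return the name of the first .riva file in `directory` (sorted by name)."""
    candidates = sorted(Path(directory).glob("*.riva"))
    if not candidates:
        raise FileNotFoundError(f"No .riva file found in {directory}")
    return candidates[0].name


# Demonstration on a throwaway folder with a placeholder file.
with tempfile.TemporaryDirectory() as demo_dir:
    (Path(demo_dir) / "FastPitch.riva").touch()
    print(find_riva_file(demo_dir))
```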

#### Download additional TTS resources

The following code block will download some additional TTS files used for deployment. This will include the following files:  
- [Auxiliary files](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/riva/models/speechsynthesis_en_us_auxiliary_files/files)
    - ARPAbet dictionary file
    - IPA dictionary file
    - abbreviation mapping file
- [Normalization Grammar (NG) files](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/riva/models/normalization_en_us/files) for (Inverse) Text Normalization (TN/ITN)
    - tokenize_and_classify.far
    - verbalize.far

The speech synthesis pipeline uses weighted finite-state transducer (WFST) grammars that map strings in written form to strings in spoken form. 

Riva implements NeMo's text normalization (TN), which is based on WFST grammars. The tool uses [Pynini](https://github.com/kylebgorman/pynini) to construct WFSTs. The created grammars can be exported and integrated into Sparrowhawk (an open-source version of the Kestrel TTS text normalization system) for production.

Pynini exports the `tokenize_and_classify` and `verbalize` FSTs as OpenFst finite-state archive (FAR) files, ready to be deployed with Riva.

In [None]:
AUX_DIR = ngc_download_and_get_dir("nvidia/riva/speechsynthesis_en_us_auxiliary_files:deployable_v1.3", "Auxiliary files folder")
NG_DIR  = ngc_download_and_get_dir("nvidia/riva/normalization_en_us:deployable_v1.1", "Text normalization folder")

---
## Run riva-build
Stop any ServiceMaker container that is still running, then start a new one with the necessary paths mounted.

First, let's set relevant paths relative to where we will mount the models in the Riva Servicemaker Docker container. 

In [None]:
# All model paths relative to Riva Servicemaker Docker container include the _SM suffix

# Host directory where the generated .rmir file will be stored
!mkdir -p $TTS_MODEL_DIR/rmir

# Path where we mount the downloaded TTS models in the Servicemaker container
TTS_MODEL_DIR_SM = "/data" 

if use_customized_models: 
    # Relative path to customized acoustic model
    AM_SM = os.path.join(TTS_MODEL_DIR_SM, "riva/FastPitch.riva")
    
    # Relative path to customized Vocoder Model
    VC_SM = os.path.join(TTS_MODEL_DIR_SM, "riva/HifiGan.riva")
    
    # Relative path where the generated .rmir file will be stored
    TTS_RMIR_SM = os.path.join(TTS_MODEL_DIR_SM, "rmir", "tts-customized.rmir")

if use_pretrained_models:
    # Relative path to pretrained acoustic model
    AM_SM = os.path.join(TTS_MODEL_DIR_SM, AM_DIR, fastpitch_riva_file_name)
    
    # Relative path to pretrained Vocoder Model
    VC_SM = os.path.join(TTS_MODEL_DIR_SM, VC_DIR, hifigan_riva_file_name)
    
    # Relative path where the generated .rmir file will be stored
    TTS_RMIR_SM = os.path.join(TTS_MODEL_DIR_SM, "rmir", "tts-pretrained.rmir")

# Relative path to auxiliary files
ABBR_SM = os.path.join(TTS_MODEL_DIR_SM, AUX_DIR, "abbr.txt")
ARP_DICT_SM = os.path.join(TTS_MODEL_DIR_SM, AUX_DIR, "cmudict-0.7b_nv22.08")
IPA_DICT_SM = os.path.join(TTS_MODEL_DIR_SM, AUX_DIR, "ipa_cmudict-0.7b_nv22.08.txt")

# Relative path to normalization grammar
WFST_TOKENIZER_MODEL_SM = os.path.join(TTS_MODEL_DIR_SM, NG_DIR, "tokenize_and_classify.far")
WFST_VERBALIZER_MODEL_SM = os.path.join(TTS_MODEL_DIR_SM, NG_DIR, "verbalize.far")
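Each `_SM` path above is the container-side view of a file that lives under `TTS_MODEL_DIR` on the host. When a host folder is mounted at the same relative location under `/data`, the translation can be sketched as below. The helper `to_container_path` and the host paths in the example are hypothetical; the mapping does not cover mounts that change the relative location.

```python
from pathlib import PurePosixPath


def to_container_path(host_path, host_root, container_root="/data"):
    """Translate a host path under `host_root` to its location under `container_root`."""
    # relative_to raises ValueError if host_path is not inside host_root.
    relative = PurePosixPath(host_path).relative_to(host_root)
    return str(PurePosixPath(container_root) / relative)


# Placeholder host paths for illustration only.
print(to_container_path("/home/user/tts-models/rmir/tts-customized.rmir", "/home/user/tts-models"))
```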

In [None]:
##Run the riva servicemaker.
!docker stop riva_rmir_gen &> /dev/null
!set -x && \
    docker run -td --gpus all --rm \
        -v $TTS_MODEL_DIR/rmir:$TTS_MODEL_DIR_SM/rmir \
        -v $TTS_MODEL_DIR/results/riva:$TTS_MODEL_DIR_SM/riva \
        -v $TTS_MODEL_DIR/$AM_DIR:$TTS_MODEL_DIR_SM/$AM_DIR \
        -v $TTS_MODEL_DIR/$VC_DIR:$TTS_MODEL_DIR_SM/$VC_DIR \
        -v $TTS_MODEL_DIR/$AUX_DIR:$TTS_MODEL_DIR_SM/$AUX_DIR \
        -v $TTS_MODEL_DIR/$NG_DIR:$TTS_MODEL_DIR_SM/$NG_DIR \
        --name riva_rmir_gen --entrypoint="/bin/bash" $RIVA_SM_CONTAINER

<div class="alert-warning">
    Using the <b>--force</b> flag in <b>riva-build</b> replaces any existing RMIR.
</div>

In [None]:
warnings.warn("Using --force in riva-build will replace any existing RMIR.")
riva_build = (
    f"riva-build speech_synthesis {TTS_RMIR_SM}:{key} "
    f"{AM_SM}:{key} {VC_SM}:{key} --voice_name={voice} --language_code={lang} "
    f"--sample_rate={sample_rate} --abbreviations_file={ABBR_SM} "
    f"--wfst_tokenizer_model={WFST_TOKENIZER_MODEL_SM} --wfst_verbalizer_model={WFST_VERBALIZER_MODEL_SM}"
)
if use_ipa.lower() in ["y", "yes"]:
    riva_build += f" --phone_set=ipa --phone_dictionary_file={IPA_DICT_SM} --upper_case_chars=True"
else:
    riva_build += f" --phone_set=arpabet --phone_dictionary_file={ARP_DICT_SM}"
if force:
    riva_build += " --force"
if num_speakers > 1:
    riva_build += f" --num_speakers={num_speakers}"
    riva_build += " --subvoices " + ",".join([f"{i}:{i}" for i in range(num_speakers)])
if "arm" in target_machine.lower():
    riva_build += (
        " --max_batch_size 1 --postprocessor.max_batch_size 1 --preprocessor.max_batch_size 1 "
        "--encoderFastPitch.max_batch_size 1 --chunkerFastPitch.max_batch_size 1 --hifigan.max_batch_size 1"
    )
print(riva_build)
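The same command can also be assembled from a list of arguments, which makes each flag easier to inspect and test. This is a sketch under assumptions: the function name `riva_build_command` is hypothetical, the auxiliary file paths in the example are placeholders, and the ARM-specific batch-size flags are omitted for brevity. Only flags that appear in the cell above are used.

```python
import shlex


def riva_build_command(rmir, acoustic_model, vocoder, key, voice, lang, sample_rate,
                       abbreviations, tokenizer_far, verbalizer_far, phone_dictionary,
                       use_ipa=True, num_speakers=1, force=False):
    """Assemble the riva-build speech_synthesis command as one shell string."""
    args = [
        "riva-build", "speech_synthesis",
        f"{rmir}:{key}", f"{acoustic_model}:{key}", f"{vocoder}:{key}",
        f"--voice_name={voice}", f"--language_code={lang}",
        f"--sample_rate={sample_rate}", f"--abbreviations_file={abbreviations}",
        f"--wfst_tokenizer_model={tokenizer_far}",
        f"--wfst_verbalizer_model={verbalizer_far}",
        f"--phone_set={'ipa' if use_ipa else 'arpabet'}",
        f"--phone_dictionary_file={phone_dictionary}",
    ]
    if use_ipa:
        args.append("--upper_case_chars=True")
    if force:
        args.append("--force")
    if num_speakers > 1:
        # One "id:id" pair per speaker, e.g. "0:0,1:1" for two speakers.
        subvoices = ",".join(f"{i}:{i}" for i in range(num_speakers))
        args += [f"--num_speakers={num_speakers}", "--subvoices", subvoices]
    return shlex.join(args)


# Placeholder container paths for illustration only.
example_paths = dict(
    abbreviations="/data/aux/abbr.txt",
    tokenizer_far="/data/ng/tokenize_and_classify.far",
    verbalizer_far="/data/ng/verbalize.far",
    phone_dictionary="/data/aux/ipa_dict.txt",
)
command = riva_build_command(
    "/data/rmir/tts-customized.rmir", "/data/riva/FastPitch.riva", "/data/riva/HifiGan.riva",
    key="tlt_encode", voice="test", lang="en-US", sample_rate=22050,
    force=True, **example_paths,
)
print(command)
```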

Execute the `riva-build` command, then stop the `riva_rmir_gen` ServiceMaker container.

In [None]:
! docker exec riva_rmir_gen $riva_build
! docker stop riva_rmir_gen

---
## Run riva-deploy

So far in this tutorial, we have learned how to generate RMIR files from `.riva` files. You should see that an RMIR file (`tts-customized.rmir` or `tts-pretrained.rmir`, depending on the models used) has been generated in the `${TTS_MODEL_DIR}/rmir` location we defined earlier.

The RMIR file generated in this tutorial can be deployed using [riva_quickstart](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/quick-start-guide.html).

### Steps to deploy the RMIR
- Download the Riva Quick Start resource.
- Open `config.sh` and update the following parameters:
    - Set `service_enabled_asr` to `false`.
    - Set `service_enabled_nlp` to `false`.
    - Set `service_enabled_tts` to `true`.
    - Set `service_enabled_nmt` to `false`.
    - Set `riva_model_loc` to the location of your `TTS_MODEL_DIR`.
    - Set `use_existing_rmirs` to `true`.
- Run `riva_init.sh`.
- Run `riva_start.sh`.

### Riva-deploy

The deployment tool takes as input one or more Riva Model Intermediate Representation (RMIR) files and a target model repository directory. It creates an ensemble configuration specifying the pipeline for the execution and finally writes all those assets to the output model repository directory.

In [None]:
# Path to the model repository relative to the ServiceMaker Docker container
MODEL_REPO_SM = os.path.join(TTS_MODEL_DIR_SM, "models")

In [None]:
# Syntax: riva-deploy -f dir-for-rmir/model.rmir:key output-dir-for-repository
! docker run --rm --gpus all -v $TTS_MODEL_DIR:$TTS_MODEL_DIR_SM $RIVA_SM_CONTAINER -- \
    riva-deploy -f  $TTS_RMIR_SM:$key $MODEL_REPO_SM

In [None]:
# Inspect the models directory
!ls -lt $TTS_MODEL_DIR/models

Let's download the Riva Quick Start resource from NGC.

In [None]:
if target_machine.lower() in ["amd64", "arm64_linux"]:
    quickstart_link = f"nvidia/riva/riva_quickstart:{__riva_version__}"
else:
    quickstart_link = f"nvidia/riva/riva_quickstart_arm64:{__riva_version__}"

RIVA_DIR = ngc_download_and_get_dir(quickstart_link, "Riva Quick Start resource folder", resource_type="resource", parent_dir=os.getcwd())
RIVA_DIR = os.path.join(os.getcwd(), RIVA_DIR)

Next, we modify the `config.sh` file to enable the relevant Riva services (TTS in this case for FastPitch and HiFi-GAN), and provide the encryption key and path to the model repository (`riva_model_loc`) generated in the previous step.

For example, if the model repository above was generated at `$TTS_MODEL_DIR/models`, set `riva_model_loc` to `TTS_MODEL_DIR`.

Here is how the `config.sh` should look:
```sh
### config.sh snippet  
# Enable or Disable Riva Services
# For any language other than en-US: service_enabled_nlp must be set to false
service_enabled_asr=true          ## MAKE CHANGES HERE - SET TO FALSE
service_enabled_nlp=true          ## MAKE CHANGES HERE - SET TO FALSE
service_enabled_tts=true
service_enabled_nmt=true          ## MAKE CHANGES HERE - SET TO FALSE

...

# Specify the encryption key to use to deploy models
MODEL_DEPLOY_KEY="tlt_encode"     ## MAKE CHANGES HERE (Replace with the key you used when running nemo2riva)

# Locations to use for storing models artifacts
#
# If an absolute path is specified, the data will be written to that location
# Otherwise, a Docker volume will be used (default).
#
# riva_init.sh will create a `rmir` and `models` directory in the volume or
# path specified.
#
# RMIR ($riva_model_loc/rmir)
# Riva uses an intermediate representation (RMIR) for models
# that are ready to deploy but not yet fully optimized for deployment. Pretrained
# versions can be obtained from NGC (by specifying NGC models below) and will be
# downloaded to $riva_model_loc/rmir by `riva_init.sh`
#
# Custom models produced by NeMo or TLT and prepared using riva-build
# may also be copied manually to this location $(riva_model_loc/rmir).
#
# Models ($riva_model_loc/models)
# During the riva_init process, the RMIR files in $riva_model_loc/rmir
# are inspected and optimized for deployment. The optimized versions are
# stored in $riva_model_loc/models. The riva server exclusively uses these
# optimized versions.
riva_model_loc="riva-model-repo"  ## MAKE CHANGES HERE (Replace with the path TTS_MODEL_DIR)

if [[ $riva_target_gpu_family == "tegra" ]]; then
    riva_model_loc="`pwd`/model_repository"
fi

# The default RMIRs are downloaded from NGC by default in the above $riva_rmir_loc directory
# If you'd like to skip the download from NGC and use the existing RMIRs in the $riva_rmir_loc
# then set the below $use_existing_rmirs flag to true. You can also deploy your set of custom
# RMIRs by keeping them in the riva_rmir_loc dir and use this quickstart script with the
# below flag to deploy them all together.
use_existing_rmirs=false          ## MAKE CHANGES HERE - SET TO TRUE
```

Let's make the necessary changes to the `config.sh` script.

In [None]:
with open(f"{RIVA_DIR}/config.sh", "r") as config_in:
    config_file = config_in.readlines()

for i, line in enumerate(config_file):
    # Disable services
    if line.startswith("service_enabled_asr"):
        config_file[i] = "service_enabled_asr=false\n"
    elif line.startswith("service_enabled_nlp"):
        config_file[i] = "service_enabled_nlp=false\n"
    elif line.startswith("service_enabled_nmt"):
        config_file[i] = "service_enabled_nmt=false\n"
    elif line.startswith("service_enabled_tts"):
        config_file[i] = "service_enabled_tts=true\n"
    # Point riva_model_loc at TTS_MODEL_DIR, which contains the rmir and models folders
    elif line.startswith("riva_model_loc"):
        config_file[i] = f'riva_model_loc="{TTS_MODEL_DIR}"\n'
    elif line.startswith("use_existing_rmirs"):
        config_file[i] = "use_existing_rmirs=true\n"
    elif line.startswith("MODEL_DEPLOY_KEY"):
        config_file[i] = f'MODEL_DEPLOY_KEY="{key}"\n'

with open(f"{RIVA_DIR}/config.sh", "w") as config_out:
    config_out.writelines(config_file)

print("".join(config_file))
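The chain of `startswith` checks can also be expressed as a dictionary of overrides applied to the lines of the file. The sketch below is an alternative, not the notebook's own method; `apply_overrides` is a hypothetical helper, and the sample lines are a shortened stand-in for `config.sh`. It matches the variable name exactly, so indented or commented lines are left alone.

```python
def apply_overrides(lines, overrides):
    """Return a copy of `lines` with `name=value` assignments replaced per `overrides`."""
    updated = []
    for line in lines:
        name = line.split("=", 1)[0]
        if "=" in line and name in overrides:
            updated.append(f"{name}={overrides[name]}\n")
        else:
            updated.append(line)
    return updated


# Shortened stand-in for config.sh, for illustration only.
sample_config = [
    "service_enabled_asr=true\n",
    "# riva_model_loc is documented here\n",
    'riva_model_loc="riva-model-repo"\n',
    '    riva_model_loc="`pwd`/model_repository"\n',
]
overrides = {"service_enabled_asr": "false", "riva_model_loc": '"/tmp/tts-models"'}
print("".join(apply_overrides(sample_config, overrides)))
```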

In [None]:
# Ensure you have permission to execute these scripts
! cd $RIVA_DIR && chmod +x ./riva_init.sh && chmod +x ./riva_start.sh && chmod +x ./riva_stop.sh
! cd $RIVA_DIR && ./riva_stop.sh config.sh

In [None]:
# Run `riva_init.sh`. This will fetch the containers/models and run `riva-deploy`.
# YOU CAN SKIP THIS STEP IF YOU RAN RIVA DEPLOY
! cd $RIVA_DIR && ./riva_init.sh config.sh

In [None]:
# Run `riva_start.sh`. This will start the Riva server and serve your model.
! cd $RIVA_DIR && ./riva_start.sh config.sh
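Before sending requests, it can be useful to confirm that something is listening on the server port. The following sketch uses only the standard library; `wait_for_port` is a hypothetical helper, and an open port only shows that a process accepts connections, not that the TTS models have finished loading.

```python
import socket
import time


def wait_for_port(host, port, timeout_s=60.0, interval_s=1.0):
    """Poll until a TCP connection to (host, port) succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Connection refused or timed out: wait and retry.
            time.sleep(interval_s)
    return False
```

For the setup in this tutorial, a call such as `wait_for_port("localhost", 50051)` would check the gRPC port used by the client cell.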

## Run Inference
Once the Riva server is up and running with your models, you can send inference requests querying the server.

To send gRPC requests, install the Riva Python API bindings for the client.

In [None]:
# Install client API bindings
! pip install nvidia-riva-client

### Connect to the Riva server and run inference
The following cell sends a synthesis request to the Riva server over gRPC and plays back the returned audio.

In [None]:
import os
import riva.client
import IPython.display as ipd
import numpy as np

server = "localhost:50051"                # location of riva server
auth = riva.client.Auth(uri=server)
tts_service = riva.client.SpeechSynthesisService(auth)


text = "Is it recognize speech or wreck a nice beach?"
language_code = lang                      # set to "en-US" for this lab
sample_rate_hz = sample_rate              # the desired sample rate
voice_name = voice                        # voice (or subvoice) used to generate the audio output
data_type = np.int16                      # For Riva version < 1.10.0, set this to np.float32

resp = tts_service.synthesize(text, voice_name=voice_name, language_code=language_code, sample_rate_hz=sample_rate_hz)
audio = resp.audio
meta = resp.meta
processed_text = meta.processed_text
predicted_durations = meta.predicted_durations

audio_samples = np.frombuffer(resp.audio, dtype=data_type)
print(processed_text)
ipd.Audio(audio_samples, rate=sample_rate_hz)
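To keep the result outside the notebook, the raw audio bytes can be written to a WAV file with Python's standard `wave` module. This is a minimal sketch that assumes mono 16-bit PCM, matching the `np.int16` data type above; `save_pcm16_wav` is a hypothetical helper, and the silent buffer stands in for `resp.audio`.

```python
import os
import tempfile
import wave


def save_pcm16_wav(path, pcm_bytes, sample_rate_hz, channels=1):
    """Write 16-bit PCM bytes to a WAV file and return the duration in seconds."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav_file.setframerate(sample_rate_hz)
        wav_file.writeframes(pcm_bytes)
    return len(pcm_bytes) / (2 * channels * sample_rate_hz)


# Half a second of silence at 22050 Hz as a stand-in for resp.audio.
silence = bytes(2 * 11025)
demo_path = os.path.join(tempfile.mkdtemp(), "tts_output.wav")
print(save_pcm16_wav(demo_path, silence, 22050))
```

With the variables from the cell above, the call would be `save_pcm16_wav("tts_output.wav", resp.audio, sample_rate_hz)`.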