<img src="http://developer.download.nvidia.com/notebooks/dlsw-notebooks/rivaasrasr-deploy-am-and-ngram-lm/nvidia_logo.png" style="width: 90px; float: right;">

# How to Deploy a Custom Language Model (n-gram) Trained with NeMo on Riva
This tutorial walks you through the deployment of a custom language model (n-gram) trained with NVIDIA NeMo on NVIDIA Riva.

## NVIDIA Riva Overview

NVIDIA Riva is a GPU-accelerated SDK for building speech AI applications that are customized for your use case and deliver real-time performance. <br/>
Riva offers a rich set of speech and natural language understanding services such as:

- Automated speech recognition (ASR).
- Text-to-Speech synthesis (TTS).
- A collection of natural language processing (NLP) services, such as named entity recognition (NER), punctuation, and intent classification.

In this tutorial, we will deploy an ASR language model (n-gram) trained with NeMo on Riva. <br> 
To understand the basics of Riva ASR APIs, refer to [Getting started with Riva ASR in Python](https://github.com/nvidia-riva/tutorials/blob/main/asr-basics.ipynb). <br>
To see how to pretrain and fine-tune an n-gram language model for ASR with NeMo, refer to [this tutorial](). <br>

For more information about Riva, refer to the [Riva product page](https://www.nvidia.com/en-us/ai-data-science/products/riva/) and [Riva developer documentation](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/index.html).

## NeMo (Neural Modules) and `nemo2riva`
[NVIDIA NeMo](https://developer.nvidia.com/nvidia-nemo) is an open-source framework for building, training, and fine-tuning GPU-accelerated speech AI and natural language understanding (NLU) models with a simple Python interface. To fine-tune a Conformer-CTC acoustic model with NeMo, refer to the [Conformer-CTC fine-tuning tutorial](https://github.com/nvidia-riva/tutorials/blob/main/asr-finetuning-conformer-ctc-nemo.ipynb).

The [`nemo2riva`]() command-line tool provides the capability to export your `.nemo` model in a format that can be deployed using [NVIDIA Riva](https://developer.nvidia.com/riva), a highly performant application framework for multi-modal conversational AI services using GPUs. A Python `.whl` file for `nemo2riva` is included in the [Riva Quick Start](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/riva/resources/riva_quickstart) resource folder. You can also install `nemo2riva` with `pip`, as shown in the [Conformer-CTC fine-tuning tutorial](https://github.com/nvidia-riva/tutorials/blob/main/asr-finetuning-conformer-ctc-nemo.ipynb). 

This tutorial explores taking a `.riva` model &mdash; the result of invoking the `nemo2riva` CLI tool (refer to the [Conformer-CTC fine-tuning tutorial](https://github.com/nvidia-riva/tutorials/blob/main/asr-finetuning-conformer-ctc-nemo.ipynb)) &mdash; and leveraging the Riva ServiceMaker framework to aggregate all the necessary artifacts for Riva deployment to a target environment. Once the model is deployed in Riva, you can issue inference requests to the server. We will demonstrate how quick and straightforward this whole process is.
In this tutorial, you will learn how to:
- Build an `.rmir` model pipeline from a `.riva` file with Riva ServiceMaker.
- Deploy the model locally on the Riva server.
- Send inference requests from a demo client using Riva API bindings.

---
## Prerequisites

Before we get started, ensure you have:
- Access to NVIDIA NGC and the ability to download the Riva Quick Start [resources](https://ngc.nvidia.com/catalog/resources/nvidia:riva:riva_quickstart).
-  A _language_ model file that you want to deploy.
    - For more information on training and exporting an n-gram language model, refer to the [NeMo Language Modeling documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/asr/asr_language_modeling.html).  
    - The language model file can be in either of the following formats: 
        - `.arpa`. You can download a pre-trained version from the [Riva ASR LM NGC model page](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/riva/models/speechtotext_en_us_lm).
        - `.binary`. You can download a pre-trained version from the [Riva ASR LM NGC model page](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/riva/models/speechtotext_en_us_lm).
- An _acoustic_ model file in the `.riva` format that you want to deploy. You can convert a `.nemo` model file to a `.riva` model file with the `nemo2riva` command.
    - For more information on customizing a Conformer-CTC acoustic model with NeMo and exporting the resulting model with `nemo2riva`, refer to the [Conformer-CTC fine-tuning tutorial](). 
    - Alternatively, you can obtain a pre-trained Conformer-CTC `.riva` model for English ASR [here](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/riva/models/speechtotext_en_us_conformer). 
    - For more information on training NeMo models, refer to the [Training](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/core/core.html#training) section in the [NeMo documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/index.html). 
    - For more information on Conformer-CTC's architecture, refer to the [Conformer-CTC](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/asr/models.html#conformer-ctc) section of the [NeMo ASR Models](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/asr/models.html) page. 
    - For more information on the configuration files necessary for training Conformer-CTC with NeMo, refer to the [Conformer-CTC](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/asr/configs.html#conformer-ctc) section of the [NeMo ASR Model Configuration Files](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/asr/configs.html) page.
- Weighted Finite State Transducer (WFST) tokenizer and verbalizer files for Inverse Text Normalization (ITN). 
    - For more information on WFST and ITN, refer to the [NeMo Inverse Text Normalization: From Development to Production](https://arxiv.org/pdf/2104.05055.pdf) paper.
    - You can download pretrained WFST ITN model files from this [NVIDIA GPU Cloud (NGC)](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/riva/models/inverse_normalization_en_us) model page. 
- A decoder vocabulary file. You can download one from the [Riva ASR LM NGC](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/riva/models/speechtotext_en_us_lm) model page. 

---
## Riva ServiceMaker
Riva ServiceMaker is a set of tools that aggregates all the necessary artifacts (models, files, configurations, and user settings) for Riva deployment to a target environment. It has two main components:

### Riva-Build

This step helps build a Riva-ready version of the model. Its only output is an intermediate format (called an RMIR) of an end-to-end pipeline for the supported services within Riva. Let's consider an ASR n-gram language model. <br>

`riva-build` is responsible for the combination of one or more exported models (`.riva` files) into a single file containing an intermediate format called Riva Model Intermediate Representation (`.rmir`). This file contains a deployment-agnostic specification of the whole end-to-end pipeline along with all the assets required for the final deployment and inference. For more information, refer to the [documentation](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/asr/asr-pipeline-configuration.html?highlight=pipeline%20configuration).

In [None]:
riva_line_list = !wget -qO- https://docs.nvidia.com/deeplearning/riva/user-guide/docs/index.html | grep "NVIDIA Riva Skills"
riva_line_string = riva_line_list[0]
__riva_version__ = riva_line_string.split(' ')[3]
# __riva_version__ = '2.14.0'
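The cell above takes the fourth space-separated token of the matching line, so it depends on the exact layout of the documentation page. A slightly more defensive variant, sketched below, searches for the first `X.Y.Z` pattern and falls back to a pinned version. The helper name `extract_riva_version` and the sample line are hypothetical; the `2.14.0` fallback is taken from the commented-out line above.

```python
import re

def extract_riva_version(line, fallback="2.14.0"):
    """Return the first X.Y.Z version string found in `line`, else `fallback`."""
    match = re.search(r"\d+\.\d+\.\d+", line)
    return match.group(0) if match else fallback

# Hypothetical sample line; the real page content may differ.
sample = "<title>NVIDIA Riva Skills 2.14.0 documentation</title>"
print(extract_riva_version(sample))
print(extract_riva_version("no version on this line"))
```

Pinning the version explicitly, as in the commented-out line, remains the most reproducible option.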

In [None]:
MACHINE_TYPE="AMD64" #Change this to `ARM64_linux` or `ARM64_l4t` in case of an ARM64 machine.
TARGET_MACHINE="AMD64" #Change this to `ARM64_linux` or `ARM64_l4t` in case of an ARM64 machine.
# KEY = "nemotoriva"  # Alternative encryption key, if you used it during nemo2riva
KEY = "tlt_encode"  # Encryption key used during nemo2riva; `tlt_encode` is the default key in the Quick Start config.sh
FORCE = True  # Whether to force-build a new ASR RMIR and replace any existing RMIRs

In [None]:
## Riva NGC, servicemaker image config.
if MACHINE_TYPE.lower() in ["amd64", "arm64_linux"]:
    RIVA_SM_CONTAINER = f"nvcr.io/nvidia/riva/riva-speech:{__riva_version__}-servicemaker"
elif MACHINE_TYPE.lower() == "arm64_l4t":
    RIVA_SM_CONTAINER = f"nvcr.io/nvidia/riva/riva-speech:{__riva_version__}-servicemaker-l4t-aarch64"
else:
    # Fail early instead of leaving RIVA_SM_CONTAINER undefined
    raise ValueError(f"Unsupported MACHINE_TYPE: {MACHINE_TYPE}")

In [None]:
import glob
import os
ASR_MODEL_DIR = os.path.join(os.getcwd(), "asr-models")

In [None]:
# All model paths relative to Riva Servicemaker docker include the _SM suffix

ASR_MODEL_DIR_SM = "/data" # Path where we mount the downloaded ASR models in the Servicemaker docker

# Path (inside the Servicemaker container) to the Acoustic Model that we fine-tuned in Notebook 5 of this lab
AM_SM = os.path.join(ASR_MODEL_DIR_SM, "custom-models", "riva", "Conformer-CTC-BPE.riva")

# Paths (inside the Servicemaker container) to LM model artifacts
# Model that we fine-tuned in Notebook 4 of this lab
NGRAM_DIR = "ngram-results"
# DECODING_LM_BIN_SM = os.path.join(ASR_MODEL_DIR_SM, NGRAM_DIR, "interpolated_lm_60-40.bin")
DECODING_LEXICON_SM = os.path.join(ASR_MODEL_DIR_SM, NGRAM_DIR, "interpolated_lm_60-40.lexicon")
# glob.glob returns matches in arbitrary order, so sort before taking the last one
LM_DIR = os.path.basename(sorted(glob.glob(os.path.join(ASR_MODEL_DIR, "speechtotext_en_us_lm_vdeployable*")))[-1])
DECODING_LM_BIN_SM = os.path.join(ASR_MODEL_DIR_SM, LM_DIR, "en-US_default_6.0.bin")
DECODING_VOCAB_SM = os.path.join(ASR_MODEL_DIR_SM, LM_DIR, "en-US_default_6.0_dict_vocab.txt")

# Paths (inside the Servicemaker container) to ITN artifacts
ITN_DIR = os.path.basename(sorted(glob.glob(os.path.join(ASR_MODEL_DIR, "inverse_normalization_en_us_vdeployable*")))[-1])
WFST_TOKENIZER_MODEL_SM = os.path.join(ASR_MODEL_DIR_SM, ITN_DIR, "tokenize_and_classify.far")
WFST_VERBALIZER_MODEL_SM = os.path.join(ASR_MODEL_DIR_SM, ITN_DIR, "verbalize.far")
FAR_SPEECH_HINTS_SM = os.path.join(ASR_MODEL_DIR_SM, ITN_DIR, "speech_class.far")

# Relative path where the generated .rmir file will be stored
RMIR_DIR = "custom-models/rmir"
!mkdir -p $ASR_MODEL_DIR/$RMIR_DIR
ASR_RMIR_DIR_SM = os.path.join(ASR_MODEL_DIR_SM, RMIR_DIR)
ASR_RMIR_SM = os.path.join(ASR_RMIR_DIR_SM, "asr_lm_itn_offline_custom.rmir")
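All of the `_SM` paths above follow one rule: a file under the mounted host directory appears at the same relative location under `/data` inside the container. The sketch below makes that mapping explicit. The helper name `to_container_path` and the host directory `/home/user/asr-models` are hypothetical and used only for illustration.

```python
import posixpath

def to_container_path(host_path, host_root, container_root="/data"):
    """Translate a path under `host_root` to its location under the container mount point."""
    rel = posixpath.relpath(host_path, start=host_root)
    if rel == posixpath.pardir or rel.startswith(posixpath.pardir + "/"):
        raise ValueError(f"{host_path} is not inside {host_root}")
    if rel == ".":
        return container_root
    return posixpath.join(container_root, rel)

# Hypothetical host layout, for illustration only
host_root = "/home/user/asr-models"
rmir_on_host = host_root + "/custom-models/rmir/asr_lm_itn_offline_custom.rmir"

# Mounting asr-models at /data (as for riva-build)
print(to_container_path(rmir_on_host, host_root))
# Mounting asr-models/custom-models at /data (as for riva-deploy)
print(to_container_path(rmir_on_host, host_root + "/custom-models"))
```

The second call shows why the RMIR path is reset before `riva-deploy`: with `custom-models` mounted at `/data`, the same file is seen at `/data/rmir/...` rather than `/data/custom-models/rmir/...`.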

In [None]:
# Get the ServiceMaker Docker container
! docker pull $RIVA_SM_CONTAINER

#### Build the `.rmir` file

**Notes** 
1. If your language model is in the `.arpa` format, use the flag `--decoding_language_model_arpa=$DECODING_LM_ARPA_SM` when invoking `riva-build`, where `DECODING_LM_ARPA_SM` is a variable you define to point to the `.arpa` file inside the container.
2. If your language model is in the `.binary` format, use the flag `--decoding_language_model_binary=$DECODING_LM_BIN_SM` when invoking `riva-build`, as in the command below.
3. Refer to the [Riva ASR Pipeline Configuration documentation](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/asr/asr-pipeline-configuration.html) if you want to build an ASR pipeline for a supported language other than US English. To obtain the proper `riva-build` parameters for your particular application, select the acoustic model (the parameters below assume Conformer-CTC), language, and pipeline type (offline for the purposes of this tutorial) from the interactive web menu at the bottom of the first section of the page. 

**Note:** While preparing this tutorial, replacing 
```sh
--decoding_vocab=$DECODING_VOCAB_SM
```
with 
```sh
--decoding_lexicon=$DECODING_LEXICON_SM
```
produced an RMIR and model files with which the Riva server could not be started. 

The `riva-build` command below therefore uses the default language model and vocabulary, so the only customized component in the pipeline is the acoustic model.
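As a rough illustration of notes 1 and 2, the language model flag can be chosen from the file extension. The two flag names come from the notes above; the helper name `lm_flag`, the extension mapping (including treating `.bin` like `.binary`), and the example path are assumptions made for this sketch.

```python
import os

# Flag names are taken from the notes above; the extension mapping is an assumption.
LM_FLAGS = {
    ".arpa": "--decoding_language_model_arpa",
    ".binary": "--decoding_language_model_binary",
    ".bin": "--decoding_language_model_binary",
}

def lm_flag(lm_path):
    """Build the riva-build language model argument for an ARPA or binary LM file."""
    ext = os.path.splitext(lm_path)[1].lower()
    if ext not in LM_FLAGS:
        raise ValueError(f"Unrecognized language model extension: {ext!r}")
    return f"{LM_FLAGS[ext]}={lm_path}"

# Hypothetical container path, for illustration only
print(lm_flag("/data/lm/en-US_default_6.0.bin"))
```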

In [None]:
# Syntax: 
# riva-build <task-name> \
#     output-dir-for-rmir/model.rmir[:key] \
#     dir-for-riva/acoustic_model.riva[:key] \
#     --decoding_language_model_<arpa, binary>
! docker run --rm --gpus all -v $ASR_MODEL_DIR:$ASR_MODEL_DIR_SM $RIVA_SM_CONTAINER -- \
    riva-build speech_recognition $ASR_RMIR_SM:$KEY $AM_SM:$KEY \
        --force \
        --offline \
        --name=custom-conformer-en-US-asr-offline \
        --return_separate_utterances=True \
        --featurizer.use_utterance_norm_params=False \
        --featurizer.precalc_norm_time_steps=0 \
        --featurizer.precalc_norm_params=False \
        --ms_per_timestep=40 \
        --endpointing.start_history=200 \
        --nn.fp16_needs_obey_precision_pass \
        --endpointing.residue_blanks_at_start=-2 \
        --chunk_size=4.8 \
        --left_padding_size=1.6 \
        --right_padding_size=1.6 \
        --max_batch_size=16 \
        --featurizer.max_batch_size=512 \
        --featurizer.max_execution_batch_size=512 \
        --decoder_type=flashlight \
        --flashlight_decoder.asr_model_delay=-1 \
        --decoding_language_model_binary=$DECODING_LM_BIN_SM \
        --decoding_vocab=$DECODING_VOCAB_SM \
        --flashlight_decoder.lm_weight=0.8 \
        --flashlight_decoder.word_insertion_score=1.0 \
        --flashlight_decoder.beam_size=32 \
        --flashlight_decoder.beam_threshold=20. \
        --flashlight_decoder.num_tokenization=1 \
        --language_code=en-US \
        --wfst_tokenizer_model=$WFST_TOKENIZER_MODEL_SM \
        --wfst_verbalizer_model=$WFST_VERBALIZER_MODEL_SM \
        --speech_hints_model=$FAR_SPEECH_HINTS_SM

### Riva-Deploy

The deployment tool takes as input one or more RMIR files and a target model repository directory. It creates an ensemble configuration specifying the pipeline for the execution and finally writes all those assets to the output model repository directory.

**Note:** 
1. If you added an encryption key to your `.rmir` file when building it with `riva-build`, make sure to append a colon and then the key's value to the model's name in the `riva-deploy` command, as shown below.
2. When running `riva-deploy`, we map `$ASR_MODEL_DIR/custom-models` to `$ASR_MODEL_DIR_SM` (`/data`) inside the Riva ServiceMaker Docker container. This is because the scripts in the Riva Skills Quick Start resource folder expect the directory containing the `rmir` and `models` directories to be mapped to `/data`.

In [None]:
# Path to the model repository relative to the SM docker
MODEL_REPO_SM = os.path.join(ASR_MODEL_DIR_SM, "models")
# Reset the RMIR path relative to the ServiceMaker Docker container
ASR_RMIR_SM = os.path.join(ASR_MODEL_DIR_SM, "rmir", "asr_lm_itn_offline_custom.rmir")

In [None]:
# Syntax: riva-deploy -f dir-for-rmir/model.rmir[:key] output-dir-for-repository
! docker run --rm --gpus all -v $ASR_MODEL_DIR/custom-models:$ASR_MODEL_DIR_SM $RIVA_SM_CONTAINER -- \
    riva-deploy -f  $ASR_RMIR_SM:$KEY $MODEL_REPO_SM
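The `model.rmir[:key]` convention from the syntax comment can be sketched as a small helper that appends the key only when one is set. The function names `with_key` and `riva_deploy_args` are hypothetical; the argument order follows the syntax line in the cell above.

```python
def with_key(path, key=None):
    """Append ':<key>' to a model path when an encryption key is set."""
    return f"{path}:{key}" if key else path

def riva_deploy_args(rmir_paths, model_repo, key=None, force=True):
    """Assemble the riva-deploy argument list for one or more RMIR files."""
    args = ["riva-deploy"]
    if force:
        args.append("-f")
    args.extend(with_key(path, key) for path in rmir_paths)
    args.append(model_repo)
    return args

args = riva_deploy_args(
    ["/data/rmir/asr_lm_itn_offline_custom.rmir"], "/data/models", key="tlt_encode"
)
print(" ".join(args))
```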

---
## Start the Riva Server
After the model repository is generated, we are ready to start the Riva server. If you didn't already do so in a previous tutorial in this lab, download the [Riva Quick Start](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/riva/resources/riva_quickstart) resource from NGC. 
The cells below download the resource, if it is not already present, and record the path to its directory:

In [None]:
import logging

def ngc_download_and_get_dir(ngc_resource_name, resource_description, resource_type="model", parent_dir=ASR_MODEL_DIR):
    default_download_folder = "_v".join(ngc_resource_name.split("/")[-1].split(":"))
    download_path = os.path.join(parent_dir, default_download_folder)
    if os.path.exists(download_path):
        print(f"{resource_description} exists, skipping download")
        return default_download_folder
    ngc_output = !ngc registry $resource_type download-version $ngc_resource_name --dest $parent_dir
    if not os.path.exists(download_path):
        ngc_output_formatted='\n'.join(ngc_output)
        logging.error(
            f"NGC was not able to download the requested model {ngc_resource_name}. "
            "Please check the NGC error message, remove all directories, and re-start the "
            f"notebook. NGC message: {ngc_output_formatted}"
        )
        return None
    print(f"Successfully downloaded {resource_description}")
    return default_download_folder
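The first line of the function derives the expected download folder from the NGC resource name by keeping the last path component and replacing the `:` before the version with `_v`. The standalone sketch below isolates that step; the helper name `ngc_download_folder` is hypothetical, and `2.14.0` is only the example version from the commented-out line earlier in this tutorial.

```python
def ngc_download_folder(ngc_resource_name):
    """Mirror the folder-name logic used in ngc_download_and_get_dir."""
    name_and_version = ngc_resource_name.split("/")[-1]  # e.g. "riva_quickstart:2.14.0"
    return "_v".join(name_and_version.split(":"))

print(ngc_download_folder("nvidia/riva/riva_quickstart:2.14.0"))
# riva_quickstart_v2.14.0
```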

In [None]:
if TARGET_MACHINE.lower() in ["amd64", "arm64_linux"]:
    quickstart_link = f"nvidia/riva/riva_quickstart:{__riva_version__}"
else:
    quickstart_link = f"nvidia/riva/riva_quickstart_arm64:{__riva_version__}"

RIVA_DIR = ngc_download_and_get_dir(quickstart_link, "Riva Quick Start resource folder", resource_type="resource", parent_dir=os.getcwd())
RIVA_DIR = os.path.join(os.getcwd(), RIVA_DIR)

Next, we modify the `config.sh` file to enable the relevant Riva services (ASR with an n-gram language model), provide the encryption key, and set the path to the model repository (`riva_model_loc`) generated in the previous step, among other settings. 

For example, if the model repository above was generated at `$ASR_MODEL_DIR/custom-models/models`, set `riva_model_loc` to its parent directory, `$ASR_MODEL_DIR/custom-models`. <br>

Pretrained versions of the models specified in `models_asr/nlp/tts/nmt` are fetched from NGC. Since we are using our custom model, we can comment out the corresponding entry in `models_asr` (and any others that are not relevant to your use case). <br>

#### config.sh snippet
```sh
### config.sh snippet  
# Enable or Disable Riva Services
# For any language other than en-US: service_enabled_nlp must be set to false
service_enabled_asr=true
service_enabled_nlp=true          ## MAKE CHANGES HERE - SET TO FALSE
service_enabled_tts=true          ## MAKE CHANGES HERE - SET TO FALSE
service_enabled_nmt=true          ## MAKE CHANGES HERE - SET TO FALSE

...

# Specify the encryption key to use to deploy models
MODEL_DEPLOY_KEY="tlt_encode"     ## MAKE CHANGES HERE (Replace with the key you used when running nemo2riva)

# Locations to use for storing models artifacts
#
# If an absolute path is specified, the data will be written to that location
# Otherwise, a Docker volume will be used (default).
#
# riva_init.sh will create a `rmir` and `models` directory in the volume or
# path specified.
#
# RMIR ($riva_model_loc/rmir)
# Riva uses an intermediate representation (RMIR) for models
# that are ready to deploy but not yet fully optimized for deployment. Pretrained
# versions can be obtained from NGC (by specifying NGC models below) and will be
# downloaded to $riva_model_loc/rmir by `riva_init.sh`
#
# Custom models produced by NeMo or TLT and prepared using riva-build
# may also be copied manually to this location $(riva_model_loc/rmir).
#
# Models ($riva_model_loc/models)
# During the riva_init process, the RMIR files in $riva_model_loc/rmir
# are inspected and optimized for deployment. The optimized versions are
# stored in $riva_model_loc/models. The riva server exclusively uses these
# optimized versions.
riva_model_loc="riva-model-repo"  ## MAKE CHANGES HERE (Replace with the path ASR_MODEL_DIR/custom-models)

if [[ $riva_target_gpu_family == "tegra" ]]; then
    riva_model_loc="`pwd`/model_repository"
fi

# The default RMIRs are downloaded from NGC by default in the above $riva_rmir_loc directory
# If you'd like to skip the download from NGC and use the existing RMIRs in the $riva_rmir_loc
# then set the below $use_existing_rmirs flag to true. You can also deploy your set of custom
# RMIRs by keeping them in the riva_rmir_loc dir and use this quickstart script with the
# below flag to deploy them all together.
use_existing_rmirs=false          ## MAKE CHANGES HERE - SET TO TRUE
```

Run the cell below to make the following changes to `config.sh` without opening the file in a text editor:

1. Set NLP, NMT, and TTS services to `false`
2. Set `riva_model_loc` to the `custom-models` directory under `ASR_MODEL_DIR`
3. Set the variable `use_existing_rmirs` to `true`
4. Change the `MODEL_DEPLOY_KEY` variable from the default `tlt_encode` to the key you used when exporting the customized acoustic model with `nemo2riva`

In [None]:
with open(f"{RIVA_DIR}/config.sh", "r") as config_in:
    config_file = config_in.readlines()

for i, line in enumerate(config_file):
    # Disable services
    if line.startswith("service_enabled_asr"):
        config_file[i] = "service_enabled_asr=true\n"
    elif line.startswith("service_enabled_nlp"):
        config_file[i] = "service_enabled_nlp=false\n"
    elif line.startswith("service_enabled_nmt"):
        config_file[i] = "service_enabled_nmt=false\n"
    elif line.startswith("service_enabled_tts"):
        config_file[i] = "service_enabled_tts=false\n"
    # Update riva_model_loc to our rmir folder
    elif line.startswith("riva_model_loc"):
        config_file[i] = f'riva_model_loc="{ASR_MODEL_DIR}/custom-models"\n'
    elif line.startswith("use_existing_rmirs"):
        config_file[i] = "use_existing_rmirs=true\n"
    elif line.startswith("MODEL_DEPLOY_KEY"):
        config_file[i] = f'MODEL_DEPLOY_KEY="{KEY}"\n'

with open(f"{RIVA_DIR}/config.sh", "w") as config_in:
    config_in.writelines(config_file)

print("".join(config_file))
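Rather than reading the printed file by eye, the edited settings can be checked programmatically. The sketch below collects simple top-level `KEY=value` assignments into a dictionary. The helper name `parse_shell_assignments` and the sample lines are hypothetical. Like the `startswith` checks in the cell above, it deliberately skips indented lines such as the `tegra` override, and it is not a general shell parser.

```python
def parse_shell_assignments(lines):
    """Collect top-level KEY=value assignments, skipping comments and indented lines."""
    settings = {}
    for line in lines:
        if not line or line[0] in "# \t\n":
            continue
        key, sep, value = line.partition("=")
        if not sep or not key.isidentifier():
            continue
        # Drop a trailing comment and surrounding quotes (assumes no '#' inside the value)
        settings[key] = value.split("#", 1)[0].strip().strip('"')
    return settings

# Hypothetical excerpt of an edited config.sh
sample = [
    "service_enabled_asr=true\n",
    "service_enabled_nlp=false\n",
    'MODEL_DEPLOY_KEY="tlt_encode"\n',
    'riva_model_loc="/home/user/asr-models/custom-models"\n',
    'if [[ $riva_target_gpu_family == "tegra" ]]; then\n',
    '    riva_model_loc="`pwd`/model_repository"\n',
    "fi\n",
    "use_existing_rmirs=true\n",
]
settings = parse_shell_assignments(sample)
print(settings["riva_model_loc"], settings["use_existing_rmirs"])
```

In the notebook, passing `config_file` from the cell above in place of `sample` would check the real file.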

In [None]:
# Ensure you have permission to execute these scripts
! cd $RIVA_DIR && chmod +x ./riva_init.sh && chmod +x ./riva_start.sh && chmod +x ./riva_stop.sh

Normally, one runs `riva_init.sh` before `riva_start.sh`. However, since we've already built our `.rmir` file with `riva-build` and deployed the associated model files by running `riva-deploy`, we can skip straight to `riva_start.sh`.

In [None]:
# Run Riva Start. This will deploy your model.
! cd $RIVA_DIR && ./riva_start.sh config.sh

---
## Run Inference
After the Riva server is up and running with your models, you can send inference requests querying the server. 
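One way to confirm that the server is accepting connections before sending requests is to poll its gRPC port. The sketch below uses only the standard library. The helper name `wait_for_port` and its timeout values are illustrative, and `localhost:50051` is the address used by the client code in this tutorial. A successful TCP connection shows only that something is listening on the port, not that every model has finished loading.

```python
import socket
import time

def wait_for_port(host="localhost", port=50051, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port()` could be called once before the first inference request, and the request skipped if it returns `False`.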

To send gRPC requests, you can install the Riva Python API bindings for the client. They are available as a [Python module on PyPI](https://pypi.org/project/nvidia-riva-client/).

In [None]:
# Install the Client API Bindings
! pip install nvidia-riva-client

In [None]:
import riva.client

### Connect to the Riva Server and Run Inference

Calling this inference function queries the Riva server (using gRPC) to transcribe an audio file. 

In [None]:
def run_inference(audio_file, server='localhost:50051', print_full_response=False):
    with open(audio_file, 'rb') as fh:
        data = fh.read()

    auth = riva.client.Auth(uri=server)
    client = riva.client.ASRService(auth)
    config = riva.client.RecognitionConfig(
        language_code="en-US",
        max_alternatives=1,
        enable_automatic_punctuation=False,
    )

    response = client.offline_recognize(data, config)
    if print_full_response: 
        print(response)
    else:
        print(response.results[0].alternatives[0].transcript)
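`run_inference` prints only the first result and raises an `IndexError` if the response contains no results. Because the pipeline above was built with `--return_separate_utterances=True`, a response may hold several results. The sketch below joins the top alternative of each one. The helper name `full_transcript` is hypothetical, and the `SimpleNamespace` objects are stand-ins that mimic only the `results`, `alternatives`, and `transcript` attributes used in this tutorial, not a real Riva response.

```python
from types import SimpleNamespace

def full_transcript(response):
    """Join the top transcript of every result, skipping results without alternatives."""
    parts = [
        result.alternatives[0].transcript.strip()
        for result in response.results
        if result.alternatives
    ]
    return " ".join(part for part in parts if part)

# Stand-in response object, for illustration only
fake_response = SimpleNamespace(results=[
    SimpleNamespace(alternatives=[SimpleNamespace(transcript="hello world ")]),
    SimpleNamespace(alternatives=[]),
    SimpleNamespace(alternatives=[SimpleNamespace(transcript="second utterance")]),
])
print(full_transcript(fake_response))
```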

Now we can actually query the Riva server.

In [None]:
audio_file = "audio_samples/en-US_wordboosting_sample2.wav"
run_inference(audio_file, print_full_response=True)

In [None]:
os.path.exists(audio_file)

In [None]:
import IPython.display as ipd
import wave

In [None]:
auth = riva.client.Auth(uri='localhost:50051')

riva_asr = riva.client.ASRService(auth)

In [None]:
# Load a sample audio file from local disk
# This example uses a .wav file with LINEAR_PCM encoding.
audio_file = "audio_samples/en-US_wordboosting_sample2.wav"
    
# Listen to the sample audio we are looking to transcribe
ipd.Audio(audio_file)

In [None]:
# Read the channel count from the WAV header rather than hard-coding it
with wave.open(audio_file, 'rb') as wf:
    num_channels = wf.getnchannels()

with open(audio_file, 'rb') as fh:
    content = fh.read()

# Creating RecognitionConfig
config = riva.client.RecognitionConfig(
    language_code="en-US",
    max_alternatives=1,
    enable_automatic_punctuation=True,
    audio_channel_count=num_channels,
)

# ASR Inference call with Recognize 
response = riva_asr.offline_recognize(content, config)

print(response)

You can stop the Riva server container (`riva-speech`), and thus shut down the Riva server, before shutting down the Jupyter kernel.

In [None]:
! docker container stop riva-speech