<a href="https://www.nvidia.com/dli"> <img src="images/DLI_Header.png" alt="Header" style="width: 400px;"/> </a>

# 5.0 ASR Pipeline Deployment with NVIDIA Riva
## (part of Lab 1)

In this notebook, you'll deploy an ASR pipeline with [NVIDIA Riva](https://developer.nvidia.com/riva). After the models are deployed in Riva, you can issue inference requests to the Riva server from a client.

**[5.1 NVIDIA Riva](#5.1-NVIDIA-Riva)<br>**
**[5.2 Launch Riva Server](#5.2-Launch-Riva-Server)<br>**
&nbsp;&nbsp;&nbsp;&nbsp;[5.2.1 Riva Configuration](#5.2.1-Riva-Configuration)<br>
&nbsp;&nbsp;&nbsp;&nbsp;[5.2.2 Exercise: Configure Riva for ASR](#5.2.2-Exercise:-Configure-Riva-for-ASR)<br>
&nbsp;&nbsp;&nbsp;&nbsp;[5.2.3 Riva Start Services](#5.2.3-Riva-Start-Services)<br>
&nbsp;&nbsp;&nbsp;&nbsp;[5.2.4 Riva Available Services Check](#5.2.4-Riva-Available-Services-Check)<br>
**[5.3 Riva ASR Service Request](#5.3-Riva-ASR-Service-Request)<br>**
&nbsp;&nbsp;&nbsp;&nbsp;[5.3.1 Python Client Demo](#5.3.1-Python-Client-Demo)<br>
**[5.4 Streaming ASR](#5.4-Streaming-ASR)<br>**
**[5.5 Riva Customization Capabilities](#5.5-Riva-Customization-Capabilites)<br>**
&nbsp;&nbsp;&nbsp;&nbsp;[5.5.1 Word Boosting](#5.5.1-Word-Boosting)<br>
**[5.6 Stop Riva Services](#5.6-Stop-Riva-Services)<br>**
**[5.7 Shut Down the Kernel](#5.7-Shut-Down-the-Kernel)<br>**

### Notebook Dependencies
The steps in this notebook assume that you have:

1. **NGC Credentials**<br>Be sure you have added your NGC credentials as described in the [NGC Setup notebook](003_Intro_NGC_Setup.ipynb).

---
# 5.1 NVIDIA Riva

NVIDIA Riva is a GPU-accelerated speech AI SDK for building and deploying real-time speech AI pipelines. It offers a complete workflow to build and customize speech recognition and speech synthesis pipelines. With the NVIDIA Riva platform, you can:

- Build state-of-the-art speech AI pipelines using pretrained NVIDIA models available on NVIDIA GPU Cloud ([NGC](https://ngc.nvidia.com/catalog/models?orderBy=modifiedDESC&query=%20label%3A%22NeMo%2FPyTorch%22&quickFilter=models&filters=)). Riva provides world-class automatic speech recognition (ASR) and text-to-speech (TTS) that run in real time.

- Customize the pipeline and fine-tune AI models on domain-specific data with NVIDIA [NeMo](https://github.com/NVIDIA/NeMo) and [TAO Toolkit](https://docs.nvidia.com/tao/tao-toolkit/index.html#tao-toolkit).

- Optimize neural network performance and latency using [NVIDIA TensorRT](https://developer.nvidia.com/tensorrt).

- Deploy Speech AI pipelines with [Triton Inference Server](https://developer.nvidia.com/nvidia-triton-inference-server).

For more detailed information on NVIDIA Riva Speech AI, please refer to the [Riva developer documentation](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/index.html).

---
# 5.2 Launch Riva Server
After the model repository is generated, we are ready to start the Riva server.  

NVIDIA Riva provides a [Quick Start Guide](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/quick-start-guide.html#local-deployment-using-quick-start-scripts). For this step, we use the Riva Quick Start scripts downloaded from NGC. The scripts have already been downloaded for the class. You can also download them yourself, either directly from NGC while logged in or with the NGC command-line tool.

Set `RIVA_QS` to the `riva_quickstart` location:  

In [None]:
# Set the Riva Quick Start directory
WORKSPACE='/dli_workspace'
RIVA_QS = WORKSPACE + "/riva_quickstart"
RIVA_MODEL_REPO = WORKSPACE + "/riva-asr-model-repo"

In [None]:
!ls $RIVA_QS

There are a number of scripts available for managing Riva services. We can initialize the models using `riva_init.sh`, then start and stop the server with `riva_start.sh` and `riva_stop.sh`. We also need to set flags and values in `config.sh` to specify which services and models to initialize and start.

## 5.2.1 Riva Configuration

Open [config.sh](dli_workspace/riva_quickstart/config.sh) and note the following important sections:

##### Enable/Disable Riva Services
For each service, a `true` value means that the service is enabled on the server. For example, if we just want to run an ASR server, we can set the `service_enabled_asr` parameter to `true` and all other parameters to `false`. Enabling a service also means that all NGC models listed for that service later in the config file will be downloaded.
```yaml
# Enable or Disable Riva Services
service_enabled_asr=true
service_enabled_nlp=true
service_enabled_tts=true
```
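
To see at a glance which services a given `config.sh` enables, you could read the flags from Python. The following is a minimal sketch, assuming the flags are written exactly as shown above (one `service_enabled_<name>=true|false` assignment per line); `parse_service_flags` is a helper name introduced here, not part of Riva.

```python
import re


def parse_service_flags(config_text):
    """Map each service_enabled_<name> flag in a config.sh text to a bool.

    Assumes one plain `service_enabled_<name>=true|false` assignment per line.
    Commented-out lines are ignored because they start with '#'.
    """
    pattern = re.compile(
        r"^\s*service_enabled_(\w+)=(true|false)\s*$", re.MULTILINE
    )
    return {name: value == "true" for name, value in pattern.findall(config_text)}
```

For a file containing only the three flags shown above, `parse_service_flags(open(RIVA_QS + "/config.sh").read())` would return `{"asr": True, "nlp": True, "tts": True}`.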

##### Specify the Language
You can specify the language code for the models that will be loaded.  The instructions and available language codes are included in the `config.sh` file: 
```yaml
# Language code to fetch models of a specify language
# Currently only ASR supports languages other than English
# Supported language codes: ar-AR, en-US, en-GB, de-DE, es-ES, es-US, fr-FR, hi-IN, it-IT, ja-JP, ru-RU, ko-KR, pt-BR, zh-CN
# for any language other than English, set service_enabled_nlp and service_enabled_tts to False
# for multiple languages enter space separated language codes.
language_code=("en-US")
```

For this notebook, we want to load both the English and Spanish models.  To load both, you can change the setting to:<br>
```yaml
language_code=("en-US" "es-US")
```
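
Because `config.sh` is a Bash script, `language_code` is a Bash array, and its entries are separated by whitespace (the comment in the file says as much). The sketch below reads the array from Python with shell-style splitting, which makes a stray separator easy to spot. `parse_language_codes` is a hypothetical helper, and it assumes the array is written on a single line.

```python
import re
import shlex


def parse_language_codes(config_text):
    """Return the entries of the language_code=(...) Bash array in config_text."""
    match = re.search(r"^\s*language_code=\((.*?)\)", config_text, re.MULTILINE)
    if match is None:
        return []
    # shlex applies shell quoting rules: entries are split on whitespace only,
    # so a comma typed after an entry stays attached to that entry.
    return shlex.split(match.group(1))
```

With whitespace between the entries, the function returns `["en-US", "es-US"]`. If a comma is typed between them, the first entry comes back as `en-US,`, which is not one of the supported language codes.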

##### Set the Encryption Key
We want encryption to be consistent across all of our projects, so this key must match the one used to export our original model (and it already does!). For the purposes of this class, this setting won't change.
```yaml
# Specify the encryption key to use to deploy models
MODEL_DEPLOY_KEY="tlt_encode"
```

##### Set the Model Location
`riva_model_loc` should be the folder that contains both the `rmir` and `models` folders. This value needs to be changed to the actual absolute path for a given project.
```yaml
# Custom models produced by NeMo or TLT and prepared using riva-build
# may also be copied manually to this location $(riva_model_loc/rmir).
#
# Models ($riva_model_loc/models)
# During the riva_init process, the RMIR files in $riva_model_loc/rmir
# are inspected and optimized for deployment. The optimized versions are
# stored in $riva_model_loc/models. The riva server exclusively uses these
# optimized versions.
riva_model_loc="/riva-model-repo"
```
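
Before pointing `riva_model_loc` at a folder, it can help to confirm that the folder has the expected layout. The sketch below only checks for the two subfolder names described above; `check_model_repo` is a helper name introduced here for illustration.

```python
from pathlib import Path


def check_model_repo(model_loc):
    """Report whether model_loc contains the `rmir` and `models` subfolders."""
    root = Path(model_loc)
    return {name: (root / name).is_dir() for name in ("rmir", "models")}
```

Calling `check_model_repo(RIVA_MODEL_REPO)` returns a dictionary with one boolean per subfolder, so a `False` value tells you which folder is missing.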

In [None]:
# model repo prebuilt location is "/dli_workspace/riva-asr-model-repo"
!ls $RIVA_MODEL_REPO/models

## 5.2.2 Exercise: Configure Riva for ASR

Open [config.sh](dli_workspace/riva_quickstart/config.sh) and modify it to:
* Deploy only the ASR service 
* Specify both English and Spanish
* Specify the `/dli_workspace/riva-asr-model-repo` model repo location where we've preloaded the ASR models

Save your work.

If you're not sure what to change, take a peek at the [solution](solutions/ex5.2.2_config.sh).

Check your work.  The `diff` comparison in the following cell should have no output.

In [None]:
# Check your work
!diff solutions/ex5.2.2_config.sh dli_workspace/riva_quickstart/config.sh

In [None]:
# Quick fix: if the diff above shows differences, overwrite config.sh with the solution
!cp solutions/ex5.2.2_config.sh dli_workspace/riva_quickstart/config.sh

## 5.2.3 Riva Start Services

The `riva_init.sh` script downloads the required Riva containers, downloads the models listed in `config.sh`, and optimizes the models as required with [NVIDIA TensorRT](https://developer.nvidia.com/tensorrt). Since we've already used the ServiceMaker `riva-deploy` tool to optimize the models we are using, `riva_init.sh` won't have much to do, but it is provided here for completeness.

The `riva_start.sh` script starts the server.

In [None]:
# Initialize Riva
# Models have been preloaded, so TensorRT builds ("deployment") will be skipped
!cd $RIVA_QS && bash riva_init.sh config.sh

In [None]:
# Run Riva Start. This will start the server.
!cd $RIVA_QS && bash riva_start.sh config.sh

The Riva ASR services are running once the output shows "Riva server is ready..." (this takes about 1 minute).

##### Troubleshooting:
If the server fails to start, open a terminal and clean the Riva model repository with:

```bash
cd /dli_workspace/riva_quickstart && bash riva_clean.sh config.sh
```
   
Then rerun the Riva Start Services cells above.

## 5.2.4 Riva Available Services Check

To check the exposed Riva services, run the `docker logs riva-speech` command. 

You should see the following models ready:

```
+-------------------------------------------------------------------+---------+--------+
| Model                                                             | Version | Status |
+-------------------------------------------------------------------+---------+--------+
| conformer-en-US-asr-offline                                       | 1       | READY  |
| conformer-en-US-asr-offline-ctc-decoder-cpu-streaming-offline     | 1       | READY  |
| conformer-en-US-asr-offline-endpointing-streaming-offline         | 1       | READY  |
| conformer-en-US-asr-offline-feature-extractor-streaming-offline   | 1       | READY  |
| conformer-en-US-asr-streaming                                     | 1       | READY  |
| conformer-en-US-asr-streaming-ctc-decoder-cpu-streaming           | 1       | READY  |
| conformer-en-US-asr-streaming-endpointing-streaming               | 1       | READY  |
| conformer-en-US-asr-streaming-feature-extractor-streaming         | 1       | READY  |
| conformer-es-US-asr-offline                                       | 1       | READY  |
| conformer-es-US-asr-offline-ctc-decoder-cpu-streaming-offline     | 1       | READY  |
| conformer-es-US-asr-offline-endpointing-streaming-offline         | 1       | READY  |
| conformer-es-US-asr-offline-feature-extractor-streaming-offline   | 1       | READY  |
| conformer-es-US-asr-streaming                                     | 1       | READY  |
| conformer-es-US-asr-streaming-ctc-decoder-cpu-streaming           | 1       | READY  |
| conformer-es-US-asr-streaming-endpointing-streaming               | 1       | READY  |
| conformer-es-US-asr-streaming-feature-extractor-streaming         | 1       | READY  |
| riva-punctuation-en-US                                            | 1       | READY  |
| riva-punctuation-es-US                                            | 1       | READY  |
| riva-trt-conformer-en-US-asr-offline-am-streaming-offline         | 1       | READY  |
| riva-trt-conformer-en-US-asr-streaming-am-streaming               | 1       | READY  |
| riva-trt-conformer-es-US-asr-offline-am-streaming-offline         | 1       | READY  |
| riva-trt-conformer-es-US-asr-streaming-am-streaming               | 1       | READY  |
| riva-trt-riva-punctuation-en-US-nn-bert-base-uncased              | 1       | READY  |
| riva-trt-riva-punctuation-es-US-nn-bert-base-multilingual-uncased | 1       | READY  |
+-------------------------------------------------------------------+---------+--------+
```

In [None]:
!docker logs riva-speech
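
The log output is long, so scanning the table by eye is error-prone. As a rough sketch, the function below pulls the model names and statuses out of log text formatted like the table above and lists any model that is not `READY`. The helper names are our own, and the parsing assumes the three-column `| Model | Version | Status |` layout shown in this notebook.

```python
def parse_model_status(log_text):
    """Extract {model name: status} from a status table in the log text."""
    statuses = {}
    for line in log_text.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        # Data rows have exactly three cells, and the Version cell is numeric.
        # Border lines and the header row do not satisfy both conditions.
        if len(cells) == 3 and cells[1].isdigit():
            statuses[cells[0]] = cells[2]
    return statuses


def models_not_ready(log_text):
    """Return the sorted names of models whose status is not READY."""
    return sorted(
        name
        for name, status in parse_model_status(log_text).items()
        if status != "READY"
    )
```

To use it, capture the output of `docker logs riva-speech` as a string (for example with `subprocess.run`, keeping both stdout and stderr) and pass it to `models_not_ready`. An empty list means every model found in the table is ready.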

---
# 5.3 Riva ASR Service Request 
To access the Riva API, we need to:
1. Start the Riva Speech Skills server. (already done!)
2. Install the [Riva Client library](https://github.com/nvidia-riva/tutorials#running-the-riva-client). (already done for this course!)
3. Set up requests using the [documentation tutorial example](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/tutorials/asr-python-basics.html) for transcription.

## 5.3.1 Python Client Demo

The Riva ASR service supports a number of options when making a transcription request. Let's learn more about these parameters:

- `language_code`: Language of the input audio. `"en-US"` represents English (US); `"es-US"` represents Spanish (US).
- `enable_automatic_punctuation`: Apply punctuation and capitalization as a post-processing step.
- `max_alternatives`: Number of top alternative transcriptions to return.
- `audio_channel_count`: Number of audio channels. Typical microphones have 1 audio channel.

Let's load and listen to two audio samples in different languages and query the Riva ASR service.

In [None]:
# import the relevant libraries
import librosa
import IPython.display as ipd
from IPython.display import Audio, display
import io
import riva.client

# set audio_samples folder
AUDIO_SAMPLES = "/opt/nvidia-riva/tutorials/audio_samples"

In [None]:
SAMPLE_ENGLISH = AUDIO_SAMPLES + "/en-US_sample.wav"

with io.open(SAMPLE_ENGLISH, 'rb') as fh:
    signal_english = fh.read()
ipd.Audio(SAMPLE_ENGLISH) 

In [None]:
SAMPLE_SPANISH = AUDIO_SAMPLES + "/es-US_sample.wav"

with io.open(SAMPLE_SPANISH, 'rb') as fh:
    signal_spanish = fh.read()
ipd.Audio(SAMPLE_SPANISH)

Connect the client to the Riva server at `localhost:50051`.

In [None]:
auth = riva.client.Auth(uri='localhost:50051')
riva_asr = riva.client.ASRService(auth)

Create a drop-down menu for convenience.

In [None]:
# Create a drop-down menu
from ipywidgets import Select, HBox, Label, Dropdown
from IPython.display import display

audio_signal={"English": signal_english, "Spanish": signal_spanish}
language = {"English": "en-US", "Spanish":"es-US"}
automatic_punctuation = {"Enable":True, "Disable":False}

language_selector=Dropdown(options=['English', 'Spanish'], value='English', description='Language:')
punctuation_selector=Dropdown(options=['Enable', 'Disable'], value='Enable',description='Punctuation & Capitalization:')

print()
print("Select the ASR pipeline to query. Choose one language, then enable or disable automatic punctuation and capitalization:")

display(HBox([language_selector, punctuation_selector]))

Configure the request.

In [None]:
# Set up an offline/batch recognition request
config = riva.client.RecognitionConfig()
#config.encoding = riva.client.AudioEncoding.LINEAR_PCM    # Audio encoding can be detected from wav
#config.sample_rate_hertz = 0                              # Sample rate can be detected from wav and resampled if needed
config.language_code = language[language_selector.value]                    # Language code of the audio clip
config.max_alternatives = 1                       # How many top-N hypotheses to return
config.enable_automatic_punctuation = automatic_punctuation[punctuation_selector.value]       # Add punctuation when end of VAD detected
config.audio_channel_count = 1                    # Mono channel
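
The request above declares a single audio channel. To confirm that for a file of your own, the standard-library `wave` module can read the WAV header. The following is a sketch; `wav_info` is our own helper, and `wave` handles only PCM-encoded WAV files (it raises `wave.Error` for other encodings).

```python
import wave


def wav_info(path):
    """Return channel count, sample rate, sample width, and duration of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": frames / rate if rate else 0.0,
        }
```

For example, `wav_info(SAMPLE_ENGLISH)["channels"]` gives the value to compare against `config.audio_channel_count`.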

Get the transcription.

In [None]:
response = riva_asr.offline_recognize(audio_signal[language_selector.value], config)
asr_best_transcript = response.results[0].alternatives[0].transcript
print("ASR Transcript:", asr_best_transcript)

print("\n\nFull Response Message:")
print(response)
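
The cell above indexes `response.results[0]` directly, which raises an `IndexError` if the response contains no results. A slightly more defensive pattern is sketched below. It relies only on the attribute names used above (`results`, `alternatives`, `transcript`), and the response object in the example is a hand-built stand-in with hypothetical text, not real ASR output.

```python
from types import SimpleNamespace


def best_transcripts(response):
    """Return the top transcript of every result, skipping results with no alternatives."""
    return [
        result.alternatives[0].transcript
        for result in response.results
        if result.alternatives
    ]


# Stand-in response for illustration only (hypothetical data).
fake_response = SimpleNamespace(
    results=[
        SimpleNamespace(alternatives=[SimpleNamespace(transcript="hello world ")]),
        SimpleNamespace(alternatives=[]),
    ]
)
print(best_transcripts(fake_response))
```

With a real response, `" ".join(t.strip() for t in best_transcripts(response))` would join the pieces into a single string, and an empty list signals that nothing was transcribed.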

---
# 5.4 Streaming ASR
The [nvidia-riva/python-clients](https://github.com/nvidia-riva/python-clients) repository includes a directory of Python scripts, which are included in this course instance. We can use the streaming client script to see how Riva transcribes words as they are spoken in a stream. Try it!

In [None]:
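import soundfile as sf

# Read the sample and rewrite it with soundfile's default WAV settings.
# The sample rate is unchanged: despite the file name, no resampling is performed.
data, samplerate = sf.read("/dli_workspace/data/audio_sample.wav", dtype='float32')
sf.write("/dli_workspace/data/audio_sample_resampled2.wav", data, samplerate)
SAMPLE_ENGLISH_WB = "/dli_workspace/data/audio_sample_resampled2.wav"
ipd.Audio(SAMPLE_ENGLISH_WB)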

In [None]:
# set the location of the Python script
PYTHON_SCRIPTS = "/opt/nvidia-riva/python-clients/scripts"
! python $PYTHON_SCRIPTS/asr/riva_streaming_asr_client.py -h    

In [None]:
! python $PYTHON_SCRIPTS/asr/riva_streaming_asr_client.py \
        --input-file $SAMPLE_ENGLISH_WB \
        --server "localhost:50051" \
        --language-code "en-US" \
        --automatic-punctuation 

In [None]:
!cat output_0.txt
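
Conceptually, a streaming client sends the audio in small chunks and receives partial transcripts as it goes, rather than sending the whole file in one request. The sketch below shows only the chunking idea on raw bytes. It is not the implementation used by `riva_streaming_asr_client.py`, and the chunk size in the example is an arbitrary illustrative value.

```python
def iter_chunks(audio_bytes, chunk_size):
    """Yield consecutive slices of audio_bytes, each at most chunk_size bytes long."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(audio_bytes), chunk_size):
        yield audio_bytes[start:start + chunk_size]


# Example: 7 bytes split into chunks of 3; the last chunk is shorter.
chunks = list(iter_chunks(b"abcdefg", 3))
```

Smaller chunks generally mean partial results can arrive sooner, at the cost of more requests per second of audio.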

---
# 5.5 Riva Customization Capabilites


The following flow diagram shows the Riva speech recognition pipeline along with the possible customizations.

<img src="https://docs.nvidia.com/deeplearning/riva/user-guide/docs/_images/riva-asr-pipeline-best-practices.png" height=50>

## 5.5.1 Word Boosting 

Word boosting biases the ASR decoder toward specific words at request time, without retraining the model. The cells below first list the options of the offline transcription script, then transcribe the sample without boosting, and finally transcribe it again with the word `daina` boosted so you can compare the two transcripts.

In [None]:
! python $PYTHON_SCRIPTS/asr/transcribe_file_offline.py -h

In [None]:
! python $PYTHON_SCRIPTS/asr/transcribe_file_offline.py \
        --server "localhost:50051" \
        --input-file $SAMPLE_ENGLISH_WB \
        --language-code "en-US"

In [None]:
! python $PYTHON_SCRIPTS/asr/transcribe_file_offline.py \
        --server "localhost:50051" \
        --input-file $SAMPLE_ENGLISH_WB \
        --language-code "en-US" \
        --boosted-lm-words "daina" \
        --boosted-lm-score 200

---
# 5.6 Stop Riva Services 
Stop Riva services.  This shuts down all the containers.

In [None]:
# Run Riva Stop. 
!bash $RIVA_QS/riva_stop.sh

---
# 5.7 Shut Down the Kernel
<h3 style="color:red;">Important!</h3>

From the menu above, choose ***Kernel->Shut Down Kernel*** to fully clear GPU memory before moving on.

---
<h2 style="color:green;">Congratulations!</h2>

In this notebook, you have:
- Launched the Riva ASR service
- Sent requests to the ASR service using the Python client API

This concludes the Lab 1 ASR hands-on material.  The Lab 2 TTS hands-on material begins with an introduction to the [TTS Pipeline](006_TTS_Pipeline.ipynb).
