<img src="https://developer.download.nvidia.com/notebooks/dlsw-notebooks/rivaasrasr-basics/nvidia_logo.png" style="width: 90px; float: right;">

# How do I use Riva ASR APIs with out-of-the-box models?

This tutorial walks you through the basics of Riva ASR NIM, specifically covering how to use Riva ASR APIs with out-of-the-box models.

## NVIDIA Riva NIM Overview

NVIDIA Riva ASR NIM APIs provide easy access to state-of-the-art automatic speech recognition (ASR) models for multiple languages. Riva ASR NIM models are built on the NVIDIA software platform, incorporating CUDA, TensorRT, and Triton to offer out-of-the-box GPU acceleration.

In this tutorial, we will interact with the automatic speech recognition (ASR) APIs.

For more information about Riva ASR NIM, refer to the [Riva NIM documentation](https://docs.nvidia.com/nim/riva/asr/latest/overview.html).

## Transcription with Riva ASR APIs

ASR takes an audio stream or audio buffer as input and returns one or more text transcripts, along with additional optional metadata. 
Riva provides state-of-the-art out-of-the-box (OOTB) models and pipelines for multiple languages, such as English, Spanish, German, Russian, and Mandarin, that can be easily deployed with the Riva NIM. Riva also supports customizing the ASR pipeline in various ways to meet your specific needs. <br>
Refer to the [NIM support matrix](https://docs.nvidia.com/nim/riva/asr/latest/support-matrix.html) for more information.  

Now, let's generate transcripts for some sample audio clips using the Riva APIs with an OOTB NIM, starting with English.

<a id='updated_reqs_and_setup_for_EngASR'></a>
#### Requirements and setup

1. Start the Riva ASR NIM server.  
Follow the instructions in the [Getting started page](https://docs.nvidia.com/nim/riva/asr/latest/getting-started.html#launching-the-nim) to deploy OOTB ASR models on the Riva NIM server before running this tutorial.


2. Install the Riva Client library.  
```
sudo apt-get install python3-pip
pip install -U nvidia-riva-client
```

#### Import the Riva client libraries

Let's import some of the required libraries, including the Riva Client libraries.

In [None]:
import io
import IPython.display as ipd
import grpc

import riva.client

#### Create a Riva client and connect to the Riva Speech API server

The following URI assumes that the Riva Speech API server is deployed locally on the default port. If the server is deployed on a different host, or via a Helm chart on Kubernetes, use the appropriate URI.

In [None]:
auth = riva.client.Auth(uri='localhost:50051')

riva_asr = riva.client.ASRService(auth)

### Offline recognition for English

You can use Riva ASR in either streaming mode or offline mode. In streaming mode, a continuous stream of audio is captured and recognized, producing a stream of transcribed text. In offline mode, an audio clip of a set length is transcribed to text. <br> 
Let's look at an example showing offline ASR API usage for English:

#### Make a gRPC request to the Riva Speech API server
The Riva ASR API supports `.wav` files in pulse-code modulation (PCM) format, as well as `.alaw`, `.mulaw`, and `.flac` formats, with a single channel.

Now, let's make a gRPC request to the Riva Speech server for ASR with a sample `.wav` file in offline mode. Start by loading the audio.

In [None]:
# This example uses a .wav file with LINEAR_PCM encoding.
# read in an audio file from local disk
path = "./audio_samples/en-US_sample.wav"
with io.open(path, 'rb') as fh:
    content = fh.read()
ipd.Audio(path)
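
Because the API expects single-channel audio, it can help to inspect the WAV header before sending a clip. The following is a minimal sketch that uses only Python's standard `wave` module; the helper name `describe_wav` is hypothetical and not part of the Riva client. It assumes the bytes hold a PCM `.wav` file, which is the only format the `wave` module reads.

```python
import io
import wave


def describe_wav(data: bytes) -> dict:
    """Return basic header fields of a PCM WAV file held in memory.

    Hypothetical helper for illustration; not part of the Riva client.
    """
    with wave.open(io.BytesIO(data), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),              # 1 means mono
            "sample_rate_hertz": rate,                   # e.g. 16000
            "bits_per_sample": wav.getsampwidth() * 8,   # bytes -> bits
            "duration_seconds": frames / rate if rate else 0.0,
        }
```

Calling `describe_wav(content)` on the bytes loaded above returns the channel count and sample rate, so you can confirm that the clip is mono before making the request.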

Next, create a `RecognitionConfig` object for the request, setting the configuration parameters as required.

In [None]:
# Set up an offline/batch recognition request
config = riva.client.RecognitionConfig()
# config.encoding = riva.client.AudioEncoding.LINEAR_PCM  # Audio encoding can be detected from wav
# config.sample_rate_hertz = 0                    # Sample rate can be detected from wav and resampled if needed
config.language_code = "en-US"                    # Language code of the audio clip
config.max_alternatives = 1                       # How many top-N hypotheses to return
config.enable_automatic_punctuation = True        # Add punctuation when end of VAD detected
config.audio_channel_count = 1                    # Mono channel

Finally, submit the request to the server.

In [None]:
response = riva_asr.offline_recognize(content, config)
asr_best_transcript = response.results[0].alternatives[0].transcript
print("ASR Transcript:", asr_best_transcript)

print("\n\nFull Response Message:")
print(response)
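
The cell above reads only `response.results[0]`. The `results` field is a list, so a response may contain more than one entry (for example, for longer clips), and indexing fails if the list is empty. The sketch below is one possible way to collect the top hypothesis from every result; the helper name `best_transcript` is hypothetical, and it relies only on the `results`, `alternatives`, and `transcript` fields used above.

```python
def best_transcript(response) -> str:
    """Join the top alternative of every result into a single string.

    Hypothetical helper for illustration; it assumes only the
    results -> alternatives -> transcript structure shown above.
    """
    pieces = []
    for result in response.results:
        if result.alternatives:  # skip results that carry no hypotheses
            pieces.append(result.alternatives[0].transcript.strip())
    # Drop empty strings so the output has no doubled spaces.
    return " ".join(piece for piece in pieces if piece)
```

With the response from the previous cell, `print(best_transcript(response))` prints the full transcript, and it returns an empty string instead of raising an `IndexError` when no speech was recognized.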

#### Understanding ASR API parameters

Riva ASR supports a number of options while making a transcription request to the gRPC endpoint, as shown in the previous section. Let's learn more about these parameters:
- `encoding` - Type of audio encoding of the input audio file. Supported values are `LINEAR_PCM`, `FLAC`, `MULAW`, and `ALAW`. The encoding can be detected from the audio file.
- `sample_rate_hertz` - Sampling rate of the input audio in Hz. Note that the sample rate can be detected automatically from the audio `.wav` file and resampled if needed, making this parameter optional.
- `language_code` - Language of the input audio. `en-US` represents English (US). Other options include `es-US`, `de-DE`, `ru-RU`, and `zh-CN`. We will explore ASR for non-English languages in the next section.
- `max_alternatives` - Determines the number of top alternative transcriptions to return.
- `enable_automatic_punctuation` - Adds punctuation when the end of VAD (Voice Activity Detection) is detected.
- `audio_channel_count` - Number of audio channels. Typical microphones have 1 audio channel.
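
To keep these options in one place and catch obvious mistakes before a request is sent, the parameters listed above could be gathered by a small function. This is only a sketch: the name `recognition_options`, its defaults, and its checks are assumptions made for illustration and are not part of the Riva client.

```python
def recognition_options(
    language_code: str = "en-US",
    max_alternatives: int = 1,
    enable_automatic_punctuation: bool = True,
    audio_channel_count: int = 1,
) -> dict:
    """Collect the request parameters described above into a dict.

    Hypothetical helper for illustration; the validation rules are
    assumptions, not checks performed by Riva itself.
    """
    if not language_code:
        raise ValueError("language_code must be a non-empty string")
    if max_alternatives < 1:
        raise ValueError("max_alternatives must be at least 1")
    if audio_channel_count < 1:
        raise ValueError("audio_channel_count must be at least 1")
    return {
        "language_code": language_code,
        "max_alternatives": max_alternatives,
        "enable_automatic_punctuation": enable_automatic_punctuation,
        "audio_channel_count": audio_channel_count,
    }
```

Each key matches a `RecognitionConfig` field name used earlier, so the values can be assigned to a config object field by field. Because `RecognitionConfig` is a protobuf message, passing the dictionary as keyword arguments, as in `riva.client.RecognitionConfig(**recognition_options(language_code="es-US"))`, should also work.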

### Multilingual Offline recognition - Parakeet-RNNT example

In the previous section, we went through the Riva API usage and understood the different parameters of the ASR API. Now, let's look at using the ASR APIs for non-English languages, like Spanish, in offline mode.

Note that we offer multilingual models, such as Parakeet-RNNT, Canary, and Whisper, which can run Riva ASR for their supported languages. We will elaborate on this at the end of this section.

<a id='updated_reqs_and_setup_for_nonEngASR'></a>
#### Requirements and Setup for Multilingual ASR NIM:

The requirements and setup steps for the multilingual ASR NIM are almost the same as for the English ASR NIM. We first need to deploy the required ASR model as a NIM server. <br>

Follow the instructions [here](https://docs.nvidia.com/nim/riva/asr/latest/getting-started.html#launching-the-nim). Set `CONTAINER_ID` to `parakeet-1-1b-rnnt-multilingual` and `NIM_TAGS_SELECTOR` to `"mode=ofl"`.

#### Make a gRPC request to the Riva Speech API server
Let's make a gRPC request to the Riva Speech server for ASR with a sample Spanish `.wav` file in offline mode.  

Like before, start by loading the audio.

In [None]:
# This example uses a .wav file with LINEAR_PCM encoding.
# read in an audio file from local disk
path = "audio_samples/es-US_sample.wav"  # Path to the Spanish sample audio file
with io.open(path, 'rb') as fh:
    content = fh.read()
ipd.Audio(path)

As with English, create a `RecognitionConfig` object for the request, setting the configuration parameters as required. Notice that we have updated the ``language_code`` of the request configuration to the Spanish language code (``"es-US"``).

In [None]:
# Set up an offline/batch recognition request
config = riva.client.RecognitionConfig()
# config.encoding = riva.client.AudioEncoding.LINEAR_PCM  # Audio encoding can be detected from wav
# config.sample_rate_hertz = 0                    # Sample rate can be detected from wav and resampled if needed
config.language_code = "es-US"                    # Language code of the audio clip. Set to Spanish
config.max_alternatives = 1                       # How many top-N hypotheses to return
config.enable_automatic_punctuation = True        # Add punctuation when end of VAD detected
config.audio_channel_count = 1                    # Mono channel

Finally, submit the request to the server.

In [None]:
response = riva_asr.offline_recognize(content, config)
asr_best_transcript = response.results[0].alternatives[0].transcript
print("ASR Transcript:", asr_best_transcript)

print("\n\nFull Response Message:")
print(response)

We can similarly run Riva ASR for other languages by setting their corresponding language codes in the request configuration. <br>
We also support NIMs for other model architectures:
- [Canary](https://docs.nvidia.com/nim/riva/asr/latest/support-matrix.html#canary-1b-multilingual)
- [Parakeet-TDT](https://docs.nvidia.com/nim/riva/asr/latest/support-matrix.html#parakeet-0-6b-tdt-v2-english)
- [Canary-Turbo](https://docs.nvidia.com/nim/riva/asr/latest/support-matrix.html#canary-0-6b-turbo-multilingual)
- [Whisper](https://docs.nvidia.com/nim/riva/asr/latest/support-matrix.html#whisper-large-v3-multilingual)

## Go deeper into Riva capabilities


### Additional Riva Tutorials

Check out more Riva tutorials [here](https://github.com/nvidia-riva/tutorials) to learn how to use some of the advanced features of Riva ASR, including customizing ASR for your specific needs.


### Additional Resources

For more information about each of the Riva APIs and their functionalities, refer to the [documentation](https://docs.nvidia.com/nim/riva/asr/latest/protos.html).