<img src="http://developer.download.nvidia.com/compute/machine-learning/frameworks/nvidia_logo.png" style="width: 90px; float: right;">

# How do I boost specific words at runtime with word boosting?

This tutorial walks you through word boosting, one of the advanced features for customizing Riva Speech Skills ASR services at runtime.

## NVIDIA Riva Overview

NVIDIA Riva is a GPU-accelerated SDK for building Speech AI applications that are customized for your use case and deliver real-time performance. <br/>
Riva offers a rich set of speech and natural language understanding services such as:

- Automated speech recognition (ASR)
- Text-to-Speech synthesis (TTS)
- A collection of natural language processing (NLP) services, such as named entity recognition (NER), punctuation, and intent classification.

In this tutorial, we will customize Riva ASR to boost specific words at runtime with word boosting. <br> 
To understand the basics of Riva ASR APIs, refer to [Getting started with Riva ASR in Python](https://github.com/nvidia-riva/tutorials/blob/dev/22.04/asr-python-basics.ipynb). <br>

For more information about Riva, refer to the [Riva developer documentation](https://developer.nvidia.com/riva).

## Word boosting with Riva ASR APIs

Word boosting is one of the customizations Riva offers. It allows you to bias the ASR engine to recognize particular words of interest at request time by giving them a higher score when decoding the output of the acoustic model.  
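To build intuition for how a boost changes the result, here is a toy sketch in plain Python. It is not Riva's decoder. It rescores a fixed list of hypothetical candidate transcripts with made-up base scores, adding each boosted word's boost once per occurrence. A real decoder applies the bias while it searches, but the effect on which hypothesis wins is conceptually similar.

```python
# Toy illustration of word boosting: NOT Riva's actual decoder.
# Each candidate transcript has a hypothetical base score (higher is better);
# every occurrence of a boosted word adds that word's boost to the score.

def rescore(candidates, boosts):
    """Return (transcript, score) pairs sorted best-first after boosting.

    candidates: dict mapping transcript -> base score
    boosts: dict mapping word -> boost (may be negative)
    """
    rescored = []
    for text, base in candidates.items():
        bonus = sum(boosts.get(word, 0.0) for word in text.split())
        rescored.append((text, base + bonus))
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Hypothetical scores: without boosting, the split-word hypothesis wins.
candidates = {
    "Anti Berta and Aber": -10.0,
    "AntiBERTa and ABlooper": -25.0,
}

print(rescore(candidates, {})[0][0])  # Anti Berta and Aber
print(rescore(candidates, {"AntiBERTa": 20.0, "ABlooper": 20.0})[0][0])  # AntiBERTa and ABlooper
```

A negative value in `boosts` works the same way in this toy model: it lowers the score of any hypothesis containing that word.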

Now, let's use word boosting with Riva APIs for some sample audio clips with an OOTB (out-of-the-box) English pipeline.

#### Requirements and setup

1. Start the Riva Speech Skills server.  
Follow the instructions in the [Riva Skills Quick Start Guide](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/quick-start-guide.html#) to deploy OOTB ASR models on the Riva Speech Skills server before running this tutorial. By default, only the English models are deployed.


2. Install the Riva Client library.  
Follow the steps in the [Requirements and setup for the Riva Client](https://github.com/nvidia-riva/tutorials#riva-client) to install the Riva Client library.


3. Install the additional Python libraries to run this tutorial.  
Run the following commands to install the libraries:

In [1]:
# librosa is a Python package for music and audio analysis. We use it here to load and play the sample audio clips.
!pip install librosa==0.9.1
# libpq-dev and libsndfile-dev are needed for librosa
!apt-get update && apt-get upgrade -y && apt-get install -y apt-utils gcc libpq-dev libsndfile-dev

# Riva exposes gRPC APIs. We need grpcio and grpcio-tools installed for making gRPC requests.
!pip install grpcio==1.44.0
!pip install grpcio-tools==1.44.0


#### Import the Riva client libraries

Let's import some of the required libraries, including the Riva Client libraries.

In [2]:
import io
import librosa
import IPython.display as ipd
import grpc

import riva_api.riva_asr_pb2 as rasr
import riva_api.riva_asr_pb2_grpc as rasr_srv
import riva_api.riva_audio_pb2 as ra

#### Create a Riva client and connect to the Riva Speech API server

The following URI assumes that the Riva Speech API server is deployed locally on the default port (50051). If the server is deployed on a different host, or via a Helm chart on Kubernetes, use the appropriate URI.

In [3]:
channel = grpc.insecure_channel('localhost:50051')

riva_asr = rasr_srv.RivaSpeechRecognitionStub(channel)
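If you run this notebook against different deployments, one option is to read the server address from an environment variable instead of editing the cell. The sketch below assumes a hypothetical `RIVA_URI` variable (not something Riva itself defines) and falls back to `localhost:50051`; `resolve_riva_uri` is likewise a hypothetical helper.

```python
import os

# Hypothetical convention: the RIVA_URI environment variable overrides the default.
DEFAULT_RIVA_URI = "localhost:50051"


def resolve_riva_uri(env=None):
    """Return the Riva server address from the environment, or the local default."""
    env = os.environ if env is None else env
    uri = env.get("RIVA_URI", "").strip()
    return uri or DEFAULT_RIVA_URI


# Pass explicit dicts so the demo does not depend on your shell environment.
print(resolve_riva_uri({}))                                        # localhost:50051
print(resolve_riva_uri({"RIVA_URI": "riva.example.internal:50051"}))  # riva.example.internal:50051
```

You would then create the channel with `grpc.insecure_channel(resolve_riva_uri())`.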

#### ASR inference without word boosting
First, let's run ASR on our sample audio clip without word boosting.

In [4]:
# This example uses a .wav file with LINEAR_PCM encoding.
# read in an audio file from local disk
path = "audio_samples/en-US_wordboosting_sample.wav"
audio, sr = librosa.core.load(path, sr=None)
with io.open(path, 'rb') as fh:
    content = fh.read()
ipd.Audio(path)
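The `RecognitionConfig` in the next cell must describe the audio accurately, in particular `sample_rate_hertz` and `audio_channel_count`. As an alternative to `librosa`, the following sketch reads those values straight from the WAV header with Python's standard `wave` module. `wav_params` is a hypothetical helper, and the demo builds a tiny silent WAV in memory rather than reading the tutorial's sample file.

```python
import io
import wave


def wav_params(data):
    """Return (sample_rate_hz, channels, sample_width_bytes) from WAV file bytes."""
    with wave.open(io.BytesIO(data), "rb") as wf:
        return wf.getframerate(), wf.getnchannels(), wf.getsampwidth()


# Build a tiny in-memory WAV (0.1 s of 16-bit mono silence at 16 kHz) for the demo.
buf = io.BytesIO()
with wave.open(buf, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)  # 16-bit samples
    wf.setframerate(16000)
    wf.writeframes(b"\x00\x00" * 1600)

print(wav_params(buf.getvalue()))  # (16000, 1, 2)
```

In the notebook, `wav_params(content)` would give values to pass as `sample_rate_hertz` and `audio_channel_count`. The `wave` module only handles uncompressed PCM WAV files and raises `wave.Error` for formats it cannot parse.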

In [5]:
# Creating RecognitionConfig
config = rasr.RecognitionConfig(
  encoding=ra.AudioEncoding.LINEAR_PCM,
  sample_rate_hertz=sr,
  language_code="en-US",
  max_alternatives=1,
  enable_automatic_punctuation=True,
    audio_channel_count = 1
)

# Creating RecognizeRequest
req = rasr.RecognizeRequest(audio = content, config = config)

# ASR Inference call with Recognize 
response = riva_asr.Recognize(req)
asr_best_transcript = response.results[0].alternatives[0].transcript
print("ASR Transcript without Word Boosting:", asr_best_transcript)

ASR Transcript without Word Boosting: Anti, Berta and Aber, both transformer based language models are examples of the emerging work in using graph networks to design protein sequences for particular target antigens. 


As you can see, ASR has a hard time recognizing domain-specific terms like _AntiBERTa_ and _ABlooper_. <br>

Now, let's use word boosting to try to improve ASR for these domain specific terms.

#### ASR inference with word boosting

Let's look at how to add the boosted words to `RecognitionConfig` with `SpeechContext`. For more information about `SpeechContext`, refer to the [`SpeechContext` documentation](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/reference/protos/protos.html#_CPPv413SpeechContext).

In [6]:
# Creating SpeechContext for word boosting
boosted_lm_words = ["AntiBERTa", "ABlooper"]
boosted_lm_score = 20.0
speech_context = rasr.SpeechContext()
speech_context.phrases.extend(boosted_lm_words)
speech_context.boost = boosted_lm_score

# Update RecognitionConfig with SpeechContext
config.speech_contexts.append(speech_context)

# Creating RecognizeRequest
req = rasr.RecognizeRequest(audio=content, config=config)

# ASR Inference call with Recognize 
response = riva_asr.Recognize(req)
asr_best_transcript = response.results[0].alternatives[0].transcript
print("ASR Transcript with Word Boosting:", asr_best_transcript)

ASR Transcript with Word Boosting: AntiBERTa and ABlooper, both transformer based language models are examples of the emerging work in using graph networks to design protein sequences for particular target antigens. 


As you can see, with word boosting, ASR correctly transcribes the domain-specific terms _AntiBERTa_ and _ABlooper_.

_Boost Score_: The recommended range for the boost score is 20 to 100. The higher the boost score, the more strongly the ASR engine is biased toward the boosted words.  
_OOV Word Boosting_: Out-of-vocabulary (OOV) words can be boosted in exactly the same way as in-vocabulary words, as described above.
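If you build boost values programmatically, a small guard can flag values outside the recommended range. This is a sketch: `check_boost` is a hypothetical helper, and it deliberately skips negative boosts, which are used for a different purpose (discouraging words).

```python
import warnings

# Recommended range for positive boost scores, per the note above.
RECOMMENDED_MIN, RECOMMENDED_MAX = 20.0, 100.0


def check_boost(word, boost):
    """Warn when a positive boost falls outside the recommended range.

    Negative boosts are passed through unchecked. Returns the boost unchanged.
    """
    if boost > 0 and not (RECOMMENDED_MIN <= boost <= RECOMMENDED_MAX):
        warnings.warn(
            f"boost {boost} for {word!r} is outside the recommended "
            f"{RECOMMENDED_MIN:g}-{RECOMMENDED_MAX:g} range"
        )
    return boost


print(check_boost("AntiBERTa", 20.0))  # 20.0, no warning
check_boost("ABlooper", 500.0)         # emits a UserWarning
```

Because it returns the boost unchanged, it can wrap the assignment directly, for example `speech_context.boost = check_boost("AntiBERTa", 20.0)`.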

#### Boosting different words at different levels
With Riva ASR, we can also assign different boost values to different words. For example, here _AntiBERTa_ is boosted by 20 and _ABlooper_ is boosted by 40:

In [7]:
# Creating RecognitionConfig
config = rasr.RecognitionConfig(
    encoding=ra.AudioEncoding.LINEAR_PCM,
    sample_rate_hertz=sr,
    language_code="en-US",
    max_alternatives=1,
    enable_automatic_punctuation=True,
    audio_channel_count=1,
)

# Creating SpeechContext for word boosting AntiBERTa
speech_context1 = rasr.SpeechContext()
speech_context1.phrases.append("AntiBERTa")
speech_context1.boost = 20.0

# Creating SpeechContext for word boosting ABlooper
speech_context2 = rasr.SpeechContext()
speech_context2.phrases.append("ABlooper")
speech_context2.boost = 40.0

# Update RecognitionConfig with both SpeechContexts (each appended exactly once)
config.speech_contexts.append(speech_context1)
config.speech_contexts.append(speech_context2)

# Creating RecognizeRequest
req = rasr.RecognizeRequest(audio=content, config=config)

# ASR Inference call with Recognize 
response = riva_asr.Recognize(req)
asr_best_transcript = response.results[0].alternatives[0].transcript
print("ASR Transcript with Word Boosting:", asr_best_transcript)

ASR Transcript with Word Boosting: AntiBERTa and ABlooper, both transformer based language models are examples of the emerging work in using graph networks to design protein sequences for particular target antigens. 
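When you have many words with assorted boost values, it can be convenient to keep them in one `{word: boost}` mapping and derive one `SpeechContext` per distinct boost value. The sketch below does only the grouping step, in plain Python; `group_by_boost` is a hypothetical helper, not part of the Riva client.

```python
from collections import defaultdict


def group_by_boost(word_boosts):
    """Group a {word: boost} mapping into sorted (boost, [words]) pairs.

    Each pair corresponds to one SpeechContext: the words become its phrases
    and the shared value becomes its boost.
    """
    groups = defaultdict(list)
    for word, boost in word_boosts.items():
        groups[float(boost)].append(word)
    return sorted((boost, sorted(words)) for boost, words in groups.items())


word_boosts = {"AntiBERTa": 20.0, "ABlooper": 40.0, "antigens": -100.0}
for boost, words in group_by_boost(word_boosts):
    print(boost, words)
# -100.0 ['antigens']
# 20.0 ['AntiBERTa']
# 40.0 ['ABlooper']
```

Each resulting pair can then be turned into a `rasr.SpeechContext` by extending `phrases` with the words and setting `boost`, just as in the cell above, before appending it to `config.speech_contexts`.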


#### Negative word boosting for undesired words
We can also use word boosting to discourage the prediction of certain words by giving them negative boost scores.

In [8]:
# Creating RecognitionConfig
config = rasr.RecognitionConfig(
    encoding=ra.AudioEncoding.LINEAR_PCM,
    sample_rate_hertz=sr,
    language_code="en-US",
    max_alternatives=1,
    enable_automatic_punctuation=True,
    audio_channel_count=1,
)

# Creating SpeechContext for Word Boosting
negative_boosted_lm_word = "antigens"
negative_boosted_lm_score = -100.0
speech_context = rasr.SpeechContext()
speech_context.phrases.append(negative_boosted_lm_word)
speech_context.boost = negative_boosted_lm_score

# Update RecognitionConfig with SpeechContext
config.speech_contexts.append(speech_context)

# Creating RecognizeRequest
req = rasr.RecognizeRequest(audio=content, config=config)

# ASR Inference call with Recognize 
response = riva_asr.Recognize(req)
asr_best_transcript = response.results[0].alternatives[0].transcript
print("ASR Transcript with Negative Word Boosting:", asr_best_transcript)

ASR Transcript with Negative Word Boosting: Anti, Berta and Aber, both transformer based language models are examples of the emerging work in using graph networks to design protein sequences for particular target antigen. 


By providing a negative boost score for _antigens_, we made Riva ASR transcribe _antigen_ instead of _antigens_.
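Because the change here is a single letter, it is easy to miss by eye. A word-level diff makes the effect of a boost explicit. This sketch uses the standard library's `difflib.SequenceMatcher` on whitespace-split words; `changed_words` is a hypothetical helper, and the two transcripts are copied from the outputs above.

```python
import difflib


def changed_words(before, after):
    """Return (old_words, new_words) pairs for each word-level edit between two transcripts."""
    a, b = before.split(), after.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (" ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Transcripts from the outputs above: without boosting, and with negative boosting.
baseline = ("Anti, Berta and Aber, both transformer based language models are examples "
            "of the emerging work in using graph networks to design protein sequences "
            "for particular target antigens.")
negative = ("Anti, Berta and Aber, both transformer based language models are examples "
            "of the emerging work in using graph networks to design protein sequences "
            "for particular target antigen.")

print(changed_words(baseline, negative))  # [('antigens.', 'antigen.')]
```

Since each cell overwrites `asr_best_transcript`, you would need to save the earlier transcript under another name before rerunning if you want to compare the two in the notebook.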

Note:

- There is no limit to the number of words that can be boosted. Boosting should not noticeably increase latency, even with about 100 boosted words, except on the first request, where some extra latency is expected.
- Boosting phrases or combinations of words is not yet fully supported (although it does work). We plan to revisit and finalize this support in an upcoming release.
- By default, no words are boosted on the server side. Only words passed by the client are boosted.
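To check the latency behavior against your own deployment, you can time repeated calls and look at the first call separately from the rest. This sketch uses `time.perf_counter`; `time_calls` and `summarize` are hypothetical helpers, and the demo times a trivial stand-in function instead of a real `Recognize` request so it runs without a server.

```python
import statistics
import time


def time_calls(fn, n=5):
    """Call fn() n times and return each call's wall-clock latency in seconds."""
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        fn()
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies):
    """Report the first call separately and the median latency of the remaining calls."""
    first, rest = latencies[0], latencies[1:]
    return {"first_s": first, "median_rest_s": statistics.median(rest) if rest else None}


# Stand-in workload so the sketch runs without a Riva server.
print(summarize(time_calls(lambda: sum(range(10_000)), n=5)))
```

Against a live server, you could pass `lambda: riva_asr.Recognize(req)` and compare requests built with different numbers of boosted words.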

For more information about word boosting, refer to the [word boosting section of the Riva documentation](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/asr/asr-customizing.html#word-boosting).

## Go deeper into Riva capabilities

Now that you have a basic introduction to the Riva ASR APIs, you can try:

### Additional Riva tutorials

Check out more Riva ASR (and TTS) tutorials [here](https://github.com/nvidia-riva/tutorials) to learn how to use some of the advanced features of Riva ASR, including customizing ASR for your specific needs.


### Sample applications

Riva comes with various sample applications. They demonstrate how to use the APIs to build applications such as a [chatbot](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/samples/weather.html), domain-specific speech recognition, a [keyword (entity) recognition system](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/samples/callcenter.html), or simply how Riva scales out to handle massive numbers of simultaneous requests ([SpeechSquad](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/samples/speechsquad.html)).  
Refer to the *Sample Application* section in the [Riva developer documentation](https://developer.nvidia.com/) for more information.


### Riva Text-to-Speech (TTS)

Riva's TTS offering comes with two OOTB voices that can be used in streaming or batch inference modes. They can be easily deployed using the [Riva Quick Start scripts](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/quick-start-guide.html). Follow [this link](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/tts/tts-overview.html) to understand Riva's TTS capabilities. Explore how to use Riva TTS APIs with the OOTB voices with [this Riva TTS tutorial](https://github.com/nvidia-riva/tutorials/blob/dev/22.04/tts-python-basics.ipynb).


### Additional resources

For more information about each of the APIs and their functionalities, refer to the [documentation](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/protobuf-api/protobuf-api-root.html).