<img src="http://developer.download.nvidia.com/notebooks/dlsw-notebooks/riva_asr_asr-python-advanced-wordboosting/nvidia_logo.png" style="width: 90px; float: right;">

# How do I boost specific words at runtime with word boosting?

In this tutorial, we will customize Riva ASR to improve recognition of specific words at runtime with word boosting. <br> 


To understand the basics of Riva ASR APIs, refer to [Getting started with Riva ASR in Python](https://github.com/nvidia-riva/tutorials/tree/stable/asr-python-basics.ipynb). <br>

For more information about Riva, refer to the [Riva developer documentation](https://developer.nvidia.com/riva).

---
## Prerequisites

This notebook assumes that you have a Riva ASR pipeline already deployed. 


In [None]:
# Check if your Riva Speech Server is running
!docker ps

You should see a running container with the image `nvcr.io/nvidia/riva/riva-speech:*`. If you do not, go back to the previous notebook on deploying a speech recognition pipeline and run it first.

---
## Word boosting with Riva ASR APIs

Word boosting is one of the customizations Riva offers. It allows you to bias the ASR engine to recognize particular words of interest at request time by giving them a higher score when decoding the output of the acoustic model.  

Now, let's use word boosting with Riva APIs for some sample audio clips with the out-of-the-box (OOTB) English pipeline.
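
To build some intuition for what "a higher score when decoding" means, here is a minimal toy sketch. It is not Riva's decoder: the function `pick_best`, the candidate strings, and all scores are made up for illustration, and the only assumption is that a boost is added to the score of any hypothesis containing the boosted word.

```python
# Toy illustration only: this is NOT how Riva's decoder is implemented.
# Hypotheses and scores below are invented for the example.

def pick_best(hypotheses, boosts):
    """Return the text of the hypothesis with the highest boosted score."""
    def boosted_score(item):
        text, base_score = item
        # Add the boost of every boosted word that appears in the hypothesis.
        bonus = sum(boosts.get(word, 0.0) for word in text.split())
        return base_score + bonus

    return max(hypotheses, key=boosted_score)[0]


hypotheses = [
    ("anti berta", -4.0),  # more likely without any bias
    ("AntiBERTa", -9.0),   # domain term, less likely by default
]

print(pick_best(hypotheses, {}))                   # anti berta
print(pick_best(hypotheses, {"AntiBERTa": 20.0}))  # AntiBERTa
```

With no boosts, the candidate with the better base score wins. Once the domain term receives a boost, its total score overtakes the alternative.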

#### Import the Riva client libraries

Import some of the required libraries, including the Riva client libraries.

In [None]:
import io
import IPython.display as ipd
import grpc

import riva.client

#### Create a Riva client and connect to the Riva Speech API server

The following URI assumes that the Riva Speech API server is deployed locally on the default port. If the server is deployed on a different host, or through a Helm chart on Kubernetes, use the appropriate URI instead.

In [None]:
auth = riva.client.Auth(uri='localhost:50051')

riva_asr = riva.client.ASRService(auth)

#### ASR inference without word boosting
First, let's run ASR on our sample audio clip without word boosting.

In [None]:
# Load a sample audio file from local disk
# This example uses a .wav file with LINEAR_PCM encoding.
path = "audio_samples/en-US_wordboosting_sample1.wav"
with io.open(path, 'rb') as fh:
    content = fh.read()
ipd.Audio(path)

In [None]:
# Creating RecognitionConfig
config = riva.client.RecognitionConfig(
  language_code="en-US",
  max_alternatives=1,
  enable_automatic_punctuation=True,
  audio_channel_count = 1
)

# ASR Inference call with Recognize 
response = riva_asr.offline_recognize(content, config)
asr_best_transcript = response.results[0].alternatives[0].transcript
print("ASR Transcript without Word Boosting:", asr_best_transcript)

As you can see, ASR has a hard time recognizing domain-specific terms such as `AntiBERTa` and `ABlooper`. <br>

Let's use word boosting to improve ASR for these domain-specific terms.

#### ASR inference with word boosting

The cell below shows how to add the boosted words to `RecognitionConfig` with `SpeechContext`. For more information about `SpeechContext`, refer to the [documentation](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/reference/protos/protos.html#_CPPv413SpeechContext). The simplest way to add word boosting is to use the function [riva.client.add_word_boosting_to_config()](https://github.com/nvidia-riva/python-clients/blob/928c63273176a939500e01ce176c463f1606a1ff/riva_api/asr.py#L78).

In [None]:
# Adding word boosting to the config
boosted_lm_words = ["AntiBERTa", "ABlooper"]
boosted_lm_score = 20.0
riva.client.add_word_boosting_to_config(config, boosted_lm_words, boosted_lm_score)

# ASR Inference call with Recognize 
response = riva_asr.offline_recognize(content, config)
asr_best_transcript = response.results[0].alternatives[0].transcript
print("ASR Transcript with Word Boosting:", asr_best_transcript)

As you can see, with word boosting, ASR correctly transcribes the domain-specific terms `AntiBERTa` and `ABlooper`!

_Boost Score_: The recommended range for the boost score is 20 to 100. The higher the boost score, the more strongly the ASR engine is biased toward the word.  
Note that out-of-vocabulary words can be boosted in exactly the same way as in-vocabulary words, as described above.

#### Boosting different words at different levels
With Riva ASR, we can also have different boost values for different words. For example, here _AntiBERTa_ is boosted by 20 and _ABlooper_ is boosted by 40:

In [None]:
# Load a sample audio file from local disk
# This example uses a .wav file with LINEAR_PCM encoding.
path = "audio_samples/en-US_wordboosting_sample1.wav"
with io.open(path, 'rb') as fh:
    content = fh.read()
ipd.Audio(path)

In [None]:
# Creating RecognitionConfig
config = riva.client.RecognitionConfig(
  language_code="en-US",
  max_alternatives=1,
  enable_automatic_punctuation=True,
  audio_channel_count = 1
)
riva.client.add_word_boosting_to_config(config, ["AntiBERTa"], 20.)
riva.client.add_word_boosting_to_config(config, ["ABlooper"], 40.)

# ASR Inference call with Recognize 
response = riva_asr.offline_recognize(content, config)
asr_best_transcript = response.results[0].alternatives[0].transcript
print("ASR Transcript with Word Boosting:", asr_best_transcript)
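
When many words need different boost values, it can help to keep them in one mapping and group words that share a score. The sketch below is a hypothetical convenience helper (`group_by_boost` is not part of the Riva client). It runs without a server and only prepares the arguments for `riva.client.add_word_boosting_to_config()`.

```python
from collections import defaultdict


def group_by_boost(word_scores):
    """Group words that share a boost score: {score: [word, ...]}."""
    groups = defaultdict(list)
    for word, score in word_scores.items():
        groups[float(score)].append(word)
    return dict(groups)


# Hypothetical mapping of word -> boost score, using the values from this section.
word_scores = {"AntiBERTa": 20.0, "ABlooper": 40.0}
groups = group_by_boost(word_scores)
print(groups)  # {20.0: ['AntiBERTa'], 40.0: ['ABlooper']}

# With a Riva `config` in scope (as in the cell above), each group maps to one call:
# for score, words in groups.items():
#     riva.client.add_word_boosting_to_config(config, words, score)
```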

#### Negative word boosting for undesired words
You can also use word boosting to discourage the prediction of certain words by giving them negative boost scores.  

First, load a sample audio file and transcribe it without any word boosting.

In [None]:
# Load a sample audio file from local disk
# This example uses a .wav file with LINEAR_PCM encoding.
path = "audio_samples/en-US_wordboosting_sample2.wav"
with io.open(path, 'rb') as fh:
    content = fh.read()
ipd.Audio(path)

In [None]:
# Creating RecognitionConfig
config = riva.client.RecognitionConfig(
  language_code="en-US",
  max_alternatives=1,
  enable_automatic_punctuation=True,
  audio_channel_count = 1
)

# ASR Inference call with Recognize 
response = riva_asr.offline_recognize(content, config)
asr_best_transcript = response.results[0].alternatives[0].transcript
print("ASR Transcript without Word Boosting:", asr_best_transcript)

Let's try to transcribe the same audio with negative word boosting for the word `little`.

In [None]:
# Creating RecognitionConfig
config = riva.client.RecognitionConfig(
  language_code="en-US",
  max_alternatives=1,
  enable_automatic_punctuation=True,
  audio_channel_count = 1
)
riva.client.add_word_boosting_to_config(config, ["little"], -100.)

# ASR Inference call with Recognize 
response = riva_asr.offline_recognize(content, config)
asr_best_transcript = response.results[0].alternatives[0].transcript
print("ASR Transcript with Negative Word Boosting:", asr_best_transcript)

As you can see, the word `little` was not predicted this time. In this case (though not necessarily always), no other word was predicted in its place.

Next, let's combine negative and positive word boosting so that the ASR engine predicts `middle` instead of `little`.

We reuse the `config` instance created in the negative word boosting example above, so it already contains the `SpeechContext` for `little`. All that remains is to add a second `SpeechContext` that positively boosts `middle`, just as in the earlier examples.
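
To picture how boosts accumulate on a single config, here is a minimal sketch that uses stand-in dataclasses rather than the real Riva classes. `SpeechContextSketch`, `ConfigSketch`, and `add_boost` are hypothetical names, and the field names `phrases`, `boost`, and `speech_contexts` are assumed to mirror the Riva protos. Check the `SpeechContext` documentation before relying on them.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SpeechContextSketch:
    """Stand-in for a speech context: a list of phrases and one boost score."""
    phrases: List[str]
    boost: float


@dataclass
class ConfigSketch:
    """Stand-in for a recognition config that holds a list of contexts."""
    speech_contexts: List[SpeechContextSketch] = field(default_factory=list)


def add_boost(config, words, score):
    """Append one context per call, so earlier boosts are kept."""
    config.speech_contexts.append(SpeechContextSketch(list(words), score))


config_sketch = ConfigSketch()
add_boost(config_sketch, ["little"], -100.0)  # negative boost from the earlier step
add_boost(config_sketch, ["middle"], 20.0)    # positive boost added on top

for ctx in config_sketch.speech_contexts:
    print(ctx.phrases, ctx.boost)
# ['little'] -100.0
# ['middle'] 20.0
```

Because each call appends rather than replaces, reusing a config keeps every boost added so far. Create a fresh config when you want to start without any boosts.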

In [None]:
# Adding word boosting in the config
positive_boosted_lm_word = "middle"
positive_boosted_lm_score = 20.0
riva.client.add_word_boosting_to_config(config, [positive_boosted_lm_word], positive_boosted_lm_score)

# ASR Inference call with Recognize 
response = riva_asr.offline_recognize(content, config)
asr_best_transcript = response.results[0].alternatives[0].transcript
print("ASR Transcript with Negative and Positive Word Boosting:", asr_best_transcript)

The results indicate that we were able to generate the transcript with `middle` instead of `little`, by combining negative and positive word boosting.
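
When comparing transcripts before and after boosting, a word-level diff makes the changed words easy to spot. The sketch below uses Python's standard `difflib`. The helper `changed_words` is hypothetical, and the two strings are invented placeholders, not the actual transcripts of the sample audio.

```python
import difflib


def changed_words(before, after):
    """Return (old, new) pairs for each word span that differs between transcripts."""
    a, b = before.split(), after.split()
    matcher = difflib.SequenceMatcher(a=a, b=b)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes


# Placeholder strings for illustration only.
print(changed_words("a little test", "a middle test"))  # [('little', 'middle')]
print(changed_words("a little test", "a test"))         # [('little', '')]
```

In the notebook, you could pass the transcripts returned by two `offline_recognize` calls instead of the placeholder strings.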

**A few things to note:**

- There is no limit to the number of words that can be boosted. Latency should be unaffected even with ~100 boosted words, except on the first request, where some added latency is expected.
- Boosting phrases or combinations of words is not yet fully supported, although it does work. We will revisit finalizing this support in an upcoming release.
- By default, no words are boosted on the server side. Only words passed by the client are boosted.
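
If you want to check the latency note on your own deployment, a simple timing loop is enough. The sketch below is a hypothetical helper (`time_calls` is not part of the Riva client) and uses a stand-in workload so that it runs without a server.

```python
import time


def time_calls(fn, n=5):
    """Call fn() n times and return each call's wall-clock duration in seconds."""
    durations = []
    for _ in range(n):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


# Stand-in workload. With a running server you could instead pass:
#   lambda: riva_asr.offline_recognize(content, config)
durations = time_calls(lambda: sum(range(10_000)))

rest = durations[1:]
print(f"first request: {durations[0]:.6f}s")
print(f"later requests (mean): {sum(rest) / len(rest):.6f}s")
```

Comparing the first duration with the mean of the rest separates the one-time cost on the first request from steady-state latency.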

Detailed information about word boosting can be found in the documentation [here](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/asr/asr-customizing.html#word-boosting). 