<img src="http://developer.download.nvidia.com/notebooks/dlsw-notebooks/riva_asr_asr-python-advanced-finetune-am-citrinet-tao-finetuning/nvidia_logo.png" style="width: 90px; float: right;">

# How to customize a Riva ASR Acoustic Model (Conformer) with Adapters
This tutorial walks you through how to customize a Riva ASR acoustic model (Conformer CTC) with Adapter modules, using the NVIDIA NeMo toolkit.

## NVIDIA Riva Overview

NVIDIA Riva is a GPU-accelerated SDK for building speech AI applications that are customized for your use case and deliver real-time performance. <br/>
Riva offers a rich set of speech and natural language understanding services such as:

- Automated speech recognition (ASR)
- Text-to-Speech synthesis (TTS)
- A collection of natural language processing (NLP) services, such as named entity recognition (NER), punctuation, and intent classification.

In this tutorial, we will customize a Riva ASR acoustic model (Conformer) with Adapter modules, using the NeMo Toolkit. <br> 
To understand the basics of Riva ASR APIs, refer to [Getting started with Riva ASR in Python](https://github.com/nvidia-riva/tutorials/blob/stable/asr-python-basics.ipynb). <br>

For more information about Riva, refer to the [Riva developer documentation](https://developer.nvidia.com/riva).

## Neural Modules (NeMo) Toolkit
NVIDIA Neural Modules (NeMo) Toolkit is a Python-based AI toolkit for taking purpose-built pre-trained AI models and customizing them with your own data. Developers, researchers, and software partners building conversational AI applications and services can bring their own data to fine-tune pre-trained models instead of training the models from scratch.

# ASR with Adapters

This tutorial draws heavily on the [NeMo tutorial - ASR Domain Adaptation with Adapters](https://colab.research.google.com/github/NVIDIA/NeMo/blob/main/tutorials/asr/asr_adapters/ASR_with_Adapters.ipynb).

We recommend keeping both tutorials open side by side so that you can cross-reference their contents.

# What are Adapters?

Adapters are trainable neural network modules attached to a pretrained model: the weights of the original model are frozen and only the adapter parameters are trained. This substantially reduces the amount of data required to customize a model, with the limitation that the model's vocabulary cannot be changed.

In short,

- Adapter modules form a residual bridge over the output of each layer they adapt, so that the model's original performance is not lost.
- The original parameters of the model are frozen in their entirety, so we can recover the original model by disabling all adapters.
- We train only the new adapter parameters (a small fraction of the total number of parameters). This allows fast experimentation with very little data and compute.
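
To get a feel for how small that trainable fraction can be, here is a rough back-of-the-envelope sketch. It assumes each adapter is a pair of linear layers (with biases) that project the layer width down to a small bottleneck and back up. The layer width, bottleneck size, layer count, and base-model size are hypothetical round numbers chosen for illustration, not values read from any particular Riva or NeMo checkpoint.

```python
def adapter_param_count(d_model: int, bottleneck_dim: int) -> int:
    """Weights and biases of two linear layers: d_model -> bottleneck -> d_model."""
    down = d_model * bottleneck_dim + bottleneck_dim  # down-projection
    up = bottleneck_dim * d_model + d_model           # up-projection
    return down + up


# Hypothetical, illustrative sizes (assumptions, not measured from a real model)
D_MODEL = 512                    # encoder layer width
BOTTLENECK_DIM = 32              # adapter bottleneck width
NUM_ADAPTED_LAYERS = 18          # one adapter per encoder layer
BASE_MODEL_PARAMS = 120_000_000  # frozen parameters of the base model

per_adapter = adapter_param_count(D_MODEL, BOTTLENECK_DIM)
total_adapter_params = per_adapter * NUM_ADAPTED_LAYERS
trainable_fraction = total_adapter_params / (BASE_MODEL_PARAMS + total_adapter_params)

print(f"{total_adapter_params:,} trainable adapter parameters "
      f"({trainable_fraction:.2%} of the adapted model)")
```

Under these assumptions, well under 1% of the adapted model's parameters are trained; everything else stays frozen.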

-----

Adapters are a straightforward concept, as shown in the diagram below. At their simplest, they are residual feedforward layers that compress the input dimension ($D$) to a small bottleneck dimension ($H$), so that $\mathbb{R}^D \to \mathbb{R}^H$, apply an activation (such as ReLU), and finally map $\mathbb{R}^H \to \mathbb{R}^D$ with another feedforward layer. This output is then added to the input via a simple residual connection.
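
The description above can be sketched as a small PyTorch module. This is a minimal illustration, not NeMo's adapter implementation: the class name `BottleneckAdapter` and the toy sizes are hypothetical, and zero-initializing the up-projection is an assumption made here so that a freshly added adapter leaves its input unchanged.

```python
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    """Residual feedforward adapter: R^D -> R^H -> R^D, added back to the input."""

    def __init__(self, in_features: int, bottleneck_dim: int):
        super().__init__()
        self.down = nn.Linear(in_features, bottleneck_dim)  # R^D -> R^H
        self.activation = nn.ReLU()
        self.up = nn.Linear(bottleneck_dim, in_features)    # R^H -> R^D
        # Assumption: zero-init the up-projection so the adapter starts as an identity map.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection: the adapter output is added to its own input.
        return x + self.up(self.activation(self.down(x)))


torch.manual_seed(0)
adapter = BottleneckAdapter(in_features=16, bottleneck_dim=4)  # toy sizes: D=16, H=4
x = torch.randn(2, 5, 16)  # (batch, time, D)
y = adapter(x)

same_shape = y.shape == x.shape
starts_as_identity = torch.allclose(y, x)
num_params = sum(p.numel() for p in adapter.parameters())
print(same_shape, starts_as_identity, num_params)
```

Because the output has the same shape as the input, such a module can sit after an existing layer without changing the rest of the network, and skipping it returns the layer's original output.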

<div align="center">
  <img src="https://mermaid.ink/img/pako:eNptkLFqwzAQhl9F3ORAPDSjA4EUx6RgXEjbycpwWOdG1JaMfEoakrx7ZcfpUKrlxH_fz4d0gcoqggTqxp6qAzoW76k0Ipx1-WI6z3sRxyuRF1GOZ3KisK6d3YG8GFdZ9hRJeLbMDRmqvkRGpDLrTuiUiEWUigBtlyIVqzBnEqZ66I39dcX6iKytKXeUf-wn-286QoFeBMvmu0PTD-EfyXaQpP9JFmP_1XN4S3kfD8W4ue6o18pjc52gYQlzaMm1qFX4msuQSOADtSQhCdfaOupZgjS3QPpOIdNGabYOkhqbnuaAnu3b2VSQsPP0gFKNnw7bibr9AJkZdXU" height=100% />
</div>

-----

# Advantages and Limitations of Adapter Training

Adapters can be trained with limited amounts of data and a small compute budget, but they impose the restriction that the model's original vocabulary must be used. A new character set, vocabulary, or tokenizer therefore cannot be used.

Please refer to the **Advantages of Adapters** and **Limitations of Adapters** sections in the NeMo tutorial for further details.