<img src="http://developer.download.nvidia.com/compute/machine-learning/frameworks/nvidia_logo.png" style="width: 90px; float: right;">

# German ASR Pipeline Deployment

This notebook walks through the steps to deploy a German ASR pipeline into production.

**Important note:** Run this notebook from the host OS, where it can access the `docker` command. The NVIDIA GPU driver, Docker, and nvidia-docker must be pre-installed. For the NVIDIA GPU driver, see the instructions at https://docs.nvidia.com/datacenter/tesla/tesla-installation-notes/index.html. For nvidia-docker, see https://docs.nvidia.com/datacenter/cloud-native/kubernetes/install-k8s.html#step-1-install-a-container-engine.
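
Before going further, it can help to confirm that the required command-line tools are visible from the notebook kernel. The following is a minimal sketch: the helper name `find_missing_tools` is hypothetical, and using `nvidia-smi` as a proxy for an installed GPU driver is an assumption. It only checks that the executables are on `PATH`, not that the GPU runtime is configured correctly for Docker.

```python
import shutil


def find_missing_tools(tools=("docker", "nvidia-smi")):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = find_missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All required tools were found on PATH.")
```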

## Model checklist
This tutorial assumes that you have the following models ready:

- An acoustic model
- A language model (optional)
- An inverse text normalization model (optional)
- A punctuation and capitalization model (optional)

## Prerequisites

- Make sure you have access to [NGC](https://ngc.nvidia.com) to download models if you wish to use pre-trained models. Set up the [NGC CLI tool](https://docs.ngc.nvidia.com/cli) with your NGC API key.

- Download Riva quickstart scripts to a local directory `<RIVA_QUICKSTART_DIR>`.

- Prepare a local folder `<RIVA_MODEL_DIR>` to hold the raw Riva models that you download or bring yourself.

- Prepare a local folder `<RIVA_REPO_DIR>` for Riva optimized and deployed models.


In [None]:
# Download Riva quickstart
RIVA_VERSION = "2.1.0"

!ngc registry resource download-version nvidia/riva/riva_quickstart:$RIVA_VERSION

In [None]:
import os

CURRENT_DIR = os.getcwd()

# Note: replace this directory with the actual Riva quickstart folder
RIVA_QUICKSTART_DIR = CURRENT_DIR + f'/riva_quickstart_v{RIVA_VERSION}'

RIVA_MODEL_DIR = CURRENT_DIR + '/riva_model_dir'
RIVA_REPO_DIR = CURRENT_DIR + '/riva_repo_dir'

# -p avoids an error if the directories already exist (e.g. when re-running the cell)
!mkdir -p $RIVA_MODEL_DIR
!mkdir -p $RIVA_REPO_DIR

print("Riva model dir: ", RIVA_MODEL_DIR)
print("Riva repo dir: ",RIVA_REPO_DIR)

The next step is to point `riva_model_loc` to the local directory `RIVA_REPO_DIR` prepared in the previous step. By default, `riva_model_loc` points to a Docker volume.

To do this, open the Riva config file `config.sh` in the `RIVA_QUICKSTART_DIR`, find the line with `riva_model_loc` and point it to the absolute path of the `RIVA_REPO_DIR` directory, as printed out in the previous step.
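
If you prefer to make this edit programmatically rather than in a text editor, the sketch below shows one possible approach. The helper `set_riva_model_loc` is hypothetical and not part of the Riva quickstart. It assumes that `config.sh` contains the setting as a single uncommented shell assignment of the form `riva_model_loc=...`; check the result with the `grep` cell that follows.

```python
import re
from pathlib import Path


def set_riva_model_loc(config_path, model_dir):
    """Rewrite the `riva_model_loc=...` assignment in a shell config file.

    Returns the number of assignments rewritten (0 if none was found).
    """
    path = Path(config_path)
    text = path.read_text()
    # Match an uncommented assignment at the start of a line.
    pattern = re.compile(r"^([ \t]*riva_model_loc=).*$", flags=re.MULTILINE)
    new_text, count = pattern.subn(
        lambda match: f'{match.group(1)}"{model_dir}"', text
    )
    if count:
        path.write_text(new_text)
    return count


# Example usage (paths as defined earlier in this notebook):
# set_riva_model_loc(RIVA_QUICKSTART_DIR + "/config.sh", RIVA_REPO_DIR)
```

A return value of 0 means that no matching line was found and the file was left unchanged, in which case the line should be edited by hand.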


In [None]:
!head -n 100 $RIVA_QUICKSTART_DIR/config.sh |grep riva_model_loc

## Bringing models
### BYO models
If bringing your own models, refer to the [training](./training) section of this guide for details on how to train your own custom models. Put these models into `RIVA_MODEL_DIR`.

### Download Pre-trained models

Alternatively, you can deploy pre-trained models. All Riva German assets are published on [NGC](https://ngc.nvidia.com) (including `.nemo`, `.riva`, `.tlt` and `.rmir` assets). You can use these models as starting points for your development or for deployment as-is.
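
The build commands later in this notebook refer to the downloaded assets by folder name, so it is useful to know where each download lands. Judging from the paths used in this notebook, the NGC CLI places a model `<org>/<team>/<name>:<version>` in a folder called `<name>_v<version>`. The helper below (a hypothetical name, `ngc_download_dir`) is a small sketch of that naming pattern; it is inferred from this notebook rather than taken from the NGC documentation.

```python
def ngc_download_dir(model_id):
    """Map '<org>/<team>/<name>:<version>' to the folder name '<name>_v<version>'."""
    path, _, version = model_id.partition(":")
    name = path.rsplit("/", 1)[-1]
    return f"{name}_v{version}"


for model_id in (
    "nvidia/nemo/stt_de_citrinet_1024:1.5.0",
    "nvidia/tao/inverse_normalization_de_de:deployable_v1.0",
    "nvidia/tao/speechtotext_de_de_lm:deployable_v2.0",
    "nvidia/tao/punctuationcapitalization_de_de_bert_base:deployable_v1.0",
):
    print(model_id, "->", ngc_download_dir(model_id))
```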

#### Acoustic models

In [None]:
!cd $RIVA_MODEL_DIR && ngc registry model download-version "nvidia/nemo/stt_de_citrinet_1024:1.5.0"

#### Inverse text normalization models

In [None]:
!cd $RIVA_MODEL_DIR && ngc registry model download-version "nvidia/tao/inverse_normalization_de_de:deployable_v1.0"

#### Language model

In [None]:
!cd $RIVA_MODEL_DIR && ngc registry model download-version "nvidia/tao/speechtotext_de_de_lm:deployable_v2.0"

#### Punctuation and capitalization model

In [None]:
!cd $RIVA_MODEL_DIR &&  ngc registry model download-version "nvidia/tao/punctuationcapitalization_de_de_bert_base:deployable_v1.0"

## Preparing Models 

### NeMo to Riva conversion

First, we prepare a small script for NeMo model conversion to Riva. This script first installs the `nemo2riva` tool which is distributed with the Riva quickstart.

In [None]:
!ls $RIVA_QUICKSTART_DIR | grep nemo2riva

In the script below, replace `pip3 install nemo2riva-2.1.0-py3-none-any.whl` with the actual `nemo2riva` wheel file name listed in the step above.
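
To avoid copying the wheel name by hand, the file can also be located from Python. This is a minimal sketch with a hypothetical helper, `find_nemo2riva_wheel`; it assumes that the wheel sits directly in the quickstart folder and that its name starts with `nemo2riva-`, as in the example above.

```python
from pathlib import Path


def find_nemo2riva_wheel(quickstart_dir):
    """Return the file name of the nemo2riva wheel in `quickstart_dir`, or None."""
    wheels = sorted(Path(quickstart_dir).glob("nemo2riva-*.whl"))
    # If several wheels are present, take the last one in lexicographic order.
    return wheels[-1].name if wheels else None


# Example usage (RIVA_QUICKSTART_DIR as defined earlier in this notebook):
# print(find_nemo2riva_wheel(RIVA_QUICKSTART_DIR))
```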

In [None]:
%%writefile nemo_conversion.sh
cd /riva_quickstart
pip3 install nvidia-pyindex
pip3 install nemo2riva-2.1.0-py3-none-any.whl

# Convert the acoustic model from NeMo (.nemo) to Riva (.riva) format.
nemo2riva --out /models/stt_de_citrinet_1024_v1.5.0/stt_de_citrinet_1024.riva /models/stt_de_citrinet_1024_v1.5.0/stt_de_citrinet_1024.nemo --max-dim=100000


In [None]:
!mv nemo_conversion.sh  $RIVA_QUICKSTART_DIR
!chmod -R 777 $RIVA_QUICKSTART_DIR
!chmod -R 777 $RIVA_MODEL_DIR
!chmod -R 777 $RIVA_REPO_DIR

In [None]:
!docker run --gpus=all --rm -v $RIVA_MODEL_DIR/:/models -v $RIVA_QUICKSTART_DIR:/riva_quickstart nvcr.io/nvidia/nemo:22.01 -- /riva_quickstart/nemo_conversion.sh

### Making service

The ServiceMaker container is responsible for preparing models for deployment.

#### Build and deploy an offline ASR pipeline
The ASR pipeline, which includes the acoustic model, the language model, and the inverse text normalization model, is built as follows:

In [None]:
!docker run --gpus all --rm \
     -v $RIVA_MODEL_DIR:/servicemaker-dev \
     -v $RIVA_REPO_DIR:/data \
     nvcr.io/nvidia/riva/riva-speech:$RIVA_VERSION-servicemaker \
     -- \
     riva-build speech_recognition -f \
     /servicemaker-dev/citrinet-1024-de-DE-asr-offline.rmir /servicemaker-dev/stt_de_citrinet_1024_v1.5.0/stt_de_citrinet_1024.riva \
     --offline \
     --name=citrinet-1024-de-DE-asr-offline \
     --ms_per_timestep=80 \
     --featurizer.use_utterance_norm_params=False \
     --featurizer.precalc_norm_time_steps=0 \
     --featurizer.precalc_norm_params=False \
     --chunk_size=900 \
     --left_padding_size=0. \
     --right_padding_size=0. \
     --decoder_type=flashlight \
     --decoding_language_model_binary=/servicemaker-dev/speechtotext_de_de_lm_vdeployable_v2.0/riva_de_asr_set_2.0_4gram.binary \
     --decoding_vocab=/servicemaker-dev/speechtotext_de_de_lm_vdeployable_v2.0/dict_vocab.txt \
     --flashlight_decoder.lm_weight=0.2 \
     --flashlight_decoder.word_insertion_score=0.2 \
     --flashlight_decoder.beam_threshold=20. \
     --wfst_tokenizer_model=/servicemaker-dev/inverse_normalization_de_de_vdeployable_v1.0/tokenize_and_classify.far \
     --wfst_verbalizer_model=/servicemaker-dev/inverse_normalization_de_de_vdeployable_v1.0/verbalize.far \
     --language_code=de-DE 

The `riva-build` command takes in an acoustic model in `.riva` format, the inverse text normalization models in `.far` format, and an n-gram binary language model file.
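
Because the build can fail if any of these inputs is missing, a quick existence check on the host can save a container start. The sketch below uses a hypothetical helper, `missing_inputs`; the relative paths are copied from the `riva-build` command above and are relative to `RIVA_MODEL_DIR`, which is mounted as `/servicemaker-dev` in the container.

```python
from pathlib import Path

# Input files of the riva-build command, relative to RIVA_MODEL_DIR.
ASR_BUILD_INPUTS = [
    "stt_de_citrinet_1024_v1.5.0/stt_de_citrinet_1024.riva",
    "speechtotext_de_de_lm_vdeployable_v2.0/riva_de_asr_set_2.0_4gram.binary",
    "speechtotext_de_de_lm_vdeployable_v2.0/dict_vocab.txt",
    "inverse_normalization_de_de_vdeployable_v1.0/tokenize_and_classify.far",
    "inverse_normalization_de_de_vdeployable_v1.0/verbalize.far",
]


def missing_inputs(model_dir, relative_paths=ASR_BUILD_INPUTS):
    """Return the relative paths that do not exist as files under `model_dir`."""
    root = Path(model_dir)
    return [rel for rel in relative_paths if not (root / rel).is_file()]


# Example usage (RIVA_MODEL_DIR as defined earlier in this notebook):
# print(missing_inputs(RIVA_MODEL_DIR) or "All build inputs are present.")
```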

Note: See the Riva documentation for the build commands for a streaming ASR service.


Once the build process succeeds, we can deploy the ASR pipeline.

In [None]:
!docker run --gpus all --rm \
     -v $RIVA_MODEL_DIR:/servicemaker-dev \
     -v $RIVA_REPO_DIR:/data \
     nvcr.io/nvidia/riva/riva-speech:$RIVA_VERSION-servicemaker \
     -- \
     riva-deploy -f /servicemaker-dev/citrinet-1024-de-DE-asr-offline.rmir /data/models

### Build and deploy a punctuation and capitalization model

When doing ASR, the Riva server will look for a punctuator model that matches the language in the ASR request config.
The punctuator model can be built and deployed with:

In [None]:
!docker run --gpus all --rm \
     -v $RIVA_MODEL_DIR:/servicemaker-dev \
     -v $RIVA_REPO_DIR:/data \
     nvcr.io/nvidia/riva/riva-speech:$RIVA_VERSION-servicemaker \
     -- \
     riva-build punctuation -f \
     /servicemaker-dev/de_punctuation_1_0.rmir  \
     /servicemaker-dev/punctuationcapitalization_de_de_bert_base_vdeployable_v1.0/de_punctuation_1_0.riva --language_code=de-DE

In [None]:
!docker run --gpus all --rm \
     -v $RIVA_MODEL_DIR:/servicemaker-dev \
     -v $RIVA_REPO_DIR:/data \
     nvcr.io/nvidia/riva/riva-speech:$RIVA_VERSION-servicemaker \
     -- \
     riva-deploy -f /servicemaker-dev/de_punctuation_1_0.rmir /data/models 

## Start Riva server

That concludes building and deploying the Riva German ASR service. You can now start the Riva server.

In [None]:
!bash $RIVA_QUICKSTART_DIR/riva_start.sh
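
After the start script returns, you may want to confirm from Python that the server is accepting connections. The following is a minimal sketch with a hypothetical helper, `wait_for_port`. It assumes that the Riva gRPC endpoint listens on `localhost:50051`; adjust the port if your `config.sh` sets a different one. An open port only shows that something is listening, not that every model has finished loading.

```python
import socket
import time


def wait_for_port(host="localhost", port=50051, timeout_s=120.0, interval_s=2.0):
    """Poll until a TCP connection to (host, port) succeeds or the timeout expires.

    Returns True if a connection was established, False otherwise.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)


# Example usage:
# print("Riva port reachable:", wait_for_port())
```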