# Deploying a Punctuation and Capitalization Model in Jarvis

[Transfer Learning Toolkit (TLT)](https://developer.nvidia.com/transfer-learning-toolkit) can export your model in a format that can be deployed using NVIDIA [Jarvis](https://developer.nvidia.com/nvidia-jarvis), a high-performance application framework for GPU-accelerated, multi-modal conversational AI services.

This tutorial walks through taking an .ejrvs model, the output of the `tlt punctuation_and_capitalization export` command, and using the Jarvis ServiceMaker framework to aggregate all the artifacts needed to deploy it to a target environment. Once the model is deployed in Jarvis, you can send inference requests to the server. As you will see, the whole process is quick and straightforward.

## Learning Objectives

In this notebook, you will learn how to:  
- Use Jarvis ServiceMaker to convert a TLT-exported .ejrvs model to .jmir
- Deploy the model(s) locally on the Jarvis server
- Send inference requests from a demo client using the Jarvis API bindings

## Prerequisites
Before going through this Jupyter notebook, make sure that:
- You have access to NVIDIA NGC and can download the Jarvis QuickStart [resources](https://ngc.nvidia.com/resources/ea-jarvis-stage:jarvis_quickstart/)
- You have an .ejrvs model file that you wish to deploy. You can obtain one from ``tlt <task> export`` (with ``export_format=JARVIS``).

<b>NOTE:</b> Please refer to the tutorial *Punctuation And Capitalization using Transfer Learning Toolkit* for more details on training and exporting an .ejrvs model for the punctuation and capitalization task.
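If you are not sure which file name to use for `MODEL_NAME` in the variables cell below, a small helper like the following lists the .ejrvs files in a directory. This is a sketch: `find_ejrvs_models` is a hypothetical name, not part of TLT or Jarvis, and it only looks at the top level of the directory without checking that a file is a valid, decryptable export.

```python
from pathlib import Path


def find_ejrvs_models(model_dir):
    """Return the sorted names of .ejrvs files directly inside model_dir.

    Hypothetical helper: it does not recurse into subdirectories and does
    not validate or decrypt the files it finds.
    """
    directory = Path(model_dir)
    if not directory.is_dir():
        raise FileNotFoundError(f"Model directory not found: {directory}")
    # Keep regular files only, so a directory named "x.ejrvs" is ignored.
    return sorted(p.name for p in directory.glob("*.ejrvs") if p.is_file())
```

After `MODEL_LOC` is set, `find_ejrvs_models(MODEL_LOC)` returns the candidate values for `MODEL_NAME`.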

## Jarvis ServiceMaker

ServiceMaker is the set of tools that aggregates all the necessary artifacts (models, files, configurations, and user settings) for Jarvis deployment to a target environment. It has two main components, described below:

### 1. Jarvis-build

This step builds a Jarvis-ready version of the model. Its only output is an intermediate format (called a JMIR) of an end-to-end pipeline for the services supported within Jarvis. In this tutorial, the input is the punctuation and capitalization model exported from TLT.

`jarvis-build` combines one or more exported models (.ejrvs files) into a single file in an intermediate format called Jarvis Model Intermediate Representation (.jmir). This file contains a deployment-agnostic specification of the whole end-to-end pipeline, along with all the assets required for the final deployment and inference. Please check out the [documentation](http://docs.jarvis-ai.nvidia.com/release-1-0/service-nlp.html) to find out more.

In [None]:
# IMPORTANT: Set the following variables

# ServiceMaker Docker
JARVIS_SM_CONTAINER = "<Jarvis_Servicemaker_Image>"

# Directory where the .ejrvs model is stored $MODEL_LOC/*.ejrvs
MODEL_LOC = "<path_to_model_directory>"

# Name of the .ejrvs file
MODEL_NAME = "<add model name>"

# Use the same key that .ejrvs model is encrypted with
KEY = "<add encryption key used for trained model>"

In [None]:
# Pull the ServiceMaker Image
!docker pull $JARVIS_SM_CONTAINER

In [None]:
# Syntax: jarvis-build <task-name> output-dir-for-jmir/model.jmir:key dir-for-ejrvs/model.ejrvs:key
!docker run --rm --gpus all -v $MODEL_LOC:/data $JARVIS_SM_CONTAINER -- \
    jarvis-build punctuation -f /data/punct-capit.jmir:$KEY /data/$MODEL_NAME:$KEY

`NOTE:` Above, `$MODEL_NAME` is the punctuation and capitalization model (.ejrvs) obtained from `tlt punctuation_and_capitalization export`, and `punct-capit.jmir` is the name given to the generated JMIR file.
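To make the `path:key` convention from the syntax comment explicit, the sketch below assembles the same `docker run ... jarvis-build` argument list in Python. The function name `jarvis_build_command` and its defaults are hypothetical, chosen to mirror the cell above. It accepts several .ejrvs names, since `jarvis-build` can combine one or more exported models into a single .jmir, and it refuses to proceed while any `<...>` placeholder is still unset.

```python
def jarvis_build_command(container, model_loc, ejrvs_names, key,
                         task="punctuation", jmir_name="punct-capit.jmir"):
    """Assemble the `docker run ... jarvis-build` argv used in the cell above.

    Hypothetical helper. Every artifact argument uses the `path:key` form,
    and the .ejrvs files are expected inside model_loc (mounted at /data).
    """
    if isinstance(ejrvs_names, str):
        ejrvs_names = [ejrvs_names]
    # Refuse to run while the notebook's "<...>" placeholders are still set.
    values = [container, model_loc, key, *ejrvs_names]
    unfilled = [v for v in values if v.startswith("<") and v.endswith(">")]
    if unfilled:
        raise ValueError(f"Unfilled placeholders: {unfilled}")
    cmd = ["docker", "run", "--rm", "--gpus", "all",
           "-v", f"{model_loc}:/data", container, "--",
           "jarvis-build", task, "-f", f"/data/{jmir_name}:{key}"]
    # One /data/<name>:<key> argument per exported model.
    cmd += [f"/data/{name}:{key}" for name in ejrvs_names]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` as an alternative to the `!docker run` cell.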

### 2. Jarvis-deploy

The deployment tool takes as input one or more Jarvis Model Intermediate Representation (JMIR) files and a target model repository directory. It creates an ensemble configuration specifying the pipeline for the execution and finally writes all those assets to the output model repository directory.
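Once the cell below has run, you may want to check what was written to `$MODEL_LOC/models`. The sketch below assumes, which this tutorial does not state, that the output is a Triton Inference Server-style model repository in which each model is a subdirectory containing a `config.pbtxt`. `list_deployed_models` is a hypothetical helper; adjust it if your layout differs.

```python
from pathlib import Path


def list_deployed_models(repo_dir):
    """Map each model subdirectory of repo_dir to whether it has a config.pbtxt.

    Assumes a Triton-style repository layout (an assumption, not something
    confirmed by this tutorial). Plain files at the top level are ignored.
    """
    repo = Path(repo_dir)
    if not repo.is_dir():
        raise FileNotFoundError(f"Model repository not found: {repo}")
    return {
        d.name: (d / "config.pbtxt").is_file()
        for d in sorted(repo.iterdir())
        if d.is_dir()
    }
```

For example, `list_deployed_models(f"{MODEL_LOC}/models")` after deployment; any entry mapped to `False` is worth a closer look.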

In [None]:
# Syntax: jarvis-deploy -f dir-for-jmir/model.jmir:key output-dir-for-repository
!docker run --rm --gpus all -v $MODEL_LOC:/data $JARVIS_SM_CONTAINER -- \
     jarvis-deploy -f /data/punct-capit.jmir:$KEY /data/models

## Start Jarvis Server

Once the model repository is generated, we are ready to start the Jarvis server. From this step onward, you need the Jarvis QuickStart resources downloaded from NGC.
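Before running the QuickStart scripts, you can confirm that `JARVIS_DIR` (set in the next cell) points at the extracted QuickStart directory. The file list below covers only the files this tutorial uses; `missing_quickstart_files` is a hypothetical helper, and the check confirms presence only, not contents.

```python
from pathlib import Path

# Files that the remaining steps of this tutorial use from the QuickStart directory.
REQUIRED_QUICKSTART_FILES = ("config.sh", "jarvis_init.sh", "jarvis_start.sh")


def missing_quickstart_files(jarvis_dir):
    """Return the required QuickStart files that are absent from jarvis_dir."""
    base = Path(jarvis_dir)
    return [name for name in REQUIRED_QUICKSTART_FILES
            if not (base / name).is_file()]
```

An empty list from `missing_quickstart_files(JARVIS_DIR)` means the expected files are present.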

In [None]:
# Set the path to the Jarvis QuickStart directory
JARVIS_DIR = "<path_to_jarvis_quickstart>"

Next, modify ``config.sh`` to enable the relevant Jarvis services (NLP for the punctuation and capitalization model), provide the encryption key, and set the path to the model repository (``jarvis_model_loc``) generated in the previous step, among other settings.

Pretrained versions of the models listed in `models_asr`, `models_nlp`, and `models_tts` are fetched from NGC. Since we are using our own custom model, we can comment out the pretrained punctuation entry in `models_nlp` (and any other entries that are not relevant to our use case).

### config.sh snippet
```
# Enable or Disable Jarvis Services 
service_enabled_asr=false                                                      ## MAKE CHANGES HERE
service_enabled_nlp=true                                                      ## MAKE CHANGES HERE
service_enabled_tts=false                                                     ## MAKE CHANGES HERE

# Specify one or more GPUs to use
# specifying more than one GPU is currently an experimental feature, and may result in undefined behaviours.
gpus_to_use="device=0"

# Specify the encryption key to use to deploy models
MODEL_DEPLOY_KEY="tlt_encode"                                                  ## MAKE CHANGES HERE

# Locations to use for storing models artifacts
#
# If an absolute path is specified, the data will be written to that location
# Otherwise, a docker volume will be used (default).
#
# jarvis_init.sh will create a `jmir` and `models` directory in the volume or
# path specified. 
#
# JMIR ($jarvis_model_loc/jmir)
# Jarvis uses an intermediate representation (JMIR) for models
# that are ready to deploy but not yet fully optimized for deployment. Pretrained
# versions can be obtained from NGC (by specifying NGC models below) and will be
# downloaded to $jarvis_model_loc/jmir by `jarvis_init.sh`
# 
# Custom models produced by NeMo or TLT and prepared using jarvis-build
# may also be copied manually to this location $(jarvis_model_loc/jmir).
#
# Models ($jarvis_model_loc/models)
# During the jarvis_init process, the JMIR files in $jarvis_model_loc/jmir
# are inspected and optimized for deployment. The optimized versions are
# stored in $jarvis_model_loc/models. The jarvis server exclusively uses these
# optimized versions.
jarvis_model_loc="<add path>"                              ## MAKE CHANGES HERE (Replace with MODEL_LOC)
```
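The lines marked `## MAKE CHANGES HERE` can also be edited from Python instead of by hand. The sketch below (the helper `update_config` is hypothetical) rewrites top-level `name=value` assignments in `config.sh` text, writing booleans as `true`/`false` and double-quoting everything else. It does not touch the `models_nlp` list, so commenting out the pretrained entries is still a manual step, and it drops any trailing comment on the lines it rewrites.

```python
import re


def update_config(text, settings):
    """Return config.sh text with top-level `name=value` lines rewritten.

    Booleans become true/false and other values are double-quoted.
    A name that does not start any line raises KeyError, so typos surface.
    """
    for name, value in settings.items():
        if isinstance(value, bool):
            rendered = "true" if value else "false"
        else:
            rendered = f'"{value}"'
        replacement = f"{name}={rendered}"
        # Match whole lines starting with `name=`; commented-out lines begin
        # with '#' and are therefore left alone.
        pattern = re.compile(rf"^{re.escape(name)}=.*$", re.MULTILINE)
        # A function replacement keeps backslashes in the value literal.
        text, count = pattern.subn(lambda _m, r=replacement: r, text)
        if count == 0:
            raise KeyError(f"{name} not found in config")
    return text


# Example use (assumes JARVIS_DIR, KEY and MODEL_LOC from earlier cells):
# from pathlib import Path
# cfg = Path(JARVIS_DIR) / "config.sh"
# cfg.write_text(update_config(cfg.read_text(), {
#     "service_enabled_asr": False,
#     "service_enabled_nlp": True,
#     "service_enabled_tts": False,
#     "MODEL_DEPLOY_KEY": KEY,
#     "jarvis_model_loc": MODEL_LOC,
# }))
```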

In [None]:
# Ensure you have permission to execute these scripts.
!cd $JARVIS_DIR && chmod +x ./jarvis_init.sh && chmod +x ./jarvis_start.sh

In [None]:
# Run Jarvis Init. This will fetch the containers/models
# YOU CAN SKIP THIS STEP IF YOU DID JARVIS DEPLOY
!cd $JARVIS_DIR && ./jarvis_init.sh config.sh

In [None]:
# Run Jarvis Start. This will deploy your model(s).
!cd $JARVIS_DIR && ./jarvis_start.sh config.sh

## Run Inference
Once the Jarvis server is up and running with your models, you can send inference requests to it.

To send gRPC requests, install the Jarvis Python API client bindings, which are provided as a pip .whl file with the QuickStart resources.
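Before sending requests, it can help to confirm that something is listening on the server's gRPC port (50051 in the inference call at the end of this notebook). The sketch below is a plain TCP check using only the standard library, not a Jarvis health API: `port_is_open` is a hypothetical helper, and a successful connection does not prove that the server has finished loading its models.

```python
import socket


def port_is_open(host="localhost", port=50051, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    This only shows that some process is listening on the port; it says
    nothing about whether the Jarvis models are loaded and ready.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
```

If `port_is_open()` returns `False` shortly after `jarvis_start.sh`, the server may still be starting up; wait and retry before debugging further.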

In [None]:
# IMPORTANT: Set the name of the whl file
JARVIS_API_WHL = "<add jarvis api .whl file name>"

In [None]:
# Install client API bindings
!cd $JARVIS_DIR && pip install $JARVIS_API_WHL

Run the following sample code from within the client Docker container:

In [None]:
import grpc
import jarvis_api.jarvis_nlp_core_pb2 as jcnlp
import jarvis_api.jarvis_nlp_core_pb2_grpc as jcnlp_srv

class BertPunctuatorClient(object):
    def __init__(self, grpc_server, model_name="jarvis_punctuation"):
        # Name of the deployed punctuation model to query on the server
        print("Using model: {}".format(model_name))
        self.model_name = model_name
        # Plaintext (non-TLS) gRPC channel, e.g. to "localhost:50051"
        self.channel = grpc.insecure_channel(grpc_server)
        self.jarvis_nlp = jcnlp_srv.JarvisCoreNLPStub(self.channel)

    def run(self, input_strings):
        if isinstance(input_strings, str):
            # user probably passed a single string instead of a list/iterable
            input_strings = [input_strings]

        # A single TextTransformRequest can carry several input strings
        request = jcnlp.TextTransformRequest()
        request.model.model_name = self.model_name
        for q in input_strings:
            request.text.append(q)
        response = self.jarvis_nlp.TransformText(request)

        # Return every transformed string, not only the first one
        return list(response.text)

def run_punct_capit(server, model, query):
    print("Client app to test punctuation and capitalization on Jarvis")
    client = BertPunctuatorClient(server, model_name=model)
    for result in client.run(query):
        print(result)

In [None]:
run_punct_capit(server="localhost:50051",
                model="jarvis_punctuation",
                query="how are you doing")

You can stop all Docker containers before shutting down the Jupyter kernel. Note that the command below stops every container on the host, not only the Jarvis ones.
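If other containers on the machine should keep running, a more targeted alternative is to stop only the containers whose names match a pattern. The sketch below assumes that the containers you want to stop have `jarvis` in their names (check with `docker ps` first). `matching_containers` and `stop_matching_containers` are hypothetical helpers built on the standard `docker ps --format '{{.Names}}'` and `docker stop` commands.

```python
import subprocess


def matching_containers(ps_output, substring="jarvis"):
    """Pick container names containing `substring` from `docker ps` output.

    Expects the output of `docker ps --format '{{.Names}}'`, one name per
    line; Docker container names never contain whitespace.
    """
    return [name for name in ps_output.split() if substring in name]


def stop_matching_containers(substring="jarvis"):
    """Stop running containers whose names contain `substring`; return them."""
    names = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}"],
        capture_output=True, text=True, check=True,
    ).stdout
    targets = matching_containers(names, substring)
    if targets:
        subprocess.run(["docker", "stop", *targets], check=True)
    return targets
```

Calling `stop_matching_containers()` prints nothing by itself; inspect its return value to see which containers were stopped.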

In [None]:
!docker stop $(docker ps -a -q)