# Deploying a Named Entity Recognition Model in Jarvis

[Transfer Learning Toolkit](https://developer.nvidia.com/transfer-learning-toolkit) (TLT) can export your model in a format that can be deployed using [NVIDIA Jarvis](https://developer.nvidia.com/nvidia-jarvis), a high-performance application framework for multi-modal conversational AI services on GPUs.

This tutorial takes an .ejrvs model, the output of the `tlt token_classification` export command, and uses the Jarvis ServiceMaker framework to aggregate all the artifacts needed to deploy it with Jarvis in a target environment. Once the model is deployed in Jarvis, you can issue inference requests to the server. We will demonstrate how quick and straightforward this whole process is.

---
## Learning Objectives
In this notebook, you will learn how to:  
- Use Jarvis ServiceMaker to convert a TLT-exported .ejrvs file to .jmir
- Deploy the model(s) locally on the Jarvis server
- Send inference requests from a demo client using the Jarvis API bindings

---
## Pre-requisites

To follow along, please make sure that you have:
- Access to NVIDIA NGC and the ability to download the Jarvis Quickstart [resources](https://ngc.nvidia.com/resources/ea-jarvis-stage:jarvis_quickstart/)
- An .ejrvs model file that you wish to deploy. You can obtain this from `tlt <task> export` (with `export_format=JARVIS`). Please refer to the tutorial on *Named entity recognition using Transfer Learning Toolkit* for more details on training and exporting an .ejrvs model.

---
## Jarvis ServiceMaker
ServiceMaker is the set of tools that aggregates all the necessary artifacts (models, files, configurations, and user settings) for Jarvis deployment to a target environment. It has two main components, described below:

### 1. Jarvis-build

This step builds a Jarvis-ready version of the model. Its only output is an intermediate format (called a JMIR) of an end-to-end pipeline for the supported services within Jarvis. In this tutorial we use a BERT-based NER model.

`jarvis-build` combines one or more exported models (.ejrvs files) into a single file in an intermediate format called Jarvis Model Intermediate Representation (.jmir). This file contains a deployment-agnostic specification of the whole end-to-end pipeline along with all the assets required for the final deployment and inference. Please check out the [documentation](http://docs.jarvis-ai.nvidia.com/release-1-0/service-nlp.html#pipeline-configuration) to find out more.
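
The ServiceMaker commands in this notebook pass each artifact as `path:key`. The sketch below shows how those arguments fit together for `jarvis-build`, following the syntax comment in the build cell. The helper names `artifact_spec` and `jarvis_build_args` are hypothetical and not part of ServiceMaker, and the file name and key are placeholders.

```python
def artifact_spec(path, key):
    """Join an artifact path and its encryption key as `path:key`."""
    return f"{path}:{key}"


def jarvis_build_args(task, jmir_path, ejrvs_path, key):
    """Return the argument list for `jarvis-build <task> out.jmir:key in.ejrvs:key`."""
    return [
        "jarvis-build",
        task,
        artifact_spec(jmir_path, key),   # output JMIR file
        artifact_spec(ejrvs_path, key),  # input exported model
    ]


# Placeholder values for illustration only.
args = jarvis_build_args(
    "token_classification",
    "/data/token-classification.jmir",
    "/data/my_model.ejrvs",
    "my_key",
)
print(" ".join(args))
```

Both paths are given as seen from inside the container, which is why they start with the `/data` mount point rather than the host directory.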

In [None]:
# IMPORTANT: UPDATE THESE PATHS 

# ServiceMaker Docker
JARVIS_SM_CONTAINER = "<add container name>"

# Directory where the .ejrvs model is stored $MODEL_LOC/*.ejrvs
MODEL_LOC = "<add path to model location>"

# Name of the .ejrvs file
MODEL_NAME = "<add model name>"

# Key the model was encrypted with when it was exported from TLT
KEY = "<add encryption key used for trained model>"

In [None]:
# Get the ServiceMaker docker
!docker pull $JARVIS_SM_CONTAINER

In [None]:
# Syntax: jarvis-build <task-name> output-dir-for-jmir/model.jmir:key dir-for-ejrvs/model.ejrvs:key
!docker run --rm --gpus 0 -v $MODEL_LOC:/data $JARVIS_SM_CONTAINER -- \
        jarvis-build token_classification /data/token-classification.jmir:$KEY /data/$MODEL_NAME:$KEY

### 2. Jarvis-deploy

The deployment tool takes as input one or more Jarvis Model Intermediate Representation (JMIR) files and a target model repository directory. It creates an ensemble configuration specifying the pipeline for the execution and finally writes all those assets to the output model repository directory.

In [None]:
# Syntax: jarvis-deploy -f dir-for-jmir/model.jmir:key output-dir-for-repository
! docker run --rm --gpus 0 -v $MODEL_LOC:/data $JARVIS_SM_CONTAINER -- \
            jarvis-deploy -f  /data/token-classification.jmir:$KEY /data/models/
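
After `jarvis-deploy` finishes, it can be useful to confirm that the output directory was actually populated. The following is a small sketch that lists whatever subdirectories exist under the repository path. `list_model_dirs` is a hypothetical helper, `/path/to/model/location` stands in for `MODEL_LOC`, and the exact layout of the repository is not something this tutorial specifies.

```python
from pathlib import Path


def list_model_dirs(repo_path):
    """Return the sorted names of the subdirectories in a model repository."""
    repo = Path(repo_path)
    if not repo.is_dir():
        # Missing directory: nothing has been deployed to this path yet.
        return []
    return sorted(p.name for p in repo.iterdir() if p.is_dir())


# "/path/to/model/location" is a placeholder for MODEL_LOC.
print(list_model_dirs(Path("/path/to/model/location") / "models"))
```

An empty list suggests that the deploy step wrote nothing to that path, for example because the volume mount or the output directory differs from the one being inspected.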

---
## Start Jarvis Server
Once the model repository is generated, we are ready to start the Jarvis server. From this step onwards, you need the Jarvis QuickStart resource downloaded from NGC.
Set the path to its directory here:

In [None]:
# Set the Jarvis QuickStart directory
JARVIS_DIR = "<Path to the uncompressed folder downloaded from quickstart(include the folder name)>"

Next, we modify config.sh to enable the relevant Jarvis services (nlp for token classification) and to provide the encryption key and the path to the model repository (`jarvis_model_loc`) generated in the previous step, among other settings.

For instance, if the model repository above was generated at `$MODEL_LOC/models`, then set `jarvis_model_loc` to the same directory as `MODEL_LOC`. <br>

Pretrained versions of the models specified in models_asr/nlp/tts are fetched from NGC. Since we are using our custom model, we can comment out the entries in models_asr (and any others that are not relevant to your use case). <br>

`NOTE:` Please edit config.sh outside this notebook.

#### config.sh snippet
```
# Enable or Disable Jarvis Services 
service_enabled_asr=false                                                      ## MAKE CHANGES HERE
service_enabled_nlp=true                                                      ## MAKE CHANGES HERE
service_enabled_tts=false                                                     ## MAKE CHANGES HERE

# Specify one or more GPUs to use
# specifying more than one GPU is currently an experimental feature, and may result in undefined behaviours.
gpus_to_use="device=0"

# Specify the encryption key to use to deploy models
MODEL_DEPLOY_KEY="tlt_encode"                                                  ## MAKE CHANGES HERE

# Locations to use for storing models artifacts
#
# If an absolute path is specified, the data will be written to that location
# Otherwise, a docker volume will be used (default).
#
# jarvis_init.sh will create a `jmir` and `models` directory in the volume or
# path specified. 
#
# JMIR ($jarvis_model_loc/jmir)
# Jarvis uses an intermediate representation (JMIR) for models
# that are ready to deploy but not yet fully optimized for deployment. Pretrained
# versions can be obtained from NGC (by specifying NGC models below) and will be
# downloaded to $jarvis_model_loc/jmir by `jarvis_init.sh`
# 
# Custom models produced by NeMo or TLT and prepared using jarvis-build
# may also be copied manually to this location ($jarvis_model_loc/jmir).
#
# Models ($jarvis_model_loc/models)
# During the jarvis_init process, the JMIR files in $jarvis_model_loc/jmir
# are inspected and optimized for deployment. The optimized versions are
# stored in $jarvis_model_loc/models. The jarvis server exclusively uses these
# optimized versions.
jarvis_model_loc="<add path>"                              ## MAKE CHANGES HERE (Replace with MODEL_LOC)                      
```
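
Because config.sh is edited outside the notebook, a quick check of its values before starting the server can catch mistakes. The sketch below is not a shell parser: it assumes simple one-line `name=value` assignments like those in the snippet above and would mishandle a `#` inside a quoted value. `parse_shell_assignments` is a hypothetical helper, and `/path/to/models` is a placeholder.

```python
import re


def parse_shell_assignments(text):
    """Collect simple `name=value` assignments, ignoring comments and blank lines."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = re.match(r"^([A-Za-z_][A-Za-z0-9_]*)=(.*)$", line)
        if not match:
            continue
        name, value = match.groups()
        # Drop a trailing inline comment, then any surrounding quotes.
        value = value.split("#", 1)[0].strip().strip('"').strip("'")
        settings[name] = value
    return settings


sample = """
service_enabled_asr=false
service_enabled_nlp=true      ## MAKE CHANGES HERE
service_enabled_tts=false
MODEL_DEPLOY_KEY="tlt_encode"
jarvis_model_loc="/path/to/models"
"""
config = parse_shell_assignments(sample)
if config.get("service_enabled_nlp") != "true":
    print("Warning: the nlp service is not enabled")
print(config["jarvis_model_loc"])
```

To check the real file, the same function could be applied to the text of config.sh in the QuickStart directory instead of the `sample` string.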

In [None]:
# Ensure you have permission to execute these scripts
! cd $JARVIS_DIR && chmod +x ./jarvis_init.sh && chmod +x ./jarvis_start.sh

In [None]:
# Run Jarvis init. This fetches the containers and models.
# You can skip this step if you already ran jarvis-deploy above.
! cd $JARVIS_DIR && ./jarvis_init.sh config.sh

In [None]:
# Run Jarvis Start. This will deploy your model(s).
! cd $JARVIS_DIR && ./jarvis_start.sh config.sh

---
## Run Inference
Once the Jarvis server is up and running with your models, you can send inference requests to it.

To send gRPC requests, you can install the Jarvis Python API client bindings. These are available as a pip .whl file in the QuickStart.

Otherwise, you can use the jarvis-client Docker container, which comes with all the inference dependencies pre-installed.
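
Before sending requests, it may help to confirm that something is listening on the server port. The sketch below uses only the Python standard library. `port_is_open` is a hypothetical helper, and `localhost:50051` is the default endpoint of the client script in this tutorial. A successful TCP connection only shows that a process accepts connections on that port, not that the model has finished loading.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolved hosts.
        return False


# 50051 is the default port used by the client script in this tutorial.
print(port_is_open("localhost", 50051))
```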

In [None]:
# Install client API bindings
! cd $JARVIS_DIR && pip install <add .whl file>

### Connect to Jarvis server and run inference
Now we can query the Jarvis server. The following cell writes a client script that queries the Jarvis server over gRPC and prints the result.

In [None]:
%%writefile $JARVIS_DIR/ner_client.py

import grpc
import os
import argparse
import jarvis_api.jarvis_nlp_pb2 as jnlp
import jarvis_api.jarvis_nlp_pb2_grpc as jnlp_srv

# Use the NER network to return the top-1 class for each entity
def postprocess_labels_server(tokens_response):
    results = []
    # One entry per query in the response
    for query_result in tokens_response.results:
        slots = []
        slot_scores = []
        tokens = []
        for entity in query_result.results:
            tokens.append(entity.token)
            # label[0] holds the top-1 class for this token
            slots.append(entity.label[0].class_name)
            slot_scores.append(entity.label[0].score)
        results.append((slots, tokens, slot_scores))

    return results

def run_ner(grpc_server, query):
    channel = grpc.insecure_channel(grpc_server)
    jarvis_nlp = jnlp_srv.JarvisNLPStub(channel)
    req = jnlp.AnalyzeEntitiesRequest()
    req.query = query
    resp = jarvis_nlp.AnalyzeEntities(req)
    print("Query:", query)
    print(postprocess_labels_server(resp))

def get_args():
    parser = argparse.ArgumentParser(description="Client app to test named entity recognition on Jarvis")
    parser.add_argument("--server", default="localhost:50051", type=str, help="URI to GRPC server endpoint")
    parser.add_argument("--query", default="NVIDIA is located at Santa Clara", type=str, help="Input Query")
    return parser.parse_args()

def run_ner_client():
    args = get_args()
    run_ner(args.server, query=args.query)

if __name__ == '__main__':
    run_ner_client()

If you've installed the Jarvis Python API client bindings from the pip .whl file, you can run the script saved above with the following command:

In [None]:
#! python3 $JARVIS_DIR/ner_client.py --query "NVIDIA is located at Santa Clara" --server localhost:50051

Otherwise, to use the Jarvis client Docker container, run the `docker run` command below with the correct `JARVIS_CLIENT_CONTAINER` name.

In [None]:
JARVIS_CLIENT_CONTAINER = "<add container name>"

In [None]:
!docker run -d --privileged \
    -v $JARVIS_DIR:/jarvis \
    --net=host --rm \
    --name jarvis-client \
    $JARVIS_CLIENT_CONTAINER sleep 1h

Finally, run the client Python script in the client container started above:

In [None]:
!docker exec jarvis-client python3 /jarvis/ner_client.py --query "NVIDIA is located at Santa Clara" --server localhost:50051
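
The script prints a list of `(slots, tokens, slot_scores)` tuples, one per query. For easier reading, each tuple can be turned into one record per token. This is a minimal sketch that assumes the tuple shape produced by `postprocess_labels_server`. `pair_tokens_with_labels` is a hypothetical helper, and the labels and scores in the example are made up, not real model output.

```python
def pair_tokens_with_labels(result):
    """Turn one (slots, tokens, slot_scores) tuple into a list of per-token dicts."""
    slots, tokens, slot_scores = result
    return [
        {"token": token, "label": slot, "score": score}
        for slot, token, score in zip(slots, tokens, slot_scores)
    ]


# Made-up values in the shape returned by postprocess_labels_server.
example = (["ORG", "LOC"], ["nvidia", "santa clara"], [0.9, 0.8])
for row in pair_tokens_with_labels(example):
    print(row)
```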

Make sure to stop the jarvis-client Docker container before shutting down the Jupyter kernel. In addition, stop the Jarvis services containers with the `jarvis_stop.sh` script in the QuickStart.

In [None]:
! docker stop jarvis-client
! cd $JARVIS_DIR && ./jarvis_stop.sh config.sh