# Punctuation and Capitalization Model in Riva - Exercise

[Transfer Learning Toolkit (TLT)](https://developer.nvidia.com/transfer-learning-toolkit) provides the capability to export your model in a format that can be deployed using NVIDIA [Riva](https://developer.nvidia.com/riva), a highly performant application framework for multi-modal conversational AI services using GPUs.

This tutorial takes a .riva model, the output of the `tlt punctuation_and_capitalization export` command, and uses the Riva ServiceMaker framework to aggregate all the artifacts needed to deploy it with Riva in a target environment. Once the model is deployed in Riva, you can issue inference requests to the server. We will demonstrate how quick and straightforward this whole process is.

## Learning Objectives

In this notebook, you will learn how to:  
- Use Riva ServiceMaker to take a TLT-exported .riva file and convert it to .rmir
- Deploy the model(s) locally on the Riva Server
- Send inference requests from a demo client using Riva API bindings.

## Prerequisites
Before going through the Jupyter notebook, please make sure that:
- You have access to NVIDIA NGC and are able to download the Riva Quickstart [resources](https://ngc.nvidia.com/catalog/resources/nvidia:riva:riva_quickstart)
- You have a .riva model file that you wish to deploy. You can obtain this from ``tlt <task> export`` (with ``export_format=RIVA``).
- You have followed the instructions in the setup notebook to set up, deploy, and run the Riva Service

<b>NOTE:</b> Please refer to the tutorial on *Punctuation And Capitalization using Transfer Learning Toolkit* for more details on training and exporting a .riva model for the punctuation and capitalization task.

###  Riva-deploy

The deployment tool takes as input one or more Riva Model Intermediate Representation (RMIR) files and a target model repository directory. It creates an ensemble configuration specifying the pipeline for the execution and finally writes all those assets to the output model repository directory.

In [None]:
# Syntax: riva-deploy -f dir-for-rmir/model.rmir:key output-dir-for-repository
!docker run --rm --gpus all -v $MODEL_LOC:/data $RIVA_SM_CONTAINER -- \
     riva-deploy -f /data/punct-capit.rmir:$KEY /data/models
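
Once `riva-deploy` finishes, the output repository should typically contain one subdirectory per generated model. The sketch below is one way to list them from the notebook. `list_model_dirs` is a hypothetical helper, not part of Riva, and the path in the commented example assumes that `MODEL_LOC` is the host directory mounted as `/data` in the command above.

```python
import os

def list_model_dirs(repo_path):
    """Return the sorted names of the subdirectories in a model repository."""
    if not os.path.isdir(repo_path):
        # The repository has not been created (or the path is wrong)
        return []
    return sorted(
        name for name in os.listdir(repo_path)
        if os.path.isdir(os.path.join(repo_path, name))
    )

# Example (assumption: MODEL_LOC is the host directory mounted as /data above):
# print(list_model_dirs(os.path.join(MODEL_LOC, "models")))
```

An empty list suggests that the deploy step did not write to the expected location, for example because a different host directory was mounted.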

## Start Riva Server

Once the model repository is generated, we are ready to start the Riva server. The remaining steps require a running Riva server, which you can start by following the steps shown in the setup notebook.

In [None]:
### Set the path to Riva directory
RIVA_DIR = "<path_to_riva_quickstart>"

Note: you can modify ``config.sh`` (as shown in the setup notebook) to enable relevant Riva services (nlp for Punctuation & Capitalization model), provide the encryption key, and path to the model repository (``riva_model_loc``) generated in the previous step among other configurations.

Pretrained versions of the models specified in ``models_asr``, ``models_nlp``, and ``models_tts`` are fetched from NGC. Since we are using our custom model, we can comment out the corresponding entry in ``models_nlp`` (and any others that are not relevant to our use case).
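
Before moving on to inference, it can help to confirm that something is accepting connections on the server port. The sketch below uses only the Python standard library. `port_is_open` is a hypothetical helper, and `localhost:50051` is the address used by the client examples in this notebook. An open port only shows that a process is listening; it does not prove that the models have finished loading.

```python
import socket

def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable
        return False

# 50051 is the port used by the client examples in this notebook
print("Riva port reachable:", port_is_open("localhost", 50051))
```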

## Run Inference
Once the Riva server is up and running with your models, you can send inference requests querying the server. 

To send gRPC requests, you can install the Riva Python API bindings for the client. They are available as a pip .whl file with the QuickStart.

In [None]:
# IMPORTANT: Set the name of the whl file
RIVA_API_WHL = "<add riva api .whl file name>"

In [None]:
# Install client API bindings
!cd $RIVA_DIR && pip install $RIVA_API_WHL
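
After the install, a quick check that the bindings are visible to the current kernel can save a confusing `ImportError` later. This is a minimal sketch using the standard library. `module_available` is a hypothetical helper, and `riva_api` is the package name imported by the client code in this notebook.

```python
import importlib
import importlib.util

def module_available(name):
    """Return True if a top-level module with this name can be found."""
    # Refresh the import system's caches so a package installed
    # after the kernel started can be discovered
    importlib.invalidate_caches()
    return importlib.util.find_spec(name) is not None

print("riva_api installed:", module_available("riva_api"))
```

If this prints `False`, check that `RIVA_API_WHL` points at the correct file and that `pip` installed into the same environment as the notebook kernel.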

Run the following sample code from within the client docker container:

In [None]:
import grpc
import argparse
import os
import riva_api.riva_nlp_pb2 as rnlp
import riva_api.riva_nlp_pb2_grpc as rnlp_srv

class BertPunctuatorClient(object):
    def __init__(self, grpc_server, model_name="riva_punctuation"):
        # Store the model name and open an insecure gRPC channel to the Riva server
        print("Using model: {}".format(model_name))
        self.model_name = model_name
        self.channel = grpc.insecure_channel(grpc_server)
        self.riva_nlp = rnlp_srv.RivaLanguageUnderstandingStub(self.channel)

        self.has_bos = True
        self.has_eos = False

    def run(self, input_strings):
        if isinstance(input_strings, str):
            # user probably passed a single string instead of a list/iterable
            input_strings = [input_strings]

        request = rnlp.TextTransformRequest()
        request.model.model_name = self.model_name
        for q in input_strings:
            request.text.append(q)
        response = self.riva_nlp.TransformText(request)

        # Only the first transformed string is returned, even if several were sent
        return response.text[0]

def run_punct_capit(server, model, query):
    print("Client app to test punctuation and capitalization on Riva")
    client = BertPunctuatorClient(server, model_name=model)
    result = client.run(query)
    print(result)

In [None]:
run_punct_capit(server="localhost:50051",
                model="riva_punctuation",
                query="how are you doing")

You can stop all Docker containers before shutting down the Jupyter kernel. Note that the command below stops every container on the host, not only the Riva ones.

In [None]:
!docker stop $(docker ps -a -q)

In [1]:
## The following shows how to run the punctuation service using the Python client API


In [None]:
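# Create a stub for the Riva NLP service (requires a running Riva server;
# same address as in the client example above)
channel = grpc.insecure_channel("localhost:50051")
riva_nlp = rnlp_srv.RivaLanguageUnderstandingStub(channel)

# Use the TextTransform API to run the punctuation model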
req = rnlp.TextTransformRequest()
req.model.model_name = "riva_punctuation"
req.text.append("add punctuation to this sentence")
req.text.append("do you have any red nvidia shirts")
req.text.append("i need one cpu four gpus and lots of memory "
                "for my new computer it's going to be very cool")

nlp_resp = riva_nlp.TransformText(req)
print("TransformText Output:")
print("\n".join([f" {x}" for x in nlp_resp.text]))

### Exercise

Use the punctuation service to modify the following paragraph:

winston is one of the most laid-back people i know he is tall and slim with black hair and he always wears a t-shirt and black jeans his jeans have holes in them and his baseball boots are scruffy too he usually sits at the back of the class and he often seems to be asleep however when the exam results are given out he always gets an "A" i don't think hes as lazy as he appears to be
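
One possible starting point for the exercise, offered as a sketch rather than a reference solution: the paragraph is much longer than the earlier queries, so the helper below sends it in smaller pieces. `split_into_chunks` and `punctuate_paragraph` are hypothetical helpers, and the `max_words` limit is an assumption made in case the deployed model restricts input length. The `client` argument is any object with a `run(text)` method, such as the `BertPunctuatorClient` defined earlier.

```python
def split_into_chunks(text, max_words=40):
    """Split text into pieces of at most max_words whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def punctuate_paragraph(client, paragraph, max_words=40):
    """Send each chunk through client.run and join the results with spaces."""
    chunks = split_into_chunks(paragraph, max_words)
    return " ".join(client.run(chunk) for chunk in chunks)

# Usage with the client defined earlier (requires a running Riva server):
# client = BertPunctuatorClient("localhost:50051", model_name="riva_punctuation")
# print(punctuate_paragraph(client, paragraph))
```

Splitting at a fixed word count can cut a sentence in half, so punctuation near chunk boundaries may be less accurate. Try sending the whole paragraph in a single request first and compare the two results.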