# Deploying a Text Classification Model in Riva

[Transfer Learning Toolkit](https://developer.nvidia.com/transfer-learning-toolkit) (TLT) provides the capability to export your model in a format that can be deployed using [NVIDIA Riva](https://developer.nvidia.com/riva), a highly performant application framework for multi-modal conversational AI services using GPUs.

This tutorial takes a .riva model, the result of the `tlt text_classification export` command, and uses the Riva ServiceMaker framework to aggregate all the artifacts needed to deploy it with Riva in a target environment. Once the model is deployed in Riva, you can issue inference requests to the server. We will demonstrate how quick and straightforward this whole process is.

## Learning Objectives
In this notebook, you will learn how to:  
- Use Riva ServiceMaker to take a TLT-exported .riva model and convert it to .rmir
- Deploy the model(s) locally on the Riva server
- Send inference requests from a demo client using the Riva API bindings

## Pre-requisites
To follow along, please make sure:
- You have access to NVIDIA NGC, and are able to download the Riva Quickstart [resources](https://ngc.nvidia.com/catalog/resources/nvidia:riva:riva_quickstart/)
- You have a .riva model file that you wish to deploy. You can obtain this from `tlt <task> export` (with `export_format=RIVA`). Please refer to the previous notebook, the tutorial on *Text Classification using Transfer Learning Toolkit*, for more details on training and exporting a .riva model.
- You have followed the steps in the setup notebook to set up and deploy an instance of the Riva framework, and have a running Riva server



### Riva-deploy

The deployment tool takes as input one or more Riva Model Intermediate Representation (RMIR) files and a target model repository directory. It creates an ensemble configuration specifying the pipeline for the execution and finally writes all those assets to the output model repository directory.

In [None]:
# Syntax: riva-deploy -f dir-for-rmir/model.rmir:key output-dir-for-repository
! docker run --rm --gpus 0 -v $MODEL_LOC:/data $RIVA_SM_CONTAINER -- \
            riva-deploy -f /data/tc-model.rmir:$KEY /data/models
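Because the command above mounts `$MODEL_LOC` at `/data` and writes the repository to `/data/models`, the generated assets should appear under `$MODEL_LOC/models` on the host. The following is a minimal sketch for checking that the directory was populated. The helper name `list_model_repository` and the placeholder path are assumptions for illustration and are not part of Riva; the names of the generated sub-directories depend on your model.

```python
from pathlib import Path


def list_model_repository(repo_dir):
    """Return the sorted names of the sub-directories inside repo_dir.

    Returns an empty list when repo_dir does not exist, which usually
    means that riva-deploy has not written anything there yet.
    """
    repo = Path(repo_dir)
    if not repo.is_dir():
        return []
    return sorted(p.name for p in repo.iterdir() if p.is_dir())


# Placeholder path: replace it with "<your MODEL_LOC>/models".
print(list_model_repository("/path/to/model_loc/models"))
```

An empty list suggests that the deploy step did not complete or that the path is wrong.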

## Start Riva Server
Once the model repository is generated, we are ready to start the Riva server. This step is covered in the setup notebook.
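Before sending requests, it can help to confirm that something is listening on the server's gRPC port. The sketch below uses only the Python standard library and assumes the server address `localhost:50051` used later in this notebook. The helper name `is_port_open` is hypothetical. Note that an open port only shows that a process accepts TCP connections; it does not prove that the models have finished loading.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


# Assumed address: the same host and port the client below connects to.
print("Riva gRPC port reachable:", is_port_open("localhost", 50051))
```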

## Run Inference
Once the Riva server is up and running with your models, you can send inference requests to the server.

To send gRPC requests, you can install the Riva Python API bindings for the client. These are available as a pip .whl file with the QuickStart.

In [None]:
# IMPORTANT: Set the name of the whl file
RIVA_API_WHL = "<add riva api .whl file name>"
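If you are unsure of the exact file name, a short search of the QuickStart directory can list the candidates. This is only a sketch: the helper name `find_wheels` and the placeholder path are assumptions, and the location of the wheel inside the QuickStart resources may differ between releases.

```python
from pathlib import Path


def find_wheels(directory):
    """Return the .whl files found under directory, as sorted relative paths."""
    root = Path(directory)
    if not root.is_dir():
        return []
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.whl"))


# Placeholder path: replace it with the directory holding the QuickStart resources.
print(find_wheels("/path/to/riva_quickstart"))
```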

In [None]:
# Install client API bindings
!cd $RIVA_DIR && pip install $RIVA_API_WHL

The following code sample shows how you can perform inference using Riva Python API gRPC bindings:

In [None]:
import grpc
import riva_api.riva_nlp_pb2 as rnlp
import riva_api.riva_nlp_pb2_grpc as rnlp_srv


class BertTextClassifyClient(object):
    def __init__(self, grpc_server, model_name):
        # model_name must match the name of a model deployed on the Riva server
        print("Using model: {}".format(model_name))

        self.model_name = model_name
        self.channel = grpc.insecure_channel(grpc_server)
        self.riva_nlp = rnlp_srv.RivaLanguageUnderstandingStub(self.channel)

        self.has_bos_eos = False

    # Return the top-1 (class_name, score) pair for each input text,
    # read from the first label of each result in the server response.
    def postprocess_labels_server(self, ct_response):
        results = []

        for result in ct_response.results:
            intent_str = result.labels[0].class_name
            intent_conf = result.labels[0].score

            results.append((intent_str, intent_conf))

        return results

    # accept a string or a list of strings, return a list of (class_name, score) tuples
    def run(self, input_strings):
        if isinstance(input_strings, str):
            # user probably passed a single string instead of a list/iterable
            input_strings = [input_strings]

        # get intent of the query
        request = rnlp.TextClassRequest()
        request.model.model_name = self.model_name
        for q in input_strings:
            request.text.append(q)
        ct_response = self.riva_nlp.ClassifyText(request)

        return self.postprocess_labels_server(ct_response)


def run_text_classify(server, model, query):
    print("Client app to test text classification on Riva")
    client = BertTextClassifyClient(server, model_name=model)
    result = client.run(query)
    print(result)
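To see what the post-processing step does without a running server, the sketch below applies the same logic to a hand-built stand-in for the response. The stand-in only mimics the attributes that the client reads (`results`, `labels`, `class_name`, `score`); it is not a real `riva_api` message, and the class names and scores are made up for illustration.

```python
from types import SimpleNamespace


def top1_labels(ct_response):
    """Mirror postprocess_labels_server: take the first label of every result."""
    return [(r.labels[0].class_name, r.labels[0].score) for r in ct_response.results]


# Hand-built stand-in for a ClassifyText response; all values are made up.
fake_response = SimpleNamespace(results=[
    SimpleNamespace(labels=[
        SimpleNamespace(class_name="weather", score=0.97),
        SimpleNamespace(class_name="other", score=0.03),
    ]),
    SimpleNamespace(labels=[
        SimpleNamespace(class_name="other", score=0.88),
    ]),
])

print(top1_labels(fake_response))
```

Each input string yields one result, so the returned list has one `(class_name, score)` tuple per query.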

In [None]:
# The model name depends on the dataset and the domain on which the model was trained.
# Check `docker logs <container name>` and replace it accordingly (the logs show a
# table of models with their status displayed next to them). See the documentation
# for more information.

run_text_classify(server="localhost:50051",
                model="<Enter Model Name>",
                query="How is the weather tomorrow?")

`NOTE`: You could also run the above inference code from inside the Riva Client container. The QuickStart provides a script `riva_start_client.sh` to run the container. It has more examples for different services.

You can stop all Docker containers before shutting down the Jupyter kernel. **Caution:** The following command will stop all running containers, not only those started for Riva.

In [None]:
! docker stop $(docker ps -a -q)
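If other containers are running on the same machine, a narrower alternative is to stop only the containers whose names match a filter. The sketch below relies on `docker ps -q --filter name=...`. The helper name `stop_containers_by_name` and the filter value `riva` are assumptions; check `docker ps` for the actual container names on your system. The `run` parameter exists only so that the function can be exercised without Docker.

```python
import subprocess


def stop_containers_by_name(name_filter, run=subprocess.run):
    """Stop running containers whose name matches name_filter; return their IDs."""
    listing = run(
        ["docker", "ps", "-q", "--filter", f"name={name_filter}"],
        capture_output=True, text=True, check=True,
    )
    ids = listing.stdout.split()
    if ids:
        # Only call `docker stop` when at least one container matched.
        run(["docker", "stop", *ids], check=True)
    return ids


# Example call (assumed filter value); uncomment it on a machine with Docker:
# stop_containers_by_name("riva")
```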

## What's next?
You could train your own custom models in TLT and deploy them in Riva! You could also scale up your deployment using Kubernetes with the Riva AI Services Helm Chart, which pulls the relevant images, downloads model artifacts from NGC, generates the model repository, and starts and exposes the Riva speech services.