---
## Learning Objectives
In this notebook, you will learn how to:  
- Use Riva ServiceMaker to take a TLT-exported .riva file and generate a model repository
- Deploy the model(s) locally on the Riva Server
- Exercise: Send inference requests from a demo client using Riva API bindings

---
## Pre-requisites

To follow along, please make sure:
- You have access to NVIDIA NGC, and are able to download the Riva Quickstart [resources](https://ngc.nvidia.com/resources/ea-riva-stage:riva_quickstart/)
- You have a .riva model file that you wish to deploy. You can obtain this from `tlt <task> export` (with `export_format=RIVA`). Please refer to the tutorial on *Joint Intent Detection and Slot Filling using Transfer Learning Toolkit* for more details on training and exporting a .riva model.
- You have followed the steps in the setup notebook to start up, initialize, and deploy the Riva Service

### Riva-deploy

The deployment tool takes as input one or more Riva Model Intermediate Representation (RMIR) files and a target model repository directory. It creates an ensemble configuration specifying the pipeline for the execution and finally writes all those assets to the output model repository directory.

In [None]:
# Syntax: riva-deploy -f dir-for-rmir/model.rmir:key output-dir-for-repository
!docker run --rm --gpus 1 -v $MODEL_LOC:/data $RIVA_SM_CONTAINER -- \
            riva-deploy -f /data/intent-slot.rmir:$KEY /data/models

---
## Start Riva Server
Once the model repository is generated, we are ready to start the Riva server. From this step onwards you should follow the Riva setup notebook and modify the config.sh file as required, and proceed to the Inference step after deploying the Riva service. 
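
Before moving on to inference, it can help to confirm that something is actually listening on the server's gRPC port. The following is a minimal sketch using only the Python standard library; the helper name `wait_for_port` and its parameters are illustrative and not part of Riva. It only checks that the TCP port accepts connections, which does not by itself guarantee that every model has finished loading.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 30.0, interval_s: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper for this notebook; it is not part of the Riva API.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # create_connection raises OSError (refused, unreachable, timed out) on failure
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

For example, `wait_for_port("localhost", 50051)` would poll the address used by the inference cells in this notebook for up to 30 seconds.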

---
## Run Inference
Once the Riva server is up and running with your models, you can send inference requests querying the server. 

To send gRPC requests, you can use the Riva Python API client bindings. They are available as a pip .whl package, which is installed in the setup notebook.


The following code sample shows how you can perform inference using Riva Python API gRPC bindings:

In [None]:
import grpc
import riva_api.riva_nlp_pb2 as rnlp
import riva_api.riva_nlp_pb2_grpc as rnlp_srv

def run_intent_slot(grpc_server, query):
    channel = grpc.insecure_channel(grpc_server)
    riva_nlp = rnlp_srv.RivaLanguageUnderstandingStub(channel)

    for q in query:
        req = rnlp.AnalyzeIntentRequest()
        req.query = q
        req.options.domain = "default" # The <domain_name> is appended to "riva_intent_" to look for a
                                        # model "riva_intent_<domain_name>". So the model "riva_intent_default"
                                        # needs to be preloaded in riva server. If you would like to deploy your
                                        # custom Joint Intent and Slot model use the `--domain_name` parameter in
                                        # ServiceMaker's `riva-build intent_slot` command.
        resp = riva_nlp.AnalyzeIntent(req)
        print("Query:", q)
        print("Intent:", resp.intent)
        print("Slots:", resp.slots)

In [None]:
run_intent_slot(grpc_server="localhost:50051",
                query=["Please set an alarm for 6 am", "Play some pop music"])

`NOTE`: You could also run the above inference code from inside the Riva Client container. The QuickStart provides a script `riva_start_client.sh` to run the container. It has more examples for different services.
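
The response printed by `run_intent_slot` is a protobuf message, which can be verbose to read. The sketch below flattens it into plain Python values. It assumes that each entry in `resp.slots` carries a `token` string and a repeated `label` field of classifications with `class_name`, mirroring the `resp.intent.class_name` field used elsewhere in this notebook; please verify those field names against your installed bindings. The helper name and the stand-in values are made up for illustration, so the block can be tried without a running server.

```python
from types import SimpleNamespace


def summarize_intent_response(resp):
    """Flatten an AnalyzeIntent-style response into plain Python values.

    Assumption: each slot exposes `token` and a list-like `label` whose
    items have `class_name`; check this against your riva_api version.
    """
    slots = []
    for slot in resp.slots:
        # Keep only the first label per token, or None if no label was returned
        top = slot.label[0] if len(slot.label) > 0 else None
        slots.append((slot.token, top.class_name if top is not None else None))
    return {
        "intent": resp.intent.class_name,
        "score": resp.intent.score,
        "slots": slots,
    }


# Stand-in object with the assumed shape (values are invented for illustration).
fake = SimpleNamespace(
    intent=SimpleNamespace(class_name="weather.humidity", score=0.98),
    slots=[
        SimpleNamespace(
            token="tokyo",
            label=[SimpleNamespace(class_name="weatherplace", score=0.9)],
        )
    ],
)
print(summarize_intent_response(fake))
```

With a live server, the same helper could be called on the `resp` object returned by `riva_nlp.AnalyzeIntent(req)`.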

You can stop all Docker containers before shutting down the Jupyter kernel. **Caution: The following command will stop all running containers**

In [None]:
!docker stop $(docker ps -a -q)

In [None]:
# The AnalyzeIntent API can be used to query an Intent Slot classifier. The API can leverage a
# text classification model to classify the domain of the input query and then route to the
# appropriate intent slot model.

# Create the client stub at notebook scope. The stub built inside run_intent_slot() is local
# to that function, so the cells below need their own.
channel = grpc.insecure_channel("localhost:50051")
riva_nlp = rnlp_srv.RivaLanguageUnderstandingStub(channel)

# Let's first see an example where the domain is known. This skips execution of the domain classifier
# and proceeds directly to the intent/slot model for the requested domain.

req = rnlp.AnalyzeIntentRequest()
req.query = "How is the humidity in San Francisco?"
req.options.domain = "weather"  # The <domain_name> is appended to "riva_intent_" to look for a
                                # model "riva_intent_<domain_name>". So in this example, the model "riva_intent_weather"
                                # needs to be preloaded in the Riva server. If you would like to deploy your
                                # custom Joint Intent and Slot model, use the `--domain_name` parameter in
                                # ServiceMaker's `riva-build intent_slot` command.

resp = riva_nlp.AnalyzeIntent(req)
print(resp)



In [None]:
# Below is an example where the input domain is not provided.

req = rnlp.AnalyzeIntentRequest()
req.query = "Is it going to rain tomorrow?"

# The input query is first routed to a text classification model called "riva_text_classification_domain".
# The output class label of "riva_text_classification_domain" is appended to "riva_intent_"
# to get the appropriate Intent Slot model to execute for the input query.
# Note: The model "riva_text_classification_domain" needs to be loaded into the Riva server and have the
# appropriate class labels that would invoke the corresponding intent slot model.

resp = riva_nlp.AnalyzeIntent(req)
print(resp)
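
The routing described in the two cells above can be summarized as a small decision rule: if a domain is supplied, use it directly; otherwise ask the domain classifier first. The sketch below only illustrates that naming rule as the comments describe it. It is not Riva's server-side implementation, and `resolve_intent_model` and `toy_domain_classifier` are hypothetical names introduced here.

```python
from typing import Callable, Optional

INTENT_MODEL_PREFIX = "riva_intent_"


def resolve_intent_model(
    query: str,
    domain: Optional[str],
    classify_domain: Callable[[str], str],
) -> str:
    """Return the name of the intent/slot model that would handle the query."""
    if domain:
        # Domain known: skip the domain classifier entirely
        return INTENT_MODEL_PREFIX + domain
    # Domain unknown: the classifier's output label selects the model
    return INTENT_MODEL_PREFIX + classify_domain(query)


def toy_domain_classifier(query: str) -> str:
    # Hypothetical stand-in for "riva_text_classification_domain"
    return "weather" if "rain" in query.lower() else "default"


print(resolve_intent_model("How is the humidity in San Francisco?", "weather", toy_domain_classifier))
print(resolve_intent_model("Is it going to rain tomorrow?", None, toy_domain_classifier))
```

The practical consequence is that the class labels of the domain classifier have to match the `<domain_name>` suffixes of the deployed intent slot models.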

In [None]:
from time import time

# Some weather intent queries grouped together
queries = [
    "Is it currently cloudy in Tokyo?",
    "What is the annual rainfall in Pune?",
    "What is the humidity going to be tomorrow?"
]
for q in queries:
    req = rnlp.AnalyzeIntentRequest()
    req.query = q
    start = time()
    resp = riva_nlp.AnalyzeIntent(req)
    elapsed_ms = (time() - start) * 1000  # round-trip time of this request

    print(f"[{resp.intent.class_name}]\t{req.query}\t({elapsed_ms:.1f} ms)")

## Exercise

Use the Intent Slot classification API to classify individual Intents and Slots for the following utterances:


- I need to catch the early train tomorrow evening.
- I am very tired after all the work done yesterday.
- Will it rain tomorrow in New York?
- Is it currently sunny in Chicago?

In [None]:
## First, import the required libraries



## Next, set up the service configuration


## Next, set up the request variables

