<a href="https://www.nvidia.com/dli"> <img src="images/DLI_Header.png" alt="Header" style="width: 400px;"/> </a>

# Building Transformer-Based Natural Language Processing Applications
### Part 3: Production Deployment

The goal of this lab is to deploy an example NLP model to a production inference server. 

<img style="float: right;" src="images/triton-diagram.jpg" width=500>

For our project, we'll use NVIDIA Triton Inference Server. The results you get from production inference are the same as when you run the model in its framework directly, but Triton adds several benefits:
* Concurrent model execution (can run multiple models simultaneously)
* Dynamic batching (better throughput)
* Model hot replacement (can update while server is running)
* Docker container available (portable)
* Multiple framework support (TensorRT, TensorFlow, PyTorch, ONNX)
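
Several of these features are switched on through a per-model configuration file (`config.pbtxt`). As a rough illustration only, the sketch below renders a minimal configuration that enables dynamic batching and runs more than one instance of a model. The helper `render_triton_config`, the model name, and all numeric values are hypothetical and chosen for illustration; a real configuration usually also declares the model's inputs and outputs, which the later notebooks cover.

```python
def render_triton_config(name, platform, max_batch_size,
                         preferred_batch_sizes, max_queue_delay_us,
                         instance_count):
    """Return the text of a minimal Triton model configuration (config.pbtxt).

    Hypothetical helper for illustration; it is not part of Triton itself.
    """
    sizes = ", ".join(str(size) for size in preferred_batch_sizes)
    return "\n".join([
        f'name: "{name}"',
        f'platform: "{platform}"',
        f"max_batch_size: {max_batch_size}",
        # Dynamic batching: the server groups individual requests into batches.
        "dynamic_batching {",
        f"  preferred_batch_size: [ {sizes} ]",
        f"  max_queue_delay_microseconds: {max_queue_delay_us}",
        "}",
        # Instance group: how many copies of the model execute concurrently.
        "instance_group [",
        f"  {{ count: {instance_count} kind: KIND_GPU }}",
        "]",
    ])


# All values below are assumptions, not settings taken from this lab.
config_text = render_triton_config(
    name="example_nlp_model",
    platform="onnxruntime_onnx",
    max_batch_size=8,
    preferred_batch_sizes=[4, 8],
    max_queue_delay_us=100,
    instance_count=2,
)
print(config_text)
```

Generating the file from Python like this is just one convenient way to compare configurations side by side; the same text can equally be written by hand.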

## Table of Contents
1. [Exporting the Model](010_ExportingTheModel.ipynb)<br/>
    You'll learn how to:
    - Convert a model trained in PyTorch into a server-efficient format<br/>
    - Apply reduced precision and TensorRT model optimizations <br/>
2. [Hosting the Model](020_HostingTheModel.ipynb)<br/>
    You'll learn how to:
    - Deploy the model to production using an NVIDIA Triton Inference Server<br/>
    - Control some of the basic features of NVIDIA Triton via the model configuration<br/>
    - Evaluate the impact of export format and configuration choices on performance and cost<br/>
3. [Server Performance](030_ServerPerformance.ipynb)<br/>
    You'll learn how to:
    - Evaluate the impact of different Triton configuration options on serving performance<br/>
    - Monitor the performance of inference in production <br/>
4. [Using the Model](040_UsingTheModel.ipynb)<br/>
    You'll learn how to:
    - Build a simple application that can take advantage of the API exposed by Triton<br/>
    - Discuss the options for more complex application and model pipeline deployments<br/>



### JupyterLab
For this hands-on lab, we use [JupyterLab](https://jupyterlab.readthedocs.io/en/stable/) to manage our environment. The [JupyterLab Interface](https://jupyterlab.readthedocs.io/en/stable/user/interface.html) is a dashboard that provides access to interactive IPython notebooks, the folder structure of our environment, and a terminal window into the Ubuntu operating system. The first view you'll see includes a **menu bar** at the top, a **file browser** in the **left sidebar**, and a **main work area** that initially shows the "Launcher" page.

<img src="images/jl_launcher.png">

The file browser can be navigated just like any other file explorer. A double click on any of the items will open a new tab with its content.

The main work area includes tabbed views of open files that can be closed, moved, and edited as needed. 

The notebooks, including this one, consist of a series of content and code **cells**. To execute the code in a code cell, highlight the cell and press `Shift+Enter` or click the "Run" button in the notebook toolbar above. If a content cell gets switched to editing mode, pressing `Shift+Enter` switches it back to a readable form.

Try executing the simple print statement in the cell below.

In [1]:
# Highlight this cell and press Shift+Enter to execute
print('This is just a simple print statement')

This is just a simple print statement

