Copyright 2022 NVIDIA Corporation. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

<img src="http://developer.download.nvidia.com/compute/machine-learning/frameworks/nvidia_logo.png" style="width: 90px; float: center;">

# Triton Inference Server

<img src="https://developer.nvidia.com/sites/default/files/akamai/triton.png" width=600 height=600>

## Overview

This notebook, together with the accompanying `triton_client.ipynb` notebook, demonstrates how to deploy an object detection model with the Triton Inference Server.

## Imports

In [1]:
# enlarge scrollable output size
from IPython.display import display, HTML
display(HTML("<style>.jp-CodeCell.jp-mod-outputsScrolled .jp-Cell-outputArea { max-height: 24em; }</style>"))

## Setup <a class="anchor" id="Setup"></a>

To begin, check that the NVIDIA driver has been installed correctly. The `nvidia-smi` command should run and output information about the GPUs on your system:

In [2]:
!nvidia-smi

Thu Jun  2 00:08:28 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.19.01    Driver Version: 465.19.01    CUDA Version: 11.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA Tesla V1...  On   | 00000000:07:00.0 Off |                    0 |
| N/A   46C    P0    38W / 300W |    282MiB / 32505MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [3]:
!nvidia-smi -L

GPU 0: NVIDIA Tesla V100-DGXS-32GB (UUID: GPU-0f04e528-7346-af7f-c217-804407855015)
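If you would rather check for GPUs programmatically than read the output by eye, the listing printed by `nvidia-smi -L` can be parsed from Python. The following is a minimal sketch, not part of the original notebook: the helper names `parse_gpu_list` and `list_gpus` are hypothetical, and the regular expression assumes the one-line-per-GPU format shown above.

```python
import re
import subprocess

# Assumed line format, taken from the output above:
#   GPU <index>: <name> (UUID: <uuid>)
GPU_LINE = re.compile(r"^GPU (\d+): (.+?) \(UUID: ([^)]+)\)\s*$")


def parse_gpu_list(text):
    """Parse `nvidia-smi -L` output into (index, name, uuid) tuples."""
    gpus = []
    for line in text.splitlines():
        match = GPU_LINE.match(line.strip())
        if match:
            gpus.append((int(match.group(1)), match.group(2), match.group(3)))
    return gpus


def list_gpus():
    """Run `nvidia-smi -L`; return [] if the tool is missing or fails."""
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return []
    return parse_gpu_list(result.stdout)


# Parse the sample line from the output above (no GPU required).
sample = "GPU 0: NVIDIA Tesla V100-DGXS-32GB (UUID: GPU-0f04e528-7346-af7f-c217-804407855015)"
print(parse_gpu_list(sample))
```

An empty list from `list_gpus()` suggests that the driver is not installed or that no GPU is visible to the container.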


## Start the Triton Server

Start the Triton server in [polling mode](https://github.com/triton-inference-server/server/blob/8d02c9e8f1075baf525614120514ec2e12d3e842/docs/model_management.md#model-control-mode-poll). In this mode, Triton periodically checks the model repository for changes and attempts to load or unload models to match them.

In [4]:
!tritonserver  --model-repository=/workspace/model_repository --model-control-mode=POLL 

I0602 00:08:30.381931 159 libtorch.cc:1381] TRITONBACKEND_Initialize: pytorch
I0602 00:08:30.382019 159 libtorch.cc:1391] Triton TRITONBACKEND API version: 1.9
I0602 00:08:30.382030 159 libtorch.cc:1397] 'pytorch' TRITONBACKEND API version: 1.9
2022-06-02 00:08:30.578741: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
I0602 00:08:30.622366 159 tensorflow.cc:2181] TRITONBACKEND_Initialize: tensorflow
I0602 00:08:30.622403 159 tensorflow.cc:2191] Triton TRITONBACKEND API version: 1.9
I0602 00:08:30.622416 159 tensorflow.cc:2197] 'tensorflow' TRITONBACKEND API version: 1.9
I0602 00:08:30.622424 159 tensorflow.cc:2221] backend configuration:
{}
I0602 00:08:30.624516 159 onnxruntime.cc:2400] TRITONBACKEND_Initialize: onnxruntime
I0602 00:08:30.624536 159 onnxruntime.cc:2410] Triton TRITONBACKEND API version: 1.9
I0602 00:08:30.624545 159 onnxruntime.cc:2416] 'onnxruntime' TRITONBACKEND API version: 1.9
I0602 00:08:30.624

The command above should load the models from the model repository and log `successfully loaded 'distillbert_pt' version 1` and `successfully loaded 'distillbert_trt' version 1`. Both models should then be reported as `READY`:

```
+-----------------+---------+--------+
| Model           | Version | Status |
+-----------------+---------+--------+
| distillbert_pt  | 1       | READY  |
+-----------------+---------+--------+
| distillbert_trt | 1       | READY  |
+-----------------+---------+--------+
```
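The model names and versions in this table correspond to directories in the model repository. As a rough sketch, assuming the usual Triton layout of `<repository>/<model name>/<numeric version>/`, the hypothetical helper `list_models` below enumerates what a repository contains. It is an illustration only and is not how Triton itself scans the repository.

```python
from pathlib import Path


def list_models(repo_root):
    """Return {model_name: [version, ...]} for a Triton-style repository.

    Assumes each model is a subdirectory of repo_root and each version
    is a numerically named subdirectory of the model directory.
    """
    models = {}
    root = Path(repo_root)
    if not root.is_dir():
        return models
    for model_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        versions = sorted(
            int(v.name)
            for v in model_dir.iterdir()
            if v.is_dir() and v.name.isdecimal()
        )
        models[model_dir.name] = versions
    return models


if __name__ == "__main__":
    # Repository path taken from the tritonserver command above.
    for name, versions in list_models("/workspace/model_repository").items():
        print(name, versions)
```

In polling mode, adding or removing a model or version directory is the kind of repository change Triton is expected to pick up without a restart.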

Triton server listens on the following endpoints:

```
Port 8000    -> HTTP Service
Port 8001    -> gRPC Service
Port 8002    -> Metrics
```
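Before moving on, it can help to confirm that the HTTP service is actually answering. The sketch below polls the readiness endpoint `/v2/health/ready` on port 8000 using only the Python standard library. The host name `localhost` and the helper names `health_url` and `is_server_ready` are assumptions for illustration; adjust the host if the server runs in another container.

```python
import urllib.error
import urllib.request


def health_url(host="localhost", port=8000, probe="ready"):
    """Build the URL of a health probe ("ready" or "live")."""
    return f"http://{host}:{port}/v2/health/{probe}"


def is_server_ready(host="localhost", port=8000, timeout=2.0):
    """Return True if the readiness probe answers with HTTP 200."""
    try:
        with urllib.request.urlopen(health_url(host, port), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx response.
        return False


if __name__ == "__main__":
    print("Triton ready:", is_server_ready())
```

Because the `tritonserver` cell keeps running for as long as the server is up, a check like this is best run from a separate terminal or from the client notebook.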

## Open The Client Notebook

With the server running, open `triton_client.ipynb` and run through its cells.