<div align="center"><a href="https://www.nvidia.com/en-us/deep-learning-ai/education/"><img src="./assets/DLI_Header.png"></a></div>

<a id="structure"></a>
### Create Model Directory Structure


```
tritonserver --model-repository=/models
```


```
root@server:/models$ tree
.
├── simple-onnx-model
│   ├── 1
│   │   └── model.onnx
│   └── config.pbtxt
├── simple-pytorch-model
│   ├── 1
│   │   └── model.pt
│   └── config.pbtxt

```


In [None]:
!mkdir -p models/simple-pytorch-model
!mkdir -p models/simple-pytorch-model/1
!mkdir -p models/simple-onnx-model
!mkdir -p models/simple-onnx-model/1
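
Once the model files and `config.pbtxt` files from the later cells are in place, the layout shown in the tree above can be checked programmatically. The following is a minimal sketch, not part of Triton itself: `check_model_repository` is a hypothetical helper name, and it only tests for the two things visible in the tree (a `config.pbtxt` per model and at least one numeric version directory).

```python
from pathlib import Path


def check_model_repository(root):
    """Return a list of layout problems found under a model repository root."""
    problems = []
    for model_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        # Each model directory is expected to carry its own config.pbtxt.
        if not (model_dir / "config.pbtxt").is_file():
            problems.append(f"{model_dir.name}: missing config.pbtxt")
        # Version subdirectories are named with integers, such as "1".
        versions = [v for v in model_dir.iterdir()
                    if v.is_dir() and v.name.isdigit()]
        if not versions:
            problems.append(f"{model_dir.name}: no numeric version directory")
    return problems
```

Calling `check_model_repository('models')` right after the `mkdir` cell would report the missing configuration files; an empty list means the layout matches the tree.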

<a id="model"></a>
### Define a Simple PyTorch Model



In [None]:
import torch
from torch import nn
from torchvision import models


class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.model = models.resnet50(pretrained=True)
        
    def forward(self, x):
        return self.model(x)

model = Model().eval().cuda()

Next, we'll load the ImageNet labels.

In [None]:
import json

with open('./imagenet-simple-labels.json') as file:
    labels = json.load(file)

print(labels[:5])

In [None]:
import numpy as np
from PIL import Image


image = Image.open('./assets/goldfish.jpg')
image

In [None]:
from torchvision import transforms


imagenet_mean = [0.485, 0.456, 0.406]
imagenet_std = [0.229, 0.224, 0.225]

resize = transforms.Resize((256, 256))
center_crop = transforms.CenterCrop(224)
to_tensor = transforms.ToTensor()
normalize = transforms.Normalize(mean=imagenet_mean,
                                 std=imagenet_std)

transform = transforms.Compose([resize, center_crop, to_tensor, normalize])
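
To make the arithmetic of the last two steps concrete, here is a small sketch in plain Python of what `ToTensor` followed by `Normalize` does to a single RGB pixel: scale to the 0..1 range, then subtract the per-channel mean and divide by the per-channel standard deviation. The helper name `normalize_pixel` is hypothetical, and the constants are the commonly used ImageNet statistics, assumed here rather than taken from the torchvision source.

```python
# Commonly used ImageNet channel statistics (assumed values, RGB order).
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb_uint8, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Mimic ToTensor + Normalize for one RGB pixel given as 0..255 values."""
    # ToTensor: map 0..255 intensities to the 0..1 range.
    scaled = [channel / 255.0 for channel in rgb_uint8]
    # Normalize: (value - mean) / std, applied independently per channel.
    return [(c - m) / s for c, m, s in zip(scaled, mean, std)]
```

A pixel whose channels sit exactly at the dataset mean maps to roughly zero in every channel, which is the point of the normalization.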

In [None]:
image_tensor = transform(image).unsqueeze(0).cuda()
with torch.no_grad():  # inference only, so gradient tracking is not needed
    logits = model(image_tensor)

K = 3
values, indices = torch.topk(logits, K)

values = values.detach().tolist()[0]
indices = indices.detach().tolist()[0]

for i in range(K):
    print(values[i], indices[i], labels[indices[i]])
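
The values printed above are raw logits, not probabilities. A minimal sketch of how logits can be turned into probabilities with a numerically stable softmax is shown below in plain Python; the helper names `softmax` and `top_k` are illustrative, and inside the notebook `torch.softmax(logits, dim=1)` would serve the same purpose.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities that sum to 1."""
    # Subtracting the maximum avoids overflow in exp() for large logits.
    shift = max(logits)
    exps = [math.exp(value - shift) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probabilities, k):
    """Return the k (index, probability) pairs with the highest probability."""
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: probabilities[i], reverse=True)
    return [(i, probabilities[i]) for i in ranked[:k]]
```

Softmax preserves the ordering of the logits, so the top classes stay the same; only the scale of the scores changes.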

In [None]:
class PyTorch_to_TorchScript(nn.Module):
    def __init__(self, my_model):
        super(PyTorch_to_TorchScript, self).__init__()
        self.model = my_model.model
    
    def forward(self, x):
        return self.model(x)

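torchscript_model = PyTorch_to_TorchScript(model).eval().cuda()
# torch.jit.script compiles the module to TorchScript (scripting, not tracing).
scripted_module = torch.jit.script(torchscript_model)
# Save into version directory 1 of the model repository created earlier.
scripted_module.save('models/simple-pytorch-model/1/model.pt')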

In [None]:
dummy_input = torch.randn(1, 3, 224, 224).cuda()

# Only the image tensor is a runtime input, so a single input name is enough.
input_names = ['actual_input_1']
output_names = ['output1']

torch.onnx.export(model, dummy_input, 
                  'models/simple-onnx-model/1/model.onnx', verbose=False, 
                  input_names=input_names, output_names=output_names, 
                  dynamic_axes={'actual_input_1': {0: 'batch_size'}, 'output1': {0: 'batch_size'}})

<a id="configuration"></a>
### Create Configuration File



In [None]:
configuration = """
name: "simple-pytorch-model"
platform: "pytorch_libtorch"
max_batch_size: 32
input [
  {
    name: "input__0"
    data_type: TYPE_FP32
    format: FORMAT_NCHW
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "output__0"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
"""

with open('models/simple-pytorch-model/config.pbtxt', 'w') as file:
    file.write(configuration)

In [None]:
configuration = """
name: "simple-onnx-model"
platform: "onnxruntime_onnx"
max_batch_size: 32
input [
  {
    name: "actual_input_1"
    data_type: TYPE_FP32
    format: FORMAT_NCHW
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "output1"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
"""

with open('models/simple-onnx-model/config.pbtxt', 'w') as file:
    file.write(configuration)
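
The two configurations differ only in the model name, the platform, and the tensor names, so they could be produced from one template. The sketch below assumes a single FP32 image input and a single FP32 output, as in both models here; `render_triton_config` is a hypothetical helper, not a Triton API.

```python
def render_triton_config(name, platform, input_name, output_name,
                         input_dims=(3, 224, 224), output_dims=(1000,),
                         max_batch_size=32):
    """Render a config.pbtxt body for a one-input, one-output FP32 model."""
    def dims(values):
        # Format a shape as a protobuf text list, for example [ 3, 224, 224 ].
        return "[ " + ", ".join(str(v) for v in values) + " ]"

    return f'''name: "{name}"
platform: "{platform}"
max_batch_size: {max_batch_size}
input [
  {{
    name: "{input_name}"
    data_type: TYPE_FP32
    format: FORMAT_NCHW
    dims: {dims(input_dims)}
  }}
]
output [
  {{
    name: "{output_name}"
    data_type: TYPE_FP32
    dims: {dims(output_dims)}
  }}
]
'''
```

For example, `render_triton_config('simple-onnx-model', 'onnxruntime_onnx', 'actual_input_1', 'output1')` yields text equivalent to the ONNX configuration above, which can then be written to `config.pbtxt` in the same way.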

<a id="load"></a>
### Load Model in Triton Inference Server




In [None]:
!sleep 45

In [None]:
!curl -v triton:8000/v2/health/ready

The HTTP request returns status 200 if Triton is ready and non-200 if it is not ready.
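
Because readiness is signalled by the status code, a fixed sleep can be replaced by polling the endpoint until it answers 200. The following is a minimal sketch using only the standard library; `http_status` and `wait_until_ready` are hypothetical helper names, and the attempt count and delay are arbitrary assumptions to tune for the environment.

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout=2.0):
    """Return the HTTP status code of a GET request, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status such as 400 or 503.
        return error.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout.
        return None


def wait_until_ready(url, attempts=30, delay=2.0,
                     probe=http_status, sleep=time.sleep):
    """Poll url until it returns status 200; return False if it never does."""
    for attempt in range(attempts):
        if probe(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

In this lab the call would look like `wait_until_ready('http://triton:8000/v2/health/ready')`; note that `urllib` needs the `http://` scheme, which `curl` adds implicitly.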




In [None]:
!curl -v triton:8000/v2/models/simple-pytorch-model

In [None]:
!curl -v triton:8000/v2/models/simple-onnx-model

<a id="infer"></a>
### Send Inference Request to Server


In [None]:
import tritonclient.http as tritonhttpclient

In [None]:
VERBOSE = False
input_name = 'input__0'
input_shape = (1, 3, 224, 224)
input_dtype = 'FP32'
output_name = 'output__0'
model_name = 'simple-pytorch-model'
url = 'triton:8000'
model_version = '1'

In [None]:
triton_client = tritonhttpclient.InferenceServerClient(url=url, verbose=VERBOSE)
model_metadata = triton_client.get_model_metadata(model_name=model_name, model_version=model_version)
model_config = triton_client.get_model_config(model_name=model_name, model_version=model_version)

In [None]:
image_numpy = image_tensor.cpu().numpy()
print(image_numpy.shape)

In [None]:
input0 = tritonhttpclient.InferInput(input_name, input_shape, input_dtype)
input0.set_data_from_numpy(image_numpy, binary_data=False)

output = tritonhttpclient.InferRequestedOutput(output_name, binary_data=False)
response = triton_client.infer(model_name, model_version=model_version, 
                               inputs=[input0], outputs=[output])
logits = response.as_numpy(output_name)
logits = np.asarray(logits, dtype=np.float32)
print(logits.shape)

In [None]:
print(labels[np.argmax(logits)])

In [None]:
import IPython
IPython.Application.instance().kernel.do_shutdown(True)
