# Deploy to Triton Inference Server locally

description: (preview) deploy a DenseNet image classification model locally via Triton

Please note that this Public Preview release is subject to the [Supplemental Terms of Use for Microsoft Azure Previews](https://azure.microsoft.com/support/legal/preview-supplemental-terms/).

In [None]:
!pip install nvidia-pyindex
!pip install --upgrade tritonclient

In [None]:
from azureml.core import Workspace

ws = Workspace.from_config()
ws

## Download model

Triton Inference Server can load a model only if it is stored in the directory structure that Triton expects. The helper in the next cell downloads the example models into that layout. [Read more about the directory structure that Triton expects](https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/model_repository.html).

In [1]:
import os
import sys
from pathlib import Path
from src.model_utils import download_triton_models, delete_triton_models

prefix = Path(".")
download_triton_models(prefix)

successfully downloaded model: densenet_onnx
successfully downloaded model: bidaf-9
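
To confirm the download produced a layout Triton can load, the repository can be inspected before the server starts. The following is a minimal sketch, not part of the original notebook: `summarize_model_repository` is a hypothetical helper, and it only checks the two conventions that are certain from Triton's documentation (one directory per model, numeric version subdirectories, and an optional `config.pbtxt`).

```python
from pathlib import Path


def summarize_model_repository(repo_root):
    """Describe each model directory found under a Triton model repository.

    Hypothetical helper: returns {model_name: {"has_config": bool, "versions": [int, ...]}}.
    """
    summary = {}
    for model_dir in sorted(p for p in Path(repo_root).iterdir() if p.is_dir()):
        # Triton looks for numerically named version subdirectories (1, 2, ...).
        versions = sorted(
            int(d.name) for d in model_dir.iterdir() if d.is_dir() and d.name.isdigit()
        )
        summary[model_dir.name] = {
            # config.pbtxt is optional for some backends, so this is informational only.
            "has_config": (model_dir / "config.pbtxt").is_file(),
            "versions": versions,
        }
    return summary
```

For example, `summarize_model_repository(prefix / "models" / "triton")` would list the downloaded models, assuming the repository lives at the path that the `docker run` command later mounts as `/models`. A model with an empty `versions` list is one Triton would likely refuse to load.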


## Register model

A registered model is a logical container stored in the cloud. It holds all files located at `model_path` and is associated with a version number and other metadata.

In [None]:
from azureml.core.model import Model

model_path = "models"

model = Model.register(
    model_path=model_path,
    model_name="densenet-onnx-example",
    tags={"area": "Image classification", "type": "classification"},
    description="Image classification trained on Imagenet Dataset",
    workspace=ws,
)

print(model)

## Deploy webservice

Here the model is deployed to local compute: the cells below build a Docker image and run Triton in a container. For other deployment targets, see [our documentation](https://docs.microsoft.com/azure/machine-learning/how-to-deploy-and-where?tabs=azcli).


In [19]:
!docker build . -t mytriton

[1A[1B[0G[?25l[+] Building 0.0s (0/1)                                                         
[?25h[1A[0G[?25l[+] Building 0.1s (2/2)                                                         
[34m => [internal] load build definition from Dockerfile                       0.0s
[0m[34m => => transferring dockerfile: 210B                                       0.0s
[0m[34m => [internal] load .dockerignore                                          0.0s
[0m[34m => => transferring context: 2B                                            0.0s
[0m[?25h[1A[1A[1A[1A[1A[0G[?25l[+] Building 0.3s (5/6)                                                         
[34m => [internal] load build definition from Dockerfile                       0.0s
[0m[34m => => transferring dockerfile: 210B                                       0.0s
[0m[34m => [internal] load .dockerignore                                          0.0s
[0m[34m => => transferring context: 2B                        

In [19]:
!docker run -d --rm -p8000:8000 -p8001:8001 -p8002:8002 -v/home/gopalv/azureml-examples/experimental/deploy-triton/models/triton:/models --env AZUREML_MODEL_DIR=/models mytriton

645efa955ded22cb0de82822d4f8f05c1fa255047295b44c9bcb103789bff346
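
Because the container starts in detached mode (`-d`), `docker run` returns before Triton has finished loading the models. A minimal sketch of waiting for the server follows. It polls the `/v2/health/ready` endpoint of Triton's HTTP API; the helper names, the default base URL `http://localhost:8000`, and the timeout values are assumptions chosen for this example.

```python
import time
import urllib.error
import urllib.request


def is_ready(base_url="http://localhost:8000"):
    """Return True if the readiness endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/v2/health/ready", timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, reset, or an HTTP error: the server is not ready yet.
        return False


def wait_until_ready(probe=is_ready, timeout_s=120.0, interval_s=2.0):
    """Call probe() repeatedly until it returns True or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval_s)
    return False
```

Calling `wait_until_ready()` before sending the first inference request avoids connection errors while the models are still loading. The `probe` parameter exists only so the loop can be exercised without a running server.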


## Test the webservice

In [4]:
import numpy as np
import requests

import tritonclient.http as tritonhttpclient

# Download a test image and pack its raw bytes into a one-element tensor.
test_sample = requests.get("https://aka.ms/peacock-pic", allow_redirects=True).content
test = np.array([test_sample], dtype=bytes)

# Describe the request: one BYTES input named 'img_in_bytes' and one output named 'label'.
infer_input = tritonhttpclient.InferInput('img_in_bytes', test.shape, 'BYTES')
infer_input.set_data_from_numpy(test)
inputs = [infer_input]
outputs = [tritonhttpclient.InferRequestedOutput('label')]

# Send the request to the local Triton HTTP endpoint (port 8000 published by `docker run`).
client = tritonhttpclient.InferenceServerClient("localhost:8000")
result = client.infer(model_name='ensemble', inputs=inputs, request_id='1', outputs=outputs)
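
The request packs the image with `dtype=bytes`, which creates a fixed-width NumPy byte-string array. NumPy drops trailing NUL bytes when elements of such an array are read back, so a payload that happens to end in `\x00` can come out shorter than it went in. The sketch below illustrates the difference with a made-up payload; using `np.object_` instead is a suggestion for binary data in general, not something the original notebook does.

```python
import numpy as np

# Made-up binary payload that ends in two NUL bytes (not a real image).
payload = b"\x89PNG-like-bytes\x00\x00"

# Fixed-width byte strings: trailing NUL bytes are stripped on element access.
fixed_width = np.array([payload], dtype=bytes)

# Object arrays hold the original Python bytes object unchanged.
as_object = np.array([payload], dtype=np.object_)

print(fixed_width.dtype, len(fixed_width[0]))
print(as_object.dtype, len(as_object[0]))
```

Both arrays have shape `(1,)`, so either can be described as a one-element `BYTES` input. Whether the truncation matters for a given image depends on its final bytes.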

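In [11]:
print(result.as_numpy('label'))

[b'PEACOCK']

## Delete the webservice and the downloaded model

In [None]:
# Stop the Triton container started earlier; it was launched with --rm, so stopping it also removes it.
!docker stop $(docker ps -q --filter ancestor=mytriton)
delete_triton_models(prefix)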

## Next steps

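Try changing the deployment configuration to [deploy to Azure Kubernetes Service](https://docs.microsoft.com/azure/machine-learning/how-to-deploy-azure-kubernetes-service?tabs=python) for higher availability and better scalability. Once the model is served from a remote endpoint, the same request can be sent over HTTPS with a bearer token, as in the following cell.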

In [12]:
import gevent.ssl
import numpy as np
import requests

import tritonclient.http as tritonhttpclient

# Replace the placeholder with a valid token for your endpoint; do not commit real tokens.
headers = {"Authorization": "Bearer <your-endpoint-token>"}

# Download a test image and pack its raw bytes into a one-element tensor.
test_sample = requests.get("https://aka.ms/peacock-pic", allow_redirects=True).content
test = np.array([test_sample], dtype=bytes)

infer_input = tritonhttpclient.InferInput('img_in_bytes', test.shape, 'BYTES')
infer_input.set_data_from_numpy(test)
inputs = [infer_input]
outputs = [tritonhttpclient.InferRequestedOutput('label')]

# Connect to the remote endpoint over HTTPS and pass the token with every call.
client = tritonhttpclient.InferenceServerClient(
    "gopalv-custom-container2.westus2-main.inference.ml.azure.com",
    ssl=True,
    ssl_context_factory=gevent.ssl._create_default_https_context,
)
client.is_server_live(headers=headers)
result = client.infer(headers=headers, model_name='ensemble', inputs=inputs, request_id='1', outputs=outputs)

In [13]:
print(result.as_numpy('label'))

[b'PEACOCK']
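
Rather than pasting the bearer token into the notebook, it can be read from the environment at run time. The following is a minimal sketch: `bearer_headers` is a hypothetical helper and `AML_ENDPOINT_TOKEN` is an assumed variable name, neither of which comes from the original notebook.

```python
import os


def bearer_headers(env_var="AML_ENDPOINT_TOKEN"):
    """Build an Authorization header from a token stored in an environment variable."""
    token = os.environ.get(env_var)
    if not token:
        # Fail early with a clear message instead of sending an empty token.
        raise RuntimeError(f"Set the {env_var} environment variable to your endpoint token")
    return {"Authorization": f"Bearer {token}"}
```

The result can be passed wherever the notebook passes `headers`, for example `client.infer(headers=bearer_headers(), ...)`. This keeps the credential out of saved notebook cells and version control.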
