# Deploy to Triton Inference Server locally

description: (preview) deploy a DenseNet image classification model locally via Triton

Please note that this Public Preview release is subject to the [Supplemental Terms of Use for Microsoft Azure Previews](https://azure.microsoft.com/support/legal/preview-supplemental-terms/).

```python
!pip install nvidia-pyindex
!pip install --upgrade tritonclient
```

```
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already up-to-date: tritonclient in /home/gopalv/miniconda3/envs/azureml/lib/python3.7/site-packages (2.6.0)
```


```python
from azureml.core import Workspace

# Load the workspace described by a local config.json file
ws = Workspace.from_config()
ws
```

```
Performing interactive authentication. Please follow the instructions on the terminal.
To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code EF2VYZKJQ to authenticate.
You have logged in. Now let us find all the subscriptions to which you have access...
Interactive authentication successfully completed.

Workspace.create(name='default', subscription_id='6560575d-fa06-4e7d-95fb-f962e74efd7a', resource_group='azureml-examples')
```
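
`Workspace.from_config()` builds the workspace object from a `config.json` file; by default it starts looking in the current directory. As a minimal sketch, the helper below (`write_workspace_config`, a hypothetical name, not an SDK function) writes a file containing the three keys the SDK reads. The values are placeholders, and the example writes into a temporary directory so it cannot overwrite a real configuration.

```python
import json
import tempfile
from pathlib import Path


def write_workspace_config(directory, subscription_id, resource_group, workspace_name):
    """Write config.json with the keys Workspace.from_config() reads (hypothetical helper)."""
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    path = Path(directory) / "config.json"
    path.write_text(json.dumps(config, indent=2))
    return path


# Placeholder values: replace them with your own subscription, resource group, and workspace
config_path = write_workspace_config(
    tempfile.mkdtemp(), "<subscription-id>", "<resource-group>", "<workspace-name>"
)
print(config_path.read_text())
```

With real values filled in, `Workspace.from_config(path=str(config_path))` would load the workspace from that specific file instead of searching from the current directory.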

## Download model

Triton Inference Server can load a model only if its files follow the model repository layout Triton expects: a repository directory with one subdirectory per model, each containing numbered version subdirectories that hold the model files. [Read more about the directory structure that Triton expects](https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/model_repository.html).
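
To catch layout mistakes before deploying, you can walk the repository and check that every model directory has at least one numeric version subdirectory (for an ONNX model, something like `<model-name>/1/model.onnx`). The sketch below uses a hypothetical helper, `find_model_versions`, which is not part of `src.model_utils`; it checks only the version directories, not the contents of any `config.pbtxt`. Point it at the directory that directly holds the model folders.

```python
import tempfile
from pathlib import Path


def find_model_versions(repository):
    """Map each model directory in a Triton-style repository to its sorted version numbers.

    Raises ValueError if a model directory has no numeric version subdirectory.
    """
    versions = {}
    for model_dir in sorted(p for p in Path(repository).iterdir() if p.is_dir()):
        found = sorted(
            int(child.name)
            for child in model_dir.iterdir()
            if child.is_dir() and child.name.isdigit()
        )
        if not found:
            raise ValueError(f"{model_dir.name}: no numeric version directory")
        versions[model_dir.name] = found
    return versions


with tempfile.TemporaryDirectory() as repo:
    # Build a tiny example layout: one ONNX model with a single version, 1
    version_dir = Path(repo) / "densenet_onnx" / "1"
    version_dir.mkdir(parents=True)
    (version_dir / "model.onnx").touch()
    print(find_model_versions(repo))  # {'densenet_onnx': [1]}
```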

```python
from pathlib import Path

# Helpers from the local src package that download and clean up the example Triton models
from src.model_utils import download_triton_models, delete_triton_models

prefix = Path(".")
download_triton_models(prefix)
```

```
successfully downloaded model: densenet_onnx
successfully downloaded model: bidaf-9
```


## Register model

Registering a model stores all files located at `model_path` in your workspace as a single logical container in the cloud, together with a version number and metadata such as tags and a description.
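
Because registration captures every file under `model_path`, it can be worth checking what will be uploaded before calling `Model.register`. The sketch below uses only the standard library; `summarize_model_files` is a hypothetical helper, not part of the Azure ML SDK, and the example builds a throwaway directory rather than touching your real `models` folder.

```python
import tempfile
from pathlib import Path


def summarize_model_files(model_path):
    """Return sorted (relative_path, size_in_bytes) pairs for every file under model_path."""
    root = Path(model_path)
    return sorted(
        (str(p.relative_to(root)), p.stat().st_size)
        for p in root.rglob("*")
        if p.is_file()
    )


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for a downloaded model: one 10-byte file in a version directory
    version_dir = Path(tmp) / "densenet_onnx" / "1"
    version_dir.mkdir(parents=True)
    (version_dir / "model.onnx").write_bytes(b"\0" * 10)
    for name, size in summarize_model_files(tmp):
        print(name, size)
```

Calling `summarize_model_files(model_path)` on the real directory lists the files that the registration cell below would include.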

```python
from azureml.core.model import Model

model_path = "models"

model = Model.register(
    model_path=model_path,
    model_name="densenet-onnx-example",
    tags={"area": "Image classification", "type": "classification"},
    description="Image classification trained on Imagenet Dataset",
    workspace=ws,
)

print(model)
```

```
Registering model densenet-onnx-example
Model(workspace=Workspace.create(name='default', subscription_id='6560575d-fa06-4e7d-95fb-f962e74efd7a', resource_group='azureml-examples'), name=densenet-onnx-example, id=densenet-onnx-example:1484, version=1484, tags={'area': 'Image classification', 'type': 'classification'}, properties={})
```


## Deploy webservice

Here the model is deployed to your local machine, where it runs in a Docker container. For other deployment targets, see [our documentation](https://docs.microsoft.com/azure/machine-learning/how-to-deploy-and-where?tabs=azcli).
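
A local deployment publishes the service on a port of your machine (6789 in the deployment cell), so that port must be free. One way to check beforehand is to try binding a socket to it; `port_is_free` is a hypothetical helper written for this walkthrough, and it only tells you whether the bind succeeded at the moment you call it.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Bind failed: something else is already using the address
            return False
    return True


# The local deployment in this walkthrough uses port 6789
if not port_is_free(6789):
    print("Port 6789 is in use; pass a different port to deploy_configuration")
```

If the check fails, choose another value for `LocalWebservice.deploy_configuration(port=...)`.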


```python
from random import randint

from azureml.core import Environment
from azureml.core.model import InferenceConfig
from azureml.core.webservice import LocalWebservice

# Add a random suffix so repeated runs don't collide on the service name
service_name = "triton-densenet-onnx-local" + str(randint(10000, 99999))

# Start from the curated Triton environment and add the packages the entry script needs
env = Environment.get(ws, "AzureML-Triton").clone("AML-Triton-Vision")
for pip_package in ["pillow", "numpy"]:
    env.python.conda_dependencies.add_pip_package(pip_package)

env.environment_variables["WORKER_COUNT"] = "1"

env.register(ws)
my_env = Environment.get(ws, "AML-Triton-Vision")

inference_config = InferenceConfig(
    source_directory="src",
    # this entry script is where we dispatch a call to the Triton server
    entry_script="score_densenet.py",
    environment=my_env,
)

# Run the service in a local Docker container listening on port 6789
config = LocalWebservice.deploy_configuration(port=6789)

service = Model.deploy(
    workspace=ws,
    name=service_name,
    models=[model],
    inference_config=inference_config,
    deployment_config=config,
    overwrite=True,
)

service.wait_for_deployment(show_output=True)
```

```
Downloading model densenet-onnx-example:1484 to /tmp/azureml_taqrk6ar/densenet-onnx-example/1484
Generating Docker build context.
Package creation Succeeded
Logging into Docker registry 0e14976437204610b0f33e3f974544ac.azurecr.io
Logging into Docker registry 0e14976437204610b0f33e3f974544ac.azurecr.io
Building Docker image from Dockerfile...
Step 1/5 : FROM 0e14976437204610b0f33e3f974544ac.azurecr.io/azureml/azureml_f48883605024e109cfd4abf348b76179
 ---> a4e431054899
Step 2/5 : COPY azureml-app /var/azureml-app
 ---> 349862673b2f
Step 3/5 : RUN mkdir -p '/var/azureml-app' && echo eyJhY2NvdW50Q29udGV4dCI6eyJzdWJzY3JpcHRpb25JZCI6IjY1NjA1NzVkLWZhMDYtNGU3ZC05NWZiLWY5NjJlNzRlZmQ3YSIsInJlc291cmNlR3JvdXBOYW1lIjoiYXp1cmVtbC1leGFtcGxlcyIsImFjY291bnROYW1lIjoiZGVmYXVsdCIsIndvcmtzcGFjZUlkIjoiMGUxNDk3NjQtMzcyMC00NjEwLWIwZjMtM2UzZjk3NDU0NGFjIn0sIm1vZGVscyI6e30sIm1vZGVsc0luZm8iOnt9fQ== | base64 --decode > /var/azureml-app/model_config_map.json
 ---> Running in 2cbfa4391b1c
 ---> 03a071148aa9
Step 4/5
```

```python
print(service.get_logs())
```

```
2021-01-30T00:36:00,215213000+00:00 - triton/run 
2021-01-30T00:36:00,215178200+00:00 - gunicorn/run 
2021-01-30T00:36:00,221996500+00:00 - rsyslog/run 

== Triton Inference Server ==

NVIDIA Release 20.10 (build <unknown>)

Copyright (c) 2018-2020, NVIDIA CORPORATION.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.
2021-01-30T00:36:00,321777900+00:00 - Waiting for Triton server to get ready ...

   Use 'nvidia-docker run' to start this container; see
   https://github.com/NVIDIA/nvidia-docker/wiki/nvidia-docker .

NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be
   insufficient for the inference server.  NVIDIA recommends the use of the following flags:
   nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...

E0130 00:36:02.368082 13 pinned_memory_manager.cc:192] failed to all
```
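
The container log interleaves output from Triton, gunicorn, and rsyslog. Triton writes its own messages in a glog-style format, where each line starts with a severity letter (`I`, `W`, `E`, or `F`) followed by the month and day, as in the `E0130 ...` line above. The sketch below, a hypothetical helper rather than part of the Azure ML SDK, keeps only the error and fatal lines from a log string such as the one returned by `service.get_logs()`. The sample text reuses lines from the output above.

```python
import re

# Severity letter, MMDD date, then a time such as 00:36:02.368082
GLOG_LINE = re.compile(r"^([IWEF])(\d{4}) (\d{2}:\d{2}:\d{2}\.\d+)")


def triton_errors(log_text):
    """Return log lines whose glog severity is E (error) or F (fatal)."""
    errors = []
    for line in log_text.splitlines():
        match = GLOG_LINE.match(line)
        if match and match.group(1) in ("E", "F"):
            errors.append(line)
    return errors


sample = """2021-01-30T00:36:00,321777900+00:00 - Waiting for Triton server to get ready ...
NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be
E0130 00:36:02.368082 13 pinned_memory_manager.cc:192] failed to all"""

for line in triton_errors(sample):
    print(line)
```

Against a live deployment, `triton_errors(service.get_logs())` gives a short list to inspect first when the service fails to start.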

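## Test the webservice

The next cell downloads a sample image of a peacock and posts its raw bytes to the service's scoring URI. The response contains the predicted ImageNet class index and label.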

```python
import requests

# Send the raw image bytes as the request body
headers = {"Content-Type": "application/octet-stream"}

test_sample = requests.get("https://aka.ms/peacock-pic", allow_redirects=True).content
resp = requests.post(service.scoring_uri, data=test_sample, headers=headers)
print(resp.text)
```

```
84 : PEACOCK
```
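
The entry script returns plain text in the form `<class index> : <label>`, as in the `84 : PEACOCK` output above. If you want to use the prediction programmatically, a small parser like the hypothetical `parse_prediction` below handles that format. It assumes every response follows this exact pattern, which depends on `score_densenet.py` and may not hold if you change the script.

```python
def parse_prediction(text):
    """Split a response such as '84 : PEACOCK' into (84, 'PEACOCK')."""
    # partition splits on the first colon only, so the label keeps any later colons
    index, sep, label = text.partition(":")
    if not sep:
        raise ValueError(f"unexpected response format: {text!r}")
    return int(index.strip()), label.strip()


print(parse_prediction("84 : PEACOCK"))  # (84, 'PEACOCK')
```

With the live service, `parse_prediction(resp.text)` returns the class index as an integer alongside the label.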


## Delete the webservice and the downloaded models

```python
service.delete()
delete_triton_models(prefix)
```

```
Container has been successfully cleaned up.
successfully deleted model: densenet_onnx
successfully deleted model: bidaf-9
```


## Next steps

Try changing the deployment configuration to [deploy to Azure Kubernetes Service](https://docs.microsoft.com/azure/machine-learning/how-to-deploy-azure-kubernetes-service?tabs=python) for higher availability and better scalability.