# Deploy to Triton Inference Server locally

description: (preview) deploy an image classification model trained on DenseNet locally via Triton

Please note that this Public Preview release is subject to the [Supplemental Terms of Use for Microsoft Azure Previews](https://azure.microsoft.com/support/legal/preview-supplemental-terms/).

In [1]:
from azureml.core import Workspace

ws = Workspace.from_config()
ws

Workspace.create(name='default', subscription_id='6560575d-fa06-4e7d-95fb-f962e74efd7a', resource_group='azureml-examples')

## Download model

Triton Inference Server can only load models that follow its model repository layout: one directory per model, typically holding the model configuration (`config.pbtxt`) and one or more numbered version subdirectories that contain the model files. [Read more about the directory structure that Triton expects](https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/model_repository.html).
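To catch layout problems before deploying, here is a minimal sketch that checks a local repository against that layout. `check_model_repository` is a hypothetical helper, not part of the Triton or Azure ML SDKs. It assumes every model ships an explicit `config.pbtxt` (Triton can auto-generate configuration for some backends, which this check ignores) and, following the Triton docs, treats only version directories named with a number that does not start with `0` as valid.

```python
from pathlib import Path


def check_model_repository(repo):
    """Return {model_name: [problems]}; an empty list means no problems were found.

    Hypothetical helper. Assumed layout, following the Triton model repository docs:
        <repo>/<model_name>/config.pbtxt
        <repo>/<model_name>/<version>/<model files>
    """
    repo = Path(repo)
    problems = {}
    for model_dir in sorted(p for p in repo.iterdir() if p.is_dir()):
        issues = []
        # This sketch requires an explicit configuration file for every model.
        if not (model_dir / "config.pbtxt").is_file():
            issues.append("missing config.pbtxt")
        # Triton ignores version directories that are not numeric or that start with "0".
        versions = sorted(
            v for v in model_dir.iterdir()
            if v.is_dir() and v.name.isdigit() and not v.name.startswith("0")
        )
        if not versions:
            issues.append("no valid version directory")
        # A version directory with no files cannot provide a model to load.
        for version in versions:
            if not any(version.iterdir()):
                issues.append(f"version {version.name} is empty")
        problems[model_dir.name] = issues
    return problems
```

For example, `check_model_repository(prefix.joinpath("models"))` should report an empty list for each model, assuming the download step places the models in that directory.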

In [2]:
import git
import os
import sys

from pathlib import Path

# get the root of the repo
prefix = Path(git.Repo(".", search_parent_directories=True).working_tree_dir)

# Enables us to import helper functions as Python modules
path_to_insert = str(prefix.joinpath("code", "deployment", "triton"))
if path_to_insert not in sys.path:
    sys.path.insert(1, path_to_insert)

from model_utils import download_triton_models, delete_triton_models


download_triton_models(prefix)

successfully downloaded model: densenet_onnx
successfully downloaded model: bidaf-9


## Register model

A registered model is a logical container, stored in your Azure Machine Learning workspace, for all the files located at `model_path`. Each registration is assigned a version number and can carry metadata such as tags and a description.
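Re-registering a model under the same name creates a new version rather than overwriting the previous one, which is why the registration output in this notebook reports version 510. The in-memory `ToyModelRegistry` below is a hypothetical illustration of that bookkeeping, not the Azure ML API: each `register` call snapshots the files under `model_path` and assigns the next version number for that name.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class RegisteredModel:
    name: str
    version: int
    files: tuple  # paths relative to model_path at registration time
    tags: dict = field(default_factory=dict)
    description: str = ""


class ToyModelRegistry:
    """Hypothetical in-memory stand-in for a model registry (not the azureml API)."""

    def __init__(self):
        self._models = {}  # name -> list of RegisteredModel, oldest first

    def register(self, model_path, model_name, tags=None, description=""):
        root = Path(model_path)
        # Snapshot every file under model_path, or the single file itself.
        if root.is_file():
            files = (root.name,)
        else:
            files = tuple(sorted(
                p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
            ))
        history = self._models.setdefault(model_name, [])
        # Versions are per name and increase by one on every registration.
        model = RegisteredModel(model_name, len(history) + 1, files, dict(tags or {}), description)
        history.append(model)
        return model

    def latest(self, model_name):
        return self._models[model_name][-1]
```

Keeping every version, instead of replacing the old one, lets a deployment pin a specific version while newer ones are registered.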

In [3]:
from azureml.core.model import Model

model_path = prefix.joinpath("models")

model = Model.register(
    model_path=model_path,
    model_name="densenet-onnx-example",
    tags={"area": "Image classification", "type": "classification"},
    description="Image classification trained on Imagenet Dataset",
    workspace=ws,
)

print(model)

Registering model densenet-onnx-example
Model(workspace=Workspace.create(name='default', subscription_id='6560575d-fa06-4e7d-95fb-f962e74efd7a', resource_group='azureml-examples'), name=densenet-onnx-example, id=densenet-onnx-example:510, version=510, tags={'area': 'Image classification', 'type': 'classification'}, properties={})


## Deploy webservice

Here we deploy to local compute, which runs the service in a Docker container on this machine. For other deployment targets, see [our documentation](https://docs.microsoft.com/azure/machine-learning/how-to-deploy-and-where?tabs=azcli).
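The deployment cell that follows passes `dummy_score.py` as the entry script. Azure Machine Learning entry scripts define an `init()` function, called once when the container starts, and a `run()` function, called for each scoring request. The real `dummy_score.py` isn't shown here, so the sketch below is a hypothetical stand-in: it assumes Triton listens on its default HTTP port (8000) inside the same container and, instead of calling Triton, only reports what it would forward.

```python
import json

# Hypothetical module state; set by init() before any request is served.
TRITON_HTTP_URL = None


def init():
    """Called once when the scoring container starts."""
    global TRITON_HTTP_URL
    # Assumption: Triton runs in the same container on its default HTTP port.
    TRITON_HTTP_URL = "localhost:8000"


def run(raw_data):
    """Called once per scoring request with the request body."""
    if TRITON_HTTP_URL is None:
        raise RuntimeError("init() has not been called")
    # A real script would preprocess raw_data and send it to Triton here;
    # this sketch only reports where the request would go and its size.
    return json.dumps({"triton_url": TRITON_HTTP_URL, "payload_bytes": len(raw_data)})
```

Splitting setup into `init()` keeps per-request work in `run()` small, since connection details are resolved only once per container.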


In [10]:
from azureml.core.webservice import LocalWebservice
from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.model import InferenceConfig
from random import randint

service_name = "triton-densenet-onnx-local" + str(randint(10000, 99999))
env = Environment("triton-example")
env.docker.base_image = None
env.docker.base_dockerfile = prefix.joinpath("notebooks", "triton", "docker", "Dockerfile")
env.python.conda_dependencies = CondaDependencies()
env.python.user_managed_dependencies = True
env.python.interpreter_path = "/opt/miniconda/bin/python"
env.inferencing_stack_version = "latest"


inference_config = InferenceConfig(
    # this entry script is where we dispatch a call to the Triton server
    entry_script="dummy_score.py",
    source_directory=prefix.joinpath("code", "deployment", "triton"),
    environment=env,
)

config = LocalWebservice.deploy_configuration(port=6789)

service = Model.deploy(
    workspace=ws,
    name=service_name,
    models=[model],
    inference_config=inference_config,
    deployment_config=config,
    overwrite=True,
)

service.wait_for_deployment(show_output=True)

-> 13372a84359a
Step 5/20 : RUN pip install nvidia-pyindex
 ---> Running in 185dea9994f1
Collecting nvidia-pyindex
  Downloading https://files.pythonhosted.org/packages/64/4c/dd413559179536b9b7247f15bf968f7e52b5f8c1d2183ceb3d5ea9284776/nvidia-pyindex-1.0.5.tar.gz
Building wheels for collected packages: nvidia-pyindex
  Building wheel for nvidia-pyindex (setup.py): started
  Building wheel for nvidia-pyindex (setup.py): finished with status 'done'
  Created wheel for nvidia-pyindex: filename=nvidia_pyindex-1.0.5-cp37-none-any.whl size=4171 sha256=a945ea1c16919e1ff3e98e3cf5fb8288e241cd97a8d4ff7bbd0f7f3120e950a5
  Stored in directory: /root/.cache/pip/wheels/5a/09/ce/acc25e8cebda16e490a9610b11f98c111e761d15090e9fb9a3
Successfully built nvidia-pyindex
Installing collected packages: nvidia-pyindex
Successfully installed nvidia-pyindex-1.0.5
Removing intermediate container 185dea9994f1
 ---> 0a471a54dab8
Step 6/20 : RUN pip install tritonclient[http]
 ---> Running in ded100923b55
Looking in 

In [26]:
service.reload()

Container has been successfully cleaned up.
Starting Docker container...
Docker container running.


## Test the webservice

In [28]:
import requests

headers = {"Content-Type": "application/octet-stream"}

data_file = prefix.joinpath("data", "raw", "images", "peacock.jpg")
test_sample = data_file.read_bytes()  # reads the image and closes the file
resp = requests.post(service.scoring_uri, data=test_sample, headers=headers)
print(resp.text)

unexpected shape for input 'data_0' for model 'densenet_onnx'. Expected [1,3,224,224], got [3,224,224]
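The response reports that the tensor sent as input `data_0` had shape [3, 224, 224] while the model expects [1, 3, 224, 224]: the leading batch dimension of size 1 is missing. Where that dimension should be added depends on the preprocessing in `dummy_score.py`, which isn't shown. The dependency-free sketch below, with hypothetical helper names, shows the idea on nested Python lists: wrapping a single [C, H, W] sample in an outer list yields a batch of one.

```python
def shape_of(nested):
    """Infer the shape of a nested list, assuming it is rectangular."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        if not nested:
            break
        nested = nested[0]
    return shape


def add_batch_dim(tensor):
    """Wrap a single sample in a batch of size 1: [C, H, W] -> [1, C, H, W]."""
    return [tensor]


def check_input_shape(tensor, expected):
    """Raise before sending a request whose shape the model would reject."""
    got = shape_of(tensor)
    if got != list(expected):
        raise ValueError(f"Expected {list(expected)}, got {got}")
    return tensor
```

With NumPy arrays, the equivalent is `np.expand_dims(sample, axis=0)` or `sample[np.newaxis, ...]`.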


In [11]:
print(service.get_logs())

ilable.
   Use 'nvidia-docker run' to start this container; see
   https://github.com/NVIDIA/nvidia-docker/wiki/nvidia-docker .

NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be
   insufficient for the inference server.  NVIDIA recommends the use of the following flags:
   nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...

tritonserver: /usr/lib/x86_64-linux-gnu/libcurl.so.4: version `CURL_OPENSSL_4' not found (required by /opt/tritonserver/bin/../lib/libtritonserver.so)
2020-11-05T04:26:24,656057700+00:00 - triton/run 

== Triton Inference Server ==

NVIDIA Release 20.07 (build 14649927)

Copyright (c) 2018-2020, NVIDIA CORPORATION.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.
find: File system loop detected; ‘/usr/bin/X11’ is part of the same file system loop as ‘/usr/b
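The captured logs include a `tritonserver` error about a missing `CURL_OPENSSL_4` library version and a warning that the default 64MB shared-memory limit may be insufficient. When logs are long, a small scanner can surface such lines. The sketch below is hypothetical: `scan_logs` and its patterns are taken only from the log lines shown above and are far from exhaustive.

```python
import re

# Hypothetical patterns derived from the log output above; extend as needed.
KNOWN_ISSUES = [
    (re.compile(r"version `[^']+' not found"),
     "shared-library version mismatch; tritonserver may have failed to start"),
    (re.compile(r"SHMEM allocation limit"),
     "default 64MB shared memory may be too small for Triton"),
]


def scan_logs(text):
    """Return (line_number, hint, line) for each log line matching a known pattern."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern, hint in KNOWN_ISSUES:
            if pattern.search(line):
                findings.append((lineno, hint, line.strip()))
                break  # report each line at most once
    return findings
```

For example: `for lineno, hint, line in scan_logs(service.get_logs()): print(lineno, hint, line)`.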

## Delete the webservice and the downloaded model

In [None]:
service.delete()
delete_triton_models(prefix)

## Next steps

Try changing the deployment configuration to [deploy to Azure Kubernetes Service](https://docs.microsoft.com/azure/machine-learning/how-to-deploy-azure-kubernetes-service?tabs=python) for higher availability and better scalability.