# Deploy to Triton Inference Server locally

description: (preview) deploy a DenseNet image classification model locally via Triton

Please note that this Public Preview release is subject to the [Supplemental Terms of Use for Microsoft Azure Previews](https://azure.microsoft.com/support/legal/preview-supplemental-terms/).

In [1]:
from azureml.core import Workspace

ws = Workspace.from_config()
ws

Workspace.create(name='default', subscription_id='6560575d-fa06-4e7d-95fb-f962e74efd7a', resource_group='azureml-examples')

## Download model

For Triton Inference Server to load a model, the model files must follow the directory structure that Triton expects. [Read more about the model repository layout](https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/model_repository.html).
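
Triton expects a model repository with one directory per model, each containing at least one numerically named version subdirectory that holds the model file, plus an optional `config.pbtxt`. As a minimal sketch, the snippet below builds a throwaway repository in a temporary directory and checks it against those rules. The helper `find_layout_problems`, the directory names, and the placeholder `model.onnx` file are illustrative assumptions; they are not part of `model_utils`, and the files fetched by `download_triton_models` may be arranged differently.

```python
import tempfile
from pathlib import Path


def find_layout_problems(repo: Path) -> list:
    """Return human-readable problems found in a Triton-style model repository.

    Hypothetical helper for illustration; it checks only the basic
    <model-name>/<version>/<model-file> nesting, not config.pbtxt contents.
    """
    problems = []
    model_dirs = [d for d in sorted(repo.iterdir()) if d.is_dir()]
    if not model_dirs:
        problems.append(f"{repo.name}: no model directories")
    for model_dir in model_dirs:
        # Version directories are named with integers, for example "1".
        versions = [v for v in model_dir.iterdir() if v.is_dir() and v.name.isdigit()]
        if not versions:
            problems.append(f"{model_dir.name}: no numeric version directory")
        for version in versions:
            if not any(f.is_file() for f in version.iterdir()):
                problems.append(f"{model_dir.name}/{version.name}: no model file")
    return problems


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp) / "triton"

    # A well-formed entry: <repo>/densenet_onnx/1/model.onnx (empty placeholder file).
    good_version = repo / "densenet_onnx" / "1"
    good_version.mkdir(parents=True)
    (good_version / "model.onnx").write_bytes(b"")

    # A malformed entry: a model directory with no version subdirectory.
    (repo / "broken_model").mkdir()

    layout_problems = find_layout_problems(repo)

print(layout_problems)
```

Running a check like this on the downloaded models before registering them can surface a misplaced file earlier than a failed container start would.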

In [2]:
import git
import os
import sys

from pathlib import Path

# get the root of the repo
prefix = Path(git.Repo(".", search_parent_directories=True).working_tree_dir)

# Enables us to import helper functions as Python modules
path_to_insert = str(prefix.joinpath("code", "deployment", "triton"))
if path_to_insert not in sys.path:
    sys.path.insert(1, path_to_insert)

from model_utils import download_triton_models, delete_triton_models


download_triton_models(prefix)

successfully downloaded model: densenet_onnx
successfully downloaded model: bidaf-9


## Register model

A registered model is a logical container stored in the cloud. It holds all files located at `model_path` and is associated with a version number and other metadata.

In [3]:
from azureml.core.model import Model

model_path = prefix.joinpath("models")

model = Model.register(
    model_path=model_path,
    model_name="densenet-onnx-example",
    tags={"area": "Image classification", "type": "classification"},
    description="Image classification trained on Imagenet Dataset",
    workspace=ws,
)

print(model)

Registering model densenet-onnx-example
Model(workspace=Workspace.create(name='default', subscription_id='6560575d-fa06-4e7d-95fb-f962e74efd7a', resource_group='azureml-examples'), name=densenet-onnx-example, id=densenet-onnx-example:531, version=531, tags={'area': 'Image classification', 'type': 'classification'}, properties={})


## Deploy webservice

This example deploys to local compute. For other options, see [our documentation](https://docs.microsoft.com/azure/machine-learning/how-to-deploy-and-where?tabs=azcli).


In [8]:
from azureml.core.webservice import LocalWebservice
from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.model import InferenceConfig
from random import randint

service_name = "triton-densenet-onnx-local" + str(randint(10000, 99999))

# This doesn't work because the install order is not respected
# env = Environment.get(ws, "AzureML-Triton").clone("triton-example")

# for pip_package in ["pillow", "nvidia-pyindex", "tritonclient[http]"]:
#     env.python.conda_dependencies.add_pip_package(pip_package)


env = Environment("triton-example")
env.docker.base_image = None
env.docker.base_dockerfile = prefix.joinpath("notebooks", "triton", "docker", "Dockerfile")
env.python.user_managed_dependencies = True
env.python.interpreter_path = "/opt/miniconda/bin/python"
env.inferencing_stack_version = "latest"


# conda_dep.add_pip_package('nvidia-pyindex')
# conda_dep.add_pip_package('tritonclient')
# conda_dep.add_pip_package('pillow')
# env.python.conda_dependencies = conda_dep
env.environment_variables["WORKER_COUNT"] = "1"

inference_config = InferenceConfig(
    # this entry script is where we dispatch a call to the Triton server
    entry_script="score_densenet.py",
    source_directory=prefix.joinpath("code", "deployment", "triton"),
    environment=env,
)

config = LocalWebservice.deploy_configuration(port=6789)

service = Model.deploy(
    workspace=ws,
    name=service_name,
    models=[model],
    inference_config=inference_config,
    deployment_config=config,
    overwrite=True,
)

service.wait_for_deployment(show_output=True)

Downloading model densenet-onnx-example:531 to /tmp/azureml_5wnq75zm/densenet-onnx-example/531
Generating Docker build context.
Package creation Succeeded
Logging into Docker registry 0e14976437204610b0f33e3f974544ac.azurecr.io
Logging into Docker registry 0e14976437204610b0f33e3f974544ac.azurecr.io
Building Docker image from Dockerfile...
Step 1/5 : FROM 0e14976437204610b0f33e3f974544ac.azurecr.io/azureml/azureml_4c2f3a03e46a59f88531c71f6ec2e688
 ---> 9488af85a207
Step 2/5 : COPY azureml-app /var/azureml-app
 ---> 434e8e2f03da
Step 3/5 : RUN mkdir -p '/var/azureml-app' && echo eyJhY2NvdW50Q29udGV4dCI6eyJzdWJzY3JpcHRpb25JZCI6IjY1NjA1NzVkLWZhMDYtNGU3ZC05NWZiLWY5NjJlNzRlZmQ3YSIsInJlc291cmNlR3JvdXBOYW1lIjoiYXp1cmVtbC1leGFtcGxlcyIsImFjY291bnROYW1lIjoiZGVmYXVsdCIsIndvcmtzcGFjZUlkIjoiMGUxNDk3NjQtMzcyMC00NjEwLWIwZjMtM2UzZjk3NDU0NGFjIn0sIm1vZGVscyI6e30sIm1vZGVsc0luZm8iOnt9fQ== | base64 --decode > /var/azureml-app/model_config_map.json
 ---> Running in a47d1ae19f73
 ---> c8c2f3d0ac4e
Step 4/5 :

WebserviceException: WebserviceException:
	Message: Error: Container has crashed. Did your init method fail?
	InnerException None
	ErrorResponse 
{
    "error": {
        "message": "Error: Container has crashed. Did your init method fail?"
    }
}

In [26]:
service.reload()

Container has been successfully cleaned up.
Starting Docker container...
Docker container running.


## Test the webservice

In [19]:
import requests

headers = {"Content-Type": "application/octet-stream"}

data_file = prefix.joinpath("data", "raw", "images", "peacock.jpg")
# Read the image bytes without leaving a file handle open
test_sample = data_file.read_bytes()
resp = requests.post(service.scoring_uri, data=test_sample, headers=headers)
print(resp.text)

{"message": "Expects Content-Type to be application/json"}


In [9]:
print(service.get_logs())

2020-11-06T19:38:30,208456800+00:00 - iot-server/run 
2020-11-06T19:38:30,208483800+00:00 - gunicorn/run 
2020-11-06T19:38:30,209136300+00:00 - rsyslog/run 
2020-11-06T19:38:30,212484900+00:00 - triton/run 
2020-11-06T19:38:30,213213200+00:00 - nginx/run 
./run: line 14: exec: gunicorn: not found

== Triton Inference Server ==

NVIDIA Release 20.08 (build 15533555)

Copyright (c) 2018-2020, NVIDIA CORPORATION.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.
2020-11-06T19:38:30,266443500+00:00 - gunicorn/finish 127 0
2020-11-06T19:38:30,268800800+00:00 - Exit code 127 is not normal. Killing image.
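
In the log above, the line `exec: gunicorn: not found` followed by `gunicorn/finish 127 0` suggests that the web server process never started; 127 is the exit status a shell conventionally returns when a command cannot be found. As a rough sketch for triaging such logs, the snippet below scans log text for `<service>/finish <code>` lines and reports non-zero exit codes. The helper `failed_services` and the assumed line format are illustrative and inferred only from this log excerpt; in the notebook you would pass `service.get_logs()` instead of the hard-coded string.

```python
import re

# Excerpt copied from the service log shown above.
log_text = """\
2020-11-06T19:38:30,213213200+00:00 - nginx/run
./run: line 14: exec: gunicorn: not found
2020-11-06T19:38:30,266443500+00:00 - gunicorn/finish 127 0
2020-11-06T19:38:30,268800800+00:00 - Exit code 127 is not normal. Killing image.
"""

# Assumed format: "<timestamp> - <service>/finish <exit code> <second number>".
# The second number is matched but ignored.
FINISH_LINE = re.compile(r" - (?P<service>[\w-]+)/finish (?P<code>\d+) \d+\s*$")


def failed_services(text):
    """Map each service name to its non-zero exit code (hypothetical helper)."""
    failures = {}
    for line in text.splitlines():
        match = FINISH_LINE.search(line)
        if match and int(match.group("code")) != 0:
            failures[match.group("service")] = int(match.group("code"))
    return failures


failures = failed_services(log_text)
print(failures)
```

A `gunicorn` entry in the result points at the Python interpreter or image setup (for example, the `interpreter_path` configured on the environment) as a likely place to look, though the log alone does not confirm the cause.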



## Delete the webservice and the downloaded model

In [None]:
service.delete()
delete_triton_models(prefix)

## Next steps

Try changing the deployment configuration to [deploy to Azure Kubernetes Service](https://docs.microsoft.com/azure/machine-learning/how-to-deploy-azure-kubernetes-service?tabs=python) for higher availability and better scalability.