# Deploying a web service to Azure Container Instances (ACI)

This notebook shows the steps for deploying a model as a web service to ACI. The workflow is similar no matter where you deploy your model:

1. Register the model.
2. Prepare to deploy. (Specify assets, usage, compute target.)
3. Deploy the model to the compute target.
4. Test the deployed model, also called a web service.
5. Consume the model using Power BI.

In [1]:
from azureml.core import Workspace
# This notebook deploys to ACI, so the AKS-specific classes are not needed
from azureml.core.webservice import Webservice, AciWebservice
from azureml.core.model import Model

In [2]:
import azureml.core
print(azureml.core.VERSION)

1.4.0


# Get workspace
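Load an existing workspace. The cell below passes the subscription ID, resource group, and workspace name explicitly; the commented-out `Workspace.from_config()` call loads the same information from a `config.json` file instead.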

In [3]:
from azureml.core import Workspace, Dataset
from azureml.core.authentication import InteractiveLoginAuthentication

interactive_auth = InteractiveLoginAuthentication(tenant_id="19479f88-8eac-45d2-a1bf-69d33854a3fa")
# Alternatively, load the workspace defined in the default config.json file
# ws = Workspace.from_config()
ws = Workspace(subscription_id="5e22d967-997b-49c7-8ca1-7ccfbf37e621",
               resource_group="rg-cbui-course532",
               workspace_name="amlwksphol",
               auth=interactive_auth)
print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\n')

amlwksphol
rg-cbui-course532
westus
5e22d967-997b-49c7-8ca1-7ccfbf37e621


# Register the model
Register an existing trained model and add a description and tags.

This is the model you've already trained using manual training or using [Automated Machine Learning](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-create-portal-experiments).

In the code snippet below we use the already trained model `model.pkl`, which is saved in the folder that contains this notebook. We register this model under the name `attrition_model`. The scoring script later refers to the model by the same name.

In [5]:
#Register the model
from azureml.core.model import Model

# If the model was already registered during training, uncomment the line below instead.
# Make sure it was registered with the name "attrition_model".
# model = Model(ws, 'attrition_model')

# Otherwise, register the model.pkl file provided in the same folder as this notebook.
model = Model.register(model_path = "model.pkl", # this points to a local file
                       model_name = "attrition_model", # this is the name the model is registered as
                       tags = {'area': "HR", 'type': "attrition"},
                       description = "Attrition model to understand attrition risk",
                       workspace = ws)

print(model.name, model.description, model.version)

Registering model attrition_model
attrition_model Attrition model to understand attrition risk 1


# Prepare to deploy

To deploy the model, you need the following items:

- **An entry script.** This script accepts requests, scores them by using the model, and returns the results.
- **Dependencies**, like helper scripts or Python/Conda packages required to run the entry script or model.
- **The deployment configuration** for the compute target that hosts the deployed model. This configuration describes things like memory and CPU requirements needed to run the model.

## 1. Define your entry script and dependencies

### Entry script

We will first write the entry script as shown below. Note a few points in the entry script.

The script contains two functions that load and run the model:

**init()**: Typically, this function loads the model into a global object. This function is run only once, when the Docker container for your web service is started.

When you register a model, you provide a model name that's used for managing the model in the registry. You use this name with the Model.get_model_path() method to retrieve the path of the model file or files on the local file system. If you register a folder or a collection of files, this API returns the path of the directory that contains those files.

**run(input_data)**: This function uses the model to predict a value based on the input data. Inputs and outputs of the run typically use JSON for serialization and deserialization. You can also work with raw binary data. You can transform the data before sending it to the model or before returning it to the client.

In [7]:
%%writefile score.py

import pandas as pd
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
from azureml.core.model import Model

import json
import pickle
import numpy as np

from inference_schema.schema_decorators import input_schema, output_schema
from inference_schema.parameter_types.numpy_parameter_type import NumpyParameterType
from inference_schema.parameter_types.pandas_parameter_type import PandasParameterType


input_sample = pd.DataFrame(data=[{'Age': 41, 'BusinessTravel': 'Travel_Rarely', 'DailyRate': 1102, 'Department': 'Sales', 'DistanceFromHome': 1, 'Education': 2, 'EducationField': 'Life Sciences', 'EnvironmentSatisfaction': 2, 'Gender': 'Female', 'HourlyRate': 94, 'JobInvolvement': 3, 'JobLevel': 2, 'JobRole': 'Sales Executive', 'JobSatisfaction': 4, 'MaritalStatus': 'Single', 'MonthlyIncome': 5993, 'MonthlyRate': 19479, 'NumCompaniesWorked': 8, 'OverTime': 'No', 'PercentSalaryHike': 11, 'PerformanceRating': 3, 'RelationshipSatisfaction': 1, 'StockOptionLevel': 0, 'TotalWorkingYears': 8, 'TrainingTimesLastYear': 0, 'WorkLifeBalance': 1, 'YearsAtCompany': 6, 'YearsInCurrentRole': 4, 'YearsSinceLastPromotion': 0, 'YearsWithCurrManager': 5}])
output_sample = np.array([0])


def init():
    global model
    # Look up the registered model by name, then deserialize the model file
    # back into a scikit-learn model
    model_path = Model.get_model_path('attrition_model')
    model = joblib.load(model_path)


@input_schema('data', PandasParameterType(input_sample))
@output_schema(NumpyParameterType(output_sample))
def run(data):
    try:
        result = model.predict(data)
        return json.dumps({"result": result.tolist()})
    except Exception as e:
        result = str(e)
        return json.dumps({"error": result})

Overwriting score.py
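
Before deploying, it can help to exercise the `init()`/`run()` contract locally. The sketch below is not the entry script itself: `StubModel` is a hypothetical stand-in for the registered scikit-learn model, and the schema decorators are left out so that the example needs only the standard library. It shows that `run()` always returns a JSON string, with either a `result` or an `error` key.

```python
import json


class StubModel:
    """Hypothetical stand-in for the scikit-learn model loaded in init()."""

    def predict(self, rows):
        # Pretend every row is "no attrition" (class 0).
        return [0 for _ in rows]


model = None


def init():
    # The real script loads the registered model from disk here.
    global model
    model = StubModel()


def run(data):
    # Mirrors the error handling in score.py: always return a JSON string.
    try:
        result = model.predict(data)
        return json.dumps({"result": list(result)})
    except Exception as e:
        return json.dumps({"error": str(e)})


init()
ok = json.loads(run([{"Age": 41}, {"Age": 49}]))
bad = json.loads(run(None))  # iterating None raises TypeError inside predict
print(ok)
print(bad)
```

Because exceptions are converted into an `error` payload, a failing request still returns a well-formed response, so clients should check for the `error` key rather than rely on the HTTP status alone.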


### Automatic schema generation
To automatically generate a schema for your web service, provide a sample of the input and/or output in the constructor for one of the defined type objects. The type and sample are used to automatically create the schema. Azure Machine Learning then creates an OpenAPI (Swagger) specification for the web service during deployment.
To use schema generation, include the _inference-schema_ package in your Conda environment file.
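
To make the idea of deriving a schema from a sample concrete, here is a minimal, standard-library-only sketch. It is an illustration of the concept, not how the inference-schema package is implemented; the function name `schema_from_sample` and the type mapping are assumptions made for this example.

```python
import json

# Map Python types to OpenAPI type names (illustrative subset).
_OPENAPI_TYPES = {bool: "boolean", int: "integer", float: "number", str: "string"}


def schema_from_sample(sample_record):
    """Build an OpenAPI-style object schema from one example record."""
    properties = {
        name: {"type": _OPENAPI_TYPES.get(type(value), "object"), "example": value}
        for name, value in sample_record.items()
    }
    return {"type": "object", "properties": properties, "required": sorted(properties)}


# A trimmed version of the input sample used in the entry script.
sample_record = {"Age": 41, "Department": "Sales", "OverTime": "No"}
schema = schema_from_sample(sample_record)
print(json.dumps(schema, indent=2))
```

The sample therefore does double duty: it documents the expected column names and types, and it provides the example values shown in the generated specification.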

### Define dependencies

The following YAML is the Conda dependencies file we will use for inference. If you want to use automatic schema generation, your entry script must import the inference-schema packages.

In [8]:
import os
os.makedirs("deployment-config/", exist_ok = True)

In [9]:
%%writefile deployment-config/inference-env.yml

name: project_environment
dependencies:
- python=3.6.2

- pip:
  - sklearn-pandas
  - azureml-defaults
  - azureml-core
  - inference-schema[numpy-support]
- scikit-learn
- pandas

Writing deployment-config/inference-env.yml


In [10]:
from azureml.core import Environment

# Instantiate environment
inference_env = Environment.from_conda_specification(name = "inference-env",
                                                     file_path = "deployment-config/inference-env.yml")

## 2. Define your inference configuration

The inference configuration describes how to configure the model to make predictions. This configuration isn't part of your entry script. It references your entry script and is used to locate all the resources required by the deployment. It's used later, when you deploy the model.

In [11]:
from azureml.core.model import InferenceConfig

inference_config = InferenceConfig(entry_script='score.py', environment=inference_env)

## 3. Define your deployment configuration

Before deploying your model, you must define the deployment configuration. The deployment configuration is specific to the compute target that will host the web service. The deployment configuration isn't part of your entry script. It's used to define the characteristics of the compute target that will host the model and entry script.

In [12]:
from azureml.core.webservice import AciWebservice

aciconfig = AciWebservice.deploy_configuration(cpu_cores=1,
                                               memory_gb=1,
                                               enable_app_insights=True,
                                               auth_enabled=True,
                                               tags = {'area': "HR", 'type': "attrition"}, 
                                               description='Explain predictions on employee attrition')


# Deploy Model as Webservice on Azure Container Instance

Deployment uses the inference configuration and the deployment configuration to deploy the models. The deployment process is similar regardless of the compute target.

In [17]:
service = Model.deploy(ws, name='predictattritionsvc', models=[model], 
                       inference_config=inference_config, deployment_config=aciconfig)
service.wait_for_deployment(True)
print(service.state)

Running..........................................................................................................................................................................................................................................
TimedOut


ERROR - Service deployment polling reached non-successful terminal state, current service state: Unhealthy
Operation ID: df99f122-5176-42d3-9aca-b141996cb479
More information can be found using '.get_logs()'
Error:
{
  "code": "DeploymentTimedOut",
  "statusCode": 504,
  "message": "The deployment operation polling has TimedOut. The service creation is taking longer than our normal time. We are still trying to achieve the desired state for the web service. Please check the webservice state for the current webservice health. You can run print(service.state) from the python SDK to retrieve the current state of the webservice."
}

WebserviceException: WebserviceException:
	Message: Service deployment polling reached non-successful terminal state, current service state: Unhealthy
Operation ID: df99f122-5176-42d3-9aca-b141996cb479
More information can be found using '.get_logs()'
Error:
{
  "code": "DeploymentTimedOut",
  "statusCode": 504,
  "message": "The deployment operation polling has TimedOut. The service creation is taking longer than our normal time. We are still trying to achieve the desired state for the web service. Please check the webservice state for the current webservice health. You can run print(service.state) from the python SDK to retrieve the current state of the webservice."
}
	InnerException None
	ErrorResponse 
{
    "error": {
        "message": "Service deployment polling reached non-successful terminal state, current service state: Unhealthy\nOperation ID: df99f122-5176-42d3-9aca-b141996cb479\nMore information can be found using '.get_logs()'\nError:\n{\n  \"code\": \"DeploymentTimedOut\",\n  \"statusCode\": 504,\n  \"message\": \"The deployment operation polling has TimedOut. The service creation is taking longer than our normal time. We are still trying to achieve the desired state for the web service. Please check the webservice state for the current webservice health. You can run print(service.state) from the python SDK to retrieve the current state of the webservice.\"\n}"
    }
}
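
The timeout above only means that polling gave up; as the message says, the service state can still be checked afterwards. The following is a minimal sketch of a polling helper, assuming the state strings `Healthy`, `Unhealthy`, and `Failed` are the ones of interest. The helper `wait_until_healthy` is hypothetical and takes a callable, so it can be tried without a live service; the commented usage relies on `update_deployment_state()`, `state`, and `get_logs()` of the `service` object.

```python
import time


def wait_until_healthy(get_state, timeout_s=600, interval_s=10, sleep=time.sleep):
    """Poll get_state() until it returns a terminal state or the timeout passes.

    Returns the last observed state. 'Healthy', 'Unhealthy' and 'Failed'
    are treated as terminal here; adjust the set for your SDK version.
    """
    terminal = {"Healthy", "Unhealthy", "Failed"}
    waited = 0
    state = get_state()
    while state not in terminal and waited < timeout_s:
        sleep(interval_s)
        waited += interval_s
        state = get_state()
    return state


# Offline demo with a fake sequence of states and no real sleeping:
fake_states = iter(["Transitioning", "Transitioning", "Healthy"])
print(wait_until_healthy(lambda: next(fake_states), sleep=lambda _: None))

# Hypothetical usage against the deployed service (requires azureml-core):
#
#   def current_state():
#       service.update_deployment_state()  # refresh state from Azure
#       return service.state
#
#   if wait_until_healthy(current_state) != "Healthy":
#       print(service.get_logs())  # container logs, as the error suggests
```

When the state stays `Unhealthy`, the container logs are usually the quickest way to find the cause, for example an import error raised while the entry script is loading.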

## Web service schema

If you used automatic schema generation with your deployment, you can get the address of the OpenAPI specification for the service from the `swagger_uri` property (for example, `print(service.swagger_uri)`). Use a `GET` request or open the URI in a browser to retrieve the specification.

In [18]:
print("Swagger:", service.swagger_uri)
print("Scoring URL:", service.scoring_uri)

Swagger: http://e497e2cd-66fc-493c-a247-e2b664ea6fea.westus.azurecontainer.io/swagger.json
Scoring URL: http://e497e2cd-66fc-493c-a247-e2b664ea6fea.westus.azurecontainer.io/score
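
As a sketch of the `GET` approach, the helpers below download the specification and list the operations it declares. The names `fetch_spec` and `list_operations` are made up for this example, and the demo runs against a heavily trimmed, invented spec so that it works offline. If key authentication is enabled, the request may also need the `Authorization` header; that is not handled here.

```python
import json
import urllib.request


def fetch_spec(swagger_uri, timeout=10):
    """GET the OpenAPI document and parse it as JSON."""
    with urllib.request.urlopen(swagger_uri, timeout=timeout) as resp:
        return json.load(resp)


def list_operations(spec):
    """Return sorted (METHOD, path) pairs declared in an OpenAPI document."""
    http_methods = {"get", "post", "put", "patch", "delete", "head", "options"}
    return sorted(
        (method.upper(), path)
        for path, item in spec.get("paths", {}).items()
        for method in item
        if method.lower() in http_methods
    )


# Offline demo with a made-up, heavily trimmed spec:
demo_spec = {"paths": {"/score": {"post": {}}, "/": {"get": {}}}}
print(list_operations(demo_spec))

# With a live service: list_operations(fetch_spec(service.swagger_uri))
```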


# Test the deployed model

Every deployed web service provides a REST API, so you can create client applications in a variety of programming languages. If you've enabled key authentication for your service, you need to provide a service key as a `Bearer` token in your request header. If you've enabled token authentication for your service, you need to provide an Azure Machine Learning JWT token as a bearer token in your request header.

In [None]:
import json
import pandas as pd

# the sample below contains the data for an employee that is not an attrition risk
sample = pd.DataFrame(data=[{'Age': 41, 'BusinessTravel': 'Travel_Rarely', 'DailyRate': 1102, 'Department': 'Sales', 'DistanceFromHome': 1, 'Education': 2, 'EducationField': 'Life Sciences', 'EnvironmentSatisfaction': 2, 'Gender': 'Female', 'HourlyRate': 94, 'JobInvolvement': 3, 'JobLevel': 2, 'JobRole': 'Sales Executive', 'JobSatisfaction': 4, 'MaritalStatus': 'Single', 'MonthlyIncome': 5993, 'MonthlyRate': 19479, 'NumCompaniesWorked': 8, 'OverTime': 'No', 'PercentSalaryHike': 11, 'PerformanceRating': 3, 'RelationshipSatisfaction': 1, 'StockOptionLevel': 0, 'TotalWorkingYears': 8, 'TrainingTimesLastYear': 0, 'WorkLifeBalance': 1, 'YearsAtCompany': 6, 'YearsInCurrentRole': 4, 'YearsSinceLastPromotion': 0, 'YearsWithCurrManager': 5}])

# the sample below contains the data for an employee that is an attrition risk
# sample = pd.DataFrame(data=[{'Age': 49, 'BusinessTravel': 'Travel_Rarely', 'DailyRate': 1098, 'Department': 'Research & Development', 'DistanceFromHome': 4, 'Education': 2, 'EducationField': 'Medical', 'EnvironmentSatisfaction': 4, 'Gender': 'Female', 'HourlyRate': 21, 'JobInvolvement': 3, 'JobLevel': 2, 'JobRole': 'Laboratory Technician', 'JobSatisfaction': 3, 'MaritalStatus': 'Single', 'MonthlyIncome': 711, 'MonthlyRate': 2124, 'NumCompaniesWorked': 8, 'OverTime': 'Yes', 'PercentSalaryHike': 8, 'PerformanceRating': 4, 'RelationshipSatisfaction': 3, 'StockOptionLevel': 0, 'TotalWorkingYears': 2, 'TrainingTimesLastYear': 0, 'WorkLifeBalance': 3, 'YearsAtCompany': 2, 'YearsInCurrentRole': 1, 'YearsSinceLastPromotion': 0, 'YearsWithCurrManager': 1}])


# converts the sample to JSON string
sample = sample.to_json()

# deserializes sample to a python object 
sample = json.loads(sample)

# serializes sample to JSON formatted string as expected by the scoring script
sample = json.dumps({"data":sample})

# Will automatically send API key
prediction = service.run(sample)

print(prediction)
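
With pandas' default `to_json` orientation, the conversion steps above produce a column-oriented payload of the form `{"data": {column: {row_index: value}}}`. For a client that does not have pandas available, the same shape can be sketched with the standard library alone. The helper `build_payload` is hypothetical, and it assumes the default integer index (`"0"`, `"1"`, ...).

```python
import json


def build_payload(records):
    """Build {"data": {column: {row_index: value}}} from a list of row dicts."""
    columns = {}
    for i, record in enumerate(records):
        for name, value in record.items():
            columns.setdefault(name, {})[str(i)] = value
    return json.dumps({"data": columns})


# Trimmed record for illustration; a real request needs every model feature.
payload = build_payload([{"Age": 41, "OverTime": "No"}])
print(payload)
```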

## Using plain HTTP Requests

We can also just use a regular HTTP `POST` request:

In [None]:
import requests

url = service.scoring_uri
key1, key2 = service.get_keys()
print("URL:", url)
print("Key:", key1)

headers = {'Content-Type':'application/json',
           'Authorization': 'Bearer ' + key1}
resp = requests.post(url, sample, headers=headers)

print("prediction:", resp.text)
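
Because `run()` in the entry script returns a string produced by `json.dumps`, the HTTP response body may arrive as a JSON string that itself contains JSON. The sketch below handles both the singly and the doubly encoded case; `parse_prediction` is a hypothetical helper, and the double encoding is an assumption about the serving layer that is worth checking against the actual `resp.text`.

```python
import json


def parse_prediction(body_text):
    """Decode a scoring response, unwrapping one extra layer of JSON if present."""
    payload = json.loads(body_text)
    if isinstance(payload, str):
        # run() returned a JSON string, which was JSON-encoded again in transit.
        payload = json.loads(payload)
    if "error" in payload:
        raise RuntimeError(payload["error"])
    return payload["result"]


# Simulate both shapes without calling the service:
inner = json.dumps({"result": [0]})
print(parse_prediction(inner))              # singly encoded
print(parse_prediction(json.dumps(inner)))  # doubly encoded

# With a live response: parse_prediction(resp.text)
```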

We can now go to the Azure Portal, open the Application Insights instance that is associated with the workspace, go to `Logs`, and analyze, for example, the `requests` data. Your results should look similar to this:

![](images/app_insights.png)


## Deployment to AKS

For deployment to AKS, we could also use Python, but let's use the Azure ML CLI to create a Continuous Deployment pipeline!

# Consume the model using Power BI
You can also consume the model from Power BI. See details [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-consume-web-service#consume-the-service-from-power-bi).
