Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

# Deploying a web service to Azure Kubernetes Service (AKS)
In this notebook, we show the following steps for deploying a web service using Azure Machine Learning (AML):
- Create an image
- Test the image locally
- Provision an AKS cluster (one-time action)
- Deploy the service
- Test the web service

In [1]:
import pandas as pd
from utilities import text_to_json
import requests
import numpy as np
import json
from azureml.core import Workspace
from azureml.core.compute import AksCompute, ComputeTarget
from azureml.core.webservice import Webservice, AksWebservice
from azureml.core.image import Image
from azureml.core.model import Model
from azureml.core.conda_dependencies import CondaDependencies
from dotenv import set_key, get_key, find_dotenv

In [45]:
env_path = find_dotenv(raise_error_if_not_found=True)

AML will use the following information to create an image, provision a cluster and deploy a service. Replace the values in the following cell with your information.

In [3]:
# image_name = '<YOUR_IMAGE_NAME>'
# aks_service_name = '<YOUR_AKS_SERVICE_NAME>'
# aks_name = '<YOUR_AKS_NAME>'
# aks_location = '<YOUR_AKS_LOCATION>'
image_name = "lgbmimage"
aks_service_name = "lgbmservice"
aks_name = "fboylucpuaks"
aks_location = "eastus"

In [4]:
set_key(env_path, "image_name", image_name)
set_key(env_path, "aks_service_name", aks_service_name)
set_key(env_path, "aks_name", aks_name)
set_key(env_path, "aks_location", aks_location)

(True, 'aks_location', 'eastus')

## Get workspace
Load the existing workspace from the config file.

In [46]:
ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep="\n")

Found the config file in: /datadrive/MachineLearningNotebooks/mlaksdeployment/aml_config/config.json
fboyluamlsdkws
fboyluamlsdkrg
eastus2
edf507a2-6235-46c5-b560-fd463ba2e771


## Load model

In [48]:
model_name = 'question_match_model'
model_version = int(get_key(env_path, 'model_version'))
model = Model(ws, name=model_name, version=model_version)
print(model.name, model.version)

question_match_model 4


## Create an image
We now modify the `init()` function in the `score.py` created in the previous notebook so that it loads the model we registered to the workspace earlier.

In [57]:
%%writefile score.py

import sys
# Make the helper modules in ./scripts importable before the model code is imported
sys.path.append('./scripts/')

import json
import logging
import timeit as t

from azureml.core.model import Model
from duplicate_model import DuplicateModel


def init():
    # Called once when the container starts: load the registered model
    logger = logging.getLogger("scoring_script")
    global model
    model_name = 'question_match_model'
    model_path = Model.get_model_path(model_name)
    questions_path = './data_folder/questions.tsv'
    start = t.default_timer()
    model = DuplicateModel(model_path, questions_path)
    end = t.default_timer()
    loadTimeMsg = "Model loading time: {0} ms".format(round((end-start)*1000, 2))
    logger.info(loadTimeMsg)


def run(body):
    # Called per request: body is a JSON string with the question text under 'input'
    logger = logging.getLogger("scoring_script")
    json_load_text = json.loads(body)
    text_to_score = json_load_text['input']
    start = t.default_timer()
    resp = model.score(text_to_score)
    end = t.default_timer()
    logger.info("Prediction took {0} ms".format(round((end-start)*1000, 2)))
    return json.dumps(resp)

Overwriting score.py
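Because building an image takes several minutes, it can pay to exercise the `run()` logic in-process first. The sketch below copies the parsing and serialization steps of `run()` and swaps `DuplicateModel` for a hypothetical `StubModel` whose `score()` returns `[original_id, duplicate_id, probability]` triples, mirroring the output format shown later in this notebook. The stub and its values are assumptions for illustration, not the real model.

```python
import json


class StubModel:
    """Hypothetical stand-in for DuplicateModel: returns fixed
    [original_id, duplicate_id, probability] triples."""

    def score(self, text):
        # A real model would rank candidate original questions for `text`.
        return [[1, 2, 0.9], [3, 4, 0.1]]


model = StubModel()


def run(body):
    # Same parsing and serialization as run() in score.py, minus logging and timing.
    text_to_score = json.loads(body)["input"]
    resp = model.score(text_to_score)
    return json.dumps(resp)


result = run(json.dumps({"input": "how do i loop through a javascript object?"}))
print(result)  # [[1, 2, 0.9], [3, 4, 0.1]]
```

If this stub call fails, the real image would fail the same way, without waiting for a build.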


Let's specify the conda and pip dependencies for the image.

In [59]:
conda_pack = ["scikit-learn==0.19.1", "pandas==0.23.3"]
requirements = ["lightgbm==2.1.2", "azureml-defaults"]

In [60]:
lgbmenv = CondaDependencies.create(conda_packages=conda_pack, pip_packages=requirements)

with open("lgbmenv.yml", "w") as f:
    f.write(lgbmenv.serialize_to_string())

In [61]:
from azureml.core.image import ContainerImage

image_config = ContainerImage.image_configuration(
    execution_script="score.py",
    runtime="python",
    conda_file="lgbmenv.yml",
    description="Image with lightgbm model",
    tags={"area": "text", "type": "lightgbm"},
    dependencies=[
        "./data_folder/questions.tsv",
        "./duplicate_model.py",
        "./scripts/ItemSelector.py",
    ],
)

image = ContainerImage.create(
    name=image_name,
    # this is the model object
    models=[model],
    image_config=image_config,
    workspace=ws,
)

Creating image


In [62]:
%%time
image.wait_for_creation(show_output=True)

Running...........................................
SucceededImage creation operation finished for image lgbmimage:3, operation "Succeeded"
CPU times: user 880 ms, sys: 23.6 ms, total: 903 ms
Wall time: 3min 50s


In [63]:
print(image.name, image.version)

lgbmimage 3


In [64]:
image_version = str(image.version)
set_key(env_path, "image_version", image_version)

(True, 'image_version', '3')

You can find the logs of image creation in the following location.

In [14]:
image.image_build_log_uri

'https://eastus2ice.blob.core.windows.net/logs/fboyluamlsdkws7798851753_7704ebdfbaca4d1599c1e2b2c694ce79.txt?sp=r&sv=2017-04-17&sig=nQaDaK28OfITA9WV2mYRdu84c0LhIh6im3E3GZCj9nw%3D&se=2019-01-05T18%3A57%3A15Z&sr=b'

## Test image locally

Now, let's use one of the duplicate questions to test our image.

In [15]:
dupes_test_path = './data_folder/dupes_test.tsv'
dupes_test = pd.read_csv(dupes_test_path, sep='\t', encoding='latin1')
text_to_score = dupes_test.iloc[0, 4]  # first row, column 4 holds the question text
text_to_score

"javascript arrays as objects.  possible duplicate: length of javascript object (ie. associative array) loop through javascript object    i'm trying to make an array, where each item has some name and value. the code above doesn't work. tryed to make an object, but it doesn't have a length property - no for loop.  is it possible to use arrays in this context?"

In [16]:
jsontext = text_to_json(text_to_score)
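`text_to_json` comes from the project's `utilities` module, whose source is not shown here. Since `run()` in `score.py` reads `json.loads(body)['input']`, a compatible helper only needs to wrap the text under an `input` key. The sketch below is a hypothetical reimplementation under that assumption (`text_to_json_sketch` is our own name), not the module's actual code.

```python
import json


def text_to_json_sketch(text):
    """Hypothetical equivalent of utilities.text_to_json: wrap `text`
    in the {"input": ...} envelope that score.py's run() expects."""
    return json.dumps({"input": text})


body = text_to_json_sketch("javascript arrays as objects.")
print(body)  # {"input": "javascript arrays as objects."}

# run() recovers the original text the same way:
recovered = json.loads(body)["input"]
```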

In [65]:
%%time
image.run(input_data=jsontext)

Pulling image from ACR (this may take a few minutes depending on image size)...

{"status":"Digest: sha256:d875b7f6e0e0bc380e225a5680dc6f6481c25715a133d35ebedd27d956ffe27e"}
{"status":"Status: Downloaded newer image for fboyluamlsdkws7798851753.azurecr.io/lgbmimage:3"}
Starting Docker container...
Checking container health...
Making a scoring call...
Scoring result:
[[5223, 6700, 0.9404882121467983], [11922383, 11922384, 0.6376984742439135], [750486, 750506, 0.0022961800568457157], [684672, 684692, 0.00027900736310522206], [171251, 171256, 0.00019333518502312696], [1584370, 1584377, 0.00016380270604214681], [4057440, 4060176, 0.0001498665973955501], [5187530, 5187652, 8.410158019842015e-05], [2241875, 2241883, 5.764723610306164e-05], [5117127, 5117172, 2

Resources have been successfully cleaned up.
CPU times: user 153 ms, sys: 21.3 ms, total: 175 ms
Wall time: 36.8 s


'[[5223, 6700, 0.9404882121467983], [11922383, 11922384, 0.6376984742439135], [750486, 750506, 0.0022961800568457157], [684672, 684692, 0.00027900736310522206], [171251, 171256, 0.00019333518502312696], [1584370, 1584377, 0.00016380270604214681], [4057440, 4060176, 0.0001498665973955501], [5187530, 5187652, 8.410158019842015e-05], [2241875, 2241883, 5.764723610306164e-05], [5117127, 5117172, 2.1329214222194568e-05], [126100, 4889658, 2.0901140965526787e-05], [12953704, 12953750, 1.777154932479563e-05], [1885557, 1885660, 1.4987021596611661e-05], [8495687, 8495740, 9.791005034068572e-06], [1129216, 1129270, 8.664625464258679e-06], [4255472, 4255480, 4.568289027003899e-06], [7364150, 7364307, 4.000594074848036e-06], [7837456, 14853974, 3.860920440981213e-06], [5891840, 5891929, 3.854145618824322e-06], [3583724, 3583740, 3.634922709192637e-06], [1451009, 1451043, 2.813801717908391e-06], [6487366, 6487376, 2.454908335371808e-06], [2274242, 2274327, 1.5704565243474771e-06], [85992, 86014, 1

## Provision the AKS cluster
This is a one-time setup. After the cluster has been created, you can reuse it for multiple deployments. If you delete the cluster or the resource group that contains it, you will have to recreate it.

In [18]:
# Use a configuration of 2 VMs
prov_config = AksCompute.provisioning_configuration(
    agent_count=2, vm_size="Standard_D4_v2", location=aks_location
)

# Create the cluster
aks_target = ComputeTarget.create(
    workspace=ws, name=aks_name, provisioning_configuration=prov_config
)

In [19]:
%%time
aks_target.wait_for_completion(show_output=True)
print(aks_target.provisioning_state)
print(aks_target.provisioning_errors)

Creating.............................................................................................................................................
SucceededProvisioning operation finished, operation "Succeeded"
Succeeded
None
CPU times: user 2.3 s, sys: 114 ms, total: 2.42 s
Wall time: 12min 30s


Let's check that the cluster was created successfully.

In [42]:
aks_status = aks_target.get_status()

In [43]:
assert aks_status == 'Succeeded', 'AKS failed to create'

## Deploy the web service to AKS

Next, we deploy the web service with two replicas (pods), each allocated 1 CPU core.

In [22]:
# Set the web service configuration
aks_config = AksWebservice.deploy_configuration(num_replicas=2, cpu_cores=1)
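Before deploying, a quick arithmetic check confirms that the requested pods fit in the cluster provisioned above. Two replicas at 1 core each request 2 cores, and the cluster has 2 `Standard_D4_v2` nodes. The sketch assumes 8 vCPUs per `Standard_D4_v2` node (verify against the Azure VM size documentation) and ignores the CPU reserved by Kubernetes system pods, so the result is a rough upper bound, not a scheduling guarantee.

```python
# Requested by the deployment configuration above
num_replicas = 2
cpu_cores_per_replica = 1

# Cluster provisioned earlier; vCPUs per node is an assumption for Standard_D4_v2
agent_count = 2
vcpus_per_node = 8

requested = num_replicas * cpu_cores_per_replica
capacity = agent_count * vcpus_per_node
# Upper bound on replicas of this size, before system overhead
max_replicas = capacity // cpu_cores_per_replica

print(f"requested {requested} of {capacity} cores; at most {max_replicas} replicas")
# requested 2 of 16 cores; at most 16 replicas
```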

In [23]:
%%time
aks_service = Webservice.deploy_from_image(
    workspace=ws,
    name=aks_service_name,
    image=image,
    deployment_config=aks_config,
    deployment_target=aks_target,
)
aks_service.wait_for_deployment(show_output=True)
print(aks_service.state)

Creating service
Running.........................
SucceededAKS service creation operation finished, operation "Succeeded"
Healthy
CPU times: user 658 ms, sys: 33.7 ms, total: 691 ms
Wall time: 2min 43s


## Test the web service
We now test the web service.

In [25]:
%%time
prediction = aks_service.run(input_data=jsontext)
print(prediction)

[[5223, 6700, 0.9404882121467983], [11922383, 11922384, 0.6376984742439135], [750486, 750506, 0.0022961800568457157], [684672, 684692, 0.00027900736310522206], [171251, 171256, 0.00019333518502312696], [1584370, 1584377, 0.00016380270604214681], [4057440, 4060176, 0.0001498665973955501], [5187530, 5187652, 8.410158019842015e-05], [2241875, 2241883, 5.764723610306164e-05], [5117127, 5117172, 2.1329214222194568e-05], [126100, 4889658, 2.0901140965526787e-05], [12953704, 12953750, 1.777154932479563e-05], [1885557, 1885660, 1.4987021596611661e-05], [8495687, 8495740, 9.791005034068572e-06], [1129216, 1129270, 8.664625464258679e-06], [4255472, 4255480, 4.568289027003899e-06], [7364150, 7364307, 4.000594074848036e-06], [7837456, 14853974, 3.860920440981213e-06], [5891840, 5891929, 3.854145618824322e-06], [3583724, 3583740, 3.634922709192637e-06], [1451009, 1451043, 2.813801717908391e-06], [6487366, 6487376, 2.454908335371808e-06], [2274242, 2274327, 1.5704565243474771e-06], [85992, 86014, 1.

Let's try a few more duplicate questions and display their top 3 original matches. First, let's get the scoring URL and API key for the web service.

In [26]:
scoring_url = aks_service.scoring_uri
api_key = aks_service.get_keys()[0]
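The scoring endpoint uses key-based authentication: the key from `get_keys()` is sent as a bearer token in the `Authorization` header, together with a JSON content type. The helper below only packages the header dictionary used in the next cell; `make_headers` is a hypothetical name, and the placeholder key is not a real credential.

```python
def make_headers(api_key):
    """Hypothetical helper: build the headers sent to the AKS scoring endpoint."""
    return {
        "content-type": "application/json",
        "Authorization": "Bearer " + api_key,
    }


headers_example = make_headers("<YOUR_API_KEY>")
print(headers_example["Authorization"])  # Bearer <YOUR_API_KEY>
```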

In [27]:
headers = {'content-type': 'application/json', 'Authorization': ('Bearer ' + api_key)}
# Send the request twice: the first call takes a little longer because the model is loaded, so only the second is timed
r = requests.post(scoring_url, data=jsontext, headers=headers)
%time r = requests.post(scoring_url, data=jsontext, headers=headers)
print(r)
r.json()

CPU times: user 2.13 ms, sys: 0 ns, total: 2.13 ms
Wall time: 127 ms
<Response [200]>


'[[5223, 6700, 0.9404882121467983], [11922383, 11922384, 0.6376984742439135], [750486, 750506, 0.0022961800568457157], [684672, 684692, 0.00027900736310522206], [171251, 171256, 0.00019333518502312696], [1584370, 1584377, 0.00016380270604214681], [4057440, 4060176, 0.0001498665973955501], [5187530, 5187652, 8.410158019842015e-05], [2241875, 2241883, 5.764723610306164e-05], [5117127, 5117172, 2.1329214222194568e-05], [126100, 4889658, 2.0901140965526787e-05], [12953704, 12953750, 1.777154932479563e-05], [1885557, 1885660, 1.4987021596611661e-05], [8495687, 8495740, 9.791005034068572e-06], [1129216, 1129270, 8.664625464258679e-06], [4255472, 4255480, 4.568289027003899e-06], [7364150, 7364307, 4.000594074848036e-06], [7837456, 14853974, 3.860920440981213e-06], [5891840, 5891929, 3.854145618824322e-06], [3583724, 3583740, 3.634922709192637e-06], [1451009, 1451043, 2.813801717908391e-06], [6487366, 6487376, 2.454908335371808e-06], [2274242, 2274327, 1.5704565243474771e-06], [85992, 86014, 1
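Judging by the quoted output above, `r.json()` returns a string, not a list: `run()` already serializes its result with `json.dumps`, and the service encodes that string as JSON once more in the HTTP body. A second `json.loads` recovers the nested list. The sketch below reproduces this with a shortened payload based on the output above, not the full response.

```python
import json

# run() returns json.dumps(resp) ...
inner = json.dumps([[5223, 6700, 0.94], [11922383, 11922384, 0.64]])
# ... and that string is JSON-encoded again in the HTTP response body
http_body = json.dumps(inner)

first = json.loads(http_body)  # what r.json() gives: a str
matches = json.loads(first)    # second decode: the list of triples
print(type(first).__name__, matches[0])  # str [5223, 6700, 0.94]
```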

In [28]:
dupes_to_score = dupes_test.iloc[:5,4]

In [29]:
results = [
    requests.post(scoring_url, data=text_to_json(text), headers=headers)
    for text in dupes_to_score
]

Let's print the top 3 matches for each duplicate question.

In [30]:
[json.loads(res.json())[:3] for res in results]  # parse the JSON string safely instead of using eval

[[[5223, 6700, 0.9404882121467983],
  [11922383, 11922384, 0.6376984742439135],
  [750486, 750506, 0.0022961800568457157]],
 [[5223, 6700, 0.9974832578943763],
  [126100, 4889658, 0.9609776698436617],
  [11922383, 11922384, 0.7859830495520623]],
 [[14220321, 14220323, 0.6907445003282665],
  [750486, 750506, 0.03507013799483895],
  [901115, 901144, 0.027122404990697006]],
 [[23667086, 23667087, 0.10251596753609168],
  [1726630, 1726662, 0.036872354024837226],
  [14220321, 14220323, 0.033512740514949776]],
 [[203198, 1207393, 0.7317257816549773],
  [31044, 31047, 0.2857031454734008],
  [2631001, 2631198, 0.024909238985021308]]]

Next, let's quickly measure the request latency of the model deployed on the AKS cluster.

In [31]:
text_data = list(map(text_to_json, dupes_to_score))  # Convert each question to a JSON request body

In [32]:
timer_results = list()
for text in text_data:
    res = %timeit -r 1 -o -q requests.post(scoring_url, data=text, headers=headers)
    timer_results.append(res.best)

In [33]:
timer_results

[0.1300588452257216,
 0.13472380437888204,
 0.12789431917481126,
 0.1270435548387468,
 0.1228925019968301]

In [34]:
print("Average time taken: {0:4.2f} ms".format(10 ** 3 * np.mean(timer_results)))

Average time taken: 128.52 ms
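With `-r 1`, `%timeit` averages over several back-to-back calls for each request, which hides the spread between individual calls. To see the tail as well as the mean, one can record the latency of every call separately. The sketch below does this with `time.perf_counter`; `measure_latencies` is a hypothetical helper, and the dummy workload stands in for `requests.post(scoring_url, data=text, headers=headers)`.

```python
import statistics
import time


def measure_latencies(call, payloads):
    """Hypothetical helper: time call(payload) once per payload, in milliseconds."""
    latencies = []
    for payload in payloads:
        start = time.perf_counter()
        call(payload)
        latencies.append((time.perf_counter() - start) * 1000)
    return latencies


def dummy_call(payload):
    # Stand-in for requests.post(scoring_url, data=payload, headers=headers)
    time.sleep(0.01)


lat = measure_latencies(dummy_call, ["q1", "q2", "q3", "q4", "q5"])
print("mean {:.1f} ms, median {:.1f} ms, max {:.1f} ms".format(
    statistics.mean(lat), statistics.median(lat), max(lat)))
```

Against the real service, passing `lambda text: requests.post(scoring_url, data=text, headers=headers)` and `text_data` would time each request individually.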


Next, we will test the [throughput of the web service](05_Speed_Test_WebApp.ipynb).