Copyright (c) Microsoft Corporation. All rights reserved.  
Licensed under the MIT License.

# Using a Trained Model for Realtime Inference

If your system requires low-latency processing (for example, scoring a single document or a small set of documents quickly), realtime inference is the right choice. In this notebook, we demonstrate how to make a prediction for an input sentence with a trained [fastText](https://fasttext.cc/) model.

The outline of this notebook is as follows:

- Visualize the pipeline associated with the pipeline run.
- Get a step run with information from the visualization.
- Get the port with the trained model from the step run.
- Download the model from the port and register it to the workspace.
- Deploy the model to local/ACI/AKS.

## Prerequisites
If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, first go through the configuration notebook located at https://github.com/Azure/MachineLearningNotebooks. It sets you up with a working config file that contains information about your workspace, subscription ID, and so on.


In [1]:
import os
from azureml.core import Model, Workspace, Run
from azureml.core.compute import AksCompute, ComputeTarget
from azureml.core.environment import Environment
from azureml.core.experiment import Experiment
from azureml.core.model import InferenceConfig
from azureml.core.webservice import LocalWebservice, AciWebservice, AksWebservice
from azureml.pipeline.wrapper import PipelineRun

### Connect to workspace
Create a workspace object from the existing workspace. `Workspace.from_config()` reads the file `config.json` and loads the details into an object named `workspace`.


In [2]:
workspace = Workspace.from_config('config.json')
print(workspace.name, workspace.resource_group, workspace.location, workspace.subscription_id,
      workspace.compute_targets.keys(), sep='\n')

DesignerDRI_EASTUS
DesignerDRI
eastus
74eccef0-4b8d-4f83-b5f9-fa100d155b22
dict_keys(['attached-aks', 'default', 'compute', 'aml-compute', 'aml-compute-gpu'])


### Get a list of experiment names from the workspace

In [3]:
exp_name_list = [exp.name for exp in Experiment.list(workspace)]
exp_name_list

['sample10',
 'sample5',
 'sample5-realtime',
 'simple10-batch',
 'pythonscript',
 'Data_dependency',
 'clement',
 'new_module',
 'test_module2',
 'test_m',
 'module_SDK_local_module_test',
 'fasttext_pipeline',
 'fasttext_batch_inference',
 'fasttext_pipeline2']

### Choose the experiment you want with its name

In [4]:
experiment_name = "fasttext_pipeline"
experiment = Experiment(workspace, experiment_name)
experiment

| Name | Workspace | Report Page | Docs Page |
| --- | --- | --- | --- |
| fasttext_pipeline | DesignerDRI_EASTUS | Link to Azure Machine Learning studio | Link to Documentation |


### Choose the latest completed run

In [5]:
# The returned object is an azureml.pipeline.core.run.PipelineRun
run = next(Run.list(experiment, status='Completed'))
run

| Experiment | Id | Type | Status | Details Page | Docs Page |
| --- | --- | --- | --- | --- | --- |
| fasttext_pipeline | 0a6a356c-39e9-41b5-a486-af5a0ea72c05 | azureml.PipelineRun | Completed | Link to Azure Machine Learning studio | Link to Documentation |
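
Taking the first item from the iterator returned by `Run.list` raises `StopIteration` when the experiment has no completed run. A minimal sketch of a safer pattern uses the built-in `next()` with a default value. The helper `first_or_none` is hypothetical, and the plain iterators below are stand-ins for the result of `Run.list(...)`, not real `Run` objects:

```python
from typing import Iterable, Optional, TypeVar

T = TypeVar("T")


def first_or_none(items: Iterable[T]) -> Optional[T]:
    """Return the first element of an iterable, or None if it is empty."""
    return next(iter(items), None)


# Stand-ins for Run.list(experiment, status='Completed'); not real Run objects.
completed_runs = iter(["run-a", "run-b"])
no_runs = iter([])

latest = first_or_none(completed_runs)
missing = first_or_none(no_runs)

if missing is None:
    print("no completed run found")
print(latest)
```

With this pattern the notebook could report a clear message instead of failing with a bare `StopIteration`.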


### Get a PipelineRun object

In [6]:
run_id = run.id
# Wrap the run as an azureml.pipeline.wrapper.PipelineRun
pipeline_run = PipelineRun(experiment, run_id)
pipeline_run

| Experiment | Id | Type | Status | Details Page | Docs Page |
| --- | --- | --- | --- | --- | --- |
| fasttext_pipeline | 0a6a356c-39e9-41b5-a486-af5a0ea72c05 | azureml.PipelineRun | Completed | Link to Azure Machine Learning studio | Link to Documentation |


### Visualize the pipeline

In [7]:
pipeline_run.visualize()


### Get a StepRun object

In [8]:
# Obtain step_run_id from the visualization result.
step_run_id = '845ab0f8-eac2-49b4-b3fc-6cc6f1541ee7'
step_run = pipeline_run.get_step_run(step_run_id)
step_run

| Experiment | Id | Type | Status | Details Page | Docs Page |
| --- | --- | --- | --- | --- | --- |
| fasttext_pipeline | 845ab0f8-eac2-49b4-b3fc-6cc6f1541ee7 | azureml.StepRun | Completed | Link to Azure Machine Learning studio | Link to Documentation |


### Download the trained model from the port

In [9]:
# The port name comes from the visualization result.
# get_port() should support three spellings of the name: (1) the_better_model (2) The better model (3) The_better_model
port = step_run.get_port(name='The better model')
saved_path = port.download(overwrite=True)
print('model save at: {}'.format(saved_path))

Downloading azureml/845ab0f8-eac2-49b4-b3fc-6cc6f1541ee7/The_better_model/BestModel
Downloading azureml/845ab0f8-eac2-49b4-b3fc-6cc6f1541ee7/The_better_model/label.txt
Downloading azureml/845ab0f8-eac2-49b4-b3fc-6cc6f1541ee7/The_better_model/word_to_index.json
Downloaded azureml/845ab0f8-eac2-49b4-b3fc-6cc6f1541ee7/The_better_model/label.txt, 1 files out of an estimated total of 3
Downloaded azureml/845ab0f8-eac2-49b4-b3fc-6cc6f1541ee7/The_better_model/word_to_index.json, 2 files out of an estimated total of 3
Downloaded azureml/845ab0f8-eac2-49b4-b3fc-6cc6f1541ee7/The_better_model/BestModel, 3 files out of an estimated total of 3
model save at: /tmp/azureml/845ab0f8-eac2-49b4-b3fc-6cc6f1541ee7/The_better_model
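
The three port-name spellings mentioned in the cell above differ only in case and in whether words are separated by spaces or underscores. As an illustration of how such spellings can be treated as equivalent, here is a minimal sketch of a normalization step. The helper `normalize_port_name` is hypothetical and is not taken from the SDK; how `get_port()` resolves names internally is not shown in this notebook.

```python
import re


def normalize_port_name(name: str) -> str:
    """Map a display name or a mixed-case identifier to lower snake_case."""
    # Collapse runs of whitespace into a single underscore, then lowercase.
    return re.sub(r"\s+", "_", name.strip()).lower()


variants = ["the_better_model", "The better model", "The_better_model"]
normalized = {normalize_port_name(v) for v in variants}
print(normalized)
```

All three variants collapse to the same key, which is the property a lookup by name would need.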


### Register the trained model for deployment

In [10]:
model_path = saved_path
model = Model.register(workspace, model_path=model_path, model_name='model_for_deployment')
model

Registering model model_for_deployment


Model(workspace=Workspace.create(name='DesignerDRI_EASTUS', subscription_id='74eccef0-4b8d-4f83-b5f9-fa100d155b22', resource_group='DesignerDRI'), name=model_for_deployment, id=model_for_deployment:7, version=7, tags={}, properties={})

### Get an Environment object for Inference Configuration

In [11]:
# The YAML file for inference has been updated manually.
update = False
env_list = Environment.list(workspace)
name = 'inference_environment'
# If the workspace doesn't contain this environment, register a new one.
if update or name not in env_list:
    # inference_environment.yaml describes the dependencies your service needs.
    file_path = 'deployment/inference_environment.yaml'
    env = Environment.from_conda_specification(name=name, file_path=file_path)
    env = env.register(workspace=workspace)
else:
    env = Environment.get(workspace=workspace, name=name)
env

{
    "databricks": {
        "eggLibraries": [],
        "jarLibraries": [],
        "mavenLibraries": [],
        "pypiLibraries": [],
        "rcranLibraries": []
    },
    "docker": {
        "arguments": [],
        "baseDockerfile": null,
        "baseImage": "mcr.microsoft.com/azureml/intelmpi2018.3-ubuntu16.04:20200723.v1",
        "baseImageRegistry": {
            "address": null,
            "password": null,
            "registryIdentity": null,
            "username": null
        },
        "enabled": false,
        "platform": {
            "architecture": "amd64",
            "os": "Linux"
        },
        "sharedVolumes": true,
        "shmSize": null
    },
    "environmentVariables": {
        "EXAMPLE_ENV_VAR": "EXAMPLE_VALUE"
    },
    "inferencingStackVersion": null,
    "name": "inference_environment",
    "python": {
        "baseCondaEnvironment": null,
        "condaDependencies": {
            "channels": [
                "defaults",
                "pyt

### Get an InferenceConfig object for deployment
An `InferenceConfig` represents the configuration settings for a custom environment used for deployment.


In [12]:
# entry_script defines the processing logic for the input
entry_script = 'fasttext_realtime_inference.py'
# source_directory is the path to the folder that contains all files to create the service image
source_directory = 'deployment'
inference_config = InferenceConfig(entry_script=entry_script, source_directory=source_directory,
                                   environment=env)
inference_config

InferenceConfig(entry_script=fasttext_realtime_inference.py, runtime=None, conda_file=None, extra_docker_file_steps=None, source_directory=/mnt/batch/tasks/shared/LS_root/mounts/clusters/my-compute/code/Users/t-yangx/azureml-designer-demo/deployment, enable_gpu=None, base_image=None, base_image_registry=<azureml.core.container_registry.ContainerRegistry object at 0x7feb667e8cc0>)
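
The contents of `fasttext_realtime_inference.py` are not shown in this notebook. Azure Machine Learning entry scripts conventionally define an `init()` function, called once when the service starts, and a `run(raw_data)` function, called for each request. The sketch below shows that shape only. The keyword lookup is a stand-in for the trained fastText model, and the payload layout (`param` / `input_sentence`) is taken from the request cells later in this notebook; everything else is an assumption for illustration.

```python
import json

model = None


def init():
    """Called once when the service starts; load the model here."""
    global model
    # Stand-in for loading the registered fastText model from disk:
    # a keyword lookup, purely for illustration.
    model = {"stock": "stocks", "school": "society"}


def run(raw_data):
    """Called per request with the raw JSON body; returns a JSON-serializable result."""
    try:
        sentence = json.loads(raw_data)["param"]["input_sentence"]
    except (ValueError, KeyError, TypeError) as exc:
        return {"error": f"bad request: {exc}"}
    for keyword, label in model.items():
        if keyword in sentence:
            return label
    return "unknown"


# Local smoke test of the two functions, without any web service involved.
init()
result = run(json.dumps({"param": {"input_sentence": "stock prices fell today"}}))
bad = run("not json")
print(result)
```

Calling `init()` and `run()` directly like this is a quick way to check an entry script before building a service image.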

### Deploy to local

In [13]:
service_name = 'local-deploy-test'
models = [model]
port = 8892
deployment_config = LocalWebservice.deploy_configuration(port=port)
service_locally = Model.deploy(workspace=workspace, name=service_name, models=models, inference_config=inference_config,
                               deployment_config=deployment_config)
service_locally.wait_for_deployment(show_output=True)
print(service_locally.state)
print(service_locally.get_logs())

Downloading model model_for_deployment:7 to /tmp/azureml_olmbf24x/model_for_deployment/7
Generating Docker build context.
Package creation Succeeded
Logging into Docker registry designerdriea2302d01.azurecr.io
Logging into Docker registry designerdriea2302d01.azurecr.io
Building Docker image from Dockerfile...
Step 1/5 : FROM designerdriea2302d01.azurecr.io/azureml/azureml_5164d757b6d7d5b9d70d699f7cd05edd
 ---> fa1b2cdac02c
Step 2/5 : COPY azureml-app /var/azureml-app
 ---> 1693eb681539
Step 3/5 : RUN mkdir -p '/var/azureml-app' && echo eyJhY2NvdW50Q29udGV4dCI6eyJzdWJzY3JpcHRpb25JZCI6Ijc0ZWNjZWYwLTRiOGQtNGY4My1iNWY5LWZhMTAwZDE1NWIyMiIsInJlc291cmNlR3JvdXBOYW1lIjoiZGVzaWduZXJkcmkiLCJhY2NvdW50TmFtZSI6ImRlc2lnbmVyZHJpX2Vhc3R1cyIsIndvcmtzcGFjZUlkIjoiZjZjNzVkZmYtMmMzMS00NDg2LTg0Y2ItMWRlZDMzODhlMzM3In0sIm1vZGVscyI6e30sIm1vZGVsc0luZm8iOnt9fQ== | base64 --decode > /var/azureml-app/model_config_map.json
 ---> Running in 965ca8620871
 ---> 24a4f11b3a72
Step 4/5 : RUN mv '/var/azureml-app/tmppz14x

### Deploy to ACI (Azure Container Instances)

In [14]:
service_name = 'aci-deploy-test'
models = [model]
deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)
service_aci = Model.deploy(workspace, service_name, models=models, inference_config=inference_config,
                           deployment_config=deployment_config, overwrite=True)
service_aci.wait_for_deployment(show_output=True)
print(service_aci.state)

Running................
Succeeded
ACI service creation operation finished, operation "Succeeded"
Healthy


### Deploy to AKS (Azure Kubernetes Service)

In [15]:
# choose an inference cluster
name = 'attached-aks'
compute_target_name_list = [target.name for target in AksCompute.list(workspace)]
if name not in compute_target_name_list:
    aks_resource_group = 'AmlStudioV2DRI'
    cluster_name = 'aks-dev-6node33512023a'
    attach_config = AksCompute.attach_configuration(resource_group=aks_resource_group, cluster_name=cluster_name)
    aks_target = ComputeTarget.attach(workspace, name, attach_config)
    aks_target.wait_for_completion(show_output=True)
aks_target = AksCompute(workspace, name)
aks_target

AksCompute(workspace=Workspace.create(name='DesignerDRI_EASTUS', subscription_id='74eccef0-4b8d-4f83-b5f9-fa100d155b22', resource_group='DesignerDRI'), name=attached-aks, id=/subscriptions/74eccef0-4b8d-4f83-b5f9-fa100d155b22/resourceGroups/DesignerDRI/providers/Microsoft.MachineLearningServices/workspaces/DesignerDRI_EASTUS/computes/attached-aks, type=AKS, provisioning_state=Succeeded, location=eastus, tags=None)

In [16]:
# deploy to AKS (Azure Kubernetes Service)
service_name = 'aks-deploy-test'
models = [model]
# Only one type of auth may be enabled: token-based or key-based.
token_auth_enabled = True
auth_enabled = not token_auth_enabled

deployment_config = AksWebservice.deploy_configuration(cpu_cores=1, memory_gb=1,
                                                       token_auth_enabled=token_auth_enabled,
                                                       auth_enabled=auth_enabled)
service_aks = Model.deploy(workspace, service_name, models, inference_config, deployment_config, aks_target, overwrite=True)
service_aks.wait_for_deployment(show_output=True)
print(service_aks.state)

Running.........
Succeeded
AKS service creation operation finished, operation "Succeeded"
Healthy


### Consume the service

In [17]:
# consume
import requests
import json
from azureml.core.authentication import InteractiveLoginAuthentication

# Get a token to authenticate to the compute instance from remote
interactive_auth = InteractiveLoginAuthentication()
auth_header = interactive_auth.get_authentication_header()

# Create and submit a request using the auth header
headers = auth_header
# Add content type header
headers.update({'Content-Type': 'application/json'})

#### Consume the service deployed locally

In [18]:
# your input
standard_sample_input = {'param': {'input_sentence': '受疫情影响, 很多学生不得不在家上课'}}
standard_sample_input = json.dumps(standard_sample_input)

# consume the service deployed to local
service = service_locally

response = requests.post(service.scoring_uri, data=standard_sample_input, headers=headers)
print(service.scoring_uri)
print(response)
print(response.content)
print(response.json())

http://localhost:8892/score
<Response [200]>
b'"society"'
society
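
The request cells in this section repeat the same two steps: serialize the sentence into the `param` / `input_sentence` payload and attach a JSON content-type header. A minimal sketch of a helper that builds both pieces in one place follows. `build_scoring_request` is a hypothetical name, and the example authorization value is a placeholder rather than a real credential.

```python
import json


def build_scoring_request(sentence, auth_header=None):
    """Return (body, headers) ready to pass to an HTTP POST call."""
    # Start from a fresh dict on every call so headers never leak between services.
    headers = {"Content-Type": "application/json"}
    if auth_header:
        headers.update(auth_header)
    body = json.dumps({"param": {"input_sentence": sentence}})
    return body, headers


body, headers_for_call = build_scoring_request(
    "受疫情影响, 很多学生不得不在家上课",
    auth_header={"Authorization": "Bearer <placeholder>"},
)
print(headers_for_call["Content-Type"])
# Usage with the notebook's objects would look like:
# requests.post(service.scoring_uri, data=body, headers=headers_for_call)
```

Building the headers per call avoids mutating a shared `headers` dictionary, which matters here because the AKS cell overwrites the `Authorization` entry.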


#### Consume the service deployed to ACI

In [19]:
# your input
standard_sample_input = {'param': {'input_sentence': '股市有风险, 试水需谨慎'}}
standard_sample_input = json.dumps(standard_sample_input)

# consume the service deployed to ACI
service = service_aci

response = requests.post(service.scoring_uri, data=standard_sample_input, headers=headers)
print(service.scoring_uri)
print(response)
print(response.content)
print(response.json())

http://96c84315-acb4-4191-b4e8-ad3ccbd56315.eastus.azurecontainer.io/score
<Response [200]>
b'"stocks"'
stocks


#### Consume the service deployed to AKS

In [20]:
# your input
standard_sample_input = {'param': {'input_sentence': '经常跑步能够增强心肺功能'}}
standard_sample_input = json.dumps(standard_sample_input)

# consume the service deployed to AKS
service = service_aks
token, refresh_by = service.get_token()
headers['Authorization'] = f'Bearer {token}'

response = requests.post(service.scoring_uri, data=standard_sample_input, headers=headers)
print(service.scoring_uri)
print(response)
print(response.content)
print(response.json())

http://52.170.37.14:80/api/v1/service/aks-deploy-test/score
<Response [200]>
b'"science"'
science
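
The AKS cell unpacks `refresh_by` from `service.get_token()` but never uses it. Assuming `refresh_by` is a `datetime` that marks when the token should be refreshed, a long-running client could check it before each call. The sketch below is one way to do that; `token_needs_refresh` and the five-minute safety margin are hypothetical, and treating a naive timestamp as UTC is an assumption.

```python
from datetime import datetime, timedelta, timezone


def token_needs_refresh(refresh_by, now=None, margin=timedelta(minutes=5)):
    """Return True once `now` is within `margin` of the refresh time."""
    if refresh_by.tzinfo is None:
        # Assumption: a naive timestamp is interpreted as UTC.
        refresh_by = refresh_by.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return now >= refresh_by - margin


# Fixed example timestamps, not values returned by a real service.
refresh_by = datetime(2020, 8, 1, 12, 0, tzinfo=timezone.utc)
early = token_needs_refresh(refresh_by, now=datetime(2020, 8, 1, 11, 0, tzinfo=timezone.utc))
late = token_needs_refresh(refresh_by, now=datetime(2020, 8, 1, 11, 58, tzinfo=timezone.utc))
print(early, late)
```

When the check returns `True`, the client would call `service.get_token()` again and rebuild the `Authorization` header before sending the next request.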
