
# Real-time inferencing & Deployment

[More info on how and where to deploy](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-deploy-and-where?tabs=azcli), including guidance on which compute target to choose.

In machine learning, *Inferencing* refers to the use of a trained model to predict labels for new data on which the model has not been trained. Often, the model is deployed as part of a service that enables applications to request immediate, or *real-time*, predictions for individual or small numbers of data observations.

In Azure ML, you can create real-time inferencing solutions by deploying a model as a service, hosted in a containerized platform such as Azure Kubernetes Service (AKS).

## Learning objectives
In this module, you will learn how to:
- Deploy a model as a real-time inferencing service.
- Consume a real-time inferencing service.
- Troubleshoot service deployment.

# Tasks for deploying a real-time inferencing service

## 1. Register a trained model
After successfully training a model, you must register it in your Azure Machine Learning workspace. The real-time service will then be able to load the model when required. Use the `register` method of the `Model` object as shown below:

In [None]:
from azureml.core import Model

classification_model = Model.register(workspace=ws,
                                      model_name = 'classification_model',
                                      model_path='model.pkl', # local path
                                      description = 'A classification model')

Alternatively, if you have a reference to the `Run` used to train the model, you can use its `register_model` method:

In [None]:
run.register_model( model_name='classification_model',
                    model_path='outputs/model.pkl', # run outputs path
                    description='A classification model')

## 2. Define an Inference Configuration
The model will be deployed as a service that consists of:
- a script to load the model and return predictions for submitted data
- an environment in which the script will be run

You must therefore define the script and environment.

### Steps
1. Create an *entry script* (or *scoring script*). The entry script receives data submitted to a deployed web service and passes it to the model. It then takes the response returned by the model and returns it to the client. The script is specific to your model: it must understand the data that the model expects and returns. It must be a .py file and requires two functions:
  - `init()`: called when the service is initialized; loads the model from the model registry.
  - `run(raw_data)`: called when new data is submitted to the service; generates predictions from the input data.

In [None]:
import json
import joblib
import numpy as np
from azureml.core.model import Model

# Called when the service is loaded
def init():
    global model
    # Get the path to the registered model file and load it
    model_path = Model.get_model_path('classification_model')
    model = joblib.load(model_path)

# Called when a request is received
def run(raw_data):
    # Get the input data as a numpy array
    data = np.array(json.loads(raw_data)['data'])
    # Get a prediction from the model
    predictions = model.predict(data)
    # Return the predictions as any JSON serializable format
    return predictions.tolist()
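
Before deploying, the `init()`/`run()` contract can be exercised locally. The following is a minimal sketch that swaps the registered model for a hypothetical `StubModel` (an assumption made for illustration, not part of the Azure ML SDK), so the JSON handling in `run` can be checked without a workspace.

```python
import json


class StubModel:
    """Hypothetical stand-in for the registered model."""

    def predict(self, rows):
        # Predict class 1 when the first feature is at least 0.15, else 0
        return [1 if row[0] >= 0.15 else 0 for row in rows]


model = None


def init():
    # The deployed script would load the registered model here instead
    global model
    model = StubModel()


def run(raw_data):
    # Parse the JSON document and pull out the list of cases
    data = json.loads(raw_data)["data"]
    predictions = model.predict(data)
    # Return a JSON-serializable object
    return list(predictions)


init()
payload = json.dumps({"data": [[0.1, 2.3, 4.1, 2.0], [0.2, 1.8, 3.9, 2.1]]})
print(run(payload))
```

Calling `init()` once and then `run()` repeatedly mirrors how the service treats the script: the model is loaded a single time and reused for every request.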

2. Create an environment. An easy way to do this is to use the `CondaDependencies` class to create a default environment (which includes the `azureml-defaults` package and commonly used packages like `numpy` and `pandas`), and then add any other packages your model needs:

In [1]:
from azureml.core.conda_dependencies import CondaDependencies

# Add the dependencies for your model
myenv = CondaDependencies()
myenv.add_conda_package("scikit-learn")

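# Save the environment config as a .yml file
# (create the target folder first so that open() succeeds)
import os
os.makedirs('service_files', exist_ok=True)
env_file = 'service_files/env.yml'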
with open(env_file,"w") as f:
    f.write(myenv.serialize_to_string())
print("Saved dependency info in", env_file)

3. Combine the script and environment in an `InferenceConfig`. An inference configuration describes how to set up the web service containing your model; it is used later, when you deploy the model. The inference configuration uses Azure Machine Learning environments to define the software dependencies needed for your deployment. Environments allow you to create, manage, and reuse the software dependencies required for training and deployment. You can combine the entry script and environment in an `InferenceConfig` for the service like this:

In [2]:
from azureml.core.model import InferenceConfig

classifier_inference_config = InferenceConfig(runtime= "python",
                                              source_directory = 'service_files',
                                              entry_script="score.py",
                                              conda_file="env.yml")
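
`InferenceConfig` raises an error when `source_directory` does not exist, and the entry script and conda file are given relative to that folder. The sketch below is a small pre-flight check that could be run before creating the configuration; `missing_service_files` is a hypothetical helper written for illustration, and the temporary folder only stands in for a real `service_files` directory.

```python
import os
import tempfile


def missing_service_files(source_directory, entry_script, conda_file):
    """Return a list of the paths the inference configuration would not find."""
    if not os.path.isdir(source_directory):
        return [source_directory]
    missing = []
    for name in (entry_script, conda_file):
        path = os.path.join(source_directory, name)
        if not os.path.isfile(path):
            missing.append(path)
    return missing


# Demonstrate with a temporary folder that contains only the entry script
with tempfile.TemporaryDirectory() as root:
    folder = os.path.join(root, "service_files")
    os.makedirs(folder)
    with open(os.path.join(folder, "score.py"), "w") as f:
        f.write("# entry script placeholder\n")
    problems = missing_service_files(folder, "score.py", "env.yml")
    print([os.path.basename(p) for p in problems])
```

An empty list means the folder, the script, and the environment file are all where the configuration expects them.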


## 3. Define a Deployment Configuration

With the entry script and environment defined, you now configure the compute to which the service will be deployed. If you intend to deploy to AKS and don't already have a cluster, you must create the cluster and a compute target before deploying (for development and testing, ACI can be used instead):

In [None]:
from azureml.core.compute import ComputeTarget, AksCompute

cluster_name = 'aks-cluster'
compute_config = AksCompute.provisioning_configuration(location='eastus')
production_cluster = ComputeTarget.create(ws, cluster_name, compute_config)
production_cluster.wait_for_completion(show_output=True)

With the compute target created, you can now define the deployment configuration, which sets the target-specific compute specification for the containerized deployment:

In [None]:
from azureml.core.webservice import AksWebservice

classifier_deploy_config = AksWebservice.deploy_configuration(cpu_cores = 1,
                                                              memory_gb = 1)

Configuring an ACI deployment is similar, but you don't need to explicitly create an ACI compute target, and you must use the `deploy_configuration` method of the `azureml.core.webservice.AciWebservice` class. Similarly, you can use the `azureml.core.webservice.LocalWebservice` class to configure a local Docker-based service.

**Note**: To deploy a model to an Azure Function, you do not need to create a deployment configuration. Instead, you need to package the model based on the type of function trigger you want to use. This functionality was still in preview as of August 15, 2020.

## 4. Deploy the Model
After all this preparation, the easiest way to deploy the model is to call the `deploy` method of the `Model` class:

In [None]:
from azureml.core.model import Model

model = ws.models['classification_model']
service = Model.deploy(workspace=ws,
                       name = 'classifier-service',
                       models = [model],
                       inference_config = classifier_inference_config,
                       deployment_config = classifier_deploy_config,
                       deployment_target = production_cluster)
service.wait_for_deployment(show_output = True)

For ACI or local services, you can omit the `deployment_target` parameter (or set it to `None`).

## Consuming a Real-time inferencing service

### Using Azure ML SDK

For testing, you can consume the service through the `run` method of a `Webservice` object that references the deployed service. Typically, you send data to the `run` method in JSON format with the following structure:

In [None]:
{
  "data":[
      [0.1,2.3,4.1,2.0], // 1st case
      [0.2,1.8,3.9,2.1], // 2nd case
      ...
  ]
}
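
The `//` annotations in the structure above are explanatory only, since JSON itself has no comment syntax. As a rough sketch of how a client might build this document, the hypothetical helper `build_payload` below (an assumption for illustration, not an SDK function) wraps the cases in a `"data"` key and rejects rows with differing numbers of features before anything is sent.

```python
import json


def build_payload(rows):
    """Wrap feature rows in the {"data": [...]} document described above."""
    if not rows:
        raise ValueError("at least one case is required")
    width = len(rows[0])
    for index, row in enumerate(rows):
        if len(row) != width:
            raise ValueError(f"case {index} has {len(row)} features, expected {width}")
    return json.dumps({"data": [list(row) for row in rows]})


print(build_payload([[0.1, 2.3, 4.1, 2.0], [0.2, 1.8, 3.9, 2.1]]))
```

Checking the shape on the client side is a design choice: it surfaces a malformed request immediately rather than as an error raised inside the scoring script.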

The following code calls the service and displays the response:

In [None]:
import json

# An array of new data cases
x_new = [[0.1,2.3,4.1,2.0],
         [0.2,1.8,3.9,2.1]]

# Convert the array to a serializable list in a JSON document
json_data = json.dumps({"data": x_new})

# Call the web service, passing the input data
response = service.run(input_data = json_data)

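# Get the predictions; decode again only if the entry script
# returned a JSON string rather than a list
predictions = json.loads(response) if isinstance(response, str) else response

# Print the predicted class for each case.
for i in range(len(x_new)):
    print(x_new[i], predictions[i])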

### Using a REST Endpoint
In production, most client applications will not include the Azure Machine Learning SDK, and will consume the service through its REST interface. You can determine the endpoint of a deployed service in Azure Machine Learning studio, or by retrieving the `scoring_uri` property of the `Webservice` object in the SDK like this:

In [None]:
endpoint = service.scoring_uri
print(endpoint)

With the endpoint known, you can use an HTTP POST request with JSON data to call the service. The following is an example in Python:

In [None]:
import requests
import json

# An array of new data cases
x_new = [[0.1,2.3,4.1,2.0],
         [0.2,1.8,3.9,2.1]]

# Convert the array to a serializable list in a JSON document
json_data = json.dumps({"data": x_new})

# Set the content type in the request headers
request_headers = { 'Content-Type':'application/json' }

# Call the service
response = requests.post(url = endpoint,
                         data = json_data,
                         headers = request_headers)

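# Get the predictions from the JSON response; decode again only if
# the entry script returned a JSON string rather than a list
result = response.json()
predictions = json.loads(result) if isinstance(result, str) else result

# Print the predicted class for each case.
for i in range(len(x_new)):
    print(x_new[i], predictions[i])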

### Authentication

In production, you'll want to restrict access to your services by applying authentication. Two kinds of authentication are available:
- **Key**: Requests are authenticated by specifying the key associated with the service.
- **Token**: Requests are authenticated by providing a JSON Web Token (JWT).

Authentication is **disabled** by default for ACI services, and set to key-based authentication for AKS services (for which primary and secondary keys are automatically generated). You can optionally configure an AKS service to use token-based authentication, which is not supported for ACI services.

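Once you have an authenticated session with the workspace, you can retrieve the keys for a service by using the `get_keys` method of the `Webservice` object: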

In [None]:
primary_key, secondary_key = service.get_keys()

To make an authenticated call to the service's REST endpoint, include the key or token in the request headers:

In [None]:
import requests
import json

# An array of new data cases
x_new = [[0.1,2.3,4.1,2.0],
         [0.2,1.8,3.9,2.1]]

# Convert the array to a serializable list in a JSON document
json_data = json.dumps({"data": x_new})

# Set the content type and authorization in the request headers
# (key_or_token holds a service key, such as primary_key above, or a token)
request_headers = { "Content-Type":"application/json",
                    "Authorization":"Bearer " + key_or_token }

# Call the service
response = requests.post(url = endpoint,
                         data = json_data,
                         headers = request_headers)

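# Get the predictions from the JSON response; decode again only if
# the entry script returned a JSON string rather than a list
result = response.json()
predictions = json.loads(result) if isinstance(result, str) else result

# Print the predicted class for each case.
for i in range(len(x_new)):
    print(x_new[i], predictions[i])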
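
Because the only difference between the anonymous and the authenticated call is the `Authorization` header, the header construction can be factored out. The sketch below uses a hypothetical helper, `build_headers`, and a placeholder credential; both are assumptions made for illustration.

```python
def build_headers(key_or_token=None):
    """Return request headers, adding Authorization only when a credential is given."""
    headers = {"Content-Type": "application/json"}
    if key_or_token:
        # Keys and tokens are both sent as a bearer credential
        headers["Authorization"] = "Bearer " + key_or_token
    return headers


# "example-key" is a placeholder, not a real service key
print(build_headers("example-key"))
print(build_headers())
```

The resulting dictionary can be passed as the `headers` argument of `requests.post`, as in the calls above.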

## Troubleshooting Service Deployment
There are many elements to a real-time service deployment, including the trained model, the runtime environment configuration, the scoring script, the container image, and the container host. Troubleshooting a failed deployment, or an error when consuming a deployed service, can be complex.

Here are some checks you can perform:

### 1. Check the Service State
As an initial step, check the status of the service by examining its **state**. The example below uses `AciWebservice`; for a service deployed to AKS, use `AksWebservice` instead. For an operational service, the state should be *Healthy*.

In [None]:
from azureml.core.webservice import AciWebservice

# Get the deployed service
service = AciWebservice(name='classifier-service', workspace=ws)

# Check its state
print(service.state)
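
A service may take a while to become *Healthy*, so a single read of the state is not always conclusive. The following is a minimal polling sketch; `wait_until_healthy` is a hypothetical helper, and the list of states only stands in for repeated reads of the real service state, so treat the intermediate state names as sample values.

```python
import time


def wait_until_healthy(get_state, timeout_seconds=60.0, interval_seconds=1.0):
    """Poll get_state() until it returns "Healthy" or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = get_state()
        if state == "Healthy" or time.monotonic() >= deadline:
            return state
        time.sleep(interval_seconds)


# Hypothetical sequence of states standing in for repeated service state reads
states = iter(["Transitioning", "Transitioning", "Healthy"])
print(wait_until_healthy(lambda: next(states), timeout_seconds=5, interval_seconds=0.01))
```

The function returns the last state it saw, so the caller can tell a healthy service from one that was still unhealthy when the timeout expired.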

### 2. Review Service Logs
If a service is not healthy, or you are experiencing errors when using it, review its logs:

In [None]:
print(service.get_logs())

The logs include detailed information about the provisioning of the service and the requests it has processed, and can often provide insight into the cause of unexpected errors.
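
Service logs can be long, so it may help to narrow them down to the lines most likely to explain a failure. The sketch below is one possible approach; `find_problem_lines`, its keyword list, and the sample log text are all assumptions made for illustration, and real log output will look different.

```python
def find_problem_lines(log_text, keywords=("error", "exception", "traceback")):
    """Return (line_number, line) pairs whose text mentions any keyword."""
    matches = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        lowered = line.lower()
        if any(keyword in lowered for keyword in keywords):
            matches.append((number, line.strip()))
    return matches


# Hypothetical log excerpt; real service logs will look different
sample_logs = """Starting service
Loading model
ModuleNotFoundError: No module named 'sklearn'
Traceback (most recent call last):
Worker exiting"""

for number, line in find_problem_lines(sample_logs):
    print(number, line)
```

With a deployed service, the string returned by `service.get_logs()` could be passed in place of `sample_logs`.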

### 3. Deploy to a Local Container
Deployment and runtime errors can be easier to diagnose by deploying the service as a container in a local Docker instance like this:

In [None]:
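from azureml.core.model import Model
from azureml.core.webservice import LocalWebservice

# Reuse the model and inference configuration defined earlier
deployment_config = LocalWebservice.deploy_configuration(port=8890)
service = Model.deploy(ws, 'test-svc', [model], classifier_inference_config, deployment_config)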

You can then test the locally deployed service using the SDK:

In [None]:
print(service.run(input_data = json_data))

You can then troubleshoot runtime issues by making changes to the scoring file that is referenced in the inference configuration, and reloading the service without redeploying it (something you can only do with a local service):

In [None]:
service.reload()
print(service.run(input_data = json_data))