# 03 - Deploying a model to Azure Container Instance

Now that we have trained a set of models and identified the run containing the best model, we want to deploy that model for real-time inferencing. Deploying a model involves:
* registering a model in your workspace
* creating a scoring file containing init and run methods
* retrieving the environment to run the model
* creating `InferenceConfig` and deployment configuration objects
* _Optionally:_ profiling the model
* deploying the model as a Docker image to the deployment target.

In this lab, we'll deploy the model to Azure Container Instances (ACI). ACI is best suited to dev/test workloads.

In [3]:
import environs

e_vars = environs.Env()
e_vars.read_env('../workshop.env')

USER_NAME = e_vars.str("USER_NAME")
EXPERIMENT_NAME = e_vars.str('EXPERIMENT_NAME')
ENVIRONMENT_NAME = e_vars.str("ENVIRONMENT_NAME")
DATASET_NAME = e_vars.str("DATASET_NAME")

SERVICE_NAME = e_vars.str("SERVICE_NAME")
MODEL_NAME = e_vars.str("MODEL_NAME")

if not USER_NAME:
    raise ValueError("Please enter your username in the `.env` file and run this cell again.")

In [2]:
from azureml.core import Workspace, Experiment

ws = Workspace.from_config()
experiment = Experiment(ws, EXPERIMENT_NAME)

### Find the Best Run
We can use the SDK to search through our runs and determine which one was best. In our case, we'll use RMSE as the metric and pick the run with the lowest value.

In [None]:
from tqdm import tqdm

def find_best_run(experiment, metric, goal='minimize'):
    runs = {}
    run_metrics = {}

    # Collect every run (including child runs) that logged the requested metric
    for r in tqdm(experiment.get_runs(include_children=True)):
        metrics = r.get_metrics()
        if metric in metrics:
            runs[r.id] = r
            run_metrics[r.id] = metrics

    # Fail with a clear message instead of letting min()/max() raise on an empty dict
    if not run_metrics:
        raise ValueError(f"No runs in the experiment logged the metric '{metric}'.")

    # Pick the run with the lowest or highest value of the metric
    select = min if goal == 'minimize' else max
    best_run_id = select(run_metrics, key=lambda k: run_metrics[k][metric])
    return runs[best_run_id]


In [None]:
best_run = find_best_run(experiment, 'rmse', 'minimize')

# Display the metrics
best_run.get_metrics()
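
The selection logic in `find_best_run` can be tried without a workspace. The following is a minimal sketch that applies the same idea to a plain dictionary; the run ids, metric values, and the `best_run_id` helper are made up for illustration and are not part of the AML SDK.

```python
# Made-up metrics for three hypothetical runs; "run_c" never logged rmse.
run_metrics = {
    "run_a": {"rmse": 58.2, "r2": 0.41},
    "run_b": {"rmse": 54.7, "r2": 0.47},
    "run_c": {"r2": 0.39},
}


def best_run_id(run_metrics, metric, goal="minimize"):
    """Return the id of the run with the best value for `metric`."""
    # Keep only the runs that actually logged the metric
    candidates = {rid: m[metric] for rid, m in run_metrics.items() if metric in m}
    if not candidates:
        raise ValueError(f"No run logged the metric '{metric}'.")
    # Lower is better for error metrics such as RMSE, higher for scores such as R2
    select = min if goal == "minimize" else max
    return select(candidates, key=candidates.get)


best_by_rmse = best_run_id(run_metrics, "rmse")
best_by_r2 = best_run_id(run_metrics, "r2", goal="maximize")
print(best_by_rmse, best_by_r2)
```

Runs that did not log the metric are skipped rather than treated as errors, which mirrors the `if metric in metrics` check above.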

### Register a model from best run
We have already identified which run contains the "best model" by our evaluation criteria. Each run has a file structure associated with it that contains the files collected during the run. Since a run can have many outputs, we need to tell AML which of those files represents the model that we want to deploy. We can use the `run.get_file_names()` method to list the files associated with the run, and then use the `run.register_model()` method to place the model in the workspace's model registry.

When using `run.register_model()` we supply a `model_name` that is meaningful for our scenario and the `model_path` of the model relative to the run. In this case, the model path is one of the paths returned by `run.get_file_names()`, namely `outputs/model.pkl`.

In [None]:
# View the files in the run
for f in best_run.get_file_names():
    if 'logs' not in f:
        print(f)
    
# Register the model with the workspace
model = best_run.register_model(model_name=MODEL_NAME, model_path='outputs/model.pkl')

Once a model is registered, it is accessible from the list of models in the AML workspace. If you register models with the same name multiple times, AML keeps a version history of those models for you. `Model.list()` lists all models in a workspace, and can be filtered by name, tags, or model properties.

In [None]:
# Find all models registered under MODEL_NAME and display their version numbers
from azureml.core.model import Model

models = Model.list(ws, name=MODEL_NAME)
for m in models:
    print(m.name, m.version)

### Create a scoring file

In [None]:
%%writefile score.py
import pickle
import json
import numpy as np
import joblib
import os

def init():
    global model
    # AZUREML_MODEL_DIR is set by the service and points to the directory
    # that holds the registered model's files (here, model.pkl).
    model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'model.pkl')
    # deserialize the model file back into a sklearn model
    model = joblib.load(model_path)


# note you can pass in multiple rows for scoring
def run(raw_data):
    try:
        data = json.loads(raw_data)['data']
        data = np.array(data)
        result = model.predict(data)

        # you can return any data type as long as it is JSON-serializable
        return result.tolist()
    except Exception as e:
        result = str(e)
        return result
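
Before deploying, it can help to check the request/response contract of `run()` locally. The sketch below is an illustration only: `MeanModel` is a hypothetical stand-in for the trained regressor, it is assigned directly instead of being loaded in `init()`, and plain lists replace NumPy arrays so the example has no dependencies.

```python
import json


class MeanModel:
    """Hypothetical stand-in for the trained model: predicts the mean of each row."""

    def predict(self, rows):
        return [sum(row) / len(row) for row in rows]


# In score.py this global is set by init(); here we assign the stand-in directly.
model = MeanModel()


def run(raw_data):
    try:
        # The service passes the request body as a JSON string with a "data" key
        data = json.loads(raw_data)["data"]
        result = model.predict(data)
        # The return value must be JSON-serializable
        return list(result)
    except Exception as e:
        # Errors are returned as text so the caller can see what went wrong
        return str(e)


# Two rows in one request, in the same {"data": [...]} shape the service expects
payload = json.dumps({"data": [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]})
predictions = run(payload)
error_text = run("this is not JSON")
print(predictions)
print(error_text)
```

A malformed body does not raise. It comes back as an error string, so callers should check the type of the response before treating it as predictions.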

### Create Inference Config

In [None]:
from azureml.core.model import InferenceConfig

environ = ws.environments[ENVIRONMENT_NAME]

inference_cfg = InferenceConfig(entry_script='score.py', environment=environ)

### Create Deployment Config

In [None]:
from azureml.core.webservice import AciWebservice

aci_cfg = AciWebservice.deploy_configuration(cpu_cores=0.5,
                                             memory_gb=0.5,
                                             tags={'disease': 'diabetes',
                                                   'target': 'blood_sugar'},
                                             description='Diabetes Regression Model',
                                             auth_enabled=True)

### Deploy your webservice
**Note:** The web service creation can take several minutes.  

In [None]:
%%time
from azureml.core.webservice import Webservice

# Create the webservice using all of the precreated configurations and our best model
service = Model.deploy(workspace=ws, 
                       name=SERVICE_NAME,
                       deployment_config=aci_cfg,
                       models=[model],
                       inference_config=inference_cfg,
                       overwrite=True)

# Wait for the service deployment to complete while displaying log output
service.wait_for_deployment(show_output=True)


### Test your webservice

In [None]:
# Load test data
import pandas as pd
from sklearn.model_selection import train_test_split

diabetes_df = pd.read_parquet('../../data/diabetes.parquet')
y = diabetes_df.pop('target').values
X = diabetes_df.values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

In [None]:
import json
# take the first row from the test set
test_samples = json.dumps({"data": X_test[0:1, :].tolist()})

# score on our service
service.run(input_data=test_samples)

This cell shows how you can send multiple rows to the webservice at once.

In [None]:
# score 5 rows from the test set.
test_samples = json.dumps({'data': X_test.tolist()[:5]})

service.run(input_data = test_samples)

This cell shows how you can use the `service.scoring_uri` property to access the HTTP endpoint of the service and call it using standard POST operations.

In [None]:
import requests

# use the first row from the test set again
test_samples = json.dumps({"data": X_test[0:1, :].tolist()})

# create the required header
headers = {'Content-Type':'application/json', "Authorization": f"Bearer {service.get_keys()[0]}"}

print("POST request:")
print(f"    URL: {service.scoring_uri}")
print(f"    Headers:")
print(f"         Authorization: {headers['Authorization']}")
print(f"          Content-Type: {headers['Content-Type']}")
print("     Content:")
print(f"          {test_samples}")
print()
print()

# post the request to the service and display the result
resp = requests.post(service.scoring_uri, test_samples, headers = headers)

print(f"Response from Webservice: {resp.text}")
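
Because the endpoint is plain HTTP, any client can call it. As a rough sketch, the same request can be assembled with only the Python standard library; the `build_scoring_request` helper, the URI, and the key below are hypothetical placeholders, and the request is built but not sent.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, key, rows):
    """Build (but do not send) a POST request for a key-authenticated scoring endpoint."""
    # Same {"data": [...]} body that score.py's run() expects
    body = json.dumps({"data": rows}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {key}",
    }
    return urllib.request.Request(scoring_uri, data=body, headers=headers, method="POST")


# Placeholder values; in the notebook these would come from
# service.scoring_uri and service.get_keys()[0].
request = build_scoring_request(
    "http://example.invalid/score", "not-a-real-key", [[0.1, 0.2, 0.3]]
)
print(request.get_method(), request.full_url)
```

To send it, pass the request to `urllib.request.urlopen(request)` and decode the response body as JSON.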

### Clean up

Delete the ACI instance to stop the compute and any associated billing.

In [None]:
%%time
service.delete()

###### Copyright (c) Microsoft Corporation. All rights reserved.  
###### Licensed under the MIT License.