# AML end-to-end enterprise readiness scenario

This notebook walks through an end-to-end enterprise readiness scenario for Azure Machine Learning (AML), i.e. training and inference in a secure manner. Make sure you have an [AML workspace](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-workspace#create-a-workspace) before you proceed. The notebook covers the following steps:

1. Create a virtual network (VNet)
2. Set up Network Security Group (NSG) rules
3. Create Machine Learning Compute inside the VNet
4. Create AKS cluster inside the VNet
5. Create a DSVM inside the VNet
6. Put storage and key vault behind the VNet
7. Train on AML Compute inside the VNet (secure training)
8. Deploy model as a service on AKS cluster using SSL and auth settings (secure inference)
9. Consume the service in a secure manner (using SSL and auth keys/token)
10. Set up role-based access for your workspace
11. Monitor the end-to-end ML lifecycle and set alerts

## 1. Create a virtual network (VNet)

Create a VNet with a subnet as shown in the image below. Make sure the VNet is linked to an NSG, as shown in the screenshot.



## 2. Set up Network Security Group (NSG) rules

NSG rules should be set as shown in the screenshots below:

<img src="nsg-rules.png" alt="Screenshot showing NSG rules" title="NSG rules" />

## 3. Create Machine Learning Compute inside the VNet

Use the code snippet below or the UI (screenshot below) to create AML Compute inside the VNet/subnet created above.

<img src="aml-compute-creation.png" alt="Image showing AML Compute creation behind VNet" title="Create AML Compute behind VNet" />

In [None]:
from azureml.core import Workspace

# initialize the workspace
ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\n')

In [None]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# The Azure virtual network name, subnet, and resource group
vnet_name = 'AMLVNetDemo'
subnet_name = 'default'
vnet_resourcegroup_name = 'AMLServiceRG'

# Choose a name for your CPU cluster
cpu_cluster_name = "vnetcluster"

# Verify that cluster does not exist already
try:
    cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)
    print("Found existing cpucluster")
except ComputeTargetException:
    print("Creating new cpucluster")

    # Specify the configuration for the new cluster
    compute_config = AmlCompute.provisioning_configuration(vm_size="STANDARD_D2_V2",
                                                           min_nodes=0,
                                                           max_nodes=4,
                                                           vnet_resourcegroup_name=vnet_resourcegroup_name,
                                                           vnet_name=vnet_name,
                                                           subnet_name=subnet_name)

    # Create the cluster with the specified name and configuration
    cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)

    # Wait for the cluster to be completed, show the output log
    cpu_cluster.wait_for_completion(show_output=True)
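
Each AML Compute node needs a private IP address from the subnet, and Azure reserves five addresses in every subnet, so it can be worth checking that the subnet has room for `max_nodes` before provisioning. The following is a minimal sketch using only the Python standard library. The address range `10.1.0.0/24` is a hypothetical placeholder (this notebook does not state the range of the `default` subnet), and `subnet_capacity` and `fits_in_subnet` are illustrative helpers, not part of the Azure ML SDK.

```python
import ipaddress

# Azure keeps five addresses of every subnet for its own use.
AZURE_RESERVED_PER_SUBNET = 5


def subnet_capacity(subnet_cidr):
    """Return how many addresses in the subnet are assignable to resources."""
    network = ipaddress.ip_network(subnet_cidr)
    return max(network.num_addresses - AZURE_RESERVED_PER_SUBNET, 0)


def fits_in_subnet(subnet_cidr, max_nodes, already_used=0):
    """Check that max_nodes plus addresses already in use fit in the subnet."""
    return max_nodes + already_used <= subnet_capacity(subnet_cidr)


# Hypothetical address range; replace with the range of your own subnet.
subnet_cidr = "10.1.0.0/24"
print("assignable addresses:", subnet_capacity(subnet_cidr))
print("room for 4 nodes:", fits_in_subnet(subnet_cidr, max_nodes=4))
```

Other resources placed in the same subnet (the AKS cluster and the DSVM in this scenario) also consume addresses, which is what the `already_used` argument is meant to account for.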

## 4. Create AKS cluster inside the VNet

Use the code snippet below or the UI (screenshot below) to create an AKS cluster inside the VNet/subnet created above.

<img src="aks-cluster-creation.png" alt="Image showing AKS cluster creation behind VNet" title="Create AKS cluster behind VNet" />

In [None]:
from azureml.core.compute import ComputeTarget, AksCompute

aks_cluster_name = 'aksvnetcluster1'

# Verify that cluster does not exist already
try:
    aks_target = ComputeTarget(workspace=ws, name=aks_cluster_name)
    print("Found existing aks cluster")
except ComputeTargetException:
    print("Creating new aks cluster")

    # Create the compute configuration and set virtual network information
    config = AksCompute.provisioning_configuration(location="eastus")
    config.vnet_resourcegroup_name = "AMLServiceRG"
    config.vnet_name = "AMLVNetDemo"
    config.subnet_name = "default"
    config.service_cidr = "10.0.0.0/16"
    config.dns_service_ip = "10.0.0.10"
    config.docker_bridge_cidr = "172.17.0.1/16"

    # Create the compute target
    aks_target = ComputeTarget.create(workspace=ws,
                                      name=aks_cluster_name,
                                      provisioning_configuration=config)

In [None]:
%%time
aks_target.wait_for_completion(show_output = True)
print(aks_target.provisioning_state)
print(aks_target.provisioning_errors)
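
The address settings passed to the AKS configuration have to be consistent with each other: the service CIDR should not overlap the subnet that hosts the cluster, the DNS service IP should lie inside the service CIDR, and the Docker bridge range should not collide with either. Below is a minimal sketch of such a pre-flight check using the standard `ipaddress` module. The subnet range `10.1.0.0/24` is a hypothetical placeholder, and `check_aks_network` is an illustrative helper rather than an Azure ML SDK function.

```python
import ipaddress


def check_aks_network(subnet_cidr, service_cidr, dns_service_ip, docker_bridge_cidr):
    """Return a list of human-readable problems; an empty list means no conflict found."""
    subnet = ipaddress.ip_network(subnet_cidr)
    service = ipaddress.ip_network(service_cidr)
    dns_ip = ipaddress.ip_address(dns_service_ip)
    # The bridge value is a host address plus prefix (e.g. 172.17.0.1/16),
    # so parse it as an interface and take the enclosing network.
    bridge = ipaddress.ip_interface(docker_bridge_cidr).network

    problems = []
    if service.overlaps(subnet):
        problems.append("service_cidr overlaps the subnet address range")
    if dns_ip not in service:
        problems.append("dns_service_ip is outside service_cidr")
    if bridge.overlaps(subnet) or bridge.overlaps(service):
        problems.append("docker_bridge_cidr overlaps the subnet or service_cidr")
    return problems


# Values from the cell above, plus a hypothetical subnet range.
problems = check_aks_network(
    subnet_cidr="10.1.0.0/24",
    service_cidr="10.0.0.0/16",
    dns_service_ip="10.0.0.10",
    docker_bridge_cidr="172.17.0.1/16",
)
print(problems or "network settings look consistent")
```

Running a check like this before `ComputeTarget.create` is cheaper than waiting for cluster provisioning to fail.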

### Enable SSL on the AKS Cluster (optional)

In [None]:
# Call enable_ssl on the provisioning configuration (`config` above)
# before ComputeTarget.create is called.

# bring your own SSL cert
# config.enable_ssl(ssl_cert_pem_file="cert.pem",
#                   ssl_key_pem_file="key.pem", ssl_cname="www.contoso.com")

# use a cert from Microsoft
# config.enable_ssl(leaf_domain_label="myservice")

## 5. Create a DSVM inside the VNet

Follow the instructions [here](https://docs.microsoft.com/en-us/azure/virtual-network/quick-create-portal) to create a DSVM inside the same VNet/subnet created above. Once the VM is created, set up a dev environment on it so that you can use Jupyter notebooks from inside the VM. These notebooks will be used to run training and inference inside the VNet.

NOTE: This experience will soon be replaced by managed compute instances that will work behind the VNet.

<font color=green>Up to this point we have created a VNet/subnet, set up NSG rules, and created AML Compute (training compute target), an AKS cluster (inference compute target), and a DSVM (dev environment) behind the VNet. We are now ready to do end-to-end ML behind the VNet.</font>

## 6. Put storage and key vault behind the VNet

Follow the links to put [storage](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-enable-virtual-network#use-a-storage-account-for-your-workspace) and [key vault](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-enable-virtual-network#use-a-key-vault-instance-with-your-workspace) that are associated with the AML workspace behind the VNet.

## 7. Train on AML Compute inside the VNet (secure training)

We will now train a model inside the VNet/subnet created above, using the DSVM running inside the VNet as our dev environment. The DSVM uses the AML Python SDK to submit training jobs to the AML Compute cluster, which runs inside the same VNet.

In [None]:
# create an experiment
from azureml.core import Experiment
experiment_name = 'train-on-amlcompute'
experiment = Experiment(workspace = ws, name = experiment_name)

In [None]:
import os
import shutil

# create a project directory
project_folder = './train-on-amlcompute'
os.makedirs(project_folder, exist_ok=True)
shutil.copy('train.py', project_folder)

In [None]:
import azureml.core
print(azureml.core.VERSION)

In [None]:
from azureml.train.estimator import Estimator

pip_packages = [
    'azureml-defaults', 'azureml-contrib-explain-model', 'azureml-core', 'azureml-telemetry',
    'azureml-explain-model', 'sklearn-pandas', 'azureml-dataprep'
]

estimator = Estimator(source_directory=project_folder, 
                      compute_target=cpu_cluster,
                      entry_script='train.py',
                      pip_packages=pip_packages,
                      conda_packages=['scikit-learn'],
                      inputs=[ws.datasets['IBM-Employee-Attrition'].as_named_input('attrition')])

run = experiment.submit(estimator)
run

In [None]:
from azureml.widgets import RunDetails
RunDetails(run).show()

## 8. Deploy model as a service on AKS cluster using SSL and auth (secure inference)

### Prepare to deploy

To deploy the model, you need the following items:

- **An entry script**. This script accepts requests, scores them by using the model, and returns the results.
- **Dependencies**, like helper scripts or Python/Conda packages required to run the entry script or model.
- **The deployment configuration** for the compute target that hosts the deployed model. This configuration describes things like memory and CPU requirements needed to run the model.

In [None]:
%%writefile score.py

import pandas as pd
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
from azureml.core.model import Model

import json
import pickle
import numpy as np

from inference_schema.schema_decorators import input_schema, output_schema
from inference_schema.parameter_types.numpy_parameter_type import NumpyParameterType
from inference_schema.parameter_types.pandas_parameter_type import PandasParameterType


input_sample = pd.DataFrame(data=[{'Age': 41, 'BusinessTravel': 'Travel_Rarely', 'DailyRate': 1102, 'Department': 'Sales', 'DistanceFromHome': 1, 'Education': 2, 'EducationField': 'Life Sciences', 'EnvironmentSatisfaction': 2, 'Gender': 'Female', 'HourlyRate': 94, 'JobInvolvement': 3, 'JobLevel': 2, 'JobRole': 'Sales Executive', 'JobSatisfaction': 4, 'MaritalStatus': 'Single', 'MonthlyIncome': 5993, 'MonthlyRate': 19479, 'NumCompaniesWorked': 8, 'OverTime': 'No', 'PercentSalaryHike': 11, 'PerformanceRating': 3, 'RelationshipSatisfaction': 1, 'StockOptionLevel': 0, 'TotalWorkingYears': 8, 'TrainingTimesLastYear': 0, 'WorkLifeBalance': 1, 'YearsAtCompany': 6, 'YearsInCurrentRole': 4, 'YearsSinceLastPromotion': 0, 'YearsWithCurrManager': 5}])
output_sample = np.array([0])


def init():
    global model
    # 'IBM_attrition_model' is the name of the registered model we want to deploy;
    # deserialize the model file back into a sklearn model
    model_path = Model.get_model_path('IBM_attrition_model')
    model = joblib.load(model_path)


@input_schema('data', PandasParameterType(input_sample))
@output_schema(NumpyParameterType(output_sample))
def run(data):
    try:
        result = model.predict(data)
        return json.dumps({"result": result.tolist()})
    except Exception as e:
        result = str(e)
        return json.dumps({"error": result})

### Automatic schema generation
To automatically generate a schema for your web service, provide a sample of the input and/or output in the constructor for one of the defined type objects. The type and sample are used to automatically create the schema. Azure Machine Learning then creates an OpenAPI (Swagger) specification for the web service during deployment.
To use schema generation, include the _inference-schema_ package in your Conda environment file.

### Define dependencies

The following YAML is the Conda dependencies file we will use for inference. If you want to use automatic schema generation, your entry script must import the inference-schema packages.

In [None]:
%%writefile myenv.yml

name: project_environment
dependencies:
- python=3.6.2

- pip:
  - sklearn-pandas
  - azureml-defaults
  - azureml-core
  - inference-schema[numpy-support]
- scikit-learn
- pandas

In [None]:
from azureml.core import Environment

# Instantiate environment
myenv = Environment.from_conda_specification(name = "myenv",
                                             file_path = "myenv.yml")

### Define your inference configuration

The inference configuration describes how to configure the model to make predictions. This configuration isn't part of your entry script. It references your entry script and is used to locate all the resources required by the deployment. It's used later, when you deploy the model.

In [None]:
from azureml.core.model import InferenceConfig

inference_config = InferenceConfig(entry_script='score.py', environment=myenv)

### Define your deployment configuration

Before deploying your model, you must define the deployment configuration. The deployment configuration is specific to the compute target that will host the web service. The deployment configuration isn't part of your entry script. It's used to define the characteristics of the compute target that will host the model and entry script.

In [None]:
from azureml.core.webservice import Webservice, AksWebservice

# Set the web service configuration (using default here)
# aks_config = AksWebservice.deploy_configuration(tags={'area': "HR", 'type': "attrition"},
#                                                 description='Explain predictions on employee attrition')

# Enable token auth and disable (key) auth on the webservice
aks_config = AksWebservice.deploy_configuration(tags={'area': "HR", 'type': "attrition"},
                                                description='Explain predictions on employee attrition',
                                                token_auth_enabled=True, auth_enabled=False)


### Deploy Model as Webservice on AKS cluster

Deployment uses the inference configuration and the deployment configuration to deploy the models. The deployment process is similar regardless of the compute target.

In [None]:
%%time

from azureml.core.model import Model

aks_service_name ='aks-attrition-svc'

attrition_model = Model(ws, 'IBM_attrition_model')

aks_service = Model.deploy(workspace=ws,
                           name=aks_service_name,
                           models=[attrition_model],
                           inference_config=inference_config,
                           deployment_config=aks_config,
                           deployment_target=aks_target)


aks_service.wait_for_deployment(True)
print(aks_service.state)

### Web service schema

If you used automatic schema generation with your deployment, you can get the address of the OpenAPI specification for the service from the `swagger_uri` property (for example, `print(aks_service.swagger_uri)`). Use a GET request or open the URI in a browser to retrieve the specification.

In [None]:
print(aks_service.swagger_uri)

## 9. Consume the service in a secure manner (using SSL and auth keys/token)

In [None]:
# if (key) auth is enabled, retrieve the API keys. AML generates two keys.
# key1, key2 = aks_service.get_keys()
# print(key1)

# if token auth is enabled, retrieve the token.
access_token, refresh_after = aks_service.get_token()
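
`get_token()` returns the token together with a `refresh_after` timestamp, so a long-running client should fetch a new token once that time has passed rather than reuse the first one indefinitely. The following is a minimal sketch of that pattern. `TokenCache` and `fake_fetch` are hypothetical names introduced for illustration, and the sketch assumes that `refresh_after` is a `datetime` comparable with the value returned by the `now` callable (pass a timezone-aware `now` if the service returns an aware timestamp).

```python
from datetime import datetime, timedelta


class TokenCache:
    """Cache a token and re-fetch it once its refresh time has passed."""

    def __init__(self, fetch_token, now=datetime.now):
        # fetch_token is any callable returning (token, refresh_after).
        self._fetch_token = fetch_token
        self._now = now
        self._token = None
        self._refresh_after = None

    def get(self):
        if self._token is None or self._now() >= self._refresh_after:
            self._token, self._refresh_after = self._fetch_token()
        return self._token


# Stand-in for the real service call so the sketch runs on its own.
calls = []


def fake_fetch():
    calls.append(1)
    return "token-%d" % len(calls), datetime.now() + timedelta(hours=1)


cache = TokenCache(fake_fetch)
first = cache.get()
second = cache.get()  # served from the cache, no second fetch
print(first, second, "fetches:", len(calls))
```

In this notebook the cache would be created as `TokenCache(aks_service.get_token)` and `cache.get()` would be used wherever `access_token` is needed.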

In [None]:
# construct raw HTTP request and send to the service
import json

import pandas as pd
import requests

# the sample below contains the data for an employee that is not an attrition risk
sample = pd.DataFrame(data=[{'Age': 41, 'BusinessTravel': 'Travel_Rarely', 'DailyRate': 1102, 'Department': 'Sales', 'DistanceFromHome': 1, 'Education': 2, 'EducationField': 'Life Sciences', 'EnvironmentSatisfaction': 2, 'Gender': 'Female', 'HourlyRate': 94, 'JobInvolvement': 3, 'JobLevel': 2, 'JobRole': 'Sales Executive', 'JobSatisfaction': 4, 'MaritalStatus': 'Single', 'MonthlyIncome': 5993, 'MonthlyRate': 19479, 'NumCompaniesWorked': 8, 'OverTime': 0, 'PercentSalaryHike': 11, 'PerformanceRating': 3, 'RelationshipSatisfaction': 1, 'StockOptionLevel': 0, 'TotalWorkingYears': 8, 'TrainingTimesLastYear': 0, 'WorkLifeBalance': 1, 'YearsAtCompany': 6, 'YearsInCurrentRole': 4, 'YearsSinceLastPromotion': 0, 'YearsWithCurrManager': 5}])

# the sample below contains the data for an employee that is an attrition risk
# sample = pd.DataFrame(data=[{'Age': 49, 'BusinessTravel': 'Travel_Rarely', 'DailyRate': 1098, 'Department': 'Research & Development', 'DistanceFromHome': 4, 'Education': 2, 'EducationField': 'Medical', 'EnvironmentSatisfaction': 4, 'Gender': 'Female', 'HourlyRate': 21, 'JobInvolvement': 3, 'JobLevel': 2, 'JobRole': 'Laboratory Technician', 'JobSatisfaction': 3, 'MaritalStatus': 'Single', 'MonthlyIncome': 711, 'MonthlyRate': 2124, 'NumCompaniesWorked': 8, 'OverTime': 1, 'PercentSalaryHike': 8, 'PerformanceRating': 4, 'RelationshipSatisfaction': 3, 'StockOptionLevel': 0, 'TotalWorkingYears': 2, 'TrainingTimesLastYear': 0, 'WorkLifeBalance': 3, 'YearsAtCompany': 2, 'YearsInCurrentRole': 1, 'YearsSinceLastPromotion': 0, 'YearsWithCurrManager': 1}])

# converts the sample to JSON string
sample = sample.to_json()

# deserializes sample to a python object 
sample = json.loads(sample)

# serializes sample to JSON formatted string as expected by the scoring script
sample = json.dumps({"data":sample})

# If (key) auth is enabled, don't forget to add key to the HTTP header.
# headers = {'Content-Type':'application/json', 'Authorization': 'Bearer ' + key1}

# If token auth is enabled, don't forget to add token to the HTTP header.
headers = {'Content-Type':'application/json', 'Authorization': 'Bearer ' + access_token}

resp = requests.post(aks_service.scoring_uri, sample, headers=headers)

print("prediction:", resp.text)
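
The cell above prints the raw response body. Because `run()` in `score.py` already returns a `json.dumps(...)` string, the body may arrive either as a JSON object or as a JSON-encoded string that has to be decoded a second time, and scoring failures are reported under an `error` key. The sketch below handles both cases. `parse_prediction` is a hypothetical helper, and the sample bodies are constructed locally rather than taken from a real service response.

```python
import json


def parse_prediction(body):
    """Decode a scoring response body and return the list of predictions."""
    payload = json.loads(body)
    # If the service wrapped the script's JSON string again, decode once more.
    if isinstance(payload, str):
        payload = json.loads(payload)
    if "error" in payload:
        raise RuntimeError("scoring failed: " + payload["error"])
    return payload["result"]


# Locally constructed examples of the two possible encodings.
single_encoded = json.dumps({"result": [0]})
double_encoded = json.dumps(single_encoded)

print(parse_prediction(single_encoded))
print(parse_prediction(double_encoded))
```

With the objects from the cell above, this would be called as `parse_prediction(resp.text)`, ideally after `resp.raise_for_status()` so that HTTP-level failures such as an expired token surface as exceptions first.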

### (Optional) Use run function that internally handles auth

In [None]:
import json
import pandas as pd

# the sample below contains the data for an employee that is not an attrition risk
sample = pd.DataFrame(data=[{'Age': 41, 'BusinessTravel': 'Travel_Rarely', 'DailyRate': 1102, 'Department': 'Sales', 'DistanceFromHome': 1, 'Education': 2, 'EducationField': 'Life Sciences', 'EnvironmentSatisfaction': 2, 'Gender': 'Female', 'HourlyRate': 94, 'JobInvolvement': 3, 'JobLevel': 2, 'JobRole': 'Sales Executive', 'JobSatisfaction': 4, 'MaritalStatus': 'Single', 'MonthlyIncome': 5993, 'MonthlyRate': 19479, 'NumCompaniesWorked': 8, 'OverTime': 0, 'PercentSalaryHike': 11, 'PerformanceRating': 3, 'RelationshipSatisfaction': 1, 'StockOptionLevel': 0, 'TotalWorkingYears': 8, 'TrainingTimesLastYear': 0, 'WorkLifeBalance': 1, 'YearsAtCompany': 6, 'YearsInCurrentRole': 4, 'YearsSinceLastPromotion': 0, 'YearsWithCurrManager': 5}])

# the sample below contains the data for an employee that is an attrition risk
# sample = pd.DataFrame(data=[{'Age': 49, 'BusinessTravel': 'Travel_Rarely', 'DailyRate': 1098, 'Department': 'Research & Development', 'DistanceFromHome': 4, 'Education': 2, 'EducationField': 'Medical', 'EnvironmentSatisfaction': 4, 'Gender': 'Female', 'HourlyRate': 21, 'JobInvolvement': 3, 'JobLevel': 2, 'JobRole': 'Laboratory Technician', 'JobSatisfaction': 3, 'MaritalStatus': 'Single', 'MonthlyIncome': 711, 'MonthlyRate': 2124, 'NumCompaniesWorked': 8, 'OverTime': 1, 'PercentSalaryHike': 8, 'PerformanceRating': 4, 'RelationshipSatisfaction': 3, 'StockOptionLevel': 0, 'TotalWorkingYears': 2, 'TrainingTimesLastYear': 0, 'WorkLifeBalance': 3, 'YearsAtCompany': 2, 'YearsInCurrentRole': 1, 'YearsSinceLastPromotion': 0, 'YearsWithCurrManager': 1}])


# converts the sample to JSON string
sample = sample.to_json()

# deserializes sample to a python object 
sample = json.loads(sample)

# serializes sample to JSON formatted string as expected by the scoring script
sample = json.dumps({"data":sample})

prediction = aks_service.run(sample)

print(prediction)

## 10. Set up role-based access for your workspace

Follow the steps [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-assign-roles) to enable custom roles on your workspace.
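
A custom role is described by a JSON role definition that lists allowed actions, excluded actions, and the scopes where the role can be assigned. The following sketch builds such a definition in Python. The role name, description, and scope placeholders are hypothetical, and the action strings follow the pattern used in the linked article as best recalled, so check them against that article before use.

```python
import json


def build_custom_role(name, description, not_actions, scope):
    """Build a role definition that allows everything except not_actions."""
    return {
        "Name": name,
        "IsCustom": True,
        "Description": description,
        "Actions": ["*"],
        "NotActions": list(not_actions),
        "AssignableScopes": [scope],
    }


# Placeholder scope; fill in your own subscription, resource group and workspace.
scope = (
    "/subscriptions/<subscription_id>/resourceGroups/<resource_group>"
    "/providers/Microsoft.MachineLearningServices/workspaces/<workspace_name>"
)

role = build_custom_role(
    name="Data Scientist Custom",  # hypothetical role name
    description="Can run experiments but cannot create or delete compute.",
    not_actions=[
        "Microsoft.MachineLearningServices/workspaces/*/delete",
        "Microsoft.MachineLearningServices/workspaces/computes/*/write",
        "Microsoft.MachineLearningServices/workspaces/computes/*/delete",
        "Microsoft.Authorization/*/write",
    ],
    scope=scope,
)

print(json.dumps(role, indent=2))
```

The printed JSON can be saved to a file and registered with `az role definition create --role-definition <file>`, after which the role can be assigned to users of the workspace.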

## 11. Monitoring and alerts

Monitor the end-to-end ML lifecycle and set up alerts using Azure Monitor. Check the Metrics section under the AML workspace in the Azure portal.