Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.


# Train and explain models remotely via Azure Machine Learning Compute


_**This notebook showcases how to use the Azure Machine Learning Interpretability SDK to train and explain a classification model remotely on an Azure Machine Learning Compute target (AMLCompute).**_




## Table of Contents

1. [Introduction](#Introduction)
1. [Setup](#Setup)
    1. Initialize a Workspace
    1. Create an Experiment
    1. Introduction to AmlCompute
    1. Submit an AmlCompute run in a few different ways
        1. Create project directory
        1. Fetch or create the compute target
        1. Configure & Run
1. Additional operations to perform on AmlCompute
1. [Download model explanations from Azure Machine Learning Run History](#Download)
1. [Visualize explanations](#Visualize)
1. [Next steps](#Next)

## Introduction

This notebook shows how to train and explain a classification model remotely on Azure Machine Learning Compute (AMLCompute), and then download the computed explanations locally for visualization.
It demonstrates the API calls needed to submit a training-and-explanation run to AMLCompute, download the computed explanations from the run, and visualize the global and local explanations in an interactive dashboard for discovering patterns in the model's predictions and explanations.

We will showcase one of the tabular data explainers: TabularExplainer (SHAP).
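SHAP attributes each prediction to the input features using Shapley values from cooperative game theory. The sketch below only illustrates that idea; it is not how `TabularExplainer` computes its values internally. It enumerates every feature subset exactly, assuming a feature "absent" from a subset is replaced by a baseline value, which is only practical for a handful of features. The `shapley_values` function, the linear model and the baseline are all hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(
    predict: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values of `predict` at `x`, using `baseline` for absent features."""
    n = len(x)

    def value(subset: frozenset) -> float:
        # Features in the subset take their value from x; the rest come from the baseline.
        point = [x[j] if j in subset else baseline[j] for j in range(n)]
        return predict(point)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight of a coalition of this size that excludes feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


# Hypothetical linear model: f(x) = 2*x0 - 1*x1 + 0.5*x2 + 3
def linear_model(v: Sequence[float]) -> float:
    return 2 * v[0] - 1 * v[1] + 0.5 * v[2] + 3


x = [1.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phis = shapley_values(linear_model, x, baseline)
print(phis)
```

For a linear model each attribution reduces to `w_i * (x_i - baseline_i)`, and the attributions always sum to `f(x) - f(baseline)`, which is a quick sanity check for any implementation.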

![](./images/explanations-run-history.png)

## Setup

In [1]:
# Check core SDK version number
import azureml.core

print("SDK version:", azureml.core.VERSION)

SDK version: 1.4.0


## Initialize a Workspace

Connect to the workspace

In [3]:
from azureml.core import Workspace, Dataset

from azureml.core.authentication import InteractiveLoginAuthentication

# Sign in interactively against a specific Azure AD tenant
interactive_auth = InteractiveLoginAuthentication(tenant_id="19479f88-8eac-45d2-a1bf-69d33854a3fa")
# Connect to an explicitly named workspace. Alternatively, load the workspace
# defined in the default config.json file with: ws = Workspace.from_config()
ws = Workspace(subscription_id="5e22d967-997b-49c7-8ca1-7ccfbf37e621",
               resource_group="rg-cbui-course532",
               workspace_name="amlwksphol",
               auth=interactive_auth)
print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\n')

amlwksphol
rg-cbui-course532
westus
5e22d967-997b-49c7-8ca1-7ccfbf37e621


## Create an Experiment

An **experiment** is a logical container in an Azure ML workspace. It hosts run records, which can include run metrics and output artifacts from your experiments.

In [4]:
from azureml.core import Experiment
experiment_name = 'explainer-remote-run-on-amlcompute'
experiment = Experiment(workspace=ws, name=experiment_name)

## Introduction to AmlCompute

Azure Machine Learning Compute is a managed compute infrastructure that lets you easily create single- or multi-node compute of the appropriate VM family. It is created **within your workspace region** and can be shared with other users in your workspace. By default, it autoscales up to `max_nodes` when a job is submitted, and it runs the job in a containerized environment that packages the dependencies you specify.

Because it is managed compute, job scheduling and cluster management are handled internally by the Azure Machine Learning service.

For more information on Azure Machine Learning Compute, please read [this article](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute).

**Note**: As with other Azure services, there are limits on certain resources (e.g., AmlCompute quota) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota.



## Submit an AmlCompute run in a few different ways

First, let's check which VM families are available in your region. Azure is a regional service, and some specialized SKUs (especially GPUs) are only available in certain regions. Since AmlCompute is created in the region of your workspace, you can use the `AmlCompute.supported_vmsizes()` function to check whether the VM family we want to use (`STANDARD_D2_V2`) is supported.

You can also pass a different region to check availability, and then re-create your workspace in that region through the [configuration notebook](../../../configuration.ipynb).
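The availability check can be scripted. `AmlCompute.supported_vmsizes(workspace=ws)` returns the VM sizes supported in the workspace region as a list of dictionaries. The sketch below assumes each dictionary carries the size in a `'name'` key (verify this against your SDK version) and compares names case-insensitively, since the service may report them as, e.g., `Standard_D2_v2`. The helper `is_vm_size_supported` and the sample data are hypothetical, not part of the SDK.

```python
from typing import Dict, Iterable


def is_vm_size_supported(vm_sizes: Iterable[Dict], wanted: str) -> bool:
    """Return True if `wanted` matches one of the VM size names (case-insensitive)."""
    wanted = wanted.lower()
    return any(size.get("name", "").lower() == wanted for size in vm_sizes)


# With a live workspace (assumption: the call returns a list of dicts with a 'name' key):
# from azureml.core.compute import AmlCompute
# vm_sizes = AmlCompute.supported_vmsizes(workspace=ws)
# print(is_vm_size_supported(vm_sizes, "STANDARD_D2_V2"))

# Offline example with hypothetical data:
sample_sizes = [{"name": "Standard_D2_v2", "vCPUs": 2}, {"name": "Standard_DS2_v2", "vCPUs": 2}]
print(is_vm_size_supported(sample_sizes, "STANDARD_D2_V2"))
```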

### Create project directory

Create a directory containing all the code from your local machine that the remote resource needs access to. This includes the training script and any additional files it depends on.

In [5]:
import os
import shutil

project_folder = './explainer-remote-run-on-amlcompute'
os.makedirs(project_folder, exist_ok=True)
shutil.copy('train_explain.py', project_folder)

'./explainer-remote-run-on-amlcompute/train_explain.py'

Note: if you need to cancel a run, you can follow [these instructions](https://aka.ms/aml-docs-cancel-run).

### Fetch or create the compute target

We are going to use the compute target you created earlier (make sure you provide the same name here in the variable `cpu_cluster_name`).

In [6]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

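# Reference your existing cluster
cpu_cluster_name = "cpucluster"

# ws.compute_targets is a dict keyed by name, so this raises a KeyError if the cluster does not exist
cpu_cluster = ws.compute_targets[cpu_cluster_name]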

The code above throws an error if you didn't set up the cluster in the first part of the workshop. In that case, you can provision a cluster here by uncommenting and running the next cell.

**Note:** By default, a cluster autoscales from 0 nodes and provisions dedicated VMs to run your job in a container. A persistent cluster is useful when you want to continuously re-use the same target, debug it between jobs, or share the resource with other users of your workspace.

* `vm_size`: VM family of the nodes provisioned by AmlCompute. Choose one of the sizes returned by `AmlCompute.supported_vmsizes()`.
* `max_nodes`: Maximum number of nodes to autoscale to while running a job on AmlCompute.
* `idle_seconds_before_scaledown`: How long, in seconds, a node can stay idle before the cluster scales it down.

In [7]:
# If you didn't create the cluster in the first part of the workshop, this code will create it
#compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',
#                                                       max_nodes=8, 
#                                                       idle_seconds_before_scaledown=7200)
#cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)
#cpu_cluster.wait_for_completion(show_output=True)
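The lookup and the creation cells can be combined into a single get-or-create step. This is a minimal sketch, assuming `ws.compute_targets` behaves like a dictionary keyed by compute name (as the lookup cell uses it). `get_or_create_compute` is a hypothetical helper, and the creation call is passed in as a function so the logic can be exercised without Azure.

```python
from typing import Any, Callable, Mapping


def get_or_create_compute(
    compute_targets: Mapping[str, Any],
    name: str,
    create: Callable[[str], Any],
) -> Any:
    """Return the existing compute target called `name`, or build one with `create`."""
    if name in compute_targets:
        print(f"Found existing compute target '{name}'.")
        return compute_targets[name]
    print(f"Compute target '{name}' not found; creating it.")
    return create(name)


# With a live workspace, `create` would wrap the provisioning code above, e.g.:
# def create_cluster(name):
#     config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2', max_nodes=8,
#                                                    idle_seconds_before_scaledown=7200)
#     cluster = ComputeTarget.create(ws, name, config)
#     cluster.wait_for_completion(show_output=True)
#     return cluster
# cpu_cluster = get_or_create_compute(ws.compute_targets, cpu_cluster_name, create_cluster)

# Offline example with a plain dict standing in for ws.compute_targets:
existing_targets = {"cpucluster": "existing-target"}
print(get_or_create_compute(existing_targets, "cpucluster", lambda n: f"new-{n}"))
```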

### Configure & Run

In [9]:
from azureml.train.estimator import Estimator

pip_packages = [
                'azureml-defaults==1.0.76', 'azureml-core==1.0.76', 'azureml-telemetry==1.0.76',
                'azureml-dataprep==1.1.31', 'joblib==0.14.0', 'sklearn-pandas==1.7.0', 'pandas==0.23.4',
                'azureml-contrib-interpret'
               ]

estimator = Estimator(source_directory=project_folder, 
                      compute_target=cpu_cluster,
                      entry_script='train_explain.py',
                      pip_packages=pip_packages,
                      conda_packages=['scikit-learn==0.20.3'],
                      inputs=[ws.datasets['employeeattrition'].as_named_input('attrition')])

run = experiment.submit(estimator)
run

| Experiment | Id | Type | Status | Details Page | Docs Page |
|---|---|---|---|---|---|
| explainer-remote-run-on-amlcompute | explainer-remote-run-on-amlcompute_1589985163_b05fc2f2 | azureml.scriptrun | Starting | Link to Azure Machine Learning studio | Link to Documentation |
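The pinned packages above target SDK 1.0.76, while the local SDK printed at the top of this notebook is 1.4.0, and `azureml-contrib-interpret` is left unpinned. When the remote run and the local client use different SDK versions, artifacts written by one side may not be readable by the other, so comparing the pins with the local version before submitting can save a failed download later. This is a minimal sketch: `find_version_mismatches` is a hypothetical helper, and which packages should track the core SDK version is an assumption you may need to adjust.

```python
from typing import Dict, Iterable, List

# Packages assumed to follow the core SDK version number.
CORE_VERSIONED = ("azureml-core", "azureml-defaults", "azureml-telemetry")


def find_version_mismatches(
    packages: List[str], local_version: str, names: Iterable[str] = CORE_VERSIONED
) -> Dict[str, str]:
    """Return {package: pinned_version} for listed packages pinned to a version other than local_version."""
    wanted = set(names)
    mismatches = {}
    for spec in packages:
        if "==" not in spec:
            continue  # unpinned packages are resolved by pip when the image is built
        name, version = spec.split("==", 1)
        if name in wanted and version != local_version:
            mismatches[name] = version
    return mismatches


# Copy of the list passed to the Estimator above.
estimator_packages = [
    'azureml-defaults==1.0.76', 'azureml-core==1.0.76', 'azureml-telemetry==1.0.76',
    'azureml-dataprep==1.1.31', 'joblib==0.14.0', 'sklearn-pandas==1.7.0', 'pandas==0.23.4',
    'azureml-contrib-interpret'
]
# In the notebook, pass azureml.core.VERSION instead of the literal "1.4.0".
print(find_version_mismatches(estimator_packages, "1.4.0"))
```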


In [10]:
from azureml.widgets import RunDetails
RunDetails(run).show()


![](images/aml-run.png)

## (OPTIONAL) Additional operations to perform on AmlCompute Cluster

You can perform more operations on AmlCompute such as updating the node counts or deleting the compute. 

In [11]:
# get_status() gets the latest status of the AmlCompute target
cpu_cluster.get_status().serialize()

{'currentNodeCount': 0,
 'targetNodeCount': 0,
 'nodeStateCounts': {'preparingNodeCount': 0,
  'runningNodeCount': 0,
  'idleNodeCount': 0,
  'unusableNodeCount': 0,
  'leavingNodeCount': 0,
  'preemptedNodeCount': 0},
 'allocationState': 'Steady',
 'allocationStateTransitionTime': '2020-05-18T21:15:21.645000+00:00',
 'errors': None,
 'creationTime': '2020-05-18T19:32:16.878038+00:00',
 'modifiedTime': '2020-05-18T19:33:04.220891+00:00',
 'provisioningState': 'Succeeded',
 'provisioningStateTransitionTime': None,
 'scaleSettings': {'minNodeCount': 0,
  'maxNodeCount': 4,
  'nodeIdleTimeBeforeScaleDown': 'PT600S'},
 'vmPriority': 'Dedicated',
 'vmSize': 'STANDARD_DS2_V2'}
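The status reports the scale-down delay as an ISO 8601 duration (`'PT600S'`), whereas `update()` and `provisioning_configuration()` take `idle_seconds_before_scaledown` in seconds. A small converter makes the two easy to compare. This sketch handles only the time-only `PT…H…M…S` form seen here, and `iso_duration_to_seconds` is a hypothetical helper.

```python
import re

# Time-only ISO 8601 durations: optional hours, minutes and seconds after "PT".
_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def iso_duration_to_seconds(duration: str) -> int:
    """Convert a time-only ISO 8601 duration such as 'PT600S' or 'PT2H' to seconds."""
    match = _DURATION.match(duration)
    if match is None or duration == "PT":
        raise ValueError(f"Unsupported duration: {duration!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


# In the notebook: status = cpu_cluster.get_status().serialize()
status = {"scaleSettings": {"minNodeCount": 0, "maxNodeCount": 4, "nodeIdleTimeBeforeScaleDown": "PT600S"}}
print(iso_duration_to_seconds(status["scaleSettings"]["nodeIdleTimeBeforeScaleDown"]))  # 600
```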

In [12]:
# update() takes min_nodes, max_nodes and idle_seconds_before_scaledown and updates the AmlCompute target
# cpu_cluster.update(min_nodes=1)
# cpu_cluster.update(max_nodes=10)
# cpu_cluster.update(idle_seconds_before_scaledown=7200)
# cpu_cluster.update(min_nodes=2, max_nodes=4, idle_seconds_before_scaledown=600)

In [13]:
# delete() deprovisions and deletes the AmlCompute target. This is useful if you want to re-use the
# compute name ('cpucluster' in this case) but with a different VM family, for instance.

# cpu_cluster.delete()

In [14]:
run.wait_for_completion(show_output=True)

RunId: explainer-remote-run-on-amlcompute_1589985163_b05fc2f2
Web View: https://ml.azure.com/experiments/explainer-remote-run-on-amlcompute/runs/explainer-remote-run-on-amlcompute_1589985163_b05fc2f2?wsid=/subscriptions/5e22d967-997b-49c7-8ca1-7ccfbf37e621/resourcegroups/rg-cbui-course532/workspaces/amlwksphol

Streaming azureml-logs/20_image_build_log.txt

2020/05/20 14:33:02 Downloading source code...
2020/05/20 14:33:03 Finished downloading source code
2020/05/20 14:33:03 Creating Docker network: acb_default_network, driver: 'bridge'
2020/05/20 14:33:04 Successfully set up Docker network: acb_default_network
2020/05/20 14:33:04 Setting up Docker configuration...
2020/05/20 14:33:04 Successfully set up Docker configuration
2020/05/20 14:33:04 Logging in to registry: amlwkspholef3210b7.azurecr.io
2020/05/20 14:33:06 Successfully logged into amlwkspholef3210b7.azurecr.io
2020/05/20 14:33:06 Executing step ID: acb_step_0. Timeout(sec): 5400, Working directory: '', Network: 'acb_default_

  Downloading adal-1.2.3-py2.py3-none-any.whl (53 kB)
Collecting SecretStorage
  Downloading SecretStorage-3.1.2-py3-none-any.whl (14 kB)
Collecting ndg-httpsclient
  Downloading ndg_httpsclient-0.5.1-py3-none-any.whl (34 kB)
Collecting urllib3>=1.23
  Downloading urllib3-1.25.9-py2.py3-none-any.whl (126 kB)
Collecting azure-mgmt-storage>=1.5.0
  Downloading azure_mgmt_storage-10.0.0-py2.py3-none-any.whl (532 kB)
Collecting msrestazure>=0.4.33
  Downloading msrestazure-0.6.3-py2.py3-none-any.whl (40 kB)
Collecting azure-mgmt-keyvault>=0.40.0
  Downloading azure_mgmt_keyvault-2.2.0-py2.py3-none-any.whl (89 kB)
Collecting jsonpickle
  Downloading jsonpickle-1.4.1-py2.py3-none-any.whl (36 kB)
Collecting jmespath
  Downloading jmespath-0.10.0-py2.py3-none-any.whl (24 kB)
Collecting docker
  Downloading docker-4.2.0-py2.py3-none-any.whl (143 kB)
Collecting ruamel.yaml<=0.15.89,>=0.15.35
  Downloading ruamel.yaml-0.15.89-cp36-cp36m-manylinux1_x86_64.whl (651 kB)
Collecting pyopenssl
  Downlo

#
# To activate this environment, use:
# > source activate /azureml-envs/azureml_e158b0e389821d00d72b205b3dcc7bbc
#
# To deactivate an active environment, use:
# > source deactivate
#


Removing intermediate container e7988d994d3a
 ---> f56388afa940
Step 9/14 : ENV PATH /azureml-envs/azureml_e158b0e389821d00d72b205b3dcc7bbc/bin:$PATH
 ---> Running in 912fea67a6f9
Removing intermediate container 912fea67a6f9
 ---> eace2f76f8f7
Step 10/14 : ENV AZUREML_CONDA_ENVIRONMENT_PATH /azureml-envs/azureml_e158b0e389821d00d72b205b3dcc7bbc
 ---> Running in 208e8633435d
Removing intermediate container 208e8633435d
 ---> 398981bc674f
Step 11/14 : ENV LD_LIBRARY_PATH /azureml-envs/azureml_e158b0e389821d00d72b205b3dcc7bbc/lib:$LD_LIBRARY_PATH
 ---> Running in 90c5cf205241
Removing intermediate container 90c5cf205241
 ---> bdf59197b159
Step 12/14 : COPY azureml-environment-setup/spark_cache.py azureml-environment-setup/log4j.properties /azureml-environment-setup/
 ---> ea2b3f45d789
Step 13/14 


Streaming azureml-logs/75_job_post-tvmps_88bb717ed42f063078197b445b5195f23f59fe98e9e8917b1ce2d97490180688_d.txt

Entering job release. Current time:2020-05-20T14:44:53.720428
Starting job release. Current time:2020-05-20T14:44:54.714006
Logging experiment finalizing status in history service.
Starting the daemon thread to refresh tokens in background for process with pid = 301
Entering context manager injector. Current time:2020-05-20T14:44:54.727724
Job release is complete. Current time:2020-05-20T14:44:55.974819

Execution Summary
RunId: explainer-remote-run-on-amlcompute_1589985163_b05fc2f2
Web View: https://ml.azure.com/experiments/explainer-remote-run-on-amlcompute/runs/explainer-remote-run-on-amlcompute_1589985163_b05fc2f2?wsid=/subscriptions/5e22d967-997b-49c7-8ca1-7ccfbf37e621/resourcegroups/rg-cbui-course532/workspaces/amlwksphol



{'runId': 'explainer-remote-run-on-amlcompute_1589985163_b05fc2f2',
 'target': 'cpucluster',
 'status': 'Completed',
 'startTimeUtc': '2020-05-20T14:43:07.908376Z',
 'endTimeUtc': '2020-05-20T14:45:00.869349Z',
 'properties': {'_azureml.ComputeTargetType': 'amlcompute',
  'ContentSnapshotId': '21580dda-ce18-4709-803a-d936627634ec',
  'AzureML.DerivedImageName': 'azureml/azureml_25f143aa9c70f6d08bf525d14da8e04f',
  'ProcessInfoFile': 'azureml-logs/process_info.json',
  'ProcessStatusFile': 'azureml-logs/process_status.json',
  'model_type': 'classification',
  'explainer': 'tabular'},
 'inputDatasets': [{'dataset': {'id': 'd7b9cba4-3789-4b54-8fa8-a227c6da81fa'}, 'consumptionDetails': {'type': 'RunInput', 'inputName': 'attrition', 'mechanism': 'Direct'}}],
 'runDefinition': {'script': 'train_explain.py',
  'useAbsolutePath': False,
  'arguments': [],
  'sourceDirectoryDataStore': None,
  'framework': 'Python',
  'communicator': 'None',
  'target': 'cpucluster',
  'dataReferences': {},
  

## Download 
### 1. Download model explanation data.

In [17]:
from azureml.contrib.interpret.explanation.explanation_client import ExplanationClient

# Get model explanation data
client = ExplanationClient.from_run(run)
# Download the full global explanation; the dashboard at the end of the notebook uses it
global_explanation = client.download_model_explanation()

In [18]:
# Get the top k (e.g., 4) most important features with their importance values
global_explanation_topk = client.download_model_explanation(top_k=4)
global_importance_values = global_explanation_topk.get_ranked_global_values()
global_importance_names = global_explanation_topk.get_ranked_global_names()

UserErrorException: UserErrorException:
	Message: File with path explanation/97c2ffda-5e06-4065-8ab0-e88662f0b319/rich_metadata.interpret.json was not found,
available files include: azureml-logs/20_image_build_log.txt,azureml-logs/55_azureml-execution-tvmps_88bb717ed42f063078197b445b5195f23f59fe98e9e8917b1ce2d97490180688_d.txt,azureml-logs/65_job_prep-tvmps_88bb717ed42f063078197b445b5195f23f59fe98e9e8917b1ce2d97490180688_d.txt,azureml-logs/70_driver_log.txt,azureml-logs/75_job_post-tvmps_88bb717ed42f063078197b445b5195f23f59fe98e9e8917b1ce2d97490180688_d.txt,azureml-logs/process_info.json,azureml-logs/process_status.json,explanation/97c2ffda/classes.interpret.json,explanation/97c2ffda/eval_data_viz.interpret.json,explanation/97c2ffda/expected_values.interpret.json,explanation/97c2ffda/features.interpret.json,explanation/97c2ffda/global_names/0.interpret.json,explanation/97c2ffda/global_rank/0.interpret.json,explanation/97c2ffda/global_values/0.interpret.json,explanation/97c2ffda/local_importance_values.interpret.json,explanation/97c2ffda/per_class_names/0.interpret.json,explanation/97c2ffda/per_class_rank/0.interpret.json,explanation/97c2ffda/per_class_values/0.interpret.json,explanation/97c2ffda/rich_metadata.interpret.json,explanation/97c2ffda/visualization_dict.interpret.json,explanation/97c2ffda/ys_pred_proba_viz.interpret.json,explanation/97c2ffda/ys_pred_viz.interpret.json,logs/azureml/102_azureml.log,logs/azureml/azureml.log,original_model.pkl,outputs/log_reg.pkl,outputs/x_test.pkl,x_test_ibm.pkl.
	InnerException None
	ErrorResponse 
{
    "error": {
        "code": "UserError",
        "message": "File with path explanation/97c2ffda-5e06-4065-8ab0-e88662f0b319/rich_metadata.interpret.json was not found,\navailable files include: azureml-logs/20_image_build_log.txt,azureml-logs/55_azureml-execution-tvmps_88bb717ed42f063078197b445b5195f23f59fe98e9e8917b1ce2d97490180688_d.txt,azureml-logs/65_job_prep-tvmps_88bb717ed42f063078197b445b5195f23f59fe98e9e8917b1ce2d97490180688_d.txt,azureml-logs/70_driver_log.txt,azureml-logs/75_job_post-tvmps_88bb717ed42f063078197b445b5195f23f59fe98e9e8917b1ce2d97490180688_d.txt,azureml-logs/process_info.json,azureml-logs/process_status.json,explanation/97c2ffda/classes.interpret.json,explanation/97c2ffda/eval_data_viz.interpret.json,explanation/97c2ffda/expected_values.interpret.json,explanation/97c2ffda/features.interpret.json,explanation/97c2ffda/global_names/0.interpret.json,explanation/97c2ffda/global_rank/0.interpret.json,explanation/97c2ffda/global_values/0.interpret.json,explanation/97c2ffda/local_importance_values.interpret.json,explanation/97c2ffda/per_class_names/0.interpret.json,explanation/97c2ffda/per_class_rank/0.interpret.json,explanation/97c2ffda/per_class_values/0.interpret.json,explanation/97c2ffda/rich_metadata.interpret.json,explanation/97c2ffda/visualization_dict.interpret.json,explanation/97c2ffda/ys_pred_proba_viz.interpret.json,explanation/97c2ffda/ys_pred_viz.interpret.json,logs/azureml/102_azureml.log,logs/azureml/azureml.log,original_model.pkl,outputs/log_reg.pkl,outputs/x_test.pkl,x_test_ibm.pkl."
    }
}

In [None]:
print('global importance values: {}'.format(global_importance_values))
print('global importance names: {}'.format(global_importance_names))
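With `top_k=4`, the client is asked for only the four most important features, and `get_ranked_global_values()` / `get_ranked_global_names()` return them ordered from most to least important. The sketch below reproduces that ranking on plain Python lists to show what the two printed lists contain. The feature names and scores are made up, and `rank_top_k` is a hypothetical helper, not part of the SDK.

```python
from typing import List, Sequence, Tuple


def rank_top_k(names: Sequence[str], values: Sequence[float], k: int) -> Tuple[List[str], List[float]]:
    """Return the k names and values with the largest importance, most important first."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    return [names[i] for i in order], [values[i] for i in order]


# Hypothetical global importance scores per feature:
feature_names = ["Age", "MonthlyIncome", "OverTime", "YearsAtCompany", "DistanceFromHome"]
feature_values = [0.12, 0.31, 0.45, 0.08, 0.05]
top_names, top_values = rank_top_k(feature_names, feature_values, k=4)
print(top_names)   # ['OverTime', 'MonthlyIncome', 'Age', 'YearsAtCompany']
print(top_values)  # [0.45, 0.31, 0.12, 0.08]
```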

## List the run's files

In [None]:
print(run.get_file_names())
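The error above lists the files the run actually uploaded: they sit under `explanation/97c2ffda/`, while the client looked for `explanation/97c2ffda-5e06-4065-8ab0-e88662f0b319/`. Grouping the output of `run.get_file_names()` by folder makes such mismatches easy to spot. This is a minimal sketch; `explanation_folders` is a hypothetical helper, and the sample list is a subset copied from the error message.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def explanation_folders(file_names: Iterable[str]) -> Dict[str, List[str]]:
    """Group run files stored under 'explanation/<id>/...' by their <id> folder."""
    folders: Dict[str, List[str]] = defaultdict(list)
    for path in file_names:
        parts = path.split("/")
        if len(parts) >= 3 and parts[0] == "explanation":
            # Keep each path relative to its explanation folder.
            folders[parts[1]].append("/".join(parts[2:]))
    return dict(folders)


# Sample taken from the error message above; in the notebook use run.get_file_names().
sample_files = [
    "azureml-logs/70_driver_log.txt",
    "explanation/97c2ffda/classes.interpret.json",
    "explanation/97c2ffda/global_names/0.interpret.json",
    "explanation/97c2ffda/rich_metadata.interpret.json",
    "outputs/log_reg.pkl",
]
for folder, files in explanation_folders(sample_files).items():
    print(folder, len(files), "files")
```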

### 2. Download model and test set files

In [None]:
# retrieve model for visualization and deployment
from azureml.core.model import Model
import joblib

# Download test dataset file
run.download_file('x_test_ibm.pkl')
x_test = joblib.load('x_test_ibm.pkl')

# Download trained model
run.download_file('original_model.pkl')
original_model = joblib.load('original_model.pkl')

original_model

## Visualize
Load the visualization dashboard (**currently broken in JupyterLab**)

In [None]:
from interpret_community.widget import ExplanationDashboard

In [None]:
ExplanationDashboard(global_explanation, original_model, datasetX=x_test)