Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.


# Interpretability With Tensorflow On Azure Machine Learning Service (Remote)


## Overview of Tutorial
This notebook is Part 4 (Explaining Your Model Using Interpretability) of a four-part workshop that demonstrates an end-to-end workflow for using Tensorflow on Azure Machine Learning Service. The workshop consists of the following parts:

- Part 1: [Preparing Data and Model Training](https://github.com/microsoft/bert-stack-overflow/blob/master/1-Training/AzureServiceClassifier_Training.ipynb)
- Part 2: [Inferencing and Deploying a Model](https://github.com/microsoft/bert-stack-overflow/blob/master/2-Inferencing/AzureServiceClassifier_Inferencing.ipynb)
- Part 3: [Setting Up a Pipeline Using MLOps](https://github.com/microsoft/bert-stack-overflow/tree/master/3-ML-Ops)
- Part 4: [Explaining Your Model Using Interpretability](https://github.com/microsoft/bert-stack-overflow/blob/master/4-Interpretibility/IBMEmployeeAttritionClassifier_Interpretability.ipynb)

_**This notebook showcases how to use the Azure Machine Learning Interpretability SDK to train and explain a binary classification model remotely on an Azure Machine Learning Compute Target (AMLCompute).**_

## Table of Contents

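1. [Introduction](#Introduction)
1. [Change Tensorflow and Interpret Library Versions](#Change-Tensorflow-and-Interpret-Library-Versions)
1. [Connect To Workspace](#Connect-To-Workspace)
1. [Create An Experiment](#Create-An-Experiment)
1. [Create Compute Target](#Create-Compute-Target)
1. [Submit Experiment Run](#Submit-Experiment-Run)
    1. Create project directory
    1. Submit job to cluster
1. [Download model explanations from Azure Machine Learning Run History](#Download)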

## Introduction

This notebook shows how to train and explain a binary classification model remotely on Azure Machine Learning Compute (AMLCompute) and then download the calculated explanations to your local machine.
It demonstrates the API calls needed to submit a training-and-explanation run to AMLCompute and to download the computed explanations from the run history.

We will showcase one of the tabular data explainers: TabularExplainer (SHAP).
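
To make the idea behind SHAP-style explanations concrete, here is a minimal sketch that computes exact Shapley values for a toy three-feature model in pure Python. The model, the feature values, and the baseline are all hypothetical and chosen only for illustration. SHAP-based explainers generally rely on faster approximations or model-specific algorithms rather than this brute-force enumeration, which is only tractable for a handful of features. The sketch shows the additive property: the baseline prediction plus the per-feature contributions equals the model's prediction for the instance.

```python
from itertools import permutations


def toy_model(features):
    """Hypothetical attrition score: linear terms plus one interaction term."""
    overtime, age, job_level = features
    return 0.3 * overtime - 0.01 * age + 0.05 * overtime * job_level


def shapley_values(model, instance, baseline):
    """Exact Shapley values: average marginal contribution over all feature orderings."""
    n = len(instance)
    contributions = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)          # start from the reference point
        previous_output = model(current)
        for index in order:
            current[index] = instance[index]  # switch one feature to the instance value
            new_output = model(current)
            contributions[index] += new_output - previous_output
            previous_output = new_output
    return [total / len(orderings) for total in contributions]


instance = [1.0, 30.0, 2.0]  # hypothetical employee: works overtime, age 30, job level 2
baseline = [0.0, 40.0, 1.0]  # hypothetical reference employee

values = shapley_values(toy_model, instance, baseline)
expected_value = toy_model(baseline)
prediction = toy_model(instance)

print("per-feature contributions:", values)
print("baseline + contributions:", expected_value + sum(values))
print("model prediction:", prediction)
```

The `local_importance_values` and `expected_values` downloaded later in this notebook play the same roles as `values` and `expected_value` here, one set per explained row.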

Problem: Employee Attrition Classification Problem



![](./images/interpretability-architecture.png)



## Change Tensorflow and Interpret Library Versions

We will be using an older version of Tensorflow (1.14) for this tutorial in the series, as Tensorflow 2.0 is not yet supported for interpretability on Azure Machine Learning service. We will also be using version 0.1.0.4 of the `interpret-community` library.

If you haven't already done so, please update your library versions.

In [23]:
%pip uninstall tensorflow-gpu keras --yes
%pip install tensorflow-gpu==1.14 interpret-community==0.1.0.4

Collecting azureml.contrib.interpret
  Using cached https://files.pythonhosted.org/packages/05/cf/05ff8cc39de0c97bc1fd564dc618a8256ff5b2f08446556fc73435e69652/azureml_contrib_interpret-1.0.69-py3-none-any.whl
Collecting interpret-community==0.1.0.2 (from azureml-interpret==1.0.69.*->azureml.contrib.interpret)
  Using cached https://files.pythonhosted.org/packages/8b/3b/a7eb6beac2d8b21ea442ffe73d90b236e89c97bc6e2c805bcb96ed2c0bdf/interpret_community-0.1.0.2-py3-none-any.whl




Installing collected packages: azureml.contrib.interpret, interpret-community
  Found existing installation: interpret-community 0.1.0.4
    Uninstalling interpret-community-0.1.0.4:
      Successfully uninstalled interpret-community-0.1.0.4
Successfully installed azureml.contrib.interpret interpret-community-0.1.0.2
Note: you may need to restart the kernel to use updated packages.


After installing the packages, you must restart the kernel and close and reopen the notebook.

Let's make sure we have the right versions.

In [59]:
import tensorflow as tf
import interpret_community

print(tf.version.VERSION)

1.14.0
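
A printed version still has to be compared by eye. As a small sketch, the check below turns the comparison into code so that a mismatch is caught immediately. The helper name `matches_pin` is an assumption made for this example and is not part of any library.

```python
def matches_pin(actual_version, pinned_release):
    """True when actual_version equals the pinned release or is a patch release of it."""
    return (actual_version == pinned_release
            or actual_version.startswith(pinned_release + "."))


# '1.14.0' is a patch release of '1.14'; '1.140.0' and '2.0.0' are not.
print(matches_pin("1.14.0", "1.14"))
print(matches_pin("1.140.0", "1.14"))
print(matches_pin("2.0.0", "1.14"))
```

In the notebook this could be used as `assert matches_pin(tf.version.VERSION, "1.14")` right after the import.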


In [60]:
# Check core SDK version number
import azureml.core

print("SDK version:", azureml.core.VERSION)

SDK version: 1.0.69


## Connect To Workspace

Just like in the previous tutorials, we will need to connect to a [workspace](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.workspace(class)?view=azure-ml-py).

The following code will allow you to create a workspace if you don't already have one created. You must have an Azure subscription to create a workspace:

```python
from azureml.core import Workspace
ws = Workspace.create(name='myworkspace',
                      subscription_id='<azure-subscription-id>',
                      resource_group='myresourcegroup',
                      create_resource_group=True,
                      location='eastus2')
```

**If you are running this on a Notebook VM, you can import the existing workspace.**

In [1]:
from azureml.core import Workspace

ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\n')

107835-aml-ws
aml-rg-107835
southcentralus
07a3b836-0813-4c05-afd4-3a7ab00358d9


> **Note:** The above command reads a `config.json` file that exists by default on the Notebook VM. If you are running this locally or want to use a different workspace, you must add a config file to your project directory. The config file should have the following schema:

```
    {
        "subscription_id": "<SUBSCRIPTION-ID>",
        "resource_group": "<RESOURCE-GROUP>",
        "workspace_name": "<WORKSPACE-NAME>"
    }
```
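
If you prefer to generate this file from code, the following sketch writes it with the standard `json` module. The function name `write_workspace_config` is hypothetical, and the placeholder values must be replaced with your own identifiers. The demonstration writes into a temporary directory so that it does not touch your project.

```python
import json
import tempfile
from pathlib import Path


def write_workspace_config(directory, subscription_id, resource_group, workspace_name):
    """Write a config.json with the three keys from the schema and return its path."""
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    config_path = Path(directory) / "config.json"
    config_path.write_text(json.dumps(config, indent=4))
    return config_path


# Demonstration only: the values below are placeholders, not real identifiers.
with tempfile.TemporaryDirectory() as project_dir:
    path = write_workspace_config(
        project_dir, "<SUBSCRIPTION-ID>", "<RESOURCE-GROUP>", "<WORKSPACE-NAME>"
    )
    loaded = json.loads(path.read_text())
    print(sorted(loaded))
```

To use it for real, pass your project directory instead of the temporary one and then call `Workspace.from_config()` from that directory.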

## Create An Experiment

An **Experiment** is a logical container in an Azure ML Workspace. It hosts run records, which can include run metrics and output artifacts from your experiments.

In [2]:
from azureml.core import Experiment
experiment_name = 'explainer-remote-run-tfworld19'
experiment = Experiment(workspace=ws, name=experiment_name)

## Create Compute Target

A [compute target](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.computetarget?view=azure-ml-py) is a designated compute resource/environment where you run your training script or host your service deployment. This location may be your local machine or a cloud-based compute resource. Compute targets can be reused across the workspace for different runs and experiments. 

**If you completed tutorial 1 of this series, then you should have already created a compute target and can skip this step**

Otherwise, run the cell below to create an auto-scaling [Azure Machine Learning Compute](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.compute.amlcompute?view=azure-ml-py) cluster, which is a managed-compute infrastructure that allows the user to easily create a single or multi-node compute. To create the cluster, we need to specify the following parameters:

- `vm_size`: This is the VM size, which determines the type of GPU used in our cluster. For this tutorial, we will use **Standard_NC12s_v3 (NVIDIA V100) GPU machines**.
- `idle_seconds_before_scaledown`: This is the number of seconds a node may sit idle before the auto-scaling cluster scales it down. We will set this to **6000** seconds (100 minutes).
- `min_nodes`: This is the minimum number of nodes that the cluster will have. To avoid paying for compute while it is not being used, we will set this to **0** nodes.
- `max_nodes`: This is the maximum number of nodes that the cluster will scale up to. We will set this to **2** nodes.

**When jobs are submitted to the cluster, it takes approximately 5 minutes to allocate new nodes.**

In [3]:
from azureml.core.compute import AmlCompute, ComputeTarget

cluster_name = 'v100cluster'
compute_config = AmlCompute.provisioning_configuration(vm_size='Standard_NC12s_v3', 
                                                       idle_seconds_before_scaledown=6000,
                                                       min_nodes=0, 
                                                       max_nodes=2)

compute_target = ComputeTarget.create(ws, cluster_name, compute_config)
compute_target.wait_for_completion(show_output=True)

Succeeded
AmlCompute wait for completion finished
Minimum number of nodes requested have been provisioned


**If you already have the compute target created, then you can directly run this cell.**

In [4]:
compute_target = ws.compute_targets['v100cluster']
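
Indexing `ws.compute_targets` directly raises a `KeyError` when the cluster does not exist. Because it is a dictionary keyed by target name, a lookup with `.get` lets you decide between reusing and creating a cluster. The sketch below uses a plain dictionary as a stand-in for `ws.compute_targets`; the helper name and the second cluster name are hypothetical.

```python
def find_existing_target(compute_targets, cluster_name):
    """Return the named compute target from a dict-like collection, or None if absent."""
    return compute_targets.get(cluster_name)


# Stand-in for ws.compute_targets; the string value is a placeholder for a compute object.
example_targets = {"v100cluster": "<compute target object>"}

found = find_existing_target(example_targets, "v100cluster")
missing = find_existing_target(example_targets, "cpucluster")  # hypothetical name

print("v100cluster:", "reuse existing" if found is not None else "create new")
print("cpucluster:", "reuse existing" if missing is not None else "create new")
```

With a real workspace, call `find_existing_target(ws.compute_targets, cluster_name)` and fall back to the `ComputeTarget.create` cell above only when it returns `None`.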

## Submit Experiment Run

Now that our compute is ready, we can begin to submit our run.

### Create project directory

Create a directory that will contain all the code from your local machine that you will need on the remote resource. This includes the training script and any additional files your training script depends on.

In [5]:
import os
import shutil

project_folder = './TFworld-explainer-remote-run-on-amlcompute'
os.makedirs(project_folder, exist_ok=True)
shutil.copy('train_explain-model.py', project_folder)

'./TFworld-explainer-remote-run-on-amlcompute/train_explain-model.py'
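
When the training script depends on several files, it can help to stage them in one step and fail early if any are missing, rather than discovering the gap on the remote cluster. The following sketch does this with the standard library only. The helper name `stage_files` is an assumption, and the demonstration uses a temporary directory with a placeholder script instead of the real `train_explain-model.py`.

```python
import shutil
import tempfile
from pathlib import Path


def stage_files(source_files, project_folder):
    """Copy each file into project_folder, raising before copying if any file is missing."""
    missing = [str(path) for path in source_files if not Path(path).is_file()]
    if missing:
        raise FileNotFoundError("Missing project files: " + ", ".join(missing))
    destination = Path(project_folder)
    destination.mkdir(parents=True, exist_ok=True)
    # shutil.copy returns the path of the newly created copy.
    return [Path(shutil.copy(str(path), str(destination))) for path in source_files]


# Demonstration with a placeholder script in a temporary directory.
with tempfile.TemporaryDirectory() as workdir:
    script = Path(workdir) / "train_explain-model.py"
    script.write_text("print('placeholder training script')\n")
    staged = stage_files([script], Path(workdir) / "project")
    staged_names = [path.name for path in staged]
    print(staged_names)
```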

### Submit job to cluster

In [6]:
from azureml.core.runconfig import RunConfiguration
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.runconfig import DEFAULT_CPU_IMAGE

# create a new runconfig object
run_config = RunConfiguration()

# signal that you want to use our cluster to execute script.
run_config.target = compute_target

# enable Docker 
run_config.environment.docker.enabled = True

# set Docker base image to the default CPU-based image
run_config.environment.docker.base_image = DEFAULT_CPU_IMAGE

# use conda_dependencies.yml to create a conda environment in the Docker image for execution
run_config.environment.python.user_managed_dependencies = False


# 'scikit-learn' is the actual PyPI distribution name; 'sklearn' is a deprecated alias
pip_packages = [
        'azureml-defaults', 'azureml-core', 'azureml-telemetry',
        'azureml-dataprep', 'scikit-learn', 'sklearn-pandas', 'tensorflow==1.14.0',
        'azureml-contrib-interpret', 'azureml-interpret'
]


index_url = 'https://azuremlsdktestpypi.azureedge.net/sdk-release/Candidate/604C89A437BA41BD942B4F46D9A3591D/'
run_config.environment.python.conda_dependencies = CondaDependencies.create(pip_packages=pip_packages,
                                                                            pin_sdk_version=False, 
                                                                            pip_indexurl=index_url)



# Now submit a run on AmlCompute
from azureml.core.script_run_config import ScriptRunConfig

script_run_config = ScriptRunConfig(source_directory=project_folder,
                                    script='train_explain-model.py',
                                    run_config=run_config)

run = experiment.submit(script_run_config)

# Show run details
run

Experiment,Id,Type,Status,Details Page,Docs Page
explainer-remote-run-tfworld19,explainer-remote-run-tfworld19_1572371711_23a86907,azureml.scriptrun,Starting,Link to Azure Portal,Link to Documentation


Note: if you need to cancel a run, you can follow [these instructions](https://aka.ms/aml-docs-cancel-run).

In [7]:
from azureml.widgets import RunDetails
RunDetails(run).show()

_UserRunWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', '…

## Download 
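Download the model explanation data once the run has completed. In the first cell below, the download returned `None`, most likely because the run had not yet finished and uploaded its explanation, which is why that cell ends in an `AttributeError`.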

In [8]:
from azureml.contrib.interpret.explanation.explanation_client import ExplanationClient

# Get model explanation data
client = ExplanationClient.from_run(run)
global_explanation = client.download_model_explanation()
local_importance_values = global_explanation.local_importance_values
expected_values = global_explanation.expected_values


AttributeError: 'NoneType' object has no attribute 'local_importance_values'
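
The `AttributeError` above means that `global_explanation` was `None`. One way to rule out downloading too early is to block until the run reaches a terminal state. The SDK's `run.wait_for_completion(show_output=True)` does this directly. The sketch below shows the same idea as an explicit polling loop over `run.get_status()` with a timeout. The helper name, the set of terminal states, and the stand-in run class are assumptions made for illustration.

```python
import time

# Status strings assumed to mark a finished run.
TERMINAL_STATES = {"Completed", "Failed", "Canceled"}


def wait_for_terminal_status(run, poll_seconds=30, timeout_seconds=3600, sleep=time.sleep):
    """Poll run.get_status() until the run finishes or the timeout elapses."""
    waited = 0
    status = run.get_status()
    while status not in TERMINAL_STATES:
        if waited >= timeout_seconds:
            raise TimeoutError(
                "Run still in state {!r} after {} seconds".format(status, waited)
            )
        sleep(poll_seconds)
        waited += poll_seconds
        status = run.get_status()
    return status


class StandInRun:
    """Hypothetical stand-in for a run object that replays a fixed status sequence."""

    def __init__(self, statuses):
        self._statuses = iter(statuses)

    def get_status(self):
        return next(self._statuses)


demo_run = StandInRun(["Starting", "Running", "Completed"])
final_status = wait_for_terminal_status(demo_run, poll_seconds=0, sleep=lambda seconds: None)
print(final_status)
```

With a real run, call `wait_for_terminal_status(run)` and create the `ExplanationClient` only when it returns `"Completed"`.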

In [None]:
# Or you can use the saved run.id to retrieve the feature importance values
client = ExplanationClient.from_run_id(ws, experiment_name, run.id)
global_explanation = client.download_model_explanation()
local_importance_values = global_explanation.local_importance_values
expected_values = global_explanation.expected_values

In [None]:
# Get the top k (e.g., 4) most important features with their importance values
global_explanation_topk = client.download_model_explanation(top_k=4)
global_importance_values = global_explanation_topk.get_ranked_global_values()
global_importance_names = global_explanation_topk.get_ranked_global_names()

In [None]:
print('global importance values: {}'.format(global_importance_values))
print('global importance names: {}'.format(global_importance_names))
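
To illustrate what a top-k ranking amounts to, the sketch below sorts a feature-to-importance mapping in descending order and keeps the first `k` entries. It is not the SDK's implementation. The helper name `top_k_features` is hypothetical, and the sample values are six of the global importance values printed further down in this notebook, listed here in arbitrary order.

```python
def top_k_features(importance_by_feature, k):
    """Return the k feature names and values with the largest importance, descending."""
    ranked = sorted(importance_by_feature.items(), key=lambda item: item[1], reverse=True)
    names = [name for name, _ in ranked[:k]]
    values = [value for _, value in ranked[:k]]
    return names, values


# Six importance values from this notebook's output, deliberately unordered.
sample_importances = {
    "Age": 0.022035631877649273,
    "JobRole": 0.02456048385916402,
    "OverTime": 0.03999193825435882,
    "YearsInCurrentRole": 0.02258847868616334,
    "EducationField": 0.02760060189002182,
    "MaritalStatus": 0.031464853252919395,
}

top_names, top_values = top_k_features(sample_importances, k=4)
print(top_names)   # ['OverTime', 'MaritalStatus', 'EducationField', 'JobRole']
print(top_values)
```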

In [78]:
# Get all features ranked by importance (no top_k limit) with their importance values
global_explanation = client.download_model_explanation()
global_importance_values = global_explanation.get_ranked_global_values()
global_importance_names = global_explanation.get_ranked_global_names()

In [79]:
print('global importance values: {}'.format(global_importance_values))
print('global importance names: {}'.format(global_importance_names))

global importance values: [0.03999193825435882, 0.031464853252919395, 0.02760060189002182, 0.02456048385916402, 0.02258847868616334, 0.022035631877649273, 0.020207199700294994, 0.020114264275080666, 0.019977203398518766, 0.019378221757655807, 0.0170920545860935, 0.015884334486342123, 0.01586617894132825, 0.013408480313220787, 0.012007829060718718, 0.011751867982559418, 0.011463914137255388, 0.011372582275944364, 0.01032799605886111, 0.010208142977892548, 0.010004060575232718, 0.009948627988387974, 0.008879074315566815, 0.008483158706590357, 0.007304193859418363, 0.0056123067810683784, 0.005469643877212432, 0.005419917839260414, 0.004600788428833444, 0.0044333516100935]
global importance names: ['OverTime', 'MaritalStatus', 'EducationField', 'JobRole', 'YearsInCurrentRole', 'Age', 'JobSatisfaction', 'EnvironmentSatisfaction', 'RelationshipSatisfaction', 'TrainingTimesLastYear', 'JobInvolvement', 'WorkLifeBalance', 'DistanceFromHome', 'Department', 'NumCompaniesWorked', 'JobLevel', 'Stoc

In [80]:
print(global_explanation.get_feature_importance_dict())

{'OverTime': 0.03999193825435882, 'MaritalStatus': 0.031464853252919395, 'EducationField': 0.02760060189002182, 'JobRole': 0.02456048385916402, 'YearsInCurrentRole': 0.02258847868616334, 'Age': 0.022035631877649273, 'JobSatisfaction': 0.020207199700294994, 'EnvironmentSatisfaction': 0.020114264275080666, 'RelationshipSatisfaction': 0.019977203398518766, 'TrainingTimesLastYear': 0.019378221757655807, 'JobInvolvement': 0.0170920545860935, 'WorkLifeBalance': 0.015884334486342123, 'DistanceFromHome': 0.01586617894132825, 'Department': 0.013408480313220787, 'NumCompaniesWorked': 0.012007829060718718, 'JobLevel': 0.011751867982559418, 'StockOptionLevel': 0.011463914137255388, 'YearsSinceLastPromotion': 0.011372582275944364, 'YearsWithCurrManager': 0.01032799605886111, 'MonthlyIncome': 0.010208142977892548, 'TotalWorkingYears': 0.010004060575232718, 'DailyRate': 0.009948627988387974, 'YearsAtCompany': 0.008879074315566815, 'BusinessTravel': 0.008483158706590357, 'Gender': 0.007304193859418363