# Model Interpretability using GPU SHAP on Azure

[Model Interpretability](https://christophm.github.io/interpretable-ml-book/interpretability.html) helps humans understand the reasons behind a decision made by a Machine Learning model. This helps data scientists understand their models better and, in turn, build better solutions. [Shapley Values](https://christophm.github.io/interpretable-ml-book/shapley.html) are one way to explain models, and in this notebook we demonstrate model interpretability on Azure using [cuML GPU SHAP](https://docs.rapids.ai/api/cuml/stable/api.html#model-explainability).

To run the example on Azure, we'll use the [azureml-interpret](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-machine-learning-interpretability) package, which internally uses the [interpret-community](https://github.com/interpretml/interpret-community) package and its GPU SHAP support.
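
To make the idea of Shapley values concrete, here is a minimal sketch that computes exact Shapley values for a toy model by averaging each feature's marginal contribution over every feature ordering. The names `toy_model` and `exact_shapley`, along with the instance and baseline values, are hypothetical and chosen only for illustration. This brute-force enumeration is not how cuML GPU SHAP works internally; it only shows what the values mean.

```python
from itertools import permutations


def toy_model(x):
    # Hypothetical model: two linear terms plus one interaction term.
    return 2.0 * x[0] + 1.0 * x[1] + 0.5 * x[0] * x[2]


def exact_shapley(model, instance, baseline):
    """Average each feature's marginal contribution over all orderings."""
    n = len(instance)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)  # start from the baseline input
        previous_output = model(current)
        for i in order:
            current[i] = instance[i]  # switch feature i to its actual value
            new_output = model(current)
            phi[i] += new_output - previous_output
            previous_output = new_output
    return [total / len(orderings) for total in phi]


instance = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = exact_shapley(toy_model, instance, baseline)
print("Shapley values:", phi)
print("Sum of values:", sum(phi))
print("Model output minus baseline output:", toy_model(instance) - toy_model(baseline))
```

The Shapley values sum to the difference between the model output for the instance and the output for the baseline. The interaction term's contribution is split evenly between the two features involved.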

In [None]:
# %%bash
# apt-get update && \
# apt-get install -y fuse && \
# apt-get install -y build-essential && \
# apt-get install -y python3-dev && \
# pip install azureml-core && \
# pip install azureml-interpret && \
# pip install -e git+https://github.com/interpretml/interpret-community.git#egg=interpret_community\&subdirectory=python && \
# pip install raiwidgets

In [None]:
# Check core SDK version number
import azureml.core

print("SDK version:", azureml.core.VERSION)

## Initialize workspace

There are two ways to set up the environment and to load and initialize the workspace.

1. Follow the instructions in [README.md](https://github.com/rapidsai/cloud-ml-examples/blob/main/azure/README.md) to create the Machine Learning Workspace on Azure. Place the `config.json` file in the same folder as this notebook and skip to <b>Load Workspace from Config</b>.

2. Alternatively, use the following two cells to load an existing Workspace or create a new one by updating the `subscription_id`, `resource_group`, `workspace_name` and `region`.

In [None]:
# # Uncomment if you're using second method
# import os

# subscription_id = os.getenv("SUBSCRIPTION_ID", default="<subscription_ID>")
# resource_group = os.getenv("RESOURCE_GROUP", default="RAPIDS-SHAP")
# workspace_name = os.getenv("WORKSPACE_NAME", default="azure-intepret")
# workspace_region = os.getenv("WORKSPACE_REGION", default="eastus")

In [None]:
# # Uncomment if you're using second method
# from azureml.core import Workspace

# try:
#     ws = Workspace(subscription_id = subscription_id, resource_group = resource_group, workspace_name = workspace_name)
#     # write the details of the workspace to a configuration file to the notebook library
#     ws.write_config()
#     print("Workspace configuration succeeded.")
# except Exception:
#     print("Workspace not accessible. Creating new workspace...")

#     # Create the workspace using the specified parameters
#     ws = Workspace.create(name = workspace_name,
#                           subscription_id = subscription_id,
#                           resource_group = resource_group, 
#                           location = workspace_region,
#                           create_resource_group = True,
#                           exist_ok = True)
#     ws.get_details()

#     # write the details of the workspace to a configuration file to the notebook library
#     ws.write_config()

### Load Workspace from Config

In [None]:
from azureml.core import Workspace

ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\n')

datastore = ws.get_default_datastore()
print("Default datastore's name: {}".format(datastore.name))

## Create An Experiment

An **Experiment** is a logical container in an Azure ML Workspace. It hosts run records, which can include run metrics and output artifacts from your experiments.

In [None]:
from azureml.core import Experiment
experiment_name = 'gpu-shap-on-amlcompute'
experiment = Experiment(workspace=ws, name=experiment_name)

### Provision a compute target

You can provision an AmlCompute resource by defining just two parameters, thanks to smart defaults. By default it autoscales from 0 nodes and provisions dedicated VMs to run your job in a container. This is useful when you want to continuously re-use the same target, debug it between jobs, or share the resource with other users of your workspace.

* `vm_size`: VM family of the nodes provisioned by AmlCompute. RAPIDS requires an NVIDIA Pascal or newer architecture, so you will need to specify compute targets from one of the `NC_v2`, `NC_v3`, `ND` or `ND_v2` [GPU virtual machines in Azure](https://docs.microsoft.com/en-us/azure/virtual-machines/sizes-gpu); these VMs are provisioned with P100, P40 and V100 GPUs. Let's create an `AmlCompute` cluster of `Standard_NC6s_v3` GPU VMs.
* `max_nodes`: Maximum nodes to autoscale to while running a job on AmlCompute
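
Before provisioning, it can help to check that a chosen `vm_size` belongs to one of the families named above. The sketch below is a hypothetical helper, not part of the Azure ML SDK. It assumes that sizes in these families follow naming patterns such as `Standard_NC6s_v3` or `Standard_ND40rs_v2`, and it only covers the four families listed in this notebook.

```python
import re

# Assumption: the NC_v2, NC_v3, ND and ND_v2 families use names such as
# Standard_NC6s_v2, Standard_NC24rs_v3, Standard_ND6s and Standard_ND40rs_v2.
_SUPPORTED_VM_PATTERN = re.compile(r"^Standard_(NC\d+r?s_v[23]|ND\d+r?s(_v2)?)$")


def is_supported_vm_size(vm_size):
    """Return True if vm_size matches one of the families listed above."""
    return _SUPPORTED_VM_PATTERN.match(vm_size) is not None


for size in ["Standard_NC6s_v3", "Standard_ND40rs_v2", "Standard_NC6"]:
    print(size, is_supported_vm_size(size))
```

`Standard_NC6` is rejected because it belongs to the original `NC` family, which is not in the list above.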

In [None]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# choose a name for your cluster
gpu_cluster_name = 'gpu-cluster'

if gpu_cluster_name in ws.compute_targets:
    gpu_cluster = ws.compute_targets[gpu_cluster_name]
    if gpu_cluster and isinstance(gpu_cluster, AmlCompute):
        print('Found compute target. Will use {0} '.format(gpu_cluster_name))
else:
    print('creating new cluster')
    # vm_size parameter below could be modified to one of the RAPIDS-supported VM types
    provisioning_config = AmlCompute.provisioning_configuration(vm_size = 'Standard_NC6s_v3',
                                                                max_nodes = 1,
                                                                idle_seconds_before_scaledown = 300,
                                                                vm_priority = "lowpriority")
    # Use VM types with more than one GPU for multi-GPU option, e.g. Standard_NC12s_v3
    
    # create the cluster
    gpu_cluster = ComputeTarget.create(ws, gpu_cluster_name, provisioning_config)
    
    # can poll for a minimum number of nodes and for a specific timeout 
    # if no min node count is provided it uses the scale settings for the cluster
    gpu_cluster.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)
    
# use get_status() to get a detailed status for the current cluster 
print(gpu_cluster.get_status().serialize())

### Use Custom Docker Image

We'll set up a custom Docker image using the `Environment` class. This lets us install any other packages needed for the run.
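
One way to keep the list of extra packages easy to extend is to generate the Dockerfile text from Python lists. The sketch below is only an illustration: `build_dockerfile` is a hypothetical helper, and it covers packages installed by name, so the source install of interpret-community used in this notebook would still need its own `pip install` step.

```python
def build_dockerfile(base_image, apt_packages, pip_packages, conda_env="rapids"):
    """Compose Dockerfile text that installs apt and pip packages in one RUN layer."""
    steps = ["apt-get update"]
    steps += ["apt-get install -y {}".format(pkg) for pkg in apt_packages]
    # Activate the conda environment so pip installs land inside it.
    steps.append("source activate {}".format(conda_env))
    steps += ["pip install {}".format(pkg) for pkg in pip_packages]
    # Join the steps with '&& \' line continuations, as in a hand-written Dockerfile.
    run_line = "RUN " + " && \\\n    ".join(steps)
    return "FROM {}\n{}\n".format(base_image, run_line)


dockerfile = build_dockerfile(
    base_image="rapidsai/rapidsai:21.06-cuda11.0-runtime-ubuntu18.04-py3.8",
    apt_packages=["fuse", "build-essential", "python3-dev"],
    pip_packages=["azureml-defaults", "azureml-interpret", "azureml-telemetry"],
)
print(dockerfile)
```

The resulting string could then be assigned to `env.docker.base_dockerfile` in place of a hand-written one.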

In [None]:
from azureml.core import Environment

environment_name = "rapids"

env = Environment(environment_name)
env.docker.enabled = True
env.docker.base_image = None
# Installing interpret-community from source for now; will update later.
# A raw string keeps the backslashes below intact in the Dockerfile text.
env.docker.base_dockerfile = r"""
FROM rapidsai/rapidsai:21.06-cuda11.0-runtime-ubuntu18.04-py3.8
RUN apt-get update && \
apt-get install -y fuse && \
apt-get install -y build-essential && \
apt-get install -y python3-dev && \
source activate rapids && \
pip install azureml-defaults && \
pip install azureml-interpret && \
pip install -e git+https://github.com/interpretml/interpret-community.git#egg=interpret_community\&subdirectory=python && \
pip install azureml-telemetry
"""
env.python.user_managed_dependencies = True

### Create project directory

Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script, and any additional files your training script depends on.

The training script `train_explain.py` is already created for you. We'll copy it to the project directory.

In [None]:
import os
import shutil

project_folder = './scripts'
os.makedirs(project_folder, exist_ok=True)
shutil.copy('train_explain.py', project_folder)

In [None]:
from azureml.core import Run
from azureml.core import ScriptRunConfig

src = ScriptRunConfig(source_directory=project_folder, 
                      script='train_explain.py',
                      compute_target=gpu_cluster,
                      environment=env) 
run = experiment.submit(config=src)
run

Note: if you need to cancel a run, you can follow [these instructions](https://aka.ms/aml-docs-cancel-run).

In [None]:
%%time
# Shows output of the run on stdout.
run.wait_for_completion(show_output=True)

In [None]:
run.get_metrics()

## Download 
1. Download model explanation data.

In [None]:
from azureml.interpret import ExplanationClient

# Get model explanation data
client = ExplanationClient.from_run(run)
global_explanation = client.download_model_explanation()
local_importance_values = global_explanation.local_importance_values
expected_values = global_explanation.expected_values


In [None]:
# Or you can use the saved run.id to retrieve the feature importance values
client = ExplanationClient.from_run_id(ws, experiment_name, run.id)
global_explanation = client.download_model_explanation()
local_importance_values = global_explanation.local_importance_values
expected_values = global_explanation.expected_values
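
The local importance values and expected values retrieved above are tied together by SHAP's additivity property: for each row, the expected value plus the sum of that row's local importance values equals the model's prediction. The sketch below illustrates this on a hypothetical linear model, for which the SHAP value of feature `i` is `w_i * (x_i - mean_i)` under the usual feature-independence assumption. The weights, background rows and instance are made up for illustration and are unrelated to the model trained in this notebook.

```python
def linear_model(x, weights, bias):
    """Hypothetical linear model: bias + sum_i w_i * x_i."""
    return bias + sum(w * xi for w, xi in zip(weights, x))


# Made-up model parameters and data.
weights = [0.5, -1.0, 2.0]
bias = 0.25
background = [[0.0, 1.0, 2.0], [2.0, 3.0, 0.0]]  # rows used as the reference data
instance = [1.0, 0.0, 3.0]                        # row being explained

n_features = len(weights)
feature_means = [
    sum(row[j] for row in background) / len(background) for j in range(n_features)
]

# For a linear model, the mean prediction over the background equals the
# prediction at the feature means.
expected_value = linear_model(feature_means, weights, bias)

# SHAP values of a linear model with independent features.
local_values = [w * (xi - m) for w, xi, m in zip(weights, instance, feature_means)]

reconstructed = expected_value + sum(local_values)
prediction = linear_model(instance, weights, bias)
print("expected value:", expected_value)
print("local importance values:", local_values)
print("reconstructed vs. actual prediction:", reconstructed, prediction)
```

The same check can be run on a real explanation to confirm the downloaded values are consistent with the model's predictions, within floating-point tolerance.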

In [None]:
# Get the top k (e.g., 4) most important features with their importance values
global_explanation_topk = client.download_model_explanation(top_k=4)
global_importance_values = global_explanation_topk.get_ranked_global_values()
global_importance_names = global_explanation_topk.get_ranked_global_names()

In [None]:
print('global importance values: {}'.format(global_importance_values))
print('global importance names: {}'.format(global_importance_names))
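
To show how a ranked, top-k global view can be derived from per-row values, the sketch below averages the absolute local importance of each feature and sorts the result. This is a common convention for SHAP-based explanations and is assumed here; the notebook does not show the exact aggregation behind `get_ranked_global_values()`. The helper `rank_global_importance` and its inputs are hypothetical.

```python
def rank_global_importance(local_values, feature_names, top_k=None):
    """Rank features by mean absolute local importance, largest first."""
    n_rows = len(local_values)
    mean_abs = [
        sum(abs(row[j]) for row in local_values) / n_rows
        for j in range(len(feature_names))
    ]
    ranked = sorted(zip(feature_names, mean_abs), key=lambda pair: pair[1], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]
    names = [name for name, _ in ranked]
    values = [value for _, value in ranked]
    return names, values


# Made-up local importance values: two rows, three features.
toy_local_values = [[0.5, -2.0, 0.0], [-1.5, 1.0, 0.5]]
toy_feature_names = ["f0", "f1", "f2"]

names, values = rank_global_importance(toy_local_values, toy_feature_names, top_k=2)
print(names)
print(values)
```

Taking absolute values before averaging keeps positive and negative contributions from cancelling each other out.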

2. Download model file.

In [None]:
# Retrieve model for visualization and deployment
from azureml.core.model import Model
import joblib
original_model = Model(ws, 'model_explain_model_on_amlcomp')
model_path = original_model.download(exist_ok=True)
original_model = joblib.load(model_path)

3. Download test dataset.

In [None]:
# Retrieve x_test for visualization
import joblib
x_test_path = './x_test.pkl'
run.download_file('x_test_higgs.pkl', output_file_path=x_test_path)

In [None]:
x_test = joblib.load(x_test_path)

## Visualize
Load the visualization dashboard.

In [None]:
from interpret_community.widget import ExplanationDashboard

In [None]:
import cupy as cp
ExplanationDashboard(global_explanation, original_model,
                     datasetX=cp.asnumpy(x_test.values[:50]))