Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

# Explain binary classification model predictions
_**This notebook showcases how to use the Azure Machine Learning Interpretability SDK to explain and visualize the predictions of a binary classification model.**_


## Table of Contents

1. [Introduction](#Introduction)
1. [Setup](#Setup)
1. [Run model explainer locally at training time](#Explain)
    1. Train a binary classification model
    1. Explain the model
        1. Generate global explanations
        1. Generate local explanations
1. [Visualize results](#Visualize)
1. [Next steps](#Next)

## Introduction

This notebook illustrates how to explain the predictions of a binary classification model locally at training time, without contacting any Azure services.
It demonstrates the API calls that you need to make to get the global and local explanations and a visualization dashboard that provides an interactive way of discovering patterns in data and explanations.

We will showcase three tabular data explainers: TabularExplainer (SHAP), MimicExplainer (global surrogate), and PFIExplainer.

| ![Interpretability Toolkit Architecture](./img/interpretability-architecture.png) |
|:--:|
| *Interpretability Toolkit Architecture* |

Problem: Breast cancer diagnosis classification with scikit-learn (run model explainer locally)

1. Train an SVM classification model using scikit-learn.
2. Run 'explain_model' globally and locally on the full dataset in local mode, which doesn't contact any Azure services.
3. Visualize the global and local explanations with the visualization dashboard.

---

## Setup

If you are using Jupyter notebooks, the extensions should be installed automatically with the package.
If you are using JupyterLab, run the following command:
```
(myenv) $ jupyter labextension install @jupyter-widgets/jupyterlab-manager
```


## Explain

### Run model explainer locally at training time

In [1]:
from sklearn.datasets import load_breast_cancer
from sklearn import svm

# Explainers:
# 1. SHAP Tabular Explainer
from interpret.ext.blackbox import TabularExplainer

# OR

# 2. Mimic Explainer
#from interpret.ext.blackbox import MimicExplainer

# You can use one of the following four interpretable models as a global surrogate to the black box model
#from interpret.ext.glassbox import LGBMExplainableModel
#from interpret.ext.glassbox import LinearExplainableModel
#from interpret.ext.glassbox import SGDExplainableModel
#from interpret.ext.glassbox import DecisionTreeExplainableModel

# OR

# 3. PFI Explainer
#from interpret.ext.blackbox import PFIExplainer 

Could not import lightgbm, required if using LGBMExplainableModel
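
The warning above indicates that the optional `lightgbm` package is not installed in this environment. The following is a minimal sketch, using only the Python standard library, of how one might check for it before choosing `LGBMExplainableModel` as the surrogate. The helper name `is_installed` is hypothetical and not part of the SDK.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# lightgbm is only required when LGBMExplainableModel is used as the surrogate.
if is_installed("lightgbm"):
    print("lightgbm found: LGBMExplainableModel can be used.")
else:
    print("lightgbm missing: install it or pick another surrogate model.")
```

If the package is missing, the other surrogate models listed in the import cell remain available.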


### Load the breast cancer diagnosis data

In [2]:
breast_cancer_data = load_breast_cancer()
classes = breast_cancer_data.target_names.tolist()

In [3]:
# Split data into train and test
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(breast_cancer_data.data, breast_cancer_data.target, test_size=0.2, random_state=0)
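
As a quick check on the split size, the arithmetic below estimates how many rows end up in each set. It is a sketch under two assumptions that are not stated in this notebook: that the breast cancer dataset has 569 rows, and that scikit-learn rounds a fractional `test_size` up to a whole number of rows.

```python
import math

n_samples = 569   # assumed row count of the breast cancer dataset
test_size = 0.2   # same fraction as in the split above

# Assumption: the test set size is rounded up, the rest goes to training.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print("train rows:", n_train)
print("test rows:", n_test)
```

The number of test rows matters later, because the explainer is evaluated on `x_test` and its run time grows with the number of evaluation examples.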

### Train the SVM classification model that you want to explain

In [4]:
clf = svm.SVC(gamma=0.001, C=100., probability=True)
model = clf.fit(x_train, y_train)

### Explain predictions on your local machine

In [5]:
# 1. Using SHAP TabularExplainer
explainer = TabularExplainer(model, 
                             x_train, 
                             features=breast_cancer_data.feature_names, 
                             classes=classes)




# 2. Using MimicExplainer
# augment_data is optional; if True, it oversamples the initialization examples so that the surrogate model fits the original model more accurately.
# This is useful for high-dimensional data where the number of rows is less than the number of columns.
# max_num_of_augmentations is optional and defines max number of times we can increase the input data size.
# LGBMExplainableModel can be replaced with LinearExplainableModel, SGDExplainableModel, or DecisionTreeExplainableModel
# explainer = MimicExplainer(model, 
#                            x_train, 
#                            LGBMExplainableModel, 
#                            augment_data=True, 
#                            max_num_of_augmentations=10, 
#                            features=breast_cancer_data.feature_names, 
#                            classes=classes)





# 3. Using PFIExplainer

# Use the parameter "metric" to pass a metric name or function to evaluate the permutation. 
# Note that if a metric function is provided a higher value must be better.
# Otherwise, take the negative of the function or set the parameter "is_error_metric" to True.
# Default metrics: 
# F1 Score for binary classification, F1 Score with micro average for multiclass classification and
# Mean absolute error for regression

# explainer = PFIExplainer(model, 
#                          features=breast_cancer_data.feature_names, 
#                          classes=classes)
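
To make the PFI comments above concrete, here is a minimal, dependency-free sketch of permutation feature importance. It is not the `PFIExplainer` implementation: the toy model, the data, and the function names are all hypothetical. Each feature column is shuffled in turn and the drop in a "higher is better" metric is recorded. For an error metric, one would negate it first, which mirrors the `is_error_metric` note.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of correct predictions (higher is better)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def toy_model(row):
    """Hypothetical black-box model: uses feature 0 and ignores feature 1."""
    return 1 if row[0] > 0.5 else 0


def permutation_importance(predict, rows, labels, metric, n_repeats=5, seed=0):
    """Mean drop in the metric after shuffling each feature column."""
    rng = random.Random(seed)
    baseline = metric(labels, [predict(r) for r in rows])
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild every row with only feature j replaced by a shuffled value.
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            score = metric(labels, [predict(r) for r in permuted])
            drops.append(baseline - score)
        importances.append(sum(drops) / n_repeats)
    return importances


rows = [[i % 2, (i * 7) % 5] for i in range(20)]
labels = [toy_model(r) for r in rows]
importances = permutation_importance(toy_model, rows, labels, accuracy)
print(importances)
```

Because shuffling needs the true labels to score the model, this also illustrates why `explain_global` takes `true_labels` when `PFIExplainer` is used.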



### Generate global explanations
Explain overall model predictions (global explanation)

In [6]:
# Pass the test dataset as the evaluation examples; it must be a representative sample of the original data.
# x_train can be passed instead: with more examples the explanations may be more accurate, but they take longer to compute.
global_explanation = explainer.explain_global(x_test)

# Note: if you used the PFIExplainer in the previous step, use the next line of code instead
# global_explanation = explainer.explain_global(x_test, true_labels=y_test)

  "l1_reg=\"auto\" is deprecated and in the next version (v0.29) the behavior will change from a " \



In [7]:
# Sorted SHAP values
print('ranked global importance values: {}'.format(global_explanation.get_ranked_global_values()))
# Corresponding feature names
print('ranked global importance names: {}'.format(global_explanation.get_ranked_global_names()))
# Feature ranks (based on original order of features)
print('global importance rank: {}'.format(global_explanation.global_importance_rank))

# Note: PFIExplainer does not support per class explanations
# Per class feature names
print('ranked per class feature names: {}'.format(global_explanation.get_ranked_per_class_names()))
# Per class feature importance values
print('ranked per class feature values: {}'.format(global_explanation.get_ranked_per_class_values()))

ranked global importance values: [0.10854296065164473, 0.09231789159552856, 0.08004463841087486, 0.06736151756281923, 0.040619323602939496, 0.011470498204496665, 0.0019029469334032152, 0.0017184711832201197, 0.0010408358738259124, 0.0009480712717091068, 0.0009447077838160658, 0.000895324042938859, 0.0008793307787123827, 0.0008592049402589663, 0.0007613609887223783, 0.0007527811424301789, 0.0007312968086592678, 0.0007094262079252273, 0.0006610346003752117, 0.0006560320543790456, 0.0006246408938155836, 0.0006046636237637113, 0.0004925521569747928, 0.0004891448338565201, 0.00046016376493606293, 0.0004397056039580976, 0.00043270922110753206, 0.0003986803014610845, 0.0003236564560591836, 0.00027692548993158133]
ranked global importance names: ['worst area', 'worst perimeter', 'mean area', 'mean perimeter', 'area error', 'worst texture', 'mean texture', 'worst radius', 'mean fractal dimension', 'perimeter error', 'radius error', 'worst concavity', 'worst concave points', 'mean concavity', 'm
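
The three global outputs above are related by a simple sort. The sketch below uses hypothetical importance values for four features and assumes that the rank holds feature indices ordered from most to least important. It does not call the SDK.

```python
feature_names = ["mean radius", "mean texture", "mean perimeter", "mean area"]
# Hypothetical importance values, listed in the original feature order.
importance_values = [0.02, 0.01, 0.07, 0.08]

# Feature indices sorted from most to least important.
importance_rank = sorted(
    range(len(importance_values)),
    key=lambda i: importance_values[i],
    reverse=True,
)

# Apply the rank to obtain the sorted values and their matching names.
ranked_values = [importance_values[i] for i in importance_rank]
ranked_names = [feature_names[i] for i in importance_rank]

print(importance_rank)
print(ranked_names)
print(ranked_values)
```

Indexing the original feature list with the rank is enough to recover the ranked names, so the rank is the only piece that needs to be stored alongside the unsorted values.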

In [8]:
# Print out a dictionary that holds the sorted feature importance names and values
print('global importance rank: {}'.format(global_explanation.get_feature_importance_dict()))

global importance rank: {'worst area': 0.10854296065164473, 'worst perimeter': 0.09231789159552856, 'mean area': 0.08004463841087486, 'mean perimeter': 0.06736151756281923, 'area error': 0.040619323602939496, 'worst texture': 0.011470498204496665, 'mean texture': 0.0019029469334032152, 'worst radius': 0.0017184711832201197, 'mean fractal dimension': 0.0010408358738259124, 'perimeter error': 0.0009480712717091068, 'radius error': 0.0009447077838160658, 'worst concavity': 0.000895324042938859, 'worst concave points': 0.0008793307787123827, 'mean concavity': 0.0008592049402589663, 'mean smoothness': 0.0007613609887223783, 'mean radius': 0.0007527811424301789, 'symmetry error': 0.0007312968086592678, 'compactness error': 0.0007094262079252273, 'mean symmetry': 0.0006610346003752117, 'smoothness error': 0.0006560320543790456, 'texture error': 0.0006246408938155836, 'worst symmetry': 0.0006046636237637113, 'concave points error': 0.0004925521569747928, 'worst fractal dimension': 0.0004891448

### Explain overall model predictions as a collection of local (instance-level) explanations

In [9]:
# SHAP values for all features and all data points in the evaluation data (x_test)
print('local importance values: {}'.format(global_explanation.local_importance_values))

local importance values: [[[0.0, 0.00860845309557369, -0.018742147875277823, 0.0582296437936934, 0.0, 0.0, 0.0, 0.0, 0.0, -0.008191812303200957, 0.0, 0.0, 0.0, -0.06084569006669459, 0.0, 0.0, 0.0, 0.0, 0.007978684301949146, 0.0, 0.0, 0.03010634657407958, 0.10679580372810432, -0.09251824878580582, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, -0.052604826031218375, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.017490043615792688, 0.0, -0.006540511983873912, 0.0, 0.0, 0.0, 0.0, 0.0, 0.04413058373859481, -0.08520197705248364, -0.21733966960322307, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, -0.02662171759025539, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.02336025112342295, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.004206663920263948, -0.01977246249384193, -0.07953833449724575, -0.17947310530957733, -0.000839754066276982, 0.0, 0.0, 0.0, -0.007069492927560039, 0.0], [0.0, 0.005554656278671272, 0.043670926207919625, 0.16673011785956632, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.02202
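
The output above is a nested list indexed by class, then data point, then feature. The sketch below shows one way such values can be collapsed into a single importance per feature, assuming the common convention of averaging absolute values over data points and classes. The numbers are toy values, and the library's exact aggregation may differ.

```python
# Toy local importance values: 2 classes x 3 data points x 2 features.
local_importance_values = [
    [[0.2, -0.1], [0.4, 0.0], [-0.3, 0.1]],   # class 0
    [[-0.2, 0.1], [-0.4, 0.0], [0.3, -0.1]],  # class 1
]

n_classes = len(local_importance_values)
n_rows = len(local_importance_values[0])
n_features = len(local_importance_values[0][0])

# Mean absolute value per feature, computed separately for each class.
per_class = [
    [sum(abs(row[j]) for row in class_values) / n_rows for j in range(n_features)]
    for class_values in local_importance_values
]

# Average the per-class importances to get one value per feature.
overall = [sum(c[j] for c in per_class) / n_classes for j in range(n_features)]

print("shape:", (n_classes, n_rows, n_features))
print("per class:", per_class)
print("overall:", overall)
```

Taking absolute values first prevents positive and negative contributions from cancelling, which is why a feature can rank highly overall even when its signed values average out near zero.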




### Generate local explanations
Explain local data points (individual instances)

In [11]:
# Note: PFIExplainer does not support local explanations
# You can pass a specific data point or a group of data points to the explain_local function

# E.g., Explain the first data point in the test set
instance_num = 0
local_explanation = explainer.explain_local(x_test[instance_num,:])


In [12]:
# Get the prediction for the first member of the test set and explain why the model made that prediction
prediction_value = clf.predict(x_test)[instance_num]

sorted_local_importance_values = local_explanation.get_ranked_local_values()[prediction_value]
sorted_local_importance_names = local_explanation.get_ranked_local_names()[prediction_value]

print('local importance values: {}'.format(sorted_local_importance_values))
print('local importance names: {}'.format(sorted_local_importance_names))

local importance values: [0.10576321734429783, 0.074257144909961, 0.018803185157588907, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.008200359656805947, -0.012802773769909515, -0.0393289371248013, -0.0691756799691715, -0.1007368293535805]
local importance names: ['worst area', 'area error', 'mean perimeter', 'worst fractal dimension', 'worst symmetry', 'mean texture', 'mean smoothness', 'mean compactness', 'mean concavity', 'mean concave points', 'mean symmetry', 'mean fractal dimension', 'radius error', 'texture error', 'perimeter error', 'smoothness error', 'compactness error', 'concave points error', 'symmetry error', 'fractal dimension error', 'worst radius', 'worst smoothness', 'worst concavity', 'worst concave points', 'mean radius', 'concavity error', 'worst compactness', 'worst texture', 'mean area', 'worst perimeter']
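
One way to read the sorted output above is to separate the features that push the prediction toward the predicted class from those that push away from it. The sketch below copies only the nonzero entries from that output and assumes the usual SHAP reading, in which a positive value supports the class and a negative value opposes it.

```python
# Nonzero entries copied from the output above (sorted in descending order).
names = ["worst area", "area error", "mean perimeter",
         "concavity error", "worst compactness", "worst texture",
         "mean area", "worst perimeter"]
values = [0.10576321734429783, 0.074257144909961, 0.018803185157588907,
          -0.008200359656805947, -0.012802773769909515, -0.0393289371248013,
          -0.0691756799691715, -0.1007368293535805]

pairs = list(zip(names, values))

# Positive values support the predicted class; negative values oppose it.
supporting = [(n, v) for n, v in pairs if v > 0]
# Sort the opposing features so the strongest one comes first.
opposing = sorted(((n, v) for n, v in pairs if v < 0), key=lambda p: p[1])

print("pushes toward the predicted class:", [n for n, _ in supporting])
print("pushes away from it:", [n for n, _ in opposing])
```

For this instance, the remaining 22 features have a local importance of exactly zero, so they are omitted here.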


## Visualize
Load the visualization dashboard

In [13]:
from interpret_community.widget import ExplanationDashboard

In [14]:
ExplanationDashboard(global_explanation, model, datasetX=x_test)

ExplanationWidget(value={'predictedY': [1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1…

<interpret_community.widget.ExplanationDashboard.ExplanationDashboard at 0x250c0db2c88>

## Next
Learn about other use cases of the explain package:
       
1. [Training time: regression problem](./explain-regression-local.ipynb)
1. [Training time: multiclass classification problem](./explain-multiclass-classification-local.ipynb)
1. Explain models with engineered features:
    1. [Simple feature transformations](./simple-feature-transformations-explain-local.ipynb)
    1. [Advanced feature transformations](./advanced-feature-transformations-explain-local.ipynb)
1. [Save model explanations via Azure Machine Learning Run History](../azure-integration/run-history/save-retrieve-explanations-run-history.ipynb)
1. [Run explainers remotely on Azure Machine Learning Compute (AMLCompute)](../azure-integration/remote-explanation/explain-model-on-amlcompute.ipynb)
1. Inferencing time: deploy a classification model and explainer:
    1. [Deploy a locally-trained model and explainer](../azure-integration/scoring-time/train-explain-model-locally-and-deploy.ipynb)
    1. [Deploy a remotely-trained model and explainer](../azure-integration/scoring-time/train-explain-model-on-amlcompute-and-deploy.ipynb)
