# Explain binary classification model predictions on GPU
_**This notebook showcases how to use the interpret-community repo to help interpret and visualize predictions from a binary classification model on GPU.**_

Adapted from the `explain-binary-classification-local.ipynb` notebook in the repository.

## Table of Contents

1. [Introduction](#Introduction)
1. [Project](#Project)
1. [Setup](#Setup)
1. [Run model explainer locally at training time](#Explain)
    1. Train a binary classification model
    1. Explain the model
        1. Generate global explanations
        1. Generate local explanations
1. [Visualize results](#Visualize)
1. [Next steps](#Next)

<a id='Introduction'></a>
## 1. Introduction

This notebook illustrates how to locally use interpret-community to help interpret binary classification model predictions at training time.  It demonstrates the API calls needed to obtain the global and local interpretations along with an interactive visualization dashboard for discovering patterns in data and explanations.

Two TabularExplainer options are demonstrated, one on CPU and one on GPU (selected with the `use_gpu` flag):
- KernelExplainer - uses [SHAP KernelExplainer](https://shap-lrjball.readthedocs.io/en/latest/generated/shap.KernelExplainer.html#shap-kernelexplainer) on CPU
- GPUKernelExplainer - uses [cuML KernelExplainer](https://docs.rapids.ai/api/cuml/stable/api.html#cuml.explainer.KernelExplainer) for GPU Acceleration

To run the GPUKernelExplainer:
- Ensure the local machine has a GPU with CUDA and NVIDIA drivers installed. For minimum version requirements, visit [RAPIDS getting started](https://rapids.ai/start.html)
- Install [RAPIDS libraries](https://rapids.ai/start.html#get-rapids)
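
As a minimal sketch of the prerequisite check above, the snippet below tests whether the `cuml` package can be found before choosing a value for `use_gpu`. The helper name `cuml_available` is hypothetical and not part of interpret-community or RAPIDS. Note that a package being importable does not by itself guarantee a working GPU or driver.

```python
import importlib.util


def cuml_available() -> bool:
    """Return True if the cuml package can be located on the import path.

    Hypothetical helper: it only checks that the package is installed,
    not that a usable GPU and driver are present.
    """
    return importlib.util.find_spec("cuml") is not None


# Derive the flag once; it could later be passed as use_gpu=... to TabularExplainer.
use_gpu = cuml_available()
print("use_gpu =", use_gpu)
```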


<a id='Project'></a>       
## 2. Project

The goal of this project is to classify breast cancer diagnoses with cuML (or, optionally, scikit-learn) and then run the model explainer locally:

1. Train an SVM classification model using cuML or scikit-learn.
2. Run the explainer globally and locally (`explain_global` and `explain_local`) on the full dataset in local mode, which doesn't contact any Azure services.
3. Visualize the global and local explanations with the visualization dashboard.

<a id='Setup'></a>
## 3. Setup

If you are using Jupyter notebooks, the extensions should be installed automatically with the package.
If you are using JupyterLab, run the following command:
```
(myenv) $ jupyter labextension install @jupyter-widgets/jupyterlab-manager
```


<a id='Explain'></a>
## 4. Run model explainer locally at training time

In [None]:
from sklearn.datasets import load_breast_cancer
from sklearn import svm

# Explainers:
# 1. SHAP Kernel Explainer
from interpret.ext.blackbox import TabularExplainer

# cuML is a machine learning library within RAPIDS, similar to scikit-learn
import cuml

### Load the breast cancer diagnosis data

In [None]:
breast_cancer_data = load_breast_cancer()
classes = breast_cancer_data.target_names.tolist()

In [None]:
# Split data into train and test
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(breast_cancer_data.data, breast_cancer_data.target, test_size=0.2, random_state=0)

### Train an SVM classification model, which you want to explain

In [None]:
# Train a cuML model (fit returns the fitted estimator, so model and clf refer to the same object)
clf = cuml.svm.SVC(gamma=0.001, C=100., probability=True)
model = clf.fit(x_train, y_train)

# Train sklearn model
# clf = svm.SVC(gamma=0.001, C=100., probability=True)
# sk_model = clf.fit(x_train, y_train)

### Explain predictions on your local machine

In [None]:
# 1. Using GPU SHAP TabularExplainer with model
# To use this, the machine must have a GPU and the RAPIDS libraries installed.
# Visit https://rapids.ai for more information. This option uses cuML's SHAP implementation on GPU.
# cuML is a GPU-accelerated machine learning library within RAPIDS that mirrors scikit-learn's API.
# In our runs, the cuML model with GPU SHAP was the fastest combination: it yielded
# a 3.5x speed-up over the sklearn model with CPU SHAP on an RTX 8000. Speed-ups are
# expected to be larger on larger datasets.
explainer = TabularExplainer(model, 
                             x_train, 
                             features=breast_cancer_data.feature_names, 
                             classes=classes,
                             use_gpu=True)

# 2. Using GPU SHAP TabularExplainer with sklearn model
# A scikit-learn model can be used as well. Train the sklearn model by uncommenting
# the appropriate lines in the previous cell to run GPU SHAP + sklearn.
# explainer = TabularExplainer(sk_model, 
#                              x_train, 
#                              features=breast_cancer_data.feature_names, 
#                              classes=classes,
#                              use_gpu=True)

# 3. Using CPU SHAP TabularExplainer with sklearn model
# explainer = TabularExplainer(sk_model, 
#                              x_train, 
#                              features=breast_cancer_data.feature_names, 
#                              classes=classes)

### Generate global explanations
Explain overall model predictions (global explanation)

In [None]:
# Pass the test dataset as evaluation examples; note that it must be a representative sample of the original data.
# x_train can be passed as well; with more examples, explanations take longer, although they may be more accurate.
global_explanation = explainer.explain_global(x_test)
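
To compare how long the explanation call takes under the different explainer options, a small timing wrapper can help. This is a minimal sketch using only the standard library; the helper name `timed` is hypothetical, and the workload shown is a stand-in rather than the notebook's explainer.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Stand-in workload. In this notebook the call would look like:
#   global_explanation, seconds = timed(explainer.explain_global, x_test)
result, seconds = timed(sum, range(1000))
print(f"result={result}, elapsed={seconds:.6f}s")
```

Running the same wrapper once per option (cuML + GPU SHAP, sklearn + GPU SHAP, sklearn + CPU SHAP) gives comparable wall-clock numbers on your own hardware.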

In [None]:
# Sorted SHAP values
print('ranked global importance values: {}'.format(global_explanation.get_ranked_global_values()))
# Corresponding feature names
print('ranked global importance names: {}'.format(global_explanation.get_ranked_global_names()))
# Feature ranks (based on original order of features)
print('global importance rank: {}'.format(global_explanation.global_importance_rank))

# Per class feature names
print('ranked per class feature names: {}'.format(global_explanation.get_ranked_per_class_names()))
# Per class feature importance values
print('ranked per class feature values: {}'.format(global_explanation.get_ranked_per_class_values()))

In [None]:
# Print out a dictionary that holds the sorted feature importance names and values
print('global importance dict: {}'.format(global_explanation.get_feature_importance_dict()))

### Explain overall model predictions as a collection of local (instance-level) explanations

In [None]:
# Feature SHAP values for all features and all data points in the evaluation data (x_test here)
print('local importance values: {}'.format(global_explanation.local_importance_values))
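
The relationship between local and global importance can be sketched in plain Python. The snippet below assumes that global importance is the mean absolute local importance, averaged over instances and then over classes. This is an assumption about the aggregation, stated here for illustration, and the numbers are made up rather than taken from the breast cancer model.

```python
# Toy local importance values shaped [class][instance][feature].
# The numbers are invented for illustration only.
local_values = [
    [[0.2, -0.1], [-0.4, 0.3]],  # class 0: two instances, two features
    [[-0.2, 0.1], [0.4, -0.3]],  # class 1
]


def mean_abs_per_feature(rows):
    """Average the absolute value of each feature column over all instances."""
    n_rows = len(rows)
    n_features = len(rows[0])
    return [sum(abs(r[j]) for r in rows) / n_rows for j in range(n_features)]


# Per-class importance: mean |value| over instances (assumed aggregation).
per_class = [mean_abs_per_feature(rows) for rows in local_values]

# Global importance: average of the per-class values (assumed aggregation).
global_importance = [
    sum(c[j] for c in per_class) / len(per_class)
    for j in range(len(per_class[0]))
]

print("per class:", per_class)
print("global:", global_importance)
```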

### Generate local explanations
Explain local data points (individual instances)

In [None]:
# You can pass a specific data point or a group of data points to the explain_local function

# E.g., Explain the first data point in the test set
instance_num = 0
local_explanation = explainer.explain_local(x_test[instance_num,:])

In [None]:
# Get the prediction for the first member of the test set and explain why the model made that prediction.
# Cast to int so the predicted class can index the per-class lists below.
prediction_value = int(clf.predict(x_test)[instance_num])

sorted_local_importance_values = local_explanation.get_ranked_local_values()[prediction_value]
sorted_local_importance_names = local_explanation.get_ranked_local_names()[prediction_value]

print('local importance values: {}'.format(sorted_local_importance_values))
print('local importance names: {}'.format(sorted_local_importance_names))
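
To make the local importance values more concrete, the sketch below computes exact Shapley values for a tiny hand-written function and ranks the features by value. Kernel SHAP approximates these quantities by sampling feature coalitions; the exact enumeration here is only feasible for a handful of features. The function `toy_model`, the feature names, and the all-zero baseline are hypothetical and unrelated to the SVM trained in this notebook.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, with absent features set to baseline."""
    n = len(x)

    def value(subset):
        # Features in the coalition take their value from x, the rest from baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                coalition = set(s)
                total += weight * (value(coalition | {i}) - value(coalition))
        phi.append(total)
    return phi


def toy_model(z):
    """Hypothetical model: one additive term plus one interaction term."""
    return 2.0 * z[0] + z[1] * z[2]


names = ["f0", "f1", "f2"]  # hypothetical feature names
x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]

phi = shapley_values(toy_model, x, baseline)

# Rank features by value, largest first.
ranked = sorted(zip(names, phi), key=lambda pair: pair[1], reverse=True)
print(ranked)
```

The values sum to the difference between the model output at `x` and at the baseline, which is the property that lets a prediction be read as a sum of per-feature contributions.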

<a id='Visualize'></a>
## 5. Visualize
Load the visualization dashboard

In [None]:
from raiwidgets import ExplanationDashboard

In [None]:
try:
    ExplanationDashboard(global_explanation, model, dataset=x_test, true_y=y_test)
except NameError:
    # `model` is undefined if only the sklearn model was trained; show the dashboard with sk_model instead
    ExplanationDashboard(global_explanation, sk_model, dataset=x_test, true_y=y_test)

<a id='Next'></a>
## 6. Next Steps
Learn more
- [RAPIDS](https://rapids.ai/)
- [RAPIDS on Medium](https://medium.com/rapids-ai)
- [cuML on GitHub](https://github.com/rapidsai/cuml.git)
- [cuML API Reference](https://docs.rapids.ai/api/cuml/stable/api.html)