In this lab we will use Permutation Importance and SHAP values to explain our regression model.

First, let's load the necessary libraries.

In [None]:
import numpy as np
import pandas as pd
import os
import gc
from pathlib import Path

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn import metrics
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

In [None]:
from sklearn.inspection import permutation_importance
import shap

In [None]:
import matplotlib.pyplot as plt
from matplotlib import cm
import matplotlib.colors as colors

Load the measurement data into a pandas DataFrame.

In [None]:
maindir = Path.cwd() / "tutorial_data/"
data = maindir / "Sphere_data_short.csv"
df = pd.read_csv(data, sep=',', header=0)
df

Fix the random seed so that the results are reproducible.

In [None]:
seed_num = 42
np.random.seed(seed_num)

Inspection of the columns in the dataset. 

In [None]:
df.columns

Specification of input features and target variable. 

In [None]:
feature_names = ['Nx', 'Ny', 'Nz', 
       'LateralDensity', 'DirectionDensity', 'ExposureTime',
       'oriX', 'oriY', 'oriZ', 'Inc', 'ang', 'ViewAng', 'AcmosJ', 'Rs']
target_variable = 'PointDev'

Features and target are scaled (separately) to the [0, 1] interval. 

In [None]:
Xall = df[feature_names].values
Yall = df[target_variable].values.reshape(-1, 1)
scalerX = MinMaxScaler()
scalerY = MinMaxScaler()
scalerX.fit(Xall)
scalerY.fit(Yall)

Xsc = scalerX.transform(Xall)
Ysc = scalerY.transform(Yall)
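
To make the scaling step concrete, here is a minimal NumPy sketch of what a min-max scaler with the default [0, 1] range computes, and how an error measured on the scaled target maps back to original units. The array `y` and the error value are made up for illustration and are not taken from the lab dataset.

```python
import numpy as np

# Hypothetical target values in original units (not from the lab dataset).
y = np.array([[0.2], [0.5], [1.0], [1.8]])

# Min-max scaling to [0, 1]: (y - min) / (max - min), column by column.
y_min, y_max = y.min(axis=0), y.max(axis=0)
y_scaled = (y - y_min) / (y_max - y_min)

# Inverting the transform recovers the original units.
y_back = y_scaled * (y_max - y_min) + y_min

# An absolute error measured on the scaled target converts back to
# original units by multiplying with the data range (max - min).
err_scaled = 0.05
err_original = err_scaled * (y_max - y_min)[0]
```

In the notebook itself, `scalerY.inverse_transform` performs the same inversion on scaled targets or predictions.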

Data are randomly split into train & test partitions. 

In [None]:
X_train, X_test, y_train, y_test = train_test_split(Xsc, Ysc, test_size=0.2, random_state=seed_num)
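
Note that the scalers above are fitted on the full dataset before the split, so the minima and maxima of the test rows influence the preprocessing. A common alternative is to split first and fit the scaler on the training partition only. The sketch below shows that order on synthetic data; the `*_raw` arrays are hypothetical stand-ins for `df[feature_names].values` and the target column.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(42)
X_raw = rng.normal(size=(100, 3))   # synthetic stand-in for the feature matrix
y_raw = rng.normal(size=(100, 1))   # synthetic stand-in for the target

# Split first, so the test rows stay unseen during preprocessing.
X_tr_raw, X_te_raw, y_tr_raw, y_te_raw = train_test_split(
    X_raw, y_raw, test_size=0.2, random_state=42
)

# Fit the scaler on the training rows only, then apply it to both partitions.
scaler_x = MinMaxScaler().fit(X_tr_raw)
X_tr = scaler_x.transform(X_tr_raw)
X_te = scaler_x.transform(X_te_raw)  # values may fall slightly outside [0, 1]
```

With this order, scaled test values can fall outside [0, 1], which is expected and not an error.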

Select a regressor from the available ones and instantiate it with your chosen hyperparameters. The link below each regressor leads to the corresponding page in the scikit-learn documentation.

In [None]:
hparams = {'kernel': 'rbf', 'gamma': 'auto', 'epsilon': 0.01, 'C': 10.0, 'tol': 0.001, 'max_iter': 10000}

model = SVR(**hparams)
## https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVR.html


# model = DecisionTreeRegressor()
## https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeRegressor.html#sklearn.tree.DecisionTreeRegressor


# model = MLPRegressor()
## https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html


# model = RandomForestRegressor()
## https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html


Select appropriate performance scores for the supervised regression task. 

In [None]:
scoring = ['neg_mean_absolute_error', 'neg_mean_absolute_percentage_error']

Train model on the train partition, predict on the test partition. 

In [None]:
trained_model = model.fit(X_train, y_train.ravel())
predictions = trained_model.predict(X_test)

Compute performance scores on the unseen test partition. 

In [None]:
mae = metrics.mean_absolute_error(y_test.ravel(), predictions)
mape = metrics.mean_absolute_percentage_error(y_test.ravel(), predictions)

In [None]:
mae
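
As a quick check of what the two scores measure, here is a small NumPy sketch that computes MAE and MAPE by hand on made-up numbers. One caveat worth keeping in mind: MAPE divides by the absolute true value, and a target scaled to [0, 1] contains values at or near 0, so MAPE on the scaled target can become very large.

```python
import numpy as np

# Hypothetical true values and predictions, for illustration only.
y_true = np.array([1.0, 2.0, 4.0])
y_pred = np.array([1.5, 2.0, 3.0])

abs_err = np.abs(y_true - y_pred)

# MAE: average absolute deviation, in the units of the target.
mae_manual = abs_err.mean()

# MAPE: average absolute deviation relative to the true value (a fraction,
# not a percentage). It is unstable when y_true is at or near zero.
mape_manual = (abs_err / np.abs(y_true)).mean()
```

On these numbers both hand-computed values should agree with `metrics.mean_absolute_error` and `metrics.mean_absolute_percentage_error`.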

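Let's compute Permutation Importance scores for the trained model. Permutation Importance will be calculated on the test partition, as the mean (over 10 repetitions) drop in performance scores when a feature is permuted. Because the scorers used here are negated errors, a positive importance means that the error increased when the feature was shuffled.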

In [None]:
r = permutation_importance(trained_model, X_test, y_test, n_repeats=10, random_state=seed_num, scoring=scoring)
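
To see what happens under the hood, here is a rough NumPy-only sketch of the permutation procedure. It is not scikit-learn's implementation: the data are synthetic, the "model" is a plain least-squares fit, and the helper names (`predict`, `neg_mae`, `manual_permutation_importance`) are made up for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y depends strongly on feature 0, weakly on 1, not on 2.
X = rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=500)
X_tr, X_te = X[:400], X[400:]
y_tr, y_te = y[:400], y[400:]

# Stand-in model: ordinary least squares fitted on the training rows.
coef, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)

def predict(X):
    return X @ coef

def neg_mae(y_true, y_pred):
    # Negated MAE, so that higher is better (same convention as the scorer).
    return -np.mean(np.abs(y_true - y_pred))

def manual_permutation_importance(X, y, n_repeats=10):
    baseline = neg_mae(y, predict(X))
    importances = np.zeros((X.shape[1], n_repeats))
    for j in range(X.shape[1]):
        for r in range(n_repeats):
            X_perm = X.copy()
            # Shuffle one column to break its link with the target.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            # Importance = drop in score caused by the shuffle.
            importances[j, r] = baseline - neg_mae(y, predict(X_perm))
    return importances.mean(axis=1), importances.std(axis=1)

imp_mean, imp_std = manual_permutation_importance(X_te, y_te)
```

The feature the target depends on most should receive the largest mean importance, and the irrelevant feature a value close to zero.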

In [None]:
mean_importances_mae = r['neg_mean_absolute_error']['importances_mean']
std_importances_mae = r['neg_mean_absolute_error']['importances_std']

In [None]:
names = []
meanimp_mae = []
stdimp_mae = []

for i in mean_importances_mae.argsort():
    names.append(feature_names[i])
    meanimp_mae.append(mean_importances_mae[i])
    stdimp_mae.append(std_importances_mae[i])

In [None]:
fig, ax = plt.subplots(figsize=(19, 12))

ax.barh(names, meanimp_mae, facecolor='g', xerr=stdimp_mae)
ax.set_xlabel('Feature Importance scores', fontsize=18)
ax.set_ylabel('Input Features', fontsize=18)
ax.set_title('Permutation Importance based on MAE', fontsize=20)

savefile = maindir / 'PermutationImportanceMAE_SVM.png'
plt.savefig(savefile, bbox_inches='tight', pad_inches=0.1, format='png')
#plt.close("all")

In [None]:
plt.close("all")

Now let's train a decision tree (or a tree-based ensemble such as Random Forest) and investigate feature attributions through SHAP values. Although the method is model-agnostic in theory, model-specific implementations (such as TreeExplainer) run much faster than the model-agnostic implementation (KernelExplainer).

In [None]:
hparams = {'criterion': 'squared_error', 'splitter': 'best', 'max_depth':4}
model = DecisionTreeRegressor(**hparams)
model.fit(X_train, y_train.ravel())
predictions = model.predict(X_test)
mae = metrics.mean_absolute_error(y_test.ravel(), predictions)
mape = metrics.mean_absolute_percentage_error(y_test.ravel(), predictions)
# The target was min-max scaled, so this MAE is in scaled units rather than mm.
print("Mean Absolute Error (scaled target) = {}".format(mae))

We instantiate the explainer on the trained model and calculate SHAP values on the test set. An overview of feature attributions can be visualized in a summary plot. 

In [None]:
shap.initjs()
ex = shap.TreeExplainer(model)
shap_values = ex.shap_values(X_test)
shap.summary_plot(shap_values, X_test, feature_names=feature_names)
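
The summary plot orders features by the overall magnitude of their SHAP values. A minimal sketch of that ranking, using a small made-up SHAP matrix (rows are samples, columns are features) and hypothetical feature names, could look like this:

```python
import numpy as np

# Hypothetical feature names and SHAP matrix, for illustration only.
toy_names = ["A", "B", "C"]
toy_shap = np.array([
    [ 0.10, -0.40,  0.02],
    [-0.20,  0.30, -0.01],
    [ 0.05, -0.50,  0.03],
])

# Global importance of a feature: mean absolute SHAP value over all samples.
mean_abs = np.abs(toy_shap).mean(axis=0)

# Sort from most to least important.
order = np.argsort(mean_abs)[::-1]
ranking = [(toy_names[i], float(mean_abs[i])) for i in order]
```

The same two lines applied to `shap_values` and `feature_names` from the cell above give a numeric counterpart to the plot.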

Let's make a dependence plot for the feature with the highest importance (Rs). Vertical spread in a dependence plot represents the effects of non-linear interactions between features. 

In [None]:
# Look up the column index of 'Rs' by name instead of hard-coding 13.
shap.dependence_plot(feature_names.index('Rs'), shap_values, X_test, feature_names=feature_names)

Figures created using the shap library can be manipulated with matplotlib.pyplot, as below. Passing show=False keeps the figure open so that it can be saved after plotting.

In [None]:
f = plt.figure()
shap.summary_plot(shap_values, X_test, feature_names=feature_names, show=False)
f.savefig("summary_plot1.png", bbox_inches='tight', dpi=600)

Finally, let's visualize a force plot to explain a single prediction (local explanation). The base value is “the value that would be predicted if we did not know any features for the current output.” In other words, it is the mean model prediction; the SHAP values show how each feature pushes this particular prediction above or below that mean.

In [None]:
shap.initjs()
shap.force_plot(ex.expected_value, shap_values[10,:], X_test[10,:], feature_names=feature_names)
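
The additive structure behind the force plot (base value plus the sum of the SHAP values equals the prediction) can be checked on a toy problem by computing exact Shapley values by brute force. This sketch is only an illustration under stated assumptions: the model, the background data, and the sample `x` are made up, absent features are averaged over the background data, and this is not how TreeExplainer computes its values.

```python
import itertools
import math
import numpy as np

def toy_model(X):
    # Hypothetical model with an interaction between features 1 and 2.
    return 2.0 * X[:, 0] + X[:, 1] * X[:, 2]

rng = np.random.default_rng(0)
background = rng.uniform(size=(200, 3))   # made-up reference data
x = np.array([0.9, 0.2, 0.7])             # the sample to explain

def value(subset):
    # Expected model output when the features in `subset` are fixed to x
    # and the remaining features are taken from the background data.
    Xb = background.copy()
    idx = list(subset)
    if idx:
        Xb[:, idx] = x[idx]
    return toy_model(Xb).mean()

n = x.size
phi = np.zeros(n)
for i in range(n):
    others = [j for j in range(n) if j != i]
    for k in range(n):
        for S in itertools.combinations(others, k):
            # Shapley weight for a coalition of size k.
            w = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            phi[i] += w * (value(S + (i,)) - value(S))

base_value = value(())                    # mean prediction over the background
prediction = toy_model(x[None, :])[0]
```

Here `base_value + phi.sum()` matches `prediction` up to floating-point error, which is the same decomposition the force plot draws for sample 10.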