# KERNEL SHAP

The goal of SHAP is to calculate the impact of every feature on the prediction.



How does Kernel SHAP differ from other feature-importance methods?

Kernel SHAP does not retrain the model on different subsets of features. It uses the full model that is already trained and simulates a "missing" feature by replacing its value with values sampled from a background dataset.
In other words, "feature value absent" is equated with "feature value replaced by a random value of that feature from the data".

The trained model is then evaluated on these perturbed samples, and a weighted linear model is fitted to its outputs, with one binary input per feature indicating whether that feature is present or absent. The coefficients of this linear model are the estimated Shapley values.
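The procedure described above can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions, not how `alibi` or `shap` implement it: it enumerates every coalition (feasible only for a handful of features) instead of sampling them, and the function name `kernel_shap_exact`, the toy linear model and the random background data are all made up for the example.

```python
import itertools
import math

import numpy as np


def kernel_shap_exact(predict, x, background):
    """Toy Kernel SHAP for one instance x (assumes at least 2 features)."""
    M = x.shape[0]
    base = predict(background).mean()   # output when every feature is "absent"
    full = predict(x[None, :])[0]       # output when every feature is present

    masks, values, weights = [], [], []
    for size in range(1, M):            # all proper, non-empty coalitions
        for subset in itertools.combinations(range(M), size):
            cols = list(subset)
            z = np.zeros(M)
            z[cols] = 1.0
            # Present features take x's values; absent ones keep background values.
            mixed = background.copy()
            mixed[:, cols] = x[cols]
            masks.append(z)
            values.append(predict(mixed).mean())
            # Shapley kernel weight for a coalition of this size.
            weights.append((M - 1) / (math.comb(M, size) * size * (M - size)))

    Z, v, w = np.array(masks), np.array(values), np.array(weights)

    # Enforce sum(phi) == full - base by eliminating the last coefficient,
    # then solve the weighted least-squares problem for the remaining ones.
    A = Z[:, :-1] - Z[:, [-1]]
    b = v - base - Z[:, -1] * (full - base)
    sw = np.sqrt(w)
    phi_head, *_ = np.linalg.lstsq(A * sw[:, None], b * sw, rcond=None)
    phi_last = (full - base) - phi_head.sum()
    return base, np.append(phi_head, phi_last)


# Hypothetical toy setup: a linear model and random background data.
rng = np.random.default_rng(0)
background = rng.normal(size=(50, 3))
coef = np.array([2.0, -1.0, 0.5])


def toy_model(X):
    return X @ coef + 0.3


x = np.array([1.0, 2.0, -1.0])
base, phi = kernel_shap_exact(toy_model, x, background)
print(base, phi)
```

For a linear model the result can be checked by hand: each coefficient of the fitted explanation should equal the model weight times the distance of the feature from its background mean.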

SHAP supports both local and global interpretation: it can quantify the contribution of each feature to the prediction for an individual instance, and aggregating these contributions over many instances gives the importance of each feature for the model as a whole.

SHAP values are consistent: if a model changes so that the marginal contribution of a feature value (the change in the prediction when that feature is added to a set of other features) increases or stays the same, regardless of the other features, then the SHAP value of that feature also increases or stays the same.
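For reference, the Shapley value and the consistency property can be written out as follows. The notation is an assumption introduced here for illustration and does not appear in the original notebook: $F$ is the set of all features, and $f_x(S)$ is the expected model output when only the features in $S$ are known for instance $x$.

$$
\phi_i(f, x) = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,(|F| - |S| - 1)!}{|F|!}\,\bigl[f_x(S \cup \{i\}) - f_x(S)\bigr]
$$

Consistency then says that for two models $f$ and $f'$:

$$
f'_x(S \cup \{i\}) - f'_x(S) \;\ge\; f_x(S \cup \{i\}) - f_x(S) \quad \text{for all } S \subseteq F \setminus \{i\}
\;\;\Longrightarrow\;\;
\phi_i(f', x) \ge \phi_i(f, x)
$$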

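Together with local accuracy (the SHAP values of an instance sum to the difference between its prediction and the base value), this gives SHAP values a firmer theoretical footing than many heuristic importance scores.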

In [None]:
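# The "shap" extra pulls in the SHAP dependency that KernelShap needs in recent alibi releases.
!pip install "alibi[shap]"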

In [None]:
import shap
shap.initjs()

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

from alibi.explainers import KernelShap
from scipy.special import logit
from sklearn.metrics import confusion_matrix, plot_confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression


Loading and preparing data 

In [None]:
data = pd.read_csv('../input/heart-disease-cleveland-uci/heart_cleveland_upload.csv')
# To display the top 5 rows
data.head(5)

In [None]:
heart = data.copy()

In [None]:
target = 'condition'
features_list = list(heart.columns)
features_list.remove(target)

In [None]:
y = heart.pop('condition')

In [None]:
X_train, X_test, y_train, y_test = train_test_split(heart, y, test_size=0.2, random_state=33)

In [None]:
print("Training records: {}".format(X_train.shape[0]))
print("Testing records: {}".format(X_test.shape[0]))

In [None]:
scaler = StandardScaler().fit(X_train)
X_train_norm = scaler.transform(X_train)
X_test_norm = scaler.transform(X_test)

Training the model

In [None]:
classifier = LogisticRegression(random_state=0)
classifier.fit(X_train_norm, y_train)

In [None]:
y_pred = classifier.predict(X_test_norm)

Evaluating the model

In [None]:
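from sklearn.metrics import ConfusionMatrixDisplay

# ConfusionMatrixDisplay replaces plot_confusion_matrix, which was removed in scikit-learn 1.2.
cm = confusion_matrix(y_test, y_pred)
title = 'Confusion matrix for the logistic regression classifier'
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=classifier.classes_)
disp.plot(cmap=plt.cm.Blues)
disp.ax_.set_title(title)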

Applying Kernel SHAP

In [None]:
pred = classifier.predict_proba
# With link='logit', explanations are computed in log-odds space: the logit maps
# predicted probabilities in (0, 1) onto the whole real line (-inf, +inf).
lr_explainer = KernelShap(pred, link='logit')
lr_explainer.fit(X_train_norm)

In [None]:
lr_explanation = lr_explainer.explain(X_test_norm, l1_reg=False)

LOCAL EXPLANATION -

In [None]:
idx = 4
instance = X_test_norm[idx][None, :]
pred = classifier.predict(instance)
class_idx = pred.item()
print("The predicted class for the X_test_norm[{}] is {}.".format(idx, *pred))

In [None]:
shap.initjs()
shap.force_plot(
    lr_explanation.expected_value[class_idx],
    lr_explanation.shap_values[class_idx][idx, :],
    X_test_norm[idx][None, :],
    features_list,
)

The base value is the model's average output over the background (training) data, expressed in log-odds because the explainer uses link='logit' (here: -0.3148).
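Because the explainer was built with `link='logit'`, the base value quoted above is a log-odds value rather than a probability. The sketch below shows how the two scales relate, using only the standard library; the helper names `logit` and `expit` are defined here for illustration (SciPy offers functions with the same names in `scipy.special`).

```python
import math


def logit(p):
    """Map a probability in (0, 1) to log-odds in (-inf, +inf)."""
    return math.log(p / (1.0 - p))


def expit(z):
    """Inverse of logit: map log-odds back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


base_log_odds = -0.3148                  # base value reported in the text above
base_probability = expit(base_log_odds)  # the same quantity on the probability scale
print(round(base_probability, 3))
```

A negative base value in log-odds therefore corresponds to an average predicted probability below 0.5 for the plotted class.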

Pink features push the output for the plotted class higher, and blue features push it lower. When the plotted class is 1, pink pushes the prediction towards 1 (heart disease) and blue towards 0 (no disease).

The magnitude of each feature's influence is given by the length of its bar on the horizontal axis. The number shown next to a feature is its (standardized) value for this instance (e.g. 2.583 for ca). Here, ca has the largest influence in increasing the prediction value and sex has the largest influence in decreasing it.
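The way a force plot is read can be mimicked with plain Python. The numbers below are made up for illustration and are not the values from this notebook; only the base value is taken from the text above.

```python
# Hypothetical log-odds contributions for one instance (made-up numbers).
base_value = -0.3148
contributions = {"ca": 1.20, "thal": 0.45, "sex": -0.60, "thalach": -0.15}

# Local accuracy: the base value plus all contributions gives the model output.
model_output = base_value + sum(contributions.values())

# Longest bar pushing the output up, and longest bar pushing it down.
top_positive = max(contributions, key=contributions.get)
top_negative = min(contributions, key=contributions.get)
print(top_positive, top_negative, round(model_output, 4))
```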

GLOBAL EXPLANATION -

In [None]:
shap.summary_plot(lr_explanation.shap_values[1], X_test_norm, features_list)

The plot above visualizes the impact of each feature on the prediction for class 1. Features are ordered by overall influence (mean absolute SHAP value), with the most influential at the top. Thus, ca influences the prediction the most, followed by thal, and so on.
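The ordering used in the summary plot can be reproduced by averaging absolute SHAP values per feature. The sketch below uses a small made-up array in place of `lr_explanation.shap_values[1]`, so the numbers are hypothetical.

```python
import numpy as np

# Hypothetical SHAP values for class 1: 4 instances x 3 features (made-up numbers).
feature_names = ["ca", "thal", "thalach"]
shap_values = np.array([
    [0.9, 0.4, -0.2],
    [-0.7, 0.5, 0.1],
    [1.1, -0.6, -0.3],
    [-0.8, -0.3, 0.2],
])

# Global importance of a feature: mean absolute SHAP value over all instances.
mean_abs = np.abs(shap_values).mean(axis=0)

# Sort from most to least influential, matching the plot from top to bottom.
order = np.argsort(mean_abs)[::-1]
ranking = [(feature_names[i], float(mean_abs[i])) for i in order]
print(ranking)
```

Taking the absolute value first matters: a feature whose contributions are large but of mixed sign would otherwise average out to roughly zero.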

The colour of each point encodes the feature value (red = high, blue = low), and its horizontal position gives the SHAP value. For ca, for example, the red points lie to the right: the higher the value of ca, the higher the SHAP value, i.e. the more the prediction is pushed towards 1. High ca → heart disease.


Almost all features show this pattern. For some features, however, it is the opposite: a high thalach indicates a lower chance of heart disease.