# Classification with SHAP

- This tutorial demonstrates binary classification of structured data with Keras, starting from a raw CSV file, and then attempts to explain the model's predictions with SHAP.


- This example is a more advanced version of [](structured_data_classification_intro.ipynb): it relies more on functions and needs less code.

## Setup

In [5]:
import numpy as np
import pandas as pd

import tensorflow as tf
from tensorflow.keras import layers

import shap

import warnings
warnings.filterwarnings('ignore')

print(f"TensorFlow: {tf.__version__}")
print(f"SHAP: {shap.__version__}")

TensorFlow: 2.8.1
SHAP: 0.40.0


## Data

- We use the features below to predict whether a patient has heart disease (`Target`).

Column | Description | Feature Type
------------|--------------------|----------------------
Age | Age in years | Numerical
Sex | (1 = male; 0 = female) | Categorical
CP | Chest pain type (0, 1, 2, 3, 4) | Categorical
Trestbps | Resting blood pressure (in mm Hg on admission) | Numerical
Chol | Serum cholesterol in mg/dl | Numerical
FBS | Fasting blood sugar > 120 mg/dl (1 = true; 0 = false) | Categorical
RestECG | Resting electrocardiogram results (0, 1, 2) | Categorical
Thalach | Maximum heart rate achieved | Numerical
Exang | Exercise induced angina (1 = yes; 0 = no) | Categorical
Oldpeak | ST depression induced by exercise relative to rest | Numerical
Slope | Slope of the peak exercise ST segment | Numerical
CA | Number of major vessels (0-3) colored by fluoroscopy | Both numerical & categorical
Thal | normal; fixed defect; reversible defect | Categorical (string)
Target | Diagnosis of heart disease (1 = true; 0 = false) | Target

## Model

### Model import

- Load the model (see [](structured_data_classification_layers.ipynb)): 

In [6]:
model = tf.keras.models.load_model('my_hd_classifier')

2022-06-10 10:02:32.882167: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz


- To get a prediction for a new sample, you can simply call the Keras `Model.predict` method.

- There are just two things you need to do:

  - Wrap each scalar in a list so that it has a batch dimension (models only process batches of data, not single samples).

  - Call `tf.convert_to_tensor` on each feature.

In [7]:
sample = {
    "age": 60,
    "sex": 1,
    "cp": 1,
    "trestbps": 145,
    "chol": 233,
    "fbs": 1,
    "restecg": 2,
    "thalach": 150,
    "exang": 0,
    "oldpeak": 2.3,
    "slope": 3,
    "ca": 0,
    "thal": "fixed",
}

In [8]:
input_dict = {name: tf.convert_to_tensor([value]) for name, value in sample.items()}

In [9]:
predictions = model.predict(input_dict)

In [6]:
print(
    "This particular patient had a %.1f percent probability "
    "of having a heart disease, as evaluated by our model." % (100 * predictions[0][0],)
)

This particular patient had a 62.4 percent probability of having a heart disease, as evaluated by our model.


In [10]:
X_train = pd.read_csv("df_train.csv")

In [21]:
X_train

Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
0,63,1,1,145,233,1,2,150,0,2.3,3,0,fixed,0
1,67,1,4,160,286,0,2,108,1,1.5,2,3,normal,1
2,37,1,3,130,250,0,0,187,0,3.5,3,0,normal,0
3,41,0,2,130,204,0,2,172,0,1.4,1,0,normal,0
4,56,1,2,120,236,0,0,178,0,0.8,1,0,normal,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
237,52,1,1,118,186,0,2,190,0,0.0,2,0,fixed,0
238,43,0,4,132,341,1,2,136,1,3.0,2,0,reversible,1
239,65,1,4,135,254,0,2,127,0,2.8,2,1,reversible,1
240,48,1,4,130,256,1,2,150,1,0.0,1,2,reversible,1


In [15]:
X_train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 242 entries, 0 to 241
Data columns (total 14 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   age       242 non-null    int64  
 1   sex       242 non-null    int64  
 2   cp        242 non-null    int64  
 3   trestbps  242 non-null    int64  
 4   chol      242 non-null    int64  
 5   fbs       242 non-null    int64  
 6   restecg   242 non-null    int64  
 7   thalach   242 non-null    int64  
 8   exang     242 non-null    int64  
 9   oldpeak   242 non-null    float64
 10  slope     242 non-null    int64  
 11  ca        242 non-null    int64  
 12  thal      242 non-null    object 
 13  target    242 non-null    int64  
dtypes: float64(1), int64(12), object(1)
memory usage: 26.6+ KB


In [None]:
# Unfinished cell: no encoding is applied here, so `thal` remains a string column.
# X_train['thal'] =

In [44]:
# Note: 'thal' holds strings, so this selection is not purely numeric despite its name.
numeric_feature_names = ['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg',
                         'thalach', 'exang', 'oldpeak', 'slope', 'ca', 'thal']
numeric_features = X_train[numeric_feature_names]
numeric_features.head()

Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal
0,63,1,1,145,233,1,2,150,0,2.3,3,0,fixed
1,67,1,4,160,286,0,2,108,1,1.5,2,3,normal
2,37,1,3,130,250,0,0,187,0,3.5,3,0,normal
3,41,0,2,130,204,0,2,172,0,1.4,1,0,normal
4,56,1,2,120,236,0,0,178,0,0.8,1,0,normal


In [46]:
numeric_features.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 242 entries, 0 to 241
Data columns (total 13 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   age       242 non-null    int64  
 1   sex       242 non-null    int64  
 2   cp        242 non-null    int64  
 3   trestbps  242 non-null    int64  
 4   chol      242 non-null    int64  
 5   fbs       242 non-null    int64  
 6   restecg   242 non-null    int64  
 7   thalach   242 non-null    int64  
 8   exang     242 non-null    int64  
 9   oldpeak   242 non-null    float64
 10  slope     242 non-null    int64  
 11  ca        242 non-null    int64  
 12  thal      242 non-null    object 
dtypes: float64(1), int64(11), object(1)
memory usage: 24.7+ KB


In [45]:
tf.convert_to_tensor(numeric_features)

ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type int).
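
This error is consistent with how pandas handles mixed column types: once a string column such as `thal` is included, the NumPy representation of the frame falls back to `object` dtype, which TensorFlow cannot turn into a numeric tensor. The following is a minimal sketch of that behaviour using only pandas and NumPy. The frame `toy` is a hypothetical three-row stand-in for `df_train.csv`, with values copied from the rows shown above.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame mirroring a few columns of df_train.csv.
toy = pd.DataFrame({
    "age": [63, 67, 37],
    "oldpeak": [2.3, 1.5, 3.5],
    "thal": ["fixed", "normal", "normal"],
})

# With the string column included, the array falls back to dtype=object.
mixed_dtype = toy.to_numpy().dtype

# Keeping only the numeric columns yields a homogeneous float array.
numeric_only = toy.select_dtypes(include="number")
numeric_dtype = numeric_only.to_numpy().dtype

print(mixed_dtype, numeric_dtype)
```

A homogeneous numeric array of this kind is something `tf.convert_to_tensor` can generally handle. The string column still needs separate treatment, for example by passing each feature by name as in the single-sample prediction earlier.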

In [None]:
input_dict_2 = {name: tf.convert_to_tensor([value]) for name, value in sample.items()}

In [12]:
predictions_2 = model.predict(X_train)

ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type int).
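
Passing the whole DataFrame to `model.predict` fails for the same reason as the tensor conversion, whereas the single-sample call earlier worked because every feature was passed by name. One possible way to extend that pattern to many rows is sketched below. It assumes the saved model accepts one 1-D array per named feature, which is not verified here, and `dataframe_to_feature_dict` is a hypothetical helper name. The sketch runs on a small stand-in frame so that it does not need the saved model.

```python
import numpy as np
import pandas as pd

def dataframe_to_feature_dict(df, label_column="target"):
    """Split a DataFrame into {feature name: 1-D NumPy array}, dropping the label."""
    features = df.drop(columns=[label_column], errors="ignore")
    return {name: features[name].to_numpy() for name in features.columns}

# Hypothetical stand-in for X_train, using values from the rows shown earlier.
toy = pd.DataFrame({
    "age": [63, 67, 37],
    "oldpeak": [2.3, 1.5, 3.5],
    "thal": ["fixed", "normal", "normal"],
    "target": [0, 1, 0],
})

feature_dict = dataframe_to_feature_dict(toy)

# Assumed usage with the loaded model (not executed here):
# predictions_2 = model.predict(dataframe_to_feature_dict(X_train))
```

Each column keeps its own dtype this way, so integer, float, and string features are never forced into a single `object` array.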

In [11]:
explainer = shap.KernelExplainer(model.predict, X_train)


Provided model function fails when applied to the provided data set.


ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type int).
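
`shap.KernelExplainer` evaluates the function it is given on tabular numeric data, so a model that expects a dict of named inputs, one of them a string, needs an adapter. The sketch below shows one possible adapter: `thal` is represented by an integer code on the SHAP side and decoded back to a string before the model is called. All names here (`make_array_predict_fn`, `fake_predict`, `FEATURE_NAMES`, `THAL_CATEGORIES`, `INT_FEATURES`) are hypothetical, the feature list is a shortened subset, and `fake_predict` merely stands in for `model.predict` so that the example runs without TensorFlow or SHAP.

```python
import numpy as np

FEATURE_NAMES = ["age", "oldpeak", "thal"]           # shortened subset, for illustration
THAL_CATEGORIES = ["normal", "fixed", "reversible"]  # integer code = index in this list
INT_FEATURES = {"age"}                               # features assumed to be integer inputs

def make_array_predict_fn(predict_fn):
    """Wrap a dict-input predict function so it accepts a 2-D numeric array."""
    def predict_from_array(X):
        X = np.atleast_2d(np.asarray(X, dtype=float))
        inputs = {}
        for j, name in enumerate(FEATURE_NAMES):
            column = X[:, j]
            if name == "thal":
                # Decode integer codes back into the strings the model expects.
                codes = column.round().astype(int)
                inputs[name] = np.array([THAL_CATEGORIES[c] for c in codes])
            elif name in INT_FEATURES:
                inputs[name] = column.round().astype("int64")
            else:
                inputs[name] = column
        return np.asarray(predict_fn(inputs)).reshape(len(X), -1)
    return predict_from_array

def fake_predict(inputs):
    """Stand-in for model.predict: one score per row, based only on `thal`."""
    return (inputs["thal"] != "normal").astype(float).reshape(-1, 1)

f = make_array_predict_fn(fake_predict)
X_encoded = np.array([[63, 2.3, 1], [67, 1.5, 0]])  # thal: 1 = fixed, 0 = normal
scores = f(X_encoded)

# Assumed usage with the real model and encoded data (not executed here):
# explainer = shap.KernelExplainer(make_array_predict_fn(model.predict), X_encoded)
```

With the real model, `FEATURE_NAMES` and `INT_FEATURES` would have to list all thirteen inputs and their expected dtypes, and the background data would need `thal` encoded with the same mapping.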

In [None]:
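# `X_test` is not defined in this notebook; it is assumed to be a held-out
# DataFrame with the same feature columns as `X_train`.
shap_values = explainer.shap_values(X_test.iloc[0,:])
shap.force_plot(explainer.expected_value[0], shap_values[0], X_test.iloc[0,:])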