# Classification with SHAP

- This tutorial demonstrates how to perform binary classification on structured data with Keras, starting from a raw CSV file.


- This example is a more advanced version of [](structured_data_classification_intro.ipynb): it uses more functions and less code.

## Setup

In [1]:
import numpy as np
import pandas as pd

import tensorflow as tf
from tensorflow.keras import layers

tf.__version__

## Data

- We use the features below to predict whether a patient has heart disease (`Target`).

Column | Description | Feature Type
------------|--------------------|----------------------
Age | Age in years | Numerical
Sex | (1 = male; 0 = female) | Categorical
CP | Chest pain type (0, 1, 2, 3, 4) | Categorical
Trestbps | Resting blood pressure (in mm Hg on admission) | Numerical
Chol | Serum cholesterol in mg/dl | Numerical
FBS | Fasting blood sugar > 120 mg/dl (1 = true; 0 = false) | Categorical
RestECG | Resting electrocardiogram results (0, 1, 2) | Categorical
Thalach | Maximum heart rate achieved | Numerical
Exang | Exercise induced angina (1 = yes; 0 = no) | Categorical
Oldpeak | ST depression induced by exercise relative to rest | Numerical
Slope | Slope of the peak exercise ST segment | Numerical
CA | Number of major vessels (0-3) colored by fluoroscopy | Both numerical & categorical
Thal | normal; fixed defect; reversible defect | Categorical (string)
Target | Diagnosis of heart disease (1 = true; 0 = false) | Target
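
As a rough sketch, the feature types in the table above can be written down as plain Python lists, which is convenient when different preprocessing is applied per type. The constant names and the `feature_type` helper are hypothetical and not part of the original notebook; the lowercase keys follow the sample dictionary used for prediction in this tutorial, and placing `ca` with the integer-coded categories is an assumption.

```python
# Hypothetical grouping of the feature columns by type (names are illustrative).
# Keys are lowercase to match the sample dictionary used for prediction.
NUMERICAL_FEATURES = ["age", "trestbps", "chol", "thalach", "oldpeak", "slope"]

# Assumption: "ca" is treated as an integer-coded category, although the
# table lists it as both numerical and categorical.
INTEGER_CATEGORICAL_FEATURES = ["sex", "cp", "fbs", "restecg", "exang", "ca"]

STRING_CATEGORICAL_FEATURES = ["thal"]
TARGET_COLUMN = "target"

ALL_FEATURES = (
    NUMERICAL_FEATURES + INTEGER_CATEGORICAL_FEATURES + STRING_CATEGORICAL_FEATURES
)


def feature_type(name):
    """Return a short description of the type of a feature column."""
    if name in NUMERICAL_FEATURES:
        return "numerical"
    if name in INTEGER_CATEGORICAL_FEATURES:
        return "integer categorical"
    if name in STRING_CATEGORICAL_FEATURES:
        return "string categorical"
    raise KeyError(f"unknown feature: {name}")


print(len(ALL_FEATURES))    # 13
print(feature_type("thal"))  # string categorical
```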

## Model

### Model import

- Load the model (see [](structured_data_classification_layers.ipynb)): 

In [37]:
model = tf.keras.models.load_model('my_hd_classifier')

- To get a prediction for a new sample, you can simply call the Keras `Model.predict` method.

- There are just two things you need to do:

  - Wrap each scalar in a list so that it has a batch dimension (models only process batches of data, not single samples).

  - Call `tf.convert_to_tensor` on each feature.

In [38]:
sample = {
    "age": 60,
    "sex": 1,
    "cp": 1,
    "trestbps": 145,
    "chol": 233,
    "fbs": 1,
    "restecg": 2,
    "thalach": 150,
    "exang": 0,
    "oldpeak": 2.3,
    "slope": 3,
    "ca": 0,
    "thal": "fixed",
}

In [39]:
input_dict = {name: tf.convert_to_tensor([value]) for name, value in sample.items()}
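
The same batching idea extends to several patients at once: instead of wrapping each scalar in a one-element list, collect the values of each feature across all samples. The sketch below is a minimal pure-Python illustration; `batch_samples` is a hypothetical helper, and the two shortened patient records are made-up values used only to show the shape of the result.

```python
def batch_samples(samples):
    """Turn a list of per-patient dicts into one dict of per-feature lists."""
    if not samples:
        raise ValueError("need at least one sample")
    names = list(samples[0].keys())
    for sample in samples:
        # Every sample must provide exactly the same features.
        if set(sample) != set(names):
            raise ValueError("all samples must have the same feature names")
    return {name: [sample[name] for sample in samples] for name in names}


# Illustrative records with only three features, to keep the example short.
patients = [
    {"age": 60, "chol": 233, "thal": "fixed"},
    {"age": 45, "chol": 180, "thal": "normal"},
]

batched = batch_samples(patients)
print(batched["age"])   # [60, 45]
print(batched["thal"])  # ['fixed', 'normal']
```

Each list in `batched` has one entry per patient, so its length plays the role of the batch dimension; the lists can then be passed through `tf.convert_to_tensor` in the same way as the single-sample case.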

In [40]:
predictions = model.predict(input_dict)

In [41]:
print(
    "This particular patient had a %.1f percent probability "
    "of having heart disease, as evaluated by our model." % (100 * predictions[0][0],)
)

This particular patient had a 43.9 percent probability of having heart disease, as evaluated by our model.
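
To turn the predicted probability into a class label, one common approach is to compare it with a decision threshold. The sketch below is only an illustration: `to_label` is a hypothetical helper, and the default threshold of 0.5 is an assumption rather than something the model prescribes.

```python
def to_label(probability, threshold=0.5):
    """Map a predicted probability to a binary label (1 = disease, 0 = none)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return int(probability >= threshold)


# The 43.9 percent prediction printed above, expressed as a fraction.
print(to_label(0.439))                 # 0
print(to_label(0.439, threshold=0.4))  # 1
```

Lowering the threshold flags more patients as positive, which trades additional false alarms for fewer missed cases.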


## Next steps

The tutorial [](keras-imdb.ipynb) covers how to build a binary sentiment classification model with Keras, using Keras layers for data preprocessing and TensorBoard to view model results.

To learn more about classifying structured data, try working with other datasets. Below are some suggestions for datasets:

- [TensorFlow Datasets: MovieLens](https://www.tensorflow.org/datasets/catalog/movie_lens): A set of movie ratings from a movie recommendation service.

- [TensorFlow Datasets: Wine Quality](https://www.tensorflow.org/datasets/catalog/wine_quality): Two datasets related to red and white variants of the Portuguese "Vinho Verde" wine. You can also find the Red Wine Quality dataset on Kaggle.

- [Kaggle: arXiv Dataset](https://www.kaggle.com/Cornell-University/arxiv): A corpus of 1.7 million scholarly articles from arXiv, covering physics, computer science, math, statistics, electrical engineering, quantitative biology, and economics.