<a href="https://colab.research.google.com/github/google/applied-machine-learning-intensive/blob/master/content/04_classification/03_classification_with_tensorflow/colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#### Copyright 2020 Google LLC.

In [0]:
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Classification with TensorFlow

By now you should be familiar with classification in scikit-learn. In this Colab we will explore another commonly used tool for classification and machine learning: TensorFlow.

## The Dataset

The dataset that we'll be using is the [UCI Heart Disease dataset](http://archive.ics.uci.edu/ml/datasets/Heart+Disease). The dataset contains health information about patients, as well as a "presence of heart disease" indicator.

The [original dataset](http://archive.ics.uci.edu/ml/datasets/Heart+Disease) contains over 70 different attributes and five heart disease classifications. For this lab we'll use a [simplified version of the dataset](https://www.kaggle.com/ronitf/heart-disease-uci) hosted on Kaggle.

This simplified version of the dataset contains 13 attributes and a yes/no indicator for the presence or absence of heart disease.

The columns are:

Feature | Description
--------|--------------
age     | age in years
sex     | sex<br>0 = female<br>1 = male
cp      | chest pain type<br>1 = typical angina<br>2 = atypical angina<br>3 = non-anginal pain<br>4 = asymptomatic
trestbps  | resting blood pressure in mm Hg
chol      | serum cholesterol in mg/dl
fbs       | is fasting blood sugar > 120 mg/dl<br>0 = false<br>1 = true
restecg   | results of a resting electrocardiograph<br>0 = normal<br>1 = ST-T wave abnormality<br>2 = left ventricular hypertrophy
thalach   | max heart rate
exang     | exercise induced angina<br>0 = no<br>1 = yes
oldpeak   | measurement of an abnormal ST depression
slope     | slope of peak of exercise ST segment<br>1 = upslope<br>2 = flat<br>3 = downslope
ca        | count of major blood vessels colored by fluoroscopy<br>0, 1, 2, 3, or 4
thal      | presence of heart condition<br>0 = unknown<br>1 = normal<br>2 = fixed defect<br>3 = reversible defect

The heart disease indicator is a 0 for no disease and a 1 for heart disease.

Let's assume we have been given this dataset by the Cleveland Clinic and have been asked to build a model that can predict if their patients have heart disease or not. The purpose of the model is to assist doctors in making diagnostic decisions faster.

### Exercise 1: Ethical Considerations

Before we dive in, let's take a moment to think about the dataset and the larger problem that we are trying to solve. We have 13 data attributes related to an individual's health, as well as an indicator of whether the patient has heart disease.

#### Question 1

Are there any attributes in the data that we should pay special attention to? Imagine a case where the data is unbalanced in some way. How might that affect the model and the doctor/patient experience?

##### **Student Solution**

> *Your answer goes here*

---

##### Answer Key

This is just an example of an acceptable answer. There are many potential solutions.

> *`sex` is a very important column to have balanced data for. Male and female health can be very different. If the model is trained on a dataset that is predominantly one sex or the other, then there is a good chance the model's predictions will be less accurate for the underrepresented sex.*
>
> Also, *`cp` (chest pain type) and other potentially subjective or self-reported columns might be difficult to trust.*

---

#### Question 2

Assuming we can get a reasonably well-performing model deployed, is there potential for problems with how the predictions from this model are interpreted and used?

##### **Student Solution**

> *Your answer goes here*

---

##### Answer Key

This is just an example of an acceptable answer. There are many potential solutions.

> *If the model is seen performing well, then doctors might become too reliant on the model. If the model is fully trusted, doctors might do less thorough screening and miss some diagnoses they would have previously caught.*

---

### Exploratory Data Analysis

Let's download the data and take a look at what we are working with.

Upload your `kaggle.json` file and run the code below.

In [0]:
! chmod 600 kaggle.json && (ls ~/.kaggle 2>/dev/null || mkdir ~/.kaggle) && mv kaggle.json ~/.kaggle/ && echo 'Done'

And then download the dataset.

In [0]:
!kaggle datasets download ronitf/heart-disease-uci
!ls

And load the data into a `DataFrame` and take a peek.

In [0]:
import pandas as pd

df = pd.read_csv('heart-disease-uci.zip')
df.sample(5)

We can see that all of the data is numeric, but varies a bit in scale.

Let's describe the data:

In [0]:
df.describe()

There is no missing data: every column has a count of 303. That is only 303 rows, though, so we aren't working with a huge dataset.

Now we'll dig deeper into the data in a few of the columns. If you were working with a dataset for a real-world model you would want to explore each column.

We'll start by mapping out the correlations in the data.

In [0]:
import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(10,10))
_ = sns.heatmap(df.corr(), cmap='coolwarm', annot=True)

There are no obviously strong correlations.
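
To make the numbers in the heatmap less mysterious, here is a minimal sketch of the Pearson correlation coefficient, which is what `DataFrame.corr()` computes by default. The `pearson` helper and the sample values are ours, made up for illustration, and are not taken from the heart disease dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the mean (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values: heart rate falls by a fixed amount as age rises.
ages = [40, 50, 60, 70]
max_heart_rates = [180, 170, 160, 150]

r = pearson(ages, max_heart_rates)
print(r)
```

A value near `1` or `-1` means two columns move together almost perfectly, while a value near `0` means there is little linear relationship.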

Let's now see how balanced our data is by sex:

In [0]:
df['sex'].hist()

In this data female maps to 0 and male maps to 1, so there are over twice as many men in the dataset.

Let's also check out the target.

In [0]:
df['target'].hist()

In this case the dataset looks more balanced.

And finally we'll look at age.

In [0]:
df['age'].hist()

The dataset seems to be pretty heavily skewed toward individuals in their 50s and 60s.

There isn't a lot of actionable information from our analysis. We might want to stratify our data by sex when we train and test our model, but there are no data repairs that seem to need to be done.

If you were building this model for a real-world application, you would also want to ensure that the values in the numeric columns are realistic.
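
One way such a realism check might look is sketched below. The bounds in `PLAUSIBLE_RANGES` are hypothetical placeholders that we picked for illustration, not clinical guidance, and the `out_of_range` helper is our own name. In practice you would agree on the bounds with a domain expert.

```python
# Hypothetical plausibility bounds, for illustration only.
PLAUSIBLE_RANGES = {
    "age": (0, 120),
    "trestbps": (50, 250),
    "chol": (50, 700),
    "thalach": (40, 250),
}


def out_of_range(rows, ranges):
    """Return (row_index, column, value) for every value outside its range."""
    problems = []
    for i, row in enumerate(rows):
        for column, (low, high) in ranges.items():
            if not low <= row[column] <= high:
                problems.append((i, column, row[column]))
    return problems


# Made-up rows: the second one contains two obviously bad values.
rows = [
    {"age": 54, "trestbps": 130, "chol": 240, "thalach": 150},
    {"age": 430, "trestbps": 120, "chol": 5, "thalach": 160},
]

problems = out_of_range(rows, PLAUSIBLE_RANGES)
print(problems)
```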

## The Model

Let's build and train our model. We'll build a deep neural network that takes our input features and returns a `0` if it predicts that the patient doesn't have heart disease and a `1` if it predicts that the patient does have heart disease.

First let's create a list of features to make coding easier.

In [0]:
FEATURES = df.columns.values[:-1]
TARGET = df.columns.values[-1]

FEATURES, TARGET

We'll also want to normalize our feature data before feeding it into the model.

In [0]:
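# Min-max scale every feature to the range [0, 1]. Assigning with df[FEATURES]
# replaces the integer columns with float columns, which avoids the dtype
# warnings that newer pandas versions may raise for in-place assignment.
df[FEATURES] = ((df[FEATURES] - df[FEATURES].min()) / (df[FEATURES].max() - df[FEATURES].min()))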

df.describe()
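
The normalization above is min-max scaling. As a small worked example, here is the same formula applied to a single list of made-up cholesterol values. The `min_max_scale` helper is ours and is only a sketch.

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]


# Hypothetical cholesterol readings in mg/dl.
chol = [200, 250, 300, 400]

scaled = min_max_scale(chol)
print(scaled)
```

Note that the formula divides by `max - min`, so a column where every value is identical would need special handling.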

We can also now split off a validation set from our data. Since we have so many more men than women in this dataset, we will stratify on sex.

In [0]:
from sklearn.model_selection import train_test_split

X_train, X_validate, y_train, y_validate = train_test_split(
    df[FEATURES], df[TARGET], test_size=0.2, stratify=df['sex'])

X_train.shape, X_validate.shape
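
To see what stratifying buys us, here is a simplified sketch of a stratified split written in plain Python. It is not how scikit-learn implements `stratify`. The `stratified_split` helper and the 70/30 sample are assumptions made for illustration. The idea is to split each group separately so that both sides keep the same mix.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size, seed=0):
    """Split row indices so each label keeps its share in both parts."""
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for index, label in enumerate(labels):
        by_group[label].append(index)

    train, test = [], []
    for indices in by_group.values():
        rng.shuffle(indices)
        # Take the same fraction from every group.
        n_test = round(len(indices) * test_size)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return train, test


# Hypothetical sample: 70 men (1) and 30 women (0).
sex = [1] * 70 + [0] * 30

train_idx, test_idx = stratified_split(sex, test_size=0.2)
test_share_male = sum(sex[i] for i in test_idx) / len(test_idx)
print(len(train_idx), len(test_idx), test_share_male)
```

The held-out part keeps the same 70/30 mix as the full sample, which a purely random split does not guarantee.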

We'll use the TensorFlow Keras [`Sequential`](https://www.tensorflow.org/api_docs/python/tf/keras/Sequential) model. The input size needs to be equal to the number of input features that we have. The output size needs to be 1 since we are predicting a yes/no value. The number and width of layers in between are an area for experimentation, as are the activation functions.

We start with an initial hidden layer 64 nodes wide and funnel down to 32, 16, and finally, to the output layer of 1 node.

In [0]:
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation=tf.nn.relu, 
                          input_shape=(FEATURES.size,)),
    tf.keras.layers.Dense(32, activation=tf.nn.relu),
    tf.keras.layers.Dense(16, activation=tf.nn.relu),
    tf.keras.layers.Dense(1, activation=tf.nn.sigmoid)
])

model.summary()
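
If you want to check the parameter counts in the summary by hand, the sketch below does the arithmetic. It assumes 13 input features, as described for this dataset. Each dense layer has one weight per input/output pair plus one bias per output node. The `dense_param_counts` helper is ours.

```python
def dense_param_counts(layer_sizes):
    """Parameters per dense layer: inputs * outputs weights plus outputs biases."""
    return [
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    ]


# 13 input features, then the 64, 32, 16, and 1 node layers from the model.
layer_sizes = [13, 64, 32, 16, 1]

per_layer = dense_param_counts(layer_sizes)
total = sum(per_layer)
print(per_layer, total)
```

With only 303 rows of data, a few thousand trainable parameters is a lot, which is worth keeping in mind when we look at overfitting.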

We can now compile the model. We use 'binary_crossentropy' loss since this is a binary classification model.

In [0]:
model.compile(
    loss='binary_crossentropy',
    optimizer='Adam',
    metrics=['accuracy']
)
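
For intuition about what the loss measures, here is a minimal sketch of binary cross-entropy computed by hand. The `binary_crossentropy` helper, the clipping value `eps`, and the sample predictions are our own choices for illustration and are not the Keras implementation.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -(y * log(p) + (1 - y) * log(1 - p)) over all examples."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        # Clip so we never take log(0). The eps value is an assumption.
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# Hypothetical predictions for one positive and one negative patient.
confident_right = binary_crossentropy([1, 0], [0.9, 0.1])
confident_wrong = binary_crossentropy([1, 0], [0.1, 0.9])
print(confident_right, confident_wrong)
```

The loss is small when the model is confident and correct, and it grows quickly when the model is confident and wrong.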

And finally, we can actually fit the model. We'll start with a run of 500 training epochs. Once we are done, we'll print out the final accuracy the model achieved.

In [0]:
history = model.fit(X_train, y_train, epochs=500, verbose=0)

history.history['accuracy'][-1]

When we ran this, the model reached perfect accuracy on the training data. Let's see how the accuracy improves and the loss is reduced over epochs.

In [0]:
import matplotlib.pyplot as plt

plt.figure(figsize=(16,5))

plt.subplot(1,2,1)
plt.plot(history.history['accuracy'])
plt.title('Training Accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train_accuracy'], loc='best')

plt.subplot(1,2,2)
plt.plot(history.history['loss'])
plt.title('Training Loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train_loss'], loc='best')

We seem to have kept training this model far too long. The accuracy reaches perfection, and the loss moves to 0.0 after a few hundred epochs.

Let's see if we overfit by using our validation holdout data. In order to do that, we need to convert our predictions back into a binary representation.

In [0]:
predictions = model.predict(X_validate)

predictions[:10]

As you can see, our predictions are continuous numbers, not the 1 or 0 values that we expected. Each value is the model's confidence that the target is 1. Let's look at them in a histogram.

In [0]:
import matplotlib.pyplot as plt

_ = plt.hist(predictions)

Here we can see that the model is highly confident of a yes or a no in many cases, but there are some cases where the model was unsure.

How do we convert these confidences into a yes/no decision?

One way is to simply round:

In [0]:
predictions = [round(x[0]) for x in predictions]

_ = plt.hist(predictions)

This puts the cut-off threshold for a yes/no decision at `0.5`. We'll think about the implications of this choice in the next exercise.

Also note that the choice of a sigmoid activation function was not a coincidence. We wanted to use an activation function that would keep the output values between 0.0 and 1.0 for rounding purposes.
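
As a quick illustration of why sigmoid fits this purpose, the sketch below evaluates the sigmoid function at a few hand-picked inputs. The `sigmoid` helper and the inputs are ours.

```python
import math


def sigmoid(z):
    """Logistic sigmoid: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hand-picked inputs ranging from strongly negative to strongly positive.
inputs = (-6.0, -1.0, 0.0, 1.0, 6.0)

outputs = [sigmoid(z) for z in inputs]
for z, out in zip(inputs, outputs):
    print(z, round(out, 4))
```

An input of `0.0` maps to exactly `0.5`, so rounding the output is the same as asking whether the value fed into the final sigmoid was positive or negative.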

Now let's check our accuracy.

In [0]:
from sklearn.metrics import accuracy_score

accuracy_score(y_validate, predictions)

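When we ran this model, our validation accuracy was in the low 80% range, which is not great. It is also well below the perfect accuracy we saw on the training data, which is a sign that the model overfit. Your score is likely similar.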

### Exercise 2: Adjusting the Threshold

#### Question 1

We decided to round for our classification, which puts the threshold for the decision at `0.5`. This decision was made somewhat arbitrarily. Let's think about our problem space a bit more. We are making a model that predicts if an individual has heart disease. Would it be better if we set the threshold for predicting heart disease higher or lower than `0.5`? Or is `0.5` okay? Explain your reasoning.

##### **Student Solution**

> *Your solution goes here*

---

##### Answer Key

There are many possible answers to this question. We'll make the case for higher and lower.

**Higher**

> *The threshold should be higher. If the model is only 50% confident that a person has heart disease and we flag them as having heart disease, the medical system will get overloaded with extra tests that have to be run.*

**Lower**

> *The threshold should be lower. Heart disease is serious, and lives are on the line. If there is any hint that a person has heart disease, they should be classified as having a disease.*

---

#### Question 2

Write code to make yes/no predictions using a higher or lower threshold based on the argument you made in the first question of this exercise. If you chose to keep the threshold at `0.5`, then just pick higher or lower and write the code for that. Print out the accuracy for the new threshold.

##### **Student Solution**

In [0]:
# Your Code Goes Here

---

##### Answer Key

In [0]:
import pandas as pd
import tensorflow as tf

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score


df = pd.read_csv('heart-disease-uci.zip')

FEATURES = df.columns.values[:-1]
TARGET = df.columns.values[-1]

# Assign with df[FEATURES] so the integer columns are replaced by float columns.
df[FEATURES] = ((df[FEATURES] - df[FEATURES].min()) /
                (df[FEATURES].max() - df[FEATURES].min()))

X_train, X_validate, y_train, y_validate = train_test_split(
    df[FEATURES], df[TARGET], test_size=0.2, stratify=df['sex'])
    
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation=tf.nn.relu, 
                          input_shape=(FEATURES.size,)),
    tf.keras.layers.Dense(32, activation=tf.nn.relu),
    tf.keras.layers.Dense(16, activation=tf.nn.relu),
    tf.keras.layers.Dense(1, activation=tf.nn.sigmoid)
])

model.compile(
    loss='binary_crossentropy',
    optimizer='Adam',
    metrics=['accuracy']
)

_ = model.fit(X_train, y_train, epochs=500, verbose=0)

predictions = model.predict(X_validate)

# Solution starts here
# Use a lower threshold of 0.2 so that weaker signals are still flagged as
# heart disease.
predictions = [1.0 if x[0] > 0.2 else 0.0 for x in predictions]

accuracy_score(y_validate, predictions)
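
Accuracy alone hides the trade-off that the first question asks about. The sketch below uses made-up labels and confidences, not real model output, to count missed cases (false negatives) and false alarms (false positives) at a few thresholds. The `confusion_counts` helper is ours.

```python
def confusion_counts(y_true, confidences, threshold):
    """Return (true_pos, false_pos, true_neg, false_neg) at a threshold."""
    tp = fp = tn = fn = 0
    for y, confidence in zip(y_true, confidences):
        predicted = 1 if confidence > threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


# Hypothetical labels and model confidences for six patients.
y_true = [1, 1, 1, 0, 0, 0]
confidences = [0.9, 0.6, 0.3, 0.7, 0.4, 0.1]

for threshold in (0.2, 0.5, 0.8):
    tp, fp, tn, fn = confusion_counts(y_true, confidences, threshold)
    print(f"threshold={threshold}: missed={fn}, false alarms={fp}")
```

In this toy data a lower threshold misses fewer sick patients but raises more false alarms, and a higher threshold does the opposite.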

---

### Exercise 3: Early Stopping

Five hundred epochs turned out to be a bit too many. Use the [`EarlyStopping`](https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/EarlyStopping) class to stop when the loss doesn't improve over the course of five epochs. Print your accuracy score so you can see if it stayed reasonably close to your earlier model. Be sure to also make model fitting verbosity 1 or 2 so you can see at which epoch your model stopped.

##### **Student Solution**

In [0]:
# Your code goes here

---

##### Answer Key

In [0]:
import pandas as pd
import tensorflow as tf

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score


df = pd.read_csv('heart-disease-uci.zip')

FEATURES = df.columns.values[:-1]
TARGET = df.columns.values[-1]

# Assign with df[FEATURES] so the integer columns are replaced by float columns.
df[FEATURES] = ((df[FEATURES] - df[FEATURES].min()) /
                (df[FEATURES].max() - df[FEATURES].min()))

X_train, X_validate, y_train, y_validate = train_test_split(
    df[FEATURES], df[TARGET], test_size=0.2, stratify=df['sex'])
    
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation=tf.nn.relu, 
                          input_shape=(FEATURES.size,)),
    tf.keras.layers.Dense(32, activation=tf.nn.relu),
    tf.keras.layers.Dense(16, activation=tf.nn.relu),
    tf.keras.layers.Dense(1, activation=tf.nn.sigmoid)
])

model.compile(
    loss='binary_crossentropy',
    optimizer='Adam',
    metrics=['accuracy']
)

# Solution starts here

callback = tf.keras.callbacks.EarlyStopping(
    monitor='loss', patience=5
)
_ = model.fit(X_train, y_train, epochs=500, callbacks=[callback])

predictions = model.predict(X_validate)

predictions = [round(x[0]) for x in predictions]

accuracy_score(y_validate, predictions)
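
To build intuition for the `patience` argument, here is a simplified simulation of the idea in plain Python. It is a sketch of the concept, not the Keras implementation. The `early_stopping_epoch` helper and the loss values are made up for illustration.

```python
def early_stopping_epoch(losses, patience):
    """Return the 1-based epoch where training would stop, or None."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:
            # New best loss, so reset the counter.
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# Hypothetical per-epoch losses: the best value is reached at epoch 4.
losses = [0.70, 0.55, 0.48, 0.45, 0.46, 0.47, 0.45, 0.46, 0.48, 0.50]

stopped_at = early_stopping_epoch(losses, patience=5)
print(stopped_at)
```

Here the loss last improves at epoch 4, so with a patience of five the run stops at epoch 9, after five epochs in a row without a new best.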

---