<a href="https://colab.research.google.com/github/google/applied-machine-learning-intensive/blob/master/content/05_deep_learning/05_image_classification_project/colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#### Copyright 2020 Google LLC.

In [0]:
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Image Classification Project

In this project we will build an image classification model and use the model to identify if the lungs pictured indicate that the patient has pneumonia. The outcome of the model will be true or false for each image.

The [data is hosted on Kaggle](https://www.kaggle.com/rob717/pneumonia-dataset) and consists of 5,863 x-ray images. Each image is classified as 'pneumonia' or 'normal'.

## Ethical Considerations

We will frame the problem as:

> *A hospital is having issues correctly diagnosing patients with pneumonia. Their current solution is to have two trained technicians examine every patient scan. Unfortunately, there are many times when two technicians are not available and the scans have to wait for multiple days to be interpreted.*
>
> *They hope to fix this issue by creating a model that can identify if a patient has pneumonia. They will have one technician and the model both examine the scans and make a prediction. If the two agree, then the diagnosis is accepted. If the two disagree, then a second technician is brought in to provide their analysis and break the tie.*

Discuss some of the ethical considerations of building and using this model. 

* Consider potential bias in the data that we have been provided. 
* Should this model err toward precision or recall?
* What are the implications of massively over-classifying patients as having pneumonia?
* What are the implications of massively under-classifying patients as having pneumonia?
* Are there any concerns with having only one technician make the initial call?

The questions above are prompts. Feel free to bring in other considerations you might have.

### **Student Solution**

> *Your answer goes here*

---

### Answer Key

As with other ethics questions, there are many right answers. We are looking for solid consideration of trade-offs.

Some example solutions for the prompt questions:

**Consider potential bias in the data we've been provided.**

> *This is data from one hospital. What if other hospitals choose to use the model? Would it generalize to non-local patients? Especially if the x-ray equipment is different across hospitals, this could present a problem.*

**Should this model err toward precision or recall?**

> *This is a tough call. It would be good to know if the hospital's current diagnoses tend to over or under-predict pneumonia. If we train the model with the same bias as the experts, then we'll simply carry forward that bias. If we train the model in the other direction, then we'll likely over-rely on a second manual reviewer.*
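
One way to reason about this trade-off is to compare precision and recall for two candidate models. The sketch below is illustrative only: the helper `precision_recall` and all of the counts are made up for this example and do not come from the dataset.

```python
def precision_recall(true_pos, false_pos, false_neg):
  """Return (precision, recall) from confusion-matrix counts."""
  predicted_pos = true_pos + false_pos
  actual_pos = true_pos + false_neg
  precision = true_pos / predicted_pos if predicted_pos else 0.0
  recall = true_pos / actual_pos if actual_pos else 0.0
  return precision, recall

# Hypothetical results on 100 scans with pneumonia and 100 without.
# The "cautious" model rarely flags a healthy patient but misses more cases.
cautious = precision_recall(true_pos=70, false_pos=5, false_neg=30)
# The "eager" model catches most cases but flags many healthy patients.
eager = precision_recall(true_pos=95, false_pos=40, false_neg=5)

print('cautious: precision=%.2f recall=%.2f' % cautious)
print('eager:    precision=%.2f recall=%.2f' % eager)
```

In the hospital's workflow, every disagreement between the model and the first technician triggers a second review, so either kind of error has a staffing cost in addition to its clinical cost.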


**What are the implications of massively over-classifying patients as having pneumonia?**

> *More false positives, which might lead to unnecessary second reviews or inappropriate treatment.*

**What are the implications of massively under-classifying patients as having pneumonia?**

> *More false negatives, which might lead to additional reviews, patients not receiving appropriate care, and a lack of trust in the model.*

**Are there any concerns with having only one technician make the initial call?**

> *Yes, this puts a lot of faith in the quality of the model and a lot of responsibility in the hands of one technician. It would be best to continue with the two reviews until the model has been proven.*


---

## Modeling

In this section of the lab, you will build, train, test, and validate a model or models. The data is the ["Detecting Pneumonia" dataset](https://www.kaggle.com/rob717/pneumonia-dataset). You will build a binary classifier that determines whether an x-ray image shows signs of pneumonia.

You'll need to:

* Download the dataset
* Perform EDA on the dataset
* Build a model that can classify the data
* Train the model using the training portion of the dataset. (It is already split out.)
* Test at least three different models or model configurations using the testing portion of the dataset. This step can include changing model types, adding and removing layers or nodes from a neural network, or any other parameter tuning that you find potentially useful. Score the model (using accuracy, precision, recall, F1, or some other relevant score(s)) for each configuration.
* After finding the "best" model and parameters, use the validation portion of the dataset to perform one final sanity check by scoring the model once more with the hold-out data.
* If you train a neural network (or another model that reports per-epoch performance), graph that performance over each epoch.

Explain your work!

> *Note: You'll likely want to [enable GPU in this lab](https://colab.research.google.com/notebooks/gpu.ipynb) if it is not already enabled.*

If you get to a working solution you're happy with and want another challenge, you'll find pre-trained models on the [landing page of the dataset](https://www.kaggle.com/paultimothymooney/detecting-pneumonia-in-x-ray-images). Try to load one of those and see how it compares to your best model.

Use as many text and code cells as you need to for your solution.

### **Student Solution**

In [0]:
# Your code goes here

---

### Answer Key

After uploading our `kaggle.json` file to Colab, we move it into place.

In [0]:
! chmod 600 kaggle.json && mkdir -p ~/.kaggle && mv kaggle.json ~/.kaggle/ && echo 'Done'
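
For readers who prefer to stay in Python, the same steps can be sketched with the standard library. This is a minimal sketch, assuming a POSIX file system; `install_credentials` is a hypothetical helper, and the demonstration runs in a throwaway directory rather than touching the real `~/.kaggle`.

```python
import os
import shutil
import stat
import tempfile

def install_credentials(source, config_dir):
  """Move a credentials file into config_dir and make it owner-only."""
  os.makedirs(config_dir, exist_ok=True)
  destination = os.path.join(config_dir, os.path.basename(source))
  shutil.move(source, destination)
  # Owner read/write only, the equivalent of `chmod 600`.
  os.chmod(destination, stat.S_IRUSR | stat.S_IWUSR)
  return destination

# Demonstrate with a placeholder token in a temporary workspace.
workspace = tempfile.mkdtemp()
token = os.path.join(workspace, 'kaggle.json')
with open(token, 'w') as token_file:
  token_file.write('{}')

installed = install_credentials(token, os.path.join(workspace, '.kaggle'))
mode = stat.S_IMODE(os.stat(installed).st_mode)
print(os.path.basename(installed), oct(mode))

shutil.rmtree(workspace)
```

To use it for real, you would pass `'kaggle.json'` and `os.path.expanduser('~/.kaggle')` instead of the temporary paths.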

Then we download the data.

In [0]:
! kaggle datasets download paultimothymooney/chest-xray-pneumonia
! ls

And extract it.

In [0]:
import os
import zipfile

# Use a context manager so the archive is closed after extraction.
with zipfile.ZipFile('chest-xray-pneumonia.zip') as archive:
  archive.extractall()
os.listdir('./')

We search all of the directories to see what files we are dealing with. Every file turns out to be a `jpeg`. Note, however, that each file appears three times in the version of the data that we tested with.

In [0]:
import collections

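for directory, _, files in os.walk('chest_xray'):
  # Count files by extension. splitext copes with names that have no
  # extension, where rindex('.') would raise a ValueError.
  extensions = collections.defaultdict(int)
  for file_name in files:
    extensions[os.path.splitext(file_name)[1].lstrip('.')] += 1
  print(directory, dict(extensions))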
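
Extension counts do not show how the images are split between classes, which matters when choosing a metric. The sketch below counts files per class folder. It assumes the layout `<split>/<CLASS>/<image>`; `count_images_per_class` is a hypothetical helper, and the demonstration builds a tiny stand-in tree with made-up file names instead of reading the real dataset.

```python
import collections
import os
import tempfile

def count_images_per_class(split_dir):
  """Count the files in each immediate subdirectory of split_dir."""
  counts = collections.Counter()
  for class_name in sorted(os.listdir(split_dir)):
    class_dir = os.path.join(split_dir, class_name)
    if os.path.isdir(class_dir):
      counts[class_name] = sum(
          1 for name in os.listdir(class_dir)
          if os.path.isfile(os.path.join(class_dir, name)))
  return counts

# Build a small stand-in directory tree: 2 NORMAL and 5 PNEUMONIA files.
demo_root = tempfile.mkdtemp()
for class_name, file_count in (('NORMAL', 2), ('PNEUMONIA', 5)):
  os.makedirs(os.path.join(demo_root, class_name))
  for index in range(file_count):
    path = os.path.join(demo_root, class_name, 'img%d.jpeg' % index)
    open(path, 'w').close()

demo_counts = count_images_per_class(demo_root)
print(dict(demo_counts))
```

On the real data you would call `count_images_per_class('chest_xray/train')`. A large imbalance between the classes is one reason to prefer F1, precision, or recall over plain accuracy.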

Set up some useful variables.

In [0]:
training_dir = 'chest_xray/train'
testing_dir = 'chest_xray/test'
validation_dir = 'chest_xray/val'

Let's take a look at one image from the training set.

In [0]:
import cv2 as cv
import matplotlib.pyplot as plt

image_file = os.path.join(
    training_dir, 
    'PNEUMONIA', 
    os.listdir(os.path.join(training_dir, 'PNEUMONIA'))[0])

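# X-rays are single-channel, so read as grayscale and plot with a gray colormap.
image = cv.imread(image_file, cv.IMREAD_GRAYSCALE)

plt.imshow(image, cmap='gray')
plt.show()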

And see the range of values.

In [0]:
image.min(), image.max()

Build and score our first model.

In [0]:
from sklearn.metrics import f1_score
from tensorflow.keras.preprocessing.image import ImageDataGenerator

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, padding='same', activation='relu', 
                           input_shape=(256, 256, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# summary() prints the table itself and returns None.
model.summary()

train_data_gen = ImageDataGenerator(
    rescale=1/255.0,
).flow_from_directory(
    color_mode='grayscale',
    classes=['NORMAL', 'PNEUMONIA'],
    class_mode='binary',
    directory=training_dir,
)

history = model.fit(
    train_data_gen,
    epochs=200,
    steps_per_epoch=200,
    callbacks=[tf.keras.callbacks.EarlyStopping(monitor='loss')]
)

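# shuffle=False keeps predictions in the same order as `.classes` below.
# The iterator shuffles by default, which would misalign labels and predictions.
testing_image_iterator = tf.keras.preprocessing.image.DirectoryIterator(
    directory=testing_dir,
    color_mode='grayscale',
    classes=['NORMAL', 'PNEUMONIA'],
    class_mode='binary',
    shuffle=False,
    image_data_generator=ImageDataGenerator(rescale=1/255.0))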

predictions = model.predict(testing_image_iterator)
predicted_classes = [int(prediction > 0.5) for prediction in predictions.flatten()]

actual_classes = testing_image_iterator.classes

f1_score(actual_classes, predicted_classes)

Plot the training accuracy.

In [0]:
import matplotlib.pyplot as plt

plt.plot(list(range(len(history.history['accuracy']))),
         history.history['accuracy'])
plt.show()

And loss.

In [0]:
import matplotlib.pyplot as plt

plt.plot(list(range(len(history.history['loss']))),
         history.history['loss'])
plt.show()

Add a few more layers and score again.

In [0]:
from sklearn.metrics import f1_score
from tensorflow.keras.preprocessing.image import ImageDataGenerator

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, padding='same', activation='relu',
                           input_shape=(256, 256, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(),

    # New layers below
    tf.keras.layers.Conv2D(128, 3, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    # New layers above

    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# summary() prints the table itself and returns None.
model.summary()

train_data_gen = ImageDataGenerator(
    rescale=1/255.0,
).flow_from_directory(
    color_mode='grayscale',
    classes=['NORMAL', 'PNEUMONIA'],
    class_mode='binary',
    directory=training_dir,
)

history = model.fit(
    train_data_gen,
    epochs=200,
    steps_per_epoch=200,
    callbacks=[tf.keras.callbacks.EarlyStopping(monitor='loss')]
)

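# shuffle=False keeps predictions in the same order as `.classes` below.
# The iterator shuffles by default, which would misalign labels and predictions.
testing_image_iterator = tf.keras.preprocessing.image.DirectoryIterator(
    directory=testing_dir,
    color_mode='grayscale',
    classes=['NORMAL', 'PNEUMONIA'],
    class_mode='binary',
    shuffle=False,
    image_data_generator=ImageDataGenerator(rescale=1/255.0))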

predictions = model.predict(testing_image_iterator)
predicted_classes = [int(prediction > 0.5) for prediction in predictions.flatten()]

actual_classes = testing_image_iterator.classes

f1_score(actual_classes, predicted_classes)
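
Adding a convolution and pooling block does more than deepen the network: each pooling layer halves the image width and height, so the `Flatten` output and the first `Dense` layer shrink. The sketch below works through that arithmetic for the two architectures above. It assumes `'same'` padding, 3x3 kernels, and Keras's default 2x2 max pooling; `count_parameters` is a hypothetical helper written for this illustration.

```python
def count_parameters(input_size, input_channels, conv_filters,
                     kernel_size=3, dense_units=512):
  """Return (flattened_size, total_parameters) for a Conv/MaxPool stack
  followed by Dense(dense_units) and Dense(1).

  Assumes 'same' padding (a convolution keeps the spatial size) and 2x2
  max pooling after every convolution (halves the spatial size).
  """
  size, channels, total = input_size, input_channels, 0
  for filters in conv_filters:
    # Each filter has kernel_size * kernel_size * channels weights plus a bias.
    total += (kernel_size * kernel_size * channels + 1) * filters
    channels = filters
    size //= 2  # Effect of the pooling layer.
  flattened = size * size * channels
  total += (flattened + 1) * dense_units  # Hidden dense layer.
  total += dense_units + 1                # Single sigmoid output unit.
  return flattened, total

first = count_parameters(256, 1, [16, 32, 64])
second = count_parameters(256, 1, [16, 32, 64, 128])

print('three blocks: flatten=%d parameters=%d' % first)
print('four blocks:  flatten=%d parameters=%d' % second)
```

Under these assumptions the four-block network has roughly half as many parameters as the three-block one, because the hidden dense layer dominates the total. You can compare the printed totals with the output of `model.summary()`.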

Plot the training accuracy.

In [0]:
import matplotlib.pyplot as plt

plt.plot(list(range(len(history.history['accuracy']))),
         history.history['accuracy'])
plt.show()

And loss.

In [0]:
import matplotlib.pyplot as plt

plt.plot(list(range(len(history.history['loss']))),
         history.history['loss'])
plt.show()

Try using a different threshold for our classifier and score that.

In [0]:
from sklearn.metrics import f1_score
from tensorflow.keras.preprocessing.image import ImageDataGenerator

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, padding='same', activation='relu',
                           input_shape=(256, 256, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(128, 3, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# summary() prints the table itself and returns None.
model.summary()

train_data_gen = ImageDataGenerator(
    rescale=1/255.0,
).flow_from_directory(
    color_mode='grayscale',
    classes=['NORMAL', 'PNEUMONIA'],
    class_mode='binary',
    directory=training_dir,
)

history = model.fit(
    train_data_gen,
    epochs=200,
    steps_per_epoch=200,
    callbacks=[tf.keras.callbacks.EarlyStopping(monitor='loss')]
)

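# shuffle=False keeps predictions in the same order as `.classes` below.
# The iterator shuffles by default, which would misalign labels and predictions.
testing_image_iterator = tf.keras.preprocessing.image.DirectoryIterator(
    directory=testing_dir,
    color_mode='grayscale',
    classes=['NORMAL', 'PNEUMONIA'],
    class_mode='binary',
    shuffle=False,
    image_data_generator=ImageDataGenerator(rescale=1/255.0))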

predictions = model.predict(testing_image_iterator)

# Higher threshold below
predicted_classes = [int(prediction > 0.6) for prediction in predictions.flatten()]
# Higher threshold above

actual_classes = testing_image_iterator.classes

f1_score(actual_classes, predicted_classes)
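
Because the threshold is applied after prediction, several thresholds can be compared without retraining. The sketch below shows how precision, recall, and F1 move as the threshold rises. The labels and probabilities are made up for illustration, and `scores_at_threshold` is a hypothetical helper; in the notebook you would pass `actual_classes` and `predictions.flatten()` instead.

```python
def scores_at_threshold(labels, probabilities, threshold):
  """Return (precision, recall, f1) when predicting 1 above the threshold."""
  predicted = [int(p > threshold) for p in probabilities]
  pairs = list(zip(labels, predicted))
  true_pos = sum(1 for y, p in pairs if y == 1 and p == 1)
  false_pos = sum(1 for y, p in pairs if y == 0 and p == 1)
  false_neg = sum(1 for y, p in pairs if y == 1 and p == 0)
  precision = (true_pos / (true_pos + false_pos)
               if true_pos + false_pos else 0.0)
  recall = (true_pos / (true_pos + false_neg)
            if true_pos + false_neg else 0.0)
  f1 = (2 * precision * recall / (precision + recall)
        if precision + recall else 0.0)
  return precision, recall, f1

# Made-up labels (1 = pneumonia) and sigmoid outputs for ten scans.
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
probabilities = [0.10, 0.40, 0.55, 0.70, 0.45, 0.58, 0.65, 0.80, 0.90, 0.95]

results = {}
for threshold in (0.4, 0.5, 0.6, 0.7):
  results[threshold] = scores_at_threshold(labels, probabilities, threshold)
  print('threshold=%.1f precision=%.2f recall=%.2f f1=%.2f'
        % ((threshold,) + results[threshold]))
```

On this toy data, raising the threshold from 0.5 to 0.6 trades recall for precision. Whether that is the right trade for the hospital depends on the relative cost of missed cases and unnecessary second reviews.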

Plot the training accuracy.

In [0]:
import matplotlib.pyplot as plt

plt.plot(list(range(len(history.history['accuracy']))),
         history.history['accuracy'])
plt.show()

And loss.

In [0]:
import matplotlib.pyplot as plt

plt.plot(list(range(len(history.history['loss']))),
         history.history['loss'])
plt.show()

Retrain our best configuration and score it against the validation dataset.

In [0]:
from sklearn.metrics import f1_score
from tensorflow.keras.preprocessing.image import ImageDataGenerator

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, padding='same', activation='relu', 
                           input_shape=(256, 256, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(128, 3, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# summary() prints the table itself and returns None.
model.summary()

train_data_gen = ImageDataGenerator(
    rescale=1/255.0,
).flow_from_directory(
    color_mode='grayscale',
    classes=['NORMAL', 'PNEUMONIA'],
    class_mode='binary',
    directory=training_dir,
)

history = model.fit(
    train_data_gen,
    epochs=200,
    steps_per_epoch=200,
    callbacks=[tf.keras.callbacks.EarlyStopping(monitor='loss')]
)

# Using validation data below instead of test
# shuffle=False keeps predictions in the same order as `.classes` below.
validation_image_iterator = tf.keras.preprocessing.image.DirectoryIterator(
    directory=validation_dir,
    color_mode='grayscale',
    classes=['NORMAL', 'PNEUMONIA'],
    class_mode='binary',
    shuffle=False,
    image_data_generator=ImageDataGenerator(rescale=1/255.0))

predictions = model.predict(validation_image_iterator)
predicted_classes = [int(prediction > 0.6) for prediction in predictions.flatten()]

actual_classes = validation_image_iterator.classes
# Using validation data above instead of test

f1_score(actual_classes, predicted_classes)

We then plot the training accuracy one last time.

In [0]:
import matplotlib.pyplot as plt

plt.plot(list(range(len(history.history['accuracy']))),
         history.history['accuracy'])
plt.show()

We plot the training loss one last time.

In [0]:
import matplotlib.pyplot as plt

plt.plot(list(range(len(history.history['loss']))),
         history.history['loss'])
plt.show()

---