## Build an Automated Diagnosis Model
Make Money with Machine Learning week 4 homework  
using Kaggle dataset [Chest X-Ray Images (Pneumonia)](https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia)

### Import required modules

In [0]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import tensorflow as tf
from tensorflow.keras.applications import inception_v3
from sklearn.metrics import classification_report

from pathlib import Path

### Prepare image paths

In [0]:
path = Path('/data/chest_xray')
train_dir = path/'train'
valid_dir = path/'val'
test_dir = path/'test'

In [5]:
for folder in ['train', 'val', 'test']:
    for img_class in ['NORMAL', 'PNEUMONIA']:
        print(f'{folder}/{img_class} images: ', len(list((path/f'{folder}/{img_class}').iterdir())))

train/NORMAL images:  1342
train/PNEUMONIA images:  3876
val/NORMAL images:  9
val/PNEUMONIA images:  9
test/NORMAL images:  234
test/PNEUMONIA images:  390
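
The counts above include every directory entry, while `flow_from_directory` only picks up files with image extensions. That may explain why the generators later report 5216 training and 16 validation images, one fewer per class folder than listed here; a stray non-image file in each folder is one possibility, though the notebook does not confirm it. A minimal sketch for counting only image files, assuming a hypothetical helper `count_images` and an assumed extension list:

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match the dataset.
IMAGE_EXTENSIONS = {'.jpeg', '.jpg', '.png', '.bmp'}

def count_images(folder):
    """Count files in `folder` whose extension looks like an image (hypothetical helper)."""
    return sum(
        1 for entry in Path(folder).iterdir()
        if entry.is_file() and entry.suffix.lower() in IMAGE_EXTENSIONS)
```

Calling `count_images(path/'train'/'NORMAL')` in place of `len(list(...iterdir()))` would then skip hidden or non-image files.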


### Image Data Generator

In [20]:
img_size = 299

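# Scale pixel values to [0, 1]. Note: inception_v3.preprocess_input scales to [-1, 1],
# so this does not exactly match the preprocessing used for the pretrained weights.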
train_data_gen = tf.keras.preprocessing.image.ImageDataGenerator(
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    rescale=1/255)

data_gen = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1/255)

train_generator = train_data_gen.flow_from_directory(
    train_dir,
    target_size=(img_size, img_size),
    batch_size=32,
    class_mode='categorical')

valid_generator = data_gen.flow_from_directory(
    valid_dir,
    target_size=(img_size, img_size),
    shuffle=False,
    class_mode='categorical')

test_generator = data_gen.flow_from_directory(
    test_dir,
    target_size=(img_size, img_size),
    shuffle=False,
    class_mode='categorical')

Found 5216 images belonging to 2 classes.
Found 16 images belonging to 2 classes.
Found 624 images belonging to 2 classes.
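
Keras's `inception_v3.preprocess_input` scales pixel values to the range [-1, 1], whereas `rescale=1/255` maps them to [0, 1]. The difference can be sketched in plain Python; the helper names are hypothetical, and this only illustrates the arithmetic, not the library's implementation:

```python
def rescale_unit(pixel):
    """Map a pixel in [0, 255] to [0, 1], as rescale=1/255 does."""
    return pixel / 255

def rescale_signed(pixel):
    """Map a pixel in [0, 255] to [-1, 1], the range Inception V3 preprocessing targets."""
    return pixel / 127.5 - 1

# Compare both scalings on black, mid-gray, and white pixel values.
for value in (0, 127.5, 255):
    print(value, rescale_unit(value), rescale_signed(value))
```

Since the base model here is frozen, feeding it inputs on a different scale than it was trained with may limit how useful its features are, so trying the [-1, 1] scaling is a reasonable experiment.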


### Build a model from pretrained Inception V3

In [0]:
base_model = inception_v3.InceptionV3(
    input_shape=(img_size, img_size, 3),
    include_top=False,
    weights='imagenet',
    pooling='avg')

model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.Dense(units=2, activation='softmax')])

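# Freeze the pretrained base so that only the new Dense layer is trained
model.layers[0].trainable = False

# `learning_rate` replaces the deprecated `lr` argument
model.compile(
    optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.001),
    loss='categorical_crossentropy',
    metrics=['accuracy'])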

### Fit the model

In [22]:
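# model.fit accepts generators directly; fit_generator is deprecated in TensorFlow 2.1+
history = model.fit(
    train_generator,
    epochs=15,
    validation_data=valid_generator)
# Use the .h5 extension so that Keras saves in HDF5 format
tf.keras.models.save_model(model, '/content/model_ep15.h5')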

Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15


### Evaluate the model with test images

In [23]:
# model.evaluate accepts generators; evaluate_generator is deprecated in TensorFlow 2.1+
evaluations = model.evaluate(test_generator)
for metric, result in zip(model.metrics_names, evaluations):
    print(f'{metric}: {result}')

loss: 0.5519412081688643
acc: 0.745192289352417
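
For context, the test set holds 234 normal and 390 pneumonia images, so a classifier that always predicts pneumonia would already be right on 390 of 624 images. A small sketch of that arithmetic (the baseline comparison is an addition here, not part of the original homework):

```python
# Test-set class counts taken from the folder listing earlier in the notebook.
test_counts = {'NORMAL': 234, 'PNEUMONIA': 390}

total = sum(test_counts.values())
# Accuracy of always predicting the most frequent class.
majority_baseline = max(test_counts.values()) / total

print(f'majority-class baseline accuracy: {majority_baseline:.3f}')
```

The reported test accuracy of about 0.745 is therefore roughly 12 percentage points above this baseline.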


In [25]:
# model.predict accepts generators; predict_generator is deprecated in TensorFlow 2.1+
predictions = model.predict(test_generator)
print(classification_report(test_generator.classes, np.argmax(predictions, axis=1)))

              precision    recall  f1-score   support

           0       0.69      0.57      0.63       234
           1       0.77      0.85      0.81       390

    accuracy                           0.75       624
   macro avg       0.73      0.71      0.72       624
weighted avg       0.74      0.75      0.74       624



### Conclusion

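With this model, we identify 85% of the pneumonia cases in the test data (recall = 0.85), and 77% of the images flagged as pneumonia are actually pneumonia (precision = 0.77). Recall for normal images is lower at 0.57, so a substantial share of healthy cases would be flagged incorrectly. Overall test accuracy is about 75%.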
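
To make the two metrics concrete, here is a minimal sketch of how precision and recall follow from true and predicted labels. The helper `precision_recall` is hypothetical and the labels are toy values for illustration only, not the notebook's predictions:

```python
def precision_recall(y_true, y_pred, positive=1):
    """Compute precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy labels: 1 = pneumonia, 0 = normal.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0]
print(precision_recall(y_true, y_pred))
```

Recall answers "of all real pneumonia cases, how many did we catch?", while precision answers "of all cases we flagged, how many were real?".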

## Diagnosis Examples

In [0]:
def diagnose(img_path):
    """Print the predicted class probabilities for one JPEG chest X-ray."""
    img = tf.io.read_file(img_path)
    img = tf.io.decode_jpeg(img, channels=3)
    img = tf.image.resize(img, [img_size, img_size])
    img = tf.expand_dims(img, 0)  # add a batch dimension
    img = img / 255  # same [0, 1] scaling as the data generators
    prediction = model.predict(img, steps=1)
    print('Probabilities:')
    print(f'Normal: {prediction[0][0]}')
    print(f'Pneumonia: {prediction[0][1]}')

### From test images

In [0]:
normal_images = [str(img_path) for img_path in (test_dir/'NORMAL').iterdir()]
pneumo_images = [str(img_path) for img_path in (test_dir/'PNEUMONIA').iterdir()]

### Diagnosis Example 1

In [113]:
diagnose(normal_images[1])

Probabilities:
Normal: 0.7183953523635864
Pneumonia: 0.2816045880317688


### Diagnosis Example 2

In [115]:
diagnose(pneumo_images[1])

Probabilities:
Normal: 0.021977011114358902
Pneumonia: 0.9780229926109314
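
`flow_from_directory` assigns class indices in alphanumeric order of the subfolder names, so index 0 should be NORMAL and index 1 PNEUMONIA here (this can be checked with `test_generator.class_indices`). A minimal sketch of turning the probabilities into a single label, assuming that order and a hypothetical helper `top_label`:

```python
CLASS_NAMES = ['NORMAL', 'PNEUMONIA']  # assumed order, matching the sorted folder names

def top_label(probabilities, class_names=CLASS_NAMES):
    """Return the (label, probability) pair with the highest probability (hypothetical helper)."""
    index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return class_names[index], probabilities[index]

# Probabilities copied from the two diagnosis examples.
print(top_label([0.7183953523635864, 0.2816045880317688]))
print(top_label([0.021977011114358902, 0.9780229926109314]))
```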
