<a href="https://colab.research.google.com/github/mwestt/BMI707-Project/blob/master/Playground.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# Playground

Use this notebook to import functions and experiment with specific model architectures. If you're running this in Colab, we'll need to clone the repo that contains this notebook, as well as the repo that contains the data.

**Make sure you comment out this next cell if running locally!**

In [1]:
# Clone our project repo so we've got our code
!git clone https://github.com/mwestt/BMI707-Project.git

# Clone covid-chestxray-dataset repo for data and metadata
!git clone https://github.com/ieee8023/covid-chestxray-dataset.git

# Move data and metadata to project repo and cd to it
!mv covid-chestxray-dataset/images/ BMI707-Project/
!mv covid-chestxray-dataset/metadata.csv BMI707-Project/
%cd BMI707-Project/
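
As an alternative to commenting the cell out by hand, the setup could be guarded by a runtime check. The sketch below assumes that the `google.colab` package is already loaded in a Colab kernel and absent elsewhere, which is a common heuristic rather than a guarantee. `running_in_colab` is a hypothetical helper, not part of the project repo.

```python
import sys


def running_in_colab():
    """Heuristic check: assumes Colab kernels preload the `google.colab` package."""
    return "google.colab" in sys.modules


if running_in_colab():
    print("Colab runtime detected: run the clone-and-move cell above.")
else:
    print("Local runtime detected: skip the clone-and-move cell above.")
```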

## Benchmark Classifier - No Augmentation

First, we'll need to import the necessary functions from the codebase.

In [3]:
from load_data import load_metadata, load_data
from benchmark_classifier import train_benchmark

# Load the metadata csv
df_train, df_val = load_metadata('metadata.csv')
df_val.head()

Unnamed: 0,patientid,offset,sex,age,finding,survival,intubated,intubation_present,went_icu,needed_supplemental_O2,extubated,temperature,pO2_saturation,leukocyte_count,neutrophil_count,lymphocyte_count,view,modality,date,location,folder,filename,doi,url,license,clinical_notes,other_notes,Unnamed: 27
9,3,4.0,M,74.0,SARS,N,,,,,,,,,,,PA,X-ray,2004,"Mount Sinai Hospital, Toronto, Ontario, Canada",images,SARS-10.1148rg.242035193-g04mr34g0-Fig8a-day0....,10.1148/rg.242035193,https://pubs.rsna.org/doi/10.1148/rg.242035193,,SARS in a 74-year-old man who developed sympto...,,
316,178,1.0,F,72.0,COVID-19,N,Y,Y,Y,,N,,,,,,PA,X-ray,,"Hospital Universitario Doctor Peset, Valencia,...",images,16660_3_1.jpg,,https://www.eurorad.org/case/16660,CC BY-NC-SA 4.0,A 72-year-old woman admitted with acute respir...,,
183,96,0.0,M,60.0,"COVID-19, ARDS",,,,Y,,,,89.0,,,,PA,X-ray,2020,Spain,images,covid-19-pneumonia-rapidly-progressive-admissi...,,https://radiopaedia.org/cases/covid-19-pneumon...,CC BY-NC-SA,Fever and odynophagia. Trip to Italy 7 days ag...,"Case courtesy of Dr Edgar Lorente, Radiopaedia...",
273,154,10.0,,,COVID-19,,,,,,,,,,,,AP,X-ray,2020,,images,radiol.2020201160.fig2d.jpeg,10.1148/radiol.2020201160,https://pubs.rsna.org/doi/full/10.1148/radiol....,,,,
101,51,3.0,M,47.0,COVID-19,Y,,,,,,39.0,95.0,,,,PA,X-ray,"March 4, 2020",Italy,images,F4341CE7-73C9-45C6-99C8-8567A5484B63.jpeg,,https://www.sirm.org/2020/03/10/covid-19-caso-34/,,"Male patient, 47 years old. Remote history cha...","Credit to G.Patelli , F.Besana , S. Paganoni *...",


In [4]:
# Load training and validation images listed in the metadata dataframes
images_train, labels_train = load_data(df_train)
images_val, labels_val = load_data(df_val)

Here we define the ConvNet with the function `playground_model()`. Experiment with the parameters and see if you can get a decent-looking AUC (we're aiming for **0.7**, but maybe we can do better). In general, we want the simplest model that can still reach perfect training accuracy, and then to see how high we can push the validation AUC. Some things to try:

Mainly: 
- **Add extra or remove existing `Conv2D` layers.**
- **Change number of filters in each `Conv2D` layer (first argument)**

*But also:*
- Add or remove `Dropout` layers (these layers are probably unnecessary here)
- Change max pooling layers to average pooling
- Use a smaller `Dense` layer before the output layer
- Global average pooling instead of the final `Dense` layer

**Make sure you're using a GPU Runtime!** Go to *Runtime > Change runtime type > Hardware accelerator > GPU*
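
To get a feel for how these choices change model size, the sketch below counts trainable parameters for the default `playground_model()` architecture (two 3x3 `Conv2D` layers with 32 and 64 filters, each followed by 2x2 pooling, then `Dense(128)` and `Dense(1)` on a 256x256x3 input) using the standard per-layer formulas. The helper names are made up for illustration, and the global-average-pooling variant is one assumed reading of the suggestion above, in which pooling replaces `Flatten` ahead of the `Dense(128)` layer.

```python
def conv2d_params(in_channels, filters, kernel=(3, 3)):
    """Weights plus one bias per filter for a Conv2D layer."""
    return kernel[0] * kernel[1] * in_channels * filters + filters


def dense_params(in_features, units):
    """Weights plus one bias per unit for a Dense layer."""
    return in_features * units + units


# Convolutional stack: 3 -> 32 -> 64 channels ('same' padding keeps height/width)
conv_total = conv2d_params(3, 32) + conv2d_params(32, 64)

# Two 2x2 max-pooling layers halve the 256x256 input twice: 256 -> 128 -> 64
side = 256 // 2 // 2
flat_features = side * side * 64

# Variant A: Flatten -> Dense(128) -> Dense(1)
flatten_total = conv_total + dense_params(flat_features, 128) + dense_params(128, 1)

# Variant B (assumed): global average pooling -> Dense(128) -> Dense(1)
gap_total = conv_total + dense_params(64, 128) + dense_params(128, 1)

print(f"Flatten head: {flatten_total:,} parameters")
print(f"Global average pooling head: {gap_total:,} parameters")
```

Under these formulas, nearly all of the roughly 33.6 million parameters in the `Flatten` version sit in the first `Dense` layer, which is worth keeping in mind when training on only a few hundred images.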

In [8]:
from keras.models import Sequential, load_model
from keras.layers import Dense, Dropout, Flatten, Activation
from keras.layers import Conv2D, MaxPooling2D  # both are importable from keras.layers directly
from sklearn.metrics import roc_auc_score


def playground_model():
    """Create Keras model using Sequential API.
    
    Returns
    -------
    model : Keras Sequential object
        Keras Sequential model following the specified architecture.
    """

    model = Sequential()
    model.add(Conv2D(32, (3, 3), padding='same', input_shape=(256, 256, 3)))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    # model.add(Dropout(0.25))

    model.add(Conv2D(64, (3, 3), padding='same'))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    # model.add(Dropout(0.25))

    model.add(Flatten())
    model.add(Dense(128))
    model.add(Activation('relu'))
    # model.add(Dropout(0.5))
    model.add(Dense(1, activation='sigmoid'))

    return model


# Assign the model defined above
model = playground_model()

# Train benchmark model - if you find good parameters, set `save=True`
trained_model = train_benchmark(model, images_train, images_val, labels_train, labels_val, 
                                epochs=12, batch_size=32, save=False)

# Evaluation metrics - pay attention to AUC!
print('Validation Labels:')
print(labels_val)

print('Predicted Labels:')
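# Threshold the sigmoid outputs at 0.5 (`predict_classes` was removed in newer Keras releases)
y_pred = (trained_model.predict(images_val, verbose=1) > 0.5).astype('int32').T[0]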
print(y_pred)

print('Predicted Probabilities')
y_probs = trained_model.predict(images_val, verbose=1).T[0]
print(y_probs)

print('Prediction AUC')
print(roc_auc_score(labels_val, y_probs))


Train on 189 samples, validate on 94 samples
Epoch 1/12

KeyboardInterrupt: 
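
For intuition about the AUC computed in the cell above, here is a minimal pure-Python sketch of what `roc_auc_score` measures for binary labels: the fraction of (positive, negative) pairs in which the positive example gets the higher score, with ties counting half. `pairwise_auc` is a hypothetical helper for illustration, not the scikit-learn implementation, and the labels and scores are toy values rather than model output.

```python
def pairwise_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Toy example (made-up values): 3 of the 4 positive/negative pairs are ordered correctly
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(pairwise_auc(toy_labels, toy_scores))
```

Because only the ranking of the scores matters, a model can predict class 1 for almost every image at a 0.5 threshold and still have an informative AUC.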

## Loading Model Workflow

Feel free to add code cells after this one, for example to load your saved models and use them to make predictions. Here's one I made earlier! Save your own model by setting `save=True` in `train_benchmark()` above, then replace the filepath below with your own.

In [5]:
# # Load saved model and print summary information
# loaded_model = load_model('model_bench_2020_04_22_22_58_23.h5')    
# print(loaded_model.summary())

In [6]:
# print('Validation Labels:')
# print(labels_val)

# print('Predicted Labels:')
# y_pred = (loaded_model.predict(images_val, verbose=1) > 0.5).astype('int32').T[0]
# print(y_pred)

# print('Predicted Probabilities')
# y_probs = loaded_model.predict(images_val, verbose=1).T[0]
# # print(y_probs)

# print('Prediction AUC')
# print(roc_auc_score(labels_val, y_probs))

## Training with Data Augmentation

The following cell is much like the previous section, but we now train on augmented data using Keras data augmentation rather than on the images directly. There are a number of augmentation parameters to explore; take a look at the Keras documentation for the [`ImageDataGenerator` class](https://keras.io/preprocessing/image/) for arguments to try.

In [13]:
from load_data import data_generator_from_dataframe, ValidImageDataGenerator
from benchmark_classifier import train_augmented_benchmark
from keras.preprocessing.image import ImageDataGenerator


# Define ImageDataGenerator for training - tweak these arguments
# (an integer `width_shift_range` is read as pixels, a float below 1 as a fraction of the width)
train_datagen = ImageDataGenerator(rescale=1./255, horizontal_flip=True,
                                   width_shift_range=1)

train_generator = data_generator_from_dataframe(train_datagen, df_train,
                                                image_size=(256, 256), batch_size=16)

# Simple generator for validation
validation_generator = data_generator_from_dataframe(ValidImageDataGenerator(), df_val)

# Reinstantiate model
model = playground_model()

# Train model with augmented data
trained_model_aug = train_augmented_benchmark(model, train_generator, validation_generator,
                                              epochs=12, steps_per_epoch=6, validation_steps=10,
                                              save=False)

Found 189 validated image filenames.
Found 94 validated image filenames.
Epoch 1/12
Epoch 2/12
Epoch 3/12
Epoch 4/12
Epoch 5/12
Epoch 6/12
Epoch 7/12
Epoch 8/12
Epoch 9/12
Epoch 10/12
Epoch 11/12
Epoch 12/12
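
As a quick sanity check on `steps_per_epoch`, the arithmetic below relates it to the 189 training images and the `batch_size=16` used above. It assumes the generator yields full batches of 16 at every step; the variable names are illustrative.

```python
import math

n_train = 189        # from "Found 189 validated image filenames."
batch_size = 16      # batch size passed to the training generator
steps_per_epoch = 6  # value used in the cell above

# Upper bound on images drawn per epoch with the current setting
images_per_epoch = steps_per_epoch * batch_size

# Steps needed for one full pass over the training set
full_pass_steps = math.ceil(n_train / batch_size)

print(f"{images_per_epoch} images per epoch ({images_per_epoch / n_train:.0%} of the training set)")
print(f"{full_pass_steps} steps would cover all {n_train} images once")
```

Under that assumption, each epoch here draws at most 96 augmented images, roughly half the training set; a `steps_per_epoch` of 12 would approximate one full pass.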


In [14]:
# Evaluation metrics - once again, pay attention to AUC!
print('Validation Labels:')
print(labels_val)

print('Predicted Labels:')
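# Threshold the sigmoid outputs at 0.5 (`predict_classes` was removed in newer Keras releases)
y_pred = (trained_model_aug.predict(images_val, verbose=1) > 0.5).astype('int32').T[0]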
print(y_pred)

print('Predicted Probabilities')
y_probs = trained_model_aug.predict(images_val, verbose=1).T[0]
print(y_probs)

print('Prediction AUC')
print(roc_auc_score(labels_val, y_probs))

Validation Labels:
[0 1 1 1 1 1 1 0 1 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1
 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 0 0 1 0 0 1 1 0 1 1 1
 1 1 1 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1]
Predicted Labels:
[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
Predicted Probabilities
[0.986323   0.776845   0.9189477  0.9927617  0.96600676 0.7223801
 0.9929855  0.94768214 0.9185946  0.9950176  0.8442918  0.71621764
 0.8862897  0.9164943  0.9870961  0.95292217 0.59593713 0.9930233
 0.9618919  0.97825754 0.9205772  0.9999999  0.9974457  0.9327251
 0.95959413 0.8863493  0.9438021  0.83269775 0.97989154 0.8145764
 0.8910817  0.97636956 0.94992113 0.86872077 0.80468774 0.9988829
 0.9713118  0.89961123 0.82265043 0.90378153 0.85487694 0.9791869
 0.9061697  0.7777758  0.89426786 0.87539923 0.9327805  0.8140588
 0.9949287  0.993062   0.8