<a href="https://colab.research.google.com/github/mwestt/BMI707-Project/blob/master/Playground.ipynb" target="_blank">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# Playground

Use this notebook to import functions and experiment with specific model architectures. If you're running it in Colab, you'll first need to clone two repos: the one containing this notebook and the one containing the data.

**Make sure you comment out this next cell if running locally!**

In [1]:
# Clone our project repo so we've got our code
!git clone https://github.com/mwestt/BMI707-Project.git

# Clone covid-chestxray-dataset repo for data and metadata
!git clone https://github.com/ieee8023/covid-chestxray-dataset.git

# Move data and metadata to project repo and cd to it
!mv covid-chestxray-dataset/images/ BMI707-Project/
!mv covid-chestxray-dataset/metadata.csv BMI707-Project/
%cd BMI707-Project/
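
As an alternative to commenting the clone cell in and out by hand, the check below is a minimal sketch of how a notebook can tell whether it is running in Colab. It relies on the common idiom that a Colab kernel has the `google.colab` module loaded at startup; the helper name `running_in_colab` is hypothetical and not part of the project codebase.

```python
import sys


def running_in_colab():
    """Return True when the `google.colab` module is already loaded,
    which is normally the case inside a Colab kernel."""
    return 'google.colab' in sys.modules


if running_in_colab():
    print('Colab detected: run the clone cell above.')
else:
    print('Local run: skip the clone cell above.')
```

The shell commands themselves (`!git clone`, `!mv`, `%cd`) are notebook magics, so they would still need to live in their own cell.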

## Benchmark Classifier - No Augmentation

First, we'll import the necessary functions from the codebase.

In [2]:
from load_data import load_metadata, load_data
from benchmark_classifier import train_benchmark

# Load the metadata csv
df_train, df_val = load_metadata('metadata.csv')
df_val.head()

Using TensorFlow backend.


| Unnamed: 0 | patientid | offset | sex | age | finding | survival | intubated | intubation_present | went_icu | needed_supplemental_O2 | ... | date | location | folder | filename | doi | url | license | clinical_notes | other_notes | Unnamed: 27 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 33 | 16 | 5.0 | F | 59.0 | COVID-19 | Y | | | | | ... | 2020 | Sichuan Provincial People's Hospital, Chengdu,... | images | ryct.2020200028.fig1a.jpeg | 10.1148/ryct.2020200028 | https://pubs.rsna.org/doi/full/10.1148/ryct.20... | | Chest radiograph in a patient with COVID-19 in... | | |
| 209 | 112 | 0.0 | | | COVID-19 | Y | | | | | ... | 2020 | | images | 1.CXRCTThoraximagesofCOVID-19fromSingapore.pdf... | | https://www.ams.edu.sg/colleges/radiologists/c... | | Serial chest radiographs of patient who presen... | Credit to College of Radiologists Singapore an... | |
| 260 | 143 | 8.0 | M | 65.0 | COVID-19 | | | | | | ... | 2020 | | images | covid-19-pneumonia-49-day8.jpg | | https://radiopaedia.org/cases/covid-19-pneumon... | CC BY-NC-SA | Four days following admission, the patient dev... | Case courtesy of Dr. Mohammad Al-Tibi, Radiopa... | |
| 162 | 87 | 35.0 | F | 40.0 | Streptococcus | Y | | | | | ... | 2011 | | images | pneumococcal-pneumonia-day35.jpg | | https://radiopaedia.org/cases/pneumococcal-pne... | CC BY-NC-SA | The dense lobar consolidation at admission sho... | Case courtesy of Dr Jeremy Jones, Radiopaedia.... | |
| 336 | 187 | 10.0 | M | 50.0 | COVID-19 | N | Y | N | | Y | ... | 2020 | China | images | yxppt-2020-02-19_00-51-27_287214-day10.jpg | 10.1016/S2213-2600(20)30076-X | http://www.yxppt.com/html/20200219085511.html | | 50-year-old man was sent to the fever clinic f... | Credit to Zhe Xu \*, Lei Shi \*, Yijin Wang \*, J... | |


In [3]:
# Load training and validation images and labels from the metadata DataFrames
images_train, labels_train = load_data(df_train)
images_val, labels_val = load_data(df_val)
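
Before training, it can help to sanity-check what was loaded. The sketch below assumes `load_data` returns NumPy-compatible arrays, with images shaped `(n, 256, 256, 3)` to match the model's `input_shape` and binary labels like those printed later in this notebook. The return types of `load_data` are not shown here, so treat this as an assumption. The helper `summarize_split` is hypothetical, and the demo uses synthetic stand-in arrays rather than the real data.

```python
import numpy as np


def summarize_split(images, labels):
    """Report the size, per-image shape and class balance of one data split."""
    images = np.asarray(images)
    labels = np.asarray(labels)
    if len(images) != len(labels):
        raise ValueError('images and labels must have the same length')
    n_positive = int((labels == 1).sum())
    return {
        'n': len(labels),
        'image_shape': images.shape[1:],
        'n_positive': n_positive,
        'positive_fraction': n_positive / len(labels),
    }


# Synthetic stand-ins (assumption): four blank 256x256 RGB images
demo_images = np.zeros((4, 256, 256, 3), dtype=np.float32)
demo_labels = np.array([1, 0, 1, 1])
summary = summarize_split(demo_images, demo_labels)
print(summary)
```

With the real data, you would call `summarize_split(images_train, labels_train)` and `summarize_split(images_val, labels_val)`. A heavily skewed positive fraction is a reminder that plain accuracy can look good for a trivial classifier.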

Here we define the ConvNet in the function `playground_model()`. Experiment with the parameters and see if you can get a decent validation AUC (we're aiming for about **0.7**, but maybe we can do better). In general, we want the simplest model that can still reach perfect training accuracy, and then to see how high we can push the validation AUC. Some things to try:

Mainly: 
- **Add extra or remove existing `Conv2D` layers.**
- **Change number of filters in each `Conv2D` layer (first argument)**

*But also:*
- Add or remove `Dropout` layers (these layers are probably unnecessary here)
- Change max pooling layers to average pooling
- Use a smaller `Dense` layer before the output layer
- Global average pooling instead of the final `Dense` layer
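
To see why the last two suggestions matter, here is a rough parameter count for the default architecture in `playground_model()`. It is a back-of-the-envelope sketch, assuming `'same'` padding and 2x2 pooling with the default stride, as in the code. The "global average pooling" variant assumes the pooled 64 channels feed straight into the single sigmoid unit, which is one possible reading of that suggestion. The helper names are hypothetical.

```python
def conv2d_params(kernel, in_channels, filters):
    """Weights plus one bias per filter for a square-kernel Conv2D layer."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(inputs, units):
    """Weights plus one bias per unit for a Dense layer."""
    return inputs * units + units


# Conv2D(32, 3x3) on 3 input channels, then Conv2D(64, 3x3) on 32 channels
conv_total = conv2d_params(3, 3, 32) + conv2d_params(3, 32, 64)

# Two 2x2 max-pools shrink 256x256 to 64x64; the second conv outputs 64 channels
flattened = 64 * 64 * 64

# Default head: Flatten -> Dense(128) -> Dense(1)
default_total = conv_total + dense_params(flattened, 128) + dense_params(128, 1)

# Assumed variant: GlobalAveragePooling2D (64 values) -> Dense(1)
gap_total = conv_total + dense_params(64, 1)

print(f'Conv layers only:       {conv_total:,}')
print(f'Flatten + Dense(128):   {default_total:,}')
print(f'Global average pooling: {gap_total:,}')
```

Almost all of the default model's weights sit in the first `Dense` layer, so shrinking or replacing it is the quickest way to simplify the model for a training set of a few hundred images.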

**Make sure you're using a GPU Runtime!** Go to *Runtime > Change runtime type > Hardware accelerator > GPU*

In [4]:
from keras.models import Sequential, load_model
from keras.layers import Dense, Dropout, Flatten, Activation
from keras.layers import Conv2D, MaxPooling2D
from sklearn.metrics import roc_auc_score


def playground_model():
    """Create Keras model using Sequential API.
    
    Returns
    -------
    model : Keras Sequential object
        Keras Sequential model following the specified architecture.
    """

    model = Sequential()
    model.add(Conv2D(32, (3, 3), padding='same', input_shape=(256, 256, 3)))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    # model.add(Dropout(0.25))

    model.add(Conv2D(64, (3, 3), padding='same'))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    # model.add(Dropout(0.25))

    model.add(Flatten())
    model.add(Dense(128))
    model.add(Activation('relu'))
    # model.add(Dropout(0.5))
    model.add(Dense(1, activation='sigmoid'))

    return model


# Assign the model defined above
model = playground_model()

# Train benchmark model - if you find good parameters, set `save=True`
trained_model = train_benchmark(model, images_train, images_val, labels_train, labels_val, 
                                epochs=12, batch_size=32, save=False)

# Evaluation metrics - pay attention to AUC!
print('Validation Labels:')
print(labels_val)

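# Predict probabilities once, then threshold at 0.5 to get class labels
# (same result as `predict_classes`, which newer Keras releases removed)
y_probs = trained_model.predict(images_val, verbose=1).T[0]
y_pred = (y_probs > 0.5).astype('int32')

print('Predicted Labels:')
print(y_pred)

print('Predicted Probabilities')
print(y_probs)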

print('Prediction AUC')
print(roc_auc_score(labels_val, y_probs))


Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where

Train on 183 samples, validate on 91 samples
Epoch 1/12
Epoch 2/12
Epoch 3/12
Epoch 4/12
Epoch 5/12
Epoch 6/12
Epoch 7/12
Epoch 8/12
Epoch 9/12
Epoch 10/12
Epoch 11/12
Epoch 12/12
Validation Labels:
[1 1 1 0 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 0 1 1 1 1 0 1 1 1 1 1 0 1 1
 0 0 1 0 0 1 1 1 1 0 1 0 1 1 0 0 0 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1
 0 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1]
Predicted Labels:
[0 1 1 1 0 0 1 1 1 1 1 0 0 1 1 1 1 1 1 1 0 0 1 0 1 1 1 0 1 0 1 1 1 1 1 1 1
 0 0 1 0 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 0 1 1 0 0 1 0 0 0 0 0 0 1 1 1 1 1
 0 0 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1]
Predicted Probabilities
[0.39085233 0.985126   0.7752716  0.62338203 0.44254592 0.35822305
 0.9752461  0.51564133 0.9162035  0.9782864  0.7594662  0.4193521
 0.1011124  1.         0.97847795 0.7389192  0.66603684 1.
 0.98930526 0.98742723 0.11928573 0.45681313 0.7416751  0.2110897
 0.99999523 0.99999213 0.70075095
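
The AUC above is computed from the predicted probabilities, not from the thresholded labels. As a minimal illustration of what the number means, the sketch below computes AUC as the fraction of (positive, negative) pairs in which the positive example receives the higher score, counting ties as half. The function `pairwise_auc` is a hypothetical helper written for this explanation and the demo values are made up. On real data, `roc_auc_score` is the tool to use.

```python
import numpy as np


def pairwise_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError('AUC needs at least one positive and one negative example')
    # Compare every positive score with every negative score
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)


# Made-up example: 3 positives, 2 negatives, 5 of the 6 pairs ranked correctly
demo_labels = np.array([1, 1, 0, 1, 0])
demo_scores = np.array([0.9, 0.6, 0.7, 0.8, 0.2])
demo_auc = pairwise_auc(demo_labels, demo_scores)
print(demo_auc)
```

Because AUC depends only on how the scores are ranked, it is unaffected by the 0.5 threshold used for the predicted labels, which makes it a more useful target than accuracy when one class dominates the validation set.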

## Loading Model Workflow

Feel free to add code cells after this one, for example to load your saved models and use them to make predictions. Here's one I made earlier! Save your own model by setting `save=True` in `train_benchmark()` above, then replace the filepath below with your own.

In [5]:
# # Load saved model and print summary information
# loaded_model = load_model('model_bench_2020_04_22_22_58_23.h5')    
# print(loaded_model.summary())
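
If you save several models, typing the timestamped filename by hand gets tedious. The sketch below picks the most recent one, assuming that `train_benchmark()` writes files named like the example above (`model_bench_YYYY_MM_DD_HH_MM_SS.h5`) into the working directory. That naming pattern is inferred from the single filename shown here, and `latest_benchmark_model` is a hypothetical helper.

```python
import glob
import os


def latest_benchmark_model(directory='.'):
    """Return the newest `model_bench_*.h5` path in `directory`, or None if there is none."""
    paths = glob.glob(os.path.join(directory, 'model_bench_*.h5'))
    # Zero-padded timestamps sort chronologically as plain strings
    return max(paths, key=os.path.basename) if paths else None


print(latest_benchmark_model())
```

The returned path can then be passed to `load_model()` in place of the hard-coded filename.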

In [6]:
# print('Validation Labels:')
# print(labels_val)

# # Threshold the probabilities at 0.5 (replaces `predict_classes`, removed in newer Keras)
# y_probs = loaded_model.predict(images_val, verbose=1).T[0]
# y_pred = (y_probs > 0.5).astype('int32')

# print('Predicted Labels:')
# print(y_pred)

# print('Predicted Probabilities')
# # print(y_probs)

# print('Prediction AUC')
# print(roc_auc_score(labels_val, y_probs))