## Assignment 2 - Convolutional Neural Networks (CNNs)

### Please include your name below
__Student name:__ Szu-Yeu, Hu

### Please cite the reference(s) you looked up and the name of students you collaborated with
__Reference(s) and collaborator(s):__
Ref: https://en.wikipedia.org/wiki/Chest_radiograph
Collaborator: Chen, Dong


### Data and Preliminaries
In this assignment, we will explore the NIH CXR8 database (https://www.nih.gov/news-events/news-releases/nih-clinical-center-provides-one-largest-publicly-available-chest-x-ray-datasets-scientific-community). This chest X-ray database has been the basis of a series of highly cited papers and preprints since 2017. You will examine the images in this dataset and build your own convolutional neural networks to classify the major diagnoses.

## Question 1: Data summary (10 points)
Deep learning is not immune from the 'garbage in, garbage out' principle. Before digging into the data, it is recommended to get a sense of how the data was generated, understand the assumptions of the data, and review the data quality. We will ask you to answer some basic questions on the NIH CXR8 dataset. Please visit the website of the NIH CXR8 database, download the metadata (Data_Entry_2017.csv) and answer the following questions.


#### Question 1.1 (5 points)
What is the file format of images in the NIH CXR8 database? What is the standard format for radiology image storage and transmission? How many images are there in the database? How many diagnostic categories are there in the database? What are they? How many images have more than one diagnosis?

__Your Answer:__

- In the NIH CXR8 database, the images are stored in PNG format at 1024 by 1024 resolution.

- The standard format for radiology image storage and transmission is DICOM (Digital Imaging and Communications in Medicine).

- There are 112,120 images in total in the database.

- There are 14 diagnostic categories, plus one category for "No Finding".

- The 14 diagnostic categories are Atelectasis, Consolidation, Infiltration, Pneumothorax, Edema, Emphysema, Fibrosis, Effusion, Pneumonia, Pleural_thickening, Cardiomegaly, Nodule, Mass, and Hernia.

- 20,796 images have more than one diagnosis.
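
Counts like these can be derived from the metadata file. The sketch below is one possible way to do it with the standard library, not necessarily how the numbers above were obtained. It assumes the diagnoses sit in a `Finding Labels` column and that multiple diagnoses are separated by `|` (as in the label quoted for `00000019_000.png`). `read_rows` and `summarize_labels` are hypothetical helper names.

```python
import csv
from collections import Counter


def read_rows(path):
    """Yield one dict per CSV row (assumes a header row is present)."""
    with open(path, newline="") as f:
        yield from csv.DictReader(f)


def summarize_labels(rows):
    """Return (number of images, per-label counts, number of multi-label images)."""
    n_images = 0
    n_multi = 0
    label_counts = Counter()
    for row in rows:
        # Assumed format: "Atelectasis|Effusion" for an image with two diagnoses
        labels = row["Finding Labels"].split("|")
        n_images += 1
        label_counts.update(labels)
        if len(labels) > 1:
            n_multi += 1
    return n_images, label_counts, n_multi


# Toy rows standing in for the real file contents
toy_rows = [
    {"Finding Labels": "No Finding"},
    {"Finding Labels": "Atelectasis|Effusion|Pleural_Thickening"},
    {"Finding Labels": "Cardiomegaly"},
]
n_images, label_counts, n_multi = summarize_labels(toy_rows)
print(n_images, len(label_counts), n_multi)
```

On the real metadata, the same function would be called as `summarize_labels(read_rows("Data_Entry_2017.csv"))`, assuming the file is in the working directory.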

#### Question 1.2 (5 points)
How many patients are there in total? How many patients contributed more than one image? Which patient ID contributed the most images and how many did he or she contribute?

__Your Answer:__

- There are 30,805 patients in total in the database.

- Of these patients, 13,302 contributed more than one image.

- Patient ID 10007 contributed the most images in the database, with 187 images in total.
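
A per-patient tally of this kind can be sketched with a `Counter`. This is a minimal illustration on toy data, assuming the metadata has a `Patient ID` column; `summarize_patients` is a hypothetical helper name.

```python
from collections import Counter


def summarize_patients(rows):
    """Return (number of patients, patients with >1 image, top patient ID, top count)."""
    images_per_patient = Counter(row["Patient ID"] for row in rows)
    n_patients = len(images_per_patient)
    n_repeat = sum(1 for count in images_per_patient.values() if count > 1)
    top_patient, top_count = images_per_patient.most_common(1)[0]
    return n_patients, n_repeat, top_patient, top_count


# Toy rows: patient "1" has three images, "2" has one, "3" has two
toy_patient_rows = [
    {"Patient ID": "1"},
    {"Patient ID": "1"},
    {"Patient ID": "1"},
    {"Patient ID": "2"},
    {"Patient ID": "3"},
    {"Patient ID": "3"},
]
n_patients, n_repeat, top_patient, top_count = summarize_patients(toy_patient_rows)
print(n_patients, n_repeat, top_patient, top_count)
```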

## Question 2: Check the images (10 points)
In the following questions, you will be asked to examine the images in the NIH CXR8 dataset. The images for Question 2.1 and 2.2 could be found at https://www.dropbox.com/sh/2h068ge9xv1g27u/AAAXVq8VYXF6HRlHvzvjy-e6a?dl=0. Feel free to collaborate with other students or consult any references.

#### Question 2.1 (2 points)
What is the NIH-labeled diagnosis of image `00001583_014.png`? What did you see in this image? Word limit: 100 words.

__Your Answer:__
- The diagnosis of the image is pneumothorax.
- The image is in PA view. The right side of the diaphragm is elevated, indicating volume loss of the right lung. A tube is seen on the right side, which could be a chest tube or a central venous catheter. No obvious lung margin is visible in this image, so the pneumothorax might already have been treated.

#### Question 2.2 (3 points)
What is the NIH-labeled diagnosis of image `00000019_000.png`? What did you see in this image? Word limit: 100 words.

__Your Answer:__

- The diagnosis of the image is Atelectasis|Effusion|Pleural_Thickening.
- The image is in PA view. There is a medical device on the patient's right side. The right side of the diaphragm is elevated, suggesting right lung collapse or atelectasis. The left costophrenic angle is mildly blunted, indicating a pleural effusion on the left side.

#### Question 2.3 (5 points)
What is "View Position?" How does it affect the resulting chest X-ray images visually? Word limit: 100 words.

__Your Answer:__

View position is the relative orientation of the body and the direction of the X-ray beam. The most common views are posteroanterior (PA), anteroposterior (AP), and lateral. The images above are in the standard PA view: the X-ray beam enters through the posterior aspect of the chest and exits through the anterior aspect, where the beam is detected.

Different views affect the apparent relative size of the heart and lungs. The heart, being an anterior structure within the chest, looks larger in the AP view because it lies farther from the detector.
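
The size difference follows from projection geometry: an object farther from the detector casts a larger shadow. A rough sketch using the point-source magnification formula M = SID / (SID - OID), where SID is the source-to-image distance and OID is the object-to-image distance. The distances below are illustrative assumptions only, not values taken from the dataset.

```python
def magnification(sid_cm, oid_cm):
    """Point-source magnification: source-to-image distance over source-to-object distance."""
    return sid_cm / (sid_cm - oid_cm)


# Illustrative distances (assumptions): in PA the heart is close to the detector,
# in AP it is farther away and the source is often closer to the patient.
magnification_pa = magnification(sid_cm=180.0, oid_cm=5.0)
magnification_ap = magnification(sid_cm=100.0, oid_cm=15.0)
print(f"PA: {magnification_pa:.2f}x, AP: {magnification_ap:.2f}x")
```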

## Question 3: Build a custom convolutional neural network (25 points)
For this question, we ask you to build a multi-layer convolutional neural network to classify a subset of cardiomegaly images from normal ones. Please download the training set and the validation set here (https://www.dropbox.com/sh/ojiw79q8786ua4x/AAAtaJVKEdv91Zybpi-fAfMsa?dl=0). Please DO NOT use any other image from NIH CXR8 or from other databases for this question. Feel free to use keras or any other high-level deep learning packages to classify the images.

Design a convolutional neural network with at least two convolution layers, at least one max-pooling layer, and at least one dropout layer. Although you should explore various combinations of hyperparameters, we will grade this question based on the accuracy of the implementation, not the performance of the network.

What is your design? What binary loss/accuracy did you get in the training and validation set? Please include your code in the assignment submission.

In [1]:
## Your code goes here
from PIL import Image
import numpy as np
import pandas as pd
import os
from sklearn.preprocessing import LabelBinarizer


## Read in the labels
train_labels = pd.read_csv('train.csv?dl=1',  header=None, index_col=0)
val_labels = pd.read_csv('val.csv?dl=1',  header=None, index_col=0)

## We will resize the images to make them smaller.
## The size must match the input_shape of the network defined below (96 x 96).
image_size = (96,96)

## Read in the training images
train_images = []
train_dir = './train/'
train_files = os.listdir(train_dir)
for f in train_files:
    img = Image.open(train_dir + f)
    img = img.resize(image_size)
    img_arr = np.array(img)
    train_images.append(img_arr)

train_X = np.array(train_images)

## Read in the val images
val_images = []
val_dir = './val/'
val_files = os.listdir(val_dir)
for f in val_files:
    img = Image.open(val_dir + f)
    img = img.resize(image_size)
    img_arr = np.array(img)
    val_images.append(img_arr)

val_X = np.array(val_images)

# Labels processing
train_labels = train_labels.reindex(train_files)
val_labels = val_labels.reindex(val_files)

label_transformer = LabelBinarizer()
train_y = label_transformer.fit_transform(train_labels)
val_y = label_transformer.transform(val_labels)


# Data Preprocessing
train_X = train_X.astype(np.float32)
val_X = val_X.astype(np.float32)

train_X /= 255.
val_X /= 255.

train_X = np.expand_dims(train_X, 3)
val_X = np.expand_dims(val_X, 3)

## Convolutional Neural Network Architecture

In [4]:
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.optimizers import SGD, adam

image_size = (96,96)
model = Sequential()
model.add(Conv2D(32, kernel_size=(3,3),strides=(1,1), padding="same",input_shape = (image_size[0],image_size[1],1)))
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3), strides=(1,1), padding="same"))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))


model.add(Conv2D(64, kernel_size=(3,3),strides=(1,1), padding="same"))
model.add(Activation('relu'))
model.add(Conv2D(64, (3, 3), strides=(1,1), padding="same"))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(1, activation="sigmoid"))

model.compile(loss="binary_crossentropy",
              optimizer= 'adam',
              metrics=['accuracy'])

batch_size = 8
epochs = 20

In [None]:
model.fit(train_X, train_y,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(val_X, val_y),
          )


Train on 1750 samples, validate on 437 samples
Epoch 1/20
 - 8s - loss: 0.6961 - acc: 0.5211 - val_loss: 0.6935 - val_acc: 0.5011
Epoch 2/20
 - 7s - loss: 0.6866 - acc: 0.5514 - val_loss: 0.6740 - val_acc: 0.6041
Epoch 3/20
 - 7s - loss: 0.6342 - acc: 0.6383 - val_loss: 0.6546 - val_acc: 0.6201
Epoch 4/20
 - 7s - loss: 0.6135 - acc: 0.6857 - val_loss: 0.6371 - val_acc: 0.6568
Epoch 5/20
 - 7s - loss: 0.5799 - acc: 0.7034 - val_loss: 0.6172 - val_acc: 0.6499
Epoch 6/20
 - 7s - loss: 0.5433 - acc: 0.7269 - val_loss: 0.6410 - val_acc: 0.6636
Epoch 7/20
 - 7s - loss: 0.5374 - acc: 0.7406 - val_loss: 0.5746 - val_acc: 0.6911
Epoch 8/20
 - 7s - loss: 0.5235 - acc: 0.7451 - val_loss: 0.6056 - val_acc: 0.6705
Epoch 9/20
 - 7s - loss: 0.4727 - acc: 0.7731 - val_loss: 0.5955 - val_acc: 0.7071
Epoch 10/20
 - 7s - loss: 0.4176 - acc: 0.8109 - val_loss: 0.7464 - val_acc: 0.7071
Epoch 11/20
 - 7s - loss: 0.3606 - acc: 0.8383 - val_loss: 0.6861 - val_acc: 0.7025
Epoch 12/20
 - 7s - loss: 0.3212 - acc: 0.8669 - val_loss: 0.7526 - val_acc: 0.7002
Epoch 13/20
 - 7s - loss: 0.2378 - acc: 0.9006 - val_loss: 0.9940 - val_acc: 0.6545
Epoch 14/20
 - 7s - loss: 0.1805 - acc: 0.9269 - val_loss: 1.1154 - val_acc: 0.6842
Epoch 15/20
 - 7s - loss: 0.1567 - acc: 0.9366 - val_loss: 1.0583 - val_acc: 0.6636
Epoch 16/20
 - 7s - loss: 0.1329 - acc: 0.9497 - val_loss: 1.1874 - val_acc: 0.6659
Epoch 17/20
 - 7s - loss: 0.0802 - acc: 0.9731 - val_loss: 1.3704 - val_acc: 0.7048
Epoch 18/20
 - 7s - loss: 0.0684 - acc: 0.9754 - val_loss: 1.8591 - val_acc: 0.6568
Epoch 19/20
 - 7s - loss: 0.0765 - acc: 0.9697 - val_loss: 1.5689 - val_acc: 0.6842
Epoch 20/20
 - 7s - loss: 0.0585 - acc: 0.9817 - val_loss: 1.6600 - val_acc: 0.6659


In [5]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_5 (Conv2D)            (None, 96, 96, 32)        320       
_________________________________________________________________
activation_5 (Activation)    (None, 96, 96, 32)        0         
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 96, 96, 32)        9248      
_________________________________________________________________
activation_6 (Activation)    (None, 96, 96, 32)        0         
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 48, 48, 32)        0         
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 48, 48, 64)        18496     
_________________________________________________________________
activation_7 (Activation)    (None, 48, 48, 64)        0         
__________

__Your Answer:__<br>
Model summary: _see the above cell_<br>
Binary loss in the training set: 0.0585<br>
Accuracy of the training set: 0.9817<br>
Binary loss in the validation set: __best__: 0.5746, __last__: 1.6600<br>
Accuracy of the validation set: __best__: 0.7071, __last__: 0.6659<br>
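
The training log shows the validation loss bottoming out early while the training loss keeps falling, which points to overfitting. The sketch below picks the best epochs from the logged validation values (copied from the output above). With Keras, the same lists would be available from the object returned by `model.fit` as `history.history['val_loss']` and `history.history['val_acc']`; that the run kept this object is an assumption.

```python
# Validation metrics copied from the 20-epoch training log
val_loss = [0.6935, 0.6740, 0.6546, 0.6371, 0.6172, 0.6410, 0.5746, 0.6056,
            0.5955, 0.7464, 0.6861, 0.7526, 0.9940, 1.1154, 1.0583, 1.1874,
            1.3704, 1.8591, 1.5689, 1.6600]
val_acc = [0.5011, 0.6041, 0.6201, 0.6568, 0.6499, 0.6636, 0.6911, 0.6705,
           0.7071, 0.7071, 0.7025, 0.7002, 0.6545, 0.6842, 0.6636, 0.6659,
           0.7048, 0.6568, 0.6842, 0.6659]

# Epochs are 1-indexed in the log; ties resolve to the earliest epoch
best_loss_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1
best_acc_epoch = max(range(len(val_acc)), key=val_acc.__getitem__) + 1

print(f"lowest val_loss {val_loss[best_loss_epoch - 1]} at epoch {best_loss_epoch}")
print(f"highest val_acc {val_acc[best_acc_epoch - 1]} at epoch {best_acc_epoch}")
```

Stopping near the epoch with the lowest validation loss, or saving the weights from that epoch with a checkpoint callback as done in Question 5, would avoid reporting the overfitted last-epoch model.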

## Question 4: Transfer learning: Using the VGGNet (16 layers) architecture (20 points)
For this question, we ask you to employ VGGNet, a convolutional neural network built for ImageNet, to classify the same subset of cardiomegaly images from normal ones (https://www.dropbox.com/sh/ojiw79q8786ua4x/AAAtaJVKEdv91Zybpi-fAfMsa?dl=0). We encourage you to take a look at the documentation for keras.applications (https://keras.io/applications/) and reuse their modules. Please DO NOT use any other images from NIH CXR8 or from other databases for this question. Although you should explore various combinations of hyperparameters, we will grade this question based on the accuracy of the implementation, not the performance of the network.

### Question 4.1 (10 points)
What is your best validation accuracy of fine-tuning a 16-layer VGGNet WITHOUT ImageNet weights? In your model with the lowest validation loss, what are the hyperparameters? What is the validation loss? What is the training loss/accuracy? Please include your code in the assignment submission.

In [2]:
## Your code goes here
from PIL import Image
import numpy as np
import pandas as pd
import os

from skimage.color import gray2rgb
from sklearn.preprocessing import LabelBinarizer
from keras.applications.vgg16 import VGG16, preprocess_input
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model

## Read in the labels
train_labels = pd.read_csv('train.csv?dl=1',  header=None, index_col=0)
val_labels = pd.read_csv('val.csv?dl=1',  header=None, index_col=0)

## We will resize the images to make them smaller
image_size = (224,224)

## Read in the training images
train_images = []
train_dir = './train/'
train_files = os.listdir(train_dir)
for f in train_files:
    img = Image.open(train_dir + f)
    img = img.resize(image_size)
    img_arr = np.array(img)
    train_images.append(img_arr)

train_X = np.array(train_images)

## Read in the val images
val_images = []
val_dir = './val/'
val_files = os.listdir(val_dir)
for f in val_files:
    img = Image.open(val_dir + f)
    img = img.resize(image_size)
    img_arr = np.array(img)
    val_images.append(img_arr)

val_X = np.array(val_images)


## Now we are going to reorder the labels just to make sure they line up with 
## with the order of the files which we just read in. Always better to be sure!
train_labels = train_labels.reindex(train_files)
val_labels = val_labels.reindex(val_files)
label_transformer = LabelBinarizer()
train_y = label_transformer.fit_transform(train_labels)
val_y = label_transformer.transform(val_labels)


train_X = gray2rgb(train_X)
val_X = gray2rgb(val_X)

train_X = train_X.astype(np.float32)
val_X = val_X.astype(np.float32)

train_X = preprocess_input(train_X)
val_X = preprocess_input(val_X)


In [None]:
base_model = VGG16(weights=None, include_top=False, input_shape=(224,224,3))

x = base_model.output
# add a 2D global average pooling layer
x = GlobalAveragePooling2D()(x)

# add a layer for binary classification
predictions = Dense(1, activation='sigmoid')(x)

# define the model to be trained
model = Model(inputs=base_model.input, outputs=predictions)

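# NOTE: freezing every VGG16 layer while weights=None leaves the convolutional
# base at its random initialization, so only the final Dense layer is trained.
for layer in base_model.layers:
    layer.trainable = False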
    
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])



Train on 1750 samples, validate on 437 samples
Epoch 1/10
 - 25s - loss: 0.6955 - acc: 0.4903 - val_loss: 0.6942 - val_acc: 0.4943
Epoch 2/10
 - 21s - loss: 0.6950 - acc: 0.4771 - val_loss: 0.6932 - val_acc: 0.4897
Epoch 3/10
 - 21s - loss: 0.6943 - acc: 0.4920 - val_loss: 0.6938 - val_acc: 0.5011
Epoch 4/10
 - 21s - loss: 0.6951 - acc: 0.5131 - val_loss: 0.6930 - val_acc: 0.5011
Epoch 5/10
 - 21s - loss: 0.6921 - acc: 0.5217 - val_loss: 0.7016 - val_acc: 0.4989
Epoch 6/10
 - 21s - loss: 0.6951 - acc: 0.5051 - val_loss: 0.6926 - val_acc: 0.5080
Epoch 7/10
 - 21s - loss: 0.6939 - acc: 0.5006 - val_loss: 0.6929 - val_acc: 0.5011
Epoch 8/10
 - 21s - loss: 0.6932 - acc: 0.4914 - val_loss: 0.6926 - val_acc: 0.5011
Epoch 9/10
 - 21s - loss: 0.6928 - acc: 0.4983 - val_loss: 0.6923 - val_acc: 0.5561
Epoch 10/10
 - 21s - loss: 0.6920 - acc: 0.5137 - val_loss: 0.6940 - val_acc: 0.4989


__Your Answer:__<br>
The loss function you used: binary_crossentropy<br>
Binary loss in the training set: 0.6920<br>
Accuracy of the training set: __best__: 0.5217, __last__: 0.5137<br>
Binary loss in the validation set: __best__: 0.6923, __last__: 0.6940<br>
Accuracy of the validation set: __best__: 0.5561, __last__: 0.4989<br>
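
A loss that stays near 0.693 is what binary cross-entropy yields when the model outputs a probability of 0.5 for every image, i.e. chance level, since -ln(0.5) = ln 2 ≈ 0.6931. A minimal check of that arithmetic (`binary_crossentropy` here is a plain-Python helper written for illustration, not the Keras function):

```python
import math


def binary_crossentropy(y_true, p):
    """Cross-entropy of a single prediction p for a binary label y_true (0 or 1)."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))


# A model that always outputs 0.5 gets the same loss for either label
chance_loss = binary_crossentropy(1, 0.5)
print(round(chance_loss, 4))  # 0.6931
```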

### Question 4.2 (10 points)
What is your best validation accuracy of fine-tuning a 16-layer VGGNet WITH ImageNet weights? In your model with the lowest validation loss, what are the hyperparameters? What is the validation loss? What is the training loss/accuracy? Please include your code in the assignment submission.

In [3]:
from keras.layers import Dropout  # used below; not imported in the Question 4.1 cell

base_model = VGG16(weights='imagenet', include_top=False, input_shape=(224,224,3))

x = base_model.output
# add a 2D global average pooling layer
x = GlobalAveragePooling2D()(x)

# add a layer for binary classification
x = Dense(128, activation = 'relu')(x)
x = Dropout(0.5)(x)
predictions = Dense(1, activation='sigmoid')(x)

# define the model to be trained
model = Model(inputs=base_model.input, outputs=predictions)

for layer in base_model.layers:
    layer.trainable = False
    
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
batch_size = 32
model.fit(train_X, train_y,
          batch_size=batch_size,
          epochs=10,
          verbose=1,
          validation_data=(val_X, val_y),
          )

Train on 1750 samples, validate on 437 samples
Epoch 1/10
 - 21s - loss: 1.1292 - acc: 0.5303 - val_loss: 0.9514 - val_acc: 0.5538
Epoch 2/10
 - 21s - loss: 0.8349 - acc: 0.5891 - val_loss: 0.8223 - val_acc: 0.6087
Epoch 3/10
 - 21s - loss: 0.7409 - acc: 0.6251 - val_loss: 0.7683 - val_acc: 0.6156
Epoch 4/10
 - 21s - loss: 0.6756 - acc: 0.6703 - val_loss: 0.7318 - val_acc: 0.6156
Epoch 5/10
 - 21s - loss: 0.6481 - acc: 0.6766 - val_loss: 0.7117 - val_acc: 0.6384
Epoch 6/10
 - 21s - loss: 0.6027 - acc: 0.7103 - val_loss: 0.6859 - val_acc: 0.6682
Epoch 7/10
 - 21s - loss: 0.5741 - acc: 0.7194 - val_loss: 0.6741 - val_acc: 0.6636
Epoch 8/10
 - 21s - loss: 0.5599 - acc: 0.7257 - val_loss: 0.6559 - val_acc: 0.6888
Epoch 9/10
 - 21s - loss: 0.5393 - acc: 0.7451 - val_loss: 0.6467 - val_acc: 0.6773
Epoch 10/10
 - 21s - loss: 0.5328 - acc: 0.7463 - val_loss: 0.6630 - val_acc: 0.6728


__Your Answer:__<br>
The loss function you used: binary_crossentropy<br>
Binary loss in the training set: 0.5328<br>
Accuracy of the training set: 0.7463<br>
Binary loss in the validation set: __best__: 0.6467, __last__: 0.6630<br>
Accuracy of the validation set: __best__: 0.6888, __last__: 0.6728<br>

## Question 5: Multiclass classification and the BMI707 Kaggle contest (30 points)
In this question, we will build multiclass classifiers to distinguish different types of lung diseases using the NIH CXR8 data.
Please download the training set and the validation set from the BMI707 Kaggle contest website (https://www.kaggle.com/c/bmi707-assignment-2-q5/data ). Please note that this dataset is different from the one we used in Question 3 and 4. Please DO NOT use any additional dataset (including those from NIH CXR8) to train or augment your models. Feel free to use any (ImageNet or any custom) architecture to classify all available classes. In your model with the lowest validation loss, what are the hyperparameters? What is the validation loss/accuracy? What is the training loss/accuracy? Please participate in the BMI707 internal Kaggle contest (https://www.kaggle.com/c/bmi707-assignment-2-q5) and compare your results with others there. An ensemble of models is allowed. The top 5 submissions with the highest accuracy on the private test set (testPrivate.tar) will receive bonus points. Please include your code in the assignment submission.

### Training

In [4]:
## Your code goes here
from PIL import Image
import numpy as np
import pandas as pd
import os
import h5py
import matplotlib.pyplot as plt

from sklearn.preprocessing import LabelBinarizer
from skimage.color import gray2rgb

from keras.preprocessing.image import ImageDataGenerator
from keras.applications.vgg16 import VGG16, preprocess_input
from keras.applications.vgg19 import VGG19
from keras.applications.inception_v3 import InceptionV3
from keras.applications.densenet import DenseNet121
from keras.callbacks import ModelCheckpoint
from keras.layers import Dense, GlobalAveragePooling2D, Dropout, Flatten
from keras.models import Model
from keras.optimizers import Adam, SGD, RMSprop
from keras import regularizers

# Data Loading ...
train_labels = pd.read_csv('train.csv',  header=None, index_col=0)

image_size = (224,224)
## Read in the training images
train_images = []
train_dir = './train/'
train_files = os.listdir(train_dir)
for f in train_files:
    img = Image.open(train_dir + f)
    img = img.resize(image_size)
    img_arr = np.array(img)
    train_images.append(img_arr)

train_X = np.array(train_images)

train_labels = train_labels.reindex(train_files)

label_transformer = LabelBinarizer()
train_y = label_transformer.fit_transform(train_labels)



# Data Preprocessing ...
train_X = train_X.astype(np.float32)
#train_X /= 255.
train_X = gray2rgb(train_X)


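# Data Augmentation
# NOTE: `augmentation`, `load_hdf5`, `trainablelayer`, and `experiment` are not
# defined in the code shown here; define them before running this cell.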
if augmentation:
    print("Data Augmentation...")

    for i in range(10):
        new_X = load_hdf5("augment_data/batch"+str(i)+"_aug_X.h5")
        new_y = load_hdf5("augment_data/batch"+str(i)+"_aug_y.h5")

        train_X = np.concatenate((train_X, new_X), axis = 0)
        train_y = np.concatenate((train_y, new_y), axis = 0)

    print("Finish augmentation")
    print("X training Shape: " + str(train_X.shape))
    print("y training Shape: " + str(train_y.shape))

train_X = preprocess_input(train_X)


# Fit the model

base_model = VGG16(weights='imagenet', include_top=False, input_shape = train_X.shape[1:])
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(256, activation = 'relu')(x)
x = Dropout(0.5)(x)
x = Dense(128, activation = 'relu')(x)

predictions = Dense(15, activation='softmax', kernel_regularizer=regularizers.l2(0.01))(x)
# define the model to be trained

model = Model(inputs=base_model.input, outputs=predictions)
print(model.summary())

if trainablelayer == 0:
    for layer in base_model.layers:
        layer.trainable = False
    
model.compile(optimizer= 'adam', loss='categorical_crossentropy', metrics=['accuracy'])
checkpointer = ModelCheckpoint( 'result/' + experiment + '_best_weights.h5', verbose=1, monitor='val_loss', mode='auto', save_best_only=True)

batch_size = 32
model.fit(train_X, train_y,
          batch_size=batch_size,
          epochs=epochs,
          verbose=2,
          validation_split = 0.2,
          callbacks=[checkpointer]
          )


### Testing

In [None]:
test_images = []
test_dir = './val/'
test_files = os.listdir(test_dir)
for f in test_files:
    img = Image.open(test_dir + f)
    img = img.resize(image_size)
    img_arr = np.array(img)
    test_images.append(img_arr)

test_X = np.array(test_images)

# Data Preprocessing ...
test_X = test_X.astype(np.float32)
#test_X /= 255.
test_X = gray2rgb(test_X)
test_X = preprocess_input(test_X)

prediction = model.predict(test_X)

write_hdf5(prediction,"val_prediction.h5")

# Reuse test_files so the IDs are in the same order as the predictions
sub = {"Id": test_files, "Category":prediction.argmax(1)}
val_submission = pd.DataFrame(sub, columns=sub.keys())


test_images = []
test_dir = './test/'
test_files = os.listdir(test_dir)
for f in test_files:
    img = Image.open(test_dir + f)
    img = img.resize(image_size)
    img_arr = np.array(img)
    test_images.append(img_arr)

test_X = np.array(test_images)

# Data Preprocessing ...
test_X = test_X.astype(np.float32)
#test_X /= 255.
test_X = gray2rgb(test_X)
test_X = preprocess_input(test_X)

prediction = model.predict(test_X)

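# Write to a separate file so the validation predictions are not overwritten
write_hdf5(prediction,"test_prediction.h5")

# Reuse test_files so the IDs are in the same order as the predictions
sub = {"Id": test_files, "Category":prediction.argmax(1)}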
test_submission = pd.DataFrame(sub, columns=sub.keys())

submission = pd.concat([val_submission,test_submission])
submission = submission.sort_values("Id")
submission.to_csv("submission.csv", index = False)

In [11]:
print(model.summary())

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_4 (InputLayer)         (None, 224, 224, 3)       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0         
__________

__Your Answer:__<br>
Summary of model(s): VGG16 architecture with global average pooling and three dense layers. _(Detailed model shown in the cell above)_<br>
The loss function you used: categorical_crossentropy<br>
Loss in the training set: 1.0089<br>
Accuracy of the training set: 0.7119<br>
Loss in the validation set: 1.1024<br>
Accuracy of the validation set: 0.6951<br>
__Remember to participate in the BMI707 internal Kaggle contest (https://www.kaggle.com/c/bmi707-assignment-2-q5)__

## Question 6: Limitations of the NIH CXR8 dataset and the road ahead (5 points)

### Question 6.1 (3 points)
Please list three limitations of models trained from this dataset. Word limit: 150 words.

__Your Answer:__

- 1. The labels are imbalanced. More than 60% of the labels belong to the 11th class, which is "no finding" in the dataset. The models therefore tend to predict this class for every image. Even if a model achieves about 70% accuracy, it can still perform poorly at detecting abnormal findings.

- 2. The model is a black box, so it is hard to know how it detects disease. For example, we cannot tell whether the model actually detects pneumothorax from the absent lung markings or simply from the presence of a chest tube.

- 3. There are only 15 classes in this dataset, but in real life there are many more conditions. A model trained on this dataset cannot identify diseases that are not represented in it.
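
To make the class-imbalance point concrete: when one class covers about 60% of the labels, a classifier that always predicts that class already reaches about 60% accuracy while detecting no abnormal finding at all. The toy sketch below uses made-up label counts (an assumption for illustration, not the dataset's real distribution) and two hypothetical helpers.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, cls):
    """Fraction of true `cls` cases that were predicted as `cls`."""
    relevant = [p for t, p in zip(y_true, y_pred) if t == cls]
    return sum(p == cls for p in relevant) / len(relevant)


# Made-up labels: 60 normal images and 40 abnormal ones
y_true = ["no_finding"] * 60 + ["abnormal"] * 40
# A degenerate model that always predicts the majority class
y_pred = ["no_finding"] * 100

baseline_accuracy = accuracy(y_true, y_pred)
abnormal_recall = recall(y_true, y_pred, "abnormal")
print(baseline_accuracy, abnormal_recall)  # 0.6 0.0
```

Reporting per-class recall alongside accuracy makes this failure mode visible.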

### Question 6.2 (2 points)
What are the potential roadblocks to implementing automated chest X-ray film reader in the clinical settings? Word limit: 100 words.

__Your Answer:__
In a clinical setting, we need a large amount of well-labeled data to train the model. In real life, however, the diagnosis may differ between radiologists. Depending on the data source, we could inadvertently build the biases of individual radiologists into the model. Even if we reach high accuracy, it would still be hard to reach consensus between radiologists and an artificial intelligence.

Also, it is difficult to obtain images of rare but potentially fatal diseases such as cancer, which makes it harder for the model to recognize these conditions. However, it is crucial for an automated CXR reader to detect such diseases so that the window for treatment is not missed.
