# ResNet50 Classification Models

We prepared 6 different models with ResNet50 but using different partitions and subgroups of the version DDSM data set previously specified. The models are:
- Model 0: Raw Pre-Processed DDSM Dataset (Baseline)
    - This uses all of the data in the version of the DDSM data set that we selected.
- Model 1: Cleaned Normal Class
    - As we see in the EDA tab, this removes noisy images from the baseline DDSM data.
- Model 2: Classification of Abnormalities (Classes 1 to 4)
    - This model removes the "normal" class images and strictly classifies the different types of abnormalities. The normal images are the "cleaned" ones from Model 1.
- Model 3: Normal vs Abnormal (Class 0 vs 1-4)
    - This model reduces all the abnormal images to one class and attempts to distinguish between the normal and abnormal images.
- Model 4: Benign vs Malignant Calcification
    - This model looks strictly at the original class 1 and class 3, which are benign and malignant calcifications.
- Model 5: Benign vs Malignant Mass
    - This model looks strictly at the original class 2 and 4, which are benign and malignant masses.
    
We evaluate each model on the training data, the RGB test data, and the grayscale test data. We originally were only using the RGB test data, but because we had such low test accuracy, we worried that the Keras ImageDataGenerator converted the image to RGB in a different way than tf.image.grayscale_to_rgb did, so we saved the images as grayscale and repeated the process. Everything is in this notebook, but note this was a classic "it's the data science process!" moment where we had to take several steps back and try again.

A summary of the training and test accuracies can be found at the conclusion.

In [2]:
'''IMPORT LIBRARIES'''
import requests
import os

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import shutil
# import cv2

from scipy.misc import imresize
from sklearn.model_selection import train_test_split

from keras.applications.resnet50 import ResNet50
from keras.callbacks import ModelCheckpoint, EarlyStopping
from keras.models import Sequential, Model, load_model
from keras.layers import Dropout, Flatten, Dense, Conv2D, MaxPooling2D
from keras.layers import Input, Reshape, UpSampling2D, InputLayer, Lambda, ZeroPadding2D
from keras.layers import Cropping2D, Conv2DTranspose, BatchNormalization, Activation, GlobalAveragePooling2D
from keras.utils import np_utils, to_categorical
from keras.losses import binary_crossentropy
from keras import backend as K,objectives
from keras.losses import mse, binary_crossentropy
from keras.layers.advanced_activations import LeakyReLU
from keras.optimizers import Adam, RMSprop
from keras.initializers import RandomNormal
from keras.preprocessing import image
import tensorflow as tf
from keras_preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
import random
import keras

from sklearn.utils.class_weight import compute_class_weight
from PIL import Image

from IPython.core.display import HTML
styles = requests.get("https://raw.githubusercontent.com/Harvard-IACS/2018-CS109A/master/content/styles/cs109.css").text
HTML(styles)

np.random.seed(42)

## Loading the Data

The data was prepared into csvs that were formated to have the file name and the class labels. The code for this can be found in the EDA tab.

The validation data for training is randomly sampled from the training dataset during model creation, since that is when the data generators are created.

In [3]:
train_df_model_0 = pd.read_csv('data/train_df_model_0.csv')
train_df_model_1 = pd.read_csv('data/train_df_model_1.csv')
train_df_model_2 = pd.read_csv('data/train_df_model_2.csv')
train_df_model_3 = pd.read_csv('data/train_df_model_3.csv')
train_df_model_4 = pd.read_csv('data/train_df_model_4.csv')
train_df_model_5 = pd.read_csv('data/train_df_model_5.csv')

In [4]:
test_df_model_0 = pd.read_csv("data/test_df_model_0.csv")
test_df_model_1 = pd.read_csv("data/test_df_model_1.csv")
test_df_model_2 = pd.read_csv("data/test_df_model_2.csv")
test_df_model_3 = pd.read_csv("data/test_df_model_3.csv")
test_df_model_4 = pd.read_csv("data/test_df_model_4.csv")
test_df_model_5 = pd.read_csv("data/test_df_model_5.csv")
test_dfs = [test_df_model_0, test_df_model_1, test_df_model_2, test_df_model_3, test_df_model_4, test_df_model_5]

## Setting Up the Models

We will use transfer learning on the pre-trained ResNet network. Since we will train several models, the code to do so has been consolidated here.

We define several hyper-parameters for each model, and define functions to build the data generators and the model, and to evaluate them.

In [5]:
'''HYPER-PARAMETERS'''
#Image related parameters
H = 299
W = 299
n_channels = 3

#Optimization related parameters
batch_size_train = 32
batch_size_test  = 1

#Model related parameters
model0_epochs = 5
model1_epochs = 5
model2_epochs = 15
model3_epochs = 5
model4_epochs = 15
model5_epochs = 10

model0_classes = 5
model1_classes = 5
model2_classes = 4
model3_classes = 2
model4_classes = 2
model5_classes = 2

In [6]:
'''Build the model and DataGenerators.'''
def build_model(n_classes,df,x='filename',y='y', bs_train = 32, lr = 0.0001,H = H,W = W, n_channels = 3):
    #Data generator
    train_df, val_df = train_test_split(df, test_size=0.2, random_state=42, stratify=df[y])
    val_df.reset_index(inplace=True)
    val_df.drop(['index'], axis=1, inplace=True)
    train_datagen = ImageDataGenerator(
            rescale=1./255)
    val_datagen = ImageDataGenerator(rescale=1./255)
    
    train_generator = train_datagen.flow_from_dataframe(
        directory='images',
        dataframe=train_df,
        x_col=x,
        y_col=y,
        # width by height only, not channels
        target_size=(H, W),
        color_mode="rgb",
        batch_size=bs_train,
        class_mode="categorical",
        shuffle=True,
        seed=42
    )
    
    val_generator = val_datagen.flow_from_dataframe(
        directory='images',
        dataframe=val_df,
        x_col=x,
        y_col=y,
        # width by height only, not channels
        target_size=(H, W),
        color_mode="rgb",
        batch_size=bs_train,
        class_mode="categorical",
        shuffle=True,
        seed=42
    )
    #Class weight
    all_classes = df[y].unique()
    class_weights = compute_class_weight(class_weight='balanced', classes=all_classes, y=train_df[y])
    #Model architecture
    inp = Input(shape = (H,W,n_channels))

    model = ResNet50(input_shape=(H,W,n_channels), include_top=False, weights='imagenet')
    x1 = model(inp)
    x2 = GlobalAveragePooling2D()(x1)
    out = Dense(n_classes, activation='softmax')(x2)

    model = Model(inputs = inp, outputs = out)
    optimizer = Adam(lr=lr)
    model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
    
    STEP_SIZE_TRAIN = train_generator.n//train_generator.batch_size
    STEP_SIZE_VAL = val_generator.n//val_generator.batch_size
        
    return model, train_generator, val_generator, class_weights, STEP_SIZE_TRAIN, STEP_SIZE_VAL

In [7]:
'''Build test DataGenerators for the RGB data.'''
test_gens = {}
for i, df in zip(range(6), test_dfs):
    test_datagen = ImageDataGenerator(rescale=1./255)
    
    test_generator = test_datagen.flow_from_dataframe(
        directory='test_images/',
        dataframe=df,
        x_col='filename',
        y_col='y',
        # width by height only, not channels
        target_size=(H, W),
        color_mode="rgb",
        batch_size=batch_size_test,
        class_mode="categorical",
        shuffle=True,
        seed=42
    )
    test_gens[i] = test_generator

Found 15364 images belonging to 5 classes.
Found 15364 images belonging to 5 classes.
Found 2004 images belonging to 4 classes.
Found 15364 images belonging to 2 classes.
Found 927 images belonging to 2 classes.
Found 1077 images belonging to 2 classes.


In [8]:
'''Build test DataGenerators for the grayscale data.'''
test_gens_gray = {}
for i, df in zip(range(6), test_dfs):
    test_datagen = ImageDataGenerator(rescale=1./255)
    
    test_generator = test_datagen.flow_from_dataframe(
        directory='test_images_grayscale/',
        dataframe=df,
        x_col='filename',
        y_col='y',
        # width by height only, not channels
        target_size=(H, W),
        color_mode="rgb",
        batch_size=batch_size_test,
        class_mode="categorical",
        shuffle=True,
        seed=42
    )
    test_gens_gray[i] = test_generator

Found 15364 images belonging to 5 classes.
Found 15364 images belonging to 5 classes.
Found 2004 images belonging to 4 classes.
Found 15364 images belonging to 2 classes.
Found 927 images belonging to 2 classes.
Found 1077 images belonging to 2 classes.


In [9]:
'''Evaluate a given model on test and training data.'''
def evaluate_train_test(model, train_gen=None, step_size_train=None, test_gen=None):
    train_results = None
    if train_gen:
        train_results = model.evaluate_generator(train_gen, steps=step_size_train)
    test_results = None
    if test_gen:
        test_results = model.evaluate_generator(test_gen, test_gen.n)
    return train_results, test_results

In [10]:
'''Format evaluation metrics from both a best weights model and a final model.'''
def pretty_metrics(model_id, model_train, model_test, model_best_weights_train, model_best_weights_test):
    if model_train and model_test and model_best_weights_train and model_best_weights_test:
        m_train_loss, m_train_acc = model_train
        m_test_loss, m_test_acc = model_test
        bw_train_loss, bw_train_acc = model_best_weights_train
        bw_test_loss, bw_test_acc = model_best_weights_test
        results = pd.DataFrame()
        results['Model'] = ['Model {}'.format(model_id), 'Best Weights Model {}'.format(model_id)]
        results['training loss'] = [m_train_loss, bw_train_loss]
        results['training acc'] = [m_train_acc, bw_train_acc]
        results['test loss'] = [m_test_loss, bw_test_loss]
        results['test acc'] = [m_test_acc, bw_test_acc]
        return results
    return None

## Model Training

### Model 0: Raw Pre-Processed DDSM Dataset (Baseline)

The baseline model classifies the images from the pre-processed DDSM dataset into the following $5$ classes:

- $0$: Normal
- $1$: Benign Calcification
- $2$: Benign Mass
- $3$: Malignant Calcification
- $4$: Malignant Mass

In [20]:
model_0, train_generator_0, val_generator_0, class_weights_0, STEP_SIZE_TRAIN_0, STEP_SIZE_VAL_0 = \
    build_model(model0_classes,
                train_df_model_0,
                x='filename',
                y='y', 
                bs_train = 32, 
                lr = 0.0001,
                H = H,
                W = W,
                n_channels = 3)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  errors=errors)


Found 44708 images belonging to 5 classes.
Found 11177 images belonging to 5 classes.




In [21]:
%%time
filepath_0="models/model0_best_weights.h5"
checkpoint_0 = ModelCheckpoint(filepath_0, monitor='val_acc', verbose=1, save_best_only=True, mode='max')
callbacks_list_0 = [checkpoint_0]

model_0.fit_generator(generator=train_generator_0,
                    class_weight = class_weights_0,
                    steps_per_epoch=STEP_SIZE_TRAIN_0,
                    validation_data = val_generator_0,
                    validation_steps = STEP_SIZE_VAL_0,
                    epochs=model0_epochs,
                    callbacks=callbacks_list_0

)

Epoch 1/5

Epoch 00001: val_acc improved from -inf to 0.45335, saving model to models/model0_best_weights.h5
Epoch 2/5

Epoch 00002: val_acc did not improve from 0.45335
Epoch 3/5

Epoch 00003: val_acc improved from 0.45335 to 0.47483, saving model to models/model0_best_weights.h5
Epoch 4/5

Epoch 00004: val_acc improved from 0.47483 to 0.84450, saving model to models/model0_best_weights.h5
Epoch 5/5

Epoch 00005: val_acc did not improve from 0.84450
CPU times: user 2h 32min 14s, sys: 45min 44s, total: 3h 17min 58s
Wall time: 3h 16min 35s


In [22]:
model_0.save('models/model_0.h5')
model_0.save_weights('models/model_0_weights.h5')

In [11]:
%%time
model_0 = load_model('models/model_0.h5')
model_0_best_weights = load_model("models/model0_best_weights.h5")

CPU times: user 1min 9s, sys: 1.86 s, total: 1min 11s
Wall time: 1min 18s


#### Model 0 Evaluation

First we evaluate the models with the RGB test data.

In [23]:
%%time
model_0_train_metrics, model_0_test_metrics = evaluate_train_test(model_0, 
                                                                  train_generator_0, 
                                                                  STEP_SIZE_TRAIN_0, 
                                                                  test_gens[0])

CPU times: user 36min 20s, sys: 5min 29s, total: 41min 49s
Wall time: 21min 36s


In [24]:
%%time
model_0_bw_train_metrics, model_0_bw_test_metrics = evaluate_train_test(model_0_best_weights, 
                                                                        train_generator_0, 
                                                                        STEP_SIZE_TRAIN_0, 
                                                                        test_gens[0])

CPU times: user 36min 22s, sys: 5min 32s, total: 41min 55s
Wall time: 21min 45s


In [25]:
model_0_results = pretty_metrics(0, model_0_train_metrics, 
                                 model_0_test_metrics, 
                                 model_0_bw_train_metrics, 
                                 model_0_bw_test_metrics)
display(model_0_results)

Unnamed: 0,Model,training loss,training acc,test loss,test acc
0,Model 0,1.74136,0.534,3.356003,0.407706
1,Best Weights Model 0,0.463247,0.862118,1.611957,0.708539


Now we evaluate the model only on the grayscale test data. We see that the results are identical to the results for the RGB test data, so we only do this as a formality for the remaining models.

In [12]:
%%time
_ , model_0_gray_test = evaluate_train_test(model_0, test_gen=test_gens_gray[0])
_ , bw_model_0_gray_test = evaluate_train_test(model_0_best_weights, test_gen=test_gens_gray[0])

CPU times: user 36min 10s, sys: 4min 16s, total: 40min 26s
Wall time: 18min 42s


In [13]:
pd.DataFrame({'metric': ['test loss', 'test acc'],
              'Model 0':model_0_gray_test, 
              'Model 0 Best Weights':bw_model_0_gray_test})

Unnamed: 0,metric,Model 0,Model 0 Best Weights
0,test loss,3.356003,1.611957
1,test acc,0.407706,0.708539


### Model  1: Cleaned Normal Class

In [13]:
model_1, train_generator_1, val_generator_1, class_weights_1, STEP_SIZE_TRAIN_1, STEP_SIZE_VAL_1 = \
    build_model(model1_classes,
                train_df_model_1,
                x='filename',
                y='y', 
                bs_train = 32, 
                lr = 1.1111,
                H = H,
                W = W,
                n_channels = 3)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  errors=errors)


Found 42261 images belonging to 5 classes.
Found 10566 images belonging to 5 classes.




In [16]:
%%time
filepath_1="models/model1_best_weights.h5"
checkpoint_1 = ModelCheckpoint(filepath_1, monitor='val_acc', verbose=1, save_best_only=True, mode='max')
callbacks_list_1 = [checkpoint_1]

model_1.fit_generator(generator=train_generator_1,
                    class_weight = class_weights_1,
                    steps_per_epoch=STEP_SIZE_TRAIN_1,
                    validation_data = val_generator_1,
                    validation_steps = STEP_SIZE_VAL_1,
                    epochs=model1_epochs,
                    callbacks=callbacks_list_1
)

Epoch 1/5

Epoch 00001: val_acc improved from -inf to 0.86264, saving model to models/model1_best_weights.h5
Epoch 2/5

Epoch 00002: val_acc did not improve from 0.86264
Epoch 3/5

Epoch 00003: val_acc did not improve from 0.86264
Epoch 4/5

Epoch 00004: val_acc did not improve from 0.86264
Epoch 5/5

Epoch 00005: val_acc did not improve from 0.86264
CPU times: user 1h 52min 4s, sys: 1h 9min 19s, total: 3h 1min 23s
Wall time: 2h 59min 14s


In [22]:
model_1.save('models/model_1.h5')
model_1.save_weights('models/model_1_weights.h5')

In [14]:
model_1 = load_model('models/model_1.h5')
model_1_best_weights = load_model("models/model1_best_weights.h5")

#### Model Evaluation
First we evaluate the models with the RGB test data.

In [18]:
%%time
model_1_train_metrics, model_1_test_metrics = evaluate_train_test(model_1, 
                                                                  train_generator_1, 
                                                                  STEP_SIZE_TRAIN_1, 
                                                                  test_gens[1])

CPU times: user 34min 3s, sys: 5min 13s, total: 39min 17s
Wall time: 20min 12s


In [23]:
%%time
model_1_bw_train_metrics, model_1_bw_test_metrics = evaluate_train_test(model_1_best_weights, 
                                                                        train_generator_1, 
                                                                        STEP_SIZE_TRAIN_1, 
                                                                        test_gens[1])

CPU times: user 34min 39s, sys: 5min 5s, total: 39min 45s
Wall time: 20min 30s


In [24]:
model_1_results = pretty_metrics(1, model_1_train_metrics, 
                                 model_1_test_metrics, 
                                 model_1_bw_train_metrics, 
                                 model_1_bw_test_metrics)
display(model_1_results)

Unnamed: 0,Model,training loss,training acc,test loss,test acc
0,Model 1,2.228266,0.861754,2.10236,0.869565
1,Best Weights Model 1,2.222922,0.862085,2.101311,0.86963


Now we evaluate the model only on the grayscale test data. Again, the results are identical to the RGB image results.

In [15]:
%%time
_ , model_1_gray_test = evaluate_train_test(model_1, test_gen=test_gens_gray[1])
_ , bw_model_1_gray_test = evaluate_train_test(model_1_best_weights, test_gen=test_gens_gray[1])

CPU times: user 39min 5s, sys: 4min 16s, total: 43min 22s
Wall time: 20min 15s


In [16]:
pd.DataFrame({'metric': ['test loss', 'test acc'],
              'Model 1':model_1_gray_test, 
              'Model 1 Best Weights':bw_model_1_gray_test})

Unnamed: 0,metric,Model 1,Model 1 Best Weights
0,test loss,2.10236,2.101311
1,test acc,0.869565,0.86963


### Model 2: Classification of Abnormalities (Classes 1 to 4)

In [26]:
model_2, train_generator_2, val_generator_2, class_weights_2, STEP_SIZE_TRAIN_2, STEP_SIZE_VAL_2 = \
    build_model(model2_classes,
                train_df_model_2,
                x='filename',
                y='y', 
                bs_train = 32, 
                lr = 0.0001,
                H = H,
                W = W, 
                n_channels = 3)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  errors=errors)


Found 5831 images belonging to 4 classes.
Found 1458 images belonging to 4 classes.




In [27]:
%%time
filepath_2="models/model2_best_weights.h5"
checkpoint_2 = ModelCheckpoint(filepath_2, monitor='val_acc', verbose=1, save_best_only=True, mode='max')
callbacks_list_2 = [checkpoint_2]

model_2.fit_generator(generator=train_generator_2,
                    class_weight = class_weights_2,
                    steps_per_epoch=STEP_SIZE_TRAIN_2,
                    validation_data = val_generator_2,
                    validation_steps = STEP_SIZE_VAL_2,
                    epochs=model2_epochs,
                    callbacks=callbacks_list_2

)

Epoch 1/15

Epoch 00001: val_acc improved from -inf to 0.34236, saving model to models/model2_best_weights.h5
Epoch 2/15

Epoch 00002: val_acc did not improve from 0.34236
Epoch 3/15

Epoch 00003: val_acc did not improve from 0.34236
Epoch 4/15

Epoch 00004: val_acc improved from 0.34236 to 0.38149, saving model to models/model2_best_weights.h5
Epoch 5/15

Epoch 00005: val_acc did not improve from 0.38149
Epoch 6/15

Epoch 00006: val_acc did not improve from 0.38149
Epoch 7/15

Epoch 00007: val_acc did not improve from 0.38149
Epoch 8/15

Epoch 00008: val_acc improved from 0.38149 to 0.39271, saving model to models/model2_best_weights.h5
Epoch 9/15

Epoch 00009: val_acc improved from 0.39271 to 0.43268, saving model to models/model2_best_weights.h5
Epoch 10/15

Epoch 00010: val_acc did not improve from 0.43268
Epoch 11/15

Epoch 00011: val_acc did not improve from 0.43268
Epoch 12/15

Epoch 00012: val_acc did not improve from 0.43268
Epoch 13/15

Epoch 00013: val_acc did not improve fr

In [28]:
model_2.save('models/model_2.h5')
model_2.save_weights('models/model_2_weights.h5')

In [17]:
model_2 = load_model('models/model_2.h5')
model_2_best_weights = load_model("models/model2_best_weights.h5")

#### Model 2 Evaluation

In [29]:
%%time
model_2_train_metrics, model_2_test_metrics = evaluate_train_test(model_2, 
                                                                  train_generator_2, 
                                                                  STEP_SIZE_TRAIN_2, 
                                                                  test_gens[2])

CPU times: user 4min 55s, sys: 42.3 s, total: 5min 37s
Wall time: 2min 53s


In [30]:
%%time
model_2_bw_train_metrics, model_2_bw_test_metrics = evaluate_train_test(model_2_best_weights, 
                                                                        train_generator_2, 
                                                                        STEP_SIZE_TRAIN_2, 
                                                                        test_gens[2])

CPU times: user 5min 6s, sys: 43.2 s, total: 5min 49s
Wall time: 3min 4s


In [31]:
model_2_results = pretty_metrics(2, model_2_train_metrics, 
                                 model_2_test_metrics, 
                                 model_2_bw_train_metrics, 
                                 model_2_bw_test_metrics)
display(model_2_results)

Unnamed: 0,Model,training loss,training acc,test loss,test acc
0,Model 2,1.448646,0.55682,2.91726,0.258982
1,Best Weights Model 2,1.583237,0.550785,2.279989,0.265469


Now we evaluate the model only on the grayscale test data. Again, the results are identical to the RGB image results.

In [18]:
%%time
_ , model_2_gray_test = evaluate_train_test(model_2, test_gen=test_gens_gray[2])
_ , bw_model_2_gray_test = evaluate_train_test(model_2_best_weights, test_gen=test_gens_gray[2])

CPU times: user 5min 54s, sys: 33.1 s, total: 6min 27s
Wall time: 3min 15s


In [19]:
pd.DataFrame({'metric': ['test loss', 'test acc'],
              'Model 2':model_2_gray_test, 
              'Model 2 Best Weights':bw_model_2_gray_test})

Unnamed: 0,metric,Model 2,Model 2 Best Weights
0,test loss,2.91726,2.279989
1,test acc,0.258982,0.265469


### Model 3: Normal vs Abnormal (Class 0 vs 1-4)

In [35]:
model_3, train_generator_3, val_generator_3, class_weights_3, STEP_SIZE_TRAIN_3, STEP_SIZE_VAL_3 = \
    build_model(model3_classes,
                train_df_model_3,
                x='filename',
                y='y', 
                bs_train = 32, 
                lr = 0.0001,
                H = H,
                W = W, 
                n_channels = 3)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  errors=errors)


Found 42261 images belonging to 2 classes.
Found 10566 images belonging to 2 classes.




In [36]:
%%time
filepath_3="models/model3_best_weights.h5"
checkpoint_3 = ModelCheckpoint(filepath_3, monitor='val_acc', verbose=1, save_best_only=True, mode='max')
callbacks_list_3 = [checkpoint_3]

model_3.fit_generator(generator=train_generator_3,
                    class_weight = class_weights_3,
                    steps_per_epoch=STEP_SIZE_TRAIN_3,
                    validation_data = val_generator_3,
                    validation_steps = STEP_SIZE_VAL_3,
                    epochs=model3_epochs,
                    callbacks=callbacks_list_3
)

Epoch 1/5

Epoch 00001: val_acc improved from -inf to 0.72992, saving model to models/model3_best_weights.h5
Epoch 2/5

Epoch 00002: val_acc did not improve from 0.72992
Epoch 3/5

Epoch 00003: val_acc improved from 0.72992 to 0.92928, saving model to models/model3_best_weights.h5
Epoch 4/5

Epoch 00004: val_acc did not improve from 0.92928
Epoch 5/5

Epoch 00005: val_acc improved from 0.92928 to 0.96393, saving model to models/model3_best_weights.h5
CPU times: user 1h 55min, sys: 1h 7min 33s, total: 3h 2min 34s
Wall time: 2h 58min 48s


In [37]:
model_3.save('models/model_3.h5')
model_3.save_weights('models/model_3_weights.h5')

In [20]:
model_3 = load_model('models/model_3.h5')
model_3_best_weights = load_model("models/model3_best_weights.h5")

#### Model 3 Evaluation

In [38]:
%%time
model_3_train_metrics, model_3_test_metrics = evaluate_train_test(model_3, 
                                                                  train_generator_3, 
                                                                  STEP_SIZE_TRAIN_3, 
                                                                  test_gens[3])

CPU times: user 39min 57s, sys: 4min 39s, total: 44min 37s
Wall time: 22min 40s


In [39]:
%%time
model_3_bw_train_metrics, model_3_bw_test_metrics = evaluate_train_test(model_3_best_weights, 
                                                                        train_generator_3, 
                                                                        STEP_SIZE_TRAIN_3, 
                                                                        test_gens[3])

CPU times: user 39min 53s, sys: 4min 45s, total: 44min 38s
Wall time: 22min 55s


In [40]:
model_3_results = pretty_metrics(3, model_3_train_metrics, 
                                 model_3_test_metrics, 
                                 model_3_bw_train_metrics, 
                                 model_3_bw_test_metrics)
display(model_3_results)

Unnamed: 0,Model,training loss,training acc,test loss,test acc
0,Model 3,0.065287,0.974922,2.337368,0.752603
1,Best Weights Model 3,0.065781,0.974804,2.338943,0.752473


Now we evaluate the model only on the grayscale test data. Again, the results are identical to the RGB image results.

In [21]:
%%time
_ , model_3_gray_test = evaluate_train_test(model_3, test_gen=test_gens_gray[3])
_ , bw_model_3_gray_test = evaluate_train_test(model_3_best_weights, test_gen=test_gens_gray[3])

CPU times: user 44min 53s, sys: 4min 15s, total: 49min 9s
Wall time: 23min 26s


In [22]:
pd.DataFrame({'metric': ['test loss', 'test acc'],
              'Model 3':model_3_gray_test, 
              'Model 3 Best Weights':bw_model_3_gray_test})

Unnamed: 0,metric,Model 3,Model 3 Best Weights
0,test loss,2.337312,2.338439
1,test acc,0.752669,0.752603


### Model 4: Benign vs Malignant Calcification

In [35]:
model_4, train_generator_4, val_generator_4, class_weights_4, STEP_SIZE_TRAIN_4, STEP_SIZE_VAL_4 = \
    build_model(model4_classes,
                train_df_model_4,
                x='filename',
                y='y', 
                bs_train = 32, 
                lr = 0.0001,
                H = H,
                W = W, 
                n_channels = 3)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  errors=errors)


Found 2852 images belonging to 2 classes.
Found 714 images belonging to 2 classes.




In [36]:
%%time
filepath_4="models/model4_best_weights.h5"
checkpoint_4 = ModelCheckpoint(filepath_4, monitor='val_acc', verbose=1, save_best_only=True, mode='max')
callbacks_list_4 = [checkpoint_4]

model_4.fit_generator(generator=train_generator_4,
                    class_weight = class_weights_4,
                    steps_per_epoch=STEP_SIZE_TRAIN_4,
                    validation_data = val_generator_4,
                    validation_steps = STEP_SIZE_VAL_4,
                    epochs=model4_epochs,
                    callbacks=callbacks_list_4
)

Epoch 1/15

Epoch 00001: val_acc improved from -inf to 0.61506, saving model to models/model4_best_weights.h5
Epoch 2/15

Epoch 00002: val_acc did not improve from 0.61506
Epoch 3/15

Epoch 00003: val_acc did not improve from 0.61506
Epoch 4/15

Epoch 00004: val_acc improved from 0.61506 to 0.62317, saving model to models/model4_best_weights.h5
Epoch 5/15

Epoch 00005: val_acc improved from 0.62317 to 0.62317, saving model to models/model4_best_weights.h5
Epoch 6/15

Epoch 00006: val_acc did not improve from 0.62317
Epoch 7/15

Epoch 00007: val_acc improved from 0.62317 to 0.65396, saving model to models/model4_best_weights.h5
Epoch 8/15

Epoch 00008: val_acc improved from 0.65396 to 0.70381, saving model to models/model4_best_weights.h5
Epoch 9/15

Epoch 00009: val_acc did not improve from 0.70381
Epoch 10/15

Epoch 00010: val_acc did not improve from 0.70381
Epoch 11/15

Epoch 00011: val_acc did not improve from 0.70381
Epoch 12/15

Epoch 00012: val_acc did not improve from 0.70381
E

In [37]:
model_4.save('models/model_4.h5')
model_4.save_weights('models/model_4_weights.h5')


In [23]:
model_4 = load_model('models/model_4.h5')
model_4_best_weights = load_model("models/model4_best_weights.h5")

#### Model 4 Evaluation

In [38]:
%%time
model_4_train_metrics, model_4_test_metrics = evaluate_train_test(model_4, 
                                                                  train_generator_4, 
                                                                  STEP_SIZE_TRAIN_4, 
                                                                  test_gens[4])

CPU times: user 2min 33s, sys: 19.3 s, total: 2min 52s
Wall time: 1min 28s


In [39]:
%%time
model_4_bw_train_metrics, model_4_bw_test_metrics = evaluate_train_test(model_4_best_weights, 
                                                                        train_generator_4, 
                                                                        STEP_SIZE_TRAIN_4, 
                                                                        test_gens[4])

CPU times: user 2min 51s, sys: 20 s, total: 3min 11s
Wall time: 1min 48s


In [40]:
model_4_results = pretty_metrics(4, model_4_train_metrics, 
                                 model_4_test_metrics, 
                                 model_4_bw_train_metrics, 
                                 model_4_bw_test_metrics)
display(model_4_results)

Unnamed: 0,Model,training loss,training acc,test loss,test acc
0,Model 4,0.885222,0.637234,1.321746,0.464941
1,Best Weights Model 4,0.477532,0.813121,1.70904,0.521036


Now we evaluate the model only on the grayscale test data. Again, the results are identical to the RGB image results.

In [24]:
%%time
_ , model_4_gray_test = evaluate_train_test(model_4, test_gen=test_gens_gray[4])
_ , bw_model_4_gray_test = evaluate_train_test(model_4_best_weights, test_gen=test_gens_gray[4])

CPU times: user 3min 56s, sys: 15.3 s, total: 4min 11s
Wall time: 2min 34s


In [25]:
pd.DataFrame({'metric': ['test loss', 'test acc'],
              'Model 4':model_4_gray_test, 
              'Model 4 Best Weights':bw_model_4_gray_test})

Unnamed: 0,metric,Model 4,Model 4 Best Weights
0,test loss,1.321746,1.70904
1,test acc,0.464941,0.521036


### Model 5: Benign vs Malignant Mass

In [41]:
model_5, train_generator_5, val_generator_5, class_weights_5, STEP_SIZE_TRAIN_5, STEP_SIZE_VAL_5 = \
    build_model(model5_classes,
                train_df_model_5,
                x='filename',
                y='y', 
                bs_train = 32, 
                lr = 0.0001,
                H = H,
                W = W, 
                n_channels = 3)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  errors=errors)


Found 2978 images belonging to 2 classes.
Found 745 images belonging to 2 classes.




In [42]:
%%time
filepath_5="models/model5_best_weights.h5"
checkpoint_5 = ModelCheckpoint(filepath_5, monitor='val_acc', verbose=1, save_best_only=True, mode='max')
callbacks_list_5 = [checkpoint_5]

model_5.fit_generator(generator=train_generator_5,
                    class_weight = class_weights_5,
                    steps_per_epoch=STEP_SIZE_TRAIN_5,
                    validation_data = val_generator_5,
                    validation_steps = STEP_SIZE_VAL_5,
                    epochs=model5_epochs,
                    callbacks=callbacks_list_5

)

Epoch 1/10

Epoch 00001: val_acc improved from -inf to 0.51630, saving model to models/model5_best_weights.h5
Epoch 2/10

Epoch 00002: val_acc improved from 0.51630 to 0.64797, saving model to models/model5_best_weights.h5
Epoch 3/10

Epoch 00003: val_acc improved from 0.64797 to 0.68864, saving model to models/model5_best_weights.h5
Epoch 4/10

Epoch 00004: val_acc did not improve from 0.68864
Epoch 5/10

Epoch 00005: val_acc improved from 0.68864 to 0.71950, saving model to models/model5_best_weights.h5
Epoch 6/10

Epoch 00006: val_acc did not improve from 0.71950
Epoch 7/10

Epoch 00007: val_acc did not improve from 0.71950
Epoch 8/10

Epoch 00008: val_acc did not improve from 0.71950
Epoch 9/10

Epoch 00009: val_acc did not improve from 0.71950
Epoch 10/10

Epoch 00010: val_acc improved from 0.71950 to 0.73773, saving model to models/model5_best_weights.h5
CPU times: user 21min 52s, sys: 5min 50s, total: 27min 43s
Wall time: 27min 37s


In [43]:
model_5.save('models/model_5.h5')
model_5.save_weights('models/model_5_weights.h5')

In [26]:
filepath_5="models/model5_best_weights.h5"
model_5 = load_model('models/model_5.h5')
model_5_best_weights = load_model(filepath_5)

#### Model 5 Evaluation

In [44]:
%%time
model_5_train_metrics, model_5_test_metrics = evaluate_train_test(model_5, 
                                                                  train_generator_5, 
                                                                  STEP_SIZE_TRAIN_5, 
                                                                  test_gens[5])

CPU times: user 2min 57s, sys: 20.2 s, total: 3min 17s
Wall time: 1min 41s


In [45]:
%%time
model_5_bw_train_metrics, model_5_bw_test_metrics = evaluate_train_test(model_5_best_weights, 
                                                                        train_generator_5, 
                                                                        STEP_SIZE_TRAIN_5, 
                                                                        test_gens[5])

CPU times: user 3min 25s, sys: 21.2 s, total: 3min 46s
Wall time: 2min 10s


In [46]:
model_5_results = pretty_metrics(5, model_5_train_metrics, 
                                 model_5_test_metrics, 
                                 model_5_bw_train_metrics, 
                                 model_5_bw_test_metrics)
display(model_5_results)

Unnamed: 0,Model,training loss,training acc,test loss,test acc
0,Model 5,0.274897,0.897828,1.513099,0.517177
1,Best Weights Model 5,0.272092,0.899185,1.513085,0.517177


Now we evaluate the model only on the grayscale test data. Again, the results are identical to the RGB image results.

In [27]:
%%time
_ , model_5_gray_test = evaluate_train_test(model_5, test_gen=test_gens_gray[5])
_ , bw_model_5_gray_test = evaluate_train_test(model_5_best_weights, test_gen=test_gens_gray[5])

CPU times: user 5min 7s, sys: 17.4 s, total: 5min 25s
Wall time: 3min 23s


In [28]:
pd.DataFrame({'metric': ['test loss', 'test acc'],
              'Model 5':model_5_gray_test, 
              'Model 5 Best Weights':bw_model_5_gray_test})

Unnamed: 0,metric,Model 5,Model 5 Best Weights
0,test loss,1.513099,1.513085
1,test acc,0.517177,0.517177


## Summary of Results

because the RGB and grayscale test data results were essentially the same, we simply show the test results from the RGB data.

In [49]:
all_results = pd.concat([model_0_results, 
                         model_1_results, model_2_results, model_3_results, model_4_results, model_5_results])
display(all_results_df)

Unnamed: 0,Model,training loss,training acc,test loss,test acc
0,Model 0,1.74136,0.534,3.356003,0.407706
1,Best Weights Model 0,0.463247,0.862118,1.611957,0.708539
2,Model 1,2.228266,0.861754,2.10236,0.869565
3,Model 1 Best Weights,2.222922,0.862085,2.101311,0.86963
4,Model 2,1.448646,0.55682,2.91726,0.258982
5,Best Weights Model 2,1.583237,0.550785,2.279989,0.265469
6,Model 3,0.065287,0.974922,2.337368,0.752603
7,Best Weights Model 3,0.065781,0.974804,2.338943,0.752473
8,Model 4,0.885222,0.637234,1.321746,0.464941
9,Best Weights Model 4,0.477532,0.813121,1.70904,0.521036


## Conclusion

There results were not ideal because the accuracies are not comparable to what we've seen in the literature review. The fact that the validation accuracy and the test accuracy are not especially close is worrisome especially when the trianing data accuracy was so high durin training. Based on these results, we decided to take a step back and consider other models.