# Flatiron Capstone Project: Automated Breast Cancer Metastasis Detection
### Convolutional Neural Network Development
#### Marissa Eppes


**SUMMARY:** This notebook walks through the thought process and steps to tune and train a convolutional neural network (CNN) image classifier. Tiles are 256 x 256 pixel regions within lymph node biopsy whole slide images (WSI). Each individual tile is classified as either Cancer or Non-Cancer (often referred to as "normal") for the purpose of training and testing the model.

All tiles for training and testing have already been extracted using the train_tiles.py and test_tiles.py modules. Tiles are prepared for integration with the CNN using the cnn_prep.py module.

The general strategy for CNN tuning/training is summarized as follows:
* Train from scratch (transfer learning has been tried and did not appear promising for this specific application)
* Start with a relatively shallow model, a smaller data sample, and 10 - 20 epochs
* Evaluate performance, add/remove/change one layer at a time, and repeat
* Scale up the best-performing model(s) with more data and 30 - 50+ epochs
* Introduce image augmentation

**Figure 1** summarizes the development and selection process. Parameters for each of the models are shown in the code below.

**NOTE:** This notebook was executed on a GPU-accelerated AWS SageMaker instance with 61 GB of RAM. In several cases, model output had to be cleared to conserve RAM and continue modeling. Therefore, this notebook does not show all outputs, but it still demonstrates the model tuning process.

<img src="modeling_diagram.png">

<center> Figure 1 </center>
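
To see why RAM is the limiting resource, a rough estimate of the in-memory size of a tile array helps. The sketch below assumes tiles are held as dense 256 x 256 x 3 arrays; the dtype actually returned by cnn_prep.py is not shown in this notebook, so both a one-byte (uint8) and a four-byte (float32) case are computed. `tile_array_gib` is a hypothetical helper.

```python
def tile_array_gib(n_tiles, side=256, channels=3, bytes_per_value=1):
    """Approximate size in GiB of n_tiles RGB tiles stored as a dense array."""
    n_bytes = n_tiles * side * side * channels * bytes_per_value
    return n_bytes / 2**30

# Sample sizes used in this notebook: 20,000 and 60,000 tiles
for n_tiles in (20_000, 60_000):
    as_uint8 = tile_array_gib(n_tiles)
    as_float32 = tile_array_gib(n_tiles, bytes_per_value=4)
    print(f"{n_tiles:>6} tiles: {as_uint8:5.2f} GiB as uint8, "
          f"{as_float32:5.2f} GiB as float32")
```

Under these assumptions, a 60,000-tile sample held as float32 would approach 44 GiB, which is consistent with the need to delete intermediate arrays on a 61 GB instance.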

In [4]:
# Import necessary libraries

from PIL import Image
import numpy as np
import time
import pickle
import glob
import os
import shutil
import matplotlib.pyplot as plt
import keras
import cnn_prep as prep
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.model_selection import train_test_split
from keras.models import load_model, Sequential, Model
from keras.utils import to_categorical
from keras import optimizers, models, layers
from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
from keras.layers import Dropout, Flatten, Dense, GlobalAveragePooling2D
from keras import backend as k
from keras import applications
from keras.callbacks import ModelCheckpoint, LearningRateScheduler, TensorBoard, EarlyStopping

In [4]:
# Creates lists of paths to all extracted cancer tiles and all extracted normal tiles

cancer_train_glob = glob.glob(
    "/home/ec2-user/SageMaker/data/train/cancer/*.jpeg")
normal_train_glob = glob.glob(
    "/home/ec2-user/SageMaker/data/train/normal/*.jpeg")

## Preliminary CNN Development
Several custom CNNs built from scratch will be tried with a 20,000-sample subset of data using 10 epochs. Best-performing CNN architectures will subsequently be scaled up.
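
The `scale_down` argument caps the number of tiles loaded per class. The implementation in cnn_prep.py is not shown in this notebook, so the sketch below is only an assumption about how such a cap could work: it draws a fixed-size random subset of file paths without replacement. `sample_paths` and the file names are hypothetical.

```python
import random

def sample_paths(paths, scale_down, seed=None):
    """Return up to `scale_down` paths drawn at random without replacement."""
    rng = random.Random(seed)
    k = min(scale_down, len(paths))
    return rng.sample(list(paths), k)

# Made-up file names standing in for the real glob results
fake_paths = [f"tile_{i:05d}.jpeg" for i in range(50)]
subset = sample_paths(fake_paths, scale_down=10, seed=27)
print(len(subset), len(set(subset)))
```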

In [None]:
# 10,000 tiles from each class will be included for preliminary development (20,000 total)

scale_down_number = 10000

In [None]:
# Loads and prepares 10,000 tiles from each class chosen at random

train_cancer = prep.cancer_train_jpegs_to_arrays(
    cancer_train_glob, scale_down=scale_down_number)
train_normal = prep.normal_train_jpegs_to_arrays(
    normal_train_glob, scale_down=scale_down_number)
train_data = np.concatenate((train_cancer, train_normal))

In [None]:
# Below code confirms that cancer data and normal data have been concatenated properly.
# All outputs should be True.

print((train_cancer[0] == train_data[0]).mean() == 1)
print((train_cancer[-1] == train_data[scale_down_number-1]).mean() == 1)
print((train_normal[-1] == train_data[-1]).mean() == 1)
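
The same boundary checks can also be written with `np.array_equal`, which compares whole arrays directly. The sketch below uses small random arrays as stand-ins for the tile data; the shapes and names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n_per_class = 4
# Stand-ins for the cancer and normal tile arrays (4 tiny 8x8 RGB tiles each)
cancer = rng.integers(0, 256, size=(n_per_class, 8, 8, 3), dtype=np.uint8)
normal = rng.integers(0, 256, size=(n_per_class, 8, 8, 3), dtype=np.uint8)
data = np.concatenate((cancer, normal))

# First and last cancer tiles occupy indices 0 and n_per_class - 1
print(np.array_equal(cancer[0], data[0]))
print(np.array_equal(cancer[-1], data[n_per_class - 1]))
# The normal block starts right after the cancer block
print(np.array_equal(normal[0], data[n_per_class]))
print(np.array_equal(normal[-1], data[-1]))
```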

In [None]:
# Conserves RAM

del train_cancer
del train_normal

In [None]:
# Assigns labels of 1 to cancer tiles and 0 to normal tiles

train_labels = np.zeros(len(train_data))
train_labels[0:scale_down_number] = 1

In [None]:
# Below code confirms that labels have been assigned in the proper order.
# All outputs should be True.

print(train_labels[0] == 1)
print(train_labels[scale_down_number] == 0)
print(train_labels[-1] == 0)
print(train_labels.mean() == 0.5)

In [None]:
train_labels = to_categorical(train_labels)
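
`to_categorical` turns the 0/1 label vector into a two-column one-hot matrix, where column 0 marks normal tiles and column 1 marks cancer tiles. A NumPy-only sketch of the same transformation, shown on a toy label vector rather than the real labels:

```python
import numpy as np

labels = np.array([1, 1, 0, 0], dtype=float)  # 1 = cancer, 0 = normal
# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(2, dtype="float32")[labels.astype(int)]
print(one_hot)
```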

In [None]:
# Performs an 80/20 train/validation split using train_test_split.
# Original train_data is deleted to conserve RAM.

X_train, X_val, y_train, y_val = train_test_split(
    train_data, train_labels, test_size=0.20, random_state=27)
del train_data

In [None]:
# Sets desired parameters for CNN training

image_width, image_height = 256, 256
batch_size = 64
train_steps_per_epoch = int(len(X_train)/batch_size)
val_steps_per_epoch = int(len(X_val))
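
With an 80/20 split of 20,000 tiles, these settings work out to 250 training steps per epoch and 4,000 validation steps (the validation generator uses a batch size of 1). The sketch below reproduces that arithmetic; `steps_per_epoch` is a hypothetical helper. Flooring drops a final partial batch when the sample count is not a multiple of the batch size, whereas `math.ceil` would include it.

```python
import math

def steps_per_epoch(n_samples, batch_size, drop_partial=True):
    """Number of batches needed to cover n_samples once."""
    if drop_partial:
        return n_samples // batch_size
    return math.ceil(n_samples / batch_size)

n_total = 20_000
n_val = n_total // 5          # 20% held out for validation
n_train = n_total - n_val

print(steps_per_epoch(n_train, 64))   # training steps with batch_size=64
print(steps_per_epoch(n_val, 1))      # validation steps with batch_size=1
```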

In [None]:
# Creates ImageDataGenerator objects for training and validation sets
# Normalizes RGB values in tiles

train_IDG = ImageDataGenerator(rescale=1./255)
val_IDG = ImageDataGenerator(rescale=1./255)
train_generator = train_IDG.flow(X_train, y_train, batch_size=batch_size)
val_generator = val_IDG.flow(X_val, y_val, batch_size=1)
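
`rescale=1./255` multiplies every pixel value by 1/255 as batches are generated, mapping 8-bit intensities from [0, 255] to [0, 1]. A NumPy sketch of that step on a toy tile (the pixel values are made up):

```python
import numpy as np

tile = np.array([[[0, 128, 255]]], dtype=np.uint8)  # a single RGB pixel
scaled = tile.astype("float32") * (1.0 / 255)
print(scaled.min(), scaled.max())
```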

## Model #1

In [None]:
# First model to try

cnn1 = models.Sequential()
cnn1.add(layers.Conv2D(64, (3, 3), activation='relu', input_shape=(
    image_width, image_height,  3), padding='SAME'))
cnn1.add(layers.MaxPooling2D((2, 2)))
cnn1.add(layers.Conv2D(32, (3, 3), activation='relu'))
cnn1.add(layers.MaxPooling2D((2, 2)))
cnn1.add(layers.Flatten())
cnn1.add(layers.Dense(32, activation='relu'))
cnn1.add(layers.Dense(2, activation='sigmoid'))
cnn1.compile(loss='binary_crossentropy', optimizer="sgd", metrics=['acc'])
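
Because the output layer has two sigmoid units, the two scores for a tile are independent and need not sum to 1. With a `binary_crossentropy` loss, Keras typically resolves `'acc'` to binary accuracy, which thresholds each output at 0.5 and averages over both columns, so it can differ from the fraction of tiles whose highest-scoring class is correct. The NumPy sketch below illustrates the difference on made-up predictions; the numbers are not results from this notebook.

```python
import numpy as np

y_true = np.array([[0, 1], [1, 0]], dtype=float)   # one-hot labels
y_pred = np.array([[0.6, 0.7], [0.2, 0.1]])        # made-up sigmoid outputs

# Element-wise accuracy: threshold every output at 0.5
binary_acc = np.mean((y_pred > 0.5) == (y_true > 0.5))
# Per-tile accuracy: compare the highest-scoring class with the label
argmax_acc = np.mean(y_pred.argmax(axis=1) == y_true.argmax(axis=1))

print(binary_acc, argmax_acc)
```

For this reason, the confusion matrix and F1 score imported above are a useful cross-check on the reported accuracy.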

In [None]:
# Sets callback parameters

saving_weights_1 = keras.callbacks.ModelCheckpoint(
    'weights1.{epoch:02d}-{val_loss:.2f}.hdf5',
    monitor='val_loss', verbose=0, save_best_only=True,
    save_weights_only=False, mode='auto', period=10)

reduce_lr_1 = keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss', factor=0.1, patience=20,
    verbose=0, mode='auto', min_delta=0.0001, min_lr=0)

nan_problem_1 = keras.callbacks.TerminateOnNaN()

early_stop_1 = keras.callbacks.EarlyStopping(
    monitor='val_loss', min_delta=0, patience=20,
    verbose=0, mode='auto', baseline=None, restore_best_weights=False)

# append=True keeps earlier epochs in the log when fit_generator is called again
csv_logger_1 = keras.callbacks.CSVLogger('training_1.log', append=True)
print(cnn1.summary())
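
`cnn1.summary()` reports each layer's output shape and parameter count. Since the printed output is not preserved in this notebook, the sketch below recomputes those numbers by hand for Model #1 using the standard formulas for stride-1 convolution, 2 x 2 pooling, and dense layers. The helper names are hypothetical.

```python
def conv2d(shape, filters, kernel=3, same=False):
    """Output shape and parameter count of a stride-1 Conv2D layer."""
    h, w, c = shape
    if not same:  # 'valid' padding trims kernel - 1 pixels per dimension
        h, w = h - kernel + 1, w - kernel + 1
    params = kernel * kernel * c * filters + filters  # weights + biases
    return (h, w, filters), params

def max_pool(shape, pool=2):
    """Output shape of a MaxPooling2D layer (no trainable parameters)."""
    h, w, c = shape
    return (h // pool, w // pool, c), 0

def dense(n_in, units):
    """Output size and parameter count of a Dense layer."""
    return units, n_in * units + units

shape, total = (256, 256, 3), 0
for layer in (lambda s: conv2d(s, 64, same=True), max_pool,
              lambda s: conv2d(s, 32), max_pool):
    shape, params = layer(shape)
    total += params
    print(shape, params)

flat = shape[0] * shape[1] * shape[2]   # Flatten
n_out, params = dense(flat, 32)
total += params
print(n_out, params)
n_out, params = dense(n_out, 2)
total += params
print(n_out, params)
print("flattened features:", flat, "total parameters:", total)
```

By this count, the first dense layer holds roughly 4.06 million of the model's approximately 4.08 million parameters.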

In [None]:
# Fit first 10 epochs

cnn1.fit_generator(train_generator, epochs=10, steps_per_epoch=train_steps_per_epoch, validation_data=val_generator,
                   validation_steps=val_steps_per_epoch, callbacks=[csv_logger_1, early_stop_1, nan_problem_1, reduce_lr_1, saving_weights_1])
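
Both `EarlyStopping` and `ReduceLROnPlateau` are configured with `patience=20`, while each call to `fit_generator` runs only 10 epochs. Keras callbacks generally reset their patience counters when training begins, so under that assumption neither callback can trigger within a single 10-epoch call. The sketch below is a simplified model of patience-based stopping, not Keras's actual implementation, run on made-up validation losses.

```python
def epochs_until_early_stop(val_losses, patience, min_delta=0.0):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss      # improvement: remember it and reset the counter
            wait = 0
        else:
            wait += 1        # no improvement this epoch
            if wait >= patience:
                return epoch
    return None

stalled = [0.5] * 10         # made-up losses that never improve after epoch 1
print(epochs_until_early_stop(stalled, patience=20))
print(epochs_until_early_stop(stalled, patience=3))
```

Even in the worst case of a completely stalled validation loss, a patience of 20 is never reached in 10 epochs, so a smaller patience would be needed for these callbacks to take effect.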

In [None]:
cnn1.save('cnn1_10.h5')

In [None]:
# Fit an additional 10 epochs

cnn1.fit_generator(train_generator, epochs=10, steps_per_epoch=train_steps_per_epoch, validation_data=val_generator,
                   validation_steps=val_steps_per_epoch, callbacks=[csv_logger_1, early_stop_1, nan_problem_1, reduce_lr_1, saving_weights_1])

In [None]:
cnn1.save('cnn1_20.h5')

## Model #2: Change optimizer from SGD to ADAM

In [None]:
cnn2 = models.Sequential()
cnn2.add(layers.Conv2D(64, (3, 3), activation='relu', input_shape=(
    image_width, image_height,  3), padding='SAME'))
cnn2.add(layers.MaxPooling2D((2, 2)))
cnn2.add(layers.Conv2D(32, (3, 3), activation='relu'))
cnn2.add(layers.MaxPooling2D((2, 2)))
cnn2.add(layers.Flatten())
cnn2.add(layers.Dense(32, activation='relu'))
cnn2.add(layers.Dense(2, activation='sigmoid'))
cnn2.compile(loss='binary_crossentropy', optimizer="adam", metrics=['acc'])
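
The only change from Model #1 is the optimizer. The sketch below compares a single plain SGD update with the first Adam update on a scalar parameter, written in the textbook form of each rule. The hyperparameters (SGD learning rate 0.01; Adam learning rate 0.001, beta_1 0.9, beta_2 0.999) are assumed to match the Keras defaults selected by the strings `"sgd"` and `"adam"`, and the gradients are made up.

```python
import math

def sgd_step(grad, lr=0.01):
    """Plain SGD: the update is proportional to the gradient."""
    return -lr * grad

def adam_first_step(grad, lr=0.001, beta_1=0.9, beta_2=0.999, eps=1e-7):
    """First Adam update (t = 1) with bias-corrected moment estimates."""
    m = (1 - beta_1) * grad            # first moment, starting from 0
    v = (1 - beta_2) * grad ** 2       # second moment, starting from 0
    m_hat = m / (1 - beta_1)           # bias correction for t = 1
    v_hat = v / (1 - beta_2)
    return -lr * m_hat / (math.sqrt(v_hat) + eps)

for grad in (0.01, 1.0, 100.0):
    print(grad, sgd_step(grad), adam_first_step(grad))
```

The SGD step scales with the gradient, while Adam's first step has a magnitude close to its learning rate regardless of gradient scale, which is one reason the two optimizers can behave very differently on the same architecture.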

In [None]:
saving_weights_2 = keras.callbacks.ModelCheckpoint(
    'weights2.{epoch:02d}-{val_loss:.2f}.hdf5',
    monitor='val_loss', verbose=0, save_best_only=True,
    save_weights_only=False, mode='auto', period=10)

reduce_lr_2 = keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss', factor=0.1, patience=20,
    verbose=0, mode='auto', min_delta=0.0001, min_lr=0)

nan_problem_2 = keras.callbacks.TerminateOnNaN()

early_stop_2 = keras.callbacks.EarlyStopping(
    monitor='val_loss', min_delta=0, patience=20,
    verbose=0, mode='auto', baseline=None, restore_best_weights=False)

csv_logger_2 = keras.callbacks.CSVLogger('training_2.log')
print(cnn2.summary())
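
The checkpoint file name is a format template that Keras fills in with the epoch number and the logged metrics at save time. The sketch below applies ordinary Python string formatting to the Model #2 template, with made-up values, to show the resulting name.

```python
template = 'weights2.{epoch:02d}-{val_loss:.2f}.hdf5'
# Made-up values: epoch 10 with a validation loss of 0.3456
filename = template.format(epoch=10, val_loss=0.3456)
print(filename)  # weights2.10-0.35.hdf5
```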

In [None]:
cnn2.fit_generator(train_generator, epochs=10, steps_per_epoch=train_steps_per_epoch, validation_data=val_generator,
                   validation_steps=val_steps_per_epoch, callbacks=[csv_logger_2, early_stop_2, nan_problem_2, reduce_lr_2, saving_weights_2])

In [None]:
cnn2.save('cnn2_10.h5')

## Model #3: Change optimizer back to SGD, add another dense layer

In [None]:
cnn3 = models.Sequential()
cnn3.add(layers.Conv2D(64, (3, 3), activation='relu', input_shape=(
    image_width, image_height,  3), padding='SAME'))
cnn3.add(layers.MaxPooling2D((2, 2)))
cnn3.add(layers.Conv2D(32, (3, 3), activation='relu'))
cnn3.add(layers.MaxPooling2D((2, 2)))
cnn3.add(layers.Flatten())
cnn3.add(layers.Dense(32, activation='relu'))
cnn3.add(layers.Dense(32, activation='relu'))
cnn3.add(layers.Dense(2, activation='sigmoid'))
cnn3.compile(loss='binary_crossentropy', optimizer="sgd", metrics=['acc'])

In [None]:
saving_weights_3 = keras.callbacks.ModelCheckpoint(
    'weights3.{epoch:02d}-{val_loss:.2f}.hdf5',
    monitor='val_loss', verbose=0, save_best_only=True,
    save_weights_only=False, mode='auto', period=10)

reduce_lr_3 = keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss', factor=0.1, patience=20,
    verbose=0, mode='auto', min_delta=0.0001, min_lr=0)

nan_problem_3 = keras.callbacks.TerminateOnNaN()

early_stop_3 = keras.callbacks.EarlyStopping(
    monitor='val_loss', min_delta=0, patience=20,
    verbose=0, mode='auto', baseline=None, restore_best_weights=False)

# append=True keeps earlier epochs in the log when fit_generator is called again
csv_logger_3 = keras.callbacks.CSVLogger('training_3.log', append=True)
print(cnn3.summary())

In [None]:
cnn3.fit_generator(train_generator, epochs=10, steps_per_epoch=train_steps_per_epoch, validation_data=val_generator,
                   validation_steps=val_steps_per_epoch, callbacks=[csv_logger_3, early_stop_3, nan_problem_3, reduce_lr_3, saving_weights_3])

In [None]:
cnn3.save('cnn3_10.h5')

In [None]:
# Fit an additional 10 epochs

cnn3.fit_generator(train_generator, epochs=10, steps_per_epoch=train_steps_per_epoch, validation_data=val_generator,
                   validation_steps=val_steps_per_epoch, callbacks=[csv_logger_3, early_stop_3, nan_problem_3, reduce_lr_3, saving_weights_3])

In [None]:
cnn3.save('cnn3_20.h5')

## Model #4: Take away added dense layer, add convolution/pool layer

In [None]:
cnn4 = models.Sequential()
cnn4.add(layers.Conv2D(64, (3, 3), activation='relu', input_shape=(
    image_width, image_height,  3), padding='SAME'))
cnn4.add(layers.MaxPooling2D((2, 2)))
cnn4.add(layers.Conv2D(64, (3, 3), activation='relu', padding='SAME'))
cnn4.add(layers.MaxPooling2D((2, 2)))
cnn4.add(layers.Conv2D(32, (3, 3), activation='relu'))
cnn4.add(layers.MaxPooling2D((2, 2)))
cnn4.add(layers.Flatten())
cnn4.add(layers.Dense(32, activation='relu'))
cnn4.add(layers.Dense(2, activation='sigmoid'))
cnn4.compile(loss='binary_crossentropy', optimizer="sgd", metrics=['acc'])

In [None]:
saving_weights_4 = keras.callbacks.ModelCheckpoint(
    'weights4.{epoch:02d}-{val_loss:.2f}.hdf5',
    monitor='val_loss', verbose=0, save_best_only=True,
    save_weights_only=False, mode='auto', period=10)

reduce_lr_4 = keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss', factor=0.1, patience=20,
    verbose=0, mode='auto', min_delta=0.0001, min_lr=0)

nan_problem_4 = keras.callbacks.TerminateOnNaN()

early_stop_4 = keras.callbacks.EarlyStopping(
    monitor='val_loss', min_delta=0, patience=20,
    verbose=0, mode='auto', baseline=None, restore_best_weights=False)

# append=True keeps earlier epochs in the log when fit_generator is called again
csv_logger_4 = keras.callbacks.CSVLogger('training_4.log', append=True)
print(cnn4.summary())
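
Adding a convolution/pooling block mainly shrinks the feature map that reaches the dense layer. The sketch below traces the spatial size through Model #1 and Model #4 by hand (3 x 3 kernels, stride 1, 2 x 2 pooling); `flattened_size` is a hypothetical helper, not part of Keras.

```python
def flattened_size(side, blocks):
    """Trace a square feature map through (filters, padding) conv blocks,
    each followed by 2x2 max pooling, and return the flattened length."""
    filters = None
    for filters, padding in blocks:
        if padding == "valid":
            side -= 2          # a 3x3 kernel trims one pixel per edge
        side //= 2             # 2x2 max pooling halves each dimension
    return side * side * filters

model_1 = [(64, "same"), (32, "valid")]
model_4 = [(64, "same"), (64, "same"), (32, "valid")]

print(flattened_size(256, model_1))
print(flattened_size(256, model_4))
```

By this trace, the extra block cuts the input to the first dense layer from 127,008 to 30,752 features, roughly a fourfold reduction in that layer's weights.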

In [None]:
cnn4.fit_generator(train_generator, epochs=10, steps_per_epoch=train_steps_per_epoch, validation_data=val_generator,
                   validation_steps=val_steps_per_epoch, callbacks=[csv_logger_4, early_stop_4, nan_problem_4, reduce_lr_4, saving_weights_4])

In [None]:
cnn4.save('cnn4_10.h5')

In [None]:
# Fit an additional 10 epochs

cnn4.fit_generator(train_generator, epochs=10, steps_per_epoch=train_steps_per_epoch, validation_data=val_generator,
                   validation_steps=val_steps_per_epoch, callbacks=[csv_logger_4, early_stop_4, nan_problem_4, reduce_lr_4, saving_weights_4])

In [None]:
cnn4.save('cnn4_20.h5')

## Model #5: Keep added convolution layer, but increase the number of filters in the first convolution.

In [None]:
cnn5 = models.Sequential()
cnn5.add(layers.Conv2D(128, (3, 3), activation='relu',
                       input_shape=(image_width, image_height,  3), padding='SAME'))
cnn5.add(layers.MaxPooling2D((2, 2)))
cnn5.add(layers.Conv2D(64, (3, 3), activation='relu', padding='SAME'))
cnn5.add(layers.MaxPooling2D((2, 2)))
cnn5.add(layers.Conv2D(32, (3, 3), activation='relu'))
cnn5.add(layers.MaxPooling2D((2, 2)))
cnn5.add(layers.Flatten())
cnn5.add(layers.Dense(32, activation='relu'))
cnn5.add(layers.Dense(2, activation='sigmoid'))
cnn5.compile(loss='binary_crossentropy', optimizer="sgd", metrics=['acc'])

In [None]:
saving_weights_5 = keras.callbacks.ModelCheckpoint(
    'weights5.{epoch:02d}-{val_loss:.2f}.hdf5',
    monitor='val_loss', verbose=0, save_best_only=True,
    save_weights_only=False, mode='auto', period=10)

reduce_lr_5 = keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss', factor=0.1, patience=20,
    verbose=0, mode='auto', min_delta=0.0001, min_lr=0)

nan_problem_5 = keras.callbacks.TerminateOnNaN()

early_stop_5 = keras.callbacks.EarlyStopping(
    monitor='val_loss', min_delta=0, patience=20,
    verbose=0, mode='auto', baseline=None, restore_best_weights=False)

csv_logger_5 = keras.callbacks.CSVLogger('training_5.log')
print(cnn5.summary())

In [None]:
cnn5.fit_generator(train_generator, epochs=10, steps_per_epoch=train_steps_per_epoch, validation_data=val_generator,
                   validation_steps=val_steps_per_epoch, callbacks=[csv_logger_5, early_stop_5, nan_problem_5, reduce_lr_5, saving_weights_5])

In [None]:
cnn5.save('cnn5_10.h5')

## Model #6: Take the first convolution back down to 64 filters, lower the second convolution to 32 filters.

In [None]:
cnn6 = models.Sequential()
cnn6.add(layers.Conv2D(64, (3, 3), activation='relu', input_shape=(
    image_width, image_height,  3), padding='SAME'))
cnn6.add(layers.MaxPooling2D((2, 2)))
cnn6.add(layers.Conv2D(32, (3, 3), activation='relu', padding='SAME'))
cnn6.add(layers.MaxPooling2D((2, 2)))
cnn6.add(layers.Conv2D(32, (3, 3), activation='relu'))
cnn6.add(layers.MaxPooling2D((2, 2)))
cnn6.add(layers.Flatten())
cnn6.add(layers.Dense(32, activation='relu'))
cnn6.add(layers.Dense(2, activation='sigmoid'))
cnn6.compile(loss='binary_crossentropy', optimizer="sgd", metrics=['acc'])

In [None]:
saving_weights_6 = keras.callbacks.ModelCheckpoint(
    'weights6.{epoch:02d}-{val_loss:.2f}.hdf5',
    monitor='val_loss', verbose=0, save_best_only=True,
    save_weights_only=False, mode='auto', period=10)

reduce_lr_6 = keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss', factor=0.1, patience=20,
    verbose=0, mode='auto', min_delta=0.0001, min_lr=0)

nan_problem_6 = keras.callbacks.TerminateOnNaN()

early_stop_6 = keras.callbacks.EarlyStopping(
    monitor='val_loss', min_delta=0, patience=20,
    verbose=0, mode='auto', baseline=None, restore_best_weights=False)

# append=True keeps earlier epochs in the log when fit_generator is called again
csv_logger_6 = keras.callbacks.CSVLogger('training_6.log', append=True)
print(cnn6.summary())

In [None]:
cnn6.fit_generator(train_generator, epochs=10, steps_per_epoch=train_steps_per_epoch, validation_data=val_generator,
                   validation_steps=val_steps_per_epoch, callbacks=[csv_logger_6, early_stop_6, nan_problem_6, reduce_lr_6, saving_weights_6])

In [None]:
cnn6.save('cnn6_10.h5')

In [None]:
# Fit an additional 10 epochs

cnn6.fit_generator(train_generator, epochs=10, steps_per_epoch=train_steps_per_epoch, validation_data=val_generator,
                   validation_steps=val_steps_per_epoch, callbacks=[csv_logger_6, early_stop_6, nan_problem_6, reduce_lr_6, saving_weights_6])

In [None]:
cnn6.save('cnn6_20.h5')

## Scaling Up Models #1 and #6
Two scaled-up models will be run using the architectures from Models #1 and #6. A 60,000-tile sample will be used for training/validation, with 30 - 60 epochs of training.

In [None]:
# Loads and prepares 30,000 tiles from each class chosen at random

train_cancer_30k = prep.cancer_train_jpegs_to_arrays(
    cancer_train_glob, scale_down=30000)
train_normal_30k = prep.normal_train_jpegs_to_arrays(
    normal_train_glob, scale_down=30000)
train_data_60k = np.concatenate((train_cancer_30k, train_normal_30k))

In [None]:
# Below code confirms that cancer data and normal data have been
# concatenated properly. All outputs should be True.

print((train_cancer_30k[0] == train_data_60k[0]).mean() == 1)
print((train_cancer_30k[-1] ==
       train_data_60k[len(train_data_60k)//2 - 1]).mean() == 1)
print((train_normal_30k[-1] == train_data_60k[-1]).mean() == 1)

In [None]:
# Conserves RAM

del train_cancer_30k
del train_normal_30k

In [None]:
# Assigns labels

train_labels_60k = np.zeros(len(train_data_60k))
train_labels_60k[0:int(len(train_labels_60k)/2)] = 1

In [None]:
# Below code confirms that labels have been assigned in the proper order.
# All outputs should be True.

print(train_labels_60k[0] == 1)
print(train_labels_60k[int(len(train_labels_60k)/2)] == 0)
print(train_labels_60k[-1] == 0)
print(train_labels_60k.mean() == 0.5)

In [None]:
train_labels_60k = to_categorical(train_labels_60k)

In [None]:
# Performs an 80/20 train/validation split using train_test_split.

X_train_60k, X_val_60k, y_train_60k, y_val_60k = train_test_split(
    train_data_60k, train_labels_60k, test_size=0.20, random_state=27)

In [None]:
# Original train_data is deleted to conserve RAM.

del train_data_60k

In [None]:
# Sets desired parameters for CNN training

image_width, image_height = 256, 256
batch_size_60k = 64
train_steps_per_epoch_60k = int(len(X_train_60k)/batch_size_60k)
val_steps_per_epoch_60k = int(len(X_val_60k))

In [None]:
# Creates ImageDataGenerator objects for training and validation sets
# Normalizes RGB values in tiles

train_IDG_60k = ImageDataGenerator(rescale=1./255)
val_IDG_60k = ImageDataGenerator(rescale=1./255)

In [None]:
# Applies ImageDataGenerator

train_generator_60k = train_IDG_60k.flow(
    X_train_60k, y_train_60k, batch_size=batch_size_60k)

In [None]:
# Conserves RAM

del X_train_60k
del y_train_60k

In [None]:
# Applies ImageDataGenerator

val_generator_60k = val_IDG_60k.flow(X_val_60k, y_val_60k, batch_size=1)

In [None]:
# Conserves RAM

del X_val_60k
del y_val_60k

## Scaled-Up Model #1

In [None]:
cnn1_60k = models.Sequential()
cnn1_60k.add(layers.Conv2D(64, (3, 3), activation='relu',
                           input_shape=(image_width, image_height,  3), padding='SAME'))
cnn1_60k.add(layers.MaxPooling2D((2, 2)))
cnn1_60k.add(layers.Conv2D(32, (3, 3), activation='relu'))
cnn1_60k.add(layers.MaxPooling2D((2, 2)))
cnn1_60k.add(layers.Flatten())
cnn1_60k.add(layers.Dense(32, activation='relu'))
cnn1_60k.add(layers.Dense(2, activation='sigmoid'))
cnn1_60k.compile(loss='binary_crossentropy', optimizer="sgd", metrics=['acc'])

### Fitting Epochs 1 - 10

In [None]:
saving_weights_1_60k_1 = keras.callbacks.ModelCheckpoint(
    'weights1_60k_1.{epoch:02d}-{val_loss:.2f}.hdf5',
    monitor='val_loss', verbose=0, save_best_only=True,
    save_weights_only=False, mode='auto', period=1)

reduce_lr_1_60k_1 = keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss', factor=0.1, patience=20,
    verbose=0, mode='auto', min_delta=0.0001, min_lr=0)

nan_problem_1_60k_1 = keras.callbacks.TerminateOnNaN()

early_stop_1_60k_1 = keras.callbacks.EarlyStopping(
    monitor='val_loss', min_delta=0, patience=20,
    verbose=0, mode='auto', baseline=None, restore_best_weights=False)

csv_logger_1_60k_1 = keras.callbacks.CSVLogger('training_1_60k_1.log')
print(cnn1_60k.summary())

In [None]:
cnn1_60k.fit_generator(train_generator_60k, epochs=10, steps_per_epoch=train_steps_per_epoch_60k, validation_data=val_generator_60k,
                       validation_steps=val_steps_per_epoch_60k, callbacks=[csv_logger_1_60k_1, early_stop_1_60k_1, nan_problem_1_60k_1, reduce_lr_1_60k_1, saving_weights_1_60k_1])

In [None]:
cnn1_60k.save('cnn1_60k_1.h5')

### Fitting Epochs 11 - 20

In [None]:
saving_weights_1_60k_2 = keras.callbacks.ModelCheckpoint(
    'weights1_60k_2.{epoch:02d}-{val_loss:.2f}.hdf5',
    monitor='val_loss', verbose=0, save_best_only=True,
    save_weights_only=False, mode='auto', period=1)

reduce_lr_1_60k_2 = keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss', factor=0.1, patience=20,
    verbose=0, mode='auto', min_delta=0.0001, min_lr=0)

nan_problem_1_60k_2 = keras.callbacks.TerminateOnNaN()

early_stop_1_60k_2 = keras.callbacks.EarlyStopping(
    monitor='val_loss', min_delta=0, patience=20,
    verbose=0, mode='auto', baseline=None, restore_best_weights=False)

csv_logger_1_60k_2 = keras.callbacks.CSVLogger('training_1_60k_2.log')

In [None]:
cnn1_60k.fit_generator(train_generator_60k, epochs=10, steps_per_epoch=train_steps_per_epoch_60k, validation_data=val_generator_60k,
                       validation_steps=val_steps_per_epoch_60k, callbacks=[csv_logger_1_60k_2, early_stop_1_60k_2, nan_problem_1_60k_2, reduce_lr_1_60k_2, saving_weights_1_60k_2])

In [None]:
cnn1_60k.save('cnn1_60k_2.h5')

### Fitting Epochs 21 - 30

In [None]:
saving_weights_1_60k_3 = keras.callbacks.ModelCheckpoint(
    'weights1_60k_3.{epoch:02d}-{val_loss:.2f}.hdf5',
    monitor='val_loss', verbose=0, save_best_only=True,
    save_weights_only=False, mode='auto', period=1)

reduce_lr_1_60k_3 = keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss', factor=0.1, patience=20,
    verbose=0, mode='auto', min_delta=0.0001, min_lr=0)

nan_problem_1_60k_3 = keras.callbacks.TerminateOnNaN()

early_stop_1_60k_3 = keras.callbacks.EarlyStopping(
    monitor='val_loss', min_delta=0, patience=20,
    verbose=0, mode='auto', baseline=None, restore_best_weights=False)

csv_logger_1_60k_3 = keras.callbacks.CSVLogger('training_1_60k_3.log')

In [None]:
cnn1_60k.fit_generator(train_generator_60k, epochs=10, steps_per_epoch=train_steps_per_epoch_60k, validation_data=val_generator_60k,
                       validation_steps=val_steps_per_epoch_60k, callbacks=[csv_logger_1_60k_3, early_stop_1_60k_3, nan_problem_1_60k_3, reduce_lr_1_60k_3, saving_weights_1_60k_3])

In [None]:
cnn1_60k.save('cnn1_60k_3.h5')

## Scaled-Up Model #6

In [None]:
cnn6_60k = models.Sequential()
cnn6_60k.add(layers.Conv2D(64, (3, 3), activation='relu',
                           input_shape=(image_width, image_height,  3), padding='SAME'))
cnn6_60k.add(layers.MaxPooling2D((2, 2)))
cnn6_60k.add(layers.Conv2D(32, (3, 3), activation='relu', padding='SAME'))
cnn6_60k.add(layers.MaxPooling2D((2, 2)))
cnn6_60k.add(layers.Conv2D(32, (3, 3), activation='relu'))
cnn6_60k.add(layers.MaxPooling2D((2, 2)))
cnn6_60k.add(layers.Flatten())
cnn6_60k.add(layers.Dense(32, activation='relu'))
cnn6_60k.add(layers.Dense(2, activation='sigmoid'))
cnn6_60k.compile(loss='binary_crossentropy', optimizer="sgd", metrics=['acc'])

### Fitting Epochs 1 - 10

In [None]:
saving_weights_6_60k_1 = keras.callbacks.ModelCheckpoint(
    'weights6_60k_1.{epoch:02d}-{val_loss:.2f}.hdf5',
    monitor='val_loss', verbose=0, save_best_only=True,
    save_weights_only=False, mode='auto', period=1)

reduce_lr_6_60k_1 = keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss', factor=0.1, patience=20,
    verbose=0, mode='auto', min_delta=0.0001, min_lr=0)

nan_problem_6_60k_1 = keras.callbacks.TerminateOnNaN()

early_stop_6_60k_1 = keras.callbacks.EarlyStopping(
    monitor='val_loss', min_delta=0, patience=20,
    verbose=0, mode='auto', baseline=None, restore_best_weights=False)

csv_logger_6_60k_1 = keras.callbacks.CSVLogger('training_6_60k_1.log')
print(cnn6_60k.summary())

In [None]:
cnn6_60k.fit_generator(train_generator_60k, epochs=10, steps_per_epoch=train_steps_per_epoch_60k, validation_data=val_generator_60k,
                       validation_steps=val_steps_per_epoch_60k, callbacks=[csv_logger_6_60k_1, early_stop_6_60k_1, nan_problem_6_60k_1, reduce_lr_6_60k_1, saving_weights_6_60k_1])

In [None]:
cnn6_60k.save('cnn6_60k_1.h5')

### Fitting Epochs 11 - 20

In [None]:
saving_weights_6_60k_2 = keras.callbacks.ModelCheckpoint(
    'weights6_60k_2.{epoch:02d}-{val_loss:.2f}.hdf5',
    monitor='val_loss', verbose=0, save_best_only=True,
    save_weights_only=False, mode='auto', period=1)

reduce_lr_6_60k_2 = keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss', factor=0.1, patience=20,
    verbose=0, mode='auto', min_delta=0.0001, min_lr=0)

nan_problem_6_60k_2 = keras.callbacks.TerminateOnNaN()

early_stop_6_60k_2 = keras.callbacks.EarlyStopping(
    monitor='val_loss', min_delta=0, patience=20,
    verbose=0, mode='auto', baseline=None, restore_best_weights=False)

csv_logger_6_60k_2 = keras.callbacks.CSVLogger('training_6_60k_2.log')

In [None]:
cnn6_60k.fit_generator(train_generator_60k, epochs=10, steps_per_epoch=train_steps_per_epoch_60k, validation_data=val_generator_60k,
                       validation_steps=val_steps_per_epoch_60k, callbacks=[csv_logger_6_60k_2, early_stop_6_60k_2, nan_problem_6_60k_2, reduce_lr_6_60k_2, saving_weights_6_60k_2])

In [None]:
cnn6_60k.save('cnn6_60k_2.h5')

### Fitting Epochs 21 - 30

In [None]:
saving_weights_6_60k_3 = keras.callbacks.ModelCheckpoint(
    'weights6_60k_3.{epoch:02d}-{val_loss:.2f}.hdf5',
    monitor='val_loss', verbose=0, save_best_only=True,
    save_weights_only=False, mode='auto', period=1)

reduce_lr_6_60k_3 = keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss', factor=0.1, patience=20,
    verbose=0, mode='auto', min_delta=0.0001, min_lr=0)

nan_problem_6_60k_3 = keras.callbacks.TerminateOnNaN()

early_stop_6_60k_3 = keras.callbacks.EarlyStopping(
    monitor='val_loss', min_delta=0, patience=20,
    verbose=0, mode='auto', baseline=None, restore_best_weights=False)

csv_logger_6_60k_3 = keras.callbacks.CSVLogger('training_6_60k_3.log')

In [None]:
cnn6_60k.fit_generator(train_generator_60k, epochs=10, steps_per_epoch=train_steps_per_epoch_60k, validation_data=val_generator_60k,
                       validation_steps=val_steps_per_epoch_60k, callbacks=[csv_logger_6_60k_3, early_stop_6_60k_3, nan_problem_6_60k_3, reduce_lr_6_60k_3, saving_weights_6_60k_3])

In [None]:
cnn6_60k.save('cnn6_60k_3.h5')

### Fitting Epochs 31 - 40

In [None]:
saving_weights_6_60k_4 = keras.callbacks.ModelCheckpoint(
    'weights6_60k_4.{epoch:02d}-{val_loss:.2f}.hdf5',
    monitor='val_loss', verbose=0, save_best_only=True,
    save_weights_only=False, mode='auto', period=1)

reduce_lr_6_60k_4 = keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss', factor=0.1, patience=20,
    verbose=0, mode='auto', min_delta=0.0001, min_lr=0)

nan_problem_6_60k_4 = keras.callbacks.TerminateOnNaN()

early_stop_6_60k_4 = keras.callbacks.EarlyStopping(
    monitor='val_loss', min_delta=0, patience=20,
    verbose=0, mode='auto', baseline=None, restore_best_weights=False)

csv_logger_6_60k_4 = keras.callbacks.CSVLogger('training_6_60k_4.log')

In [None]:
cnn6_60k.fit_generator(train_generator_60k, epochs=10, steps_per_epoch=train_steps_per_epoch_60k, validation_data=val_generator_60k,
                       validation_steps=val_steps_per_epoch_60k, callbacks=[csv_logger_6_60k_4, early_stop_6_60k_4, nan_problem_6_60k_4, reduce_lr_6_60k_4, saving_weights_6_60k_4])

In [None]:
cnn6_60k.save('cnn6_60k_4.h5')

### Fitting Epochs 41 - 50

In [None]:
saving_weights_6_60k_5 = keras.callbacks.ModelCheckpoint(
    'weights6_60k_5.{epoch:02d}-{val_loss:.2f}.hdf5',
    monitor='val_loss', verbose=0, save_best_only=True,
    save_weights_only=False, mode='auto', period=1)

reduce_lr_6_60k_5 = keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss', factor=0.1, patience=20,
    verbose=0, mode='auto', min_delta=0.0001, min_lr=0)

nan_problem_6_60k_5 = keras.callbacks.TerminateOnNaN()

early_stop_6_60k_5 = keras.callbacks.EarlyStopping(
    monitor='val_loss', min_delta=0, patience=20,
    verbose=0, mode='auto', baseline=None, restore_best_weights=False)

csv_logger_6_60k_5 = keras.callbacks.CSVLogger('training_6_60k_5.log')

In [None]:
cnn6_60k.fit_generator(train_generator_60k, epochs=10, steps_per_epoch=train_steps_per_epoch_60k, validation_data=val_generator_60k,
                       validation_steps=val_steps_per_epoch_60k, callbacks=[csv_logger_6_60k_5, early_stop_6_60k_5, nan_problem_6_60k_5, reduce_lr_6_60k_5, saving_weights_6_60k_5])

In [None]:
cnn6_60k.save('cnn6_60k_5.h5')

## Take another random sample of data and implement Image Augmentation to continue training models.

In [None]:
# Loads and prepares 30,000 tiles from each class chosen at random

train_cancer_30k_aug = prep.cancer_train_jpegs_to_arrays(
    cancer_train_glob, scale_down=30000)
train_normal_30k_aug = prep.normal_train_jpegs_to_arrays(
    normal_train_glob, scale_down=30000)
train_data_60k_aug = np.concatenate(
    (train_cancer_30k_aug, train_normal_30k_aug))

In [None]:
# Below code confirms that cancer data and normal data have been
# concatenated properly. All outputs should be True.

print((train_cancer_30k_aug[0] == train_data_60k_aug[0]).mean() == 1)
print((train_cancer_30k_aug[-1] ==
       train_data_60k_aug[len(train_data_60k_aug)//2 - 1]).mean() == 1)
print((train_normal_30k_aug[-1] == train_data_60k_aug[-1]).mean() == 1)

In [None]:
# Conserves RAM

del train_cancer_30k_aug
del train_normal_30k_aug

In [None]:
# Assigns labels

train_labels_60k_aug = np.zeros(len(train_data_60k_aug))
train_labels_60k_aug[0:int(len(train_labels_60k_aug)/2)] = 1

In [None]:
# Below code confirms that labels have been assigned in the proper order.
# All outputs should be True.

print(train_labels_60k_aug[0] == 1)
print(train_labels_60k_aug[int(len(train_labels_60k_aug)/2)] == 0)
print(train_labels_60k_aug[-1] == 0)
print(train_labels_60k_aug.mean() == 0.5)

In [None]:
train_labels_60k_aug = to_categorical(train_labels_60k_aug)

In [None]:
# Performs an 80/20 train/validation split using train_test_split.

X_train_60k_aug, X_val_60k_aug, y_train_60k_aug, y_val_60k_aug = train_test_split(
    train_data_60k_aug, train_labels_60k_aug, test_size=0.20, random_state=27)

In [None]:
# Original train_data is deleted to conserve RAM.

del train_data_60k_aug

In [None]:
# Sets desired parameters for CNN training

image_width, image_height = 256, 256
batch_size_60k_aug = 64
train_steps_per_epoch_60k_aug = int(len(X_train_60k_aug)/batch_size_60k_aug)
val_steps_per_epoch_60k_aug = int(len(X_val_60k_aug))

In [None]:
# Creates ImageDataGenerator objects for training and validation sets
# Normalizes RGB values in tiles
# Implements Image Augmentation - random flips and rotations

train_IDG_60k_aug = ImageDataGenerator(
    rescale=1./255, rotation_range=180, horizontal_flip=True, vertical_flip=True)
val_IDG_60k_aug = ImageDataGenerator(rescale=1./255)
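
Flips and rotations are presumably suitable here because a tissue tile has no canonical orientation, so a transformed tile keeps its label. The NumPy sketch below applies random flips and right-angle rotations to a toy tile. It is a simplified stand-in: `ImageDataGenerator` with `rotation_range=180` rotates by arbitrary angles and fills the exposed corners, which this sketch does not reproduce. `random_flip_rotate` is a hypothetical helper.

```python
import numpy as np

def random_flip_rotate(tile, rng):
    """Randomly flip a (H, W, C) tile and rotate it by a multiple of 90 degrees."""
    if rng.random() < 0.5:
        tile = np.flip(tile, axis=1)   # horizontal flip
    if rng.random() < 0.5:
        tile = np.flip(tile, axis=0)   # vertical flip
    k = int(rng.integers(0, 4))        # 0, 90, 180, or 270 degrees
    return np.rot90(tile, k=k, axes=(0, 1))

rng = np.random.default_rng(27)
tile = np.arange(4 * 4 * 3, dtype=np.uint8).reshape(4, 4, 3)
augmented = random_flip_rotate(tile, rng)
print(augmented.shape)
```

Only the training generator applies augmentation; the validation generator rescales pixel values and nothing else, so validation metrics are measured on unmodified tiles.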

In [None]:
# Applies ImageDataGenerator

train_generator_60k_aug = train_IDG_60k_aug.flow(
    X_train_60k_aug, y_train_60k_aug, batch_size=batch_size_60k_aug)

In [None]:
# Conserves RAM

del X_train_60k_aug
del y_train_60k_aug

In [None]:
# Applies ImageDataGenerator

val_generator_60k_aug = val_IDG_60k_aug.flow(
    X_val_60k_aug, y_val_60k_aug, batch_size=1)

In [None]:
# Conserves RAM

del X_val_60k_aug
del y_val_60k_aug

## Scaled-Up Model #1 with New Data and Augmentation

### Fitting Epochs 31 - 40

In [None]:
saving_weights_1_60k_4_aug = keras.callbacks.ModelCheckpoint(
    'weights1_60k_4_aug.{epoch:02d}-{val_loss:.2f}.hdf5',
    monitor='val_loss', verbose=0, save_best_only=True,
    save_weights_only=False, mode='auto', period=1)

reduce_lr_1_60k_4_aug = keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss', factor=0.1, patience=20,
    verbose=0, mode='auto', min_delta=0.0001, min_lr=0)

nan_problem_1_60k_4_aug = keras.callbacks.TerminateOnNaN()

early_stop_1_60k_4_aug = keras.callbacks.EarlyStopping(
    monitor='val_loss', min_delta=0, patience=20,
    verbose=0, mode='auto', baseline=None, restore_best_weights=False)

csv_logger_1_60k_4_aug = keras.callbacks.CSVLogger('training_1_60k_4_aug.log')
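Each fit call in this notebook trains for 10 epochs, while `EarlyStopping` and `ReduceLROnPlateau` are both created with `patience=20`. The sketch below is a simplified model of a patience counter, not the Keras implementation, and the validation losses are hypothetical. It assumes the counter starts from zero at every fit call, which the "improved from inf" lines in the later logs of this notebook suggest for the checkpoint callback.

```python
def epochs_run(val_losses, patience):
    """Return how many epochs run before patience-based stopping ends training."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses)


# Hypothetical worst case: validation loss never improves after the first epoch.
stalled = [0.50] + [0.60] * 9

print(epochs_run(stalled, patience=20))
print(epochs_run(stalled, patience=3))
```

Under these assumptions the counter can reach at most 9 within a 10-epoch stage, so a patience of 20 never stops a stage early. A smaller patience, or one long fit call, would be needed for these callbacks to take effect.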

In [None]:
cnn1_60k.fit_generator(
    train_generator_60k_aug,
    epochs=10,
    steps_per_epoch=train_steps_per_epoch_60k_aug,
    validation_data=val_generator_60k_aug,
    validation_steps=val_steps_per_epoch_60k_aug,
    callbacks=[csv_logger_1_60k_4_aug, early_stop_1_60k_4_aug,
               nan_problem_1_60k_4_aug, reduce_lr_1_60k_4_aug,
               saving_weights_1_60k_4_aug])

In [None]:
cnn1_60k.save('cnn1_60k_4_aug.h5')

### Fitting Epochs 41 - 50

In [None]:
saving_weights_1_60k_5_aug = keras.callbacks.ModelCheckpoint(
    'weights1_60k_5_aug.{epoch:02d}-{val_loss:.2f}.hdf5',
    monitor='val_loss', verbose=0, save_best_only=True,
    save_weights_only=False, mode='auto', period=1)

reduce_lr_1_60k_5_aug = keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss', factor=0.1, patience=20,
    verbose=0, mode='auto', min_delta=0.0001, min_lr=0)

nan_problem_1_60k_5_aug = keras.callbacks.TerminateOnNaN()

early_stop_1_60k_5_aug = keras.callbacks.EarlyStopping(
    monitor='val_loss', min_delta=0, patience=20,
    verbose=0, mode='auto', baseline=None, restore_best_weights=False)

csv_logger_1_60k_5_aug = keras.callbacks.CSVLogger('training_1_60k_5_aug.log')

In [None]:
cnn1_60k.fit_generator(
    train_generator_60k_aug,
    epochs=10,
    steps_per_epoch=train_steps_per_epoch_60k_aug,
    validation_data=val_generator_60k_aug,
    validation_steps=val_steps_per_epoch_60k_aug,
    callbacks=[csv_logger_1_60k_5_aug, early_stop_1_60k_5_aug,
               nan_problem_1_60k_5_aug, reduce_lr_1_60k_5_aug,
               saving_weights_1_60k_5_aug])

In [None]:
cnn1_60k.save('cnn1_60k_5_aug.h5')

### Fitting Epochs 51 - 60

In [None]:
saving_weights_1_60k_6_aug = keras.callbacks.ModelCheckpoint(
    'weights1_60k_6_aug.{epoch:02d}-{val_loss:.2f}.hdf5',
    monitor='val_loss', verbose=0, save_best_only=True,
    save_weights_only=False, mode='auto', period=1)

reduce_lr_1_60k_6_aug = keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss', factor=0.1, patience=20,
    verbose=0, mode='auto', min_delta=0.0001, min_lr=0)

nan_problem_1_60k_6_aug = keras.callbacks.TerminateOnNaN()

early_stop_1_60k_6_aug = keras.callbacks.EarlyStopping(
    monitor='val_loss', min_delta=0, patience=20,
    verbose=0, mode='auto', baseline=None, restore_best_weights=False)

csv_logger_1_60k_6_aug = keras.callbacks.CSVLogger('training_1_60k_6_aug.log')

In [None]:
cnn1_60k.fit_generator(
    train_generator_60k_aug,
    epochs=10,
    steps_per_epoch=train_steps_per_epoch_60k_aug,
    validation_data=val_generator_60k_aug,
    validation_steps=val_steps_per_epoch_60k_aug,
    callbacks=[csv_logger_1_60k_6_aug, early_stop_1_60k_6_aug,
               nan_problem_1_60k_6_aug, reduce_lr_1_60k_6_aug,
               saving_weights_1_60k_6_aug])

In [None]:
cnn1_60k.save('cnn1_60k_6_aug.h5')

# Final Model:
## Scaled-Up Model #6: Retrain with All New Data, Increase Size of Training and Validation Sets, Add Seed, Implement Image Augmentation

In [None]:
# Loads and prepares 35,000 tiles from each class chosen at random
# This is the most that this particular server would allow before
# running out of memory

train_cancer_6_final = prep.cancer_train_jpegs_to_arrays(
    cancer_train_glob, scale_down=35000, seed=52)
train_normal_6_final = prep.normal_train_jpegs_to_arrays(
    normal_train_glob, scale_down=35000, seed=52)
train_data_6_final = np.concatenate(
    (train_cancer_6_final, train_normal_6_final))

In [None]:
# Below code confirms that cancer data and normal data have been
# concatenated properly. All outputs should be True.
print((train_cancer_6_final[0] == train_data_6_final[0]).mean() == 1)
# The last cancer tile sits just before the midpoint of the concatenated array.
print((train_cancer_6_final[-1] ==
       train_data_6_final[len(train_data_6_final)//2 - 1]).mean() == 1)
print((train_normal_6_final[-1] == train_data_6_final[-1]).mean() == 1)
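The middle check is meant to inspect the last cancer tile, which sits at index `len(data) // 2 - 1` when both classes contribute the same number of tiles. A toy version with short lists standing in for the tile arrays (the names and values here are illustrative only):

```python
# Three stand-in "tiles" per class, concatenated cancer first, as in the notebook.
cancer = ["c0", "c1", "c2"]
normal = ["n0", "n1", "n2"]
data = cancer + normal

last_cancer_index = len(data) // 2 - 1   # final element of the cancer half
first_normal_index = len(data) // 2      # first element of the normal half

print(data[0] == cancer[0])
print(data[last_cancer_index] == cancer[-1])
print(data[first_normal_index] == normal[0])
print(data[-1] == normal[-1])
```

Integer division keeps the index an `int`, which array indexing requires.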

In [None]:
# Conserves RAM

del train_cancer_6_final
del train_normal_6_final

In [None]:
# Assigns labels

train_labels_6_final = np.zeros(len(train_data_6_final))
train_labels_6_final[0:int(len(train_labels_6_final)/2)] = 1

In [None]:
# Below code confirms that labels have been assigned in the proper order.
# All outputs should be True.

print(train_labels_6_final[0] == 1)
print(train_labels_6_final[int(len(train_labels_6_final)/2)] == 0)
print(train_labels_6_final[-1] == 0)
print(train_labels_6_final.mean() == 0.5)

In [None]:
train_labels_6_final = to_categorical(train_labels_6_final)

In [None]:
# Performs a 75/25 train/validation split to create a validation set.

X_train_6_final, X_val_6_final, y_train_6_final, y_val_6_final = train_test_split(
    train_data_6_final, train_labels_6_final, test_size=0.25, random_state=27)

In [None]:
# Conserves RAM

del train_data_6_final
del train_labels_6_final

In [None]:
# Sets desired parameters for CNN training

image_width, image_height = 256, 256
batch_size_6_final = 64
train_steps_per_epoch_6_final = int(len(X_train_6_final)/batch_size_6_final)
val_steps_per_epoch_6_final = int(len(X_val_6_final))
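The step counts follow directly from the dataset size. The arithmetic below is a sketch that assumes all 70,000 tiles (35,000 per class) load successfully and that the 25% validation share comes out to exactly 17,500 tiles. The variable names are illustrative.

```python
tiles_per_class = 35000
total_tiles = 2 * tiles_per_class
val_fraction = 0.25
batch_size = 64

val_tiles = int(total_tiles * val_fraction)
train_tiles = total_tiles - val_tiles

# Training: only full batches are counted, mirroring int(len(X) / batch_size).
train_steps = int(train_tiles / batch_size)
leftover_tiles = train_tiles - train_steps * batch_size

# Validation: the validation generator uses batch_size=1, so one step per tile.
val_steps = val_tiles

print(train_tiles, val_tiles, train_steps, leftover_tiles, val_steps)
```

Under these assumptions each epoch runs 820 training steps and 17,500 validation steps, and 20 training tiles do not fit into a full batch.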

In [None]:
# Creates ImageDataGenerator objects for training and validation sets
# Normalizes RGB values in tiles

train_IDG_6_final = ImageDataGenerator(rescale=1./255)
val_IDG_6_final = ImageDataGenerator(rescale=1./255)

In [None]:
# Applies ImageDataGenerator

train_generator_6_final = train_IDG_6_final.flow(
    X_train_6_final, y_train_6_final, batch_size=batch_size_6_final)

In [None]:
# Conserves RAM

del X_train_6_final
del y_train_6_final

In [None]:
# Applies ImageDataGenerator

val_generator_6_final = val_IDG_6_final.flow(
    X_val_6_final, y_val_6_final, batch_size=1)

In [None]:
# Conserves RAM

del X_val_6_final
del y_val_6_final

In [None]:
cnn6_final = models.Sequential()
cnn6_final.add(layers.Conv2D(64, (3, 3), activation='relu',
                             input_shape=(image_width, image_height, 3), padding='SAME'))
cnn6_final.add(layers.MaxPooling2D((2, 2)))
# input_shape is only needed on the first layer
cnn6_final.add(layers.Conv2D(32, (3, 3), activation='relu', padding='SAME'))
cnn6_final.add(layers.MaxPooling2D((2, 2)))
cnn6_final.add(layers.Conv2D(32, (3, 3), activation='relu'))
cnn6_final.add(layers.MaxPooling2D((2, 2)))
cnn6_final.add(layers.Flatten())
cnn6_final.add(layers.Dense(32, activation='relu'))
cnn6_final.add(layers.Dense(2, activation='sigmoid'))
cnn6_final.compile(loss='binary_crossentropy',
                   optimizer="sgd", metrics=['acc'])
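To get a feel for the size of this network, the sketch below traces the tensor shape and trainable parameter count through the layers by hand. It assumes the Keras defaults for anything not set in the cell above (stride 1, `'valid'` padding on the third convolution, pooling stride equal to the pool size). The helper functions are hypothetical and only do arithmetic; they do not build a model.

```python
def conv2d(shape, filters, kernel=3, padding="same"):
    """Return (output shape, parameter count) for a stride-1 convolution."""
    h, w, c = shape
    if padding == "valid":
        h, w = h - kernel + 1, w - kernel + 1
    params = kernel * kernel * c * filters + filters  # weights + biases
    return (h, w, filters), params


def max_pool(shape, pool=2):
    """Return (output shape, 0) for non-overlapping max pooling."""
    h, w, c = shape
    return (h // pool, w // pool, c), 0


def dense(units_in, units_out):
    """Return (output units, parameter count) for a fully connected layer."""
    return units_out, units_in * units_out + units_out


shape = (256, 256, 3)
total_params = 0
for step in [
    lambda s: conv2d(s, 64, padding="same"),
    max_pool,
    lambda s: conv2d(s, 32, padding="same"),
    max_pool,
    lambda s: conv2d(s, 32, padding="valid"),
    max_pool,
]:
    shape, params = step(shape)
    total_params += params

flat_units = shape[0] * shape[1] * shape[2]
units, params = dense(flat_units, 32)
total_params += params
units, params = dense(units, 2)
total_params += params

print(shape, flat_units, total_params)  # (31, 31, 32) 30752 1013666
```

By this count the first dense layer holds 984,096 of roughly 1.01 million parameters, so the flattened feature map dominates the model size.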

### Fitting Epochs 1 - 10

In [None]:
saving_weights_6_final_1 = keras.callbacks.ModelCheckpoint(
    'weights_6_final_1.{epoch:02d}-{val_loss:.2f}.hdf5',
    monitor='val_loss', verbose=2, save_best_only=True,
    save_weights_only=False, mode='auto', period=1)

reduce_lr_6_final_1 = keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss', factor=0.1, patience=20,
    verbose=2, mode='auto', min_delta=0.0001, min_lr=0)

nan_problem_6_final_1 = keras.callbacks.TerminateOnNaN()

early_stop_6_final_1 = keras.callbacks.EarlyStopping(
    monitor='val_loss', min_delta=0, patience=20,
    verbose=2, mode='auto', baseline=None, restore_best_weights=False)

csv_logger_6_final_1 = keras.callbacks.CSVLogger('training_6_final_1.log')

In [None]:
cnn6_final.fit_generator(
    train_generator_6_final,
    epochs=10,
    steps_per_epoch=train_steps_per_epoch_6_final,
    validation_data=val_generator_6_final,
    validation_steps=val_steps_per_epoch_6_final,
    callbacks=[csv_logger_6_final_1, early_stop_6_final_1,
               nan_problem_6_final_1, reduce_lr_6_final_1,
               saving_weights_6_final_1])

In [None]:
cnn6_final.save('cnn6_final_1.h5')

### Fitting Epochs 11 - 20

In [None]:
saving_weights_6_final_2 = keras.callbacks.ModelCheckpoint(
    'weights_6_final_2.{epoch:02d}-{val_loss:.2f}.hdf5',
    monitor='val_loss', verbose=2, save_best_only=True,
    save_weights_only=False, mode='auto', period=1)

reduce_lr_6_final_2 = keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss', factor=0.1, patience=20,
    verbose=2, mode='auto', min_delta=0.0001, min_lr=0)

nan_problem_6_final_2 = keras.callbacks.TerminateOnNaN()

early_stop_6_final_2 = keras.callbacks.EarlyStopping(
    monitor='val_loss', min_delta=0, patience=20,
    verbose=2, mode='auto', baseline=None, restore_best_weights=False)

csv_logger_6_final_2 = keras.callbacks.CSVLogger('training_6_final_2.log')

In [None]:
cnn6_final.fit_generator(
    train_generator_6_final,
    epochs=10,
    steps_per_epoch=train_steps_per_epoch_6_final,
    validation_data=val_generator_6_final,
    validation_steps=val_steps_per_epoch_6_final,
    callbacks=[csv_logger_6_final_2, early_stop_6_final_2,
               nan_problem_6_final_2, reduce_lr_6_final_2,
               saving_weights_6_final_2])

In [None]:
cnn6_final.save('cnn6_final_2.h5')

### Fitting Epochs 21 - 30

In [None]:
saving_weights_6_final_3 = keras.callbacks.ModelCheckpoint(
    'weights_6_final_3.{epoch:02d}-{val_loss:.2f}.hdf5',
    monitor='val_loss', verbose=2, save_best_only=True,
    save_weights_only=False, mode='auto', period=1)

reduce_lr_6_final_3 = keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss', factor=0.1, patience=20,
    verbose=2, mode='auto', min_delta=0.0001, min_lr=0)

nan_problem_6_final_3 = keras.callbacks.TerminateOnNaN()

early_stop_6_final_3 = keras.callbacks.EarlyStopping(
    monitor='val_loss', min_delta=0, patience=20,
    verbose=2, mode='auto', baseline=None, restore_best_weights=False)

csv_logger_6_final_3 = keras.callbacks.CSVLogger('training_6_final_3.log')

In [None]:
cnn6_final.fit_generator(
    train_generator_6_final,
    epochs=10,
    steps_per_epoch=train_steps_per_epoch_6_final,
    validation_data=val_generator_6_final,
    validation_steps=val_steps_per_epoch_6_final,
    callbacks=[csv_logger_6_final_3, early_stop_6_final_3,
               nan_problem_6_final_3, reduce_lr_6_final_3,
               saving_weights_6_final_3])

In [None]:
cnn6_final.save('cnn6_final_3.h5')

### Fitting Epochs 31 - 40

In [None]:
saving_weights_6_final_4 = keras.callbacks.ModelCheckpoint(
    'weights_6_final_4.{epoch:02d}-{val_loss:.2f}.hdf5',
    monitor='val_loss', verbose=2, save_best_only=True,
    save_weights_only=False, mode='auto', period=1)

reduce_lr_6_final_4 = keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss', factor=0.1, patience=20,
    verbose=2, mode='auto', min_delta=0.0001, min_lr=0)

nan_problem_6_final_4 = keras.callbacks.TerminateOnNaN()

early_stop_6_final_4 = keras.callbacks.EarlyStopping(
    monitor='val_loss', min_delta=0, patience=20,
    verbose=2, mode='auto', baseline=None, restore_best_weights=False)

csv_logger_6_final_4 = keras.callbacks.CSVLogger('training_6_final_4.log')

In [None]:
cnn6_final.fit_generator(
    train_generator_6_final,
    epochs=10,
    steps_per_epoch=train_steps_per_epoch_6_final,
    validation_data=val_generator_6_final,
    validation_steps=val_steps_per_epoch_6_final,
    callbacks=[csv_logger_6_final_4, early_stop_6_final_4,
               nan_problem_6_final_4, reduce_lr_6_final_4,
               saving_weights_6_final_4])

In [None]:
cnn6_final.save('cnn6_final_4.h5')
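Every 10-epoch stage of the final model writes three artifacts whose names differ only by the stage number and an optional `_aug` suffix. The hypothetical helper below (not part of the notebook or of `cnn_prep.py`) reconstructs those names and the epoch range for a given stage, assuming the naming pattern used in the cells of this section.

```python
def stage_artifacts(stage, augmented=False, epochs_per_stage=10):
    """Return the epoch range and file names used for one training stage."""
    suffix = "{}_aug".format(stage) if augmented else str(stage)
    first_epoch = (stage - 1) * epochs_per_stage + 1
    last_epoch = stage * epochs_per_stage
    return {
        "epochs": (first_epoch, last_epoch),
        "checkpoint": "weights_6_final_" + suffix
                      + ".{epoch:02d}-{val_loss:.2f}.hdf5",
        "log": "training_6_final_" + suffix + ".log",
        "model": "cnn6_final_" + suffix + ".h5",
    }


stage4 = stage_artifacts(4)
stage5 = stage_artifacts(5, augmented=True)
print(stage4)
print(stage5)
```

Generating the names from one place would make it harder for a copied cell to overwrite the log or model file of an earlier stage.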

## Introduce Image Augmentation

In [None]:
cnn6_final = load_model('cnn6_final_4.h5')

In [9]:
# Reloads the same training data as before (same seed, 35,000 tiles per class)

train_cancer_6_final = prep.cancer_train_jpegs_to_arrays(
    cancer_train_glob, scale_down=35000, seed=52)
train_normal_6_final = prep.normal_train_jpegs_to_arrays(
    normal_train_glob, scale_down=35000, seed=52)
train_data_6_final = np.concatenate(
    (train_cancer_6_final, train_normal_6_final))

Using Seed:  52
Using Seed:  52


In [None]:
# Below code confirms that cancer data and normal data have been
# concatenated properly. All outputs should be True.

print((train_cancer_6_final[0] == train_data_6_final[0]).mean() == 1)
# The last cancer tile sits just before the midpoint of the concatenated array.
print((train_cancer_6_final[-1] ==
       train_data_6_final[len(train_data_6_final)//2 - 1]).mean() == 1)
print((train_normal_6_final[-1] == train_data_6_final[-1]).mean() == 1)

In [10]:
# Conserves RAM

del train_cancer_6_final
del train_normal_6_final

In [11]:
# Assigns labels

train_labels_6_final = np.zeros(len(train_data_6_final))
train_labels_6_final[0:int(len(train_labels_6_final)/2)] = 1

In [12]:
# Below code confirms that labels have been assigned in the proper order.
# All outputs should be True.

print(train_labels_6_final[0] == 1)
print(train_labels_6_final[int(len(train_labels_6_final)/2)] == 0)
print(train_labels_6_final[-1] == 0)
print(train_labels_6_final.mean() == 0.5)

True
True
True
True


In [13]:
train_labels_6_final = to_categorical(train_labels_6_final)

In [14]:
# Performs a 75/25 train/validation split to create a validation set.

X_train_6_final, X_val_6_final, y_train_6_final, y_val_6_final = train_test_split(
    train_data_6_final, train_labels_6_final, test_size=0.25, random_state=27)

In [15]:
# Conserves RAM

del train_data_6_final
del train_labels_6_final

In [16]:
# Sets desired parameters for CNN training

image_width, image_height = 256, 256
batch_size_6_final = 64
train_steps_per_epoch_6_final = int(len(X_train_6_final)/batch_size_6_final)
val_steps_per_epoch_6_final = int(len(X_val_6_final))

In [17]:
# Creates ImageDataGenerator objects for training and validation sets
# Normalizes RGB values in tiles
# Implements Image Augmentation - random flips and rotations

train_IDG_6_final = ImageDataGenerator(
    rescale=1./255, rotation_range=180, horizontal_flip=True, vertical_flip=True)
val_IDG_6_final = ImageDataGenerator(rescale=1./255)

In [18]:
train_generator_6_final = train_IDG_6_final.flow(
    X_train_6_final, y_train_6_final, batch_size=batch_size_6_final)

In [19]:
# Conserves RAM

del X_train_6_final
del y_train_6_final

In [20]:
val_generator_6_final = val_IDG_6_final.flow(
    X_val_6_final, y_val_6_final, batch_size=1)

In [21]:
# Conserves RAM

del X_val_6_final
del y_val_6_final

### Fitting Epochs 41 - 50

In [17]:
saving_weights_6_final_5_aug = keras.callbacks.ModelCheckpoint(
    'weights_6_final_5_aug.{epoch:02d}-{val_loss:.2f}.hdf5',
    monitor='val_loss', verbose=2, save_best_only=True,
    save_weights_only=False, mode='auto', period=1)

reduce_lr_6_final_5_aug = keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss', factor=0.1, patience=20,
    verbose=2, mode='auto', min_delta=0.0001, min_lr=0)

nan_problem_6_final_5_aug = keras.callbacks.TerminateOnNaN()

early_stop_6_final_5_aug = keras.callbacks.EarlyStopping(
    monitor='val_loss', min_delta=0, patience=20,
    verbose=2, mode='auto', baseline=None, restore_best_weights=False)

csv_logger_6_final_5_aug = keras.callbacks.CSVLogger(
    'training_6_final_5_aug.log')

In [18]:
cnn6_final.fit_generator(
    train_generator_6_final,
    epochs=10,
    steps_per_epoch=train_steps_per_epoch_6_final,
    validation_data=val_generator_6_final,
    validation_steps=val_steps_per_epoch_6_final,
    callbacks=[csv_logger_6_final_5_aug, early_stop_6_final_5_aug,
               nan_problem_6_final_5_aug, reduce_lr_6_final_5_aug,
               saving_weights_6_final_5_aug])

Epoch 1/10

Epoch 00001: val_loss improved from inf to 0.17549, saving model to weights_6_final_5_aug.01-0.18.hdf5
Epoch 2/10

Epoch 00002: val_loss did not improve from 0.17549
Epoch 3/10

Epoch 00003: val_loss did not improve from 0.17549
Epoch 4/10

Epoch 00004: val_loss did not improve from 0.17549
Epoch 5/10

Epoch 00005: val_loss improved from 0.17549 to 0.16982, saving model to weights_6_final_5_aug.05-0.17.hdf5
Epoch 6/10

Epoch 00006: val_loss did not improve from 0.16982
Epoch 7/10

Epoch 00007: val_loss improved from 0.16982 to 0.15218, saving model to weights_6_final_5_aug.07-0.15.hdf5
Epoch 8/10

Epoch 00008: val_loss improved from 0.15218 to 0.14642, saving model to weights_6_final_5_aug.08-0.15.hdf5
Epoch 9/10

Epoch 00009: val_loss did not improve from 0.14642
Epoch 10/10

Epoch 00010: val_loss did not improve from 0.14642


<keras.callbacks.History at 0x7f8fe725b0b8>
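The checkpoint names in the log above follow from the `ModelCheckpoint` filename template, which is an ordinary Python format string filled in with the epoch number and the validation loss. The sketch below applies the same template to the four improving epochs reported in that log.

```python
template = 'weights_6_final_5_aug.{epoch:02d}-{val_loss:.2f}.hdf5'

# (epoch, val_loss) pairs taken from the "improved" lines of the log above.
logged = [(1, 0.17549), (5, 0.16982), (7, 0.15218), (8, 0.14642)]

names = [template.format(epoch=e, val_loss=v) for e, v in logged]
for name in names:
    print(name)
```

Epochs 7 and 8 both end in `0.15` because the template rounds to two decimals. The epoch number also restarts at 01 in every fit call, so the stage number in the prefix is what keeps file names unique across stages.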

In [19]:
cnn6_final.save('cnn6_final_5_aug.h5')

### Fitting Epochs 51 - 60

In [20]:
saving_weights_6_final_6_aug = keras.callbacks.ModelCheckpoint(
    'weights_6_final_6_aug.{epoch:02d}-{val_loss:.2f}.hdf5',
    monitor='val_loss', verbose=2, save_best_only=True,
    save_weights_only=False, mode='auto', period=1)

reduce_lr_6_final_6_aug = keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss', factor=0.1, patience=20,
    verbose=2, mode='auto', min_delta=0.0001, min_lr=0)

nan_problem_6_final_6_aug = keras.callbacks.TerminateOnNaN()

early_stop_6_final_6_aug = keras.callbacks.EarlyStopping(
    monitor='val_loss', min_delta=0, patience=20,
    verbose=2, mode='auto', baseline=None, restore_best_weights=False)

csv_logger_6_final_6_aug = keras.callbacks.CSVLogger(
    'training_6_final_6_aug.log')

In [21]:
cnn6_final.fit_generator(
    train_generator_6_final,
    epochs=10,
    steps_per_epoch=train_steps_per_epoch_6_final,
    validation_data=val_generator_6_final,
    validation_steps=val_steps_per_epoch_6_final,
    callbacks=[csv_logger_6_final_6_aug, early_stop_6_final_6_aug,
               nan_problem_6_final_6_aug, reduce_lr_6_final_6_aug,
               saving_weights_6_final_6_aug])

Epoch 1/10

Epoch 00001: val_loss improved from inf to 0.14761, saving model to weights_6_final_6_aug.01-0.15.hdf5
Epoch 2/10

Epoch 00002: val_loss improved from 0.14761 to 0.14392, saving model to weights_6_final_6_aug.02-0.14.hdf5
Epoch 3/10

Epoch 00003: val_loss did not improve from 0.14392
Epoch 4/10

Epoch 00004: val_loss did not improve from 0.14392
Epoch 5/10

Epoch 00005: val_loss improved from 0.14392 to 0.14183, saving model to weights_6_final_6_aug.05-0.14.hdf5
Epoch 6/10

Epoch 00006: val_loss improved from 0.14183 to 0.13783, saving model to weights_6_final_6_aug.06-0.14.hdf5
Epoch 7/10

Epoch 00007: val_loss did not improve from 0.13783
Epoch 8/10

Epoch 00008: val_loss improved from 0.13783 to 0.13163, saving model to weights_6_final_6_aug.08-0.13.hdf5
Epoch 9/10

Epoch 00009: val_loss did not improve from 0.13163
Epoch 10/10

Epoch 00010: val_loss did not improve from 0.13163


<keras.callbacks.History at 0x7f8fc96a3be0>
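The first line of each stage's log reads "improved from inf", which indicates that the best-loss tracker of `save_best_only` starts over with every new callback object. The sketch below uses two values from the logs in this section to show the consequence. The `improves` helper is a simplified stand-in for the comparison, not Keras code.

```python
def improves(val_loss, best_so_far):
    """A checkpoint is written only when the new loss beats the tracked best."""
    return val_loss < best_so_far


stage5_best = 0.14642    # best val_loss logged during epochs 41 - 50
stage6_first = 0.14761   # first val_loss logged during epochs 51 - 60

# Fresh callback: the tracked best starts at infinity, so the file is written.
fresh_tracker_saves = improves(stage6_first, float("inf"))

# Had the previous best carried over, this epoch would not have been saved.
carried_tracker_saves = improves(stage6_first, stage5_best)

print(fresh_tracker_saves, carried_tracker_saves)
```

The first checkpoint of a stage can therefore be worse than the best checkpoint of the previous stage, so checkpoints should be compared across stages by their logged loss rather than by recency.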

In [22]:
cnn6_final.save('cnn6_final_6_aug.h5')

## Best Model: weights_6_final_6_aug.08-0.13.hdf5

In [22]:
cnn6_final = load_model('/home/ec2-user/SageMaker/models/cnn6_final_6_aug.h5')

### Fitting Epochs 61 - 70

In [23]:
saving_weights_6_final_7_aug = keras.callbacks.ModelCheckpoint(
    'weights_6_final_7_aug.{epoch:02d}-{val_loss:.2f}.hdf5',
    monitor='val_loss', verbose=2, save_best_only=True,
    save_weights_only=False, mode='auto', period=1)

reduce_lr_6_final_7_aug = keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss', factor=0.1, patience=20,
    verbose=2, mode='auto', min_delta=0.0001, min_lr=0)

nan_problem_6_final_7_aug = keras.callbacks.TerminateOnNaN()

early_stop_6_final_7_aug = keras.callbacks.EarlyStopping(
    monitor='val_loss', min_delta=0, patience=20,
    verbose=2, mode='auto', baseline=None, restore_best_weights=False)

csv_logger_6_final_7_aug = keras.callbacks.CSVLogger(
    'training_6_final_7_aug.log')

In [24]:
cnn6_final.fit_generator(
    train_generator_6_final,
    epochs=10,
    steps_per_epoch=train_steps_per_epoch_6_final,
    validation_data=val_generator_6_final,
    validation_steps=val_steps_per_epoch_6_final,
    callbacks=[csv_logger_6_final_7_aug, early_stop_6_final_7_aug,
               nan_problem_6_final_7_aug, reduce_lr_6_final_7_aug,
               saving_weights_6_final_7_aug])

Epoch 1/10

Epoch 00001: val_loss improved from inf to 0.16663, saving model to weights_6_final_7_aug.01-0.17.hdf5
Epoch 2/10

Epoch 00002: val_loss improved from 0.16663 to 0.13495, saving model to weights_6_final_7_aug.02-0.13.hdf5
Epoch 3/10

Epoch 00003: val_loss improved from 0.13495 to 0.12956, saving model to weights_6_final_7_aug.03-0.13.hdf5
Epoch 4/10

Epoch 00004: val_loss did not improve from 0.12956
Epoch 5/10

Epoch 00005: val_loss did not improve from 0.12956
Epoch 6/10

Epoch 00006: val_loss improved from 0.12956 to 0.12677, saving model to weights_6_final_7_aug.06-0.13.hdf5
Epoch 7/10

Epoch 00007: val_loss improved from 0.12677 to 0.12489, saving model to weights_6_final_7_aug.07-0.12.hdf5
Epoch 8/10

Epoch 00008: val_loss did not improve from 0.12489
Epoch 9/10

Epoch 00009: val_loss did not improve from 0.12489
Epoch 10/10

Epoch 00010: val_loss did not improve from 0.12489


<keras.callbacks.History at 0x7fb3cea35940>
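With checkpoints spread over several stages, the verbose log lines can be parsed to list every saved file alongside its validation loss. The sketch below does this with a regular expression over a few lines copied from the three augmented logs in this notebook. It ranks by validation loss only and says nothing about performance on the held-out test tiles, which may be the more relevant criterion for choosing a model.

```python
import re

# Lines copied from the augmented training logs (epochs 41 - 70).
log_text = """
Epoch 00008: val_loss improved from 0.15218 to 0.14642, saving model to weights_6_final_5_aug.08-0.15.hdf5
Epoch 00009: val_loss did not improve from 0.14642
Epoch 00008: val_loss improved from 0.13783 to 0.13163, saving model to weights_6_final_6_aug.08-0.13.hdf5
Epoch 00009: val_loss did not improve from 0.13163
Epoch 00007: val_loss improved from 0.12677 to 0.12489, saving model to weights_6_final_7_aug.07-0.12.hdf5
Epoch 00008: val_loss did not improve from 0.12489
"""

# Capture the new loss and the file name from each "improved" line.
pattern = re.compile(
    r"val_loss improved from \S+ to ([0-9.]+), saving model to (\S+)")

saved = [(float(loss), name) for loss, name in pattern.findall(log_text)]
lowest_loss, lowest_name = min(saved)

for loss, name in sorted(saved):
    print(loss, name)
```

Lines reporting "did not improve" are skipped automatically because they do not match the pattern.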

In [25]:
cnn6_final.save('cnn6_final_7_aug.h5')