# ResNet Model Implementation


This notebook shows how we trained a ResNet50 model and tuned its hyperparameters to get the best validation accuracy we could. The first part trains the model, saves checkpoints, and saves the full model. The second part reports the model's precision and recall per cell class and plots a confusion matrix for the validation data. Loading a checkpoint is shown in the ResNet50_model_load_and_analyze_test_set notebook, where we load the checkpoint with the highest validation accuracy and evaluate it on the test set. The final weights we used are included in this repository.

## Connect to Data and Install Libraries


In [None]:
#Mount Google Drive in Google Colab. Not necessary if the data is stored elsewhere.
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


In [None]:
#Needed to plot the confusion matrices later
!pip install scikit-plot

Collecting scikit-plot
  Downloading scikit_plot-0.3.7-py3-none-any.whl (33 kB)
Installing collected packages: scikit-plot
Successfully installed scikit-plot-0.3.7


### Import Libraries

In [None]:
import os
import numpy as np
import matplotlib.pyplot as plt
import itertools

from sklearn.metrics import confusion_matrix


import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.preprocessing.image import ImageDataGenerator, img_to_array, load_img
from tensorflow.keras.optimizers import Adam


# Display
#from IPython.display import Image, display
from PIL import Image
import matplotlib.cm as cm


### Set Up Paths to Data and Check Number of Images

In [None]:
#Paths to data
train_path = '/content/gdrive/MyDrive/210_data/train'
validation_path = '/content/gdrive/MyDrive/210_data/val'
test_path = '/content/gdrive/MyDrive/210_data/test'

SIZE = 400
batch_size = 64

In [None]:
#Number of images in each set 
num_train_images = sum([len(files) for r, d, files in os.walk(train_path)])
num_val_images = sum([len(files) for r, d, files in os.walk(validation_path)])
num_test_images = sum([len(files) for r, d, files in os.walk(test_path)])
num_train_images + num_val_images + num_test_images #check sum adds to 18,365

18365
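The total above does not show how the images are spread across the cell classes. A minimal sketch of a per-class count follows, assuming each split directory holds one subfolder per class (the layout `flow_from_directory` expects). The helper name `count_images_per_class` and the demo folder names and counts are hypothetical.

```python
import os
import tempfile

def count_images_per_class(split_path):
    """Return {class_folder_name: number_of_files} for one split directory."""
    counts = {}
    for name in sorted(os.listdir(split_path)):
        class_dir = os.path.join(split_path, name)
        if os.path.isdir(class_dir):
            # Count every file below the class folder, as os.walk does above.
            counts[name] = sum(len(files) for _, _, files in os.walk(class_dir))
    return counts

# Demo on a throwaway directory tree (hypothetical class names and file counts).
with tempfile.TemporaryDirectory() as demo_root:
    for class_name, n_files in [("BAS", 2), ("EOS", 3)]:
        os.makedirs(os.path.join(demo_root, class_name))
        for i in range(n_files):
            open(os.path.join(demo_root, class_name, f"img_{i}.png"), "w").close()
    demo_counts = count_images_per_class(demo_root)

print(demo_counts)
```

Calling `count_images_per_class(train_path)` on the real data would give the per-class totals for the training split, which helps when reading the per-class precision and recall later.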

## Helper Functions

In [None]:
def plotImages(images_arr):
    fig, axes = plt.subplots(2,8, figsize = (20,5))
    axes = axes.flatten()
    for img, ax in zip(images_arr, axes):
        ax.imshow(img)
        ax.axis('off')
    plt.tight_layout()
    plt.show()

## Build Model

In [None]:
train_datagen = ImageDataGenerator(
    preprocessing_function = tf.keras.applications.resnet50.preprocess_input,
    rescale = 1./255,
    #The three augmentations mentioned in the paper: random rotation, horizontal flip, and vertical flip. Use these in the second baseline model.
    rotation_range=359,
    horizontal_flip=True,
    vertical_flip=True,
    fill_mode='reflect'
)

In [None]:
validation_datagen = ImageDataGenerator(rescale = 1./255, preprocessing_function = tf.keras.applications.resnet50.preprocess_input)

In [None]:
test_datagen = ImageDataGenerator(rescale = 1./255, preprocessing_function = tf.keras.applications.resnet50.preprocess_input)
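All three generators set both `preprocessing_function` and `rescale`. To our understanding, `ImageDataGenerator` applies the preprocessing function first and then multiplies by `rescale`, and the ResNet50 `preprocess_input` converts RGB to BGR and subtracts the ImageNet channel means without scaling. The sketch below emulates that order on a single pixel in plain Python; it is not a call into Keras, and the function names are hypothetical. It may help when deciding whether both steps are wanted.

```python
# ImageNet per-channel means in BGR order, as used by caffe-style preprocessing.
IMAGENET_BGR_MEAN = (103.939, 116.779, 123.68)

def caffe_style_preprocess(rgb_pixel):
    """Emulate ResNet50-style preprocessing: RGB -> BGR, then subtract the means."""
    r, g, b = rgb_pixel
    bgr = (b, g, r)
    return tuple(c - m for c, m in zip(bgr, IMAGENET_BGR_MEAN))

def emulate_generator_pipeline(rgb_pixel, rescale=1.0 / 255):
    """Assumed order: preprocessing function first, rescale second."""
    centered = caffe_style_preprocess(rgb_pixel)
    return tuple(c * rescale for c in centered)

pixel = (255.0, 128.0, 0.0)  # hypothetical RGB pixel in the 0-255 range
centered_only = caffe_style_preprocess(pixel)
centered_and_rescaled = emulate_generator_pipeline(pixel)

print(centered_only)
print(centered_and_rescaled)
```

Under these assumptions the network input ends up roughly in the range -0.5 to 0.6 rather than the mean-centered 0-255 scale. Because the model here is trained from scratch (`weights=None`), this is a choice about input scaling rather than a mismatch with pretrained weights.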

In [None]:
train_generator = train_datagen.flow_from_directory(
        directory = train_path,  # this is the input directory
        target_size=(224, 224),  # all images will be resized to 224x224
        batch_size=batch_size,
        class_mode='categorical')  # multiple categories



Found 14687 images belonging to 15 classes.


In [None]:
validation_generator = validation_datagen.flow_from_directory(
    directory = validation_path,
    target_size = (224,224),
    batch_size = batch_size,
    class_mode = 'categorical',
    shuffle = False
)

Found 1828 images belonging to 15 classes.




In [None]:
test_generator = test_datagen.flow_from_directory(
    directory = test_path,
    target_size = (224,224),
    batch_size = batch_size,
    class_mode = 'categorical',
    shuffle = False
)



Found 1850 images belonging to 15 classes.


In [None]:
#Look at batch of train images
imgs, labels = next(train_generator)
plotImages(imgs)
print(labels)

In [None]:
#Instantiate ResNet50 from Keras with randomly initialized weights (weights=None) and a 15-class output layer
model = tf.keras.applications.ResNet50(
    include_top=True, weights=None, input_tensor=None,
    input_shape=None, pooling=None, classes=15)

In [None]:
#Look at model structure 
model.summary()

## Compile and Fit Model

In [None]:
#Compile the model with categorical cross-entropy loss and the Adam optimizer
model.compile(loss='categorical_crossentropy',
              optimizer= Adam(learning_rate = 0.0001),
              metrics=['accuracy'])

In [None]:
#Add checkpoints: save the model whenever validation accuracy improves
from tensorflow.keras.callbacks import ModelCheckpoint
#filepath='saved_models/models.h5'
#The file name includes the epoch number and the validation accuracy.
filepath = '/content/gdrive/MyDrive/saved_models/no_aug_10_epoch_64_batch_size-improvement-{epoch:02d}-{val_accuracy:.2f}.hdf5'
checkpoint = ModelCheckpoint(filepath, monitor='val_accuracy', verbose=1, save_best_only=True, mode='max')
callbacks_list = [checkpoint]
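The placeholders in the checkpoint file name are ordinary Python format fields that Keras fills in from the epoch number and the logged metrics. A small sketch of how one name would be rendered, using only the file-name part of the path and hypothetical values for the epoch and validation accuracy:

```python
# File-name part of the checkpoint path used above.
template = "no_aug_10_epoch_64_batch_size-improvement-{epoch:02d}-{val_accuracy:.2f}.hdf5"

# Hypothetical values, chosen only to illustrate the formatting.
example_name = template.format(epoch=7, val_accuracy=0.8312)
print(example_name)  # no_aug_10_epoch_64_batch_size-improvement-07-0.83.hdf5
```

Since the epoch number is part of the name, each improvement is written to a new file, and the best checkpoint is the one from the latest saved epoch.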

In [None]:
#Fit the model
model.fit(train_generator,
          steps_per_epoch = num_train_images//batch_size, #floor division: the final partial batch is skipped
          epochs = 100,
          validation_data = validation_generator,
          validation_steps = num_val_images//batch_size, #floor division: the final partial batch is skipped
          callbacks = callbacks_list)
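The effect of the floor division can be checked with the image counts reported by the generators above (14,687 training and 1,828 validation images, batch size 64). The sketch below compares floor and ceiling step counts; the helper name `batch_coverage` is hypothetical.

```python
import math

def batch_coverage(num_images, batch_size):
    """Return (floor_steps, ceil_steps, images_left_over_with_floor_steps)."""
    floor_steps = num_images // batch_size
    ceil_steps = math.ceil(num_images / batch_size)
    left_over = num_images - floor_steps * batch_size
    return floor_steps, ceil_steps, left_over

train_cov = batch_coverage(14687, 64)
val_cov = batch_coverage(1828, 64)

print(train_cov)  # (229, 230, 31)
print(val_cov)    # (28, 29, 36)
```

With 28 validation steps, each validation pass during `fit` covers at most 28 batches, so the `val_accuracy` monitored by the checkpoint is computed on slightly fewer than the 1,828 validation images. Using the ceiling value, or leaving the step arguments unset so Keras uses the generator's own length, would be one way to include the final partial batch.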

In [None]:
#Save the model
model.save('/content/gdrive/MyDrive/saved_models/no_aug_10_epoch_64_batch_size') #change path name to reflect model you are running

## Predictions Using Saved Model


In [None]:
#Check out validation images for one batch of val data
val_imgs, val_labels = next(validation_generator)
plotImages(val_imgs)
print(val_labels)

In [None]:
#Print the accuracy and loss for validation data (the generator already sets the batch size)
loss, acc = model.evaluate(validation_generator, verbose=2)
print("Validation accuracy: {:5.2f}%".format(100*acc))

In [None]:
#Integer class label of every validation image, in generator order
validation_generator.classes

In [None]:
#validation predictions
val_predictions = model.predict(x = validation_generator, verbose = 0)

In [None]:
#Mapping from cell type name to integer class index
val_dic = validation_generator.class_indices 
val_dic

In [None]:
#Per-class precision, recall, and F1 score on the validation set
from sklearn.metrics import classification_report
#target_names must be in the same order as the class indices in val_dic
target_names = ['BAS','EBO','EOS','KSC','LYA','LYT','MMZ','MOB','MON','MYB','MYO','NGB','NGS','PMB','PMO']
print(classification_report(validation_generator.classes, np.argmax(val_predictions, axis = -1), target_names = target_names))
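For readers who want to see where the per-class numbers in the report come from, here is a minimal pure-Python sketch of precision and recall computed from true and predicted integer labels. It is an illustration of the definitions, not scikit-learn's implementation, and the toy labels are hypothetical.

```python
def per_class_precision_recall(y_true, y_pred, num_classes):
    """Return a list of (precision, recall) tuples, one per class index."""
    tp = [0] * num_classes  # predicted c and truly c
    fp = [0] * num_classes  # predicted c but truly another class
    fn = [0] * num_classes  # truly c but predicted another class
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    results = []
    for c in range(num_classes):
        predicted = tp[c] + fp[c]
        actual = tp[c] + fn[c]
        precision = tp[c] / predicted if predicted else 0.0
        recall = tp[c] / actual if actual else 0.0
        results.append((precision, recall))
    return results

# Toy example with three classes (hypothetical labels).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
scores = per_class_precision_recall(y_true, y_pred, 3)
print(scores)
```

In this notebook, `y_true` corresponds to `validation_generator.classes` and `y_pred` to the argmax over `val_predictions`.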

In [None]:
#Invert val_dic so integer class indices map back to cell names for the confusion matrix in the next cells
inv_val_dic = {v: k for k, v in val_dic.items()}
inv_val_dic

In [None]:
def map_to_labels(array):
    labeled_array = []
    for integer in array:
        labeled_array.append(inv_val_dic[integer])
    return labeled_array

In [None]:
#Plot confusion matrix of different cell types for validation data
import scikitplot as skplt
skplt.metrics.plot_confusion_matrix(
    map_to_labels(validation_generator.classes), 
    map_to_labels(np.argmax(val_predictions, axis = -1)),
    title = "ResNet50 Validation Confusion Matrix",
    figsize=(12,10))