# VGG16 fine tuning for colonoscopy polyps

In the previous notebook ([4-TransferLearning.ipynb](4-TransferLearning.ipynb)), I tested transfer learning with VGG16 by training only the FC layer on top. All the convolutional blocks kept the weights of the pre-trained VGG16.

In this notebook, I apply fine-tuning: training the last 1 or 2 convolutional blocks together with the FC layer. The FC layer starts from the weights of the best model obtained in the previous step (Transfer Learning notebook). See more details at [Keras blog](https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html).

Let's load some libraries:

In [1]:
from __future__ import with_statement  # must be the first statement; a no-op on Python 2.6+ and Python 3
from keras import applications
from keras.preprocessing.image import ImageDataGenerator
from keras import optimizers
from keras.models import Sequential
from keras.models import Model
from keras.layers import Dropout, Flatten, Dense
from keras.callbacks import ModelCheckpoint, EarlyStopping

from numpy.random import seed
from tensorflow import set_random_seed
import time, os
import numpy as np
import keras

from matplotlib import pyplot as plt
from IPython.display import clear_output
from sklearn.metrics import roc_auc_score
%matplotlib inline

Using TensorFlow backend.


Define the paths to the dataset, the previously trained weights for the FC layer, the early-stopping checkpoint, etc.:

In [2]:
# Folder to save the models
modelFolder = 'saved_models'

# Path to the file with the weights of the pre-trained VGG16 model
# (not used below: FineTunningVGG loads the ImageNet weights with weights='imagenet')
weights_path = 'nets/vgg16_weights.h5'

# Path to the previously saved top model weights (FC layer trained in the Transfer Learning notebook)
top_model_weights_path = os.path.join(modelFolder,'transferVGG16_bottleneck_fc_model.h5')

# Early-stopping checkpoint path; the file name is extended later with the parameter values
earlystoping_path = './saved_models/fineTunning_earlystopnning.h5'

# Dimensions of our images
img_width, img_height = 150, 150

# Train & validation images folders
train_data_dir      = 'data_polyps/train'
validation_data_dir = 'data_polyps/validation'

# Train parameters
nb_train_samples      = 910 # number of samples for training
nb_validation_samples = 302 # number of samples for validation
epochs = 300
batch_size = 16
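Inside `FineTunningVGG`, the `ModelCheckpoint` file name is built by appending the hyperparameter values to `earlystoping_path`. The helper `checkpoint_name` below is hypothetical (not part of the notebook); it is a minimal sketch that reproduces the same string concatenation in plain Python, so you can predict which file a given run writes to.

```python
def checkpoint_name(base_path, epochs, batch_size, learning, mom, freeze_layers):
    """Rebuild the checkpoint path used by FineTunningVGG (hypothetical helper).

    Mirrors: base_path[:-3] + '_e' + str(epochs) + 'b' + str(batch_size)
             + 'l' + str(learning) + 'm' + str(mom) + 'f' + str(freezeLayers) + '.h5'
    """
    stem = base_path[:-3]  # drop the trailing '.h5'
    return (stem + '_e' + str(epochs) + 'b' + str(batch_size)
            + 'l' + str(learning) + 'm' + str(mom)
            + 'f' + str(freeze_layers) + '.h5')


earlystoping_path = './saved_models/fineTunning_earlystopnning.h5'
print(checkpoint_name(earlystoping_path, 200, 64, 1e-4, 0.9, 15))
# ./saved_models/fineTunning_earlystopnning_e200b64l0.0001m0.9f15.h5
```

Python's `str()` switches to exponent notation for small floats, so a learning rate of `1e-5` produces `l1e-05` in the file name, while `1e-4` produces `l0.0001`.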

Next, define the function that fine-tunes the pre-trained VGG16, starting from the FC-layer weights trained in the previous notebook:
* Load the pre-trained VGG16 as the lower model,
* Add the top model (FC layer) on top of it,
* Load the previously calculated weights for the FC layer,
* Freeze a number of layers so that only the last convolutional block(s) are trained: to fine-tune the last Conv block, freeze the first 15 layers; to fine-tune the last 2 Conv blocks, freeze only the first 11 layers,
* Compile the computational graph of the model,
* Generate training & validation datasets from folders, using data augmentation for training,
* Use early stopping if the validation accuracy does not improve for 10 epochs,
* Save the best model (highest validation accuracy),
* Use the SGD optimizer with momentum,
* Search for the best model using different values of the main hyperparameters: epochs, batch size, learning rate, momentum, and the number of layers to freeze.

See more details at [Keras blog](https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html).
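To see which layers `freezeLayers=15` and `freezeLayers=11` leave trainable, the sketch below rebuilds the order of `model.layers` as a plain list of names, without importing Keras. It assumes the standard Keras naming for `VGG16(include_top=False)` (an input layer followed by five blocks, each ending in a max-pooling layer) and that the top model appears as one extra layer at the end; the exact name of the input layer may differ on your setup.

```python
# Layer names in the order Keras lists them for VGG16(include_top=False),
# assuming the standard naming; the top model is appended as one extra layer.
BLOCK_SIZES = [2, 2, 3, 3, 3]  # conv layers per block in VGG16


def vgg16_layer_names():
    names = ['input_1']
    for block, n_conv in enumerate(BLOCK_SIZES, start=1):
        names += ['block%d_conv%d' % (block, i) for i in range(1, n_conv + 1)]
        names.append('block%d_pool' % block)
    names.append('top_model')  # the Sequential FC classifier added on top
    return names


def trainable_layers(freeze_layers):
    """Return the names left trainable after freezing model.layers[:freeze_layers]."""
    return vgg16_layer_names()[freeze_layers:]


for f in (15, 11):
    print(f, trainable_layers(f))
```

Under these assumptions, `freezeLayers=15` keeps block 5 and the top model trainable, and `freezeLayers=11` additionally unfreezes block 4. Pooling layers have no weights, so whether they are frozen makes no difference.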

In the first step, I will try one set of parameters:

In [3]:
def FineTunningVGG(epochs, batch_size, learning, mom, freezeLayers):
    # Fine tuning function using VGG16 and our weights for the FC layer (top model)
    
    # Set seeds for reproducibility
    seed(1)            # numpy seed
    set_random_seed(2) # tensorflow seed
    
    # Build the VGG16 convolutional base with ImageNet weights and our input size (150, 150, 3)
    base_model = applications.VGG16(weights='imagenet', include_top=False, input_shape=(img_height, img_width, 3))

    # Build a classifier model to put on top of the convolutional model (FC layer / top model)
    top_model = Sequential()
    top_model.add(Flatten(input_shape=base_model.output_shape[1:]))
    top_model.add(Dense(256, activation='relu'))
    top_model.add(Dropout(0.5))
    top_model.add(Dense(1, activation='sigmoid'))

    # It is necessary to start with a fully-trained classifier, including the top classifier,
    # in order to successfully do fine-tuning
    
    # Load the previously calculated weights for the top model
    top_model.load_weights(top_model_weights_path)

    # Add the model on top of the convolutional base
    model = Model(inputs= base_model.input, outputs= top_model(base_model.output))

    # Set the first 'freezeLayers' layers to non-trainable (their weights will not be updated).
    # model.layers[0] is the input layer. To fine-tune only the last Conv block, freeze the
    # first 15 layers; to fine-tune the last 2 Conv blocks, freeze only the first 11 layers.
    for layer in model.layers[:freezeLayers]:
        layer.trainable = False

    # Compile the model with an SGD/momentum optimizer and a small learning rate.
    model.compile(loss='binary_crossentropy',
                  optimizer= optimizers.SGD(lr=learning, momentum=mom), # lr=1e-4, momentum=0.9
                  metrics=['accuracy'])

    # Prepare data augmentation configuration
    train_datagen = ImageDataGenerator(
        rescale = 1. / 255,
        shear_range = 0.2,
        zoom_range = 0.2,
        horizontal_flip = True,
        vertical_flip = True,
        rotation_range = 90)

    test_datagen = ImageDataGenerator(rescale=1. / 255)

    # Generate training and validation data
    train_generator = train_datagen.flow_from_directory(
        train_data_dir,
        target_size=(img_height, img_width),
        batch_size=batch_size,
        class_mode='binary')

    validation_generator = test_datagen.flow_from_directory(
        validation_data_dir,
        target_size=(img_height, img_width),
        batch_size=batch_size,
        class_mode='binary')

    # Start timer
    start_time = time.time()

    # Stop early if val_acc does not improve for 10 epochs, and checkpoint the best model:
    callbacks=[EarlyStopping(
                            monitor='val_acc', 
                            patience=10,
                            mode='max',
                            verbose=1),
                ModelCheckpoint(earlystoping_path[:-3]+'_e'+str(epochs)+'b'+str(batch_size)+'l'+str(learning)+'m'+str(mom)+'f'+str(freezeLayers)+'.h5',
                            monitor='val_acc', 
                            save_best_only=True, 
                            mode='max',
                            verbose=0)]

    # Fine-tune the model
    model.fit_generator(
        train_generator,
        steps_per_epoch=nb_train_samples // batch_size,
        epochs=epochs,
        validation_data=validation_generator,
        validation_steps=nb_validation_samples // batch_size,
        workers=7, # 7 cores of the CPU!
        verbose = 0,
        callbacks=callbacks) # remove this parameter if you don't need early stopping

    # Print training time
    print("Training time: %0.1f mins ---" % ((time.time() - start_time)/60))

    # Evaluate the final model (weights from the last epoch, not the best checkpoint)
    # on the validation data and on the (augmented) training data
    scoresVal = model.evaluate_generator(validation_generator, nb_validation_samples//batch_size, workers=7)
    scoresTr  = model.evaluate_generator(train_generator, nb_train_samples//batch_size, workers=7)
    # Print the results
    print(freezeLayers, learning, mom, epochs, batch_size, scoresTr[0], scoresVal[0], scoresTr[1], scoresVal[1])

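    # Clean some memory
    del base_model
    del top_model
    del model

    del train_datagen
    del test_datagen
    del train_generator
    del validation_generator

    # Reset the Keras/TensorFlow graph so repeated calls in a search loop do not accumulate models
    keras.backend.clear_session()

    return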
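Because `steps_per_epoch`, `validation_steps` and the `evaluate_generator` calls use integer division, only whole batches are used. With `batch_size=64`, that is 14 × 64 = 896 training images and 4 × 64 = 256 validation images per pass, not 910 and 302. The printed accuracies are consistent with this: for example, 0.9296875 = 238/256. The plain-Python check below reproduces that arithmetic (no Keras needed); the `math.ceil` variant is a suggestion, not what the notebook uses.

```python
import math

nb_train_samples, nb_validation_samples, batch_size = 910, 302, 64

# Integer division, as in FineTunningVGG: only whole batches are used.
train_steps = nb_train_samples // batch_size      # 14
val_steps = nb_validation_samples // batch_size   # 4
train_images = train_steps * batch_size           # 896
val_images = val_steps * batch_size               # 256
print('images per pass:', train_images, val_images)

# Accuracies printed for the freeze=15, lr=1e-4 run, turned back into image counts.
acc_tr, acc_val = 0.9084821428571429, 0.9296875
print('correct training images:', round(acc_tr * train_images))   # 814
print('correct validation images:', round(acc_val * val_images))  # 238

# Rounding up instead would cover every image (the last batch is smaller).
print('steps to cover all images:',
      math.ceil(nb_train_samples / batch_size),
      math.ceil(nb_validation_samples / batch_size))  # 15 5
```

Since `flow_from_directory` shuffles by default, the 256 validation images scored in one call are not necessarily the same 256 in the next.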

### Last Conv block + FC training

Let's fine-tune the FC layer and only the last Conv block, using `SGD` and early stopping:

In [7]:
FineTunningVGG(200, 64, 1e-4,  0.9, 15)

Found 910 images belonging to 2 classes.
Found 302 images belonging to 2 classes.
Epoch 00020: early stopping
Training time: 0.8 mins ---
15 0.0001 0.9 200 64 0.21136859910828726 0.18164563924074173 0.9084821428571429 0.9296875


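Although up to 200 epochs were allowed, early stopping halted training at epoch 20. The validation accuracy (92.97%) is higher than the training accuracy (90.8%), which suggests the model is still underfitting; part of this gap may also come from the training images being augmented while the validation images are not. We could decrease the dropout rate, but here we keep the same top-model definition that was used to save the weights in the previous notebook.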
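With `patience=10` and `mode='max'`, the `EarlyStopping` callback stops training once `val_acc` has gone 10 consecutive epochs without a strict improvement over its best value. The function below is a plain-Python sketch of that rule as I understand Keras 2.1+ applies it with the default `min_delta=0` (older Keras versions may stop one epoch later); it is not the Keras source. Under this rule, "Epoch 00020: early stopping" means the best validation accuracy was reached at epoch 10. Also note that the scores printed by `FineTunningVGG` come from the weights of the last epoch, since `restore_best_weights` is not used; the best weights are only in the `ModelCheckpoint` file.

```python
def early_stopping_epoch(val_accs, patience=10):
    """Return the 1-based epoch at which training would stop, or None.

    Sketch of a 'max' mode rule with min_delta=0: the counter resets on a
    strict improvement and training stops once it reaches `patience`.
    """
    best = float('-inf')
    wait = 0
    for epoch, acc in enumerate(val_accs):
        if acc > best:
            best, wait = acc, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch + 1  # 1-based, like the 'Epoch 000NN: early stopping' message
    return None


# Best accuracy at epoch 10, then 15 epochs without improvement.
history = [0.80 + 0.01 * i for i in range(10)] + [0.85] * 15
print(early_stopping_epoch(history))  # 20
```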

In the second step, let's try different parameter values (you should use even more values!). The following loop searches over several hyperparameters:
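The five nested loops in the search cells enumerate the Cartesian product of the hyperparameter lists. A more compact sketch with `itertools.product` follows; `run_grid` and `fake_train` are hypothetical names, and `fake_train` only stands in for `FineTunningVGG` so the sketch runs without Keras. Catching `Exception` instead of using a bare `except:` still skips failing combinations, but lets Ctrl-C (`KeyboardInterrupt`) stop the search and prints the actual error.

```python
import itertools


def run_grid(train_fn, freeze_values, learning_values, mom_values,
             epochs_values, batch_size_values):
    """Call train_fn once per hyperparameter combination; collect failures."""
    failures = []
    grid = itertools.product(freeze_values, learning_values, mom_values,
                             epochs_values, batch_size_values)
    for freeze, learning, mom, epochs, batch_size in grid:
        try:
            # Same argument order as FineTunningVGG(epochs, batch_size, learning, mom, freezeLayers)
            train_fn(epochs, batch_size, learning, mom, freeze)
        except Exception as err:  # unlike a bare except, lets Ctrl-C through
            print('==> Error:', freeze, learning, mom, epochs, batch_size, err)
            failures.append((freeze, learning, mom, epochs, batch_size))
    return failures


calls = []


def fake_train(epochs, batch_size, learning, mom, freeze):
    """Stand-in for FineTunningVGG that only records its arguments."""
    calls.append((epochs, batch_size, learning, mom, freeze))
    if learning > 5e-4:
        raise RuntimeError('simulated failure')


failed = run_grid(fake_train, [15], [1e-6, 1e-5, 1e-4, 5e-4, 1e-3], [0.8, 0.9], [100], [64])
print(len(calls), failed)  # 10 [(15, 0.001, 0.8, 100, 64), (15, 0.001, 0.9, 100, 64)]
```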

In [47]:
# Start total timer
start_time = time.time()

# Change your hyperparameters to search for
freezeLayersValues = [15] # 15 = fine-tune the last Conv block, 11 = fine-tune the last 2 Conv blocks
learningValues = [1e-6, 1e-5, 1e-4, 5e-4, 1e-3]
monValues = [0.8, 0.9]
epochsValues = [100]
batch_sizeValues = [64]

# Print a header for results
print('Freeze', 'Learning', 'Momentum', 'epochs', 'batch_size', 'Loss_Tr', 'Loss_Val', 'Acc_Tr', 'Acc_Val')
for freezeLayers in freezeLayersValues:
    for learning in learningValues:
        for mom in monValues:
            for iepochs in epochsValues:
                for ibatch_size in batch_sizeValues:
                    try:
                        # Try to execute the fine tuning function
                        FineTunningVGG(iepochs, ibatch_size, learning, mom, freezeLayers)
                    except Exception as err:
                        # Report the failing combination and continue with the next one
                        print('==> Error:', freezeLayers, learning, mom, iepochs, ibatch_size, err)

# Print total execution time
print("Total time: %0.1f mins ---" % ((time.time() - start_time)/60))

Freeze Learning Momentum epochs batch_size Loss_Tr Loos_Val Acc_Tr Acc_Val
Found 910 images belonging to 2 classes.
Found 302 images belonging to 2 classes.
Epoch 00014: early stopping
Training time: 0.7 mins ---
15 1e-06 0.8 100 64 0.3496717576469694 0.267158854752779 0.8470982142857143 0.88671875
Found 910 images belonging to 2 classes.
Found 302 images belonging to 2 classes.
Epoch 00014: early stopping
Training time: 0.7 mins ---
15 1e-06 0.9 100 64 0.3785088551895959 0.2557702325284481 0.8348214285714286 0.89453125
Found 910 images belonging to 2 classes.
Found 302 images belonging to 2 classes.
Epoch 00014: early stopping
Training time: 0.7 mins ---
15 1e-05 0.8 100 64 0.3332709191100938 0.2718586437404156 0.8627232142857143 0.890625
Found 910 images belonging to 2 classes.
Found 302 images belonging to 2 classes.
Epoch 00027: early stopping
Training time: 1.2 mins ---
15 1e-05 0.9 100 64 0.2671286410519055 0.24632438644766808 0.8895089285714286 0.90625
Found 910 images belonging

With `learning rate=0.0005` and `momentum=0.9` it is possible to obtain `94.9%` validation accuracy (96.2% training accuracy). Let's try some nearby learning rates:

In [4]:
# Start total timer
start_time = time.time()

# Change your hyperparameters to search for
freezeLayersValues = [15] # 15 = fine-tune the last Conv block, 11 = fine-tune the last 2 Conv blocks
learningValues = [2e-4, 3e-4, 4e-4, 6e-4, 7e-4]
monValues = [0.9]
epochsValues = [100]
batch_sizeValues = [64]

# Print a header for results
print('Freeze', 'Learning', 'Momentum', 'epochs', 'batch_size', 'Loss_Tr', 'Loss_Val', 'Acc_Tr', 'Acc_Val')
for freezeLayers in freezeLayersValues:
    for learning in learningValues:
        for mom in monValues:
            for iepochs in epochsValues:
                for ibatch_size in batch_sizeValues:
                    try:
                        # Try to execute the fine tuning function
                        FineTunningVGG(iepochs, ibatch_size, learning, mom, freezeLayers)
                    except Exception as err:
                        # Report the failing combination and continue with the next one
                        print('==> Error:', freezeLayers, learning, mom, iepochs, ibatch_size, err)

# Print total execution time
print("Total time: %0.1f mins ---" % ((time.time() - start_time)/60))

Freeze Learning Momentum epochs batch_size Loss_Tr Loos_Val Acc_Tr Acc_Val
Found 910 images belonging to 2 classes.
Found 302 images belonging to 2 classes.
Epoch 00022: early stopping
Training time: 0.9 mins ---
15 0.0002 0.9 100 64 0.1575070135295391 0.19659276492893696 0.9397321428571429 0.9375
Found 910 images belonging to 2 classes.
Found 302 images belonging to 2 classes.
Epoch 00034: early stopping
Training time: 1.3 mins ---
15 0.0003 0.9 100 64 0.11700797293867383 0.14799168519675732 0.953125 0.94921875
Found 910 images belonging to 2 classes.
Found 302 images belonging to 2 classes.
Epoch 00025: early stopping
Training time: 1.0 mins ---
15 0.0004 0.9 100 64 0.15608231403997966 0.18702777475118637 0.9397321428571429 0.94140625
Found 910 images belonging to 2 classes.
Found 302 images belonging to 2 classes.
Epoch 00024: early stopping
Training time: 0.9 mins ---
15 0.0006 0.9 100 64 0.27273742854595184 0.33562444150447845 0.8895089285714286 0.8828125
Found 910 images belongin

If you remove the callbacks and use 200 epochs, you will be able to obtain even better accuracies:
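
| Freeze | Learning | Momentum | Epochs | Batch size | Loss_Tr | Loss_Val | Acc_Tr | Acc_Val |
|---|---|---|---|---|---|---|---|---|
| 15 | 0.0005 | 0.8 | 200 | 64 | 0.03348573550049748 | 0.11928138509392738 | 0.9888392857142857 | 0.96875 |
| 15 | 0.0005 | 0.9 | 200 | 64 | 0.022504917612033232 | 0.1295782057568431 | 0.9921875 | 0.96875 |
| 15 | 0.0002 | 0.9 | 200 | 64 | 0.037957151753029654 | 0.11038850899785757 | 0.9888392857142857 | 0.98046875 |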

Thus, by training the last Conv block and the FC layer for about 8 minutes, you can obtain a `validation accuracy of 98%`!
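Validation accuracy is measured on `nb_validation_samples // batch_size` = 4 batches of 64, i.e. 256 images, so each validation image is worth 1/256 ≈ 0.39 percentage points. Converting the validation accuracies of the three 200-epoch runs into image counts shows how small the differences between them are (a minimal arithmetic sketch using only the numbers reported above):

```python
val_images = (302 // 64) * 64  # 256 images scored per validation pass

# Validation accuracies of the three 200-epoch runs reported above.
runs = {
    'lr=0.0005, mom=0.8': 0.96875,
    'lr=0.0005, mom=0.9': 0.96875,
    'lr=0.0002, mom=0.9': 0.98046875,
}
for name, acc in runs.items():
    correct = round(acc * val_images)
    print('%s: %d/%d correct, %d misclassified'
          % (name, correct, val_images, val_images - correct))
```

The gap between 96.9% and 98.0% therefore amounts to 3 validation images (8 versus 5 misclassified), which may be within run-to-run variation on a dataset this small.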

### Last 2 Conv blocks + FC training

Let's see what accuracy we can obtain if we train the last 2 Conv blocks:

In [9]:
FineTunningVGG(200, 64, 1e-4,  0.9, 11)

Found 910 images belonging to 2 classes.
Found 302 images belonging to 2 classes.
Epoch 00027: early stopping
Training time: 1.3 mins ---
11 0.0001 0.9 200 64 0.13032949502979005 0.18357415683567524 0.9553571428571429 0.9375


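As expected, training more layers gives better results (93.75% validation accuracy versus 92.97% when training only the last block), but with a more complex model and a small dataset, overfitting starts to appear: training accuracy (95.5%) is now above validation accuracy. Let's check different parameters: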
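Each printed result row has the columns Freeze, Learning, Momentum, epochs, batch_size, Loss_Tr, Loss_Val, Acc_Tr, Acc_Val. The hypothetical helper `parse_row` below (not part of the notebook) is a small sketch that parses such rows and computes the train/validation accuracy gap: a positive gap (training above validation) is the overfitting signal discussed here, while the negative gap of the 15-layer run matches the underfitting remark for the last-block run.

```python
COLUMNS = ['Freeze', 'Learning', 'Momentum', 'epochs', 'batch_size',
           'Loss_Tr', 'Loss_Val', 'Acc_Tr', 'Acc_Val']


def parse_row(line):
    """Turn one printed result line into a dict of numbers plus the accuracy gap."""
    values = line.split()
    if len(values) != len(COLUMNS):
        raise ValueError('expected %d values, got %d' % (len(COLUMNS), len(values)))
    row = {name: float(v) for name, v in zip(COLUMNS, values)}
    for key in ('Freeze', 'epochs', 'batch_size'):
        row[key] = int(row[key])
    row['gap'] = row['Acc_Tr'] - row['Acc_Val']  # > 0: training beats validation
    return row


rows = [
    '15 0.0001 0.9 200 64 0.21136859910828726 0.18164563924074173 0.9084821428571429 0.9296875',
    '11 0.0001 0.9 200 64 0.13032949502979005 0.18357415683567524 0.9553571428571429 0.9375',
]
for r in map(parse_row, rows):
    print(r['Freeze'], 'gap = %+.3f' % r['gap'])  # 15 gap = -0.021, then 11 gap = +0.018
```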

In [5]:
# Start total timer
start_time = time.time()

# Change your hyperparameters to search for
freezeLayersValues = [11] # 15 = fine-tune the last Conv block, 11 = fine-tune the last 2 Conv blocks
learningValues = [1e-6, 1e-5, 1e-4, 5e-4]
monValues = [0.8, 0.9]
epochsValues = [100]
batch_sizeValues = [64]

# Print a header for results
print('Freeze', 'Learning', 'Momentum', 'epochs', 'batch_size', 'Loss_Tr', 'Loss_Val', 'Acc_Tr', 'Acc_Val')
for freezeLayers in freezeLayersValues:
    for learning in learningValues:
        for mom in monValues:
            for iepochs in epochsValues:
                for ibatch_size in batch_sizeValues:
                    try:
                        # Try to execute the fine tuning function
                        FineTunningVGG(iepochs, ibatch_size, learning, mom, freezeLayers)
                    except Exception as err:
                        # Report the failing combination and continue with the next one
                        print('==> Error:', freezeLayers, learning, mom, iepochs, ibatch_size, err)

# Print total execution time
print("Total time: %0.1f mins ---" % ((time.time() - start_time)/60))

Freeze Learning Momentum epochs batch_size Loss_Tr Loos_Val Acc_Tr Acc_Val
Found 910 images belonging to 2 classes.
Found 302 images belonging to 2 classes.
Epoch 00018: early stopping
Training time: 0.8 mins ---
11 1e-06 0.8 100 64 0.35351393052509855 0.23303452879190445 0.8560267857142857 0.8984375
Found 910 images belonging to 2 classes.
Found 302 images belonging to 2 classes.
Epoch 00018: early stopping
Training time: 0.8 mins ---
11 1e-06 0.9 100 64 0.3450494238308498 0.2228415459394455 0.8526785714285714 0.90625
Found 910 images belonging to 2 classes.
Found 302 images belonging to 2 classes.
Epoch 00018: early stopping
Training time: 0.8 mins ---
11 1e-05 0.8 100 64 0.2748049093144281 0.21897215582430363 0.8861607142857143 0.91796875
Found 910 images belonging to 2 classes.
Found 302 images belonging to 2 classes.
Epoch 00025: early stopping
Training time: 1.1 mins ---
11 1e-05 0.9 100 64 0.2397044769355229 0.20551345869898796 0.9040178571428571 0.93359375
Found 910 images belo

If you remove the callbacks and use 200 epochs, you can obtain even better validation accuracies, over 96%.


## Conclusion

* If you fine-tune the last Conv block of VGG16 + FC (top model), you can obtain a validation accuracy `over 98%` (learning rate = 0.0002, momentum = 0.9, batch size = 64). This is better than the small CNN results (`over 92%`).
* The search space was limited; additional hyperparameters should be tested, including the dropout rate, the optimizer, or the base model (not only VGG16; it could be Inception, etc.).

If you need a classifier to detect polyps in your colonoscopy images, a small CNN with only a few hidden layers may be enough. If you need accuracy over 98%, you should try fine-tuning.

In the next notebook ([6-WindowsPolypsDetection.ipynb](6-WindowsPolypsDetection.ipynb)), we will look for polyps in a colonoscopy image.

Have fun with DL! @muntisa

### Acknowledgements

I gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research ([https://developer.nvidia.com/academic_gpu_seeding](https://developer.nvidia.com/academic_gpu_seeding)).