# LB03a.0 Transfer Learning

Training a complex convolutional neural network (CNN) from scratch requires a large dataset, which may be difficult to acquire. In addition, training such a CNN can take several weeks on multiple GPUs, which places a high demand on computing resources.

It is therefore common practice to use a pre-trained CNN as a feature extractor for the task of interest. One reuses the CNN's architecture together with its trained weights and discards only the classification part (the fully connected layers). A new classification part then has to be defined to match the classification problem at hand.

<img src="resources/LB03a_transfer_learning.png"/>

### Different scenarios when using transfer learning ([medium.com](https://medium.com/@galen.ballew/transferlearning-b65772083b47)):

**1) New dataset is small and similar to the previous**: Since the new dataset is small, you run the risk of overfitting if you retrain everything. Instead, slice off the last fully connected layer and replace it with a new fully connected layer with the appropriate output size. This makes sense because the similarity of the observations (i.e. pictures) means both the low-level (e.g. edges) and high-level features (e.g. shapes) will be similar. Freeze the weights before the last layer and retrain!

**2) New data set is large and similar to the previous**: Since there is more data, there is less risk of overfitting by retraining. Freeze the low-level feature weights and retrain the high-level features to get a better generalization. Don’t forget to replace the last fully connected layer! *Optional: If your data set is large enough to handle it, you can initialize all the layers with their previous weights/biases and retrain the entire network.*

**3) New data set is small and different than the previous**: This is the most difficult situation to deal with. Intuitively, we know that the previous network is finely-tuned at each layer. However, we do not want any of the high-level features and we cannot afford to retrain them because we could overfit. Instead, remove all of the fully connected layers and all of the high-level convolutional layers. All that should remain are the first few low-level convolutional layers. Place a fully connected layer with the correct number of outputs, freeze the rest of the layers, and retrain. 

**4) New data set is large and different than the previous**: Retrain the entire network. It’s usually a good idea to initialize the network with the previous model’s weights/biases to speed up training (lots of the low-level convolutions will have similar weights/biases). Don’t forget to replace the fully connected output layer.
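
The four scenarios above can be condensed into a small lookup. The following is only an illustrative sketch: the function name `transfer_strategy` and its two boolean parameters are hypothetical and not part of the exercise code, and the returned strings merely paraphrase the text above.

```python
def transfer_strategy(dataset_is_large: bool, similar_to_source: bool) -> str:
    """Return a short summary of the suggested strategy (hypothetical helper)."""
    strategies = {
        # (dataset_is_large, similar_to_source): summary of the scenario
        (False, True): "replace the last fully connected layer, freeze the rest, retrain",
        (True, True): "replace the last fully connected layer, freeze low-level layers, "
                      "retrain high-level layers",
        (False, False): "keep only the first low-level convolutional layers, add a new "
                        "fully connected layer, freeze the rest, retrain",
        (True, False): "replace the fully connected output layer, initialize with the "
                       "previous weights, retrain the entire network",
    }
    return strategies[(dataset_is_large, similar_to_source)]


# Scenario 1: small dataset that resembles the original training data
print(transfer_strategy(dataset_is_large=False, similar_to_source=True))
```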

## Exercise

Develop an image processing system that is capable of distinguishing between two types of leucocytes (white blood cells): lymphocytes and neutrophils.

<img src="resources/LB03a_blood_cells.png"/>

Use the technique of transfer learning with bottlenecking and aim for a stable accuracy of >92%.

In [None]:
import warnings
import sys
import os
from datetime import datetime
import numpy as np
import pandas as pd
from sklearn.metrics import classification_report, accuracy_score, confusion_matrix
import itertools
from matplotlib import pyplot as plt

with warnings.catch_warnings():
    warnings.simplefilter("ignore")

    os.environ['TF_CPP_MIN_LOG_LEVEL'] = '0'

    from keras import __version__
    from keras import optimizers
    from keras import callbacks
    from keras import applications
    from keras.models import model_from_json, Sequential
    from keras.layers import Dropout, Dense, GlobalAveragePooling2D, BatchNormalization, Activation
    from keras.utils import to_categorical
    from keras.preprocessing.image import ImageDataGenerator

In [None]:
# This function is needed later when evaluating the classifier's results.
def plot_confusion_matrix(cm, classes,
                          normalize=False,
                          title='Confusion matrix',
                          cmap=plt.cm.Blues):
    """
    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    fig = plt.figure()

    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        cm = np.around(cm, decimals=2, out=None)  
    
    
    thresh = cm.max() / 2.
    
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)

    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, cm[i, j],
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")

    plt.ylabel('True label')
    plt.xlabel('Predicted label')
    # call tight_layout last so that the axis labels are taken into account
    fig.tight_layout()
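
To illustrate what `normalize=True` does in the function above, here is a minimal sketch of the row-wise normalization on its own. The counts in `cm` are made up for illustration and are not results from the blood cell dataset.

```python
import numpy as np

# Hypothetical counts: rows are true classes, columns are predicted classes.
cm = np.array([[45, 5],
               [10, 40]])

# Divide every row by its sum, so each row shows the fraction of a true class
# that was assigned to each predicted class.
row_sums = cm.sum(axis=1)[:, np.newaxis]
cm_normalized = np.around(cm.astype('float') / row_sums, decimals=2)

print(cm_normalized)
```

After normalization the diagonal entries correspond to the per-class recall.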

## LB03a.1 Parametrization

To explore the possibilities of transfer learning, we need to define datapaths, image parameters as well as network parameters. This is done in the following section.

In [None]:
# Specifying the path to our data. You'll only need the datasets when extracting features yourself. 
# You may also use the 'TRAIN_SIMPLE' / 'TEST_SIMPLE' dataset if your device can't handle larger datasets.

train_data_dir = './datasets/data_blood/images/TRAIN/'
test_data_dir = './datasets/data_blood/images/TEST'

In [None]:
# TODO: define image dimensions as tuple, the image data will be scaled 
# to the specified size, in this case we will use 160x120
img_size = ...

In [None]:
batch_size = 64
max_epoches = 100
patience = 10
dropout_rate = 0.25

In [None]:
# Defining the log folder for tensorboard (helps by visualizing training curves)
logdir = "transfer_learning_logs/"

if not os.path.exists(logdir):
    os.makedirs(logdir)

In [None]:
# define a callback that stops training once 'val_accuracy' has not improved
# by at least 'min_delta' for 'patience' consecutive epochs
early_stop = callbacks.EarlyStopping(monitor='val_accuracy', min_delta=1e-04, patience=patience, mode='auto')

# define a tensorboard callback
tb = callbacks.TensorBoard(log_dir=logdir + "TransferLearning_" + datetime.now().strftime("%Y.%m.%d-%H:%M:%S"))
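
The interplay of `patience` and `min_delta` can be sketched in plain Python. This is a simplified illustration of the idea behind early stopping on a metric that should increase, not the Keras implementation itself; the function `stopping_epoch` and the accuracy values are hypothetical.

```python
def stopping_epoch(val_accuracies, patience, min_delta=1e-4):
    """Return the 0-based epoch at which training would stop, or None."""
    best = float("-inf")
    wait = 0  # epochs since the last sufficient improvement
    for epoch, acc in enumerate(val_accuracies):
        if acc - min_delta > best:
            # improvement of more than min_delta: remember it and reset the counter
            best = acc
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Made-up validation accuracies: no sufficient improvement after epoch 2
history = [0.80, 0.85, 0.86, 0.86, 0.859, 0.855]
print(stopping_epoch(history, patience=3))
```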

## LB03a.2 Architecture Definition

* Load the [VGG16](https://arxiv.org/abs/1409.1556) network's weights from Keras' `applications` module. You may also use other models, but you'd have to make sure to copy their architecture correctly on your own. Depending on the model, this may not be a trivial task.
* Set the option `include_top=False` to load convolutional layers only and ignore the classification part of the network
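
With `include_top=False`, the output of the network is the feature map of the last convolutional block. As a rough sketch, its shape can be estimated by hand: VGG16 contains five 2x2 max-pooling layers with stride 2, and its last block has 512 filters. The helper `vgg16_feature_map_shape` is hypothetical, and reading "160x120" as width 160 and height 120 is an assumption.

```python
def vgg16_feature_map_shape(height, width, num_pools=5, channels=512):
    """Estimate the output shape of the VGG16 convolutional base (channels last)."""
    for _ in range(num_pools):
        # each 2x2 max pooling with stride 2 halves the resolution (rounding down)
        height //= 2
        width //= 2
    return (height, width, channels)


# Assumed input size: height 120, width 160
print(vgg16_feature_map_shape(120, 160))
```

Comparing this estimate with the output shape reported by `base_model.summary()` is a quick sanity check for the chosen input shape.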

In [None]:
# TODO: define and load VGG16 model with imagenet-weights, exclude the fully connected layers and define the input shape
base_model = ...

## LB03a.3 Feature Extraction

Use the base model to generate features using the images provided in the `datasets/data_blood` folder.
You will use the extracted features as the input for your classification network.
This technique is also called bottlenecking.
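
The benefit of bottlenecking is that the expensive forward pass through the base model only has to run once; afterwards the features are read from disk. The following sketch shows such a save and load round trip with NumPy. The arrays are random stand-ins rather than real features, and the file name `demo_features` is hypothetical.

```python
import os
import tempfile

import numpy as np

# Random stand-ins for extracted features and one-hot labels (not real data).
features = np.random.rand(8, 3, 5, 512).astype('float32')
labels = np.eye(2)[np.random.randint(0, 2, size=8)]

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'demo_features')  # hypothetical file name
    # savez appends the '.npz' extension and stores each keyword as a named array
    np.savez(path, features=features, labels=labels)

    # the arrays are retrieved by the same names they were saved under
    with np.load(path + '.npz') as npzfile:
        loaded_features = npzfile['features']
        loaded_labels = npzfile['labels']

print(loaded_features.shape, loaded_labels.shape)
```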

In [None]:
# This flag determines whether the feature extraction network is run or not.
# You can either use the prepared features from moodle or run your own feature extractor.
do_extract_features = 0

In [None]:
if do_extract_features == 1:

    # TODO: define image data generator for training data
    train_datagen = ImageDataGenerator(rescale=1./255)

    # TODO: define image data generator for test data
    test_datagen = ImageDataGenerator(rescale=1./255)

    # generate batches of train image data
    train_generator = train_datagen.flow_from_directory(
        train_data_dir,
        target_size=img_size,
        shuffle=False,
        batch_size=batch_size,
        class_mode='categorical')

    # get filenames for training data
    train_filenames = train_generator.filenames
    # get number of training samples
    train_samples = len(train_filenames)
    # number of batches needed to cover all training samples
    predict_steps_train = int(np.ceil(train_samples / batch_size))

    # generate batches of test image data
    test_generator = test_datagen.flow_from_directory(
        test_data_dir,
        shuffle=False,
        target_size=img_size,
        batch_size=batch_size,
        class_mode='categorical')

    test_filenames = test_generator.filenames
    test_samples = len(test_filenames)
    predict_steps_test = int(np.ceil(test_samples / batch_size))

    print("Starting feature extraction for train images")

    # TODO: get number of classes
    num_classes = len(train_generator.class_indices)
    # TODO: get the class labels for the training data
    train_labels = train_generator.classes
    # TODO: convert the training labels to one-hot encoding aka. categorical vectors
    train_labels = to_categorical(train_labels, num_classes=num_classes)
    # extract features using the training image generator
    train_features= base_model.predict_generator(train_generator, predict_steps_train, verbose = 1)

    # TODO: Saving the bottleneck features and their corresponding labels using e.g. numpy's savez()
    np.savez('./datasets/data_blood/trainfeatures_full', features=train_features, labels=train_labels)
    print("Starting feature extraction for test images")

    # TODO: get the class labels for the test data
    test_labels = test_generator.classes
    # TODO: convert the test labels to one-hot encoding aka. categorical vectors
    test_labels = to_categorical(test_labels, num_classes=num_classes)
    # TODO: extract the features using the test image generator
    test_features = base_model.predict_generator(test_generator, predict_steps_test, verbose = 1)

    # TODO: Saving the bottleneck features and their corresponding labels using e.g. numpy's savez()
    np.savez('./datasets/data_blood/testfeatures_full', features=test_features, labels=test_labels)

    print("Feature extraction done. Bottleneck features saved.")

## LB03a.4 Architecture Definition and Classification

Now it is time to create a classification network for the extracted features. Create an MLP with the necessary complexity and use `GlobalAveragePooling2D` as the first layer of your classification network. Think about the number of nodes in the output layer.

Try to get a stable accuracy of >92%.
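
To get a feeling for what `GlobalAveragePooling2D` does to the bottleneck features, the operation can be reproduced with NumPy. This is a minimal sketch assuming the channels-last layout `(samples, height, width, channels)`; the array below is a small dummy tensor, not real feature data.

```python
import numpy as np

# Dummy feature maps: 2 samples, 3x5 spatial grid, 4 channels
features = np.arange(2 * 3 * 5 * 4, dtype='float32').reshape(2, 3, 5, 4)

# Global average pooling: average over the two spatial axes,
# leaving one value per channel and sample
pooled = features.mean(axis=(1, 2))

print(features.shape, '->', pooled.shape)
```

Each sample is thus reduced to a flat vector with one entry per channel, which can be fed directly into dense layers.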

In [None]:
# TODO: load the saved bottleneck training features
npzfile = ...
# TODO: get features
train_features = ...
# TODO: get labels
train_labels = ...

In [None]:
# TODO: define classification part of the transfer model
transfered_model = Sequential()

...

# compile the model with the Adam optimizer
transfered_model.compile(loss='categorical_crossentropy',
              optimizer=optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False),
              metrics=['accuracy'])

# print model structure
print(transfered_model.summary())

In [None]:
# TODO: train the model using the bottleneck features generated in the feature extraction part
transfered_model.fit(
    ...,
    epochs=max_epoches, shuffle = True, batch_size= 256, verbose = 1,
    validation_split = 0.2,
    callbacks= [early_stop, tb])

To see the training curves, you can now start TensorBoard in your Docker container with the following command after navigating to the working directory (e.g. `/notebooks/<your-working-directory>/`):

`tensorboard --logdir transfer_learning_logs --host 0.0.0.0`

Please note that the `--logdir` parameter has to be the same as your `logdir` variable. 

Afterwards navigate to [http://localhost:6006](http://localhost:6006) in your browser.

## LB03a.5 Evaluation
Load the test features saved to disk in LB03a.3.
Use the test data in order to evaluate the performance of your classification network.
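
The metrics from `sklearn.metrics` expect class indices, whereas both the one-hot labels and the network output are matrices with one column per class. A minimal sketch of the conversion, using made-up values instead of real predictions:

```python
import numpy as np

# Made-up one-hot targets and predicted class probabilities for 4 samples
one_hot_targets = np.array([[1, 0], [0, 1], [0, 1], [1, 0]])
probabilities = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.7, 0.3]])

# argmax along the class axis turns each row into a class index
target_classes = one_hot_targets.argmax(axis=1)
predicted_classes = probabilities.argmax(axis=1)

# fraction of samples where the predicted class equals the target class
accuracy = (target_classes == predicted_classes).mean()
print(target_classes, predicted_classes, accuracy)
```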

In [None]:
# TODO: evaluate the model on the test data bottleneck features
# TODO: load the saved bottleneck test features
npzfile_test = ...
# TODO: get features
test_features = ...
# TODO: get labels
test_labels = ...

predictions = ...
# a hint here: test_labels are one-hot encoded (if not saved in another format)
targets= ...

# output accuracy score, classification report and confusion matrix
print('Accuracy on test data: %.2f %%\n' % (accuracy_score(targets, predictions)*100))

# TODO: calculate the confusion matrix
cm = confusion_matrix(targets, predictions)

# TODO: plot the confusion matrix
plot_confusion_matrix(...)