In this notebook, we show how to classify flower images using transfer learning from a pre-trained network.

A *pre-trained model* is a saved network model that was previously trained on a large dataset.

The idea of **transfer learning** for image classification is that a model trained on a large and representative dataset can serve as a base model for classifying other images. We can reuse the feature maps it has learned instead of starting from scratch, which would require building a custom model and training it on a large dataset, and that can take quite a long time.

Here, we're going to test two approaches:
1. **feature extraction**: we use the representations learned by an already trained network to extract meaningful features from new samples. We simply add a new classifier on top of the pre-trained model and train only that classifier, so the feature maps learned previously are reused for our dataset. The pre-trained network itself does not need to be retrained: the base convolutional network already contains features that are generically useful for classifying pictures. The final classification part of the pre-trained model, however, is specific to the original classification task, which is why we replace it.

2. **fine tuning**: this method consists of unfreezing some of the top layers of the previously frozen model and jointly training both the new top-level classifier (which classifies our specific dataset) and these last layers of the base model. We do this because only the last layers of the base model extract high-level feature maps, whereas the first convolution layers only extract basic features (edges, vertical/horizontal lines ...). Fine-tuning the top feature representations of the base model makes them more specific to our classification task.

[ **I ) Introduction**](#content1)
- VGG16
- VGG19 
- MobileNetV2

[ **II ) Data**](#content2)
- 2.1 Load & explore data
- 2.2 Label encoding
- 2.3 Split training and validation set

[ **III ) CNN model**](#content3)
- 3.1 About the optimizer and learning rate
- 3.2 Define the model
- 3.3 Data augmentation
- 3.4 Feature extraction
- 3.5 Fine tuning

[ **IV ) Model evaluation**](#content4)
- Confusion matrix
- Prediction visualizations

[ **V ) Conclusion**](#content5)

In [None]:
# import stuff 
import numpy as np
import pandas as pd 
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import seaborn as sns
from PIL import Image
from IPython.display import display   # IPython's own Image is not imported: it would shadow PIL.Image
import os

from sklearn.metrics import confusion_matrix
from sklearn.datasets import load_files
from sklearn.model_selection import train_test_split

import tensorflow_hub as hub
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping, ReduceLROnPlateau
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.preprocessing.image import load_img, img_to_array
from tensorflow.keras.optimizers import Adam, RMSprop, SGD
from tensorflow.keras.applications import ResNet50, VGG16, VGG19, MobileNetV2
from tensorflow.keras.applications.resnet50 import preprocess_input as prepro_res50
from tensorflow.keras.applications.vgg19 import preprocess_input as prepro_vgg19
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Dense, Flatten, GlobalAveragePooling2D, BatchNormalization, Dropout, MaxPool2D, MaxPooling2D

In [None]:
# load the backend
from tensorflow.keras import backend as K

# clear any previous Keras state to free memory
K.clear_session()

In [None]:
print(os.listdir('../input/'))

<a id="content1"></a>
## I) Introduction

We show the architecture of one of the most common CNNs: VGG16. There are also VGG19, ResNet50, MobileNetV2, AlexNet etc. All of them were pre-trained on the ImageNet dataset: a gold mine for computer vision. It consists of about 14 million hand-annotated images spread over more than 20,000 categories. These pre-trained models will be a solid base to help us classify our flower dataset.

VGG16 was published in 2014 and is one of the simplest CNN architectures used in the ImageNet competition. The network contains a total of 16 layers with learnable weights and biases:
- 13 convolutional layers are stacked, followed by 3 dense layers for classification
- the number of filters in the convolution layers follows an increasing pattern, from 64 up to 512 (similar to the encoder half of a convolutional autoencoder)
- max-pooling layers applied at different steps in the architecture downsample the feature maps and keep the most informative activations
- the dense layers are made of 4096, 4096, and 1000 nodes
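
To make the layer list above concrete, here is a small sketch that counts the learnable parameters of such a network by hand. It assumes 3x3 kernels, a 224x224x3 input and a 2x2 max-pooling after each block, as in the implementation further down; the helper names `conv_params` and `dense_params` are ours, not part of Keras.

```python
# Parameter count for the VGG16 layout described above (224x224x3 input).
# Each entry is (number of 3x3 conv layers, filters) for one block.
blocks = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]

def conv_params(in_channels, out_channels, kernel=3):
    # kernel*kernel*in_channels weights per filter, plus one bias per filter
    return kernel * kernel * in_channels * out_channels + out_channels

def dense_params(n_in, n_out):
    # one weight per input/output pair, plus one bias per output unit
    return n_in * n_out + n_out

channels, size, conv_total, n_conv = 3, 224, 0, 0
for n_layers, filters in blocks:
    for _ in range(n_layers):
        conv_total += conv_params(channels, filters)
        channels = filters
        n_conv += 1
    size //= 2  # each block ends with a 2x2 max-pooling layer

flat = size * size * channels  # 7 * 7 * 512 values after Flatten
dense_total = (dense_params(flat, 4096)
               + dense_params(4096, 4096)
               + dense_params(4096, 1000))

print(n_conv, "conv layers,", conv_total, "conv parameters")
print(dense_total, "dense parameters")
print(conv_total + dense_total, "parameters in total")
```

Most of the roughly 138 million parameters sit in the three dense layers, which is one reason why we drop the original classifier and keep only the convolutional base when doing transfer learning.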

![alt text](https://tech.showmax.com/2017/10/convnet-architectures/image_0-8fa3b810.png 'VGG16 architecture')

So if you want to implement a VGG16-like model yourself, it is quite straightforward. The only issue is the time needed to train it. Nonetheless, we demonstrate how to build it below.

In [None]:
# Let's implement it  
input_shape = (224, 224, 3)

my_VGG16 = Sequential([Conv2D(64, (3, 3), input_shape=input_shape, padding='same', activation='relu'), 
                       Conv2D(64, (3, 3), activation='relu', padding='same'), 
                       MaxPooling2D(pool_size=(2, 2), strides=(2, 2)), 
                       Conv2D(128, (3, 3), activation='relu', padding='same'), 
                       Conv2D(128, (3, 3), activation='relu', padding='same'), 
                       MaxPooling2D(pool_size=(2, 2), strides=(2, 2)), 
                       Conv2D(256, (3, 3), activation='relu', padding='same'),  
                       Conv2D(256, (3, 3), activation='relu', padding='same'), 
                       Conv2D(256, (3, 3), activation='relu', padding='same'),  
                       MaxPooling2D(pool_size=(2, 2), strides=(2, 2)), 
                       Conv2D(512, (3, 3), activation='relu', padding='same'), 
                       Conv2D(512, (3, 3), activation='relu', padding='same'),  
                       Conv2D(512, (3, 3), activation='relu', padding='same'),  
                       MaxPooling2D(pool_size=(2, 2), strides=(2, 2)), 
                       Conv2D(512, (3, 3), activation='relu', padding='same'), 
                       Conv2D(512, (3, 3), activation='relu', padding='same'), 
                       Conv2D(512, (3, 3), activation='relu', padding='same'), 
                       MaxPooling2D(pool_size=(2, 2), strides=(2, 2)), 
                       Flatten(),                          # Convert 3D matrices into 1D vector
                       Dense(4096, activation='relu'),     # Add Fully-connected layers
                       Dense(4096, activation='relu'), 
                       Dense(1000, activation='softmax')   # Final Fully-connected layer for predictions
                      ])

my_VGG16.summary()

We also present the VGG19 model. It is similar to the VGG16 architecture with the addition of 3 more convolution layers.

![alt text](https://cdn-images-1.medium.com/max/1600/1*cufAO77aeSWdShs3ba5ndg.jpeg 'VGG19 architecture')

In this notebook, we'll make use of a much more complex architecture: the MobileNetV2 model. Released in 2018, MobileNetV2 is a significant improvement over MobileNetV1 and pushes the state of the art for mobile visual recognition, including classification, object detection and segmentation. If you're interested, you can find more details in the [release paper](https://arxiv.org/abs/1801.04381).

![alt text](https://1.bp.blogspot.com/-M8UvZJWNW4E/WsKk-tbzp8I/AAAAAAAAChw/OqxBVPbDygMIQWGug4ZnHNDvuyK5FBMcQCLcBGAs/s1600/image5.png 'MobileNetV2 architecture')

<a id="content2"></a>
## II) Data

### 2.1 - Load & explore data

In [None]:
path_data = '../input/flowers-recognition/flowers/flowers/'
print(os.listdir(path_data))

In [None]:
from os.path import join

img_folders = [join(path_data, folder) for folder in os.listdir(path_data)]
list(img_folders)

In [None]:
data_dir = '../input/flowers-recognition/flowers/flowers/'

data = load_files(data_dir, random_state=28, shuffle=True)
X = np.array(data['filenames'])    # files location of each flower
y = np.array(data['target'])       # target label of each flower
labels = np.array(data['target_names'])

# remove any stray .pyc or .py files picked up by load_files
is_image = np.array([not file.endswith(('.pyc', '.py')) for file in X], dtype=bool)
X = X[is_image]
y = y[is_image]
    
print(f'Data files - {X}')
print(f'Target labels - {y}')   # numbers are corresponding to class label, 
                               # we have to change them to a vector of 5 elements
print(f'Name labels - {labels}')
print(f'Number of training files : {X.shape[0]}')

In [None]:
# Flower species number
df = pd.DataFrame({'species': y})
print(df.shape)
df.head()

In [None]:
# associate names to species number
df['flower'] = df['species'].astype('category')
df['flower'] = df['flower'].cat.rename_categories(labels)  # assigning to .cat.categories is not supported in recent pandas
df.head()

Let's check how many of each species of flowers are present.

In [None]:
fig, ax = plt.subplots()
ax = sns.countplot(x="flower", data=df)
ax.set(ylabel='Count', title='Flower species distribution')
ax.tick_params(axis='x', rotation=15)

Now, we're going to load the images and transform them into numpy arrays.

In [None]:
image_size = 224     # standard value for Transfer learning usecase (MobileNet, ResNet50, VGG16, VGG19)

def read_and_prep_images(img_paths, img_height=image_size, img_width=image_size):
    imgs = [load_img(img_path, target_size=(img_height, img_width)) for img_path in img_paths]   # load image
    img_array = np.array([img_to_array(img) for img in imgs])   # image to array 
    return(img_array)

X = np.array(read_and_prep_images(X))
print(X.shape)  # (4323, 224, 224, 3) = (num_images, height_size, width_size, depth=RGB)

In [None]:
# Let's have a look at 18 randomly picked flowers.

In [None]:
N = 18  # flowers to display
fig, axes = plt.subplots(3, 6, figsize=(16,6))
for ax, j in zip(axes.flat, np.random.randint(0, len(X), N)):    
    ax.imshow(X[j].astype(np.uint8))
    ax.set(title=f'Flower: {labels[y[j]]}', xticks=[], yticks=[])
fig.tight_layout()

### 2.2 - Label encoding

In [None]:
num_classes = len(np.unique(y))
print(f'Number of classes: {num_classes} --> {labels}')

The labels are the 5 species numbers (from 0 to 4), so we need to encode them as one-hot vectors. For instance, an image of a sunflower has the label 3 and the corresponding **y** = [0,0,0,1,0].
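
As a quick illustration of what one-hot encoding does, here is a minimal numpy sketch. The label values are made up for the example; `to_categorical` in the next cell produces the same layout, with floats instead of integers.

```python
import numpy as np

# Hypothetical integer labels for four images (class indices from 0 to 4)
y_int = np.array([3, 0, 4, 3])
num_classes = 5

# Row i of the identity matrix is the one-hot vector for class i
y_onehot = np.eye(num_classes, dtype=int)[y_int]
print(y_onehot[0])   # prints [0 0 0 1 0], the vector for class 3

# argmax along the class axis recovers the integer labels
y_back = y_onehot.argmax(axis=1)
```

The `argmax` step is the same trick we use later to turn predicted probabilities back into class indices.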

In [None]:
y = to_categorical(y, num_classes)
print(y.shape)

### 2.3 - Split training and validation set

Here, we're going to split our dataset into training, validation and test sets (75%, 12.5% and 12.5% of the images respectively). This guards against a biased estimate: the model is trained on the training images, we monitor its accuracy during training on validation images it has not been trained on, and finally we compute the accuracy on the test set, which is not used at all during training.
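
The two-step split used in the next cell (hold out a quarter of the data, then cut that quarter in half) can be sketched on plain indices. This is only an illustration with numpy, not what `train_test_split` does internally, and `three_way_split` is a hypothetical helper name.

```python
import numpy as np

def three_way_split(n_samples, holdout=0.25, seed=28):
    """Shuffle indices, then split them into train / validation / test."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_holdout = int(round(holdout * n_samples))
    held, train = idx[:n_holdout], idx[n_holdout:]
    n_val = n_holdout // 2          # half of the held-out part for validation
    return train, held[:n_val], held[n_val:]

train_idx, val_idx, test_idx = three_way_split(1000)
print(len(train_idx), len(val_idx), len(test_idx))
```

What matters is that the three index sets never overlap, so validation and test images are never seen during training.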

In [None]:
# train, validation and test sets from the full dataset
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, shuffle=True, 
                                                test_size=0.25, random_state=28)

Xval, Xtest, yval, ytest = train_test_split(Xtest, ytest, test_size=0.5,
                                            shuffle=True, random_state=28)
print(f'Train dataset: {Xtrain.shape[0]}')
print(f'Validation dataset: {Xval.shape[0]}')
print(f'Test dataset: {Xtest.shape[0]}')

In [None]:
# release memory
del X

In [None]:
del y

In [None]:
# rescale pixel values
# X /= 255

In [None]:
#num_classes = 5
#resnet_weights_path = '../input/resnet50/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5'

<a id="content3"></a>
## III) CNN model

### 3.1 About the optimizer and learning rate

Once the model is built, we need to specify a metric, a loss function and an optimisation algorithm.

The metric (here the accuracy) is used to evaluate the performance of the model.

The loss function measures how poorly the model performs on data with known labels in a supervised setting. For multi-class classification with one-hot labels, we use a specific loss function called *categorical_crossentropy* (the cross-entropy between the true and the predicted class distributions).
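
To get a feeling for this loss, here is a simplified numpy version of the cross-entropy, $L = -\sum_c y_c \log p_c$ averaged over the samples. It is a sketch for illustration and not the Keras implementation; the probability vectors are invented.

```python
import numpy as np

def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -sum(y_true * log(y_prob)) over the samples."""
    y_prob = np.clip(y_prob, eps, 1.0)   # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_prob), axis=1)))

# One sunflower image (class 3), one-hot encoded
y_true = np.array([[0, 0, 0, 1, 0]])

confident = np.array([[0.02, 0.02, 0.02, 0.92, 0.02]])
unsure    = np.array([[0.20, 0.20, 0.20, 0.20, 0.20]])
wrong     = np.array([[0.90, 0.03, 0.03, 0.02, 0.02]])

for name, p in [("confident", confident), ("unsure", unsure), ("wrong", wrong)]:
    print(name, round(categorical_crossentropy(y_true, p), 3))
```

The loss is small when the model puts most of the probability on the true class and grows quickly when it is confidently wrong.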

Finally, the optimizer is used to minimize the loss function by changing the model parameters (weight values, filter kernel values etc.).

For this classification problem, we choose the `RMSprop` optimizer which is very efficient and commonly used (more details on the [optimizers on Keras here](https://keras.io/optimizers/)).

Since it can take quite a while for the optimizer to converge on deep networks, we're going to use an annealing method for the learning rate (*LR*).

The *LR* is basically the size of the steps the optimizer takes. A high *LR* corresponds to big steps and thus faster progress. However, the exploration is then rather coarse, and the optimizer may step over a good minimum instead of settling into it.

Conversely, a low *LR* means that the optimizer will probably settle into a nearby local minimum, but it will take a lot of time.
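
The effect of the step size is easy to see on a toy problem. The sketch below runs plain gradient descent (not RMSprop) on $f(w) = w^2$, whose minimum is at $w = 0$; the three learning rates are arbitrary values chosen for the illustration.

```python
def gradient_descent(lr, steps=20, w0=5.0):
    """Minimise f(w) = w**2 (gradient 2*w) starting from w0."""
    w = w0
    for _ in range(steps):
        w = w - lr * 2 * w
    return w

for lr in (0.01, 0.4, 1.1):
    print(f"lr={lr}: w after 20 steps = {gradient_descent(lr):.4f}")
```

With the smallest rate, `w` is still far from the minimum after 20 steps; with the middle one it has essentially reached 0; with the largest one each step overshoots and `w` moves further and further away.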

The idea here is to start from a moderate value and then decrease the *LR* during training to reach a good minimum of the loss function efficiently. Using the `ReduceLROnPlateau` callback, we can multiply the *LR* by a factor (here 0.7, i.e. a 30% reduction) if the validation accuracy has not improved after a number of epochs (here 3).
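
Here is a simplified, pure-Python sketch of such a plateau-based schedule. It only mimics the basic rule (no cooldown, no minimum improvement threshold), so it should not be read as the Keras implementation; the default arguments are the ones passed to the callback in the cell further down, and the accuracy values are made up.

```python
def annealed_learning_rates(val_acc, lr=1e-3, factor=0.7, patience=3, min_lr=1e-5):
    """Return the learning rate in effect after each epoch."""
    best, wait, rates = float("-inf"), 0, []
    for acc in val_acc:
        if acc > best:              # improvement: remember it, reset the counter
            best, wait = acc, 0
        else:                       # no improvement this epoch
            wait += 1
            if wait >= patience:    # plateau reached: shrink the learning rate
                lr = max(lr * factor, min_lr)
                wait = 0
        rates.append(lr)
    return rates

# Hypothetical validation accuracies that stall after the third epoch
val_acc_run = [0.60, 0.70, 0.75, 0.75, 0.74, 0.75, 0.76, 0.76]
print(annealed_learning_rates(val_acc_run))
```

In this run the rate stays at 0.001 for the first five epochs and drops to about 0.0007 at the sixth, after three epochs without improvement. The `min_lr` floor keeps the rate from shrinking indefinitely.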

<br>

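In addition, the `EarlyStopping` callback can be used to control the training time: it stops training when the monitored quantity has not improved for a given number of epochs. In the callback cell below it is defined with a patience of 6 epochs, but it is left out of the callback list for this run.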
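
The stopping rule itself is simple enough to sketch in a few lines. This is an illustration of a patience-based rule on the validation loss (the quantity `EarlyStopping` monitors by default), not the Keras code; `early_stopping_epoch` is a hypothetical helper and the loss values are invented.

```python
def early_stopping_epoch(val_loss, patience=6):
    """Return the index of the epoch at which training would stop,
    or None if the loss keeps improving often enough."""
    best, wait = float("inf"), 0
    for epoch, loss in enumerate(val_loss):
        if loss < best:             # new best value: reset the counter
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:    # too many epochs without improvement
                return epoch
    return None

# Hypothetical validation losses: best value at epoch 2, then no improvement
losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.63, 0.65, 0.64, 0.66, 0.7]
print(early_stopping_epoch(losses))   # prints 8
```

Training stops six epochs after the best value, which is why it pairs well with `ModelCheckpoint`: the best weights are saved even though later epochs were worse.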

Finally, we make use of `ModelCheckpoint`, which saves the best weights found during training.

### 3.2 Define the model

For now, we're doing feature extraction, i.e. we freeze the convolutional base (MobileNetV2). Then, we add a classifier on top of it and train this top-level classifier.

In [None]:
# Load the MobileNetV2 model without the final classification layers (include_top=False)
img_shape = (image_size, image_size, 3)

print('Loading MobileNetV2 ...')
base_model = MobileNetV2(input_shape=img_shape,
                   include_top=False,
                   weights='imagenet')
print('MobileNetV2 loaded')

base_model.trainable = False
    
#base_model.summary()

In [None]:
base_model.output_shape

Now, we need to generate predictions from the block of features. We first average over the spatial locations using a `GlobalAveragePooling2D` layer, which converts the features to a single 1280-element vector per image. Finally, we add a `Dense` layer with 5 units and a softmax activation, one unit per flower species.
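
Global average pooling is nothing more than a mean over the two spatial axes. The numpy sketch below uses random numbers in place of real features and assumes a 7x7x1280 feature block per image (the shape to compare against `base_model.output_shape` above).

```python
import numpy as np

# Hypothetical feature block for a batch of 2 images: (batch, height, width, channels)
rng = np.random.default_rng(0)
features = rng.random((2, 7, 7, 1280))

# Average over the two spatial axes, keeping one value per channel
pooled = features.mean(axis=(1, 2))
print(features.shape, "->", pooled.shape)   # prints (2, 7, 7, 1280) -> (2, 1280)
```

Compared with `Flatten`, this keeps the input of the classifier small (1280 values instead of 7 x 7 x 1280), so the `Dense` layer on top has far fewer weights to learn.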

In [None]:
model = Sequential([base_model,
                    GlobalAveragePooling2D(), 
                    Dense(num_classes, activation='softmax')
                   ])
model.summary()

Note that only 6,405 parameters will be trained (1280 × 5 weights + 5 biases of the `Dense` layer); the other ~2.2M parameters of the MobileNetV2 model were already trained and are frozen.

In [None]:
# callbacks 
weight_path = '{}_best_weights.hdf5'.format('flower')
checkpointer = ModelCheckpoint(weight_path,
                               monitor='val_accuracy',
                               verbose=1, 
                               save_best_only=True,
                               mode='auto',
                               save_weights_only=True)

# set a learning rate annealer
learning_rate_reduction = ReduceLROnPlateau(monitor='val_accuracy', 
                                            patience=3, 
                                            verbose=1, 
                                            factor=0.7, 
                                            min_lr=0.00001)
    
# early stopping: stop if val_loss (the default monitor) has not improved after 6 epochs
early = EarlyStopping(patience=6, 
                      verbose=1) 
    
callbacks = [checkpointer, learning_rate_reduction] #, early]

# Optimizer
opt = RMSprop(learning_rate=0.001, rho=0.9, epsilon=1e-08)
    
# Compilation
model.compile(optimizer=opt, 
              loss='categorical_crossentropy', 
              metrics=['accuracy'])

### 3.3 - Data augmentation

A useful trick to avoid overfitting is *data augmentation*. What is that? Well, the idea is to artificially add data to our dataset. But of course not just any data: we alter the existing images with small transformations to produce very similar ones.

For instance, we rotate an image by a few degrees, shift it off-center, or zoom in or out a little bit. Common augmentation techniques are horizontal/vertical flips, rotations, translations, rescaling, random crops, brightness adjustments and more.

Thanks to these transformations, the model sees a much larger variety of images (the equivalent of a dataset 2 or 3 times bigger), which makes training much more robust.
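
Two of these transformations are easy to write by hand, which helps to see what the generator does to each image. This is a simplified numpy sketch with hypothetical helper names: the shift fills the uncovered pixels with zeros and uses a fixed offset, whereas `ImageDataGenerator` draws random offsets and has its own fill modes.

```python
import numpy as np

def horizontal_flip(img):
    """Mirror an (height, width, channels) image left to right."""
    return img[:, ::-1, :]

def shift_right(img, pixels):
    """Translate the image to the right, filling the uncovered columns with zeros."""
    out = np.zeros_like(img)
    if pixels > 0:
        out[:, pixels:, :] = img[:, :-pixels, :]
    else:
        out[:] = img
    return out

# Tiny 2x3 single-channel "image" so the effect is easy to read
img = np.arange(6).reshape(2, 3, 1)
print(horizontal_flip(img)[:, :, 0])
print(shift_right(img, 1)[:, :, 0])
```

The label of the image does not change under these transformations, which is what makes them safe to apply to the training set only.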

In [None]:
image_size = 224
batch_size = 32
path = '../input/flowers-recognition/flowers/flowers/'

#train_gen = train_aug.flow(Xtrain, ytrain, batch_size=batch_size)
# The validation data must not have data augmentation
#valid_gen = valid_no_aug.flow(Xval, yval, batch_size=batch_size)

train_datagen = ImageDataGenerator(
        rescale=1./255,           # rescale pixel values [0,255] to [0,1]
        horizontal_flip=True,     # random horizontal flip
        width_shift_range=0.2,    # random shift images horizontally (fraction of total width)
        height_shift_range=0.2,   # random shift images vertically (fraction of total height)
        zoom_range=0.2)           # random zoom image
        #rotation_range=20,       # random rotation
        #shear_range=0.2)         # shear transfo
        #validation_split=0.2)    # splitting train / test datasets

test_datagen = ImageDataGenerator(
        rescale=1./255)
        #validation_split=0.2)

train_gen = train_datagen.flow(
        Xtrain, ytrain, 
        batch_size=batch_size,
        shuffle=False)              # already applied

valid_gen = test_datagen.flow(
        Xval, yval,
        batch_size=batch_size,
        shuffle=False)   

### 3.4 Feature extraction

In [None]:
batch_size = 32
epochs_0 = 80
steps_per_epoch = len(train_gen.x) // train_gen.batch_size
validation_steps = len(valid_gen.x) // valid_gen.batch_size

history = model.fit(
    train_gen,
    steps_per_epoch=len(Xtrain) // batch_size,   # or batch_size=32
    epochs=epochs_0 ,
    validation_data=valid_gen,
    validation_steps=len(Xval) // batch_size,
    callbacks=callbacks)

In [None]:
def plot_history(history, loss_max=5):
    """
    Check loss and accuracy evolution.
    """
    
    acc = history.history['accuracy']
    val_acc = history.history['val_accuracy']

    loss = history.history['loss']
    val_loss = history.history['val_loss']

    fig, (ax1, ax2) = plt.subplots(1,2,figsize=(14, 4))
    ax1.plot(acc, label='Training')
    ax1.plot(val_acc, label='Validation')
    ax1.legend(loc='lower right')
    ax1.set(ylabel='Accuracy', title='Training - Validation Accuracy', 
            ylim=(min(ax1.get_ylim()), 1))   # lower bound taken from the accuracy axes, not the current (loss) axes

    ax2.plot(loss, label='Training')
    ax2.plot(val_loss, label='Validation')
    ax2.legend(loc='upper right')
    ax2.set(ylabel='Loss (cross entropy)', xlabel='epochs',
           title='Training - Validation Loss', ylim=([0, loss_max]))
    plt.show()

In [None]:
plot_history(history, loss_max=1)

In [None]:
# Generator for test dataset
datagen = ImageDataGenerator(
        rescale=1./255)

eval_datagen = datagen.flow(
        Xtest, ytest,
        batch_size=batch_size,
        shuffle=False)      # since shuffle was already during splitting into train, valid, test

# Evaluation on the test dataset
loss, acc = model.evaluate(eval_datagen, verbose=0)
print(f'Test loss: {loss:.2f}')
print(f'Test accuracy: {acc*100:.2f}%')

### 3.5 Fine tuning

It is now time to fine-tune our model: we're going to unfreeze some of the top layers of the base model and train them together with the top-level classifier.

In [None]:
base_model.trainable = True

# Let's take a look to see how many layers are in the base model
print(f'Number of layers in the base model: {len(base_model.layers)}')

In [None]:
# Fine-tune from this layer onwards
fine_tuning = 100

# Freeze all the layers before the `fine_tuning` index
for layer in base_model.layers[:fine_tuning]:
    layer.trainable = False

In [None]:
# Load best weights
# model.load_weights(weight_path)

# Finer learning rate now
opt = RMSprop(learning_rate=0.0001, rho=0.9, epsilon=1e-08)
    
# Compilation
model.compile(optimizer=opt, 
              loss='categorical_crossentropy', 
              metrics=['accuracy'])

In [None]:
model.summary()

Continue training the model

In [None]:
fine_tuned_epochs = 60
total_epochs = epochs_0 + fine_tuned_epochs

history_fined = model.fit(
    train_gen,
    steps_per_epoch=len(Xtrain) // batch_size,   # or batch_size=32
    epochs=total_epochs,
    initial_epoch=history.epoch[-1],
    validation_data=valid_gen,
    validation_steps=len(Xval) // batch_size,
    callbacks=callbacks)

In [None]:
def plot_history_fined(history, history_fined, initial_epochs=epochs_0, loss_max=1):
    """
    Check loss and accuracy evolution after fine tuning
    """
    
    acc = history.history['accuracy'][:epochs_0]
    acc += history_fined.history['accuracy']
    val_acc = history.history['val_accuracy'][:epochs_0]
    val_acc += history_fined.history['val_accuracy']
    
    loss = history.history['loss'][:epochs_0]
    loss += history_fined.history['loss']
    val_loss = history.history['val_loss'][:epochs_0]
    val_loss += history_fined.history['val_loss']
  
    
    fig, (ax1, ax2) = plt.subplots(1,2,figsize=(14, 4))
    ax1.plot(acc, label='Training')
    ax1.plot(val_acc, label='Validation')
    ax1.axvline(initial_epochs-1, label='fine-tuning', ls='--', color='C2')   # start of fine-tuning
    ax1.legend(loc='lower right')
    ax1.set(ylabel='Accuracy', title='Training - Validation Accuracy', 
            ylim=([0.4,1.01]))

    ax2.plot(loss, label='Training')
    ax2.plot(val_loss, label='Validation')
    ax2.axvline(initial_epochs-1, label='fine-tuning', ls='--', color='C2')   # start of fine-tuning
    ax2.legend(loc='upper right')
    ax2.set(ylabel='Loss (cross entropy)', xlabel='epochs',
           title='Training - Validation Loss', ylim=([0, loss_max]))
    plt.show()

In [None]:
plot_history_fined(history, history_fined)

Great! We can clearly see that fine-tuning works and improves the accuracy of our model. We also note that the validation loss tends to increase a bit at the end: to prevent possible overfitting, we could add the `EarlyStopping` callback to the callbacks during training.

<a id="content4"></a>
## IV) Model evaluation

In [None]:
# Evaluation on the test dataset
loss, acc = model.evaluate(eval_datagen, verbose=0)
print(f'Test loss: {loss:.2f}')
print(f'Test accuracy: {acc*100:.2f}%')

Indeed, we now have 88% accuracy on the test dataset (compared to 77% before fine tuning)!

### Confusion matrix

In [None]:
import seaborn as sns
from sklearn import metrics

pred = model.predict(eval_datagen, verbose=1)

# get most likely class
y_pred = pred.argmax(axis=1)
y_true = ytest.argmax(axis=1)

print(metrics.classification_report(y_true, y_pred))

# confusion matrix
mat = metrics.confusion_matrix(y_true, y_pred)
df_mat = pd.DataFrame(mat, index=labels, columns=labels)
plt.figure(figsize=(8,6))
sns.heatmap(df_mat, annot=True, fmt='d', cmap=plt.cm.Reds)
#plt.tight_layout()
plt.ylabel('True label')
plt.xlabel('Predicted label')

### Prediction visualizations

Let's have a look at some predictions!

In [None]:
N = 20  # flowers to display
fig, axes = plt.subplots(4, 5, figsize=(20,12))
for i, ax in enumerate(axes.flat):
    ax.imshow(Xtest[i].astype(np.uint8))
    ax.set(xticks=[], yticks=[])
    true = y_true[i]
    prediction = y_pred[i]   
    ax.set_xlabel(f'Predict: {labels[prediction]}\n True: {labels[true]}', 
                  color='black' if true == prediction else 'red')

#fig.tight_layout()
fig.suptitle('Predicted flowers; Incorrect Labels in Red', size=14)

<a id="content5"></a>
## V) Conclusion

We can note the improvement of the model predictions brought by fine-tuning. Of course, we can make the model more complex by playing with the hyperparameters and/or adding other layers on top of the top-less MobileNetV2 base model, such as several `Dense` layers with some `Dropout` ones. Feel free to test different architectures.

It would also be interesting to compare this fine-tuning method with the VGG16, VGG19, ResNet50 etc. models.

Let me know what you think of this notebook, and if you liked it, don't hesitate to leave me a comment with a +1 ;)