# Introduction

In this guide we will design and train an image recognition model for recognizing 17 different species of flowers. We will use a technique called transfer learning, combining a pretrained model called VGG19, trained on ImageNet, with our own flower classification subnet. The guide is an extension of the workshop found [here](https://www.tekna.no/kurs/workshop-introduksjon-til-maskinlaring-48133/), and thus requires a basic understanding of machine learning theory and programming in Python. Running the code has two prerequisites:

__A Python environment containing TensorFlow:__

In [None]:
%pip install tensorflow

In [1]:
import tensorflow


__Downloading and restructuring the dataset in a folder 'flowers' as defined [here](https://github.com/estenhl/flowers/blob/master/README.md):__

In [None]:
import os

print('Root dir exists: {}'.format(os.path.isdir('flowers')))
print('Train dir exists: {}'.format(
      os.path.isdir(os.path.join('flowers', 'train'))))
print('Validation dir exists: {}'.format(
      os.path.isdir(os.path.join('flowers', 'val'))))
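
Beyond checking that the folders exist, it can be useful to count how many images each class folder holds. The following is a minimal sketch, assuming the layout described above (one subfolder per species inside 'flowers/train' and 'flowers/val'); the helper name `count_images_per_class` is hypothetical and not part of the original guide.

```python
import os


def count_images_per_class(split_dir):
    """Hypothetical helper: map each class subfolder to its file count."""
    counts = {}
    for name in sorted(os.listdir(split_dir)):
        class_dir = os.path.join(split_dir, name)
        if os.path.isdir(class_dir):
            # Count regular files only, ignoring nested folders
            counts[name] = len([
                f for f in os.listdir(class_dir)
                if os.path.isfile(os.path.join(class_dir, f))
            ])
    return counts


for split in ('train', 'val'):
    split_dir = os.path.join('flowers', split)
    # Skip silently if the dataset has not been downloaded yet
    if os.path.isdir(split_dir):
        counts = count_images_per_class(split_dir)
        print('{}: {} classes, {} images'.format(
            split, len(counts), sum(counts.values())))
```

If the dataset is structured as expected, each split should report 17 classes.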

# Step 1: Serving images in-memory

To serve images to our model during training we will use a generator-like object. Given that our images are structured as previously stated, with one folder for the training set and one for the validation set, both with subfolders for each category of flowers, we can use a TensorFlow dataset instantiated with [image_dataset_from_directory](https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image_dataset_from_directory). The dataset can be iterated over like a list and serves batches in the form of tuples, where the first element of the tuple contains the images and the second element contains the corresponding labels.

In [None]:
from tensorflow.keras.preprocessing import image_dataset_from_directory

batches = image_dataset_from_directory('flowers/train', batch_size=4)

batches

At this point it is usually a good idea to do a [sanity check](https://en.wiktionary.org/wiki/sanity_check). This typically includes verifying that our images are served in the correct format, and that the images and labels are still correctly matched. We can first fetch the class names from the generator to be able to understand what the numeric labels encode.

In [None]:
labels = batches.class_names
labels

We can then use matplotlib to visualize the first batch:

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

for X, y in batches:
    fig, ax = plt.subplots(1, 4, figsize=(10, 10))

    for i in range(len(X)):
        img = X[i].numpy()
        img = img.astype(np.uint8)
        label = labels[y[i]]

        ax[i].imshow(img)
        ax[i].set_title(label)
        ax[i].set_xticks([])
        ax[i].set_yticks([])

    plt.show()
    break # We only need the first batch

# Step 2: Setting up the base model

Once we know our dataset is being served correctly we can start setting up the base model that will be the core of our flower classification model. As previously mentioned this will be a model called VGG19, a comparatively simple architecture which yields relatively good results. Like the generator, this also exists as a prebuilt module in Keras, namely in the [applications module](https://keras.io/applications/).

In [7]:
from tensorflow.keras.applications.vgg19 import VGG19

When initializing the model we need to specify that we want to use the weights trained on ImageNet and that we want the entire model including the top layers. We also specify the image size we are going to use, purely to be explicit.

In [None]:
model = VGG19(weights='imagenet', include_top=True,
              input_shape=(224, 224, 3))
model.summary()

We can sanity check this step by predicting the label for an image from our generator. Note that the predictions we are making now will use the labels from ImageNet, as this is what the model currently recognizes, not the labels from our dataset. We start by reinitializing the generator with the correct image size and a batch size of 1. We also seed NumPy's random number generator to control the order of the images.

In [None]:
import numpy as np

np.random.seed(42)
generator = image_dataset_from_directory('flowers/train', batch_size=1,
                                         image_size=(224, 224))

We can then obtain predictions for the first batch, which contains a single image. To decode the prediction using ImageNet labels we can use a predefined function found in the same module as the model.

In [None]:
from tensorflow.keras.applications.vgg19 import decode_predictions

for X, y in generator:
    preds = model.predict(X)
    decoded_preds = decode_predictions(preds, top=1)
    fig = plt.figure()

    img = X[0].numpy()
    img = img.astype(np.uint8)
    label = labels[y[0]]
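    # decode_predictions returns, per image, a list of
    # (class_id, class_name, score) tuples; keep the top class name
    predicted = decoded_preds[0][0][1]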

    plt.imshow(img)
    fig.suptitle('Truth: {}, Predicted: {}'.format(label, predicted))
    plt.show()

    break
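
To make the decoding step less opaque, here is a minimal NumPy sketch of what picking the top predictions from a probability vector amounts to. The helper `top_k_labels`, the class names and the probabilities are all made up for illustration; the real `decode_predictions` additionally maps indices to ImageNet class identifiers.

```python
import numpy as np


def top_k_labels(probabilities, class_names, k=3):
    """Hypothetical helper: return the k most probable (name, score) pairs."""
    # argsort is ascending, so reverse it to get the highest scores first
    order = np.argsort(probabilities)[::-1][:k]
    return [(class_names[i], float(probabilities[i])) for i in order]


# Made-up class names and scores, for illustration only
names = ['daisy', 'tulip', 'sunflower', 'iris']
probs = np.array([0.1, 0.6, 0.25, 0.05])

print(top_k_labels(probs, names, k=2))
```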

# Step 3: Preprocessing

We now know that both our generator and model are set up, and we are able to make predictions. The predictions, however, do not necessarily look very good. This is because we have skipped preprocessing: a set of transformations applied to the images before they are fed to the model, which should match the transformations used when the model was originally trained. Typical preprocessing includes rescaling the values of the data, shifting the range of pixel values, and similar numerical operations. Luckily, in Keras, the module which contains a model also contains the preprocessing function used for training that model. We can fetch this function and feed it to our generator to ensure all images are preprocessed before they are served to the model:

In [None]:
from tensorflow.keras.applications.vgg19 import preprocess_input

np.random.seed(42)
generator = image_dataset_from_directory('flowers/train', batch_size=1,
                                         image_size=(224, 224))
generator = generator.map(lambda images, labels: (preprocess_input(images),
                                                  labels))
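
For intuition, the preprocessing used for the VGG models in Keras converts the images from RGB to BGR and subtracts the ImageNet mean from each channel, without any further scaling. The NumPy sketch below mimics this behaviour; the function names are hypothetical, and it is meant as an illustration rather than a replacement for `preprocess_input`.

```python
import numpy as np

# ImageNet per-channel means in BGR order
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68])


def vgg_style_preprocess(images):
    """Hypothetical re-implementation: RGB -> BGR, then subtract the means."""
    images = np.asarray(images, dtype=np.float64)
    bgr = images[..., ::-1]
    return bgr - IMAGENET_BGR_MEAN


def undo_vgg_style_preprocess(images):
    """Invert the sketch above so an image can be displayed again."""
    bgr = np.asarray(images) + IMAGENET_BGR_MEAN
    rgb = bgr[..., ::-1]
    # Round before casting to avoid off-by-one errors from float arithmetic
    return np.clip(np.rint(rgb), 0, 255).astype(np.uint8)


# A random stand-in for a batch with a single 224x224 RGB image
batch = np.random.randint(0, 256, size=(1, 224, 224, 3))
processed = vgg_style_preprocess(batch)
restored = undo_vgg_style_preprocess(processed)

print(processed.min() < 0)               # values are no longer in [0, 255]
print(np.array_equal(restored, batch))   # the transformation is reversible
```

Because the preprocessed values are shifted and the channels are reordered, a preprocessed image has to be converted back before it can be displayed with its original colours.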

Once we have reinitialized the generator correctly we can rerun our predictions to see if they improve:

In [None]:
for X, y in generator:
    preds = model.predict(X)
    decoded_preds = decode_predictions(preds, top=1)
    fig = plt.figure()

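    # Undo the VGG preprocessing (BGR order, ImageNet means subtracted)
    # so the image is displayed with its original colours
    img = X[0].numpy()
    img = img + np.array([103.939, 116.779, 123.68])
    img = img[..., ::-1]
    img = np.clip(img, 0, 255).astype(np.uint8)
    label = labels[y[0]]
    # Keep only the class name of the top prediction
    predicted = decoded_preds[0][0][1]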

    plt.imshow(img)
    fig.suptitle('Truth: {}, Predicted: {}'.format(label, predicted))
    plt.show()

    break

Note that seeing improvements in the predictions is not a given even though the images are now preprocessed correctly. The label we are looking for might not be a part of the original dataset the model was trained on, or it might simply be a case of a bad prediction where the model misses. However, running a sanity check (preferably over more images) is usually a good habit to ensure correctness.

# Step 4: Configuring the flower classification model

Once we are happy with the behaviour of our base model, we can start setting up our own custom model for solving the problem we are interested in, which in our case is classifying flowers. The first step is to be a bit more restrictive with what we use from the pretrained model, only picking out the parts we need. We do this by dropping the top layers used for predictions, and instead performing a pooling operation on the final convolutional layer. Once initialized, we can fetch the input and output of the pretrained model using properties found in Keras' Model class.

In [13]:
pretrained = VGG19(include_top=False, input_shape=(224, 224, 3),
                   weights='imagenet', pooling='max')
inputs = pretrained.input
outputs = pretrained.output

As we do not want the weights in this part of the final model to change, we freeze them:

In [14]:
for layer in pretrained.layers:
    layer.trainable = False

We can then create our own custom layers for performing our own task. To start, we will use a hidden fully connected layer with 128 neurons, and a final prediction layer with 17 neurons, one per species in our dataset. Note that the hidden layer takes the output from the pretrained model as its input.

In [15]:
from tensorflow.keras.layers import Dense

hidden = Dense(128, activation='relu')(outputs)
preds = Dense(17, activation='softmax')(hidden)
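
To see what these two layers compute, the following NumPy sketch runs the same kind of forward pass on random stand-in data. It assumes the pooled VGG19 output has 512 features per image (the number of channels in its final convolutional block); the weights here are random and purely illustrative, whereas Keras initializes and trains its own.

```python
import numpy as np

rng = np.random.default_rng(0)


def relu(x):
    return np.maximum(x, 0)


def softmax(x):
    # Subtract the row-wise maximum for numerical stability
    shifted = x - x.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


# Stand-in for the pooled features of a batch of 4 images
features = rng.normal(size=(4, 512))

# Randomly initialized weights for the two fully connected layers
W1, b1 = rng.normal(scale=0.05, size=(512, 128)), np.zeros(128)
W2, b2 = rng.normal(scale=0.05, size=(128, 17)), np.zeros(17)

hidden_activations = relu(features @ W1 + b1)
probabilities = softmax(hidden_activations @ W2 + b2)

print(probabilities.shape)        # one row of 17 probabilities per image
print(probabilities.sum(axis=1))  # each row sums to 1
print(W1.size + b1.size, W2.size + b2.size)  # trainable parameters per layer
```

The parameter counts printed at the end should correspond to the trainable parameters reported for the two Dense layers in the model summary.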

Once we have all our layers set up we can wrap them in a Model, and compile the model using a fairly standard set of hyperparameters.

In [None]:
from tensorflow.keras import Model
from tensorflow.keras.optimizers import Adam

model = Model(inputs, preds)
model.compile(loss='sparse_categorical_crossentropy',
              optimizer=Adam(learning_rate=1e-3),
              metrics=['accuracy'])

model.summary()
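
The loss and metric used above can be illustrated with a small NumPy sketch. Sparse categorical crossentropy takes integer labels (as served by our generator) and, for each sample, computes the negative logarithm of the probability assigned to the true class. The function below is a hypothetical re-implementation with made-up numbers, not the Keras code itself.

```python
import numpy as np


def sparse_categorical_crossentropy(labels, probabilities, eps=1e-7):
    """Hypothetical re-implementation: per-sample loss for integer labels."""
    probabilities = np.clip(probabilities, eps, 1.0)
    # Pick out the probability assigned to the true class of each sample
    picked = probabilities[np.arange(len(labels)), labels]
    return -np.log(picked)


# Made-up predictions for two samples and three classes
example_probs = np.array([[0.7, 0.2, 0.1],
                          [0.1, 0.1, 0.8]])
example_labels = np.array([0, 1])

losses = sparse_categorical_crossentropy(example_labels, example_probs)
accuracy = (example_probs.argmax(axis=1) == example_labels).mean()

print(losses)    # low loss for the confident, correct first prediction
print(accuracy)  # fraction of samples where the top class is correct
```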

We can then set up generators like we did before, one for the training data and one for validation, and train the model using Model.fit(). Note that training here is set to ten epochs for demonstration purposes, and more epochs might be necessary.

In [None]:
np.random.seed(42)

batch_size = 32

train_generator = image_dataset_from_directory('flowers/train',
                                               batch_size=batch_size,
                                               image_size=(224, 224))
train_generator = train_generator.map(
    lambda images, labels: (preprocess_input(images), labels)
  )

val_generator = image_dataset_from_directory('flowers/val',
                                             batch_size=batch_size,
                                             image_size=(224, 224))
val_generator = val_generator.map(
    lambda images, labels: (preprocess_input(images), labels)
  )

model.fit(train_generator,
          epochs=10,
          validation_data=val_generator)
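
`Model.fit()` returns a History object whose `history` attribute is a dictionary of per-epoch metrics. The sketch below summarizes such a dictionary, assuming the keys are named 'accuracy' and 'val_accuracy' (which is what recent TensorFlow versions use with `metrics=['accuracy']`). The helper name and the numbers are made up for illustration.

```python
def summarize_history(history):
    """Hypothetical helper: find the best epoch and the final accuracy gap."""
    train = history['accuracy']
    val = history['val_accuracy']
    best = max(range(len(val)), key=lambda i: val[i])
    return {
        'best_epoch': best + 1,  # epochs are reported 1-indexed
        'best_val_accuracy': val[best],
        'final_gap': train[-1] - val[-1],
    }


# Made-up numbers, standing in for the `history` attribute of a History object
example_history = {
    'accuracy': [0.45, 0.80, 0.95, 1.00],
    'val_accuracy': [0.40, 0.60, 0.66, 0.65],
}

summary = summarize_history(example_history)
print(summary)
```

A large gap between the training and validation accuracy is a typical sign that the model fits the training set better than it generalizes.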

We can then predict using the trained model:

In [None]:
%matplotlib inline

for X, y in val_generator:
    preds = model.predict(X)

    truth = labels[y[0]]
    label = labels[np.argmax(preds[0])]
    probability = preds[0][np.argmax(preds[0])]
    fig = plt.figure()

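    # Undo the VGG preprocessing (BGR order, ImageNet means subtracted)
    # so the image is displayed with its original colours
    img = X[0].numpy()
    img = img + np.array([103.939, 116.779, 123.68])
    img = img[..., ::-1]
    img = np.clip(img, 0, 255).astype(np.uint8)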

    plt.imshow(img)
    fig.suptitle('Predicted: {}, probability: {}, truth: {}'.format(
        label, probability, truth)
    )
    plt.show()

    break

# Step 5: Regularization

If you run the training above for a larger number of epochs, you will typically achieve a good result on the training data and a considerably worse result on the validation data (I see 100% training accuracy and about 65% validation accuracy, but this varies between runs due to the randomness in the process). This is an example of overfitting: the model starts memorizing specifics of the training set instead of learning general features that also apply to new data. We handle this by introducing regularization, a common technique for forcing the model to generalize. In image recognition this is typically done using dropout layers, which randomly drop a subset of their artificial neurons while training. We can introduce dropout to our model by inserting a Dropout layer, which drops 30% of the neurons, between the two final layers of our model.

In [None]:
from tensorflow.keras.layers import Dropout

hidden = Dense(128, activation='relu')(outputs)
dropout = Dropout(.3)(hidden)
preds = Dense(17, activation='softmax')(dropout)

model = Model(inputs, preds)
model.compile(loss='sparse_categorical_crossentropy',
              optimizer=Adam(learning_rate=1e-3),
              metrics=['accuracy'])

model.summary()
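
What the Dropout layer does during training can be sketched in NumPy as follows: a random subset of the activations is set to zero, and the remaining ones are scaled up by 1 / (1 - rate) so that the expected sum is unchanged. At inference time the layer passes its input through untouched. The function below is a hypothetical illustration, not the Keras implementation.

```python
import numpy as np


def dropout(activations, rate, rng, training=True):
    """Hypothetical sketch of (inverted) dropout."""
    if not training or rate == 0:
        return activations
    # Keep each activation with probability 1 - rate
    keep_mask = rng.random(activations.shape) >= rate
    # Scale the surviving activations to preserve the expected value
    return activations * keep_mask / (1.0 - rate)


rng = np.random.default_rng(42)
hidden_activations = np.ones((1000, 128))  # stand-in for hidden layer outputs

dropped = dropout(hidden_activations, 0.3, rng)

print((dropped == 0).mean())  # fraction of dropped neurons, roughly 0.3
print(dropped.mean())         # mean activation, roughly unchanged
```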

With the new model compiled, we can restart training to achieve what should be a better result:

In [None]:
np.random.seed(42)

train_generator = image_dataset_from_directory('flowers/train', batch_size=batch_size,
                                               image_size=(224, 224))
train_generator = train_generator.map(
    lambda images, labels: (preprocess_input(images), labels)
  )

val_generator = image_dataset_from_directory('flowers/val', batch_size=batch_size,
                                             image_size=(224, 224))
val_generator = val_generator.map(
    lambda images, labels: (preprocess_input(images), labels)
  )

model.fit(train_generator,
          epochs=10,
          validation_data=val_generator)

# Step 6: Augmentations

A second technique for avoiding overfitting is augmenting the images, the goal of which is to take the existing data points in our dataset and create brand new samples. This process works by modifying an image in a way which changes it, while maintaining the truthfulness of the corresponding label. An example in our case is mirroring the images horizontally, which can be implemented with the map function of our dataset. Using this functionality will randomly decide whether to flip the image or not each time the image is presented, theoretically yielding two samples from the single data point we started with. We can see this by visualizing the same image as served in five separate runs:

In [None]:
from tensorflow import expand_dims, Tensor
from tensorflow.image import random_flip_left_right
from typing import Tuple


fig, ax = plt.subplots(1, 5, figsize=(15, 10))


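def augment(images: Tensor, labels: Tensor) -> Tuple[Tensor, Tensor]:
    # Randomly mirror the first image of the batch left to right
    first_image = images[0]
    first_image = random_flip_left_right(first_image)
    # Restore the batch dimension that was removed by the indexing
    image_batch = expand_dims(first_image, 0)

    return image_batch, labels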


for i in range(5):
    np.random.seed(42)
    generator = image_dataset_from_directory('flowers/train', batch_size=1,
                                             seed=42,
                                             image_size=(224, 224))
    generator = generator.map(augment)

    for X, y in generator:
        img = X[0].numpy()
        img = img.astype(np.uint8)
        ax[i].imshow(img)
        ax[i].set_title('Run {}'.format(i + 1))
        ax[i].set_xticks([])
        ax[i].set_yticks([])
        break

plt.show()
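
The flip itself is a simple operation. The NumPy sketch below shows the idea on a tiny made-up array standing in for an image: with a probability of one half the columns are reversed, otherwise the image is returned unchanged. The function name is hypothetical, and this is an illustration rather than the TensorFlow implementation.

```python
import numpy as np


def flip_left_right_randomly(image, rng):
    """Hypothetical sketch: mirror a (height, width, channels) image half the time."""
    if rng.random() < 0.5:
        # Reversing the width axis mirrors the image left to right
        return image[:, ::-1, :]
    return image


rng = np.random.default_rng(0)

# A tiny 2x3 "image" with 3 channels, standing in for a real photo
image = np.arange(2 * 3 * 3).reshape(2, 3, 3)
mirrored = image[:, ::-1, :]

variants = [flip_left_right_randomly(image, rng) for _ in range(20)]
num_flipped = sum(np.array_equal(v, mirrored) for v in variants)

print('{} of {} variants were mirrored'.format(num_flipped, len(variants)))
```

Since a mirrored flower is still the same species, the label can be reused as-is for both variants.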

Retraining the model with a set of augmentations should increase the accuracy even further. A wide variety of augmentations exists, and getting the best performance can sometimes feel more like an art than a science.

# Summary

The steps in this guide provide a good starting point for classifying species of flowers, or solving any other generic image classification problem. Retracing the steps while allowing the model more epochs to train should provide a solid baseline with decent results (my best run achieved ~75% validation accuracy). Continued work on this problem would typically include trying different architectures as core models, experimenting with various designs for the custom problem-specific final layers, and testing a wide range of combinations of regularization and augmentations to combat overfitting. It should be a feasible goal to reach an accuracy in the high 90s, which seems to be how the state-of-the-art models are performing. Happy hacking!