# 1.0 Networks as feature extractors

In this lesson, we’ll discuss the concept of **transfer learning**:

> the ability to use a pre-trained model as a “shortcut” to learn patterns from data it was not originally trained on.

Consider a traditional machine learning scenario where we are given two classification challenges.

**In the first challenge**, our goal is to train a Convolutional Neural Network to recognize dogs
vs. cats in an image.

Then, **in the second challenge**, we are tasked with recognizing three separate species of bears:
grizzly bears, polar bears, and giant pandas. Using standard practices in **machine learning, neural networks, and deep learning**, we would treat these challenges as two separate problems. 

- First, we would gather a sufficient labeled dataset of dogs and cats, followed by training a model on the dataset. 
- We would then repeat the process a second time, only this time gathering images of our
bear species, and then training a model on the labeled bear dataset.


Transfer learning proposes a different training paradigm – 

> what if we could use an existing pretrained classifier and use it as a starting point for a new classification task?

In the context of the challenges above, **we would first train a Convolutional Neural Network to recognize dogs versus cats**. 

> Then, we would reuse the same CNN, trained only on dog and cat data, to
distinguish between the bear classes, even though no bear data was mixed with the dog and cat data.


Does this sound too good to be true? It’s actually not. **Deep neural networks trained on
large-scale datasets such as ImageNet have proven to be excellent at the task of transfer
learning**. These networks learn a set of rich, discriminative features to recognize 1,000 separate object classes. It makes sense that these filters can be reused for classification tasks other than the one the CNN was originally trained on.

In general, **there are two types of transfer learning** when applied to deep learning for **computer vision**:

1. Treating networks as arbitrary feature extractors.
2. Removing the fully-connected layers of an existing network, placing a new set of FC layers on
top of the CNN, and fine-tuning these weights (and optionally previous layers) to recognize
object classes.


In this section, we’ll be focusing primarily on the first method of transfer learning, treating networks as feature extractors.

## 1.1 Extracting features with a pre-trained CNN

Up until this point, we have treated Convolutional Neural Networks as end-to-end image classifiers:

1. We input an image to the network.
2. The image forward propagates through the network.
3. We obtain the final classification probabilities from the end of the network.

However, **there is no “rule” that says we must allow the image to forward propagate through
the entire network**. 

> Instead, we can stop the propagation at an arbitrary layer, such as an activation
or pooling layer, extract the values from the network at this time, and then use them as feature
vectors. 

For example, let’s consider the VGG16 network architecture by [Simonyan and Zisserman](https://arxiv.org/abs/1409.1556) (Figure below, left).

<center><img width="500" src="https://drive.google.com/uc?export=view&id=1CNy_EpEVeVAn7LJbPyeZm7xKW1rLnwdz"></center><center><b>Left</b>: The original VGG16 network architecture that outputs probabilities for each of the 1,000 ImageNet class labels. <b>Right</b>: Removing the FC layers from VGG16 and instead returning the output of the final POOL layer. This output will serve as our extracted features.</center>

Along with the layers in the network, we have also included the input and output shapes of the
volumes for each layer. When treating a network as a feature extractor, we essentially “chop off” the network at an arbitrary point (normally prior to the fully-connected layers, but it really depends on your particular dataset).

Now the last layer in our network is a max pooling layer (Figure above, right), which has an output shape of 7 x 7 x 512: 512 feature maps, each of spatial size 7 x 7. If we forward propagate an image through this network with its FC head removed, we are left with 512 activation maps of size 7 x 7, whose values respond more or less strongly depending on the image contents. We can therefore take these 7 x 7 x 512 = 25,088 values and treat them as a feature vector that **quantifies the contents of an image**.
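
As a shape check for the arithmetic above, here is a minimal NumPy sketch. It uses random values as a stand-in for real VGG16 activations (an assumption made purely for illustration), and the batch size of 4 is arbitrary.

```python
import numpy as np

# Stand-in for the output of VGG16's final POOL layer on a batch of 4 images.
# Real activations would come from the network; random values are used here
# only to illustrate the shapes involved.
rng = np.random.default_rng(0)
pool_output = rng.random((4, 7, 7, 512))

# Flatten each 7 x 7 x 512 volume into a single feature vector per image.
features = pool_output.reshape(pool_output.shape[0], -1)

print(features.shape)  # (4, 25088)
```

Each row of `features` is what a downstream classifier would see for one image.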

If we repeat this process for an entire dataset of images (including datasets that VGG16 was
not trained on), we’ll be left with a design matrix of N rows (one per image), each with 25,088 columns used to
quantify its contents (i.e., feature vectors). Given our feature vectors, we can train an off-the-shelf machine learning model such as a Linear SVM, Logistic Regression classifier, or Random Forest on top of these features to obtain a classifier that recognizes new classes of images.

Keep in mind that the CNN itself is not capable of recognizing these new classes – instead,
we are using the CNN as an intermediary feature extractor. The downstream machine learning
classifier will take care of learning the underlying patterns of the features extracted from the CNN.

Later in this section, we’ll demonstrate how you can use pre-trained CNNs (specifically
VGG16) and the Keras library to obtain > 90% classification accuracy on image datasets such as
Animals, CALTECH-101, and Flowers-17. None of these datasets contains images that VGG16
was trained on, but by applying transfer learning, we are able to build highly accurate image
classifiers with little effort. The trick is extracting these features and storing them in an efficient manner. To accomplish this task, we’ll need HDF5.
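
To see why storage matters, a rough back-of-the-envelope estimate helps. The sketch below assumes each value is stored as a 64-bit float (8 bytes); the helper name `feature_matrix_bytes` and the image counts are hypothetical, chosen only for illustration.

```python
def feature_matrix_bytes(num_images, feature_dim=7 * 7 * 512, bytes_per_value=8):
    """Approximate size in bytes of a dense (num_images, feature_dim) matrix."""
    return num_images * feature_dim * bytes_per_value


# Example image counts (assumptions, not tied to any particular dataset).
for n in (3000, 10000, 100000):
    size_gb = feature_matrix_bytes(n) / 1e9
    print(f"{n:>7} images -> {size_gb:.2f} GB")
```

Even a few thousand images produce a feature matrix of several hundred megabytes, so a format that can be written and read in slices is a practical choice.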

## 1.2 Writing features to an HDF5 dataset

Before we can even think about treating a CNN architecture as a feature extractor, we
first need to develop a bit of infrastructure. In particular, we need to define a Python class
named HDF5DatasetWriter, which, as the name suggests, is responsible for taking an input set of
NumPy arrays (whether features, raw images, etc.) and writing them to HDF5 format.

In [1]:
# import the necessary packages
import h5py
import os

class HDF5DatasetWriter:
  def __init__(self, dims, outputPath, dataKey="images", bufSize=1000):
    """
    The constructor to HDF5DatasetWriter accepts four parameters, two of which are optional.
    
    Args:
    dims: the dimension or shape of the data we will be storing in the dataset.
    For example, if we were storing the (flattened) raw pixel intensities of the
    70,000 28x28 = 784 MNIST images, then dims=(70000, 784).
    outputPath: path to where our output HDF5 file will be stored on disk.
    dataKey: (optional) name of the dataset that will store
    the data our algorithm will learn from.
    bufSize: (optional) size of our in-memory buffer, which defaults to 1,000 feature
    vectors/images. Once we reach bufSize, we flush the buffer to the HDF5 dataset.
    """

    # check to see if the output path exists, and if so, raise
    # an exception
    if os.path.exists(outputPath):
      raise ValueError("The supplied `outputPath` already "
        "exists and cannot be overwritten. Manually delete "
        "the file before continuing.", outputPath)

    # open the HDF5 database for writing and create two datasets:
    # one to store the images/features and another to store the
    # class labels
    self.db = h5py.File(outputPath, "w")
    # if hard-disk space is limited, the dataset can be compressed (e.g. by
    # passing compression="gzip" to create_dataset), at the price of extra
    # computation when writing and reading
    self.data = self.db.create_dataset(dataKey, dims, dtype="float")
    self.labels = self.db.create_dataset("labels", (dims[0],),dtype="int")

    # store the buffer size, then initialize the buffer itself
    # along with the index into the datasets
    self.bufSize = bufSize
    self.buffer = {"data": [], "labels": []}
    self.idx = 0

  def add(self, rows, labels):
    # add the rows and labels to the buffer
    self.buffer["data"].extend(rows)
    self.buffer["labels"].extend(labels)

    # check to see if the buffer needs to be flushed to disk
    if len(self.buffer["data"]) >= self.bufSize:
      self.flush()

  def flush(self):
    # write the buffers to disk then reset the buffer
    i = self.idx + len(self.buffer["data"])
    self.data[self.idx:i] = self.buffer["data"]
    self.labels[self.idx:i] = self.buffer["labels"]
    self.idx = i
    self.buffer = {"data": [], "labels": []}

  def storeClassLabels(self, classLabels):
    # create a dataset to store the actual class label names,
    # then store the class labels
    dt = h5py.special_dtype(vlen=str) # `vlen=unicode` for Py2.7
    labelSet = self.db.create_dataset("label_names",(len(classLabels),), dtype=dt)
    labelSet[:] = classLabels

  def close(self):
    # check to see if there are any other entries in the buffer
    # that need to be flushed to disk
    if len(self.buffer["data"]) > 0:
      self.flush()

    # close the dataset
    self.db.close()

As you can see, the **HDF5DatasetWriter** doesn’t have much to do with machine learning or
deep learning at all – it’s simply a class used to help us store data in HDF5 format. As you continue
in your deep learning career, you’ll notice that much of the initial labor when setting up a new
problem is getting the data into a format you can work with. Once you have your data in a format
that’s straightforward to manipulate, it becomes substantially easier to apply machine learning and
deep learning techniques to your data.
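
The buffer-and-flush logic is easier to follow in isolation. The sketch below is a toy stand-in, not the class above: a NumPy array plays the role of the on-disk HDF5 dataset, and the names `BufferedArrayWriter`, `buf_size`, and `num_flushes` are hypothetical.

```python
import numpy as np


class BufferedArrayWriter:
    """Toy stand-in for HDF5DatasetWriter: a NumPy array plays the on-disk dataset."""

    def __init__(self, dims, buf_size=4):
        self.data = np.zeros(dims, dtype="float")
        self.buf_size = buf_size
        self.buffer = []
        self.idx = 0          # next free row in self.data
        self.num_flushes = 0  # counts writes, for illustration only

    def add(self, rows):
        # collect rows in memory and write only once the buffer is full
        self.buffer.extend(rows)
        if len(self.buffer) >= self.buf_size:
            self.flush()

    def flush(self):
        # write the buffered rows into the next free slice, then reset
        end = self.idx + len(self.buffer)
        self.data[self.idx:end] = self.buffer
        self.idx = end
        self.buffer = []
        self.num_flushes += 1

    def close(self):
        # write whatever is left over in the buffer
        if len(self.buffer) > 0:
            self.flush()


rows = np.arange(30, dtype="float").reshape(10, 3)
writer = BufferedArrayWriter(dims=(10, 3), buf_size=4)
for start in range(0, 10, 2):
    writer.add(rows[start:start + 2])
writer.close()

print(writer.idx, writer.num_flushes)  # 10 3
```

Ten rows added two at a time with a buffer of four trigger two flushes from `add` and a final one from `close`, which is why forgetting to call `close` can silently drop the last rows.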

Now that our **HDF5DatasetWriter** is implemented, we can move on to actually extracting
features using pre-trained Convolutional Neural Networks.

## 1.3 The feature extraction process

Let’s define a Python script that can be used to extract features from an arbitrary image dataset (provided the input dataset follows a specific directory structure).

In [4]:
# import the necessary packages
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications import imagenet_utils
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.preprocessing.image import load_img
from sklearn.preprocessing import LabelEncoder
from imutils import paths
import numpy as np
import progressbar
import h5py
import random
import os

In [5]:
!pip install gdown



In [6]:
# download animals dataset
!gdown https://drive.google.com/uc?id=1ZkrEbDEdiSjpog6IcWK-HB2Y3uk2WjFE

# download caltech-101 dataset
!gdown https://drive.google.com/uc?id=1VpcNjEFHbtfZbQx7Q9FvRBlCYFwxYPIS

# download flowers dataset
!gdown https://drive.google.com/uc?id=1o_BeSmvyuelAyEYpGPNphlkX4bfQy2r5

Downloading...
From: https://drive.google.com/uc?id=1ZkrEbDEdiSjpog6IcWK-HB2Y3uk2WjFE
To: /content/animals.zip
197MB [00:01, 122MB/s] 
Downloading...
From: https://drive.google.com/uc?id=1VpcNjEFHbtfZbQx7Q9FvRBlCYFwxYPIS
To: /content/caltech-101.zip
121MB [00:00, 148MB/s]  
Downloading...
From: https://drive.google.com/uc?id=1o_BeSmvyuelAyEYpGPNphlkX4bfQy2r5
To: /content/flowers17.zip
60.5MB [00:00, 118MB/s] 


In [None]:
!unzip animals.zip
!unzip caltech-101.zip
!unzip flowers17.zip

In [8]:
# 
# Last layer before the head has a 7x7x512 dimension
#
model = VGG16(weights="imagenet", include_top=True)
model.summary()

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg16/vgg16_weights_tf_dim_ordering_tf_kernels.h5
Model: "vgg16"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None, 224, 224, 3)]     0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     14758

In [10]:
def feature_extraction(dataset,output,buffer_size,bs):
		'''
			dataset: path to the input folder containing the image dataset
			output: path to the output HDF5 file where the features are stored
			buffer_size: controls the size of our in-memory buffer
			bs: batch size used when passing images through the network
		'''

		# grab the list of images that we'll be describing then randomly
		# shuffle them to allow for easy training and testing splits via
		# array slicing during training time
		print("[INFO] loading images...")
		imagePaths = list(paths.list_images(dataset))
		random.shuffle(imagePaths)

		# extract the class labels from the image paths then encode the
		# labels
		labels = [p.split(os.path.sep)[-2] for p in imagePaths]
		le = LabelEncoder()
		labels = le.fit_transform(labels)

		# load the VGG16 network
		print("[INFO] loading network...")
		model = VGG16(weights="imagenet", include_top=False)

		# initialize the HDF5 dataset writer, then store the class label
		# names in the dataset
		dataset = HDF5DatasetWriter((len(imagePaths), 512 * 7 * 7),
																output, 
																dataKey="features", 
																bufSize=buffer_size)
		dataset.storeClassLabels(le.classes_)

		# initialize the progress bar
		widgets = ["Extracting Features: ", progressbar.Percentage(), " ", progressbar.Bar(), " ", progressbar.ETA()]
		pbar = progressbar.ProgressBar(maxval=len(imagePaths),widgets=widgets).start()

		# loop over the images in batches
		for i in np.arange(0, len(imagePaths), bs):
			# extract the batch of images and labels, then initialize the
			# list of actual images that will be passed through the network
			# for feature extraction
			batchPaths = imagePaths[i:i + bs]
			batchLabels = labels[i:i + bs]
			batchImages = []

			# loop over the images and labels in the current batch
			for (j, imagePath) in enumerate(batchPaths):
				# load the input image using the Keras helper utility
				# while ensuring the image is resized to 224x224 pixels
				image = load_img(imagePath, target_size=(224, 224))
				image = img_to_array(image)

				# preprocess the image by (1) expanding the dimensions and
				# (2) subtracting the mean RGB pixel intensity from the
				# ImageNet dataset
				image = np.expand_dims(image, axis=0)
				image = imagenet_utils.preprocess_input(image)

				# add the image to the batch
				batchImages.append(image)

			# pass the images through the network and use the outputs as
			# our actual features
			batchImages = np.vstack(batchImages)
			features = model.predict(batchImages, batch_size=bs)

			# reshape the features so that each image is represented by
			# a flattened feature vector of the `MaxPooling2D` outputs
			features = features.reshape((features.shape[0], 512 * 7 * 7))

			# add the features and labels to our HDF5 dataset
			dataset.add(features, batchLabels)
			pbar.update(i)

		# close the dataset
		dataset.close()
		pbar.finish()
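
Two bookkeeping steps in the function above are worth seeing on their own: deriving class labels from the directory structure, and slicing the paths into batches. The sketch below uses only the standard library; the file names are hypothetical, and the sorted-order integer encoding is a plain-Python stand-in for scikit-learn's `LabelEncoder`, which also orders its classes by sorted name.

```python
import os

# Hypothetical paths following the <dataset>/<class>/<file> layout.
image_paths = [
    os.path.join("animals", "dogs", "dogs_00001.jpg"),
    os.path.join("animals", "cats", "cats_00001.jpg"),
    os.path.join("animals", "panda", "panda_00001.jpg"),
    os.path.join("animals", "cats", "cats_00002.jpg"),
    os.path.join("animals", "dogs", "dogs_00002.jpg"),
]

# The class label is the name of the directory that contains the image.
labels = [p.split(os.path.sep)[-2] for p in image_paths]

# Map each class name to an integer, using sorted order.
class_names = sorted(set(labels))
encoded = [class_names.index(name) for name in labels]

# Slice the paths into batches; the last batch may be smaller than bs.
bs = 2
batches = [image_paths[i:i + bs] for i in range(0, len(image_paths), bs)]

print(class_names)                # ['cats', 'dogs', 'panda']
print(encoded)                    # [1, 0, 2, 0, 1]
print([len(b) for b in batches])  # [2, 2, 1]
```

This is why the input dataset must follow the one-folder-per-class structure: the folder name is the only source of the label.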

In [67]:
# import the necessary packages
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report
import pickle
import h5py

def train_and_evaluate(features_set):
    db = h5py.File(features_set,mode='r')
    print("Database keys {0:}".format(list(db.keys())))

    # open the HDF5 database for reading then determine the index of
    # the training and testing split, provided that this data was
    # already shuffled *prior* to writing it to disk
    i = int(db["labels"].shape[0] * 0.75)

    # define the set of parameters that we want to tune then start a
    # grid search where we evaluate our model for each value of C
    print("[INFO] tuning hyperparameters...")
    params = {"C": [0.1, 1.0, 10.0]}
    model = GridSearchCV(LogisticRegression(solver="lbfgs",
                                            multi_class="auto"),
                        params, 
                        cv=3, 
                        n_jobs=-1)

    model.fit(db["features"][:i], db["labels"][:i])
    print("[INFO] best hyperparameters: {}".format(model.best_params_))

    # evaluate the model
    print("[INFO] evaluating...")
    preds = model.predict(db["features"][i:])

    print(classification_report(db["labels"][i:], 
                                preds,
                                target_names=[str(i,'utf-8') for i in db["label_names"]])
    )
    # serialize the model to disk
    print("[INFO] saving model...")
    f = open(features_set.split("/")[0] + ".cpickle", "wb")
    f.write(pickle.dumps(model.best_estimator_))
    f.close()

    # close the database
    db.close()
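
The training and testing split above relies only on array slicing, which works because the image paths were shuffled before the features were written. As a small sketch of the index arithmetic (the helper name `split_index` is hypothetical, and the sample counts are the row counts of the three feature files in this section):

```python
def split_index(num_samples, train_fraction=0.75):
    """Index separating the training rows [0, i) from the testing rows [i, n)."""
    return int(num_samples * train_fraction)


for n in (3000, 8677, 1360):
    i = split_index(n)
    print(f"{n} samples -> {i} train, {n - i} test")
```

For 3,000 samples this leaves 750 rows for testing, which matches the total support in the Animals classification report.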

### 1.3.1 Animals dataset


The first dataset we are going to extract features from using VGG16 is our “Animals” dataset. This dataset consists of 3,000 images of three classes: dogs, cats, and pandas. Notice that the `.shape` of the extracted features is (3000, 25088) – this result implies that each of the 3,000 images in our Animals dataset is quantified via a feature vector of length 25,088 (i.e., the values inside **VGG16** after the final POOL operation).

In [60]:
# INPUTS
# path to input dataset
dataset = "animals"

# path to output HDF5 file
output  = "animals/hdf5/features.hdf5"

# size of feature extraction buffer
buffer_size = 1000

# store the batch size in a convenience variable
bs = 32

In [None]:
feature_extraction(dataset,output,buffer_size,bs)

In [61]:
db = h5py.File(output,mode='r')
list(db.keys())

['features', 'label_names', 'labels']

In [62]:
db["features"].shape

(3000, 25088)

In [63]:
db["labels"].shape

(3000,)

In [64]:
db["label_names"].shape

(3,)

In [65]:
[str(i,'utf-8') for i in db["label_names"]]

['cats', 'dogs', 'panda']

In [68]:
train_and_evaluate(output)

Database keys ['features', 'label_names', 'labels']
[INFO] tuning hyperparameters...
[INFO] best hyperparameters: {'C': 0.1}
[INFO] evaluating...
              precision    recall  f1-score   support

        cats       0.97      1.00      0.98       264
        dogs       0.99      0.96      0.98       250
       panda       1.00      1.00      1.00       236

    accuracy                           0.99       750
   macro avg       0.99      0.99      0.99       750
weighted avg       0.99      0.99      0.99       750

[INFO] saving model...


### 1.3.2 Caltech-101 dataset

Just as we extracted features from the Animals dataset, we can do the same with CALTECH-101.

In [69]:
# INPUTS
# path to input dataset
dataset = "caltech-101"

# path to output HDF5 file
output  = "caltech-101/hdf5/features.hdf5"

# size of feature extraction buffer
buffer_size = 1000

# store the batch size in a convenience variable
bs = 32

In [21]:
feature_extraction(dataset,output,buffer_size,bs)

[INFO] loading images...
[INFO] loading network...


Extracting Features: 100% |####################################| Time:  0:00:38


In [70]:
db = h5py.File(output,mode='r')
list(db.keys())

['features', 'label_names', 'labels']

In [71]:
db["features"].shape

(8677, 25088)

In [72]:
db["labels"].shape

(8677,)

In [73]:
db["label_names"].shape

(101,)

In [74]:
# 15min - 30min or more
train_and_evaluate(output)

Database keys ['features', 'label_names', 'labels']
[INFO] tuning hyperparameters...


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression


[INFO] best hyperparameters: {'C': 1.0}
[INFO] evaluating...
                 precision    recall  f1-score   support

          Faces       0.89      0.93      0.91        95
     Faces_easy       0.94      0.96      0.95       108
       Leopards       0.94      1.00      0.97        44
     Motorbikes       1.00      1.00      1.00       210
      accordion       1.00      1.00      1.00        16
      airplanes       1.00      1.00      1.00       200
         anchor       0.75      0.80      0.77        15
            ant       0.86      0.80      0.83        15
         barrel       1.00      0.91      0.95        11
           bass       0.92      0.86      0.89        14
         beaver       1.00      0.87      0.93        15
      binocular       1.00      1.00      1.00         7
         bonsai       1.00      0.97      0.98        33
          brain       0.88      0.94      0.91        16
   brontosaurus       0.88      0.82      0.85        17
         buddha       0.90

### 1.3.3 Flowers-17 dataset

Just as we extracted features from the Animals and CALTECH-101 datasets, we can do the same with Flowers-17.

In [76]:
# INPUTS
# path to input dataset
dataset = "flowers17"

# path to output HDF5 file
output  = "flowers17/hdf5/features.hdf5"

# size of feature extraction buffer
buffer_size = 1000

# store the batch size in a convenience variable
bs = 32

In [77]:
feature_extraction(dataset,output,buffer_size,bs)

[INFO] loading images...
[INFO] loading network...


Extracting Features: 100% |####################################| Time:  0:00:11


In [78]:
db = h5py.File(output,mode='r')
list(db.keys())

['features', 'label_names', 'labels']

In [79]:
db["features"].shape

(1360, 25088)

In [80]:
db["labels"].shape

(1360,)

In [81]:
db["label_names"].shape

(17,)

In [82]:
# 2min
train_and_evaluate(output)

Database keys ['features', 'label_names', 'labels']
[INFO] tuning hyperparameters...
[INFO] best hyperparameters: {'C': 0.1}
[INFO] evaluating...
              precision    recall  f1-score   support

    bluebell       0.95      1.00      0.97        19
   buttercup       1.00      0.87      0.93        23
   coltsfoot       1.00      0.96      0.98        23
     cowslip       0.67      0.80      0.73        25
      crocus       1.00      0.90      0.95        20
    daffodil       0.78      0.95      0.86        19
       daisy       0.94      1.00      0.97        15
   dandelion       1.00      0.89      0.94        19
  fritillary       0.91      1.00      0.95        20
        iris       1.00      0.86      0.92        21
  lilyvalley       0.86      0.95      0.90        20
       pansy       0.88      1.00      0.93        14
    snowdrop       0.90      0.95      0.93        20
   sunflower       1.00      1.00      1.00        18
   tigerlily       1.00      0.94      0.97

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression


# 2.0 Fine-tuning networks

In the previous section we learned how to treat a pre-trained **Convolutional Neural Network** as a **feature extractor**. 

> Using this feature extractor, we forward propagated our dataset of images through the network, extracted the activations at a given layer, and saved the values to disk. A standard machine
learning classifier (in this case, Logistic Regression) was then trained on top of the CNN features.

This CNN feature extractor approach, a form of **transfer learning**, obtained remarkable accuracy, far higher than any of our previous experiments on the Animals, CALTECH-101, or Flowers-17 datasets.

But there is another type of transfer learning, one that can actually outperform the feature extraction method if you have sufficient data. This method is called **fine-tuning** and **requires us to perform “network surgery”**. 

1. First, we take a **scalpel and cut off the final set of fully-connected layers** (i.e., the “head” of the network) from a pre-trained Convolutional Neural Network, such as
VGG, ResNet, or Inception. 
2. We then **replace the head** with a new set of fully-connected layers with random initializations. From there, all layers below the head are frozen so their weights cannot be
updated (i.e., the backward pass in backpropagation does not reach them).
3. We then train the network **using a very small learning rate** so the new set of FC layers can start to learn patterns from the previously learned CONV layers earlier in the network. 
4. Optionally, we may unfreeze the rest of the network and continue training.

Applying fine-tuning allows us to apply pre-trained networks to recognize classes that they were not originally trained on; furthermore, **this method can lead to higher accuracy than feature extraction**.
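
The freeze-then-unfreeze schedule in the steps above can be sketched with a toy example. This is plain NumPy rather than Keras, and a tiny two-layer regression network rather than a real CNN: the names `W_body`, `W_head`, and `train`, the data, and the learning rates are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 64 samples, 8 input features, 1 regression target.
X = rng.normal(size=(64, 8))
y = rng.normal(size=(64, 1))

# "Body" weights stand in for pre-trained CONV layers; "head" weights are new.
W_body = rng.normal(size=(8, 16))
W_head = rng.normal(size=(16, 1)) * 0.1


def train(W_body, W_head, lr, steps, freeze_body):
    """Gradient descent on mean squared error; returns updated copies."""
    W_body, W_head = W_body.copy(), W_head.copy()
    for _ in range(steps):
        z = X @ W_body
        h = np.maximum(z, 0.0)                   # ReLU activations
        err = (h @ W_head - y) * (2.0 / len(X))  # gradient of MSE w.r.t. output
        grad_head = h.T @ err
        grad_body = X.T @ ((err @ W_head.T) * (z > 0))
        W_head -= lr * grad_head
        if not freeze_body:                      # frozen layers receive no update
            W_body -= lr * grad_body
    return W_body, W_head


# Phase 1: warm up the head while the body is frozen.
body_1, head_1 = train(W_body, W_head, lr=1e-2, steps=50, freeze_body=True)

# Phase 2 (optional): unfreeze the body and continue with a smaller learning rate.
body_2, head_2 = train(body_1, head_1, lr=1e-3, steps=50, freeze_body=False)

print(np.array_equal(body_1, W_body))  # True: the frozen body did not change
```

A real framework would skip computing gradients for frozen layers altogether; here they are computed and simply not applied, which keeps the sketch short.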

## 2.1 Transfer Learning and Fine-tuning

**Fine-tuning is a type of transfer learning**. We apply fine-tuning to deep learning models that have already been trained on a given dataset. Typically, these networks are state-of-the-art architectures
such as VGG, ResNet, and Inception that have been trained on the ImageNet dataset.

As we found out in the previous section on feature extraction, these networks contain rich, discriminative filters that can be used on datasets and class labels outside the ones they have already been trained on. However, instead of simply applying feature extraction, we are going to perform network surgery and modify the actual architecture so we can re-train parts of the network.


If this sounds like something out of a bad horror movie, don’t worry: there won’t be any blood and gore, but we will have some fun and learn a lot with our experiments. To understand how fine-tuning
works, consider the figure below (left), where we have the layers of the VGG16 network. As we know, the final set of layers (i.e., the “head”) are our fully-connected layers along with our softmax classifier. When performing fine-tuning, we actually remove the head from the network, just as in feature extraction (middle). However, unlike feature extraction, when we perform fine-tuning we actually **build a new fully-connected head and place it on top of the original architecture
(right)**.


<center><img width="600" src="https://drive.google.com/uc?export=view&id=1qTj4KeosAyDUcffqTQ_BepiINEUXs-cE"></center><center><b>Left</b>:  The original VGG16 network architecture. <b>Middle</b>: Removing the FC layers from VGG16 and treating the final POOL layer as a feature extractor. <b>Right</b>: Removing the original FC layers and replacing them with a brand new FC head. These new FC layers can then be fine-tuned to the specific dataset (the old FC layers are no longer used).</center>


In most cases your new FC head will have fewer parameters than the original one; however, that really depends on your particular dataset. The new FC head is randomly initialized (just like any other layer in a new network) and connected to the body of the original network, and we are ready to train.

However, there is a problem – our CONV layers have already learned rich, discriminative filters while our FC layers are brand new and totally random. If we allow the gradient to backpropagate from these random values all the way through the body of our network, we risk destroying these powerful features. To circumvent this, we instead let our FC head “warm up” by (ironically) “freezing” all layers in the body of the network (the surgery analogy works well here), as
in the figure below (left).



<center><img width="600" src="https://drive.google.com/uc?export=view&id=11Zh6mGG3qMISsnCg6JLgL-sH7TnxpUSC"></center><center><b>Left</b>: When we start the fine-tuning process we freeze all CONV layers in the network and only allow the gradient to backpropagate through the FC layers. Doing this allows our network to “warm up”. <b>Right</b>: After the FC layers have had a chance to warm up we may choose to unfreeze all layers in the network and allow each of them to be fine-tuned as well.</center>


Training data is forward propagated through the network as we normally would; however, the backpropagation is stopped after the FC layers, which allows these layers to start to learn patterns from the highly discriminative CONV layers. In some cases, we may never unfreeze the body of the network as our new FC head may obtain sufficient accuracy. 

However, for some datasets it is often advantageous to allow the original CONV layers to be modified during the fine-tuning process as
well (Figure above, right).

After the FC head has started to learn patterns in our dataset, we pause training, unfreeze the body, and then continue training, but with a very **small learning rate** – we do not want our
CONV filters to deviate dramatically from their pre-trained values. 

Training is then allowed to continue until sufficient accuracy is obtained. Fine-tuning is a very powerful method to obtain image classifiers from pre-trained CNNs on custom datasets, even more powerful than feature extraction in most cases. **The downside is that
fine-tuning can require a bit more work and your choice of FC head parameters plays a big part
in network accuracy** – you can’t rely strictly on regularization techniques here, as your network has already been pre-trained and you can’t deviate from the regularization already being performed by
the network.

Secondly, for small datasets, it can be challenging to get your network to start “learning” from a “cold” FC start, which is why we freeze the body of the network first. Even still, getting past the warm-up stage can be a bit of a challenge and might require you to use optimizers other than SGD. **While fine-tuning does require a bit more effort, if it is done correctly, you’ll nearly always enjoy higher accuracy**.

## 2.2 Indexes and Layers

Prior to performing **network surgery**, we need to know the **layer name and index** of every layer in a given deep learning model. We need this information as we’ll be required to **“freeze”** and **“unfreeze”** certain layers in a pre-trained CNN.

Without knowing the layer names and indexes ahead of time, we would be “cutting blindly”, like an out-of-control surgeon with no game plan. **If we instead take a few minutes to examine the network architecture and implementation, we can better prepare for our surgery.**

In [1]:
# import the necessary packages
from tensorflow.keras.applications import VGG16

# whether or not to include the FC head (top) of the CNN
include_top = False

# load the VGG16 network
print("[INFO] loading network...")
model = VGG16(weights="imagenet", include_top=include_top)
print("[INFO] showing layers...")

# loop over the layers in the network and display them to the
# console
for (i, layer) in enumerate(model.layers):
	print("[INFO] {}\t{}".format(i, layer.__class__.__name__))

[INFO] loading network...
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg16/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5
[INFO] showing layers...
[INFO] 0	InputLayer
[INFO] 1	Conv2D
[INFO] 2	Conv2D
[INFO] 3	MaxPooling2D
[INFO] 4	Conv2D
[INFO] 5	Conv2D
[INFO] 6	MaxPooling2D
[INFO] 7	Conv2D
[INFO] 8	Conv2D
[INFO] 9	Conv2D
[INFO] 10	MaxPooling2D
[INFO] 11	Conv2D
[INFO] 12	Conv2D
[INFO] 13	Conv2D
[INFO] 14	MaxPooling2D
[INFO] 15	Conv2D
[INFO] 16	Conv2D
[INFO] 17	Conv2D
[INFO] 18	MaxPooling2D
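
With the listing above in hand, a freeze plan can be worked out before touching the model. The sketch below is plain Python that re-creates the printed layer types as a list; the choice of treating the last CONV block as the candidate to unfreeze is a hypothetical plan for illustration, not a rule.

```python
# Layer types of VGG16 without its top, in index order, as printed above.
layer_types = (
    ["InputLayer"]
    + ["Conv2D", "Conv2D", "MaxPooling2D"] * 2
    + ["Conv2D", "Conv2D", "Conv2D", "MaxPooling2D"] * 3
)

conv_indexes = [i for i, t in enumerate(layer_types) if t == "Conv2D"]
pool_indexes = [i for i, t in enumerate(layer_types) if t == "MaxPooling2D"]

# Hypothetical plan: keep everything up to the fourth POOL layer frozen and
# mark the last CONV block (plus its POOL layer) as a candidate to unfreeze.
first_unfrozen = pool_indexes[-2] + 1
frozen = list(range(first_unfrozen))
unfrozen = list(range(first_unfrozen, len(layer_types)))

print(len(layer_types), len(conv_indexes))  # 19 13
print(pool_indexes)                         # [3, 6, 10, 14, 18]
print(unfrozen)                             # [15, 16, 17, 18]
```

Knowing that index 15 starts the final CONV block makes it possible to freeze or unfreeze by slicing the layer list rather than by guessing.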


Before we can replace the head of a pre-trained CNN, we need something to replace it with – therefore, we need to define our own fully-connected head of the network.

In [10]:
# import the necessary packages
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dense

# a simple fully-connected head to place on top of a base model
class FCHeadNet:
	@staticmethod
	def build(baseModel, classes, D):
		# initialize the head model that will be placed on top of
		# the base, then add a FC layer
		headModel = baseModel.output
		headModel = Flatten(name="flatten")(headModel)
		headModel = Dense(D, activation="relu")(headModel)
		headModel = Dropout(0.5)(headModel)

		# add a softmax layer
		headModel = Dense(classes, activation="softmax")(headModel)

		# return the model
		return headModel

Again, this fully-connected head is very simple compared to the original head of VGG16, which consists of two FC layers of 4,096 nodes each. However, for most fine-tuning problems you are not seeking to replicate the original head of the network, but rather to simplify it so it is easier to fine-tune: the fewer parameters in the head, the more likely we are to correctly tune the network to a new classification task.
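
To make the size difference concrete, the parameter counts of the two heads can be worked out by hand. This is a rough sketch, assuming a 224x224x3 input (so the VGG16 body outputs a 7x7x512 volume), `D=256`, and 17 output classes as in the Flowers-17 dataset; `dense_params` is a hypothetical helper, not a Keras function.

```python
# size of the flattened VGG16 body output for a 224x224x3 input
FLATTENED = 7 * 7 * 512

def dense_params(n_in, n_out):
    # one weight per input/output pair plus one bias per output node
    return n_in * n_out + n_out

# new head: Flatten -> Dense(D) -> Dropout -> Dense(classes)
# ASSUMPTION: D=256 and 17 classes (Flowers-17)
D, classes = 256, 17
new_head = dense_params(FLATTENED, D) + dense_params(D, classes)

# original VGG16 head: FC-4096 -> FC-4096 -> 1,000-way softmax
orig_head = (dense_params(FLATTENED, 4096)
             + dense_params(4096, 4096)
             + dense_params(4096, 1000))

print("new head:      {:,} parameters".format(new_head))
print("original head: {:,} parameters".format(orig_head))
```

Under these assumptions the new head has roughly 6.4 million parameters against roughly 123.6 million in the original head. Dropout and Flatten contribute no trainable parameters.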

In [11]:
# import the necessary packages
from tensorflow.keras.preprocessing.image import img_to_array

class ImageToArrayPreprocessor:
	def __init__(self, dataFormat=None):
		# store the image data format
		self.dataFormat = dataFormat

	def preprocess(self, image):
		# apply the Keras utility function that correctly rearranges
		# the dimensions of the image
		return img_to_array(image, data_format=self.dataFormat)

In [12]:
# import the necessary packages
import imutils
import cv2

# useful class to help the resize of images
class AspectAwarePreprocessor:
	def __init__(self, width, height, inter=cv2.INTER_AREA):
		# store the target image width, height, and interpolation
		# method used when resizing
		self.width = width
		self.height = height
		self.inter = inter

	def preprocess(self, image):
		# grab the dimensions of the image and then initialize
		# the deltas to use when cropping
		(h, w) = image.shape[:2]
		dW = 0
		dH = 0

		# if the width is smaller than the height, then resize
		# along the width (i.e., the smaller dimension) and then
		# update the deltas to crop the height to the desired
		# dimension
		if w < h:
			image = imutils.resize(image, width=self.width,
				inter=self.inter)
			dH = int((image.shape[0] - self.height) / 2.0)

		# otherwise, the height is smaller than the width so
		# resize along the height and then update the deltas
		# crop along the width
		else:
			image = imutils.resize(image, height=self.height,
				inter=self.inter)
			dW = int((image.shape[1] - self.width) / 2.0)

		# now that our images have been resized, we need to
		# re-grab the width and height, followed by performing
		# the crop
		(h, w) = image.shape[:2]
		image = image[dH:h - dH, dW:w - dW]

		# finally, resize the image to the provided spatial
		# dimensions to ensure our output image is always a fixed
		# size
		return cv2.resize(image, (self.width, self.height),
			interpolation=self.inter)
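
The size arithmetic in `preprocess` can be traced without loading an image. The sketch below mirrors only the dimension calculations, assuming `imutils.resize` truncates the scaled dimension with `int()` and that the target is square; `aspect_aware_dims` and the 500x300 example image are hypothetical.

```python
def aspect_aware_dims(h, w, target_w, target_h):
    """Mirror the size arithmetic of AspectAwarePreprocessor.preprocess.

    Returns (resized, cropped) as (height, width) tuples, i.e. the
    sizes before the final cv2.resize to (target_w, target_h).
    """
    dW = dH = 0
    if w < h:
        # resize along the width, keeping the aspect ratio
        r = target_w / float(w)
        new_h, new_w = int(h * r), target_w
        dH = int((new_h - target_h) / 2.0)
    else:
        # resize along the height, keeping the aspect ratio
        r = target_h / float(h)
        new_h, new_w = target_h, int(w * r)
        dW = int((new_w - target_w) / 2.0)

    # center crop: the same delta is removed from both sides
    cropped = (new_h - 2 * dH, new_w - 2 * dW)
    return (new_h, new_w), cropped

# hypothetical portrait image, 500 pixels tall and 300 pixels wide
resized, cropped = aspect_aware_dims(500, 300, 224, 224)
print("resized:", resized)
print("cropped:", cropped)
```

For this example the crop comes out one pixel too tall (225 rather than 224) because the delta is truncated to an integer, which is why the class ends with a final `cv2.resize` to the exact target size.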

In [13]:
# import the necessary packages
import numpy as np
import cv2
import os

# helper to load images
class SimpleDatasetLoader:
	def __init__(self, preprocessors=None):
		# store the image preprocessor
		self.preprocessors = preprocessors

		# if the preprocessors are None, initialize them as an
		# empty list
		if self.preprocessors is None:
			self.preprocessors = []

	def load(self, imagePaths, verbose=-1):
		# initialize the list of features and labels
		data = []
		labels = []

		# loop over the input images
		for (i, imagePath) in enumerate(imagePaths):
			# load the image and extract the class label assuming
			# that our path has the following format:
			# /path/to/dataset/{class}/{image}.jpg
			image = cv2.imread(imagePath)
			label = imagePath.split(os.path.sep)[-2]

			# check to see if our preprocessors are not None
			if self.preprocessors is not None:
				# loop over the preprocessors and apply each to
				# the image
				for p in self.preprocessors:
					image = p.preprocess(image)

			# treat our processed image as a "feature vector"
			# by updating the data list followed by the labels
			data.append(image)
			labels.append(label)

			# show an update every `verbose` images
			if verbose > 0 and i > 0 and (i + 1) % verbose == 0:
				print("[INFO] processed {}/{}".format(i + 1,
					len(imagePaths)))

		# return a tuple of the data and labels
		return (np.array(data), np.array(labels))
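
The loader derives each class label from the directory that contains the image. A short illustration of that step, using a hypothetical file name under the `/path/to/dataset/{class}/{image}.jpg` layout:

```python
import os

# hypothetical path following the {dataset}/{class}/{image}.jpg layout
imagePath = os.path.join("flowers17", "daisy", "image_0001.jpg")

# the second-to-last path component is the class directory
label = imagePath.split(os.path.sep)[-2]
print(label)
```

Because the label comes from the directory name, images must be sorted into one folder per class before loading.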

In some cases you’ll want to allow the entire body to be trainable; however, for deeper architectures with many parameters such as VGG, I suggest only unfreezing the top CONV layers and then continuing training. If classification accuracy continues to improve (without overfitting), you may want to consider unfreezing more layers in the body.
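
The two training phases can be pictured as a schedule of `trainable` flags. This is a sketch only: `StubLayer` is a hypothetical stand-in for a Keras layer so the example runs without TensorFlow, and the index 15 assumes the 19-layer VGG16 body listed earlier.

```python
class StubLayer:
    """Stand-in for a Keras layer; only the `trainable` flag matters."""
    def __init__(self, name):
        self.name = name
        self.trainable = True

# 19 stub layers, matching the VGG16 body listing (indices 0-18)
body = [StubLayer("layer_{}".format(i)) for i in range(19)]

# phase 1: freeze the whole body so only the new head is updated
for layer in body:
    layer.trainable = False
phase1 = [i for i, layer in enumerate(body) if layer.trainable]

# phase 2: unfreeze the final CONV block (index 15 onward)
for layer in body[15:]:
    layer.trainable = True
phase2 = [i for i, layer in enumerate(body) if layer.trainable]

print("phase 1 trainable indices:", phase1)
print("phase 2 trainable indices:", phase2)
```

Unfreezing more of the body amounts to moving the slice start to an earlier block boundary. With real Keras layers the model also has to be recompiled after the flags change.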

In [18]:
# import the necessary packages
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from imutils import paths
import numpy as np
import os

# "path to input dataset"
dataset = "flowers17"

# output model
model_out = "flowers17.model"


# construct the image generator for data augmentation
aug = ImageDataGenerator(rotation_range=30, width_shift_range=0.1,
                         height_shift_range=0.1, shear_range=0.2, zoom_range=0.2,
                         horizontal_flip=True, fill_mode="nearest")

# grab the list of images that we'll be describing, then extract
# the class label names from the image paths
print("[INFO] loading images...")
imagePaths = list(paths.list_images(dataset))
classNames = [pt.split(os.path.sep)[-2] for pt in imagePaths]
classNames = [str(x) for x in np.unique(classNames)]

# initialize the image preprocessors
aap = AspectAwarePreprocessor(224, 224)
iap = ImageToArrayPreprocessor()

# load the dataset from disk then scale the raw pixel intensities to
# the range [0, 1]
sdl = SimpleDatasetLoader(preprocessors=[aap, iap])
(data, labels) = sdl.load(imagePaths, verbose=500)
data = data.astype("float") / 255.0

# partition the data into training and testing splits using 75% of
# the data for training and the remaining 25% for testing
(train_x, test_x, train_y, test_y) = train_test_split(data, labels,
                                                    test_size=0.25, 
                                                    random_state=42)

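# convert the labels from strings to one-hot vectors, fitting the
# binarizer on the training labels only and reusing it for the test
# labels so both splits share the same class-to-column mapping
lb = LabelBinarizer()
train_y = lb.fit_transform(train_y)
test_y = lb.transform(test_y)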

# load the VGG16 network, ensuring the head FC layer sets are left
# off
baseModel = VGG16(weights="imagenet", include_top=False,
                  input_tensor=Input(shape=(224, 224, 3)))

# initialize the new head of the network, a set of FC layers
# followed by a softmax classifier
headModel = FCHeadNet.build(baseModel, len(classNames), 256)

# place the head FC model on top of the base model -- this will
# become the actual model we will train
model = Model(inputs=baseModel.input, outputs=headModel)

# loop over all layers in the base model and freeze them so they
# will *not* be updated during the training process
for layer in baseModel.layers:
	layer.trainable = False

# compile our model (this needs to be done after setting our
# layers to being non-trainable)
print("[INFO] compiling model...")


# RMSprop is frequently used in situations where we need to quickly
# obtain reasonable performance (as is the case when we are trying to
# "warm up" a set of FC layers)
opt = RMSprop(learning_rate=0.001)
model.compile(loss="categorical_crossentropy", optimizer=opt,
              metrics=["accuracy"])

# train the head of the network for a few epochs (all other
# layers are frozen) -- this will allow the new FC layers to
# start to become initialized with actual "learned" values
# versus pure random
print("[INFO] training head...")
model.fit(aug.flow(train_x, train_y, batch_size=32),
                    validation_data=(test_x, test_y), epochs=25,
                    steps_per_epoch=len(train_x) // 32, verbose=1)

# evaluate the network after initialization
print("[INFO] evaluating after initialization...")
predictions = model.predict(test_x, batch_size=32)
print(classification_report(test_y.argmax(axis=1),
                            predictions.argmax(axis=1), target_names=classNames))

# now that the head FC layers have been trained/initialized, let's
# unfreeze the final set of CONV layers and make them trainable
for layer in baseModel.layers[15:]:
	layer.trainable = True

# for the changes to the model to take effect we need to recompile
# the model, this time using SGD with a *very* small learning rate
print("[INFO] re-compiling model...")
opt = SGD(learning_rate=0.001)
model.compile(loss="categorical_crossentropy", optimizer=opt,
	metrics=["accuracy"])

# train the model again, this time fine-tuning *both* the final set
# of CONV layers along with our set of FC layers
print("[INFO] fine-tuning model...")
model.fit(aug.flow(train_x, train_y, batch_size=32),
          validation_data=(test_x, test_y), epochs=100,
          steps_per_epoch=len(train_x) // 32, verbose=1)

# evaluate the network on the fine-tuned model
print("[INFO] evaluating after fine-tuning...")
predictions = model.predict(test_x, batch_size=32)
print(classification_report(test_y.argmax(axis=1),
                            predictions.argmax(axis=1), target_names=classNames))

# save the model to disk
print("[INFO] serializing model...")
model.save(model_out)

[INFO] loading images...
[INFO] processed 500/1360
[INFO] processed 1000/1360
[INFO] compiling model...
[INFO] training head...
Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 4/25
Epoch 5/25
Epoch 6/25
Epoch 7/25
Epoch 8/25
Epoch 9/25
Epoch 10/25
Epoch 11/25
Epoch 12/25
Epoch 13/25
Epoch 14/25
Epoch 15/25
Epoch 16/25
Epoch 17/25
Epoch 18/25
Epoch 19/25
Epoch 20/25
Epoch 21/25
Epoch 22/25
Epoch 23/25
Epoch 24/25
Epoch 25/25
[INFO] evaluating after initialization...
              precision    recall  f1-score   support

    bluebell       0.94      0.79      0.86        19
   buttercup       0.90      0.95      0.92        19
   coltsfoot       0.79      0.94      0.86        16
     cowslip       0.77      0.85      0.81        20
      crocus       0.73      0.89      0.80        18
    daffodil       0.78      0.78      0.78        23
       daisy       1.00      0.95      0.97        20
   dandelion       0.94      0.80      0.86        20
  fritillary       1.00      0.86      0.92        2

Additional accuracy can be obtained by performing more aggressive data augmentation and continually unfreezing more and more CONV blocks in VGG16. While fine-tuning is certainly more work than feature extraction, it also enables us to tune and modify the weights in our CNN to a particular dataset, something that feature extraction does not allow. Thus, when given enough training data, consider applying fine-tuning, as you’ll likely obtain higher classification accuracy than with simple feature extraction alone.