# 3. Bottleneck Feature Extraction

One of the biggest advantages of transfer learning is that it lets us harness the vast amount of knowledge stored in the layers of the pre-trained models at our disposal. In a world where data is costly to acquire, in both time and money, this characteristic is quite appealing.

If we think about it, these pre-trained models can act as excellent feature extractors if we drop their top (which tends to be a fully-connected network used for classification on ImageNet).

With these features in hand, we can append _any_ machine learning model, including (but not limited to) neural networks, as long as we perform the proper transformations required by the receiving algorithm.

The features that result from this approach are known as __bottleneck features__.

## Feature Extractors

Although any pre-trained model can act as a feature extractor, in [this]() project we determined that the pre-trained model that works best for this problem is VGG16, so it is only logical to reuse it here.

In [1]:
import numpy as np
from glob import glob
import random

In [2]:
from tensorflow.keras.applications.vgg16 import VGG16

base = VGG16(include_top=False)


We'll define three feature extractors:

  - **Global Average Pooling Extractor**: This extractor applies a `GlobalAveragePooling2D` operation to the outputs of the pre-trained model (without the top). As its name indicates, global average pooling calculates the average of all the values in each activation map (also known as a feature map). So, if we have a feature volume of 32 filters of 64x64, we'll get a vector of 32 elements.
  - **Global Max Pooling Extractor**: This extractor applies a `GlobalMaxPooling2D` operation to the outputs of the pre-trained model (without the top). As its name indicates, global max pooling calculates the maximum of all the values in each activation map. So, if we have a feature volume of 32 filters of 64x64, we'll get a vector of 32 elements.
  - **Flatten Extractor**: This is the simplest extractor, as it only reshapes the outputs of the model (without the top) into a very long vector. So, for instance, if we have a feature volume of 32 filters of 64x64, it will produce a vector of 64 * 64 * 32 = 131072 elements.
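
To make the three operations concrete, here is a minimal NumPy sketch that mimics them on a random feature volume. It assumes a channels-last layout of `(batch, height, width, filters)`; the `volume` array is a made-up stand-in, not the output of VGG16.

```python
import numpy as np

# Hypothetical feature volume in channels-last layout: (batch, height, width, filters).
rng = np.random.default_rng(0)
volume = rng.random((2, 64, 64, 32), dtype=np.float32)

# Global average pooling: one mean per activation map.
global_average = volume.mean(axis=(1, 2))

# Global max pooling: one maximum per activation map.
global_max = volume.max(axis=(1, 2))

# Flatten: keep every value, reshaped into one long vector per image.
flattened = volume.reshape(volume.shape[0], -1)

print(global_average.shape)  # (2, 32)
print(global_max.shape)      # (2, 32)
print(flattened.shape)       # (2, 131072)
```

Both pooling variants collapse the spatial axes, so the length of the resulting vector depends only on the number of filters.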

In [3]:
from tensorflow.keras.layers import GlobalAveragePooling2D, GlobalMaxPool2D, Flatten
from tensorflow.keras.models import Model

x = base.output
out = GlobalAveragePooling2D()(x)
        
global_average_feature_extractor = Model(inputs=base.input, outputs=out)

In [4]:
x = base.output
out = GlobalMaxPool2D()(x)
        
global_max_feature_extractor = Model(inputs=base.input, outputs=out)

In [5]:
x = base.output
out = Flatten()(x)
        
flatten_feature_extractor = Model(inputs=base.input, outputs=out)

## Dataset Transforms

With these extractors in place, we can use them to generate transformed versions of the original dataset. Our goal with this project is to use deep learning to generate good features for more classical machine learning algorithms, such as decision trees or SVMs.

The original images will be passed through each feature extractor, and the resulting features and labels will be saved in `.npy` files, NumPy's native binary format.
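
As a quick illustration of the `.npy` round trip, the sketch below saves two small arrays to a temporary directory and loads them back. The arrays and the file names `example_features.npy` and `example_labels.npy` are hypothetical stand-ins, not the files produced later in this notebook.

```python
import tempfile
from pathlib import Path

import numpy as np

# Hypothetical stand-ins for extracted features and their labels.
features = np.arange(12, dtype=np.float32).reshape(3, 4)
labels = np.array([1, 0, 1])

with tempfile.TemporaryDirectory() as tmp_dir:
    features_path = Path(tmp_dir) / 'example_features.npy'
    labels_path = Path(tmp_dir) / 'example_labels.npy'

    # np.save stores the values together with the array's shape and dtype.
    np.save(features_path, features)
    np.save(labels_path, labels)

    # np.load restores the arrays with the same shape and dtype.
    restored_features = np.load(features_path)
    restored_labels = np.load(labels_path)

print(restored_features.shape, restored_features.dtype)
print(np.array_equal(restored_labels, labels))
```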

### Helpers

These two functions will let us load images and convert them into tensors of the size VGG16 expects. The VGG16-specific pre-processing is applied afterwards with `preprocess_input`.

In [6]:
from tensorflow.keras.applications.vgg16 import preprocess_input
from tensorflow.keras.preprocessing import image

def path_to_tensor(image_path):
    img = image.load_img(image_path, target_size=(224, 224))
    x = image.img_to_array(img)
    
    return np.expand_dims(x, axis=0)

def paths_to_tensor(image_paths):
    tensors = [path_to_tensor(image_path) for image_path in image_paths]
    return np.vstack(tensors)
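
For intuition about what `preprocess_input` does to these tensors, here is a rough NumPy sketch of the "caffe"-style pre-processing that Keras documents for VGG16: the channels are reordered from RGB to BGR and each channel is zero-centered with the ImageNet means, without scaling. The function name `vgg16_style_preprocess` is hypothetical and is meant only as an illustration; the notebook itself relies on the library function.

```python
import numpy as np

# Per-channel ImageNet means in BGR order, as used by "caffe"-style pre-processing.
IMAGENET_BGR_MEANS = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def vgg16_style_preprocess(rgb_batch):
    """Illustrative re-implementation; not a replacement for preprocess_input."""
    batch = np.asarray(rgb_batch, dtype=np.float32)
    bgr = batch[..., ::-1]           # Reorder channels: RGB -> BGR.
    return bgr - IMAGENET_BGR_MEANS  # Zero-center each channel, no scaling.

# A uniform gray batch with the same shape produced by paths_to_tensor for one image.
batch = np.full((1, 224, 224, 3), 128.0, dtype=np.float32)
processed = vgg16_style_preprocess(batch)

print(processed.shape)
print(processed[0, 0, 0])
```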

Before transforming all the data, let's make sure everything is working with just a couple of images.

In [7]:
SAMPLE_SIZE = 3

image_paths = glob('dataset/train/*/*')
image_paths = random.sample(image_paths, SAMPLE_SIZE)

# Calculate the image input
image_input = preprocess_input(paths_to_tensor(image_paths))

print(image_input.shape)

(3, 224, 224, 3)


The shape above corresponds to the batch we'll pass to each feature extractor.

In [8]:
global_max_bottlenecked_batch = global_max_feature_extractor.predict(image_input)
print(f'Bottlenecked batch shape (Global Max): {global_max_bottlenecked_batch.shape}')

global_average_bottlenecked_batch = global_average_feature_extractor.predict(image_input)
print(f'Bottlenecked batch shape (Global Average): {global_average_bottlenecked_batch.shape}')

flatten_bottlenecked_batch = flatten_feature_extractor.predict(image_input)
print(f'Bottlenecked batch shape (Flatten): {flatten_bottlenecked_batch.shape}')

Bottlenecked batch shape (Global Max): (3, 512)
Bottlenecked batch shape (Global Average): (3, 512)
Bottlenecked batch shape (Flatten): (3, 25088)


Notice that the Flatten Extractor, as expected, generates the longest representation of the images (7 * 7 * 512 = 25088 values per image), as it does not summarize anything, unlike the other two extractors.
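
The shapes above can be checked with a little arithmetic. The sketch below assumes the standard VGG16 layout, in which the convolutional base contains five 2x2 max-pooling stages with stride 2 and ends with 512 filters; the helper `vgg16_bottleneck_shape` is hypothetical.

```python
def vgg16_bottleneck_shape(height, width, pool_stages=5, filters=512):
    """Spatial size after `pool_stages` 2x2 max-pooling stages with stride 2."""
    for _ in range(pool_stages):
        height //= 2
        width //= 2
    return height, width, filters

h, w, c = vgg16_bottleneck_shape(224, 224)
print((h, w, c))  # (7, 7, 512)
print(h * w * c)  # 25088
```

The pooling extractors keep only the last number (512), while flattening keeps all 7 * 7 * 512 values.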

### Generating the Three Versions of the Data

We are now good to go! Let's generate each transform.

In [9]:
from pathlib import Path

if not Path('features.npy').is_file():
    vehicles_images_path = glob('data/vehicles/*/*.png')
    non_vehicles_images_path = glob('data/non-vehicles/*/*.png')

    vehicles_images = preprocess_input(paths_to_tensor(vehicles_images_path))
    non_vehicles_images = preprocess_input(paths_to_tensor(non_vehicles_images_path))

    features = np.vstack([vehicles_images, non_vehicles_images])

    np.save('features.npy', features)
else:
    features = np.load('features.npy')
    
print(f'Features shape: {features.shape}')

Features shape: (7325, 224, 224, 3)
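
An array of this shape is large. Assuming 32-bit floats, a back-of-the-envelope estimate puts it at roughly 4 GiB, so on machines with less memory it may be preferable to process the images in batches. The sketch below shows the estimate and a hypothetical `iter_batches` helper; neither is part of the original notebook.

```python
import numpy as np

# Memory needed to hold every pre-processed image at once, assuming float32 values.
n_images, height, width, channels = 7325, 224, 224, 3
total_bytes = n_images * height * width * channels * np.dtype(np.float32).itemsize
print(f'{total_bytes / 2**30:.2f} GiB')

def iter_batches(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Example: ten items split into batches of four.
batch_sizes = [len(batch) for batch in iter_batches(list(range(10)), 4)]
print(batch_sizes)  # [4, 4, 2]
```

Each batch of image paths could then go through `paths_to_tensor`, `preprocess_input` and an extractor's `predict`, with the per-batch results stacked using `np.vstack`.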


In [10]:
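if not Path('labels.npy').is_file():
    # Glob the paths again so this cell also works when features.npy was loaded from disk.
    vehicles_images_path = glob('data/vehicles/*/*.png')
    non_vehicles_images_path = glob('data/non-vehicles/*/*.png')

    vehicles_labels = np.array([1] * len(vehicles_images_path))
    non_vehicles_labels = np.array([0] * len(non_vehicles_images_path))
    labels = np.hstack([vehicles_labels, non_vehicles_labels])

    np.save('labels.npy', labels)
else:
    labels = np.load('labels.npy')

# Print in both branches, not only when the labels are loaded from disk.
print(f'Labels shape: {labels.shape}')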

Labels shape: (7325,)


In [11]:
if not Path('global_average_features.npy').is_file():
    global_average_bottlenecked_features = global_average_feature_extractor.predict(features)
    
    np.save('global_average_features.npy', global_average_bottlenecked_features)
else:
    global_average_bottlenecked_features = np.load('global_average_features.npy')

print(f'global_average_features shape: {global_average_bottlenecked_features.shape}')

global_average_features shape: (7325, 512)


In [12]:
if not Path('global_max_features.npy').is_file():
    global_max_bottlenecked_features = global_max_feature_extractor.predict(features)
    np.save('global_max_features.npy', global_max_bottlenecked_features)
else:
    global_max_bottlenecked_features = np.load('global_max_features.npy')

print(f'global_max_bottlenecked_features shape: {global_max_bottlenecked_features.shape}')

global_max_bottlenecked_features shape: (7325, 512)


In [13]:
if not Path('flattened_features.npy').is_file():
    flattened_bottlenecked_features = flatten_feature_extractor.predict(features)
    np.save('flattened_features.npy', flattened_bottlenecked_features)
else:
    flattened_bottlenecked_features = np.load('flattened_features.npy')

print(f'flattened_bottlenecked_features shape: {flattened_bottlenecked_features.shape}')

flattened_bottlenecked_features shape: (7325, 25088)


With these transforms we can now proceed to evaluate different models using each of them.
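
Some classical algorithms, SVMs for instance, tend to work better on standardized inputs, which is one example of the transformations a receiving algorithm may require. The following is a minimal sketch under that assumption, using randomly generated stand-in data rather than the real bottleneck features; `fit_standardizer` and `standardize` are hypothetical helpers.

```python
import numpy as np

def fit_standardizer(train_features, eps=1e-8):
    """Return per-column mean and standard deviation computed on the training rows only."""
    mean = train_features.mean(axis=0)
    std = train_features.std(axis=0) + eps  # eps avoids division by zero for constant columns
    return mean, std

def standardize(features, mean, std):
    """Apply previously fitted statistics to any set of rows."""
    return (features - mean) / std

# Hypothetical stand-in for a bottleneck feature matrix: 100 samples, 8 features.
rng = np.random.default_rng(42)
fake_features = rng.normal(loc=5.0, scale=3.0, size=(100, 8))
train, test = fake_features[:80], fake_features[80:]

# Fit on the training split and reuse the same statistics for the test split.
mean, std = fit_standardizer(train)
train_scaled = standardize(train, mean, std)
test_scaled = standardize(test, mean, std)

print(train_scaled.mean(axis=0).round(6))
print(train_scaled.std(axis=0).round(6))
```

Fitting the statistics on the training rows only keeps information from the test rows out of the transformation.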