# Bottleneck Feature Extraction

---

An image must be preprocessed before it is supplied to a pre-trained network in Keras. With TensorFlow as the backend, Keras networks take a 4D array as input. This array has shape (`nb_images`, `rows`, `columns`, `channels`), where `nb_images` is the total number of images, and `rows`, `columns`, and `channels` are the number of rows, columns, and channels of each image, respectively.
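The shape convention can be checked with plain NumPy, without loading any image. The sketch below is only an illustration: it uses zero-filled placeholder arrays in place of real photos, and the names `fake_image`, `single`, and `batch` are hypothetical and do not appear elsewhere in this notebook.

```python
import numpy as np

# Placeholder for one decoded RGB image: (rows, columns, channels).
fake_image = np.zeros((224, 224, 3), dtype=np.float32)

# Adding a leading axis turns a single image into a batch of one.
single = np.expand_dims(fake_image, axis=0)
print(single.shape)  # (1, 224, 224, 3)

# Stacking several such tensors along axis 0 gives
# (nb_images, rows, columns, channels).
batch = np.vstack([single, single, single])
print(batch.shape)  # (3, 224, 224, 3)
```

The leading axis is what lets the same network process either one image or many in a single call.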

In [1]:
from os import listdir

directory = 'Flickr8k/Flicker8k_Dataset'
files = [directory + '/' + name for name in listdir(directory)]

print("Number of images: %d" % len(files))

Number of images: 8091


The pre-trained VGG16 model is used here to extract a set of features for each photo. The final classification layer is dropped so that the network outputs the activations of the last fully connected layer. The `path_to_tensor` function defined in the cell below takes a single image path as a string, loads the image at 224 x 224 pixels, and returns a 4D array with shape (1, 224, 224, 3).

In [2]:
import numpy
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input
from keras.applications.vgg16 import VGG16
from keras.models import Model

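# Load VGG16 with its default ImageNet weights.
model = VGG16()
# Drop the final classification layer by taking the output of the
# second-to-last layer (the 4,096-unit fully connected layer).
# model.layers.pop() may not modify the model in newer Keras versions.
model = Model(inputs=model.inputs, outputs=model.layers[-2].output)
model.summary()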

def path_to_tensor(img_path):
    img = image.load_img(img_path, target_size=(224, 224))
    x = image.img_to_array(img)
    return numpy.expand_dims(x, axis=0)

Using TensorFlow backend.


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 224, 224, 3)       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0         
__________

Each image is represented by a 4,096-element feature vector, which `model.predict` returns with shape (1, 4096). The cell below builds a dictionary that maps each image identifier to its features and saves it to disk.

In [3]:
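from os.path import basename, splitext
from pickle import dump
from tqdm import tqdm_notebook

features = dict()

for file in tqdm_notebook(files):
    # Apply VGG16-specific preprocessing to the (1, 224, 224, 3) tensor.
    tensor = preprocess_input(path_to_tensor(file))
    feature = model.predict(tensor, verbose=0)
    # Use the file name without directory or extension as the image identifier.
    img_id = splitext(basename(file))[0]
    features[img_id] = feature

print("Number of features extracted: %d" % len(features))
# Close the file handle once the dictionary has been written.
with open('features.pkl', 'wb') as f:
    dump(features, f)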


Number of features extracted: 8091
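The saved dictionary can later be read back with `pickle.load`. The following is a minimal, self-contained sketch of that round trip rather than a cell from this notebook: it writes a placeholder dictionary with one zero-filled entry to a temporary file, so `fake_features`, `"example_id"`, and `features_demo.pkl` are all hypothetical stand-ins for the real `features` mapping and `features.pkl`.

```python
import os
import pickle
import tempfile

import numpy as np

# Placeholder standing in for the real mapping of image identifier -> features.
# Each value keeps the batch axis returned by model.predict: shape (1, 4096).
fake_features = {"example_id": np.zeros((1, 4096), dtype=np.float32)}

# Write the dictionary to a temporary file (hypothetical file name).
path = os.path.join(tempfile.mkdtemp(), "features_demo.pkl")
with open(path, "wb") as fh:
    pickle.dump(fake_features, fh)

# Read it back and inspect the stored shapes.
with open(path, "rb") as fh:
    loaded = pickle.load(fh)

for img_id, feature in loaded.items():
    print(img_id, feature.shape)
```

With the real file, replacing `path` with `'features.pkl'` and skipping the write step should be enough, assuming the file was produced by the cell above.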
