# Feature extraction

>*In this first part of the project, start by extracting a set of high-level features for each image in the data set. To achieve this, you can use e.g. the Inception v3 or MobileNet v2 ConvNets, which extract 2048 and 1280 high-level features respectively.*

>*Suggestion: consider storing the extracted high-level features, e.g. in npz files, for quickly reloading them into each of the following notebooks.*

In [1]:
# Import the packages needed

import tensorflow as tf
import tensorflow_hub as hub
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import os
import numpy as np


To load the images I use the ImageDataGenerator from Keras. Because the TensorFlow Hub image modules expect float32 images with pixel values between zero and one, I rescale the data by 1/255 and set the data type to float32.

In [2]:
# Create image generator

generator = ImageDataGenerator(rescale=1/255, dtype=np.float32)
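
As a rough illustration of what the rescaling does, the sketch below applies the same 1/255 factor to a tiny made-up image with NumPy. The array `raw` is a hypothetical example, not part of the data set, and the multiplication only approximates what the generator does internally.

```python
import numpy as np

# Hypothetical 2x2 RGB image with 8-bit pixel values (not from the data set)
raw = np.array([[[0, 128, 255], [64, 64, 64]],
                [[255, 255, 255], [0, 0, 0]]], dtype=np.uint8)

# Convert to float32 first, then multiply by the rescale factor
scaled = raw.astype(np.float32) * (1 / 255)

print(scaled.dtype, scaled.min(), scaled.max())
```

The values end up as float32 numbers in the zero-to-one range, which is the input range the module expects.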

To load the images I use the flow_from_directory method of the ImageDataGenerator. I set the image size to 224x224, as this is the input size that MobileNetV2 expects. I shuffle the training set but not the validation and test sets.

In [3]:
# Train, validation and test sets

trainset = generator.flow_from_directory(
    os.path.join('swissroads', 'train'), batch_size=32, target_size=(224, 224), shuffle=True)
validset = generator.flow_from_directory(
    os.path.join('swissroads', 'valid'), batch_size=32, target_size=(224, 224), shuffle=False)
testset = generator.flow_from_directory(
    os.path.join('swissroads', 'test'), batch_size=32, target_size=(224, 224), shuffle=False)

Found 281 images belonging to 6 classes.
Found 139 images belonging to 6 classes.
Found 50 images belonging to 6 classes.
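
With a batch size of 32, none of the three sets divides evenly, so the last batch of each set is smaller than the others. The short sketch below works out the number of batches per set from the image counts reported above; the variable names are only illustrative.

```python
import math

batch_size = 32
# Image counts as reported by flow_from_directory above
set_sizes = {'train': 281, 'valid': 139, 'test': 50}

# Number of batches needed to see every image once
batches_per_set = {name: math.ceil(n / batch_size)
                   for name, n in set_sizes.items()}

# Size of the final, partial batch of each set
last_batch_size = {name: n - (batches_per_set[name] - 1) * batch_size
                   for name, n in set_sizes.items()}

print(batches_per_set)   # {'train': 9, 'valid': 5, 'test': 2}
print(last_batch_size)   # {'train': 25, 'valid': 11, 'test': 18}
```

These counts are what the extraction loop has to cover so that every image is processed exactly once.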


Now I download the MobileNetV2 module from TensorFlow Hub and build a graph around it.

In [4]:
# Create graph
img_graph = tf.Graph()

with img_graph.as_default():
    # Download module
    module_url = 'https://tfhub.dev/google/imagenet/mobilenet_v2_100_224/feature_vector/2'

    feature_extractor = hub.Module(module_url)

    # Create input placeholder
    input_imgs = tf.placeholder(dtype=tf.float32, shape=[None, 224, 224, 3])

    # A node with the features
    imgs_features = feature_extractor(input_imgs)

    # Collect initializers
    init_op = tf.group([
        tf.global_variables_initializer(), tf.tables_initializer()
    ])

img_graph.finalize() 

INFO:tensorflow:Saver not created because there are no variables in the graph to restore



Let's extract the features with the graph created above and save them in three npz files (one each for training, validation and testing). I also save the images in these files, as some of the later tasks need them.

In [5]:
sets = [trainset, validset, testset]
file = ['train', 'valid', 'test']

with tf.Session(graph=img_graph) as sess:
    
    # Initialize variables
    sess.run(init_op)
    
    # Go through all the sets (train, validation and test)
    for i in range(3):
        images_list, features_list, labels_list = [], [], []
        # Go through every batch of the set exactly once;
        # len() of the iterator gives the number of batches per epoch
        for _ in range(len(sets[i])):
            imgs, labels = sets[i].next()
            # Extract features
            features = sess.run(imgs_features, feed_dict={input_imgs: imgs})
            # Collect the batch; everything is concatenated once at the end
            images_list.append(imgs)
            features_list.append(features)
            labels_list.append(labels)
        # Save an npz file for each set with images, features, labels and class names
        np.savez(
            file[i] + '.npz',
            images=np.concatenate(images_list, axis=0),
            features=np.concatenate(features_list, axis=0),
            labels=np.argmax(np.concatenate(labels_list, axis=0), axis=1),
            names=list(sets[i].class_indices.keys()))
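
The labels returned by flow_from_directory are one-hot encoded (its default class_mode is 'categorical'), which is why the cell above applies np.argmax before saving. A minimal sketch of that conversion, using a made-up batch of three labels over six classes:

```python
import numpy as np

# Hypothetical one-hot labels for three images and six classes
one_hot = np.array([[0, 0, 1, 0, 0, 0],
                    [1, 0, 0, 0, 0, 0],
                    [0, 0, 0, 0, 0, 1]], dtype=np.float32)

# The position of the 1 in each row is the integer class index
class_ids = np.argmax(one_hot, axis=1)

print(class_ids)  # [2 0 5]
```

Storing integer indices keeps the label array one-dimensional, matching the (n,) label shapes printed by the check further down.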

Let's check that the images, features, labels and class names were correctly saved in the npz files.

In [6]:
for i in np.arange(3):
    # Load the npz file
    with np.load(file[i]+'.npz', allow_pickle=False) as npz_file:
        # Print the shape of the arrays
        print(file[i]+' images:', npz_file['images'].shape)
        print(file[i]+' features:', npz_file['features'].shape) 
        print(file[i]+' labels:', npz_file['labels'].shape)
        print(file[i]+' names:', npz_file['names'].shape)

train images: (281, 224, 224, 3)
train features: (281, 1280)
train labels: (281,)
train names: (6,)
valid images: (139, 224, 224, 3)
valid features: (139, 1280)
valid labels: (139,)
valid names: (6,)
test images: (50, 224, 224, 3)
test features: (50, 1280)
test labels: (50,)
test names: (6,)
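
For the following notebooks, reloading works the same way as the check above. The sketch below shows a full save-and-reload round trip on small stand-in arrays written to a temporary directory; the random features and the class names `class_0` to `class_5` are placeholders, not the real data.

```python
import os
import tempfile

import numpy as np

# Stand-in data (hypothetical): 4 samples with 1280 features each
rng = np.random.default_rng(0)
features = rng.random((4, 1280), dtype=np.float32)
labels = np.array([0, 2, 1, 5])
names = ['class_%d' % k for k in range(6)]  # placeholder class names

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'demo.npz')
    np.savez(path, features=features, labels=labels, names=names)

    # String arrays load fine without pickling
    with np.load(path, allow_pickle=False) as npz_file:
        loaded_features = npz_file['features']
        loaded_labels = npz_file['labels']
        loaded_names = npz_file['names']

print(loaded_features.shape, loaded_labels.shape, loaded_names.shape)
```

Because the list of class names is stored as a NumPy string array rather than a pickled object, allow_pickle=False is sufficient when loading.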
