# 01. Feature extraction

In [3]:
import tensorflow as tf
import tensorflow_hub as hub
import PIL.Image as Image
import numpy as np
import os

I will be using the Inception v3 module from TensorFlow Hub.

Let's create the graph that implements the module. We create the input placeholder with the shape given in the module documentation (batches of 299x299 RGB images) and add a node that outputs the features. Then we collect the initializers.

In [4]:
img_graph = tf.Graph()

with img_graph.as_default():
    # Download module
    module_url = 'https://tfhub.dev/google/imagenet/inception_v3/feature_vector/1'
    feature_extractor = hub.Module(module_url)

    # Create input placeholder
    input_imgs = tf.placeholder(dtype=tf.float32, shape=[None, 299, 299, 3])

    # A node with the features
    imgs_features = feature_extractor(input_imgs)

    # Collect initializers
    init_op = tf.group([
        tf.global_variables_initializer(), tf.tables_initializer()
    ])

img_graph.finalize()  # make the graph read-only

INFO:tensorflow:Saver not created because there are no variables in the graph to restore



Next we will create the session, run our initializers, and extract the features from each batch of images. Each image should give a feature vector of size 2048. Before that, the images have to be loaded.

Normally there should be 280 images in the train set, but I ended up with one extra file in the bike folder because of checkpoint files that came along with the GitHub repository. I tried to exclude them with a rule in a .gitignore file I created, but that didn't fix it, so I decided it wasn't important enough to pursue.
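The extra file can be tracked down by counting the images per class folder. The following is a minimal sketch, assuming the stray copy sits in a Jupyter `.ipynb_checkpoints` folder inside the class folder (the text only says "checkpoints", so this is a guess). `count_images`, its parameters, and the extension list are hypothetical names made up for this illustration; they are not part of Keras.

```python
import os

# Assumed list of image extensions; adjust to the files actually present
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".bmp")


def count_images(root, skip_dirs=(".ipynb_checkpoints",)):
    """Count image files per class folder under `root`, ignoring `skip_dirs`."""
    counts = {}
    for class_name in sorted(os.listdir(root)):
        class_dir = os.path.join(root, class_name)
        if not os.path.isdir(class_dir) or class_name in skip_dirs:
            continue
        n_images = 0
        for _dirpath, dirnames, filenames in os.walk(class_dir):
            # Prune skipped folders in place so os.walk does not descend into them
            dirnames[:] = [d for d in dirnames if d not in skip_dirs]
            n_images += sum(f.lower().endswith(IMAGE_EXTENSIONS) for f in filenames)
        counts[class_name] = n_images
    return counts

# Example call (path taken from this notebook):
# count_images("swissroads/train")
```

Comparing the counts with and without `skip_dirs=()` would show whether a checkpoint copy explains the difference between 280 and 281.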

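So I create an image generator that rescales the pixel values, and I point it at each data directory with flow_from_directory. I use the required target size of 299x299 and set the batch size to the number of files in the directory, so that each split comes back as a single batch.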

In [5]:
datagen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255)
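To make the effect of `rescale=1./255` concrete, here is a small NumPy sketch of the same arithmetic on a made-up 2x2 image. The pixel values are hypothetical. The assumption behind the rescaling is that the TensorFlow Hub image module expects float inputs in the range [0, 1].

```python
import numpy as np

# Hypothetical 2x2 RGB image with 8-bit pixel values (0..255)
pixels = np.array([[[0, 128, 255], [64, 64, 64]],
                   [[255, 0, 0], [10, 20, 30]]], dtype=np.uint8)

# Same arithmetic as rescale=1./255: every pixel is multiplied by 1/255
scaled = pixels.astype(np.float32) * (1.0 / 255)

# The smallest and largest values now lie between 0 and 1
print(scaled.min(), scaled.max())
```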

In [6]:
train_generator = datagen.flow_from_directory(
    "swissroads/train", target_size=(299, 299), batch_size=281, class_mode="sparse")

Found 281 images belonging to 6 classes.


In [7]:
valid_generator = datagen.flow_from_directory(
    "swissroads/valid", target_size=(299, 299), batch_size=139, class_mode="sparse")

Found 139 images belonging to 6 classes.


In [8]:
test_generator = datagen.flow_from_directory(
    "swissroads/test", target_size=(299, 299), batch_size=50, class_mode="sparse")

Found 50 images belonging to 6 classes.
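With `class_mode="sparse"` the labels are integer class indices rather than one-hot vectors. The sketch below shows one way to map them back to folder names. It assumes a dictionary shaped like the generator's `class_indices` attribute. Only "bike" is a folder name mentioned in this notebook; the other names and the label values are placeholders.

```python
import numpy as np

# Stand-in for train_generator.class_indices; only "bike" comes from the text,
# the other folder names are hypothetical placeholders
class_indices = {"bike": 0, "class_b": 1, "class_c": 2}

# Invert the mapping: integer label -> folder name
index_to_class = {index: name for name, index in class_indices.items()}

# Stand-in for a few sparse labels such as y_train[:4]; the generator may
# return them as floats, hence the int() conversion below
labels = np.array([0.0, 2.0, 1.0, 0.0], dtype=np.float32)
names = [index_to_class[int(label)] for label in labels]
print(names)
```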


Now I can pull the raw image batches and their labels out of the generators. These raw pixels are what I feed to the graph to get the high-level features.

In [9]:
# Each generator holds a single batch covering the whole split,
# so index 0 returns all images and labels; unpack them together
X_train_raw, y_train = train_generator[0]
X_valid_raw, y_valid = valid_generator[0]
X_test_raw, y_test = test_generator[0]

In [10]:
# Create a session
sess = tf.Session(graph=img_graph)

# Initialize it
sess.run(init_op)

# Extract features
X_train = sess.run(imgs_features, feed_dict={input_imgs: X_train_raw})
X_valid = sess.run(imgs_features, feed_dict={input_imgs: X_valid_raw})
X_test = sess.run(imgs_features, feed_dict={input_imgs: X_test_raw})
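Feeding a whole split in one call works for a few hundred images, but it may not fit in memory for larger sets. The sketch below shows one possible alternative: run the extractor on smaller chunks and stack the results. `extract_in_batches` and `fake_extractor` are hypothetical helpers for this illustration. The stand-in extractor only averages pixels so that the example runs without TensorFlow.

```python
import numpy as np


def extract_in_batches(extract_fn, images, batch_size=32):
    """Apply `extract_fn` to `images` in chunks and stack the results."""
    chunks = []
    for start in range(0, len(images), batch_size):
        chunks.append(extract_fn(images[start:start + batch_size]))
    return np.concatenate(chunks, axis=0)


# Hypothetical stand-in for the real extractor, which in this notebook would be
#   lambda batch: sess.run(imgs_features, feed_dict={input_imgs: batch})
def fake_extractor(batch):
    return batch.reshape(len(batch), -1).mean(axis=1, keepdims=True)


# Random stand-in images: 10 images of 8x8 pixels with 3 channels
images = np.random.rand(10, 8, 8, 3).astype(np.float32)
features = extract_in_batches(fake_extractor, images, batch_size=4)
print(features.shape)
```

The chunked result has one row per image, in the same order as the input, so the labels stay aligned.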

In [11]:
print("X train shape : {} and y train shape : {}".format(X_train.shape,y_train.shape))
print("X valid shape : {} and y valid shape : {}".format(X_valid.shape,y_valid.shape))
print("X test shape : {} and y test shape : {}".format(X_test.shape,y_test.shape))
print("Raw pixels from train shape: {} and y train shape : {}".format(X_train_raw.shape,y_train.shape))
print("Raw pixels from valid shape: {} and y valid shape : {}".format(X_valid_raw.shape,y_valid.shape))
print("Raw pixels from test shape: {} and y test shape : {}".format(X_test_raw.shape,y_test.shape))


X train shape : (281, 2048) and y train shape : (281,)
X valid shape : (139, 2048) and y valid shape : (139,)
X test shape : (50, 2048) and y test shape : (50,)
Raw pixels from train shape: (281, 299, 299, 3) and y train shape : (281,)
Raw pixels from valid shape: (139, 299, 299, 3) and y valid shape : (139,)
Raw pixels from test shape: (50, 299, 299, 3) and y test shape : (50,)


We see that we got the shapes we need, with the right number of images and the expected 2048 high-level features per image. We can now save these values in npz files for later use. We keep the raw pixels as well so that we can plot the images.

In [12]:
np.savez("train_data.npz", data=X_train, labels=y_train)
np.savez("valid_data.npz", data=X_valid, labels=y_valid)
np.savez("test_data.npz", data=X_test, labels=y_test)
np.savez("raw_train.npz", data=X_train_raw, labels=y_train)
np.savez("raw_valid.npz", data=X_valid_raw, labels=y_valid)
np.savez("raw_test.npz", data=X_test_raw, labels=y_test)
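For later use, the files can be read back with `np.load`, using the same keyword names (`data`, `labels`) that were passed to `np.savez`. The sketch below is a self-contained round trip. It writes random stand-in arrays to a temporary directory instead of touching the real files, so the values and the temporary path are illustrative only.

```python
import os
import tempfile

import numpy as np

# Stand-in arrays with the same layout as X_train / y_train (values are random)
features = np.random.rand(5, 2048).astype(np.float32)
labels = np.array([0, 1, 2, 3, 4], dtype=np.float32)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "train_data.npz")
    np.savez(path, data=features, labels=labels)

    # np.load returns a lazy NpzFile; index it by the keyword names used in savez
    with np.load(path) as npz_file:
        X_loaded = npz_file["data"]
        y_loaded = npz_file["labels"]

# Quick sanity check: one label per feature row
assert X_loaded.shape[0] == y_loaded.shape[0]
print(X_loaded.shape, y_loaded.shape)
```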