### __1) Feature extraction__ 
In this first part of the project, start by extracting a set of high-level features for each
image in the data set. To achieve this, you can use, e.g., the Inception v3 or MobileNet v2
ConvNets, which extract 2048 and 1280 high-level features respectively.
These high-level features should then be used for all tasks in this project, unless
stated otherwise. In other words, the PCA exploration and all models (except
for the Convolutional Neural Network) should use these high-level features. Where
we ask you to visualize the images, we of course mean the raw
images with their pixel values.
Suggestion: consider storing the extracted high-level features, e.g. in npz files, so that
they can be reloaded quickly in each of the following notebooks.
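
The storage suggestion can be sketched as follows, using NumPy only. The random array contents, the `features_demo.npz` file name and the temporary directory are placeholders for illustration; in the notebook the features come from the ConvNet.

```python
import os
import tempfile

import numpy as np

# Placeholder data standing in for real extracted features (assumption):
# 4 images with 2048 features each, plus integer class labels.
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 2048)).astype(np.float32)
labels = np.array([0, 1, 1, 2])

# Hypothetical file name inside a temporary directory
path = os.path.join(tempfile.mkdtemp(), "features_demo.npz")

# Save several named arrays into one .npz archive
np.savez(path, features=features, labels=labels)

# Reload the arrays by name, as a later notebook would
with np.load(path, allow_pickle=False) as npz_file:
    loaded_features = npz_file["features"]
    loaded_labels = npz_file["labels"]

print(loaded_features.shape, loaded_labels.shape)
```

Keeping features and labels in the same archive means they are always reloaded together, which avoids mixing arrays from different extraction runs.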

__Note:__ All your models should be trained on the training set, and the fine tuning of your
hyperparameters should be validated on the validation set. The final test set should only
be used for the final comparison to test the accuracies of your models on a new dataset.

However, in the case where you use a cross-validation approach, you can of course
merge the train and validation set into one bigger dataset and use this for model fitting.
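
As a rough sketch of the cross-validation variant, the two sets can be concatenated and split into folds with NumPy alone. The random arrays, the array sizes and the choice of 5 folds are assumptions made for illustration; scikit-learn's `KFold` provides equivalent splitting.

```python
import numpy as np

# Placeholder arrays standing in for stored features and labels (assumption)
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(280, 2048)), rng.integers(0, 6, size=280)
X_valid, y_valid = rng.normal(size=(139, 2048)), rng.integers(0, 6, size=139)

# Merge the train and validation sets into one dataset for cross-validation
X_cv = np.concatenate([X_train, X_valid], axis=0)
y_cv = np.concatenate([y_train, y_valid], axis=0)

# Build the folds by shuffling the indices once and splitting them
n_folds = 5
indices = rng.permutation(len(X_cv))
folds = np.array_split(indices, n_folds)

for k, valid_idx in enumerate(folds):
    # All indices outside the current fold are used for fitting
    train_idx = np.concatenate([f for i, f in enumerate(folds) if i != k])
    print("fold", k, "fit on", len(train_idx), "validate on", len(valid_idx))
```

The test set stays out of this procedure entirely, so the final comparison still runs on unseen data.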
    


#### Get libraries

In [1]:
import itertools
import os

import matplotlib.pylab as plt
import numpy            as np
import tensorflow       as tf
import tensorflow_hub   as hub

try:
    # %tensorflow_version only exists in Colab.
    %tensorflow_version 2.x
except Exception:
    pass

# Import Keras API from tensorflow
from tensorflow.keras        import layers, models
from tensorflow.keras        import Sequential
from tensorflow.keras.layers import Dense

# Test the version of tf
print("TF version:", tf.__version__)
print("Hub version:", hub.__version__)
print("GPU is", "available" if tf.test.is_gpu_available() else "NOT AVAILABLE")


TF version: 2.0.0
Hub version: 0.9.0
GPU is NOT AVAILABLE


#### Configuring inception_v3 module to use
Hint from https://tfhub.dev/google/imagenet/inception_v3/feature_vector/4: 
Inception V3 is a neural network architecture for image classification.
This TF-Hub module uses the TF-Slim implementation of inception_v3. 
The module contains a trained instance of the network, packaged to get feature vectors from images. 
If you want the full model including the classification it was originally trained for, use module google/imagenet/inception_v3/classification/1 instead.



In [2]:
handle_base, pixels = ("inception_v3", 299)  # module selection, input size in pixels
MODULE_HANDLE       = "https://tfhub.dev/google/imagenet/{}/feature_vector/4".format(handle_base)
IMAGE_SIZE          = (pixels, pixels)

print("Using {} with input size {}".format(MODULE_HANDLE, IMAGE_SIZE))

data_dir = 'C:/Users/tgdreju4/OneDrive - Swisscom/EPFL/Notebooks/04ML/swissroads/'

Using https://tfhub.dev/google/imagenet/inception_v3/feature_vector/4 with input size (299, 299)


Training Parameters

In [3]:
do_fine_tuning       = False #@param {type:"boolean"}
do_data_augmentation = True #@param {type:"boolean"}
BATCH_SIZE           = 10 #@param {type:"integer"}
epochs               = 5
#modelname            = handle_base+'_FT_'+str(do_fine_tuning)+'_DA_'+str(do_data_augmentation)+'_BS_'+str(BATCH_SIZE)+'_Ep_'+str(epochs)

#### Model definition and extract high level features

This model will extract 2048 features.

In [4]:
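import os

# Corporate proxy address. It is only defined here and not referenced in the
# cells below, so it has no effect unless it is exported to the environment.
proxy = 'https://clientproxy.corproot.net:8079'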

Define and build the keras model

In [5]:
model = tf.keras.Sequential([hub.KerasLayer(MODULE_HANDLE, trainable=do_fine_tuning),])

model.build((None,)+IMAGE_SIZE+(3,))
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
keras_layer (KerasLayer)     multiple                  21802784  
Total params: 21,802,784
Trainable params: 0
Non-trainable params: 21,802,784
_________________________________________________________________


#### Train ImageDataGenerator

Based on https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html and a discussion with my Swisscom colleague Mario Useche.

 

Data pre-processing and data augmentation
In order to make the most of our few training examples, we will "augment" them via a number of random transformations, so that our model would never see twice the exact same picture. This helps prevent overfitting and helps the model generalize better.

In Keras this can be done via the keras.preprocessing.image.ImageDataGenerator class. This class allows you to:

- configure random transformations and normalization operations to be done on your image data during training
- instantiate generators of augmented image batches (and their labels) via .flow(data, labels) or .flow_from_directory(directory). These generators can then be used with the Keras model methods that accept data generators as inputs: fit_generator, evaluate_generator and predict_generator.
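
To illustrate what "random transformations" means here, the following NumPy-only sketch applies a random horizontal flip and a random shift to one image. It is a simplified stand-in, not the ImageDataGenerator implementation: the `random_augment` helper is hypothetical, the image is random placeholder data, and `np.roll` wraps pixels around instead of using a fill mode.

```python
import numpy as np

def random_augment(image, rng, max_shift=0.2):
    """Apply a random horizontal flip and a random shift to one image (H, W, C)."""
    height, width, _ = image.shape
    # Flip left-right with probability 0.5
    if rng.random() < 0.5:
        image = image[:, ::-1, :]
    # Shift by up to max_shift of the image size in each direction;
    # np.roll wraps pixels around, a simplification of real fill modes
    max_dy, max_dx = int(max_shift * height), int(max_shift * width)
    dy = int(rng.integers(-max_dy, max_dy + 1))
    dx = int(rng.integers(-max_dx, max_dx + 1))
    return np.roll(image, shift=(dy, dx), axis=(0, 1))

rng = np.random.default_rng(42)
# Placeholder 8-bit image (assumption), rescaled to [0, 1] as with rescale=1./255
image = rng.integers(0, 256, size=(299, 299, 3)).astype(np.float32) / 255.0

# Each call produces a randomly transformed variant of the same picture
batch = np.stack([random_augment(image, rng) for _ in range(4)])
print(batch.shape)
```

Both transformations only rearrange pixels, so the class of the picture is unchanged while the model sees a different input each time.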

In [6]:
train_dataflow_kwargs = dict(
                        directory=data_dir+'train',
                        shuffle=False, 
                        target_size=IMAGE_SIZE,
                        batch_size=BATCH_SIZE,
                        interpolation='bilinear')
test_dataflow_kwargs = dict(
                        directory=data_dir+'test',
                        shuffle=False, 
                        target_size=IMAGE_SIZE,
                        batch_size=BATCH_SIZE,
                        interpolation='bilinear')
valid_dataflow_kwargs = dict(
                        directory=data_dir+'valid',
                        shuffle=False, 
                        target_size=IMAGE_SIZE,
                        batch_size=BATCH_SIZE,
                        interpolation='bilinear')



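# Data augmentation is intended for future tasks.
# Note: with do_data_augmentation = True, the training images are randomly
# transformed before feature extraction, so the saved training features
# come from augmented images and vary between runs.
if do_data_augmentation:
    train_datagen_kwargs = dict(
                       rescale=1./255,
                       rotation_range=40,
                       horizontal_flip=True,
                       width_shift_range=0.2, 
                       height_shift_range=0.2,
                       shear_range=0.2,
                       zoom_range=0.2)
else:
    train_datagen_kwargs = dict(rescale=1./255)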
    
valid_datagen_kwargs =  dict(rescale=1./255)
test_datagen_kwargs  =  dict(rescale=1./255)

  



Create the train, validation and test generators

In [7]:
train_datagen   = tf.keras.preprocessing.image.ImageDataGenerator(**train_datagen_kwargs)
train_generator = train_datagen.flow_from_directory(**train_dataflow_kwargs)

valid_datagen   = tf.keras.preprocessing.image.ImageDataGenerator(**valid_datagen_kwargs)
valid_generator = valid_datagen.flow_from_directory(**valid_dataflow_kwargs)

test_datagen   = tf.keras.preprocessing.image.ImageDataGenerator(**test_datagen_kwargs)
test_generator = test_datagen.flow_from_directory(**test_dataflow_kwargs)

Found 280 images belonging to 6 classes.
Found 139 images belonging to 6 classes.
Found 50 images belonging to 6 classes.
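
`flow_from_directory` treats each subdirectory as one class and, when no `classes` argument is given, assigns the label indices in alphanumeric order of the subdirectory names. The sketch below reproduces that mapping with the standard library only; the class names and the temporary directory are hypothetical placeholders.

```python
import os
import tempfile

# Hypothetical class names; the real ones are the subdirectories of the data set
class_names = ["class_b", "class_a", "class_c"]

root = tempfile.mkdtemp()
for name in class_names:
    os.makedirs(os.path.join(root, name))

# One class per subdirectory, indexed in sorted order of the directory names
subdirs = sorted(
    d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
)
class_indices = {name: index for index, name in enumerate(subdirs)}
print(class_indices)
```

In the notebook itself, `train_generator.class_indices` holds the actual mapping, which is useful for turning the integer labels back into class names.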


#### Generating Features for the Train, Test and Validation Datasets
  
For each image generator, extract the features as NumPy arrays and save them in an npz file together with the corresponding category (label) and filename of each image.

In [8]:
# Extract the features for each split and store them with labels and filenames
for name, generator in [('train', train_generator),
                        ('test', test_generator),
                        ('valid', valid_generator)]:
    file = data_dir + name + '.npz'
    np.savez(file,
             features=model.predict(generator),
             labels=generator.labels,
             files=generator.filenames)
    # Reload the file to check the stored arrays and their shapes
    with np.load(file, allow_pickle=False) as npz_file:
        for arr in npz_file.keys():
            print(name + '.npz  ' + arr + ' with shape ' + str(npz_file[arr].shape))

train.npz  features with shape (280, 2048)
train.npz  labels with shape (280,)
train.npz  files with shape (280,)
test.npz  features with shape (50, 2048)
test.npz  labels with shape (50,)
test.npz  files with shape (50,)
valid.npz  features with shape (139, 2048)
valid.npz  labels with shape (139,)
valid.npz  files with shape (139,)
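
Because the generators were created with `shuffle=False`, the rows of `features`, `labels` and `files` refer to the same images in the same order. A possible sanity check on a reloaded file is sketched below. The arrays are small placeholders, and the check assumes that each file name starts with its class directory and that every class has at least one image in the split.

```python
import numpy as np

# Placeholder arrays mimicking the stored 'labels' and 'files' (assumption);
# file names are relative paths whose first component is the class directory
files = np.array(["class_a/img_001.png", "class_a/img_002.png",
                  "class_b/img_003.png", "class_c\\img_004.png"])
labels = np.array([0, 0, 1, 2])

# Recover the class directory from each path, accepting both path separators
class_dirs = [f.replace("\\", "/").split("/")[0] for f in files]

# Map the sorted directory names to indices and compare with the stored labels
names = sorted(set(class_dirs))
expected = np.array([names.index(d) for d in class_dirs])
assert np.array_equal(expected, labels), "labels and file names are misaligned"

# Count the images per class
counts = dict(zip(names, np.bincount(labels).tolist()))
print(counts)
```

Running such a check once after extraction catches ordering mistakes before the features are used in the following notebooks.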
