**Featurize Data**

*Summary of this notebook:*  
Obtain a low-dimensional feature vector for each image in an input dataset using an ImageNet-pretrained model (MobileNet, here). Load the dataset through a generator object, preprocess the images as the model expects, and run predict on every image to obtain its feature vector. Save the feature vectors and the filenames in separate pickle files.

*Definition of Done:*

In [None]:
import os
import tensorflow
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import math
import pickle

In [None]:
from google.colab import drive 
drive.mount('/content/gdrive')

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


In [None]:
os.chdir("/content/gdrive/Shared drives/2020_FDLUSA_Earth Science_Knowledge Discovery Framework/Code")

In [None]:
tensorflow.test.gpu_device_name()

'/device:GPU:0'

In [None]:
dataset = "UCMerced_LandUse"
dataPath = os.path.join("Datasets", dataset, "Images")
modelName = "MobileNet"


Import Model


In [None]:
# Import the pretrained model and its matching preprocessing function
from tensorflow.keras.applications import MobileNet
from tensorflow.keras.applications.mobilenet import preprocess_input

In [None]:
model = MobileNet(
    input_shape = (224, 224, 3),
    include_top = False,
    weights = 'imagenet',
    pooling = "max"
)

Get Data & Preprocess

In [None]:
dataGenerator = ImageDataGenerator(
    preprocessing_function = preprocess_input
)
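For reference, the MobileNet `preprocess_input` function rescales 8-bit pixel values from [0, 255] to [-1, 1]. The sketch below mimics that mapping in plain Python to illustrate the idea. `scale_pixel` is a hypothetical helper, not part of Keras, and the real function operates on whole arrays rather than single values.

```python
def scale_pixel(value):
    """Map an 8-bit pixel value in [0, 255] to [-1, 1] (illustrative helper, not a Keras API)."""
    return value / 127.5 - 1.0

# Darkest, mid-range, and brightest input values.
pixels = [0, 127.5, 255]
scaled = [scale_pixel(p) for p in pixels]
print(scaled)  # [-1.0, 0.0, 1.0]
```

Passing the function to `ImageDataGenerator` means every batch is scaled this way before it reaches the model, so no separate normalization step is needed.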

In [None]:
batch_size = 32
trainGenerator = dataGenerator.flow_from_directory(
        dataPath,
        target_size=(224, 224),
        batch_size= batch_size,
        class_mode= None, 
        shuffle = False)

Found 2100 images belonging to 21 classes.


Generate Feature Vector from User-defined dataset

In [None]:
nImages = len(trainGenerator.filenames)
nLoops = int(math.ceil(nImages / batch_size))
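As a quick check of the step count: with the 2100 images reported by the generator and a batch size of 32, the ceiling division gives 66 steps, and the last step holds a partial batch. A minimal sketch, with `n_images` hard-coded here as an assumption instead of being read from the generator:

```python
import math

n_images = 2100   # assumed: matches the "Found 2100 images" output above
batch_size = 32

n_steps = math.ceil(n_images / batch_size)           # batches needed to cover every image
last_batch = n_images - (n_steps - 1) * batch_size   # size of the final, partial batch

print(n_steps, last_batch)  # 66 20
```

Rounding up rather than down is what guarantees the final 20 images are featurized as well.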

In [None]:
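# Pass the step count by keyword: the second positional argument of predict() is batch_size, not steps
bottleneckFeaturesTrain = model.predict(trainGenerator, steps = nLoops, verbose = 1)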



In [None]:
print(bottleneckFeaturesTrain.shape)

(2100, 1024)
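The second dimension comes from the `pooling = "max"` setting: with the classification head removed, the network's last feature map is collapsed to one value per channel by taking the maximum over all spatial positions, which here leaves 1024 values per image. A toy sketch of that reduction in plain Python, using a made-up 2×2 map with 3 channels rather than real activations:

```python
# Toy feature map laid out as [row][column][channel]; the values are made up.
feature_map = [
    [[0.1, 0.9, 0.3], [0.4, 0.2, 0.8]],
    [[0.7, 0.5, 0.6], [0.0, 1.0, 0.2]],
]

n_channels = len(feature_map[0][0])

# Global max pooling: for each channel, take the max over all spatial positions.
pooled = [
    max(cell[c] for row in feature_map for cell in row)
    for c in range(n_channels)
]
print(pooled)  # [0.7, 1.0, 0.8]
```

The result has one entry per channel regardless of the spatial size of the map, which is why every image yields a vector of the same length.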


Dump features and filenames into GDrive folder


In [None]:
with open("Features/" + modelName + "_" + dataset + "_features.pkl", mode = 'wb') as featuresFile:
    pickle.dump(bottleneckFeaturesTrain, featuresFile)
with open("Features/" + modelName + "_" + dataset + "_filenames.pkl", mode = 'wb') as filenamesFile:
    pickle.dump(trainGenerator.filenames, filenamesFile)
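To use the saved features later, the two pickle files can be loaded back and paired row by row. Because the generator was created with `shuffle = False`, row i of the feature array should correspond to entry i of the filename list. The sketch below is a self-contained round trip under stated assumptions: it writes small stand-in data to a temporary directory instead of the `Features/` folder, uses plain lists in place of the NumPy array, and the filenames are hypothetical.

```python
import os
import pickle
import tempfile

# Stand-in data (assumed): two 3-dimensional "feature vectors" and their files.
features = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
filenames = ["classA/img0.tif", "classB/img1.tif"]  # hypothetical relative paths

with tempfile.TemporaryDirectory() as tmp_dir:
    features_path = os.path.join(tmp_dir, "features.pkl")
    filenames_path = os.path.join(tmp_dir, "filenames.pkl")

    # Write both files, closing each one as soon as the dump finishes.
    with open(features_path, "wb") as f:
        pickle.dump(features, f)
    with open(filenames_path, "wb") as f:
        pickle.dump(filenames, f)

    # Read them back.
    with open(features_path, "rb") as f:
        loaded_features = pickle.load(f)
    with open(filenames_path, "rb") as f:
        loaded_filenames = pickle.load(f)

# Every feature row must have a matching filename.
assert len(loaded_features) == len(loaded_filenames)

# Assumes "/"-separated paths, where the sub-directory name is the class label.
labels = [name.split("/")[0] for name in loaded_filenames]
feature_by_file = dict(zip(loaded_filenames, loaded_features))
print(labels)  # ['classA', 'classB']
```

Keeping the filenames alongside the features is what allows a later step to recover each image's class or to look up the image behind a given vector.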