# LazyLearner

LazyLearner implements three common transfer learning techniques: Integrated Feature Extraction, Standalone Feature Extraction, and Fine-Tuning. All three follow the same four-step workflow:

### 1. Create an instance of the searcher
Give as arguments
* a list of strings with the pretrained models you want to test,
* the number of classes in the dataset,
* the shape of the input the model will receive,
* optionally, a custom top model (if None, a simple 1 Dense, 1 Dropout network is used) and
* in ConvBaseSearchFT only, the number of layers that should be fine-tuned (n_trainable).


### 2. Compile the models
Give as arguments
* optimizer,
* loss function and 
* list of metrics.


### 3. Fit the models
Give as arguments
* train set generator object 
* steps per epoch
* number of epochs
* optionally, validation data and validation steps per epoch
* in ConvBaseSearchSFE only, the batch size (in the other classes the batch size of the generator is used).
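
The steps-per-epoch value should cover the whole set once per epoch. The cells further down use `n // batch_size + 1`, which schedules one extra batch whenever the sample count is an exact multiple of the batch size (960 samples with a batch size of 64, for example). A minimal sketch of a ceiling-division alternative follows; `steps_for` is a hypothetical helper name and not part of LazyLearner.

```python
def steps_for(n_samples: int, batch_size: int) -> int:
    """Number of batches needed to see every sample exactly once."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Ceiling division without floats: -(-a // b) == ceil(a / b) for positive b.
    return -(-n_samples // batch_size)


print(steps_for(960, 64))  # 15, whereas 960 // 64 + 1 gives 16
print(steps_for(960, 50))  # 20, same as 960 // 50 + 1
```

With Keras generators that loop indefinitely, the extra step only means that a few samples are seen twice in an epoch, so the difference is usually small.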


### 4. Evaluate the models
Give as argument
* test set generator


In [1]:
# Prepare Dataset

import os
import pandas as pd
from sklearn.model_selection import train_test_split
from keras.preprocessing.image import ImageDataGenerator
from sklearn.utils import shuffle

#https://stackoverflow.com/questions/42654961/creating-pandas-dataframe-from-os
res = []
path = 'C:\\Users\\Michael\\Desktop\\Master\\Deep Learning\\Project\\011_Fotos\\'
#path = 'E:\\Dados\\FLH HOLIDAY RENTALS\\011_Fotos\\'
for root, dirs, files in os.walk(path, topdown=True):
    if len(files) > 0:
        res.extend(list(zip([root]*len(files), files)))

df = pd.DataFrame(res, columns=['Path', 'File_Name'])


df = df[df['File_Name'] != 'Thumbs.db']
#df['ClientId'] = df.Path.apply(lambda x: int(x.split("\\")[-1]))
#df = df[df['ClientId'] < 10000]

df['Full_Path'] = df["Path"] + '\\' + df["File_Name"]
df['Cat'] = df.File_Name.apply(lambda x: x.split(".")[0].split("_")[-1])

classes = ['1','3','4']
df = df[df.Cat.isin(classes)]
df_total = df
numOfSamplesCat = 400

# Keep at most numOfSamplesCat randomly chosen pictures per class.
# pd.concat is used because DataFrame.append was removed in pandas 2.0.
df = pd.concat(
    [shuffle(df_total[df_total['Cat'] == cl]).iloc[:numOfSamplesCat, :] for cl in classes]
)

Using TensorFlow backend.
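
The `Cat` column in the cell above is derived from the file name: everything from the first dot onwards is dropped, and the text after the last underscore is taken as the class label. A small sketch of that parsing step on its own follows. The file names are made up for illustration, and `label_from_filename` is a hypothetical helper.

```python
def label_from_filename(file_name: str) -> str:
    """Mirror the notebook's lambda: text after the last '_' of the part before the first '.'."""
    stem = file_name.split(".")[0]
    return stem.split("_")[-1]


classes = ["1", "3", "4"]
names = ["10234_1.jpg", "10234_kitchen_3.JPG", "10234_7.jpg", "Thumbs.db"]

# Keep only files whose parsed label is one of the wanted classes.
kept = [name for name in names if label_from_filename(name) in classes]
print(kept)  # ['10234_1.jpg', '10234_kitchen_3.JPG']
```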


# Integrated Feature Extraction
The convolutional base of the pretrained model is connected to a custom top model. All layers in the convolutional base are frozen, so only the custom top model is trained.

In [2]:
df_train, df_test = train_test_split(df, test_size=0.2)
batch_size = 64

train_datagen = ImageDataGenerator(rescale = 1./255,
                                   shear_range = 0.2,
                                   zoom_range = 0.2,
                                   horizontal_flip = True)

test_datagen = ImageDataGenerator(rescale = 1./255)

training_set = train_datagen.flow_from_dataframe(dataframe=df_train, directory = None, x_col='Full_Path', y_col='Cat',
                                                 target_size = (64, 64),
                                                 batch_size = batch_size,
                                                 class_mode = 'categorical')

# Use test_datagen (rescaling only) so the evaluation images are not augmented
test_set = test_datagen.flow_from_dataframe(dataframe=df_test, directory = None, x_col='Full_Path', y_col='Cat',
                                                 target_size = (64, 64),
                                                 batch_size=batch_size,
                                                 class_mode = 'categorical')

Found 960 validated image filenames belonging to 3 classes.
Found 240 validated image filenames belonging to 3 classes.


In [3]:
from lazylearner import ConvBaseSearchIFE
classifier = ConvBaseSearchIFE(['vgg16','vgg19'], len(classes), input_shape=(64,64,3), top_model=None)

In [4]:
classifier.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])

In [5]:
classifier.fit_generator(training_set, steps_per_epoch = df_train.shape[0] // batch_size + 1,
                         epochs=1, validation_data = test_set,  validation_steps = df_test.shape[0] // batch_size + 1)

Fitting  vgg16
Epoch 1/1
Score on val set:  0.5375000238418579 

Fitting  vgg19
Epoch 1/1
Score on val set:  0.6208333373069763 



In [6]:
classifier.evaluate_generator(test_set)

{'vgg16': [0.8301326632499695, 0.5833333134651184],
 'vgg19': [0.8300464153289795, 0.6458333134651184]}
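
`evaluate_generator` returns one entry per pretrained model. Assuming each value is `[loss, accuracy]`, which matches the loss and the single metric passed to `compile` but is not stated explicitly here, the best-scoring base can be picked as sketched below. `best_model` is a hypothetical helper, not part of LazyLearner.

```python
def best_model(results: dict, metric_index: int = 1) -> str:
    """Return the model name whose value at metric_index is highest."""
    return max(results, key=lambda name: results[name][metric_index])


# Values copied from the output above; the assumed layout is [loss, accuracy].
results = {
    "vgg16": [0.8301326632499695, 0.5833333134651184],
    "vgg19": [0.8300464153289795, 0.6458333134651184],
}
print(best_model(results))  # vgg19
```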

# Standalone Feature Extraction
The feature maps produced by the convolutional base of the pretrained model are extracted first and then used as the input for the custom top model.

_Faster than integrated feature extraction, but data augmentation techniques cannot be applied to the input data._
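
This trade-off can be illustrated without Keras. In the sketch below, `CountingBase` is a stand-in for a frozen convolutional base that only counts how often it is called; none of these names come from LazyLearner. Extracting the features once means the base runs once per sample instead of once per sample per epoch. It also means the top model always sees features of the same, unaugmented inputs.

```python
class CountingBase:
    """Stand-in for a frozen convolutional base; counts forward passes."""

    def __init__(self):
        self.calls = 0

    def __call__(self, image):
        self.calls += 1
        return [sum(image), max(image)]  # toy "feature map"


images = [[0.1, 0.5, 0.9], [0.2, 0.2, 0.4], [0.7, 0.3, 0.0]]
epochs = 5

# Standalone: run the base once, cache the features, train on the cache.
standalone = CountingBase()
features = [standalone(img) for img in images]
for _ in range(epochs):
    for feat in features:
        pass  # the top model would be updated with `feat` here

# Integrated: the base is part of the model, so it runs in every epoch.
integrated = CountingBase()
for _ in range(epochs):
    for img in images:
        feat = integrated(img)  # the top model would be updated with `feat` here

print(standalone.calls, integrated.calls)  # 3 15
```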

In [7]:
df_train, df_test = train_test_split(df, test_size=0.2)
batch_size = 50

datagen = ImageDataGenerator(rescale = 1./255)

training_set = datagen.flow_from_dataframe(dataframe=df_train, directory = None, x_col='Full_Path', y_col='Cat',
                                                 target_size = (64, 64),
                                                 batch_size = batch_size,
                                                 class_mode = 'categorical')

test_set = datagen.flow_from_dataframe(dataframe=df_test, directory = None, x_col='Full_Path', y_col='Cat',
                                                 target_size = (64, 64),
                                                 batch_size=batch_size,
                                                 class_mode = 'categorical')

Found 960 validated image filenames belonging to 3 classes.
Found 240 validated image filenames belonging to 3 classes.


In [8]:
from lazylearner import ConvBaseSearchSFE
classifier = ConvBaseSearchSFE(['vgg16', 'vgg19'], len(classes), input_shape=(64,64,3))

In [9]:
classifier.compile(optimizer = 'rmsprop', loss = 'categorical_crossentropy', metrics = ['accuracy'])

In [10]:
classifier.fit_generator(training_set, steps_per_epoch = df_train.shape[0] // batch_size + 1,
                         epochs = 1, validation_data = test_set,  validation_steps = df_test.shape[0] // batch_size + 1)

Extracting Features...
Successfully extraced features from  vgg16
Successfully extraced features from  vgg19

Fit top model with feature maps
Fitting  vgg16
Train on 960 samples, validate on 240 samples
Epoch 1/1
Score on val set:  0.5583333373069763
Fitting  vgg19
Train on 960 samples, validate on 240 samples
Epoch 1/1
Score on val set:  0.675000011920929


In [11]:
classifier.evaluate_generator(test_set)



{'vgg16': [0.7838989655176799, 0.6875],
 'vgg19': [0.7915910482406616, 0.675000011920929]}

# Fine-Tuning
The convolutional base of the pretrained model is connected to a custom top model. After an initial training phase, the last n_trainable layers of the convolutional base are trained jointly with the custom top model.
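
Selecting which layers to train comes down to setting a per-layer flag. The sketch below uses a stand-in `Layer` class instead of real Keras layers so that it runs without any dependencies. `set_trainable_tail` is a hypothetical helper, and how LazyLearner handles this internally is not shown in this README.

```python
class Layer:
    """Stand-in for a Keras layer; only the name and the trainable flag matter here."""

    def __init__(self, name, trainable=True):
        self.name = name
        self.trainable = trainable


def set_trainable_tail(layers, n_trainable):
    """Freeze every layer except the last n_trainable ones."""
    first_trainable = max(len(layers) - n_trainable, 0)
    for index, layer in enumerate(layers):
        layer.trainable = index >= first_trainable


base = [Layer(f"block{i}") for i in range(1, 9)]
set_trainable_tail(base, n_trainable=5)
print([layer.name for layer in base if layer.trainable])
# ['block4', 'block5', 'block6', 'block7', 'block8']
```

Computing the start index explicitly avoids the `layers[:-0]` pitfall: that slice is empty, so a loop meant to freeze all layers would freeze none when `n_trainable` is 0.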



In [12]:
df_train, df_test = train_test_split(df, test_size=0.2)
batch_size = 50

train_datagen = ImageDataGenerator(rescale = 1./255,
                                   shear_range = 0.2,
                                   zoom_range = 0.2,
                                   horizontal_flip = True)

test_datagen = ImageDataGenerator(rescale = 1./255)

training_set = train_datagen.flow_from_dataframe(dataframe=df_train, directory = None, x_col='Full_Path', y_col='Cat',
                                                 target_size = (64, 64),
                                                 batch_size = batch_size,
                                                 class_mode = 'categorical')

# Use test_datagen (rescaling only) so the evaluation images are not augmented
test_set = test_datagen.flow_from_dataframe(dataframe=df_test, directory = None, x_col='Full_Path', y_col='Cat',
                                                 target_size = (64, 64),
                                                 batch_size=batch_size,
                                                 class_mode = 'categorical')

Found 960 validated image filenames belonging to 3 classes.
Found 240 validated image filenames belonging to 3 classes.


In [13]:
from lazylearner import ConvBaseSearchFT
classifier = ConvBaseSearchFT(['vgg16', 'vgg19'], len(classes), input_shape=(64,64,3), n_trainable=5)

In [14]:
classifier.compile(optimizer = 'rmsprop', loss = 'categorical_crossentropy', metrics = ['accuracy'])

In [15]:
classifier.fit_generator(training_set, steps_per_epoch = df_train.shape[0] // batch_size + 1,
                         epochs = 1, validation_data = test_set,  validation_steps = df_test.shape[0] // batch_size + 1)

Initial training
Fitting  vgg16
Epoch 1/1
Score on val set:  0.637499988079071 

Fitting  vgg19
Epoch 1/1
Score on val set:  0.5666666626930237 

Fine tuning of last 5 layers
Fitting  vgg16
Epoch 1/1
Score on val set:  0.7208333611488342 

Fitting  vgg19
Epoch 1/1
Score on val set:  0.7166666388511658 



In [16]:
classifier.evaluate_generator(test_set)

{'vgg16': [0.7389445304870605, 0.6833333373069763],
 'vgg19': [0.6807215809822083, 0.6875]}