Created on Fri Jan 10 14:42:01 2020
<br>
Group 7
<br>
@author: P.S.B.
<h1>Group 7 - Images sociales<span class="tocSkip"></span></h1>

<br>    
<center>Exteriors - aircraft types model</center>

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Introduction" data-toc-modified-id="Introduction-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Introduction</a></span></li><li><span><a href="#Environment" data-toc-modified-id="Environment-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Environment</a></span><ul class="toc-item"><li><span><a href="#Libraries" data-toc-modified-id="Libraries-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Libraries</a></span></li><li><span><a href="#Parameters" data-toc-modified-id="Parameters-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>Parameters</a></span></li><li><span><a href="#Functions" data-toc-modified-id="Functions-2.3"><span class="toc-item-num">2.3&nbsp;&nbsp;</span>Functions</a></span></li></ul></li><li><span><a href="#Train-test-split-and-read-images" data-toc-modified-id="Train-test-split-and-read-images-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Train-test split and read images</a></span></li><li><span><a href="#Build,-save,-and-train-model" data-toc-modified-id="Build,-save,-and-train-model-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Build, save, and train model</a></span></li></ul></div>

# Introduction
This notebook trains a model to predict the aircraft type from images of aircraft exteriors. The training images were scraped from Airliners ("clean" images of different aircraft types seen from the outside).

**Pre-processing**<span class="tocSkip"></span><br>
Images are split into train and test sets. If `apply_data_augmentation` is set to `True`, the train set is enriched with new images obtained by cropping, zooming in or out, rotating, or flipping existing images.

**Model**<span class="tocSkip"></span><br>
We take the weights of the VGG16 model pre-trained on ImageNet and add layers (Flatten, Dense, BatchNormalization, ReLU, and Softmax) to predict the target classes (Airbus and Boeing types).

**Out**<span class="tocSkip"></span><br>
After training, a folder is created in the `Models` directory. It contains the model in `h5` format, along with the corresponding labels stored as a pickled dictionary.
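
As a rough illustration of the label file, the sketch below writes a class dictionary with `pickle` and reads it back. The file name `labels.pkl`, the helper names `save_labels` and `load_labels`, and the dictionary layout (class name to index, as Keras generators expose through `class_indices`) are assumptions made for this example; the project's own `save_model_classes` helper is not shown here and may store the labels differently.

```python
import os
import pickle
import tempfile


def save_labels(class_indices, path):
    """Write a {class_name: index} dictionary to a pickle file (hypothetical helper)."""
    with open(path, "wb") as handle:
        pickle.dump(class_indices, handle)


def load_labels(path):
    """Read the dictionary back and invert it to {index: class_name}."""
    with open(path, "rb") as handle:
        class_indices = pickle.load(handle)
    return {index: name for name, index in class_indices.items()}


# Illustrative mapping only; real indices come from the training generator.
example_indices = {"737": 0, "747": 1, "A320": 2, "A321": 3}

with tempfile.TemporaryDirectory() as tmp_dir:
    labels_path = os.path.join(tmp_dir, "labels.pkl")  # assumed file name
    save_labels(example_indices, labels_path)
    index_to_name = load_labels(labels_path)

print(index_to_name[2])
```

Inverting the dictionary at load time makes it easy to turn the index of the highest predicted probability back into a class name.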

# Environment
This notebook requires Python 3.6 or later.
## Libraries

In [8]:
import warnings
warnings.filterwarnings('ignore')
from keras.applications.vgg16 import VGG16
from keras.callbacks import ReduceLROnPlateau
from keras.applications.vgg16 import preprocess_input  # match the VGG16 base model
from keras.preprocessing.image import ImageDataGenerator
from keras.utils import np_utils
from keras.optimizers import SGD, Adam
from keras.layers.core import Dense, Dropout, Activation, Flatten
from keras.layers import (Conv2D, MaxPooling2D, InputLayer, ReLU, Softmax,
                          AveragePooling2D, BatchNormalization)
from keras.models import Sequential, Model
from keras import backend as K
import keras
import pandas as pd
import random
import os
from PIL import Image
from shutil import copyfile

In [10]:
%load_ext watermark
%watermark -p keras,tensorflow,PIL

keras 2.3.1
tensorflow 1.13.1
PIL 6.2.0


## Parameters

In [7]:
project_path = './../'
Airliners_path = project_path + 'Scraping/Airliners/data'
new_paths = [project_path + 'Split_Data/Airliners/Train',
             project_path + 'Split_Data/Airliners/Test']

model_name = 'Ext_F_2'

# Classes
airbus_planes = ['A320', 'A321', 'A330', 'A350']
boeing_planes = ['737', '747', '757', '777']
nb_types = len(airbus_planes) + len(boeing_planes)

# Images parameters
size = (224, 224)
greys = False
apply_data_augmentation = False

## Functions

In [21]:
%run g7_functions_for_models.py
# %run g7_data_augmentation.ipynb

# Train-test split and read images

In [22]:
sep_train_test_airliners(Airliners_path, new_paths,
                         airbus_types=airbus_planes, boeing_types=boeing_planes)
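
The function `sep_train_test_airliners` is defined in `g7_functions_for_models.py` and is not shown in this notebook. As a minimal sketch of what a per-class split of this kind might look like, the hypothetical helper below shuffles the files of one class folder and copies them into train and test folders. The helper name, the 30% test share, and the fixed seed are assumptions; the image counts reported further down (5600 and 2400) are consistent with a 70/30 split, but the ratio actually used is not stated.

```python
import os
import random
import shutil
import tempfile


def split_class_folder(src_dir, train_dir, test_dir, test_percent=30, seed=0):
    """Copy the files of one class folder into train/test folders (hypothetical helper)."""
    files = sorted(os.listdir(src_dir))
    random.Random(seed).shuffle(files)  # reproducible shuffle
    n_test = len(files) * test_percent // 100
    os.makedirs(train_dir, exist_ok=True)
    os.makedirs(test_dir, exist_ok=True)
    for i, name in enumerate(files):
        target = test_dir if i < n_test else train_dir
        shutil.copyfile(os.path.join(src_dir, name), os.path.join(target, name))
    return len(files) - n_test, n_test


# Demo on ten empty placeholder files standing in for images
with tempfile.TemporaryDirectory() as root:
    src = os.path.join(root, "data", "A320")
    os.makedirs(src)
    for i in range(10):
        open(os.path.join(src, f"img_{i}.jpg"), "w").close()
    n_train, n_test = split_class_folder(
        src,
        os.path.join(root, "Train", "A320"),
        os.path.join(root, "Test", "A320"),
    )

print(n_train, n_test)
```

Keeping one sub-folder per class in both `Train` and `Test` is what lets `flow_from_directory` infer the labels from the folder names.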

In [None]:
# Option: use data augmentation to enrich your train set
if apply_data_augmentation:
    %run g7_data_augmentation.py
    data_augmentation(train_path=new_paths[0], shape=size, save_format='jpeg', nb_win=2, coef_gen=2,
                      greys=greys, rotation_range=20, shear_range=.2, zoom_range=.15, horizontal_flip=True)

In [25]:
# Image data generator
datagen = ImageDataGenerator(preprocessing_function=preprocess_input)
train_generator = datagen.flow_from_directory(new_paths[0],
                                              target_size=size,
                                              color_mode='rgb',
                                              batch_size=32,
                                              class_mode='categorical',
                                              shuffle=True)

test_generator = datagen.flow_from_directory(new_paths[1],
                                             target_size=size,
                                             color_mode='rgb',
                                             batch_size=32,
                                             class_mode='categorical',
                                             shuffle=False)  # keep a fixed order for evaluation

Found 5600 images belonging to 8 classes.
Found 2400 images belonging to 8 classes.
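
The `preprocess_input` functions of the VGG16 and ResNet50 modules in Keras both default to the "caffe" preprocessing mode: channels are reordered from RGB to BGR, then the ImageNet mean of each channel is subtracted, without scaling. The pure-Python sketch below mimics that on a single pixel. The helper name `preprocess_pixel` is made up for this illustration, and the real function works on whole image arrays.

```python
# Per-channel ImageNet means in BGR order, as used by the "caffe" preprocessing mode
IMAGENET_BGR_MEANS = (103.939, 116.779, 123.68)


def preprocess_pixel(rgb):
    """Mimic 'caffe'-mode preprocessing on one (R, G, B) pixel with values in 0..255."""
    r, g, b = rgb
    bgr = (b, g, r)  # swap channel order from RGB to BGR
    return tuple(value - mean for value, mean in zip(bgr, IMAGENET_BGR_MEANS))


pixel = preprocess_pixel((255, 128, 0))
print(pixel)
```

Because the pre-trained VGG16 weights were learned on inputs prepared this way, the same transformation should be applied to new images at prediction time.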


# Build, save, and train model

In [28]:
# Get the base pre-trained model
# Note: the ImageNet weights expect 3 input channels, so `greys` must stay False here
base_model = VGG16(weights='imagenet', include_top=False,
                   input_shape=(size[0], size[1], 1 if greys else 3))
x = base_model.output

# Add layers
x = Flatten()(x)  # vector

x = Dense(1024)(x)
x = keras.layers.BatchNormalization()(x)
x = ReLU()(x)

x = Dense(512)(x)
x = keras.layers.BatchNormalization()(x)
x = ReLU()(x)

# Last layer used to predict our classes
# Dense has the same number of neurons as the number of classes to predict
x = Dense(nb_types)(x)
x = keras.layers.BatchNormalization()(x)
predictions = Softmax()(x)

# Model to be trained
model = Model(inputs=base_model.input, outputs=predictions)

# Don't retrain pre-trained layers
for layer in base_model.layers:
    layer.trainable = False

# Compile the model (should be done after setting layers to non-trainable)
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),
              loss='categorical_crossentropy', metrics=['accuracy'])

model.summary()

Model: "model_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 224, 224, 3)       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0   
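
The summary above is cut off before the added layers. As a worked estimate of their size, the sketch below counts the parameters of the classification head by hand. It assumes the usual VGG16 output of 7 x 7 x 512 for a 224 x 224 input and 8 target classes. The layer labels in the list are just names for this example, not the names Keras would print.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def batchnorm_params(n_features):
    """gamma and beta (trainable) plus moving mean and variance (non-trainable)."""
    return 4 * n_features


flat = 7 * 7 * 512  # VGG16 feature map for a 224x224 input, flattened
nb_types = 8

head = [
    ("dense_1024", dense_params(flat, 1024)),
    ("bn_1024", batchnorm_params(1024)),
    ("dense_512", dense_params(1024, 512)),
    ("bn_512", batchnorm_params(512)),
    ("dense_out", dense_params(512, nb_types)),
    ("bn_out", batchnorm_params(nb_types)),
]
total_head = sum(count for _, count in head)
for name, count in head:
    print(f"{name:>10}: {count:,}")
print(f"{'total':>10}: {total_head:,}")
```

Almost all of the head's parameters sit in the first Dense layer, because it connects the 25,088 flattened features to 1,024 units.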

In [29]:
# Reduce learning rate when a metric has stopped improving
reduce_lr = ReduceLROnPlateau(monitor='val_accuracy',  # metric to watch
                              patience=2,  # epochs without improvement before reducing
                              verbose=1,
                              factor=0.5,  # new_lr = lr * factor
                              min_lr=0.00001)  # lower bound on the learning rate
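
To make the plateau rule concrete, here is a simplified pure-Python simulation using the same `factor`, `patience`, and `min_lr` values. It is only a sketch: the real Keras callback also has `min_delta` and `cooldown` options that are ignored here, and the validation accuracies are made up for illustration.

```python
def simulate_reduce_lr(val_accuracies, lr=0.001, factor=0.5, patience=2, min_lr=0.00001):
    """Return the learning rate in effect after each epoch (simplified plateau rule)."""
    best = float("-inf")
    wait = 0
    history = []
    for acc in val_accuracies:
        if acc > best:
            # Improvement: remember it and reset the counter
            best = acc
            wait = 0
        else:
            wait += 1
            if wait >= patience and lr > min_lr:
                lr = max(lr * factor, min_lr)  # halve, but never go below min_lr
                wait = 0
        history.append(lr)
    return history


# Made-up validation accuracies, for illustration only
fake_val_acc = [0.50, 0.60, 0.59, 0.58, 0.65, 0.64, 0.63]
schedule = simulate_reduce_lr(fake_val_acc)
print(schedule)
```

In this toy run the rate is halved after epochs 4 and 7, each time following two consecutive epochs without a new best validation accuracy.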

In [30]:
# Train model (the batch size of 32 is set in the generators above)
step_size_train = train_generator.n // train_generator.batch_size
model.fit_generator(generator=train_generator,
                    steps_per_epoch=step_size_train,
                    epochs=10,
                    validation_data=test_generator,
                    callbacks=[reduce_lr])

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10

Epoch 00005: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10

Epoch 00009: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.
Epoch 10/10

Epoch 00010: ReduceLROnPlateau reducing learning rate to 0.0001250000059371814.


<keras.callbacks.callbacks.History at 0x7f55da9ee050>
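
As a quick check of the step arithmetic, the snippet below recomputes the number of batches per epoch from the image counts reported by the generators (5600 train, 2400 test) and their batch size of 32. The variable names are chosen for this example only.

```python
import math

n_train, n_test, batch = 5600, 2400, 32  # counts reported by the generators above

steps_floor = n_train // batch            # what the training cell computes
steps_ceil = math.ceil(n_train / batch)   # would also cover a final partial batch
val_steps = math.ceil(n_test / batch)

print(steps_floor, steps_ceil, val_steps)
```

Here both counts divide evenly by 32, so floor and ceiling agree. With other dataset sizes, floor division would leave the last partial batch out of each epoch.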

In [31]:
# Save model and labels
os.makedirs(project_path + 'Models/' + model_name + '/', exist_ok=True)
save_model_classes(project_path + 'Models/',
                   model_name, train_generator, model)