Created on Mon Jan 13 16:00:47 2020
<br>
Group 7
<br>
@authors : G.H.
<h1>Group 7 - Images sociales<span class="tocSkip"></span></h1>

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Introduction" data-toc-modified-id="Introduction-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Introduction</a></span></li><li><span><a href="#Environment" data-toc-modified-id="Environment-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Environment</a></span><ul class="toc-item"><li><span><a href="#Libraries" data-toc-modified-id="Libraries-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Libraries</a></span></li><li><span><a href="#Parameters" data-toc-modified-id="Parameters-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>Parameters</a></span></li><li><span><a href="#Functions" data-toc-modified-id="Functions-2.3"><span class="toc-item-num">2.3&nbsp;&nbsp;</span>Functions</a></span></li></ul></li><li><span><a href="#Train-test-split-and-read-images" data-toc-modified-id="Train-test-split-and-read-images-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Train-test split and read images</a></span></li><li><span><a href="#Build,-save,-and-train-model" data-toc-modified-id="Build,-save,-and-train-model-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Build, save, and train model</a></span></li></ul></div>

# Introduction
This notebook trains a model to predict the aircraft manufacturer (Airbus or Boeing) from images of aircraft interiors. The training images come from Seatguru social media.

**Pre-processing**<span class="tocSkip"></span><br>
Reading and filtering the CSV file that contains the view labels (Int, Ext, Ext-Int, Meal) gives the list of images labelled as Interior. These images are then copied to directories (one per target class) and split into train and test sets. If the data augmentation option is set to `True`, the train set is enriched with new images obtained by cropping, zooming in or out, rotating, or flipping existing images.
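
The split-and-copy step described above lives in a separate functions notebook that is not shown here. The following is only a minimal sketch of the idea, under stated assumptions: the names `split_names` and `copy_to_class_dir`, the 30% test ratio, and the fixed seed are all hypothetical and are not taken from the project's helper.

```python
import os
import random
from shutil import copyfile


def split_names(names, test_ratio=0.3, seed=0):
    """Shuffle image names and split them into (train, test) lists.

    test_ratio and seed are illustrative defaults, not the project's values.
    """
    shuffled = list(names)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def copy_to_class_dir(names, src_dir, dst_root, class_name):
    """Copy images into dst_root/class_name (one sub-directory per class)."""
    target = os.path.join(dst_root, class_name)
    os.makedirs(target, exist_ok=True)
    for name in names:
        copyfile(os.path.join(src_dir, name), os.path.join(target, name))
    return target


# Example with made-up file names
example_names = [f"img_{i}.jpg" for i in range(10)]
train_names, test_names = split_names(example_names)
print(len(train_names), len(test_names))
```

Keeping one sub-directory per class matters because `flow_from_directory` infers the class labels from the sub-directory names.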

**Model**<span class="tocSkip"></span><br>
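We take the weights of the pre-trained VGG16 model, freeze them, and add new layers on top (Flatten, Dense, BatchNormalization, and ReLU, with optional Conv2D blocks) to predict the target classes. In this notebook there are 2 classes, Airbus and Boeing; the same approach would apply to aircraft types (e.g. 3 classes could be A320, A350, and A380).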

**Out**<span class="tocSkip"></span><br>
After training, a folder is created in the `Models` directory, containing the model in `h5` format along with the corresponding labels stored as a pickled dictionary.
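
The helper that writes these files, `save_model_classes`, is defined in the functions notebook and is not shown here. As a rough sketch of how the labels part might work, the code below pickles an index-to-name dictionary built from a mapping shaped like a Keras generator's `class_indices` attribute. The function name `save_class_labels` and the file name `labels.pkl` are hypothetical.

```python
import os
import pickle
import tempfile


def save_class_labels(class_indices, out_dir, filename="labels.pkl"):
    """Pickle an index -> class-name dictionary into out_dir.

    class_indices maps class names to indices, e.g. {"Airbus": 0, "Boeing": 1}.
    The file name is an assumption made for this sketch.
    """
    os.makedirs(out_dir, exist_ok=True)
    index_to_label = {index: name for name, index in class_indices.items()}
    path = os.path.join(out_dir, filename)
    with open(path, "wb") as handle:
        pickle.dump(index_to_label, handle)
    return path


# Round trip in a temporary directory
with tempfile.TemporaryDirectory() as tmp_dir:
    saved_path = save_class_labels({"Airbus": 0, "Boeing": 1}, tmp_dir)
    with open(saved_path, "rb") as handle:
        restored_labels = pickle.load(handle)
print(restored_labels)
```

Storing the mapping by index makes it easy to turn the position of the largest softmax output back into a class name at prediction time.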

# Environment
This notebook requires Python 3.6 or later.
## Libraries

In [10]:
import warnings
warnings.filterwarnings('ignore')
from keras.applications.vgg16 import VGG16, preprocess_input
from keras.callbacks import ReduceLROnPlateau
from keras.preprocessing.image import ImageDataGenerator
from keras.utils import np_utils
from keras.optimizers import SGD, Adam
from keras.layers.core import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D, InputLayer, ReLU, AveragePooling2D
from keras.models import Sequential, Model
from keras import backend as K
import keras
import pandas as pd
import random
import os
from PIL import Image
from shutil import copyfile

%load_ext watermark
%watermark -p keras,tensorflow,PIL

## Parameters

In [11]:
project_path = './../'
seatguru_path = project_path + 'Interpromo2020/All Data/ANALYSE IMAGE/IMG SEATGURU/'
stats_path = project_path + 'ImagesStats/'
new_path_train = project_path + 'G7_SEATGURU/Int/data_train'
new_path_val = project_path + 'G7_SEATGURU/Int/data_test'
new_paths = [new_path_train, new_path_val]
model_name = 'Int_man_F_2'

# Images parameters
size = (224, 224)
greys = False


# Number of classes to predict
nb_types = 2  # Airbus, Boeing

## Functions

In [12]:
%run g7_functions_for_models_V2.ipynb

# Train-test split and read images

In [13]:
# Read SEATGURU annotated CSV
df_seat_annot = pd.read_csv(stats_path + 'g7_SEATGURU_annotate.csv', sep=';')

# Get the names of Airbus and Boeing images with an Interior view
df_airbus = df_seat_annot[df_seat_annot['aircraft_manufacturer'] == 'Airbus']
list_airbus = df_airbus[df_airbus['view'] == 'Int']['name'].tolist()

df_boeing = df_seat_annot[df_seat_annot['aircraft_manufacturer'] == 'Boeing']
list_boeing = df_boeing[df_boeing['view'] == 'Int']['name'].tolist()

In [14]:
# Create train and test sets
split_train_test_seatguru_man(new_paths, seatguru_path, list_airbus, list_boeing)

./../G7_SEATGURU/Int/data_train/Airbus
./../G7_SEATGURU/Int/data_train/Boeing


In [15]:
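# Image data generators: apply ImageNet preprocessing and read images
# from the class sub-directories (same preprocessing for train and test)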
train_datagen = ImageDataGenerator(preprocessing_function=preprocess_input)
train_generator = train_datagen.flow_from_directory(new_paths[0],
                                                    target_size=size,
                                                    color_mode='rgb',
                                                    batch_size=32,
                                                    class_mode='categorical',
                                                    shuffle=True)

test_generator = train_datagen.flow_from_directory(new_paths[1],
                                                   target_size=size,
                                                   color_mode='rgb',
                                                   batch_size=32,
                                                   class_mode='categorical',
                                                   shuffle=True)

Found 1123 images belonging to 2 classes.
Found 482 images belonging to 2 classes.


# Build, save, and train model

In [16]:
# Get the base pre-trained model
# (ImageNet weights expect 3 input channels, so `greys` must stay False here)
base_model = VGG16(weights='imagenet', include_top=False, input_shape=(size[0], size[1], 1 if greys else 3))
x = base_model.output

# Add layers
# Optional extra convolutional blocks (disabled)
# x = Conv2D(256, kernel_size=(3, 3))(x)
# x = keras.layers.BatchNormalization()(x)
# x = ReLU()(x)
#
# x = Conv2D(256, kernel_size=(3, 3))(x)
# x = keras.layers.BatchNormalization()(x)
# x = ReLU()(x)

x = Flatten()(x)  # vector

# Fully-connected layer
x = Dense(1024)(x)
x = keras.layers.BatchNormalization()(x)
x = ReLU()(x)

x = Dense(512)(x)
x = keras.layers.BatchNormalization()(x)
x = ReLU()(x)

# Output layer to predict the class
predictions = Dense(nb_types, activation = 'softmax')(x)

# Model to be trained
model = Model(inputs=base_model.input, outputs=predictions)

# Don't retrain pre-trained layers
for layer in base_model.layers:
    layer.trainable = False

# Compile the model (must be done *after* setting layers to non-trainable)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

model.summary()

Model: "model_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_2 (InputLayer)         (None, 224, 224, 3)       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0   
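
The summary output above is cut off before the added layers. As a rough cross-check, the parameter counts of the new head can be worked out by hand, assuming the standard VGG16 feature map of 7 x 7 x 512 for a 224 x 224 input with `include_top=False`. The helper names below are illustrative and are not part of the notebook.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully-connected layer."""
    return n_in * n_out + n_out


def batchnorm_params(n_features):
    """gamma and beta (trainable) plus moving mean and variance (non-trainable)."""
    return 4 * n_features


flatten_size = 7 * 7 * 512  # assumed VGG16 output for a 224x224 input, flattened
head = {
    "dense_1024": dense_params(flatten_size, 1024),
    "bn_1024": batchnorm_params(1024),
    "dense_512": dense_params(1024, 512),
    "bn_512": batchnorm_params(512),
    "output": dense_params(512, 2),  # nb_types = 2
}
for name, count in head.items():
    print(f"{name}: {count:,}")
print(f"total head parameters: {sum(head.values()):,}")
```

Almost all of the head's parameters sit in the first Dense layer, because it is connected to every value of the flattened feature map.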

In [17]:
# Reduce the learning rate when the monitored metric stops improving
reduce_lr = ReduceLROnPlateau(monitor='val_accuracy',  # metric to watch
                              patience=2,  # epochs without improvement before reducing
                              verbose=1,
                              factor=0.5,  # new_lr = lr * factor
                              min_lr=0.00001)  # lower bound on the learning rate
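
To make the effect of `patience`, `factor`, and `min_lr` concrete, here is a simplified pure-Python sketch of the plateau rule. It is not the Keras implementation (it ignores `min_delta` and `cooldown`), the starting rate of 0.001 is Adam's default, and the validation scores are made up for illustration rather than taken from this notebook's training run.

```python
def simulate_reduce_lr(val_scores, lr=1e-3, factor=0.5, patience=2, min_lr=1e-5):
    """Return the learning rate in effect after each epoch (higher score is better)."""
    best = float("-inf")
    wait = 0
    history = []
    for score in val_scores:
        if score > best:
            # Improvement: remember it and reset the patience counter
            best = score
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                # Plateau: shrink the rate, but never below min_lr
                lr = max(lr * factor, min_lr)
                wait = 0
        history.append(lr)
    return history


# Hypothetical validation accuracies for 10 epochs
scores = [0.60, 0.65, 0.64, 0.65, 0.66, 0.70, 0.69, 0.70, 0.70, 0.69]
for epoch, rate in enumerate(simulate_reduce_lr(scores), start=1):
    print(epoch, rate)
```

With these made-up scores the rate is halved after two consecutive epochs without a new best value, and the patience counter restarts after each reduction.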

In [18]:
# Train model
step_size_train = train_generator.n // train_generator.batch_size
model.fit_generator(generator=train_generator,
                    steps_per_epoch=step_size_train,
                    epochs=10,
                    validation_data=test_generator,
                    callbacks=[reduce_lr])

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10

Epoch 00004: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10

Epoch 00008: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.
Epoch 9/10
Epoch 10/10

Epoch 00010: ReduceLROnPlateau reducing learning rate to 0.0001250000059371814.


<keras.callbacks.callbacks.History at 0x7f66053e2b90>
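
The floor division used for `step_size_train` means the last, smaller batch is not drawn within an epoch. The sketch below compares floor and ceiling division using the image counts reported by the generators above; the helper name `steps_per_epoch` and its `drop_remainder` flag are illustrative only.

```python
import math


def steps_per_epoch(n_samples, batch_size, drop_remainder=True):
    """Number of batches per epoch, with or without the final partial batch."""
    if drop_remainder:
        return n_samples // batch_size
    return math.ceil(n_samples / batch_size)


train_steps_floor = steps_per_epoch(1123, 32)
train_steps_ceil = steps_per_epoch(1123, 32, drop_remainder=False)
unused_per_epoch = 1123 - train_steps_floor * 32
print(train_steps_floor, train_steps_ceil, unused_per_epoch)
```

Either choice is reasonable here, since only a handful of images are left out of each epoch.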

In [19]:
# Save model and labels
os.makedirs(project_path + 'Models/' + model_name + '/', exist_ok=True)
save_model_classes(project_path + 'Models/',
                   model_name, train_generator, model)