 # TP3. Tuning models 
 
 #### Université Jean-Monnet, 2019-2020

## Part 2. Keras with data from directories, small dataset

Previously, our data were provided as NumPy arrays that were already normalized (all images had the same size). Now we move to a dataset of high-resolution images (around 300 × 300 pixels) organized in folders.
- Download the bird dataset at: http://perso.ens-lyon.fr/tien-nam.le/data/ML/birds.zip

This is an excerpt of the CUB-200 dataset (http://www.vision.caltech.edu/visipedia/CUB-200.html), which contains 200 bird species. Our subset contains 10 species, each with around 50 training images and 10 test images.

<img src = "http://www.vision.caltech.edu/visipedia/collage.jpg">

We face 3 problems here:
1. How to label the data?
2. How to feed images and their labels to the neural net?
3. How to normalize the size of the images (to feed to the input of the neural net)?

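All these problems can be solved with `ImageDataGenerator`. Its `flow_from_directory` method runs through the whole `birds/train` directory, loads the images, and labels each image from 0 to 9 according to the subfolder that contains it (subfolders are numbered in alphanumeric order). Thus, the train and test folders must contain the same subfolders, so that a given label means the same species in both sets. Its `target_size` argument resizes every image to the same size, which solves the third problem.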
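The labeling rule can be illustrated without Keras. The sketch below is a simplified re-implementation for illustration only, not Keras's own code: `class_indices_from_dir`, `labeled_files`, `IMAGE_EXTS` and the species folder names are hypothetical. It assumes, as Keras does when the `classes` argument is left unset, that subfolders are sorted alphanumerically before being numbered. With the real generator, the resulting mapping is available as `train_generator.class_indices`.

```python
import os
import tempfile

# Image extensions accepted by the Keras directory iterator (assumed list; check your version)
IMAGE_EXTS = (".png", ".jpg", ".jpeg", ".bmp", ".ppm", ".tif", ".tiff")

def class_indices_from_dir(directory):
    """Map each subfolder name to an integer label, numbering the
    subfolders in sorted (alphanumeric) order."""
    classes = sorted(
        name for name in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, name))
    )
    return {name: idx for idx, name in enumerate(classes)}

def labeled_files(directory):
    """Return (relative_path, label) pairs for every image file,
    labeled by the subfolder that contains it."""
    indices = class_indices_from_dir(directory)
    pairs = []
    for name, idx in indices.items():
        for fname in sorted(os.listdir(os.path.join(directory, name))):
            if fname.lower().endswith(IMAGE_EXTS):  # non-image files are skipped
                pairs.append((os.path.join(name, fname), idx))
    return pairs

# Build a tiny fake tree: root/{train,test}/<species>/<files>
with tempfile.TemporaryDirectory() as root:
    for split in ("train", "test"):
        for species in ("Cardinal", "Blue_Jay", "Albatross"):  # hypothetical folder names
            folder = os.path.join(root, split, species)
            os.makedirs(folder)
            open(os.path.join(folder, "img_001.jpg"), "wb").close()
            open(os.path.join(folder, "notes.txt"), "w").close()  # ignored: not an image
    train_indices = class_indices_from_dir(os.path.join(root, "train"))
    test_indices = class_indices_from_dir(os.path.join(root, "test"))
    train_pairs = labeled_files(os.path.join(root, "train"))

print(train_indices)  # {'Albatross': 0, 'Blue_Jay': 1, 'Cardinal': 2}
# Train and test share the same subfolders, so label k means the same species in both
print(train_indices == test_indices)  # True
print(train_pairs)
```

If the test folder were missing a species, or had an extra one, the sorted numbering would shift and the same integer label would point to different species in the two sets.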

**Problem 1. Use the `flow_from_directory` method to train a NN on the dataset**

In [43]:
import os
import numpy as np
from keras.preprocessing.image import ImageDataGenerator
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Activation, BatchNormalization
from keras.layers import Conv1D, GlobalMaxPooling1D, Conv2D, MaxPooling2D, MaxPooling1D, MaxPool2D
from keras import optimizers
import matplotlib.pyplot as plt

train_datagen = ImageDataGenerator(rescale=1./255, shear_range=0.2, zoom_range=0.2, horizontal_flip=True, validation_split=0.2)  # rescaling + augmentation for training images
val_datagen = ImageDataGenerator(rescale=1./255, validation_split=0.2)  # same validation_split selects the same held-out files, but without augmentation
test_datagen = ImageDataGenerator(rescale=1./255)

train_path = (r"F:\MLDM\3rd Semester\Deep Learning and Applications\Session 3\TP\birds\train")
test_path = (r"F:\MLDM\3rd Semester\Deep Learning and Applications\Session 3\TP\birds\test")
train_generator = train_datagen.flow_from_directory(train_path, target_size=(32,32), batch_size=32, class_mode='categorical', subset='training')

validation_generator = val_datagen.flow_from_directory(
    train_path, # same directory as training data
    target_size=(32,32),
    batch_size=32,
    class_mode='categorical',
    subset='validation') # set as validation data

test_generator = test_datagen.flow_from_directory(test_path, target_size=(32,32), batch_size=32, class_mode='categorical')



model = Sequential()
model.add(Conv2D(64, (3,3), input_shape=(32,32,3), activation='relu'))
model.add(Conv2D(128, (3,3), activation='relu'))
model.add(MaxPool2D(pool_size=(2,2)))
model.add(Dropout(0.3))
model.add(Conv2D(64, (3,3), activation='relu'))
model.add(Conv2D(128, (3,3), activation='relu'))
model.add(MaxPool2D(2, 2))
model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))

model.compile(loss="categorical_crossentropy",
              optimizer="Adam",
              metrics=['accuracy'])
model.summary()  # summary() prints the table itself and returns None

STEP_SIZE_TRAIN=train_generator.samples //train_generator.batch_size
STEP_SIZE_VALIDATION=validation_generator.samples //validation_generator.batch_size

model.fit_generator(generator=train_generator,
          steps_per_epoch=STEP_SIZE_TRAIN,
          epochs=50,
          verbose=1,
          validation_data = validation_generator,validation_steps = STEP_SIZE_VALIDATION)




Found 387 images belonging to 10 classes.
Found 92 images belonging to 10 classes.
Found 100 images belonging to 10 classes.
Model: "sequential_30"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_71 (Conv2D)           (None, 30, 30, 64)        1792      
_________________________________________________________________
conv2d_72 (Conv2D)           (None, 28, 28, 128)       73856     
_________________________________________________________________
max_pooling2d_34 (MaxPooling (None, 14, 14, 128)       0         
_________________________________________________________________
dropout_52 (Dropout)         (None, 14, 14, 128)       0         
_________________________________________________________________
conv2d_73 (Conv2D)           (None, 12, 12, 64)        73792     
_________________________________________________________________
conv2d_74 (Conv2D)           (None, 10, 10, 128)       73856

Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50


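The 387/92 split reported above comes from `validation_split=0.2`. To our understanding of the `keras_preprocessing` code used by Keras 2.x, each class folder's files are sorted and the first `int(0.2 * n)` files go to `subset='validation'`, the rest to `subset='training'`. Flooring per class explains why the validation subset is slightly under 20% (92 of 479 images, about 19%). Because the split depends only on the sorted file list and `validation_split`, two generators built with the same `validation_split` on the same directory select the same files. The helpers `split_files` and `split_counts` and the class sizes below are hypothetical.

```python
def split_files(sorted_files, validation_split=0.2):
    """Split one class folder's sorted file list: the first
    int(validation_split * n) files are 'validation', the rest 'training'."""
    cut = int(validation_split * len(sorted_files))
    return sorted_files[cut:], sorted_files[:cut]  # (training, validation)

def split_counts(files_per_class, validation_split=0.2):
    """Total (training, validation) sizes when every class is split separately."""
    train_total = val_total = 0
    for n in files_per_class:
        n_val = int(validation_split * n)  # floored per class
        val_total += n_val
        train_total += n - n_val
    return train_total, val_total

files = [f"img_{i:03d}.jpg" for i in range(48)]  # one hypothetical class with 48 images
training, validation = split_files(files)
print(len(training), len(validation))  # 39 9
print(validation[0], validation[-1])   # img_000.jpg img_008.jpg

# Hypothetical class sizes: 9 + 10 + 9 = 28 validation images out of 145
print(split_counts([48, 50, 47]))      # (117, 28)
```

Note that the validation files are the first ones in sorted order, not a random sample.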

In [44]:
STEP_SIZE_TEST=test_generator.samples //test_generator.batch_size

test_loss, test_acc = model.evaluate_generator(test_generator, steps=STEP_SIZE_TEST, verbose=1)
print('Test loss:', test_loss)
print('Test accuracy:', test_acc)

Test loss: 2.682387351989746
Test accuracy: 0.46875
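`samples // batch_size` counts only full batches, so the last partial batch is never drawn: with 100 test images and batches of 32, only 96 images are evaluated. Since `flow_from_directory` shuffles by default (`shuffle=True`), which 4 images are skipped changes from run to run. The sketch below compares floor and ceiling step counts for the three generators in this notebook; `steps_and_coverage` is a hypothetical helper, and it assumes each step consumes exactly one batch and that the generator yields a final partial batch.

```python
import math

def steps_and_coverage(num_samples, batch_size):
    """Return (floor_steps, images_covered, ceil_steps, images_covered)."""
    floor_steps = num_samples // batch_size
    floor_covered = floor_steps * batch_size   # only full batches are drawn
    ceil_steps = math.ceil(num_samples / batch_size)
    ceil_covered = num_samples                 # the partial last batch is included
    return floor_steps, floor_covered, ceil_steps, ceil_covered

print(steps_and_coverage(100, 32))  # (3, 96, 4, 100)   test
print(steps_and_coverage(92, 32))   # (2, 64, 3, 92)    validation
print(steps_and_coverage(387, 32))  # (12, 384, 13, 387) training

# 96 images evaluated in full batches of 32: accuracy 0.46875 = 45 / 96
print(0.46875 * 96)  # 45.0
```

With `math.ceil(test_generator.samples / test_generator.batch_size)` as `steps`, every test image would be evaluated exactly once.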


**Problem 2. Use data augmentation to improve the results**
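Note that `train_datagen` in Problem 1 already sets `shear_range=0.2`, `zoom_range=0.2` and `horizontal_flip=True`, and the cell below reuses the same `train_generator`. To make the flip and zoom options more concrete, here is a simplified NumPy stand-in, not Keras's implementation: Keras samples zoom factors in `[1 - 0.2, 1 + 0.2]` and applies an affine transform (zooming in or out), while this sketch only zooms in by cropping and omits shear. The function names are hypothetical.

```python
import numpy as np

def random_flip(img, rng):
    """Mirror an H x W x C image left-right with probability 0.5,
    the idea behind horizontal_flip=True."""
    if rng.random() < 0.5:
        return img[:, ::-1, :]
    return img

def random_zoom_in(img, rng, max_zoom=0.2):
    """Simplified zoom-in: crop a random window whose sides are (1 - z)
    times the original, with z ~ U(0, max_zoom), then resize it back to
    H x W by nearest-neighbour sampling."""
    h, w = img.shape[:2]
    z = rng.uniform(0.0, max_zoom)
    ch = max(1, int(round(h * (1 - z))))
    cw = max(1, int(round(w * (1 - z))))
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    crop = img[top:top + ch, left:left + cw]
    rows = np.arange(h) * ch // h  # map each output row to a crop row
    cols = np.arange(w) * cw // w  # map each output column to a crop column
    return crop[rows][:, cols]

def augment(img, rng):
    """Rescale to [0, 1] (like rescale=1./255), then randomly flip and zoom."""
    x = img.astype(np.float32) / 255.0
    return random_zoom_in(random_flip(x, rng), rng)

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)  # stand-in for one 32x32 RGB image
batch = np.stack([augment(image, rng) for _ in range(4)])
print(batch.shape)  # (4, 32, 32, 3)
```

Each epoch therefore sees slightly different versions of the same training images, while the labels stay unchanged.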

In [55]:
model = Sequential()
model.add(Conv2D(64, (3,3), input_shape=(32,32,3), activation='relu'))
model.add(BatchNormalization())
model.add(Conv2D(32, (3,3), padding='same'))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.2))
 
model.add(Conv2D(64, (3,3), padding='same'))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(Conv2D(64, (3,3), padding='same'))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.3))
 
model.add(Conv2D(128, (3,3), padding='same'))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(Conv2D(128, (3,3), padding='same'))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.4))
 
model.add(Flatten())
model.add(Dense(10, activation='softmax'))


model.compile(loss="categorical_crossentropy",
              optimizer="Adam",
              metrics=['accuracy'])
model.summary()  # summary() prints the table itself and returns None

STEP_SIZE_TRAIN=train_generator.samples //train_generator.batch_size
STEP_SIZE_VALIDATION=validation_generator.samples //validation_generator.batch_size

model.fit_generator(generator=train_generator,
          steps_per_epoch=STEP_SIZE_TRAIN,
          epochs=50,
          verbose=1,
          validation_data = validation_generator,validation_steps = STEP_SIZE_VALIDATION)





Model: "sequential_40"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_116 (Conv2D)          (None, 30, 30, 64)        1792      
_________________________________________________________________
batch_normalization_22 (Batc (None, 30, 30, 64)        256       
_________________________________________________________________
conv2d_117 (Conv2D)          (None, 30, 30, 32)        18464     
_________________________________________________________________
activation_22 (Activation)   (None, 30, 30, 32)        0         
_________________________________________________________________
batch_normalization_23 (Batc (None, 30, 30, 32)        128       
_________________________________________________________________
max_pooling2d_58 (MaxPooling (None, 15, 15, 32)        0         
_________________________________________________________________
dropout_76 (Dropout)         (None, 15, 15, 32)      

Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50


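The parameter counts and output shapes in the two model summaries can be checked by hand. A `Conv2D` layer has (kernel_h × kernel_w × in_channels + 1) × filters parameters, and `BatchNormalization` stores 4 values per channel (trainable gamma and beta, non-trainable moving mean and variance). The helper names below are hypothetical; the `Dense(512)` figure for Problem 1 is our own calculation for the part of the summary that was cut off above.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """One kernel_h x kernel_w x in_channels kernel per filter, plus one bias per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

def batchnorm_params(channels):
    """gamma, beta, moving mean and moving variance for each channel."""
    return 4 * channels

def dense_params(inputs, units):
    """Weight matrix plus one bias per unit."""
    return (inputs + 1) * units

def output_size(size, window, stride=1, padding="valid"):
    """Spatial output size of a conv or pooling layer along one axis."""
    if padding == "same":
        return -(-size // stride)  # ceil(size / stride)
    return (size - window) // stride + 1

# Problem 1 model ('valid' padding, 32x32x3 input)
p1_conv = [conv2d_params(3, 3, 3, 64), conv2d_params(3, 3, 64, 128),
           conv2d_params(3, 3, 128, 64), conv2d_params(3, 3, 64, 128)]
print(p1_conv)  # [1792, 73856, 73792, 73856]

s = 32
s = output_size(s, 3)            # conv2d_71 -> 30
s = output_size(s, 3)            # conv2d_72 -> 28
s = output_size(s, 2, stride=2)  # max pooling -> 14
s = output_size(s, 3)            # conv2d_73 -> 12
s = output_size(s, 3)            # conv2d_74 -> 10
s = output_size(s, 2, stride=2)  # max pooling -> 5
flat = s * s * 128
print(s, flat, dense_params(flat, 512))  # 5 3200 1638912

# Problem 2 model: first block
print(conv2d_params(3, 3, 64, 32))                 # 18464
print(batchnorm_params(64), batchnorm_params(32))  # 256 128
print(output_size(30, 3, padding="same"))          # 30
```

In the Problem 1 model, the `Dense(512)` layer after `Flatten` holds far more parameters than all four convolutions combined, which is one place where a small dataset of about 400 training images can easily overfit.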

**Problem 3. Use other techniques to avoid overfitting**

**Problem 4. Use pretrained models, objective 84%**