## First model

This notebook belongs to the fifth chapter of the accompanying report, which has the same title; the report is located at `reports/srp/document.pdf`.

The first model can be considered a proof of concept.
It is a simple model composed only of dense layers, tuned by hand until it works reasonably well.

The first cell copies the data from `data/raw` into `data/processed`.
The `intermediate` directory is not used here, since no manipulations are applied to the images themselves.

For this purpose, some helper functions for creating the training environment are defined in `src/training_env.py`.

`reset_and_populate` clears the `processed` folder and then distributes the data from `raw`. In this case, 500 images of each class are taken from `raw`: 400 images of each class (`n`, `o` and `x`) are put into training, none into validation (since no validation takes place in this notebook), and 100 images of each class into testing.
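The actual implementation of `reset_and_populate` lives in `src/training_env.py` and is not shown here. As an illustration only, the following sketch shows how a per-class `[train, validation, test]` split could be distributed. It assumes that the files of each class are simply taken in sorted order; the function name `populate_sketch` is hypothetical and this is not the project's code.

```python
import os
import shutil


def populate_sketch(raw_dir, processed_dir, counts):
    """Hypothetical sketch, NOT the code from src/training_env.py.

    counts = [n_train, n_validation, n_test], applied to every class.
    Returns the list of created split/class directories.
    """
    splits = ["train", "validation", "test"]
    # Start from an empty processed directory.
    if os.path.isdir(processed_dir):
        shutil.rmtree(processed_dir)
    created = []
    for cls in sorted(os.listdir(raw_dir)):
        # Assumption: files are taken in sorted order, without shuffling.
        files = sorted(os.listdir(os.path.join(raw_dir, cls)))
        start = 0
        for split, n in zip(splits, counts):
            target = os.path.join(processed_dir, split, cls)
            os.makedirs(target)
            # Copy the next n files of this class into the current split.
            for name in files[start:start + n]:
                shutil.copy(os.path.join(raw_dir, cls, name), target)
            start += n
            created.append(target)
    return created
```

Called as `populate_sketch(raw, processed, [400, 0, 100])`, it would copy 400 + 0 + 100 = 500 files per class, matching the split described above.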

Note that `raw` stores each class in its own directory, matching the layout that `ImageDataGenerator` expects for the processed data; e.g. all images for the bases are in `data/raw/x/`.

In [1]:
from os.path import join

raw = join('data', 'raw')
processed = join('data', 'processed')

from src.training_env import reset_and_populate

# Per class: 400 images for training, 0 for validation, 100 for testing.
reset_and_populate(raw, processed, [400, 0, 100])

['data\\processed\\train\\n',
 'data\\processed\\validation\\n',
 'data\\processed\\test\\n',
 'data\\processed\\train\\o',
 'data\\processed\\validation\\o',
 'data\\processed\\test\\o',
 'data\\processed\\train\\x',
 'data\\processed\\validation\\x',
 'data\\processed\\test\\x']

After loading the data, generators are defined that can be passed to the model's `fit` function later on.

Two generators are created: one for the training data, which is used for training and may be iterated over multiple times, and one for the test data, which is used for nothing but the final model evaluation.

The `create_generator` function takes the name of the data subdirectory (`train` or `test`) and the batch size; the target image size is fixed at 32 x 32 pixels.
The images are downscaled to reduce the amount of information fed into the model to a small but still reasonable size. The batch size determines how many images the model processes to compute the loss before the weights and biases are updated.
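To make these parameters concrete, here is a small stdlib-only sketch (no Keras involved). It assumes RGB images in channels-last order, so each batch has shape `(batch_size, 32, 32, 3)`, and mimics what `rescale=1./255` does to 8-bit pixel values. The helper names are hypothetical.

```python
import math


def batch_shape(batch_size, target_size=(32, 32), channels=3):
    # Channels-last shape of one image batch: (batch, height, width, channels).
    return (batch_size, *target_size, channels)


def batches_per_epoch(n_images, batch_size):
    # Number of batches needed to go through every image once.
    return math.ceil(n_images / batch_size)


def rescale(pixel, factor=1.0 / 255):
    # Mimics rescale=1./255: each 8-bit value is multiplied by the factor.
    return pixel * factor


print(batch_shape(20))              # (20, 32, 32, 3)
print(batches_per_epoch(1200, 20))  # 60
print(batches_per_epoch(300, 10))   # 30
print(rescale(0), rescale(128))     # 0.0 and roughly 0.502
```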

The output confirms that the classes are detected as described earlier: 1200 images (400 x 3 classes) are found in `train` and 300 images (100 x 3 classes) in `test`.

In [2]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

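def create_generator(data_dir, batch_size):
    # Scale 8-bit pixel values from [0, 255] to [0, 1].
    datagen = ImageDataGenerator(rescale=1./255)
    full_path = join(processed, data_dir)
    # Each subdirectory of full_path is treated as one class.
    return datagen.flow_from_directory(
        full_path,
        target_size=(32, 32),  # downscale every image to 32 x 32 pixels
        batch_size=batch_size,
        # Integer class indices, as expected by sparse_categorical_crossentropy.
        class_mode='sparse')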

train_generator = create_generator('train', 20)
test_generator = create_generator('test', 10)

Found 1200 images belonging to 3 classes.
Found 300 images belonging to 3 classes.


After the data and the respective generators are prepared, the model is created.

To begin with, the model is a simple feed-forward neural network consisting only of dense layers.
The input shape is dictated by the data: 32 x 32 pixels with 3 color channels.
If the data does not match the input layer, training (and evaluation) will fail.
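Because dense layers expect vectors, the `Flatten` layer first turns each 32 x 32 x 3 image into a vector of 32 · 32 · 3 = 3072 values, which is where the `(None, 3072)` in the model summary comes from. The following pure-Python sketch (the function name is hypothetical) shows this row-major flattening on nested lists.

```python
def flatten_image(image):
    # Row-major flattening: rows, then columns, then channels,
    # which is the order used for channels-last input.
    return [value for row in image for pixel in row for value in pixel]


# A dummy 32 x 32 RGB image filled with zeros.
image = [[[0.0, 0.0, 0.0] for _ in range(32)] for _ in range(32)]
print(len(flatten_image(image)))  # 3072
```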

The model has three dense layers; the two hidden layers have 32 units each and use the `relu` activation.
These values are more or less lucky guesses at this point, chosen to create a somewhat feasible model that shows the process actually works.

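The `softmax` activation in the last layer normalizes the three output values so that they are non-negative and sum to 1, which allows them to be read as class probabilities.
This is necessary to compare them with the labels, which conceptually assign 1 to the correct class and 0 to the others (here the labels are passed as integer class indices, which the loss function below accepts).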
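A minimal pure-Python sketch of the softmax function, for illustration (the input values are arbitrary):

```python
import math


def softmax(logits):
    # Subtracting the maximum improves numerical stability
    # and does not change the result.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
print(round(sum(probs), 6))          # 1.0
```

The largest raw output keeps the largest probability, so the predicted class is unchanged; only the scale becomes comparable to a label of 0s and 1s.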

In [3]:
from tensorflow.keras import layers
from tensorflow.keras import models

model = models.Sequential()
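# 32 x 32 RGB image -> vector of 3072 values
model.add(layers.Flatten(input_shape=(32, 32, 3)))
# Two hidden layers with 32 units each
model.add(layers.Dense(32, activation='relu'))
model.add(layers.Dense(32, activation='relu'))
# Output layer: one unit per class (n, o, x)
model.add(layers.Dense(3, activation='softmax'))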

model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
flatten (Flatten)            (None, 3072)              0         
_________________________________________________________________
dense (Dense)                (None, 32)                98336     
_________________________________________________________________
dense_1 (Dense)              (None, 32)                1056      
_________________________________________________________________
dense_2 (Dense)              (None, 3)                 99        
Total params: 99,491
Trainable params: 99,491
Non-trainable params: 0
_________________________________________________________________
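The parameter counts in the summary can be reproduced by hand: a dense layer has one weight per input-output pair plus one bias per output unit. A short check:

```python
def dense_params(n_in, n_out):
    # Weights (n_in * n_out) plus one bias per output unit.
    return n_in * n_out + n_out


layers_io = [(32 * 32 * 3, 32), (32, 32), (32, 3)]
counts = [dense_params(i, o) for i, o in layers_io]
print(counts)       # [98336, 1056, 99]
print(sum(counts))  # 99491
```

Almost all parameters sit in the first dense layer, because every one of the 3072 input values is connected to each of its 32 units.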


The optimizer used in this example is stochastic gradient descent (SGD), which was already introduced in chapter 2. It is called "stochastic" even though the batch size is greater than one; strictly speaking, it performs mini-batch gradient descent.

The learning rate and momentum are tuned manually, which is not optimal but suffices for a first test.
In initial manual tests, setting `nesterov` to `True` appeared to give better results.
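The `nesterov=True` setting changes how the momentum term is applied. The TensorFlow documentation for `tf.keras.optimizers.SGD` describes the Nesterov update as `velocity = momentum * velocity - learning_rate * g` followed by `w = w + momentum * velocity - learning_rate * g`. The sketch below applies that rule to a single scalar weight on a toy quadratic, using the learning rate and momentum from this notebook; the function name and the objective are assumptions for illustration, not Keras code.

```python
def sgd_nesterov(grad, w, learning_rate=0.005, momentum=0.9, steps=500):
    # Simplified scalar version of SGD with Nesterov momentum.
    velocity = 0.0
    for _ in range(steps):
        g = grad(w)
        velocity = momentum * velocity - learning_rate * g
        # Nesterov: step along the updated velocity and the current gradient.
        w = w + momentum * velocity - learning_rate * g
    return w


# Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w_min = sgd_nesterov(lambda w: 2 * (w - 3), w=0.0)
print(round(w_min, 4))  # 3.0
```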

In later chapters and notebooks, these hyperparameters are examined more systematically.

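The loss function is `sparse_categorical_crossentropy`. For each sample, it takes the negative logarithm of the probability the model assigns to the correct class, so a confident correct prediction yields a loss close to 0; the per-sample losses are then averaged over the batch. The "sparse" variant expects the labels as integer class indices rather than one-hot vectors.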
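A minimal pure-Python sketch of this loss on two hand-picked samples (the probabilities are made up for illustration). It assumes mean reduction over the batch and omits the small clipping Keras applies to avoid `log(0)`:

```python
import math


def sparse_categorical_crossentropy(labels, probs):
    # labels: integer class index per sample; probs: softmax output per sample.
    # Only the probability of the correct class enters the loss directly.
    losses = [-math.log(p[y]) for y, p in zip(labels, probs)]
    # Average the per-sample losses over the batch.
    return sum(losses) / len(losses)


labels = [0, 2]
probs = [[0.7, 0.2, 0.1],
         [0.1, 0.3, 0.6]]
loss = sparse_categorical_crossentropy(labels, probs)
print(round(loss, 3))  # 0.434
```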

In [4]:
from tensorflow.keras.optimizers import SGD

# Learning rate and momentum were tuned by hand; Nesterov momentum is enabled.
optimizer = SGD(learning_rate=0.005, momentum=0.9, nesterov=True)

model.compile(loss='sparse_categorical_crossentropy', optimizer=optimizer, metrics=['acc'])

In this cell, the log directory is defined.
Its path combines the project abbreviation (srp) with the number of the chapter this notebook is written for, i.e. `logs/srp5`.

Furthermore, a subdirectory named after the current date and time is created, so that different runs can be distinguished.

A TensorBoard callback is added to allow further analysis of the training process.

`model.fit` then starts the training.

In [5]:
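from tensorflow.keras.callbacks import TensorBoard
from datetime import datetime
from os import makedirs

# One subdirectory per run, named after the start time of the run.
log_dir = join('logs', 'srp5', datetime.now().strftime("%Y-%m-%dT%H-%M-%S"))
# makedirs also creates logs/ and logs/srp5/ if they do not exist yet.
makedirs(log_dir, exist_ok=True)

from src.training_env import reset
reset(log_dir)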

callbacks = [ TensorBoard(
    log_dir=log_dir,
    histogram_freq=1,
    embeddings_freq=1) ]

history = model.fit(
    train_generator,
    steps_per_epoch=20,
    epochs=20,
    callbacks=callbacks)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
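Note that `steps_per_epoch=20` together with a batch size of 20 means that each epoch in this run draws only 400 images, i.e. a third of the 1200 training images. The arithmetic, using the values from the cells above:

```python
n_train = 400 * 3      # images found in data/processed/train
batch_size = 20        # from create_generator('train', 20)
steps_per_epoch = 20   # from model.fit(...)

images_per_epoch = steps_per_epoch * batch_size
print(images_per_epoch)            # 400
print(images_per_epoch / n_train)  # about 0.333
print(n_train // batch_size)       # 60 steps would cover every image once
```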


The model is evaluated on the test data to see how well it performs on images it has never seen before. The first value is the loss and the second value is the accuracy.

In [6]:
# evaluate_generator is deprecated in TensorFlow 2.x; evaluate accepts generators directly.
model.evaluate(test_generator)

[0.6071663084129493, 0.7866667]
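Translated into counts, the reported accuracy means that about 236 of the 300 test images are classified correctly. For comparison, guessing among three balanced classes would be right about one third of the time:

```python
accuracy = 0.7866667  # second value returned by the evaluation above
n_test = 100 * 3      # 100 test images per class

correct = round(accuracy * n_test)
print(correct)           # 236
print(n_test - correct)  # 64 misclassified
print(round(1 / 3, 3))   # 0.333, chance level for three balanced classes
```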

Finally, the model is saved so that it can be inspected later if necessary.

In [7]:
model_path = join('models', 'srp5.h5')
model.save(model_path)