## Load images from record

Here the data is loaded from TFRecords instead of directly from the file system.

`inline_augment_images` takes the images in the provided directory and augments them in place, replacing the old files in the `interim` directory.

`inline_augment_images` also returns a `list` of dictionaries which contain information about the images.
This allows an arbitrary number of labels to be stored alongside each image without parsing the file name or similar.
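
As an illustration of the idea above, the following is a minimal sketch of what such a list of dictionaries could look like and how class names might be turned into integer labels. The keys `path` and `label` and the file names are hypothetical; the actual structure returned by `inline_augment_images` is not shown in this notebook.

```python
# Hypothetical entries: the real keys produced by inline_augment_images may differ.
data_list = [
    {'path': 'data/interim/train/0001.png', 'label': 'n'},
    {'path': 'data/interim/train/0002.png', 'label': 'x'},
]

# The class list passed to encode_record in this notebook.
classes = ['n', 'o', 'x']

# Assumption: each class name is mapped to its position in the class list.
label_index = {name: i for i, name in enumerate(classes)}
encoded = [label_index[entry['label']] for entry in data_list]

print(encoded)  # [0, 2]
```

Because every entry is a plain dictionary, further labels could be added as extra keys without changing how files are named.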

`encode_record` takes the respective `data_list` and creates TFRecords, which are loaded later.
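
The internals of `encode_record` are not shown here. The sketch below is one way a single example could be serialized so that it matches the `image` (string) and `label` (int64) features parsed in cell `In [2]`. The helper `serialize_example` and the all-zero image are assumptions made for illustration, not the project's actual implementation.

```python
import numpy as np
import tensorflow as tf

def serialize_example(image, label):
    """Pack one float32 image and one integer label into a serialized tf.train.Example."""
    # Assumption: the image is stored with tf.io.serialize_tensor so that
    # tf.io.parse_tensor can restore it when the record is read.
    image_tensor = tf.convert_to_tensor(image, dtype=tf.float32)
    image_bytes = tf.io.serialize_tensor(image_tensor).numpy()
    feature = {
        'image': tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_bytes])),
        'label': tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
    }
    example = tf.train.Example(features=tf.train.Features(feature=feature))
    return example.SerializeToString()

# Hypothetical stand-in for one augmented 32x32 grayscale image.
image = np.zeros((32, 32, 1), dtype=np.float32)
serialized = serialize_example(image, label=2)

# Round trip: parse the serialized example back into an image tensor and a label.
parsed = tf.io.parse_single_example(serialized, {
    'image': tf.io.FixedLenFeature([], tf.string),
    'label': tf.io.FixedLenFeature([], tf.int64),
})
restored = tf.io.parse_tensor(parsed['image'], tf.float32)
```

Writing many such serialized strings with `tf.io.TFRecordWriter` would produce a file that `tf.data.TFRecordDataset` can read.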

This approach promises to be faster because necessary preprocessing, such as augmentation and decoding of the images, is already done before training starts.

Each example loaded from the record is a tensor ready to be fed into the training algorithm, and the input pipeline decodes examples in parallel and prefetches data for upcoming steps.

In [1]:
from os.path import join

raw = join('data', 'raw')
interim = join('data', 'interim')
processed = join('data', 'processed')

from src.utils import reset_and_distribute_data

reset_and_distribute_data(raw, interim, [400,100,0])

from src.image_handling import encode_record, inline_augment_images

train_images = inline_augment_images(join(interim, 'train'), target_size=(32, 32))
validation_images = inline_augment_images(join(interim, 'validation'), target_size=(32, 32))

encode_record(train_images, ['n', 'o', 'x'], processed, 'train')
encode_record(validation_images, ['n', 'o', 'x'], processed, 'validation')

In [2]:
import tensorflow as tf

example_feature_description = {
    'image': tf.io.FixedLenFeature([], tf.string),
    'label': tf.io.FixedLenFeature([], tf.int64)
}

def decode_record(record_path, batch_size, shuffle_buffer_size=1000):
    def decode_example(example):
        # Parse one serialized example and restore the image tensor.
        features = tf.io.parse_single_example(example, example_feature_description)
        image = tf.io.parse_tensor(features['image'], tf.float32)
        # parse_tensor leaves the static shape unknown, so set it explicitly.
        image.set_shape([32, 32, 1])
        label = features['label']

        return image, label

    autotune = tf.data.experimental.AUTOTUNE

    # Decode in parallel, cache the decoded examples, then shuffle,
    # repeat indefinitely, batch and prefetch.
    data = (tf.data.TFRecordDataset(record_path)
            .map(decode_example, num_parallel_calls=autotune)
            .cache()
            .shuffle(shuffle_buffer_size)
            .repeat()
            .batch(batch_size)
            .prefetch(buffer_size=autotune))
    return data

train_dataset = decode_record(join(processed, 'train.tfrecord'), 32)
validation_dataset = decode_record(join(processed, 'validation.tfrecord'), 10)

In [3]:
from tensorflow.keras import layers
from tensorflow.keras import models
from tensorflow.keras.optimizers import SGD

model = models.Sequential()
model.add(layers.Flatten(input_shape=(32, 32, 1)))
model.add(layers.Dense(32, activation='relu'))
model.add(layers.Dense(32, activation='relu'))
model.add(layers.Dense(3, activation='softmax'))

optimizer = SGD(learning_rate=0.005, momentum=0.9, nesterov=True)

model.compile(loss='sparse_categorical_crossentropy', optimizer=optimizer, metrics=['acc'])

In [4]:
from tensorflow.keras.callbacks import TensorBoard
import numpy as np
from datetime import datetime
from os import makedirs

log_dir = join('logs', 'srp16', datetime.now().strftime("%Y-%m-%dT%H-%M-%S"))
# Create the log directory together with any missing parent directories.
makedirs(log_dir, exist_ok=True)

callbacks = [ TensorBoard(
    log_dir=log_dir,
    histogram_freq=1,
    embeddings_freq=1) ]

history = model.fit(
    train_dataset,
    steps_per_epoch=20,
    epochs=20,
    callbacks=callbacks)

Train for 20 steps
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
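
Because the dataset is repeated indefinitely, `steps_per_epoch` determines how many batches count as one epoch. The sketch below relates the values used above to the size of the training set. It assumes that the `400` passed to `reset_and_distribute_data` is the number of training images and that augmentation leaves this count unchanged; neither is confirmed by the code shown here. The helper `steps_for_one_pass` is hypothetical.

```python
import math

def steps_for_one_pass(num_examples, batch_size):
    """Number of batches needed to see every example at least once."""
    return math.ceil(num_examples / batch_size)

num_train_examples = 400  # assumption, taken from the [400,100,0] split argument
batch_size = 32
steps_per_epoch = 20

one_pass = steps_for_one_pass(num_train_examples, batch_size)
examples_per_epoch = steps_per_epoch * batch_size

print(one_pass)            # 13
print(examples_per_epoch)  # 640
```

Under these assumptions, one epoch of 20 steps draws 640 examples, which is more than one full pass over the training data.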
