<h1 style='font-size:40px'> tf.data Pipeline</h1>
<div> 
    <ul style='font-size:20px'> 
        <li> 
            In this notebook, I practice using the tf.data module by solving Exercise 9 from Chapter 13 of Hands-On Machine Learning with Scikit-Learn and TensorFlow.
        </li>
        <li> 
            The exercise asks for the following:
            <p style='font-style:italic;margin-top:10px'> 
                Load the Fashion MNIST dataset (introduced in Chapter 10); split
it into a training set, a validation set, and a test set; shuffle the
training set; and save each dataset to multiple TFRecord files.
Each record should be a serialized Example protobuf with two
features: the serialized image (use tf.io.serialize_tensor()
to serialize each image), and the label. Then use tf.data to create
an efficient dataset for each set. Finally, use a Keras model to
train these datasets, including a preprocessing layer to standardize
each input feature. Try to make the input pipeline as efficient as
possible, using TensorBoard to visualize profiling data.
            </p>
        </li>
    </ul>
</div>

<h2 style='font-size:30px'> Data Importing & Splitting</h2>

In [1]:
# Loading the fashion_mnist dataset.
from tensorflow.keras.datasets import fashion_mnist
from sklearn.model_selection import train_test_split
(X_train, y_train), (X_test, y_test) = fashion_mnist.load_data()

# Now, generating the validation set with `train_test_split`.
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=.1, random_state=42)

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz


In [2]:
# Storing each set in a tf.data.Dataset object.
# Each dataset yields batches of 1000 elements; every batch will be written to its own .tfrecord file.
from tensorflow.data import Dataset
batch_size = 1000
train = Dataset.from_tensor_slices((X_train, y_train)).shuffle(54000).batch(batch_size)
val = Dataset.from_tensor_slices((X_val, y_val)).batch(batch_size)
test = Dataset.from_tensor_slices((X_test, y_test)).batch(batch_size)

In [3]:
# Each .tfrecord file will hold 1000 instances, one file per batch created above.
train_files = len(X_train) // batch_size
val_files = len(X_val) // batch_size
test_files = len(X_test) // batch_size
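
<div> 
    <ul style='font-size:20px'> 
        <li> 
            As a quick sanity check, the number of batches (and therefore of files) per set can be computed by hand. The sketch below assumes the usual Fashion MNIST sizes (60,000 training and 10,000 test images) and the 10% validation split used above; `count_batches` is a hypothetical helper, not part of the notebook. Note that floor division undercounts whenever the set size is not a multiple of the batch size, because `batch` keeps the last partial batch by default.
        </li>
    </ul>
</div>

```python
import math

def count_batches(n_samples: int, batch_size: int, drop_remainder: bool = False) -> int:
    """Number of batches produced when splitting `n_samples` into groups of `batch_size`."""
    if drop_remainder:
        return n_samples // batch_size
    # By default the last, smaller batch is kept, hence the ceiling.
    return math.ceil(n_samples / batch_size)

n_total = 60_000           # assumed size of the original training set
n_val = 6_000              # 10% held out for validation
n_train = n_total - n_val  # 54,000 images left for training
n_test = 10_000            # assumed size of the test set

files_per_set = {
    'train': count_batches(n_train, 1000),
    'val': count_batches(n_val, 1000),
    'test': count_batches(n_test, 1000),
}
print(files_per_set)
```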

<h2 style='font-size:30px'> Producing the .tfrecord Files</h2>
<div> 
    <ul style='font-size:20px'> 
        <li> 
            TFRecord files store binary records, which are usually serialized protobufs. We can produce them by placing the serialized data in an `Example` object. 
        </li>
    </ul>
</div>
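
<div> 
    <ul style='font-size:20px'> 
        <li> 
            A minimal round-trip sketch of this idea, assuming a toy batch of two 2x2 "images" (hypothetical data, not Fashion MNIST): the tensors are serialized, wrapped in an `Example`, turned into bytes, and then recovered by reversing both steps.
        </li>
    </ul>
</div>

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy batch: two 2x2 "images" and their labels.
images = tf.constant(np.arange(8, dtype=np.uint8).reshape(2, 2, 2))
labels = tf.constant([3, 7], dtype=tf.uint8)

# Each tensor becomes a byte string stored in a BytesList feature.
example = tf.train.Example(features=tf.train.Features(feature={
    'pixels': tf.train.Feature(bytes_list=tf.train.BytesList(
        value=[tf.io.serialize_tensor(images).numpy()])),
    'target': tf.train.Feature(bytes_list=tf.train.BytesList(
        value=[tf.io.serialize_tensor(labels).numpy()])),
}))
record = example.SerializeToString()  # The bytes that would go into a .tfrecord file.

# Parsing reverses both serialization steps.
spec = {
    'pixels': tf.io.FixedLenFeature([], tf.string),
    'target': tf.io.FixedLenFeature([], tf.string),
}
parsed = tf.io.parse_single_example(record, spec)
restored_images = tf.io.parse_tensor(parsed['pixels'], out_type=tf.uint8)
restored_labels = tf.io.parse_tensor(parsed['target'], out_type=tf.uint8)
```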

In [4]:
from tensorflow import Tensor
from tensorflow.train import BytesList, Example, Features, Feature, Int64List
from tensorflow.io import serialize_tensor, TFRecordWriter

def create_example(images:Tensor, targets:Tensor)->bytes:
    '''
        Generates a serialized `Example` protobuf holding the pixel intensities and target values from a batch
        of Fashion MNIST images.
        
        Parameters
        ----------
        `images`: A 3-D `tf.Tensor` with the images' pixels. \n
        `targets`: A 1-D `tf.Tensor` with the images' labels.
        
        Returns
        -------
        The `tf.train.Example` storing both pixels and target values, serialized as a binary string.
    '''
    # Serializing the input vectors.
    serialized_images, serialized_targets = serialize_tensor(images), serialize_tensor(targets)
    example = Example(
        features=Features(
            feature={
            'pixels':Feature(bytes_list=BytesList(value=[serialized_images.numpy()])),
            'target':Feature(bytes_list=BytesList(value=[serialized_targets.numpy()]))
        }
        ))
    # Now, converting the `Example` object into a binary string.
    return example.SerializeToString()

In [5]:
# It is convenient to place all data files in a separate directory.
! mkdir mnist

In [6]:
def create_files(dataset:Dataset, filename:str, directory:str='.')->None:
    '''
        Creates the .tfrecord's files based on the batches from a provided `dataset`.
        
        Parameters
        ----------
        `dataset`: A `tf.data.Dataset` object. \n
        `filename`: A custom name for file identification. \n
        `directory`: A string that indicates the directory where the files are put.
    '''
    for index, (images, labels) in dataset.enumerate():
        # The context manager flushes and closes each file once its record is written.
        with TFRecordWriter(f'{directory}/{filename}_{int(index)}.tfrecord') as file:
            file.write(create_example(images, labels))

# Generating the files.
create_files(train, 'train', 'mnist')
create_files(val, 'val', 'mnist')
create_files(test, 'test', 'mnist')

<h2 style='font-size:30px'> Data Treatment</h2>
<div> 
    <ul style='font-size:20px'> 
        <li> 
            With the files generated, we can now load the data back and preprocess it.
        </li>
    </ul>
</div>

In [7]:
# Listing the files of each set separately.
train_files = Dataset.list_files('mnist/train*')
val_files = Dataset.list_files('mnist/val*')
test_files = Dataset.list_files('mnist/test*')

In [8]:
from tensorflow import string, uint8
from tensorflow.data import AUTOTUNE, TFRecordDataset
from tensorflow.io import FixedLenFeature, parse_example, parse_tensor
from typing import Optional, Iterable, Tuple

def preprocess(tfrecord:Tensor)->Tuple[Tensor, Tensor]:
    '''
    Parses a serialized `Example` protobuf and decodes its tensors back into numerical format.
    
    Parameter
    ---------
    `tfrecord`: A string `tf.Tensor` holding a serialized `Example`.
    
    Returns
    -------
    A tuple with two tensors: the pixel intensities and the target values.
    '''
    features = {
    'pixels':FixedLenFeature([], string, default_value=''), 
    'target':FixedLenFeature([], string, default_value='-1')
                }
    example = parse_example(tfrecord, features) # Returns a dictionary with the serialized images and target values.
    pixels, target = parse_tensor(example['pixels'], uint8), parse_tensor(example['target'], uint8)
    return pixels, target

def read_files(filenames:Iterable[str], shuffle_size:Optional[int]=None, num_threads_reading:int=AUTOTUNE, 
               num_threads_preprocess:int=AUTOTUNE)->Dataset:
    '''
        Reads the specified .tfrecord files and returns a `tf.data.Dataset` object with the processed data.
        
        Parameters
        ----------
        `filenames`: The names of the files.
        `shuffle_size`: If specified, the records are shuffled using a buffer of this size.
        `num_threads_reading`: The number of files to read in parallel.
        `num_threads_preprocess`: The number of elements to preprocess in parallel.
        
        Returns
        -------
        The treated dataset.
    '''
    dataset = TFRecordDataset(filenames, num_parallel_reads=num_threads_reading)
    if shuffle_size:
        # `shuffle` returns a new dataset, so the result must be reassigned.
        dataset = dataset.shuffle(shuffle_size)
    return dataset.map(preprocess, num_parallel_calls=num_threads_preprocess).prefetch(1)
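
<div> 
    <ul style='font-size:20px'> 
        <li> 
            One detail worth keeping in mind when chaining tf.data operations: transformations such as `shuffle`, `map` and `batch` return a new dataset instead of modifying the original one. The sketch below illustrates this on a hypothetical toy dataset built with `Dataset.range`.
        </li>
    </ul>
</div>

```python
import tensorflow as tf

ds = tf.data.Dataset.range(5)

# The returned dataset is discarded here, so `ds` keeps its original order.
ds.shuffle(5, seed=0)
unchanged = list(ds.as_numpy_iterator())

# Assigning the result is what actually keeps the shuffled dataset.
shuffled = ds.shuffle(5, seed=0)
shuffled_items = sorted(shuffled.as_numpy_iterator())

print(unchanged)       # The elements still come out as 0, 1, 2, 3, 4.
print(shuffled_items)  # Same elements once sorted; only their order was shuffled.
```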

<h2 style='font-size:30px'> Standardization Layer</h2>
<div> 
    <ul style='font-size:20px'> 
        <li> 
            Here, we'll code a `tf.keras.layers.Layer` subclass that does a job similar to the Batch Normalization layer. The main difference is that $\mu$ and $\sigma$ are computed in advance using the `adapt` method.
        </li>
    </ul>
</div>
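
<div> 
    <ul style='font-size:20px'> 
        <li> 
            The computation itself is $(x - \mu) / (\sigma + \epsilon)$, with one $\mu$ and one $\sigma$ per pixel position. A minimal NumPy sketch, assuming random 4x4 "images" as hypothetical data and $\epsilon = 10^{-7}$ (the default of Keras' backend `epsilon()`):
        </li>
    </ul>
</div>

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 100 training "images" and 10 new ones, all 4x4.
train = rng.integers(0, 256, size=(100, 4, 4)).astype(np.float32)
new_batch = rng.integers(0, 256, size=(10, 4, 4)).astype(np.float32)

eps = 1e-7  # Assumed value, avoids a division by zero for constant pixels.

# "Adapt" step: the stats are computed once, over the batch axis of the training data.
means = train.mean(axis=0, keepdims=True)  # shape (1, 4, 4)
stds = train.std(axis=0, keepdims=True)    # shape (1, 4, 4)

def standardize(x: np.ndarray) -> np.ndarray:
    """Applies the precomputed training stats to any batch via broadcasting."""
    return (x - means) / (stds + eps)

train_std = standardize(train)
new_std = standardize(new_batch)  # New data reuses the training stats.
```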

In [9]:
# The layer will inherit the properties of the experimental `PreprocessingLayer`.
from tensorflow.keras.layers.experimental.preprocessing import PreprocessingLayer
from tensorflow.keras.layers import Layer
from tensorflow.math import reduce_mean, reduce_std
from tensorflow.keras.backend import epsilon
from typing import Union

class Standardize(PreprocessingLayer):
    '''
    A `PreprocessingLayer` object that standardizes a given array along a 
    chosen axis.
    
    The necessary stats are computed before training with the `adapt` method. This is the class' main 
    difference from the `BatchNormalization` layer, which computes means and standard deviations on the fly.
    '''    
    
    def adapt(self, input_data:Union[Dataset, Tensor], axis=0)->'Standardize':
        '''
            Computes means and std's from a provided `tf.Tensor` or `tf.data.Dataset`.
            
            Parameters
            ----------
            `input_data`: The array from which the stats are computed.
            `axis`: The axis of choice to compute the stats.
            
            Returns
            -------
            The layer itself, so that `adapt` can be chained right after the instantiation.
        '''
        self.means = reduce_mean(input_data, axis=axis, keepdims=True)
        self.stds = reduce_std(input_data, axis=axis, keepdims=True)
        return self
        
    def call(self, input_data:Union[Dataset, Tensor])->Union[Dataset, Tensor]:
        '''
        The method that standardizes the array.
        
        Parameter
        --------
        `input_data`: The array in which we perform the standardization.
        
        Returns
        -------
        The standardized array.
        '''
        return (input_data - self.means) / (self.stds + epsilon())
    
    def get_config(self):
        '''
            Returns the Layer's configurations. It is needed for saving a class' instance for future uses.
        '''
        base_config = super().get_config() # Getting the parent class' `get_config` output.
        return {**base_config, 'means':self.means.numpy(), 'stds':self.stds.numpy()}
    

<h2 style='font-size:30px'> Neural Net Modeling</h2>
<div> 
    <ul style='font-size:20px'> 
        <li> 
            With all the preprocessing stages programmed, we are able to model our Neural Network.
        </li>
        <li> 
            Firstly, we are going to design a simple Fully-Connected Neural Network and see how it performs.
        </li>
    </ul>
</div>

In [10]:
# Reading all the .tfrecords created.
train_set = read_files(train_files, shuffle_size=100)
val_set = read_files(val_files)
test_set = read_files(test_files)

In [11]:
# But before actually fitting the NN, we need to adapt our `Standardize` layer with the training data.
from tensorflow import concat, cast, float32
train_pixels = list(train_set.map(lambda pixels, target: pixels).take(-1)) # Getting all the training images.
train_pixels = cast(concat(train_pixels, axis=0), dtype=float32) # Concatenating the batches so that we end up
                                                                # with a single 3-D matrix.

# Now, instantiating the `Standardize` class and adapting it to the `train_pixels` data.
standardize = Standardize(input_shape=train_pixels.shape[1:]).adapt(train_pixels, axis=0)

In [12]:
# Finally, making our FCNN.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten, Input
from tensorflow.keras.activations import elu, softmax
from tensorflow.keras.initializers import GlorotNormal, HeNormal
from tensorflow.keras.optimizers import SGD, Optimizer
from tensorflow.keras.losses import SparseCategoricalCrossentropy, Loss
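from tensorflow.keras.metrics import SparseCategoricalAccuracy, Metric
from typing import Callable

# Here we define some default settings for the model. They are redefined in the training cell.
optimizer = SGD(momentum=.9, nesterov=True)
loss = SparseCategoricalCrossentropy()
metrics = [SparseCategoricalAccuracy()] # `Accuracy` would compare the integer labels with the raw softmax outputs.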

def _check_length(neurons:Iterable, activations:Iterable)->bool:
    '''
        Checks whether both provided arrays have equal lengths.
        
        Parameters
        ----------
        `neurons`: First array
        `activations`: Second array
        
        Returns
        -------
        `True` if the lengths match. Otherwise, an exception is raised.
    '''
    if len(neurons) != len(activations):
        raise ValueError('The `neurons` array must have the same length as `activations`')
    return True

def my_fcnn(neurons:Iterable[int], activations:Union[Iterable[Callable[[float], float]], Callable[[float], float]], dropout_ratio:Optional[float]=None,
            input_shape:list=[28,28], **kwargs)->Sequential:
    '''
        Generates the Fully-Connected Neural Network for the project.
        
        Parameters
        ----------
        `neurons`: An iterable with the number of neurons of each hidden layer. Its length defines the number of hidden layers.
        
        `activations`: The activation functions of the hidden layers. To use the same function in all layers, pass a single
                       callable object. Otherwise, pass an iterable with one function per layer. 
                       Note that in such case, the iterable needs to have the same length as `neurons`.
        
        `dropout_ratio`: The dropout rate applied after every hidden layer. If `None` (the default), no dropout is used. 
        
        `input_shape`: The original images' shape. Do not include the batch size.
        
        **kwargs: Any further keyword argument is passed to the `compile` method.
        
        Returns
        -------
        A Sequential object built according to the user's preferences. If no **kwargs is provided, the model has to be compiled manually.
        
    '''
    # If the user passed a single function as `activations`, a list repeating it once per entry of `neurons`
    # is created.
    if isinstance(activations, Callable):
        activations = [activations] * len(neurons)    
    
    # `_check_length` is of use when `activations` is given as an iterable by the user.
    _check_length(neurons, activations)
    
    # Creating the model's basic structure.
    model = Sequential([
        Input(shape=input_shape),
        standardize, 
        Flatten(), # The inputs are flattened after the standardization.
    ])
    
    for units, activation in zip(neurons, activations):
        model.add(Dense(units, activation=activation, kernel_initializer=HeNormal()))
        # Adding Dropout if it is the user's wish.
        if dropout_ratio:
            model.add(Dropout(dropout_ratio))
            
    # In the end, applying the output layer.
    model.add(Dense(units=10, activation=softmax , kernel_initializer=GlorotNormal()))
    
    # Compiling the model.
    if kwargs:
        model.compile(**kwargs)
    return model
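
<div> 
    <ul style='font-size:20px'> 
        <li> 
            The pairing of layer sizes and activations can be isolated in a few lines of plain Python. This is only a sketch: `pair_layers` is a hypothetical helper, and treating a string such as 'elu' as a single activation is an assumption added here (Keras accepts activation names, but the function above only checks for callables).
        </li>
    </ul>
</div>

```python
from typing import Callable, Iterable, List, Tuple, Union

Activation = Union[str, Callable]

def pair_layers(neurons: Iterable[int],
                activations: Union[Activation, Iterable[Activation]]) -> List[Tuple[int, Activation]]:
    """Returns one (units, activation) pair per hidden layer."""
    neurons = list(neurons)
    # A single callable (or activation name) is repeated once per hidden layer.
    if callable(activations) or isinstance(activations, str):
        activations = [activations] * len(neurons)
    else:
        activations = list(activations)
    if len(neurons) != len(activations):
        raise ValueError('`neurons` and `activations` must have the same length')
    return list(zip(neurons, activations))

# `abs` merely stands in for an activation function in this toy call.
print(pair_layers([300, 100], abs))
print(pair_layers([300, 100], ['elu', 'relu']))
```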

In [13]:
# Compiling and fitting the model.
from numpy.random import randint, seed
from tensorflow.keras.optimizers import Adam 
from tensorflow.keras.losses import SparseCategoricalCrossentropy
from tensorflow.keras.metrics import SparseCategoricalAccuracy
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
from tensorflow.config import list_logical_devices
import tensorflow.distribute as tf_dist

seed(42) 

# Running the fitting on our available GPUs.
gpus = list_logical_devices('GPU')
strategy = tf_dist.MirroredStrategy(gpus)

# Creating and fitting the model inside the strategy's scope so that training runs on the GPUs.
with strategy.scope():
    # Setting the compilation configs.
    optimizer = Adam(learning_rate=0.0025)
    loss = SparseCategoricalCrossentropy()
    metrics = [SparseCategoricalAccuracy()]
    callbacks = [EarlyStopping(monitor='val_sparse_categorical_accuracy', min_delta=.01, patience=25, restore_best_weights=True),
                    ReduceLROnPlateau(monitor='val_sparse_categorical_accuracy', factor=.8, min_delta=.01, patience=10)]
    neurons = randint(200, 400, 5)
    
    # Creating the model and fitting it.
    fcnn = my_fcnn(neurons, elu, .5, optimizer=optimizer, loss=loss, metrics=metrics)
    fcnn.fit(x=train_set, epochs=100, callbacks=callbacks, validation_data=val_set)

[Training log: Epoch 1/100 through Epoch 73/100; per-epoch metrics not preserved]


<h3 style='font-size:30px;font-style:italic'> Extra Challenge: LeNet-5 </h3>
<div> 
    <ul style='font-size:20px'> 
        <li> 
            Fully-Connected Neural Networks are a poor fit for Computer Vision tasks, since they ignore the spatial structure of the images. In modern DL systems, Convolutional Neural Networks are generally both more reliable and computationally lighter for this job.
        </li>
        <li> 
            Hence, we'll build a LeNet-5 model here, given its success in digit recognition. 
        </li>
    </ul>
</div>

In [14]:
# We'll be designing a LeNet-5 NN here. Thus, we'll need some extra components.

# The images that feed the Input Layer of such model have 32x32 shape. Since the Fashion MNIST images are 28x28, it is
# necessary to add zero-padding to each matrix.
from tensorflow.keras.layers import Reshape, ZeroPadding2D

# `ZeroPadding2D` requires the inputs to have a channels dimension. That's why we are using the
# `Reshape` layer as well.
target_shape = train_pixels.shape[1:]+[1]
reshape = Reshape(target_shape=target_shape)

# Instantiating the `ZeroPadding2D` layer.
zero_padding_2d = ZeroPadding2D(padding=2)
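
<div> 
    <ul style='font-size:20px'> 
        <li> 
            The effect of the padding on the feature-map sizes can be checked with simple arithmetic. The sketch below assumes the classic LeNet-5 stack: 5x5 convolutions with stride 1 and no padding, interleaved with 2x2 average pooling. `conv_out` and `pool_out` are hypothetical helpers.
        </li>
    </ul>
</div>

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output width/height of a convolution over a square input."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size: int, pool: int = 2) -> int:
    """Output width/height of a non-overlapping pooling layer."""
    return size // pool

size = 28 + 2 * 2  # Two pixels of zero-padding on every side: 28 -> 32.
trace = [size]
for layer in ('conv', 'pool', 'conv', 'pool', 'conv'):
    size = conv_out(size, kernel=5) if layer == 'conv' else pool_out(size)
    trace.append(size)

# Parameters of the first convolution: (5*5 weights * 1 input channel + 1 bias) * 6 filters.
first_conv_params = (5 * 5 * 1 + 1) * 6

print(trace)              # [32, 28, 14, 10, 5, 1]
print(first_conv_params)  # 156
```

<div> 
    <ul style='font-size:20px'> 
        <li> 
            Without the padding, the third 5x5 convolution would receive 4x4 feature maps (28 -> 24 -> 12 -> 8 -> 4), which is too small for its kernel.
        </li>
    </ul>
</div>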

In [15]:
from tensorflow.keras.layers import Conv2D, AveragePooling2D
# Building and fitting the model inside the strategy's scope so that training runs on the GPUs.
with strategy.scope():
    
    # Building our LeNet-5
    lenet5 = Sequential([
        # Initial layers.
        Input(shape=[28,28]),
        standardize, # Note that the inputs need to be standardized as well in this architecture.
        reshape,
        zero_padding_2d,

        # Performing the Convolutions and Pooling in accordance with LeNet-5's architecture.
        Conv2D(filters=6, kernel_size=5, strides=1, activation='tanh'),
        AveragePooling2D(),
        Conv2D(filters=16, kernel_size=5, strides=1, activation='tanh'),
        AveragePooling2D(),
        Conv2D(filters=120, kernel_size=5, strides=1, activation='tanh'),

        # A Dense layer followed by the output layer.
        Flatten(),
        Dense(units=84, activation='tanh', kernel_initializer=GlorotNormal()),
        Dense(units=10, activation='softmax')

    ])

    # Setting the compilation configs.
    optimizer = Adam(learning_rate=0.00125)
    loss = SparseCategoricalCrossentropy()
    metrics = [SparseCategoricalAccuracy()]
    callbacks = [EarlyStopping(monitor='val_sparse_categorical_accuracy', min_delta=.01, patience=25, restore_best_weights=True),
                    ReduceLROnPlateau(monitor='val_sparse_categorical_accuracy', factor=.5, min_delta=.005, patience=10)]
    
    # Creating the model and fitting it.
    lenet5.compile(optimizer=optimizer, loss=loss, metrics=metrics)
    lenet5.fit(x=train_set, epochs=100, callbacks=callbacks, validation_data=val_set)

[Training log: Epoch 1/100 through Epoch 45/100; per-epoch metrics not preserved]


<h2 style='font-size:30px'> Models Testing and Comparison</h2>
<div> 
    <ul style='font-size:20px'> 
        <li> 
            Judging by the fitting logs, there is no big difference between the accuracies of the two NNs. 
        </li>
        <li> 
            Let's find out whether their test performances differ in a statistically significant way.
        </li>
    </ul>
</div>
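
<div> 
    <ul style='font-size:20px'> 
        <li> 
            One possible way to run such a comparison is McNemar's exact test, which only looks at the test instances where the two models disagree. This choice of test is an assumption of this sketch; the labels and predictions below are made-up toy values, not results from the models above, and `mcnemar_exact` is a hypothetical helper.
        </li>
    </ul>
</div>

```python
from math import comb
from typing import Sequence, Tuple

def mcnemar_exact(y_true: Sequence[int], pred_a: Sequence[int],
                  pred_b: Sequence[int]) -> Tuple[int, int, float]:
    """Returns (b, c, p-value) of the two-sided exact McNemar test.

    b: instances where only model A is right; c: instances where only model B is right.
    """
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)
    n = b + c
    if n == 0:
        return b, c, 1.0  # The models never disagree on correctness.
    # Under H0, the discordant pairs split like fair coin flips: Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)

# Toy data (hypothetical): A misses only the last instance, B misses the first four and the last.
y_true = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
pred_a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 0]
pred_b = [1, 2, 3, 4, 4, 5, 6, 7, 8, 0]

b, c, p_value = mcnemar_exact(y_true, pred_a, pred_b)
print(b, c, p_value)  # 4 0 0.125
```

<div> 
    <ul style='font-size:20px'> 
        <li> 
            With the real models, `pred_a` and `pred_b` would be the `argmax` of each model's predictions on the test set.
        </li>
    </ul>
</div>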

In [16]:
from numpy import argmax
#argmax(lenet5.predict(X_test), axis=1)
lenet5.evaluate(X_test, y_test)



[0.3022363781929016, 0.89410001039505]

In [17]:
lenet5.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 standardize (Standardize)   (None, 28, 28)            0         
                                                                 
 reshape (Reshape)           (None, 28, 28, 1)         0         
                                                                 
 zero_padding2d (ZeroPadding  (None, 32, 32, 1)        0         
 2D)                                                             
                                                                 
 conv2d (Conv2D)             (None, 28, 28, 6)         156       
                                                                 
 average_pooling2d (AverageP  (None, 14, 14, 6)        0         
 ooling2D)                                                       
                                                                 
 conv2d_1 (Conv2D)           (None, 10, 10, 16)       

In [18]:
fcnn.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 standardize (Standardize)   (None, 28, 28)            0         
                                                                 
 flatten (Flatten)           (None, 784)               0         
                                                                 
 dense (Dense)               (None, 302)               237070    
                                                                 
 dropout (Dropout)           (None, 302)               0         
                                                                 
 dense_1 (Dense)             (None, 379)               114837    
                                                                 
 dropout_1 (Dropout)         (None, 379)               0         
                                                                 
 dense_2 (Dense)             (None, 292)               1

<p style='color:red'>  Hypothesis Testing</p>
