# Exercise 1.3.3 - Image Classification with FNNs
#### By Jonathan L. Moran (jonathan.moran107@gmail.com)
From the Self-Driving Car Engineer Nanodegree programme offered at Udacity.

## Objectives

* Create a small feedforward neural network ([FNN](https://en.wikipedia.org/wiki/Feedforward_neural_network)) leveraging the TensorFlow [Keras API](https://www.tensorflow.org/api_docs/python/tf/keras);
* Train the FNN on the German Traffic Sign Recognition Benchmark ([GTSRB](https://benchmark.ini.rub.de/gtsrb_dataset.html)) dataset;
* Visualise the training and validation metrics. 

## 1. Introduction

Here is a bit of [terminology](https://developers.google.com/machine-learning/glossary/) from the Google Machine Learning Glossary before we get started:
* **Neural network**: a model composed of layers (at least one of which is [hidden](https://developers.google.com/machine-learning/glossary/#hidden_layer));

* **Neuron**: a node in a neural network, typically taking in multiple input values and generating a single output value. The neuron calculates the output value by applying an [activation function](https://developers.google.com/machine-learning/glossary/#activation_function) to a weighted sum of input values;

* **Perceptron**: _nodes_ in a deep neural network. Each perceptron takes in input values, runs an activation function over the weighted sum of values, and computes a single output value. A [backpropagation](https://developers.google.com/machine-learning/glossary/#backpropagation) algorithm is used to introduce feedback into the network;

* **Feedforward Neural Network (FNN)**: a neural network without cyclic or recursive connections (e.g., a _deep neural network_, as opposed to a [recurrent neural network](https://developers.google.com/machine-learning/glossary/#recurrent_neural_network)).
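
To make the definition of a neuron above concrete, here is a minimal pure-Python sketch. The function names (`neuron`, `relu`, `sigmoid`) and the input values are hypothetical and chosen only for illustration; they are not part of the exercise code.

```python
import math

def neuron(inputs, weights, bias, activation):
    """Apply an activation function to the weighted sum of the inputs."""
    z = sum(w_i * x_i for w_i, x_i in zip(weights, inputs)) + bias
    return activation(z)

def relu(z):
    """Rectified Linear Unit: passes positive values, zeroes out the rest."""
    return max(0.0, z)

def sigmoid(z):
    """Squashes any real value into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical toy values
x = [1.0, 2.0, 3.0]
w = [0.5, -1.0, 0.25]
b = 0.1

print(neuron(x, w, b, relu))
print(neuron(x, w, b, sigmoid))
```

The same weighted sum produces a different output depending on the activation function, which is why the choice of activation matters when designing a layer.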

In [None]:
### Importing the required modules

In [None]:
import argparse
import logging
import os
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.preprocessing import image_dataset_from_directory

In [None]:
tf.__version__

In [None]:
tf.test.gpu_device_name()

In [None]:
### Setting the environment variables

In [None]:
ENV_COLAB = True                # True if running in Google Colab instance

In [None]:
# Root directory
DIR_BASE = '' if not ENV_COLAB else '/content/'

In [None]:
# Subdirectory to save output files
DIR_OUT = os.path.join(DIR_BASE, 'out/')
# Subdirectory pointing to input data
DIR_SRC = os.path.join(DIR_BASE, 'data/')

In [None]:
### Unzipping the GTSRB dataset
if ENV_COLAB:
    !unzip -q /content/GTSRB.zip -d /content/data/

In [None]:
### Creating subdirectories (if not exists)
os.makedirs(DIR_OUT, exist_ok=True)

### 1.1. Feedforward Neural Networks (FNNs)

#### Types of FNNs

##### The single-layer perceptron

In its simplest form, a Feedforward Neural Network is a [single-layer perceptron](https://en.wikipedia.org/wiki/Feedforward_neural_network#Single-layer_perceptron). The single-layer perceptron is a network with only one input and one output layer. In [Exercise 1.3.2](https://github.com/jonathanloganmoran/ND0013-Self-Driving-Car-Engineer/blob/main/1-Object-Detection-in-Urban-Environments/Exercises/1-3-2-Stochastic-Gradient-Descent/2022-08-29-Stochastic-Gradient-Descent.ipynb), we built a single-layer perceptron that consisted of a single input layer shaped by our input data (the image pixel attributes) and a single output layer (the predicted class distribution of probabilities). The output layer was computed directly from the sum of the product of the input layer units and the perceptron weights (plus a bias term). The output of the single-layer perceptron was then passed into an activation function (the softmax function), and a classification label was selected by choosing the prediction with the highest probability value.

The single-layer perceptron is a powerful yet simple type of Feedforward Neural Network architecture, but as a [linear classifier](https://en.wikipedia.org/wiki/Linear_classifier) it is only capable of learning linearly-separable patterns. As most real-world classification problems are not linearly separable, we will consider a slightly more involved approach.
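
The single-layer perceptron described above (a weighted sum plus bias, followed by softmax and a pick of the most probable class) can be sketched in a few lines of pure Python. The weights, biases and input values here are hypothetical toy numbers, not values from Exercise 1.3.2.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]   # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def single_layer_perceptron(x, W, b):
    """Compute softmax(W x + b) for one flattened input vector x."""
    logits = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
              for row, b_i in zip(W, b)]
    return softmax(logits)

# Hypothetical toy values: 3 input features, 2 classes
x = [0.2, 0.5, 0.1]
W = [[1.0, -1.0, 0.5],
     [-0.5, 2.0, 0.0]]
b = [0.0, 0.1]

probs = single_layer_perceptron(x, W, b)
# The predicted label is the index of the highest probability
predicted_class = max(range(len(probs)), key=probs.__getitem__)
print(probs, predicted_class)
```

Because the logits are a linear function of the input, the decision boundary between the two classes is a straight line (or hyperplane), which is exactly the limitation discussed above.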

##### The multi-layer perceptron

What's better than _one_ layer of perceptrons? A ton of them!

[Multi-layer perceptrons](https://en.wikipedia.org/wiki/Feedforward_neural_network#Multi-layer_perceptron) (MLPs) are a type of Feedforward Neural Network composed of many perceptrons. These perceptrons are organised into groups called _layers_. In an MLP, we introduce at least one _hidden_ layer composed of multiple perceptrons. The perceptrons in a hidden layer are not interconnected (i.e., there are no connections between perceptrons in the same layer) but are said to be "fully connected" to the perceptrons in the preceding layer. The perceptrons in a hidden layer also have activation functions, similar to the output layer in a single-layer perceptron.

Like single-layer perceptrons, multi-layer perceptrons are trained by iteratively updating the weights and bias values in order to minimise an error (cost) function. Using _stochastic gradient descent_ to adjust these model parameters, we saw in Exercise 1.3.2 that, after a given number of epochs (passes) over our entire dataset, we were able to settle on semi-optimal weight $\mathrm{w}$ and bias $\mathrm{b}$ values.

We can break the training phase of the multi-layer perceptron into two primary steps: a _forward_ pass and a _backward_ pass. In the _forward_ pass over the data, each layer of perceptrons calculates a sum of products between the perceptron weights and the input values. The resulting values are then passed to an activation function (such as the _Rectified Linear Unit_, or [ReLU](https://en.wikipedia.org/wiki/Rectifier_(neural_networks))), which generates a single output value per perceptron. In the _backward_ pass over the network, we update the model parameters (the weights and bias values) using the partial derivatives of an error function (like [cross-entropy loss](https://en.wikipedia.org/wiki/Cross_entropy#Cross-entropy_loss_function_and_logistic_regression)) with respect to those parameters. With the backpropagation algorithm, values calculated in the forward pass are used to compute the gradients for each of the model parameters. The "backward" pass here indicates that the gradients are propagated back through the network, starting from the output nodes and ending at the nodes in the first hidden layer. The gradients computed at each layer are then used to update that layer's weight values.
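
The forward and backward passes can be traced by hand on a deliberately tiny network. The sketch below assumes one input, one ReLU hidden unit and one sigmoid output trained with binary cross-entropy; all names and numbers are hypothetical, and a real MLP would use vectors and matrices rather than scalars.

```python
import math

def forward(x, params):
    """Forward pass: keep intermediate values for use in the backward pass."""
    w1, b1, w2, b2 = params
    z1 = w1 * x + b1
    h = max(0.0, z1)                      # ReLU hidden activation
    z2 = w2 * h + b2
    p = 1.0 / (1.0 + math.exp(-z2))       # sigmoid output probability
    return z1, h, z2, p

def loss_fn(p, y):
    """Binary cross-entropy for a single example."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def backward(x, y, params):
    """Backward pass: apply the chain rule from the output back to the input."""
    w1, b1, w2, b2 = params
    z1, h, z2, p = forward(x, params)
    dz2 = p - y                           # dL/dz2 for sigmoid + cross-entropy
    dw2, db2 = dz2 * h, dz2
    dh = dz2 * w2                         # gradient flowing into the hidden unit
    dz1 = dh * (1.0 if z1 > 0 else 0.0)   # ReLU derivative
    dw1, db1 = dz1 * x, dz1
    return [dw1, db1, dw2, db2]

# Hypothetical starting parameters, one training example and a learning rate
params = [0.5, 0.1, -0.3, 0.2]
x, y, lr = 2.0, 1.0, 0.1

loss_before = loss_fn(forward(x, params)[3], y)
grads = backward(x, y, params)
params = [value - lr * grad for value, grad in zip(params, grads)]   # one gradient descent step
loss_after = loss_fn(forward(x, params)[3], y)
print(loss_before, loss_after)
```

A single update step lowers the loss on this example, which is the behaviour stochastic gradient descent repeats over many examples and epochs.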

### 1.2. TensorFlow Keras API

[TensorFlow](https://www.tensorflow.org) is a very powerful, highly-optimised platform for machine learning. [Keras](https://github.com/keras-team/keras) is a high-level framework for the TensorFlow platform primarily authored by [François Chollet](https://github.com/fchollet). While Keras used to support several other backend compute engines, TensorFlow became the only supported backend as of Keras version 2.4. With the release of TensorFlow 2.0 [came the announcement](https://blog.tensorflow.org/2018/12/standardizing-on-keras-guidance.html) that Keras would be the official high-level API for Google's TensorFlow. Together, the two make a great couple. While Keras focuses on simple, highly-readable and abstracted model creation, TensorFlow's full-fledged compute engine gives developers and practitioners the freedom to fine-tune much of their model's low-level implementation.

#### Modelling with the Keras API

#### The `Sequential` API

In this exercise we will interface with Keras through TensorFlow's [`Sequential`](https://www.tensorflow.org/guide/keras/sequential_model) API. In [Exercise 1.3.2](https://github.com/jonathanloganmoran/ND0013-Self-Driving-Car-Engineer/blob/main/1-Object-Detection-in-Urban-Environments/Exercises/1-3-2-Stochastic-Gradient-Descent/2022-08-29-Stochastic-Gradient-Descent.ipynb) we demonstrated how to implement a model using TensorFlow Keras but didn't touch on many of Keras' core design principles. Given the simplicity of our network architecture, we will be using the Sequential API to model the multi-layer perceptron as a simple stack of layers. The `Sequential` model [approach](https://www.tensorflow.org/guide/keras/sequential_model) is good enough for us in this use case, but is **not appropriate** for any model architecture with:
* multiple inputs or multiple outputs;
* any layer that has multiple inputs or multiple outputs;
* layer sharing;
* a non-linear topology.
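
As a minimal sketch of the "simple stack of layers" idea, a `Sequential` model can be built up one layer at a time with `add()`. The layer sizes and the `toy_stack` name below are arbitrary placeholders, not the architecture used later in this exercise.

```python
import tensorflow as tf

# Each layer has exactly one input tensor and one output tensor,
# so the model is a plain linear stack
stack = tf.keras.Sequential(name='toy_stack')
stack.add(tf.keras.Input(shape=(4,)))                     # 4 input features (hypothetical)
stack.add(tf.keras.layers.Dense(8, activation='relu'))    # hidden layer
stack.add(tf.keras.layers.Dense(2))                       # output layer, no activation

stack.summary()
```

Anything that branches, merges or shares layers falls outside this pattern and is usually better expressed with the functional API or by subclassing `Model`.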

Let's dive a bit more into the Keras abstraction and the classes we will use in our model...

##### The `Layer` base class

Now that we've learned a bit about multi-layer perceptrons, let's see how those layers can actually be implemented in TensorFlow. 

Every `Layer` instance in TensorFlow is born from the Keras [`layers.Layer`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Layer) base class. As Google themselves put it,


_A layer is a callable object that takes as input one or more tensors and that outputs one or more tensors. It involves **computation**, defined in the `call()` method, and a **state** (weight variables) implemented in the `__init__()` method._

A layer in our use case is a set of fully-connected (but not interconnected) units, each with its own weight values. This _layer_ in a multi-layer perceptron also implements an activation function, and its weight values are said to be _trainable_ (that is, they can be modified during training). We will see that translating this into code using the Keras `Layer` API is a straightforward process.

With the `Layer` abstraction we also now have access to a whole set of methods and class attributes provided by TensorFlow to help us maintain, monitor, or modify layer state and layer variables during training and more.
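
To illustrate the computation/state split, here is a hedged sketch of a custom fully-connected layer. `SimpleDense` is a hypothetical name, and the weights are created in `build()` rather than `__init__()` so that the input size does not need to be known up front; this is one common pattern, not the only one.

```python
import tensorflow as tf

class SimpleDense(tf.keras.layers.Layer):
    """Hypothetical fully-connected layer with a ReLU activation."""

    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        # State: trainable weights, created once the input shape is known
        self.w = self.add_weight(shape=(input_shape[-1], self.units),
                                 initializer='glorot_uniform', trainable=True)
        self.b = self.add_weight(shape=(self.units,),
                                 initializer='zeros', trainable=True)

    def call(self, inputs):
        # Computation: weighted sum plus bias, followed by the activation
        return tf.nn.relu(tf.matmul(inputs, self.w) + self.b)

layer = SimpleDense(4)
outputs = layer(tf.ones((2, 3)))       # batch of 2 examples, 3 features each
print(outputs.shape, len(layer.trainable_weights))
```

In practice the built-in `tf.keras.layers.Dense` already does this, so a custom layer is only needed when the computation itself is non-standard.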

##### The `Model` base class

A [`Model`](https://www.tensorflow.org/api_docs/python/tf/keras/Model) groups `Layer`s into a single object while also providing many powerful training and inference features. A `Model` can be initialised in a single line of code and begin providing insane functionality right off-the-shelf. Remember that exhausting [training and validation loop](https://github.com/jonathanloganmoran/ND0013-Self-Driving-Car-Engineer/blob/main/1-Object-Detection-in-Urban-Environments/Exercises/1-3-2-Stochastic-Gradient-Descent/2022-08-29-Stochastic-Gradient-Descent.ipynb) we wrote in the last exercise? With a `Model` instance, we can shorten that entire functionality into just _one line of code_ using the `Model.fit()` method.

One other way to use `Model` is to [_subclass it_](https://www.tensorflow.org/guide/keras/custom_layers_and_models). In other words, we can create a class that inherits from `Model` and from there customise much of the core Keras functionality to suit our own needs. This can be beneficial for those wanting full control over the [lower-level](https://www.tensorflow.org/guide/keras/customizing_what_happens_in_fit#going_lower-level) details of the training cycle, or for those who want to encapsulate complex model architectures, custom callbacks, metric functions, etc. into neat, [serialisable](https://www.tensorflow.org/guide/keras/save_and_serialize/) `Model` objects.
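
A subclassed model might look something like the following sketch. `TinyMLP`, its layer sizes and the 43-class output (the number of classes in GTSRB) are assumptions made for illustration.

```python
import tensorflow as tf

class TinyMLP(tf.keras.Model):
    """Hypothetical subclassed model: flatten, one hidden layer, logits."""

    def __init__(self, n_hidden=16, n_classes=43):
        super().__init__()
        self.flatten = tf.keras.layers.Flatten()
        self.hidden = tf.keras.layers.Dense(n_hidden, activation='relu')
        self.logits = tf.keras.layers.Dense(n_classes)

    def call(self, inputs):
        # The forward pass is ordinary Python, so it can branch or loop freely
        x = self.flatten(inputs)
        x = self.hidden(x)
        return self.logits(x)

tiny = TinyMLP()
out = tiny(tf.zeros((2, 32, 32, 3)))   # batch of two blank 32x32 RGB images
print(out.shape)
```

A subclassed model still inherits `compile()`, `fit()`, `evaluate()` and `predict()`, so the training conveniences described above remain available.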

#### Training and validation with the Keras API

##### The `compile()` method

This `Model` class method is essential for putting together all the pieces in building a model. The [`compile()`](https://www.tensorflow.org/api_docs/python/tf/keras/Model#compile) API allows us to quickly initialise our model instance with all sorts of useful [optimizers](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers), [metric](https://www.tensorflow.org/api_docs/python/tf/keras/metrics) and [loss](https://www.tensorflow.org/api_docs/python/tf/keras/losses) functions and be up and running in no time.

##### The `fit()` method

The `Model` [`fit()`](https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit) method is at the heart of our model training process. This class method provides us with all the parameters we need to go from a hollow, untrained model to a fully-trained object detector, image classifier, etc. Our training hyperparameters (the _batch size_, _epochs_, etc.) are supplied as input parameters to this method along with our dataset and any considerations we have for it (e.g., whether to `shuffle` the input data, perform a `validation_split` over it, etc.). Lastly, the model `fit()` method is where we supply any `callbacks` we want to run alongside the training process.

Once your input parameters are good to go, training a model is really just as simple as executing this line of code. The `fit()` method will even handle printing out updates to the console for you — a nice progress bar and some helpful metrics at every iteration of training. No wonder Keras has been [such a hit](https://blog.tensorflow.org/2018/12/standardizing-on-keras-guidance.html) at Google!
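
The `compile()` and `fit()` calls described above can be tried end to end on throwaway data. This is only a sketch: the random arrays, the model size and the hyperparameter values are hypothetical stand-ins, not the GTSRB setup used later.

```python
import numpy as np
import tensorflow as tf

# Hypothetical random data standing in for a real dataset
rng = np.random.default_rng(0)
x_toy = rng.random((256, 8)).astype('float32')
y_toy = rng.integers(0, 3, size=(256,))

toy_model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(3),            # raw logits for 3 classes
])

# compile(): attach the optimiser, loss and metrics
toy_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'])

# fit(): the whole training and validation loop in one call
toy_history = toy_model.fit(x_toy, y_toy, batch_size=32, epochs=2,
                            validation_split=0.25, shuffle=True, verbose=0)
print(sorted(toy_history.history.keys()))
```

The returned `History` object stores one value per epoch for each tracked quantity, which is what makes plotting the training curves afterwards straightforward.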

##### The `callbacks` functionality

In addition to the benefits mentioned above, using the Keras `fit()` API for training also gives us the ability to specify [`callbacks`](https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/Callback). Callbacks are objects that get passed into the `fit()`, `evaluate()` and `predict()` methods of a model in order to customise their behaviour or run your own code at regular intervals (e.g., at the end of each epoch). While some callback functionality is baked in by default (e.g., [`callbacks.History`](https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/History)), several other popular callbacks exist. One of those is the [`callbacks.LearningRateScheduler`](https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/LearningRateScheduler), which calls a user-supplied schedule function at the start of each epoch to set the optimiser's learning rate. Note that built-in schedules such as [`optimizers.schedules.ExponentialDecay`](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/schedules/ExponentialDecay) are a separate mechanism: they are passed directly to the optimiser rather than used as callbacks. Through [_subclassing_](https://www.tensorflow.org/guide/keras/custom_layers_and_models) the base `callbacks.Callback` class, we can also override some of the callback methods to do things like update our model's learning rate using a custom function on every epoch.
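
As a rough illustration of what an exponential-decay learning rate schedule computes, the documented formula is `initial_lr * decay_rate ** (step / decay_steps)`, with the exponent rounded down when `staircase=True`. The pure-Python function below is a hypothetical re-implementation for intuition only; its default values mirror the ones set later in this notebook.

```python
def exponential_decay(step, initial_lr=1e-3, decay_rate=0.96,
                      decay_steps=3082 * 10, staircase=True):
    """Learning rate after `step` optimiser updates."""
    if staircase:
        exponent = step // decay_steps     # decay in discrete jumps
    else:
        exponent = step / decay_steps      # decay smoothly at every step
    return initial_lr * decay_rate ** exponent

for step in (0, 30819, 30820, 61640):
    print(step, exponential_decay(step))
```

With `staircase=True` the rate stays constant within each interval of `decay_steps` updates and is multiplied by `decay_rate` (here, reduced by 4%) at each interval boundary.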

One last callback worth mentioning here for our own use is [`callbacks.TensorBoard`](https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/TensorBoard). This callback allows our model to log certain events throughout the training process, such as metrics summaries or weight updates, in order to visualise them in detail during the training cycle. The [TensorBoard dashboard](https://www.tensorflow.org/tensorboard/get_started) provides a nice graphical interface to monitor real-time, interactive plots, graphs and charts of important training metrics and other stats. We can even use these real-time plots in TensorBoard to [visualise layer weights](http://cs231n.github.io/understanding-cnn/) or [monitor confusion matrices](https://www.tensorflow.org/tensorboard/image_summaries#building_an_image_classifier) on each epoch, all with the help of callbacks.
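
The callback mechanism can be sketched with one custom callback and one built-in one. `EpochLogger` and `halve_after_first_epoch` are hypothetical names, and the random data exists only to drive a few epochs of training.

```python
import numpy as np
import tensorflow as tf

class EpochLogger(tf.keras.callbacks.Callback):
    """Hypothetical callback that records the training loss after each epoch."""

    def __init__(self):
        super().__init__()
        self.losses = []

    def on_epoch_end(self, epoch, logs=None):
        self.losses.append((logs or {}).get('loss'))

def halve_after_first_epoch(epoch, lr):
    """Schedule function: keep the rate for epoch 0, then halve it each epoch."""
    return float(lr) if epoch == 0 else float(lr) * 0.5

# Hypothetical random data, only here to drive the training loop
x_cb = np.random.rand(64, 4).astype('float32')
y_cb = np.random.randint(0, 2, size=(64,))

cb_model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(2),
])
cb_model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.1),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

epoch_logger = EpochLogger()
lr_callback = tf.keras.callbacks.LearningRateScheduler(halve_after_first_epoch)
cb_model.fit(x_cb, y_cb, epochs=3, verbose=0,
             callbacks=[epoch_logger, lr_callback])
print(epoch_logger.losses)
```

A `callbacks.TensorBoard` instance would be passed in the same `callbacks` list; it is omitted from this sketch because it writes log files to disk.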

## 2. Programming Task

From the Introduction, you should by now have a decent overview of TensorFlow and some of the essential functionality Keras brings to it. In this section, we will implement a Feedforward Neural Network (FNN) for image classification on the German Traffic Sign Recognition Benchmark (GTSRB) dataset, starting with a simple `Sequential` model architecture and adding complexity from there.

### 2.1. Feedforward Neural Networks (FNNs)

The neural network you create should have fewer than 4 layers, including the output layer. This last layer should not be activated (i.e., it should output raw logits). Take the time to experiment with different architectures (number of layers, number of neurons) and see how they impact the results.
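
One possible architecture that fits these constraints is sketched below. It is not the reference solution: the function name `create_network_sketch`, the hidden layer sizes (128 and 64) and the default arguments are assumptions, with the input shape matching the 32x32 RGB images used in this notebook and 43 being the number of GTSRB classes. Only the three `Dense` layers are counted towards the layer limit here.

```python
import tensorflow as tf

def create_network_sketch(input_shape=(32, 32, 3), n_classes=43):
    """Return a small, uncompiled feedforward Keras model (illustrative only)."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=input_shape),
        tf.keras.layers.Flatten(),                        # 32*32*3 = 3072 features
        tf.keras.layers.Dense(128, activation='relu'),    # hidden layer 1
        tf.keras.layers.Dense(64, activation='relu'),     # hidden layer 2
        tf.keras.layers.Dense(n_classes),                 # output layer: raw logits
    ], name='FeedforwardNeuralNetwork')
    return model

sketch_model = create_network_sketch()
sketch_model.summary()
```

Because the output layer has no activation, the loss function needs to be told to expect logits (e.g., `from_logits=True` for a cross-entropy loss).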

In [None]:
### From Udacity's `training.py`

In [None]:
def create_network():
    """ output a keras model """
    # IMPLEMENT THIS FUNCTION
    return 

In [None]:
### Defining our model parameters

In [None]:
model_params = {}

In [None]:
model_params.update({'model_name': 'FeedforwardNeuralNetwork'})

In [None]:
### Initialising the FNN
model = create_network()

### 2.2. Modelling with TensorFlow Keras API

In [None]:
### Defining our optimizer hyperparameters

In [None]:
model_params.update({'learning_rate': 1e-3})

In [None]:
decay = False                                     # Whether or not to use a learning rate schedule

In [None]:
initial_lr = model_params['learning_rate']
decay_rate = 0.96                                 # Multiplicative decay factor (learning rate decreases by 4% per interval)
decay_steps = 3082 * 10                           # When to modify learning rate (every interval of `decay_steps`)

In [None]:
### Initialising a learning rate schedule
# See: https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/schedules/ExponentialDecay

In [None]:
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
                    initial_learning_rate=initial_lr,
                    decay_steps=decay_steps,
                    decay_rate=decay_rate,
                    staircase=True)

In [None]:
model_params.update({'lr_schedule': lr_schedule})

In [None]:
### Selecting the optimiser and activation functions 
# See: https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/Adam

In [None]:
optimizer = tf.keras.optimizers.Adam(learning_rate=model_params['learning_rate'] if not decay else (
                                                   model_params['lr_schedule']),
                                     beta_1=0.9,
                                     beta_2=0.999,
                                     epsilon=1e-07,
                                     amsgrad=False,
                                     name='Adam')

In [None]:
### Choosing the loss and performance metrics
# See: https://www.tensorflow.org/api_docs/python/tf/keras/losses/SparseCategoricalCrossentropy
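# See: https://www.tensorflow.org/api_docs/python/tf/keras/metrics/SparseCategoricalAccuracy

In [None]:
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

In [None]:
# Integer labels compared against logits need the sparse categorical variant;
# naming it 'accuracy' matches the `history` keys used when plotting
accuracy_fn = tf.keras.metrics.SparseCategoricalAccuracy(name='accuracy')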

In [1]:
### Compiling the model
# See: https://www.tensorflow.org/api_docs/python/tf/keras/Model#compile

In [None]:
model.compile(optimizer=optimizer,
              loss=loss_fn,
              metrics=[accuracy_fn])

### 2.3. Training and validation

In [None]:
### Setting the training hyperparameters

In [None]:
model_params.update({'epochs': 10})
model_params.update({'batch_size': 128})
model_params.update({'shuffle': True})

In [None]:
### Usage of the `fit()` method
# See: https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit

In [None]:
### Usage of the `callbacks` argument
# See: https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/History
# See: https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/ModelCheckpoint

In [None]:
### From Udacity's `utils.py`

In [None]:
def get_module_logger(mod_name):
    ### Setting up the console logger and formatter
    logger = logging.getLogger(mod_name)
    handler = logging.StreamHandler()
    formatter = logging.Formatter('%(asctime)s %(levelname)-8s %(message)s')
    handler.setFormatter(formatter)
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)
    ### Prevent messages going to root handler
    logger.propagate = False
    return logger

In [None]:
logger = get_module_logger(__name__)

### 2.4. Evaluation on the GTSRB dataset

#### Considerations for our input data

In [None]:
### Defining our input image specs

In [None]:
image_size = (32, 32)          # Each RGB image has 32x32 px resolution
n_features = (32 * 32) * 3     # Each pixel value is considered an attribute (feature)
batch_size = 128               # Using batch size of 128 (mini-batching as in Kingma and Ba [1])

#### Putting it all together

##### Fetching the GTSRB data

You will need to specify the `--imdir`, e.g. `--imdir GTSRB/Final_Training/Images/`, using the provided GTSRB dataset.

In [None]:
imdir = os.path.join(DIR_SRC, 'GTSRB/Final_Training/Images')

The following `get_datasets()` function returns a tuple of [`tf.data.Dataset`](https://www.tensorflow.org/api_docs/python/tf/data/Dataset) instances containing the training and validation datasets, respectively.

In [None]:
### From Udacity's `utils.py`

In [None]:
def get_datasets(imdir: str) -> tuple:
    """Return the training and validation datasets.
    
    :param imdir: absolute path to the directory where the data is stored in.
    :returns: (train_dataset, validation_dataset), tuple of tf.data.Dataset instances.
    """
    
    train_dataset = tf.keras.preprocessing.image_dataset_from_directory(
                        imdir,
                        labels='inferred',
                        label_mode='int',
                        color_mode='rgb',
                        batch_size=batch_size,
                        image_size=image_size,
                        shuffle=True,
                        seed=123,
                        validation_split=0.1,
                        subset='training',
    )
    validation_dataset = tf.keras.preprocessing.image_dataset_from_directory(
                        imdir,
                        labels='inferred',
                        label_mode='int',
                        color_mode='rgb',
                        batch_size=batch_size,
                        image_size=image_size,
                        shuffle=True,
                        seed=123,
                        validation_split=0.1,
                        subset='validation',
    )
    return train_dataset, validation_dataset

In [None]:
### Fetching the training and validation datasets
train_dataset, val_dataset = get_datasets(imdir)

##### Processing the image data

In [None]:
### Number of features (pixel values) in a single image
train_iter = iter(train_dataset)
images, labels = next(train_iter)         # One batch of images and labels
len(images[0].numpy().flatten())          # Flatten only the first image in the batch

In [None]:
def process(image, label):
    """Small function to normalize input images."""
    image = tf.cast(image / 255., tf.float32)
    return image, label

In [None]:
### Scaling the image data

In [None]:
train_dataset_scaled = train_dataset.map(process)

##### Performing the training and validation loops

In [None]:
### From Udacity's `training.py`

In [None]:
logger.info(f"Training for {model_params['epochs']} epochs using {imdir} data")
# Using the model `fit()` API call for training (on the scaled image data)
history = model.fit(x=train_dataset_scaled,
                    epochs=model_params['epochs'],
                    validation_data=val_dataset.map(process))

##### Visualising the results

Lastly, at the end of training, you will need to be in the `Desktop` view to see the metrics visualization.

In [None]:
### From Udacity's `utils.py`

In [None]:
def display_metrics(history):
    """ plot loss and accuracy from keras history object """
    f, ax = plt.subplots(1, 2, figsize=(15, 5))
    ax[0].plot(history.history['loss'], linewidth=3)
    ax[0].plot(history.history['val_loss'], linewidth=3)
    ax[0].set_title('Loss', fontsize=16)
    ax[0].set_ylabel('Loss', fontsize=16)
    ax[0].set_xlabel('Epoch', fontsize=16)
    ax[0].legend(['train loss', 'val loss'], loc='upper right')
    ax[1].plot(history.history['accuracy'], linewidth=3)
    ax[1].plot(history.history['val_accuracy'], linewidth=3)
    ax[1].set_title('Accuracy', fontsize=16)
    ax[1].set_ylabel('Accuracy', fontsize=16)
    ax[1].set_xlabel('Epoch', fontsize=16)
    ax[1].legend(['train acc', 'val acc'], loc='upper left')
    plt.show()

In [None]:
### From Udacity's `training.py`

In [None]:
display_metrics(history)

## Tips

You can leverage `tf.keras.Sequential` to stack layers in your network and `tf.keras.layers` to create the different layers.

## Credits

This assignment was prepared by Thomas Hossler and Michael Virgo et al., Winter 2021 (link [here](https://www.udacity.com/course/self-driving-car-engineer-nanodegree--nd0013)).



References
* [1] Kingma, D. and Ba, J. Adam: A Method for Stochastic Optimization. arXiv (2014). [doi:10.48550/arXiv.1412.6980](https://arxiv.org/abs/1412.6980v9).


Helpful resources:
* [Feedforward Neural Networks | Brilliant.org](https://brilliant.org/wiki/feedforward-neural-networks/)
* [Standardizing on Keras: Guidance on High-level APIs in TensorFlow 2.0 | Google TensorFlow Blog](https://blog.tensorflow.org/2018/12/standardizing-on-keras-guidance.html)