# Implementing The AlexNet Convolutional Neural Network

This notebook details the steps taken to implement AlexNet, a deep convolutional neural network introduced in 2012 by Alex Krizhevsky et al. in the paper ["ImageNet Classification with Deep Convolutional Neural Networks"](https://proceedings.neurips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf). To build the network, we use TensorFlow and Keras, which provide the layers and methods required to assemble a convolutional neural network.

**[TensorFlow](https://www.tensorflow.org/)**: An open-source platform for implementing, training, and deploying machine learning models.

**[Keras](https://keras.io/)**: An open-source library used to implement neural network architectures that run on both CPUs and GPUs.

## Installation

Installing TensorFlow and Keras is simple when using a package manager such as [pip](https://www.tensorflow.org/install) or [conda](https://docs.anaconda.com/anaconda/user-guide/tasks/tensorflow/).

Here are examples of installing TensorFlow.
You don't need to install Keras explicitly, as it is a built-in API within TensorFlow 2+.

For a CPU-only setup, install TensorFlow with the following command:
`conda install tensorflow`

If you have a GPU installed, use the following command instead:
`conda install tensorflow-gpu`

Alternatively, if you are using pip:
`pip install tensorflow`
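
To confirm that the installation succeeded before running the rest of the notebook, a small check along these lines can help. It is a minimal sketch: the helper name `installed_version` is hypothetical, the distribution name is assumed to be `tensorflow`, and only the standard library's `importlib.metadata` is used, so the snippet runs whether or not TensorFlow is present.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string when TensorFlow is installed, otherwise None.
print("tensorflow:", installed_version("tensorflow"))
```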


In [1]:
import tensorflow as tf
from tensorflow import keras

## Convolutional Neural Network (CNN) Implementation Components

1. [Keras Layers API](https://keras.io/api/layers) (keras.layers)
    
    Layers are the fundamental building blocks of neural networks in Keras.
    Each layer holds its weights and a function that transforms its inputs into outputs, which are typically passed on to the next layer.
    
2. [Conv2D Layer](https://keras.io/api/layers/convolution_layers/convolution2d/) (keras.layers.Conv2D)

    We use the `tf.keras.layers.Conv2D` class to construct the convolutional layers of the network.
    In a convolutional layer, each kernel/filter slides over the input and a convolution operation takes place between the filter's weight values and the input values it covers. The results of these operations form the output of the layer.

    The `keras.layers.Conv2D` constructor takes several arguments. Below are the ones used within this notebook:
    - `filters`: The number of kernels/filters in the layer, and therefore the number of channels in its output. A kernel/filter is the window of weight values that is convolved with the input values.
    - `kernel_size`: The height and width of each filter, which sets the local receptive field size of each unit or neuron within the convolutional layer.
    - `strides`: The amount the filter/sliding window shifts over the input image at each step.
    - `activation`: The activation function applied to the result of the convolution, for example `'relu'`.
    - `input_shape`: The expected shape of the input data to be fed forward through the neural network, excluding the batch dimension.
    - `padding`: A value of 'same' pads the input with zeros around the edges so that, with a stride of 1, the output has the same height and width as the input. A value of 'valid' applies no padding, so the filter/kernel of the convolutional layer only operates on positions where it fits entirely within the input image.
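
The convolution operation described above can be sketched in plain Python for a single-channel image and a single filter. This is an illustration only, not how Keras implements `Conv2D`: the helper name `convolve_2d` and the example values are made up, there is no bias or activation, and only 'valid' padding is handled. As in deep learning libraries, the kernel is applied without being flipped.

```python
def convolve_2d(image, kernel, stride=1):
    """Slide `kernel` over `image` ('valid' padding) and sum the elementwise products."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    output = []
    for top in range(0, len(image) - k_rows + 1, stride):
        row = []
        for left in range(0, len(image[0]) - k_cols + 1, stride):
            # Multiply each kernel weight by the pixel it covers and add them up.
            row.append(sum(
                image[top + r][left + c] * kernel[r][c]
                for r in range(k_rows)
                for c in range(k_cols)
            ))
        output.append(row)
    return output


image = [
    [1, 2, 3, 0],
    [4, 5, 6, 1],
    [7, 8, 9, 2],
    [0, 1, 2, 3],
]
kernel = [
    [1, 0],
    [0, -1],
]
print(convolve_2d(image, kernel))            # [[-4, -4, 2], [-4, -4, 4], [6, 6, 6]]
print(convolve_2d(image, kernel, stride=2))  # [[-4, 2], [6, 6]]
```

A larger stride skips positions, which is why the second result is smaller than the first.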


3. [BatchNormalization Layer](https://keras.io/api/layers/normalization_layers/batch_normalization/) (keras.layers.BatchNormalization)
    
    Batch Normalization (BN) is a technique that mitigates the effect of unstable gradients within deep neural networks. BN introduces an additional layer to the neural network that performs operations on the inputs from the previous layer.
    The operation standardizes the input values using the mean and variance of the current batch. The standardized values are then transformed through learnable scaling and shifting operations.
    Batch normalization was presented in 2015 by Sergey Ioffe and Christian Szegedy in this [paper](https://arxiv.org/pdf/1502.03167.pdf).
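
The standardize-then-scale-and-shift steps can be sketched for a single feature across one batch as follows. This is a simplified training-time illustration, not the Keras implementation: the helper name `batch_norm` is hypothetical, the `epsilon` value is chosen for illustration, and the moving averages that the Keras layer keeps for inference are omitted.

```python
import math


def batch_norm(values, gamma=1.0, beta=0.0, epsilon=1e-3):
    """Normalize a batch of activations for one feature, then scale and shift."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    # Standardize to zero mean and approximately unit variance.
    normalized = [(v - mean) / math.sqrt(variance + epsilon) for v in values]
    # gamma (scale) and beta (shift) stand in for the parameters the layer learns.
    return [gamma * n + beta for n in normalized]


batch = [2.0, 4.0, 6.0, 8.0]
print(batch_norm(batch))                       # centred on 0
print(batch_norm(batch, gamma=2.0, beta=1.0))  # centred on 1, spread doubled
```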

4. [Max Pooling Layer](https://keras.io/api/layers/pooling_layers/max_pooling2d/) (keras.layers.MaxPool2D)

    Max pooling is a type of sub-sampling in which the output is the maximum pixel value among the pixels that fall within the receptive field of a unit in the sub-sampling layer.
    
    | 30 | 28  | 28 | 184 |
    |----|-----|----|-----|
    | 0  | 100 | 12 | 98  |
    | 12 | 11  | 9  | 4   |
    | 12 | 1   | 45 | 6   |

    With a 2x2 pooling window and a stride of 2, the max-pooling operation transforms the pixel values above into the set of pixel values below.
    
    | 100 | 184 |
    |-----|-----|
    | 12  | 45  |

    The max-pooling layer constructor `keras.layers.MaxPool2D` takes the following arguments:
    - `pool_size`: The dimensions of the sliding window; the maximum of the pixel values that fall within the window is taken as the output.
    - `strides`: The amount the pooling window moves across the input data after each pooling operation.
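
The pooling example in the tables above can be reproduced with a few lines of plain Python. This is a sketch for a single-channel 2-D input without padding; the helper name `max_pool_2d` is hypothetical and is not part of Keras.

```python
def max_pool_2d(image, pool_size=2, stride=2):
    """Max pooling over a 2-D list of pixel values, without padding."""
    rows, cols = len(image), len(image[0])
    pooled = []
    for top in range(0, rows - pool_size + 1, stride):
        pooled_row = []
        for left in range(0, cols - pool_size + 1, stride):
            # Collect the pixels covered by the window and keep the largest.
            window = [
                image[r][c]
                for r in range(top, top + pool_size)
                for c in range(left, left + pool_size)
            ]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


pixels = [
    [30, 28, 28, 184],
    [0, 100, 12, 98],
    [12, 11, 9, 4],
    [12, 1, 45, 6],
]
print(max_pool_2d(pixels))  # [[100, 184], [12, 45]]
```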
    
5. [Flatten Layer](https://keras.io/api/layers/reshaping_layers/flatten/) (keras.layers.Flatten)

    The Flatten layer is one of the reshaping layers Keras provides to modify the dimensionality of inputs.
    The Flatten class collapses every dimension of its input, except the batch dimension, into a single dimension.
    Image data is multidimensional, whereas dense layers expect one feature vector per example, so the data must be made 1-dimensional before it is fed forward through the dense layers of the network.
    For example, an input to the Flatten layer with the shape (None, 10, 2) produces an output with the shape (None, 20).

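    The input shape of the first layer of a neural network should match the shape of the input data; hence the `input_shape` argument of the Flatten layer is (28, 28) when it is the first layer of a network that uses the FashionMNIST dataset (shown in the notebook 02_image_classification_with_DNN). In this notebook the first layer is a Conv2D layer, so `input_shape` is set there instead.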
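
The shape arithmetic behind flattening can be illustrated without Keras. In this sketch, `flatten_shape` and `flatten` are hypothetical helpers: the first mirrors how the batch axis is kept while the remaining axes are merged, and the second flattens nested Python lists.

```python
from math import prod


def flatten_shape(shape):
    """Output shape of a flatten step: keep the batch axis, merge the rest."""
    batch, *rest = shape
    return (batch, prod(rest))


def flatten(nested):
    """Flatten an arbitrarily nested list of numbers into a single list."""
    flat = []
    for item in nested:
        if isinstance(item, list):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


print(flatten_shape((None, 10, 2)))       # (None, 20)
print(flatten([[1, 2], [3, 4], [5, 6]]))  # [1, 2, 3, 4, 5, 6]
```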

6. [Dense Layer](https://keras.io/api/layers/core_layers/dense/) (keras.layers.Dense)

    The dense layer houses neurons within the neural network. The `units` argument specifies the number of neurons within a dense layer. Every neuron/unit within the dense layer receives input from all the outputs of the previous layer.
    The dense layer computes a matrix-vector multiplication between its input data and the learnable weights of the layer, adds the biases, and then applies the activation function.
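
The dense-layer computation can be written out in plain Python as a sketch. The helper name `dense` and the example numbers are made up for illustration; the weights are laid out with one row per input and one column per unit, and the activation function is left out.

```python
def dense(inputs, weights, biases):
    """Dense layer without activation.

    outputs[j] = sum over i of inputs[i] * weights[i][j], plus biases[j]
    """
    return [
        sum(x * weights[i][j] for i, x in enumerate(inputs)) + biases[j]
        for j in range(len(biases))
    ]


# Three inputs feeding two units: weights has shape (3, 2), biases has shape (2,).
inputs = [1.0, 2.0, 3.0]
weights = [[0.5, -1.0], [0.0, 2.0], [1.0, 1.0]]
biases = [0.5, -0.5]
print(dense(inputs, weights, biases))  # [4.0, 5.5]
```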

7. [Activation Functions](https://keras.io/api/layers/activations/) (keras.activations.relu / keras.activations.softmax)

    Activation function: A mathematical operation that transforms the result or signals of neurons into the output passed on to the next layer. An activation function is the component of a neural network that introduces non-linearity into the network. Including activation functions gives the neural network greater representational power and enables it to approximate complex functions.

    **Examples of Activation functions**

    ReLU activation: Stands for 'rectified linear unit'. It transforms the value produced by a neuron according to the formula y = max(0, x): negative values are clamped to 0, and positive values remain unchanged. The transformed value is used as the output of the current layer and as the input to the next.

    Softmax: An activation function that converts an input vector of numbers into a probability distribution. The output of a softmax activation function is a vector in which each value represents the probability of a class/event. The values within the vector all add up to 1.
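
Both activation functions are short enough to write out by hand. The following is a plain-Python sketch rather than the Keras implementation; subtracting the maximum inside `softmax` is a common numerical-stability step that leaves the result unchanged.

```python
import math


def relu(x):
    """Rectified linear unit: negative values become 0, positive values pass through."""
    return max(0.0, x)


def softmax(values):
    """Convert a list of scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing on large scores.
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


print([relu(x) for x in [-2.0, 0.0, 3.5]])  # [0.0, 0.0, 3.5]
probabilities = softmax([1.0, 2.0, 3.0])
print(probabilities, sum(probabilities))
```

The largest score receives the largest probability, which is why the class with the highest softmax output is taken as the prediction.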

8. [Dropout Layer](https://keras.io/api/layers/regularization_layers/dropout/) (keras.layers.Dropout)

    Dropout is a technique that is utilized to reduce a model's potential to overfit.
    During training, dropout randomly deactivates neurons within the layers of a CNN. The dropout rate is the probability that a given neuron's output is set to zero for the current feedforward step, in which case the neuron also takes no part in the corresponding backpropagation step. At inference time, no neurons are dropped.
    Dropout is useful as it reduces each neuron's dependence on neighbouring neurons; each neuron learns more useful features as a result.
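
The behaviour of dropout during training and inference can be sketched as follows. This is an illustrative plain-Python version, not the Keras implementation: the helper name `dropout` and its `rng` parameter are hypothetical. Like the Keras layer, it scales the surviving values by 1 / (1 - rate) so that their expected sum is unchanged.

```python
import random


def dropout(values, rate, training=True, rng=random):
    """Inverted dropout: zero each value with probability `rate` during training."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in the range [0, 1)")
    if not training or rate == 0.0:
        # At inference time the input passes through unchanged.
        return list(values)
    keep_scale = 1.0 / (1.0 - rate)
    # Dropped values become 0; surviving values are scaled up.
    return [0.0 if rng.random() < rate else v * keep_scale for v in values]


rng = random.Random(0)  # fixed seed so the example is repeatable
activations = [1.0, 2.0, 3.0, 4.0]
print(dropout(activations, rate=0.5, rng=rng))
print(dropout(activations, rate=0.5, training=False))  # [1.0, 2.0, 3.0, 4.0]
```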

**What we are Implementing**

![AlexNet](./images/notebook3.PNG)


**Below is a deep convolutional neural network implementation using all the layers and components described above**

In [2]:
model = keras.models.Sequential([
    keras.layers.Conv2D(filters=96, kernel_size=(11,11), strides=(4,4), activation='relu', input_shape=(227,227,3)),
    keras.layers.BatchNormalization(),
    keras.layers.MaxPool2D(pool_size=(3,3), strides=(2,2)),
    keras.layers.Conv2D(filters=256, kernel_size=(5,5), strides=(1,1), activation='relu', padding="same"),
    keras.layers.BatchNormalization(),
    keras.layers.MaxPool2D(pool_size=(3,3), strides=(2,2)),
    keras.layers.Conv2D(filters=384, kernel_size=(3,3), strides=(1,1), activation='relu', padding="same"),
    keras.layers.BatchNormalization(),
    keras.layers.Conv2D(filters=384, kernel_size=(3,3), strides=(1,1), activation='relu', padding="same"),
    keras.layers.BatchNormalization(),
    keras.layers.Conv2D(filters=256, kernel_size=(3,3), strides=(1,1), activation='relu', padding="same"),
    keras.layers.BatchNormalization(),
    keras.layers.MaxPool2D(pool_size=(3,3), strides=(2,2)),
    keras.layers.Flatten(),
    keras.layers.Dense(4096, activation='relu'),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(4096, activation='relu'),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(10, activation='softmax')
])
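
To see how the 227x227 input shrinks as it passes through the layers above, the spatial sizes can be traced by hand. The sketch below assumes the output-size rules TensorFlow applies for 'valid' and 'same' padding; the helper `output_size` and the layer descriptions are hypothetical and exist only for this illustration.

```python
import math


def output_size(size, padding, window, stride):
    """Height (or width) after a convolution or pooling layer."""
    if padding == "same":
        # With zero padding, only the stride reduces the size.
        return math.ceil(size / stride)
    # 'valid': the window must fit entirely inside the input.
    return (size - window) // stride + 1


# (description, padding, window size, stride) for each layer that can change
# the spatial size. BatchNormalization and Dropout keep shapes unchanged.
# For 'same' padding the window size does not affect the result, so it is None.
layers = [
    ("Conv2D, 96 filters", "valid", 11, 4),
    ("MaxPool2D", "valid", 3, 2),
    ("Conv2D, 256 filters", "same", None, 1),
    ("MaxPool2D", "valid", 3, 2),
    ("Conv2D, 384 filters", "same", None, 1),
    ("Conv2D, 384 filters", "same", None, 1),
    ("Conv2D, 256 filters", "same", None, 1),
    ("MaxPool2D", "valid", 3, 2),
]

size = 227
sizes = [size]
for description, padding, window, stride in layers:
    size = output_size(size, padding, window, stride)
    sizes.append(size)
    print(f"{description:<22} -> {size} x {size}")

# The last feature map has 256 channels, so Flatten yields size * size * 256 values.
print("Flatten ->", size * size * 256)  # Flatten -> 9216
```

The trace goes 227, 55, 27, 27, 13, 13, 13, 13, 6, so the first dense layer receives a vector of 6 x 6 x 256 = 9216 values.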

## Convolutional Neural Network Structural Information

A structural summary of the neural network implemented above is obtained by calling the `summary` method on our model. The summary lists each layer with its type and output shape, along with the number of parameters (weights) in each layer and in the model as a whole.

[Keras documentation reference](https://keras.io/api/models/model/#summary-method)
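
The `Param #` column that `summary` reports can be reproduced by hand, which is a useful sanity check on a model definition. The helpers below are hypothetical and rest on two assumptions: every filter has one bias, and a batch normalization layer stores four values per channel (scale, shift, moving mean, and moving variance).

```python
def conv2d_params(kernel_height, kernel_width, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return (kernel_height * kernel_width * in_channels + 1) * filters


def batch_norm_params(channels):
    """Scale, shift, moving mean and moving variance for each channel."""
    return 4 * channels


# First convolution: 11x11 kernels over 3 input channels, 96 filters.
print(conv2d_params(11, 11, 3, 96))   # 34944
print(batch_norm_params(96))          # 384
# Second convolution: 5x5 kernels over 96 input channels, 256 filters.
print(conv2d_params(5, 5, 96, 256))   # 614656
print(batch_norm_params(256))         # 1024
```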

In [3]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 55, 55, 96)        34944     
_________________________________________________________________
batch_normalization (BatchNo (None, 55, 55, 96)        384       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 27, 27, 96)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 27, 27, 256)       614656    
_________________________________________________________________
batch_normalization_1 (Batch (None, 27, 27, 256)       1024      
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 13, 13, 256)       0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 13, 13, 384)       8