# The Complete Convolutional Neural Network with Python 2022

Artificial Neural Network (ANN)

The structure of a neural network consists of an input layer, one or more hidden layers and an output layer. Information is received by the input layer and then transferred to the hidden layers, where it is processed. Finally, after all the information has been processed, the result is released by the output layer.
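
As a minimal sketch of this flow, the NumPy snippet below pushes one input vector through a single hidden layer and an output layer. The layer sizes, the random weights and the names `W1`, `b1`, `W2`, `b2` are assumptions chosen purely for illustration; a real network would learn these values during training.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 4 input features, 5 hidden neurons, 3 output neurons
n_inputs, n_hidden, n_outputs = 4, 5, 3

# Randomly initialised weights and zero biases (illustrative, not trained)
W1 = rng.normal(size=(n_inputs, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(n_hidden, n_outputs))
b2 = np.zeros(n_outputs)

def forward(x):
    """Input layer -> hidden layer (ReLU) -> output layer (linear)."""
    hidden = np.maximum(0.0, x @ W1 + b1)
    return hidden @ W2 + b2

x = np.array([0.5, -1.0, 2.0, 0.1])
output = forward(x)
print(output.shape)  # (3,)
```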

Convolutional Neural Network (CNN)

CNNs are ANNs. CNNs are made of neurons that are connected to one another by weighted branches (weights); the training parameters of the network are once again the weights and biases. In CNNs, the connection pattern between neurons is inspired by the structure of the visual cortex in the animal world. The individual neurons in this part of the brain respond to certain stimuli in a narrow region of the field of observation, called the receptive field.

The receptive fields of different neurons partially overlap so that together they cover the entire field of vision. The response of a single neuron to stimuli taking place in its receptive field can be mathematically approximated by a convolution operation.

CNNs contain input layers, convolution layers, pooling layers, fully connected layers and output layers.

Convolution Layer

This is the main type of layer; the use of one or more of these layers in a CNN is essential. Convolution layers have neurons that are organized in 3 dimensions:

    Width
    Height
    Depth

During forward propagation, each filter, which is spatially small (along the width and height dimensions) but extends over the entire depth of the input volume to which it is applied, is slid (convolved) across the width and height of the input volume, producing a 2D activation map (or feature map) for that filter. As the filter moves along the input area, a scalar product is computed between the values of the filter and those of the input portion to which it is applied.

The goal is to learn filters that activate in the presence of some specific type of feature in a given spatial region of the input. Stacking all of these feature maps (one per filter) along the depth dimension forms the output volume of a convolution layer.

Each element of this volume can be interpreted as the output of a neuron that observes only a small region of the input and shares its parameters with the other neurons in the same feature map, because these values all come from applying the same filter.
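
The sliding scalar product described above can be sketched in plain NumPy. This is an illustrative, unoptimised version, not how Keras implements it; the function name `conv2d_single_filter`, the toy input and the two hand-made filters are assumptions. As in most deep learning libraries, the filter is applied without being flipped.

```python
import numpy as np

def conv2d_single_filter(volume, kernel, stride=1):
    """Slide one filter over an input volume (no padding).

    volume: array of shape (height, width, depth)
    kernel: array of shape (k_h, k_w, depth); it spans the full input depth
    Returns the 2D feature map produced by this filter.
    """
    h, w, d = volume.shape
    k_h, k_w, k_d = kernel.shape
    assert d == k_d, "the filter must extend over the entire input depth"
    out_h = (h - k_h) // stride + 1
    out_w = (w - k_w) // stride + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            patch = volume[r:r + k_h, c:c + k_w, :]
            # Scalar product between the filter and the input patch
            feature_map[i, j] = np.sum(patch * kernel)
    return feature_map

# Toy 5 x 5 input with depth 2 and two hypothetical 3 x 3 filters
volume = np.arange(5 * 5 * 2, dtype=float).reshape(5, 5, 2)
filters = [np.ones((3, 3, 2)), np.zeros((3, 3, 2))]

# One feature map per filter, stacked along the depth dimension
output_volume = np.stack(
    [conv2d_single_filter(volume, f) for f in filters], axis=-1
)
print(output_volume.shape)  # (3, 3, 2)
```

Every value in one feature map is computed with the same `kernel`, which is the parameter sharing mentioned above.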

Pooling Layers

These layers are inserted into the network to reduce the spatial size (width and height) of the current representations, and therefore of the volumes at a given stage of the network. This reduces the number of parameters and the computation time of the network, and it also helps control overfitting. A pooling layer operates on each depth slice of the input volume independently to resize it spatially.

For each feature map obtained in the convolution step, we slide a small window over it and keep the maximum value in each window, which shrinks the input.

Steps:

    Pick a window size (usually 2 or 3).
    Pick a stride, the number of pixels the window moves at each step (usually 2).
    Slide the window across the filtered images.

For each window, the maximum value will be chosen.

A pooling layer divides its input into regions and selects a single representative value for each one (the maximum in max pooling, the mean in average pooling).
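
The window-and-stride procedure can be sketched in NumPy as follows. The helper name `pool2d` and the 4 x 4 example values are assumptions made for illustration; the sketch handles a single depth slice and uses no padding.

```python
import numpy as np

def pool2d(feature_map, window=2, stride=2, mode="max"):
    """Pool one 2D depth slice with a square window (no padding)."""
    h, w = feature_map.shape
    out_h = (h - window) // stride + 1
    out_w = (w - window) // stride + 1
    reduce_fn = np.max if mode == "max" else np.mean
    pooled = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            region = feature_map[r:r + window, c:c + window]
            pooled[i, j] = reduce_fn(region)  # one representative value
    return pooled

feature_map = np.array([[1, 3, 2, 0],
                        [4, 6, 1, 1],
                        [0, 2, 9, 5],
                        [1, 1, 3, 7]], dtype=float)

max_pooled = pool2d(feature_map, mode="max")
avg_pooled = pool2d(feature_map, mode="mean")
print(max_pooled)  # [[6. 2.] [2. 9.]]
print(avg_pooled)  # [[3.5 1. ] [1.  6. ]]
```

With a window of 2 and a stride of 2, the 4 x 4 input shrinks to 2 x 2, so width and height are both halved.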

The max pool layer selects the maximum value within each region of the feature maps produced by the convolution layers that precede it. Its output indicates whether a hypothetical feature is present in a region of the previous layer, but not exactly where.

Hence, the idea is to allow the successive layers to work on larger sections of the data. Max pooling allows for faster convergence and helps select more invariant features, which improves generalization performance.

Advantages of using pooling layer:

    Reduce the calculation of subsequent layers.
    Increase the robustness of the features with respect to spatial position.

In [12]:
# Implement Basic CNN
# Importing and loading all the necessary libraries
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf

# Keras image utilities, for example
# image.ImageDataGenerator() and image.load_img()
from keras.preprocessing import image

We will use the MNIST digits dataset, which we can access through the Keras library.

In [13]:
# Load the data and reshape the images into a 4-dimensional array
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Reshape
x_train = x_train.reshape(-1, 28, 28, 1)
x_test = x_test.reshape(-1, 28, 28, 1)

# Padding the images by 2 pixels
x_train = np.pad(x_train, ((0, 0), (2, 2), (2, 2), (0, 0)), 'constant')
x_test = np.pad(x_test, ((0, 0), (2, 2), (2, 2), (0, 0)), 'constant')


The MNIST dataset includes training and test sets. Each set consists of grayscale images (integer arrays with shape (num_samples, 28, 28)) and labels (integers in the range 0-9). Each image is padded by 2 pixels on every side, turning it from 28x28 into 32x32, the input size of the model defined below.
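
To see what the padding does to the array shapes without downloading the dataset, here is a small sketch on a stand-in array. The name `dummy_images` and its contents are assumptions; only the `np.pad` call mirrors the cell above.

```python
import numpy as np

# Stand-in for a small batch of MNIST images: 3 images, 28 x 28 pixels, 1 channel
dummy_images = np.ones((3, 28, 28, 1), dtype=np.uint8)

# Pad widths are given per axis: (batch, height, width, channels).
# Only height and width receive 2 pixels of zeros on each side.
padded = np.pad(dummy_images, ((0, 0), (2, 2), (2, 2), (0, 0)), "constant")

print(dummy_images.shape)  # (3, 28, 28, 1)
print(padded.shape)        # (3, 32, 32, 1)
```

The batch and channel axes are left untouched, so the number of images and the number of channels do not change.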

The next step is to set the model parameters. The depth of the image (number of channels) is 1 because the images are grayscale.

In [14]:
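# Record the padded image size (32 x 32). Note that this stores the values
# as attributes on the imported `image` module.
image.width = x_train[0].shape[0]
image.height = x_train[0].shape[1]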

# Grayscale = 1 channel
num_channels = 1
seed = 98
np.random.seed(seed)
tf.random.set_seed(seed)


Next, we declare the training and evaluation settings. Training and evaluation use different batch sizes; these values can be changed depending on the physical memory available for training and evaluating.

In [15]:
batch_size = 100
evaluation_size = 500
epochs = 300
eval_every = 5

Our images need to be normalized so that the values of all pixels are on a common scale.

In [16]:
x_train = x_train/255
x_test = x_test/255
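
As a quick illustration of this scaling, the sketch below applies the same division to a few hypothetical 8-bit pixel values (the array `pixels` is an assumption, not part of the dataset).

```python
import numpy as np

# Hypothetical 8-bit pixel intensities, as stored in the raw images
pixels = np.array([0, 64, 128, 255], dtype=np.uint8)

# True division converts the integers to floats in the range [0, 1]
scaled = pixels / 255

print(scaled.dtype)                # float64
print(scaled.min(), scaled.max())  # 0.0 1.0
```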

We will now declare our model. The feature extractor module is composed of two convolution/ReLU/max-pooling blocks and is followed by a classifier made of fully connected layers. For the classifier to work, we flatten the output of the feature extractor module so that it can be fed into the classifier. The last layer of the classifier uses a softmax activation function, which turns the numeric outputs (logits) into probabilities that sum to 1.
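
A minimal NumPy sketch of softmax shows how logits become probabilities. The function below and the three example logits are illustrative assumptions; Keras provides its own implementation.

```python
import numpy as np

def softmax(logits):
    """Turn a vector of logits into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / np.sum(exps)

logits = np.array([2.0, 1.0, 0.1])  # hypothetical raw outputs for 3 classes
probabilities = softmax(logits)

print(probabilities.sum())     # 1.0, up to floating-point rounding
print(probabilities.argmax())  # 0
```

The largest logit receives the largest probability, so the predicted class is the index of the maximum.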

In [17]:
image_width = image.width
image_height = image.height

In [18]:
input_data = tf.keras.Input(dtype=tf.float32, 
                            shape=(image_width, 
                                   image_height, 
                                   num_channels), name="INPUT")

# First conv-ReLU-MaxPool layer
conv1 = tf.keras.layers.Conv2D(filters=6, 
                               kernel_size=5, 
                               padding="VALID", 
                               activation="relu", 
                               name="C1")(input_data)

max_pool1 = tf.keras.layers.MaxPool2D(pool_size=2, 
                                      strides=2, 
                                      padding="SAME", 
                                      name="S1")(conv1)

# Second conv-ReLU-MaxPool layer
conv2 = tf.keras.layers.Conv2D(filters=16, 
                               kernel_size=5, 
                               padding="VALID", 
                               activation="relu", 
                               name="C3")(max_pool1)

max_pool2 = tf.keras.layers.MaxPool2D(pool_size=2, 
                                      strides=2, 
                                      padding="SAME", 
                                      name="S4")(conv2)

# Flatten layer
flatten = tf.keras.layers.Flatten(name="FLATTEN")(max_pool2)

# First fully connected layer
fully_connected1 = tf.keras.layers.Dense(units=120, 
                                         activation="relu", 
                                         name="F5")(flatten)

# Second fully connected layer
fully_connected2 = tf.keras.layers.Dense(units=84, 
                                         activation="relu", 
                                         name="F6")(fully_connected1)

# Final fully connected layer
final_model_output = tf.keras.layers.Dense(units=10, 
                                           activation="softmax", 
                                           name="OUTPUT")(fully_connected2)

model = tf.keras.Model(inputs=input_data, outputs=final_model_output)
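
The output shapes and parameter counts of the convolutional part of this model can be checked by hand. The sketch below uses the standard size formulas for a convolution without padding and for pooling with "same" padding; the helper names are assumptions introduced only for this check.

```python
import math

def conv_output_size(n, kernel, stride=1):
    """Spatial size after a 'valid' (no padding) convolution."""
    return (n - kernel) // stride + 1

def same_pool_output_size(n, stride):
    """Spatial size after pooling with 'same' padding."""
    return math.ceil(n / stride)

def conv_params(kernel, in_channels, filters):
    """Weights of every filter plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * filters

c1_size = conv_output_size(32, kernel=5)       # C1: 28
s1_size = same_pool_output_size(c1_size, 2)    # S1: 14
c3_size = conv_output_size(s1_size, kernel=5)  # C3: 10
s4_size = same_pool_output_size(c3_size, 2)    # S4: 5
flatten_units = s4_size * s4_size * 16         # FLATTEN: 400

c1_params = conv_params(kernel=5, in_channels=1, filters=6)   # 156
c3_params = conv_params(kernel=5, in_channels=6, filters=16)  # 2416

print(c1_size, s1_size, c3_size, s4_size, flatten_units)
print(c1_params, c3_params)
```

These values agree with the `model.summary()` output shown later in this section.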

Next we will compile the model using the Adam (Adaptive Moment Estimation) optimizer. Adam combines momentum with per-parameter adaptive learning rates, which often lets training reach a local minimum, and therefore converge, faster.
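
To make the idea of momentum plus adaptive learning rates concrete, here is a rough NumPy sketch of the Adam update rule applied to a toy quadratic loss. The function name `adam_minimize`, the toy loss and the learning rate of 0.1 are assumptions for illustration; Keras uses its own implementation and its own default learning rate.

```python
import numpy as np

def adam_minimize(grad_fn, theta, steps=500, lr=0.1,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Run a fixed number of Adam updates and return the final parameters."""
    m = np.zeros_like(theta)  # running mean of gradients (momentum)
    v = np.zeros_like(theta)  # running mean of squared gradients
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)  # bias correction for the early steps
        v_hat = v / (1 - beta2 ** t)
        # Each parameter gets its own effective step size
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta

# Toy loss: sum((theta - 3)^2), whose gradient is 2 * (theta - 3)
theta_start = np.array([0.0, 10.0])
theta_final = adam_minimize(lambda th: 2.0 * (th - 3.0), theta_start)
print(theta_final)  # both entries should end up close to 3
```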

A loss function measures how well your model fits your dataset. If the predictions are far off, the loss function outputs a higher number; if they are good, it outputs a lower number.

As our targets are integers and not in a one-hot-encoded format, we will use the sparse categorical cross-entropy loss function and then we will add an accuracy metric to determine how accurate the model is in each batch.
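
The difference between the sparse and the one-hot form of cross-entropy can be sketched in NumPy. Both functions below are simplified assumptions (for instance, they do not clip probabilities the way a production implementation would), and the two-sample batch is made up for illustration.

```python
import numpy as np

def sparse_categorical_crossentropy(y_true, y_prob):
    """Mean cross-entropy when the targets are integer class indices."""
    picked = y_prob[np.arange(len(y_true)), y_true]  # probability of the true class
    return -np.mean(np.log(picked))

def categorical_crossentropy(y_onehot, y_prob):
    """Mean cross-entropy when the targets are one-hot encoded."""
    return -np.mean(np.sum(y_onehot * np.log(y_prob), axis=1))

# Hypothetical batch: 2 samples, 3 classes
y_true = np.array([0, 2])
y_prob = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.3, 0.6]])
y_onehot = np.eye(3)[y_true]

sparse_loss = sparse_categorical_crossentropy(y_true, y_prob)
dense_loss = categorical_crossentropy(y_onehot, y_prob)
accuracy = np.mean(np.argmax(y_prob, axis=1) == y_true)

print(sparse_loss, dense_loss)  # the two values match
print(accuracy)                 # 1.0
```

Both forms give the same loss; the sparse version simply avoids building the one-hot matrix.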

In [19]:
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.summary()

Model: "functional_5"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
INPUT (InputLayer)           [(None, 32, 32, 1)]       0         
_________________________________________________________________
C1 (Conv2D)                  (None, 28, 28, 6)         156       
_________________________________________________________________
S1 (MaxPooling2D)            (None, 14, 14, 6)         0         
_________________________________________________________________
C3 (Conv2D)                  (None, 10, 10, 16)        2416      
_________________________________________________________________
S4 (MaxPooling2D)            (None, 5, 5, 16)          0         
_________________________________________________________________
FLATTEN (Flatten)            (None, 400)               0         
_________________________________________________________________
F5 (Dense)                   (None, 120)              