# Convolutional Neural Networks

## Data Preprocessing - Step 1

    In Convolutional Neural Networks, we do not deal with rows and columns. We deal with pixels and hence we cannot folow the folder and file structure that we use for other Deep Learning algorithms.
    
    We will use Keras to import the images. We do this by preparing a special structure for the dataset. We split the dataset into two seperate folders - Training & Test set folders. We also need a simple way to diffrentiate the class labels for the images.
    
    The simple trick to use here is to make diffrent folders for the diffrent class labels of the images. This is to help Keras diffrentiate between the class labels.
    
    The folder structure is the data preprocessing step in CNN.
    
    Feature Scaling is a compulsory step in Deep Learning. We will apply Feature Scaling for CNN before we fit the CNN to the images.
    
    The take-away here is that the data prepeocessing is not done using Python but is done manually.

## Build the CNN - Step 2

In [1]:
# Import the libraries and packages from the Keras module
# Sequential package will be used to initialise the Neural Network
# Convolution2D package will be used to make the first step in building the CNN - adding the Convolution layers
# Conv2D is used in case of classifying images as they are in 2D
# MaxPooling2D package will be used to make the second step in building the CNN - adding our Pooling layers
# Flatten package will be used to make the third step in building te CNN -
# converting the pooled feature maps created through Convolution & Pooling into a large feature vector which then 
# becomes the input for the Fully Connected Layers of the CNN
# Dense package will be used to make the final step in building the CNN - add the Fully Connected Layers in a classic ANN

from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.layers import Dense

Using TensorFlow backend.


## Initialise the CNN - Step 3

In [2]:
# Initialising the CNN
# Create a classifier object of the Sequential class

classifier = Sequential()

### Convolution - Adding the first Convolution Layer

In [3]:
# Use the "add" method 
# Use the "Conv2D" function 
# "32" denotes the number of filters/feature detectors applied on the input image - An optimum number to use
# to get the same number of feature maps - The number of filters = The number of Feature Maps created
# "(3, 3)" denotes the number of rows & columns in the Convolution Kernel/Feature Detector
# "32, (3, 3)" denotes that 32 Feature Maps with 3 rows and 3 columns(3*3) each will be created
# "input_shape" denotes the shape of the input image - Convert all the images into a fixed/single format
# This conversion will be done during the image preprocessing step after we build the CNN 
# and before we fit the CNN to the images
# The expected format needs to be specified into which the images will be converted
# The number of channels will be "1" for black & white images
# The number of channels will be "3" for colour images
# (64, 64, 3) 3 denotes the number of channels, 
# 64, 64 denotes the dimensions of the array - 64*64 pixels of colour images
# The order of the parameters in the "input_shape" is important.
# We are using Tensorflow backend - Hence the dimensions array will be the first parameter passed
# The second parameter passed will be the number of channels
# Classification of images is a non-linear problem and we need to have non-lineaity 
# "Activation" denotes the activation function used in the CNN 
# Rectifier(ReLU) is the best option 
# We can get some negative pixels in the feature maps 
# The negative pixels need to be removed in order to have non-linearity in the CNN

classifier.add(Conv2D(32, (3, 3), input_shape = (64, 64, 3), activation = 'relu'))

### Pooling - Max pooling step

In [4]:
# We apply Max Pooling to reduce the size of feature maps which 
# in turn will reduce the number of nodes obtained in the next step
# This will ensure that the CNN model will not be too complex and compute intensive but not affect the performance 
# Max Pooling ensures that by taking the maximum of the 2*2 sub-tables of the feature map, 
# information is retained and we are keeping track of the part of the image that contain the high numbers corresponding 
# to where the feature detectors detected some features in the input image
# "pool_size" denotes the size of the sub-table used to slide 
# across the feature map to take the maximum in each sub table
# (2, 2) is the general parameter passed for Max Pooling - Helps to keep the information 

classifier.add(MaxPooling2D(pool_size = (2, 2)))

### Flatten - Flatten the feature map into a single vector

    Flattening consists of putting all the numbers in the cells derived in the Feature Maps into one single vector

    We do not lose the facial structure of the images by creating a single vector because we extracted the spatial structure information by creating the Feature Maps in the Convolution step. We do this by applying the feature detectors on the input image which give the high numbers that represent the spatial structure of the images - the high numbers are associated to specific features in the input image.
    
    We do not directly take all the pixels of the input image and flatten them into one single vector without applying the previous steps - Convolution & max Pooling. If we directly flatten the pixels into the single vector, then each node of the huge vector will represent one pixel of the image independently of the pixels that are around it. We only get information of the pixel itself and not of how this pixel is spatially connected to the other pixels around it.
    
    If we apply the Convolution and Max Pooling step to create all the reduced size Feature Maps and flatten all these Feature Maps into a single vector, then each node of the high vector will represent the information of a specific feature of the input image.

In [5]:
# Use the Flatten() function

classifier.add(Flatten())

### Full Connection - Adding fully Connected Layers to the CNN

* This step is required because we have managed to convert the 2D images into a 1D vector
* The 1D vector contains some information of te spatial structure & pixel patterns of the image
* We will use this input vector as the input layer to build the ANN consisting of fully connected layers
* We do this because ANN is a great classifier for non-linear problems like image classification
* We will then create a hidden layer 

In [6]:
# Create a hidden layer
# "units" denote the number of nodes in the hidden layer. 
# The number of input nodes in this case is very large and not possible to count
# We will choose "128" as an optimum number and this optimum number is derived by several experimentations
# It is a common practice to go with numbers ending with power of 2
# The nodes in the hidden layer are like neurons that need to be activated 
# based on how much they can pass on the signal
# Use ReLU as the activation function


classifier.add(Dense(units = 128, activation = 'relu'))

### Output Layer - Adding the output layer

In [7]:
# Create the output layer
# "units" will be 1 as we need only one node
# Use "Sigmoid" as the activation function for the output as the outcome is a binary one

classifier.add(Dense(units = 1, activation = 'sigmoid'))

## Construct the CNN with optimiser - Step 4

In [8]:
# Compiling the ANN
# Optimiser refers to the algorithm to obtain optimised values for weights 
# We will be using one of the most efficient stochastic gradient descent algorithms named "adam"
# The "adam" optimiser is always a safe choice because it is very powerful 
# and always perfoms relevant updates of the weights
# "loss" function denotes the cost function which gives the error in prediction accuracy 
# For Linear type of problems, it is the squared sum of differences (OLS)
# For classification type of problems, it is the logarithmic function 
# Since in this case, the output is of binary type, we choose the binary crossentropy for "loss"
# "metrics" denotes the evaluation criteria to be chosen to evaluate the model performance. 
# In case of linear problems, the metric used is "Mean Squared Error(MSE)"
# In case of classification problems, the metric used is "accuracy"

classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])

## Image Preprocessing 
## Train the CNN
## Test the CNN - Step 5

    The image preprocessing is a necessary step required to be sone before training the CNN. This process is called image augmentation. It consists of preprocessing the images to prevent overfitting. If we do not do it, then we will most probably get a very high accuarcy in the training set but a low accuracy in the test set.
    
    We will do the image preprocessing in one step using the Keras documentation.
    
    https://keras.io/api/preprocessing/image/
    
    The first function that will be used is the "ImageDataGenerator" which will be used to generate the image augmentation. One of the situations leading to overfitting is when we have few data to train the model. In such situations, the model finds some correlations in few observations of the training set but fails to generalize these correlations on new observations. 
    
    In images classification, a huge data of images is required to find and generalize the correlations because in Computer Vision, the ML model does not seem to need to find correlations between the independent variables and some dependent variables. It actullay needs to find some patterns in the pixels and to do this, a lot of images are required. 
    
    So, in cases where the dataset of images available are not very huge, image augmentation may be the answer to achieve a high level of accuracy in the prediction. It will create many batches of the images. It will then apply random transformations on each batch on a random selection of images by rotating, flipping or even shearing them. This will result in the generation of many more diverse images during training in the batches and this means more images to train on. Image augmentation, put simply augments the training images. Hence the name. Since the transformations are random, the model will never find the same picture across the batches. 

    Image Augmentation is a technique that allows to enrich the dataset, specifically, the training set without adding more images and that allows to get good performance results with little or no overfitting, even with a small dataset of images.

    We will be using the code from ".flow_from_directory(directory)" method for the model that we will be training because we require the code to be structured in this specific way so that the classes can be well identified in the separate folders. 
    
    The section of code available here - the "fit_generator" method  will not only fit the training dataset to the CNN but will also test the performance of the model on some new observations which will be the test set.
    
    Feature Scaling is always compulsory and is taken care using the "rescale" method which corresponds to the feature scaling part of the data preprocessing step discussed above. There are other transformations like shear range coresponding to shearing. 
    
    Shearing is a geometrical transformation or transvection where the pixels are moved to a fixed direction over a proportional distance from a line that is parallel to the direction that they are moving to. 
    
    There are other transformations like zoom range - which is a random zoom that can be applied on the images. The horizontal flip flips the image horizontally. The vertical flip is not used here. 
    
    We will be applying the image augmentation on the training and test dataset and then we use the "ImageDataGenerator" function to rescale only the images from the test set. 
    
    The "train_generator"(training_set) and the "validation_generator"(test_set) actually create the training and test sets. The training set created here will contain all the augmented images extracted from the "ImageDataGenerator" and it will also create the test set that will be used to evalaute the model performance and prediction accuracy. 

In [9]:
# Import the ImageDataGenerator class from the image preprocessing module of the Keras library
# Create a "train_datagen" object of the ImageDataGenerator class
# Create a "test_datagen" object of the ImageDataGenerator class
# Apply a "rescale" value betwween 0 and 1 for the pixel values
# Pixels take value between 0 and 255 and by rescaling them using 1./255, all pixels will be between 0 and 1

from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(rescale = 1./255,
                                   shear_range = 0.2,
                                   zoom_range = 0.2,
                                   horizontal_flip = True)

test_datagen = ImageDataGenerator(rescale = 1./255)

In [10]:
# Specify the folder from which the data needs to be extracted for both the training and test sets
# "target_size" denotes the size of the images expected in the CNN model
# As we already specified the dimension (64, 64) we will keep the same here
# "batch_size" denotes the size of the batches in which some random samples of the images will be included
# It contains the number of images that will flow through the CNN after which the weights will be updated
# "class-mode" denotes if the class of the dependent variable has two categories or more than two categories

training_set = train_datagen.flow_from_directory('dataset/training_set',
                                                 target_size = (64, 64),
                                                 batch_size = 32,
                                                 class_mode = 'binary')

Found 8000 images belonging to 2 classes.


In [11]:
# Specify the folder from which the data needs to be extracted for both the training and test sets
# "target_size" denotes the size of the images expected in the CNN model
# As we already specified the dimension (64, 64) we will keep the same here
# "batch_size" denotes the size of the batches in which some random samples of the images will be included
# It contains the number of images that will flow through the CNN after which the weights will be updated
# "class-mode" denotes if the class of the dependent variable has two categories or more than two categories

test_set = test_datagen.flow_from_directory('dataset/test_set',
                                            target_size = (64, 64),
                                            batch_size = 32,
                                            class_mode = 'binary')

Found 2000 images belonging to 2 classes.


In [13]:
# This is the section where we fit the training set into the CNN model
# The model performance and the prediction accuarcy is also tested here
# "samples_per_epoch" will be the number of images in the training set 
# All the observations of the trainig set pass through the CNN during each epoch
# "epochs" is the number of epochs to run to train the CNN
# "validation_data" is the data where the model is evaluated which is the test set
# "validation_steps" is the number of images in the test set

classifier.fit_generator(training_set,
                         samples_per_epoch = 8000,
                         epochs = 5,
                         validation_data = test_set,
                         validation_steps = 2000)

  del sys.path[0]


Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.callbacks.History at 0x7ffd58358d30>

## Conclusion

* We have obtained an accuracy of 86.05% for the training set and 76% for the test set. It is not very bad but it is not very good either.
    
* We first obtained around 86.05% accuracy on the training set which is not bad but as data scientists, we are more interested in the accuracy of the test set which is around 76% and the difference between the accuracy of the training set and accuracy of the test set to assess wherther there is overfitting or not. So, 76% accuracy on the test set is not bad. 
    
* The accuracy for the test set which is 76% denotes that we get three correct predictions out of four which is not bad. But we do get a large difference between the accuracy on the training set and the test set. It does not necessarily signify any important overfitting but it does laeve a lot of room for improvement.
    
* There can be a siginificant increase in the accuracy on the test set. The difference between the accuracy of the test set and the training set can also be made smaller.
    
* The goal wpould be to increase the accuracy on the test set over 80% and decrease the difference between the accuracy on the training and test sets. 