# Convolutional Neural Networks
Our brain classifies visuals by looking at the features of the object, which is why optical illusions occur. Convolutional Neural Networks are a type of neural network that filter the features of a sample, typically an image, and classify the sample.

# Convolution
In a convolutional operation, there are 3 components:
- Input
- Feature Detector (also called Filter or Kernel)
- Feature Map (also called a Convolved Feature or Activation Map)

The feature detector is a matrix that the convolutional operation slides across the input to detect a feature. Some examples of a feature are a long nose, green eyes, or long eyelashes; each can be encoded as a matrix and then searched for on the input.

### Example of Convolutional Operation
Let's say we gray-scale an image of a smiley face. We can then represent it as the 2D matrix below.

<img src="images/cnn/smiley_example.png" height="65%" width="65%"></img>

Now we can use a 3x3 (standard size) feature detector to match 3x3 sections of the image on a feature map. Let's fill the feature map below!

<hr>

<img src="images/cnn/convolutional_operation_1.png" height="65%" width="65%"></img>

No cells in the section and the feature detector matched, so place a 0. Let's continue with the next stride.

<hr>

<img src="images/cnn/convolutional_operation_2.png" height="65%" width="65%"></img>

Only the middle-left cell of the section matched the middle-left cell of the feature detector, so place a 1. Keep continuing the strides until the feature map is filled.

<hr>

<img src="images/cnn/convolutional_operation_3.png" height="65%" width="65%"></img>

This is the completed feature map after performing all the strides. The feature map is smaller than the image, which makes it cheaper for the neural network to process: it keeps only where this particular feature occurs, not the entire input.
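The matching described above can be sketched in a few lines of plain Python. For a binary image and a binary feature detector, counting the matching 1-cells is the same as multiplying the section by the detector cell by cell and summing the result. The `convolve2d` helper, the 5x5 image, and the 3x3 detector below are hypothetical and are not the ones shown in the figures.

```python
def convolve2d(image, kernel, stride=1):
    """Slide `kernel` over `image` (no padding) and return the feature map."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = (len(image) - k_rows) // stride + 1
    out_cols = (len(image[0]) - k_cols) // stride + 1
    feature_map = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            # multiply the image section by the detector cell by cell, then sum
            total = 0
            for a in range(k_rows):
                for b in range(k_cols):
                    total += image[i * stride + a][j * stride + b] * kernel[a][b]
            row.append(total)
        feature_map.append(row)
    return feature_map


# hypothetical binary image and feature detector (assumed for illustration)
image = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
kernel = [
    [0, 0, 0],
    [1, 0, 1],
    [0, 0, 0],
]

feature_map = convolve2d(image, kernel)
print(feature_map)  # [[0, 2, 0], [0, 0, 0], [1, 2, 1]]
```

A 5x5 input with a 3x3 detector and a stride of 1 gives a 3x3 feature map, which is why the feature map is smaller than the image.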

### Convolutional Layer
<img src="images/cnn/convolutional_layer.png" height="65%" width="65%"></img>

We create many feature maps, one per feature detector, to form the convolutional hidden layer, which detects all of the features on the image.

# ReLU Layer
The ReLU hidden layer applies the Rectifier Activation Function, max(0, x), to every value in the feature maps. It increases the non-linearity of the neural network, which helps with seeing abrupt changes in the image.

<img src="images/cnn/relu_layer.png" height="65%" width="65%"></img>

### Example of ReLU Layer
<img src="images/cnn/fergus.png" height="40%" width="40%"></img>

This is the original image.

<hr>

<img src="images/cnn/edge_fergus.png" height="40%" width="40%"></img>

This is the original image applied with an edge detector, a type of feature detector that detects edges on an image.

<hr>

<img src="images/cnn/relu_fergus.png" height="40%" width="40%"></img>

This is the edge-detected image with the applied ReLU layer. The abrupt changes in the image are seen with the ReLU layer because the black lines are zeroed out and only the white lines appear.
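As a minimal sketch of that zeroing-out step, the Rectifier function keeps positive values and replaces negative values with 0. The `relu_map` helper and the small `edge_map` values are assumptions made for illustration; they are not taken from the image above.

```python
def relu(x):
    """Rectifier: pass positive values through, replace negatives with 0."""
    return max(0.0, x)


def relu_map(feature_map):
    """Apply the rectifier to every cell of a 2D feature map."""
    return [[relu(value) for value in row] for row in feature_map]


# hypothetical edge-detector output: negative = dark edge, positive = bright edge
edge_map = [
    [-2.0, 0.5],
    [1.5, -0.1],
]

rectified = relu_map(edge_map)
print(rectified)  # [[0.0, 0.5], [1.5, 0.0]]
```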

# Max Pooling
Images can differ a lot in how an object is positioned, angled, or scaled, even if they contain the same object.

<img src="images/cnn/spatial_variance.png" height="40%" width="40%"></img>

For example, the three images of the cheetah above are still images of a cheetah. However, the neural network may not notice that because of the spatial variance.

To resolve this issue, we need "spatial invariance": the neural network should still recognize an object when its features are slightly shifted or distorted relative to each other.

Pooling is a hidden layer that helps achieve "spatial invariance". We will implement "max" pooling, which keeps only the maximum value in each section of cells in the feature map.

### Example of Max Pooling
<img src="images/cnn/max_pooling_1.png" height="60%" width="60%"></img>

This is the first stride of the pooled feature map. The top-left cell is 1 because the maximum value in the section is 1.

<hr>

<img src="images/cnn/max_pooling_2.png" height="60%" width="60%"></img>

This is the completed pooled feature map after performing all the strides. The pooled feature map is smaller than the feature map, which will be even more efficient for the neural network to learn.

### Effectiveness of Max Pooling
Let's see how max pooling is effective at handling distortions.

For instance, say the cell with value 4 on the feature map above was located on a slightly different cell location due to image distortion. The pooled feature map would still put a 4 on the middle-left because it's the maximum of the section.
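The same idea can be checked with a small sketch. It assumes a 2x2 pooling window with a stride of 2 and no padding; the `max_pool` helper and both 4x4 feature maps are hypothetical and differ from the figures above.

```python
def max_pool(feature_map, size=2, stride=2):
    """Keep only the maximum value of each size x size section (no padding)."""
    rows = (len(feature_map) - size) // stride + 1
    cols = (len(feature_map[0]) - size) // stride + 1
    return [
        [
            max(
                feature_map[i * stride + a][j * stride + b]
                for a in range(size)
                for b in range(size)
            )
            for j in range(cols)
        ]
        for i in range(rows)
    ]


# hypothetical feature map
original = [
    [1, 0, 0, 0],
    [0, 1, 1, 0],
    [4, 0, 0, 2],
    [0, 1, 0, 0],
]
# same feature map, but the 4 has moved one cell to the right (a small distortion)
shifted = [
    [1, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 4, 0, 2],
    [0, 1, 0, 0],
]

print(max_pool(original))  # [[1, 1], [4, 2]]
print(max_pool(shifted))   # [[1, 1], [4, 2]]
```

Both inputs produce the same pooled feature map because the 4 stays inside the same 2x2 section. A larger shift that crosses a section boundary would change the result.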

### Pooling Layer
<img src="images/cnn/pooling_layer.png" height="65%" width="65%"></img>

We create many pooled feature maps to create the pooling layer, which detects all of the pooled features on the image.

# Flattening
Flattening is a hidden layer that transforms all of the pooled feature maps into a single long column, one map after another. This column becomes the input layer of the fully connected part of the neural network.
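A minimal sketch of flattening, assuming two hypothetical 2x2 pooled feature maps that are read row by row (the `flatten` helper is illustrative, not the Keras implementation):

```python
def flatten(pooled_maps):
    """Concatenate every cell of every pooled feature map into one list."""
    return [value for pooled in pooled_maps for row in pooled for value in row]


# two hypothetical pooled feature maps
pooled_maps = [
    [[1, 1], [4, 2]],
    [[0, 3], [2, 1]],
]

column = flatten(pooled_maps)
print(column)  # [1, 1, 4, 2, 0, 3, 2, 1]
```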

### Example of Flattening
<img src="images/cnn/flattening.png" height="65%" width="65%"></img>

### Input Layer
<img src="images/cnn/input_layer.png" height="65%" width="65%"></img>

# Full Connection (Densely Connected)
The last hidden layers of the CNN are the fully connected layers, also called "Dense" layers.

<img src="images/cnn/full_connection.png" height="65%" width="65%"></img>

The layers are called "fully connected" or "dense" because every neuron in the layer is connected to every neuron in the previous and next layers. The layers are fully connected because we want the neural network to evaluate the entire image, and not just certain features of the image. However, this comes at the risk of overfitting the neural network to the training set.

### Example of Full Connection
<img src="images/cnn/dog_classify.png" height="65%" width="65%"></img>

Even though the neural network is fully connected, there are still weights on the synapses (signals). Therefore, as seen on the diagram above, only certain neurons from the last fully connected layer have a significant influence on the Dog and Cat output neurons.

# Softmax
The Sigmoid Activation Function is typically used on a neural network with a single output neuron, but the Softmax Activation Function is best when working with multiple neurons in the output layer.

### Example of Softmax
<img src="images/cnn/softmax.png" height="75%" width="75%"></img>

If we didn't use the Softmax function, then the neural network may state the raw outputs like the image being 85% Dog and 40% Cat, which does not sum to 100%.

The neurons in the output layer are not connected between each other, so how does the neural network know that the image is 95% a Dog and 5% a Cat, which sums to 100%?

The Softmax function takes the exponential of each output value and divides it by the sum of the exponentials of all the output values. Every result is therefore between 0 and 1, and the results sum to 100%.
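Here is a small sketch of the Softmax function in plain Python. The raw output values `2.0` and `-1.0` are hypothetical and were chosen so the result lands close to the 95% Dog and 5% Cat example. Subtracting the largest value before exponentiating is a common trick to avoid overflow and does not change the result.

```python
import math


def softmax(raw_outputs):
    """Turn raw output values into probabilities that sum to 1."""
    largest = max(raw_outputs)  # subtracted for numerical stability only
    exps = [math.exp(value - largest) for value in raw_outputs]
    total = sum(exps)
    return [e / total for e in exps]


# hypothetical raw outputs of the Dog and Cat neurons
dog, cat = softmax([2.0, -1.0])
print(round(dog, 4), round(cat, 4))  # 0.9526 0.0474
```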

# Cross-Entropy
The Cross-Entropy function works best together with the Softmax Activation Function. This is because it measures how far the predicted probabilities are from the actual probabilities, and training minimizes that difference.

Strictly speaking, Cross-Entropy is a "Loss function" rather than a "Cost function". A Loss function is computed for a single sample, while a Cost function is the average of the losses over the data set. In practice the two terms are often used interchangeably.

Cross-Entropy is well suited to classification because it is logarithmic: when the predicted probability of the correct class is very small, even a small improvement noticeably lowers the error. With Mean Squared Error (MSE), the same improvement barely changes the error, so the neural network gets little signal to keep improving.
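To make the comparison concrete, the sketch below measures how much each error drops when the predicted probability of the correct class improves from 0.000001 to 0.001. The two probabilities and the helper names are assumptions chosen for illustration, and the natural logarithm is assumed.

```python
import math


def squared_error(p_correct):
    """Squared error when the actual value of the correct class is 1."""
    return (1.0 - p_correct) ** 2


def log_loss(p_correct):
    """Cross-entropy when the actual value of the correct class is 1."""
    return -math.log(p_correct)


before, after = 0.000001, 0.001  # hypothetical predictions for the correct class

se_drop = squared_error(before) - squared_error(after)
ce_drop = log_loss(before) - log_loss(after)

print(f"squared error drops by {se_drop:.4f}")  # about 0.0020
print(f"cross-entropy drops by {ce_drop:.4f}")  # about 6.9078
```

The prediction became a thousand times better, yet the squared error hardly moves, while the cross-entropy falls by almost 7.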

### Example of Cross-Entropy
<img src="images/cnn/cross_entropy.png" height="75%" width="75%"></img>
- 1 is the actual value of a dog
- 0 is the actual value of a cat
- p(x) is the actual probability of class x (1 or 0 here)
- q(x) is the predicted probability of class x (0.9 or 0.1 here)

The function multiplies each actual value (1 and 0) by the logarithm of the matching predicted value (0.9 and 0.1), sums the products, and negates the result.
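The calculation for this example can be sketched as follows, assuming the natural logarithm and the usual form of the Cross-Entropy function, H(p, q) = -Σ p(x) log q(x). The `cross_entropy` helper is illustrative only.

```python
import math


def cross_entropy(actual, predicted):
    """H(p, q) = -sum of p(x) * log(q(x)) over the classes."""
    # classes with an actual value of 0 contribute nothing, so they are skipped
    return -sum(p * math.log(q) for p, q in zip(actual, predicted) if p > 0)


actual = [1, 0]         # dog, cat
predicted = [0.9, 0.1]  # dog, cat

loss = cross_entropy(actual, predicted)
print(round(loss, 4))  # 0.1054
```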

### Example of Comparing Errors
<img src="images/cnn/comparing_errors.png" height="75%" width="75%"></img>

The diagram above shows the errors of 3 popular error estimators.

# Image Augmentation
Image augmentation creates many randomly transformed versions of the images in each batch, which gives the neural network a much more diverse set of images to train on. Some augmentations are rotating, stretching, zooming, etc.

This helps prevent overfitting because augmentation better diversifies the data set.

In [1]:
# import the image data augmentor
from keras.preprocessing.image import ImageDataGenerator

Using TensorFlow backend.


In [2]:
# create a data generator that rescales pixel values to [0, 1] and randomly shears, zooms, and flips images
train_datagen = ImageDataGenerator(
    rescale=1.0/255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True
)

In [3]:
# create a data generator that only rescales pixel values to [0, 1] (no random augmentation for testing)
test_datagen = ImageDataGenerator(rescale=1.0/255)

In [4]:
"""
get the training set from the images directory
- target_size = (64, 64) means to set all image sizes to 64 width and 64 height
- batch_size = 32 means to separate all of the images to 32 images per batch,
    and after each batch the neural network updates its weights
- class_mode = categorical because we're using a softmax function for the output layer
"""
training_set = train_datagen.flow_from_directory(
    "datasets/cnn/training_set",
    target_size=(64, 64),
    batch_size=32,
    class_mode="categorical"
)

Found 114 images belonging to 2 classes.


In [5]:
"""
get the testing set from the images directory
- target_size = (64, 64) means to set all image sizes to 64 width and 64 height
- batch_size = 32 means to separate all of the images to 32 images per batch
- class_mode = categorical because we're using a softmax function for the output layer

use test_datagen (rescale only) so the testing images are not randomly augmented
"""
testing_set = test_datagen.flow_from_directory(
    "datasets/cnn/testing_set",
    target_size=(64, 64),
    batch_size=32,
    class_mode="categorical"
)

Found 59 images belonging to 2 classes.


# Convolutional Neural Network Model

In [6]:
# import keras packages to make the CNN to classify 2D images (width and height)
from keras.models import Sequential
from keras.layers import Convolution2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.layers import Dense

In [7]:
# initialize the sequential neural network
classifier = Sequential()

In [8]:
"""
add the convolutional layer
- filters = 32 means use 32 feature detectors
- kernel_size = (3, 3) means use a 3x3 feature detector matrix
- input_shape = (64, 64, 3) means the input images are 64x64 RGB (colored) images stored as 3D arrays.
    if using grayscale images the parameter should be (64, 64, 1)
- activation = relu means to use the Rectifier activation function
"""
classifier.add(
    Convolution2D(
        filters=32,
        kernel_size=(3, 3),
        input_shape=(64, 64, 3),
        activation="relu"
    )
)



In [9]:
# add the max pooling layer as a 2x2 matrix
classifier.add(MaxPooling2D(pool_size=(2, 2)))

In [10]:
# add a second convolutional layer with the same parameters to use on the first max pooled layer
classifier.add(
    Convolution2D(
        filters=32,
        kernel_size=(3, 3),
        activation="relu"
    )
)

# add the second max pooling layer to use on the second convolutional layer
classifier.add(MaxPooling2D(pool_size=(2, 2)))

In [11]:
# add the flattening layer
classifier.add(Flatten())
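To see how many values the flattening layer hands to the fully connected layer, the layer sizes can be traced by hand. This sketch assumes the Keras defaults used above (no padding and a stride of 1 for the convolutions, and a pooling stride equal to the pool size); the helper names are hypothetical.

```python
def conv_output(size, kernel=3, stride=1):
    """Width/height after a convolution without padding."""
    return (size - kernel) // stride + 1


def pool_output(size, pool=2):
    """Width/height after max pooling with a stride equal to the pool size."""
    return size // pool


size = 64                 # input images are 64x64
size = conv_output(size)  # first convolutional layer  -> 62
size = pool_output(size)  # first max pooling layer    -> 31
size = conv_output(size)  # second convolutional layer -> 29
size = pool_output(size)  # second max pooling layer   -> 14

filters = 32
flattened_units = size * size * filters
print(size, flattened_units)  # 14 6272
```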

In [12]:
"""
add the fully connected layer
- units = 128 means use 128 neurons, this number was calculated through parameter tuning
- activation = relu means to use the Rectifier activation function
"""
classifier.add(Dense(units=128, activation="relu"))

In [13]:
"""
add the output layer
- units = 2 for the Dog and Cat output neurons
- activation = softmax because there are multiple outputs
"""
classifier.add(Dense(units=2, activation="softmax"))

In [14]:
"""
compile the CNN
- optimizer = adam is a specific type of Stochastic Gradient Descent algorithm
- loss = categorical_crossentropy because we're using a softmax function for the output layer
- metrics = ["accuracy"] means to use the accuracy metric to determine how accurate the model is
"""
classifier.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

In [15]:
"""
fit the classifier on batches drawn from the training set generator
- steps_per_epoch = 114 is the number of batches drawn per epoch, not the number of images.
    with batch_size = 32 and 114 training images, about 4 batches already cover the set once,
    so 114 steps show each image (with new random augmentations) many times per epoch
- epochs = 1 means to only use 1 epoch (we're using a single epoch to save time)
- validation_data = testing_set means to use the testing set to determine the validation accuracy
- validation_steps = 59 is the number of batches drawn from the testing set, not the number of images

For the first epoch, we can see the acc = 0.7093 (training set accuracy), and the
val_acc = 0.5582 (testing set accuracy). The gap between them suggests overfitting to the training set.

A reason for overfitting could be that we used 2 convolutional layers.
Having too many layers can lead to overfitting the training set because
more layers create more neurons that look for more specific features
on the training images.

Another reason for overfitting could be that the network ends in fully connected (Dense) layers.
"""
classifier.fit_generator(
    training_set,
    steps_per_epoch=114,
    epochs=1,
    validation_data=testing_set,
    validation_steps=59
)

Epoch 1/1


<keras.callbacks.History at 0x7f5b648903c8>
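Because the generators yield one batch per step, the number of steps needed to see every image exactly once can be worked out from the image counts reported above and the batch size. This is only a sketch of the arithmetic; the variable names are hypothetical, and the values actually passed to `fit_generator` above are left unchanged.

```python
import math

batch_size = 32
num_training_images = 114  # reported by flow_from_directory for the training set
num_testing_images = 59    # reported by flow_from_directory for the testing set

# one step = one batch, so round up to include the last partial batch
steps_to_cover_training_set = math.ceil(num_training_images / batch_size)
steps_to_cover_testing_set = math.ceil(num_testing_images / batch_size)

print(steps_to_cover_training_set, steps_to_cover_testing_set)  # 4 2
```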