# Convolutional Neural Networks #

As we start to attack larger, more complicated data sets, the networks themselves also become much larger. If we stick with what we learned about simple neural networks and expand upon it, we get what are called Multi-Layer Perceptrons, or MLPs. As an example, we'll examine handwriting recognition for clean, grayscale images with distinct black backgrounds. All samples are the same size and show one of the digits 0-9. Visually, the network would look something like the following:

<img src="FullMLP.PNG" style="height: 500px;"/>

Note that we have to flatten the image array into a linear input vector. As before, all of the input nodes are fully connected to the nodes in the hidden layer. Here we see the direct limitations of MLPs: fully connecting every layer causes both a large increase in computational requirements and a loss of locational context between data and node, since the flattened vector no longer records which pixels were neighbors.

In [3]:
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten

# define the model
model = Sequential()
model.add(Flatten(input_shape=(1,784)))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(10, activation='softmax'))

# summarize the model
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
flatten_1 (Flatten)          (None, 784)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 512)               401920    
_________________________________________________________________
dropout_1 (Dropout)          (None, 512)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 512)               262656    
_________________________________________________________________
dropout_2 (Dropout)          (None, 512)               0         
_________________________________________________________________
dense_3 (Dense)              (None, 10)                5130      
Total params: 669,706.0
Trainable params: 669,706.0
Non-trainable params: 0.0
________________________________________________________________
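The parameter counts in the summary above can be reproduced by hand, which makes the cost of full connectivity concrete. The following is a minimal sketch in plain Python; the helper name `dense_params` is made up for illustration and is not part of Keras.

```python
def dense_params(n_inputs, n_units):
    """Parameters in a fully connected layer: one weight per
    input/unit pair, plus one bias per unit."""
    return n_inputs * n_units + n_units


# Layer widths taken from the model above:
# flattened input, two hidden layers, output layer.
layer_sizes = [784, 512, 512, 10]

# Pair each layer with the one that follows it.
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(counts)

print(counts)  # [401920, 262656, 5130]
print(total)   # 669706
```

Dropout and Flatten layers contribute no parameters, so the three Dense layers account for the whole total.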

One of the prime innovations moving us from MLPs to CNNs is the concept of Regional Networks. In such arrangements, each node is connected only to the inputs in one region of the data, rather than to every input. Looking at a 2D array of data for something like an image, we end up with:

<img src="RegionalNetwork.PNG" style="height: 500px;"/>

From here, the configuration can be further simplified by using weight matrices instead of vectors:

<img src="RegionalNetworkMatrix.PNG"/>

# CNN Structure #

Another major change in CNNs is how they take in and operate on input data. The input to each filter is a matrix sized in proportion to the region of data that you think needs to be analyzed to detect a feature of relevant size, relative to the overall data. Note that you don't have to know *what* the feature will be, only an appropriate filter size. Comparing this to image processing, think of typical image kernels like edge detectors. Additionally, you can specify **how many** filters are used, and how those filters march along the data (the stride). Each filter is convolved with the input, and the results are collected in a new layer, known as a *convolutional layer*.

<img src="CNN_Input_handling.PNG" style="height: 500px;"/>
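To make the sliding-filter idea concrete, here is a minimal sketch of a single filter marching over a tiny grayscale image, written with plain Python lists so nothing beyond the standard library is assumed. The function name `conv2d_valid` and the hand-written edge kernel are illustrative assumptions; in a real CNN the kernel values are learned during fitting. As in most deep learning libraries, the kernel is not flipped, so strictly speaking this is a cross-correlation.

```python
def conv2d_valid(image, kernel, stride=1):
    """Slide a square `kernel` over `image` (both lists of rows)
    without padding, returning the resulting feature map."""
    k = len(kernel)
    out_h = (len(image) - k) // stride + 1
    out_w = (len(image[0]) - k) // stride + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            r, c = i * stride, j * stride  # top-left corner of the window
            acc = sum(
                image[r + di][c + dj] * kernel[di][dj]
                for di in range(k)
                for dj in range(k)
            )
            row.append(acc)
        output.append(row)
    return output


# 4x4 image: dark left half, bright right half.
image = [[0, 0, 1, 1] for _ in range(4)]

# Hypothetical vertical-edge detector (hand-picked, not learned).
kernel = [[-1, 1],
          [-1, 1]]

feature_map = conv2d_valid(image, kernel)
for row in feature_map:
    print(row)  # each row is [0, 2, 0]
```

The filter responds only in the middle column, where the window straddles the dark-to-bright boundary.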

For a CNN with multiple filters, the result could end up looking something like this:

<img src="MultipleFilters.PNG" style="height: 500px;"/>

For a color image rather than a grayscale one, the only difference is a multi-dimensional input, with the filter set operating on each "layer" (channel) of the input array. Note that during the fitting process, the network will create filters that find features across all of the different layers. The convolutional layer holds the results of the different filters for analysis.

<img src="CNN_MultiInputLayerMultiFilter.PNG" style="height: 500px;"/>

In [1]:
from keras.models import Sequential
from keras.layers import Conv2D

model = Sequential()
model.add(Conv2D(filters=32, kernel_size=3, strides=2, padding='valid', 
    activation='relu', input_shape=(128, 128, 3)))
model.summary()

Using TensorFlow backend.


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 63, 63, 32)        896       
Total params: 896.0
Trainable params: 896
Non-trainable params: 0.0
_________________________________________________________________
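The output shape and parameter count reported above follow from the filter size, stride, and padding. The sketch below works through that arithmetic; the helper names `conv_output_size` and `conv_params` are hypothetical, and the code assumes square kernels and equal strides in both directions.

```python
def conv_output_size(n, kernel_size, stride=1, padding="valid"):
    """Spatial size of a convolution output along one dimension."""
    if padding == "same":
        # 'same' pads the input so that only the stride shrinks it.
        return -(-n // stride)  # ceil(n / stride)
    # 'valid' uses no padding, so the window must fit entirely inside.
    return (n - kernel_size) // stride + 1


def conv_params(kernel_size, in_channels, filters):
    """Each filter has kernel_size * kernel_size * in_channels weights
    plus one bias."""
    return (kernel_size * kernel_size * in_channels + 1) * filters


side = conv_output_size(128, kernel_size=3, stride=2, padding="valid")
params = conv_params(kernel_size=3, in_channels=3, filters=32)

print(side, side, 32)  # 63 63 32
print(params)          # 896
```

Note that the parameter count does not depend on the image size at all, only on the filter dimensions, the number of input channels, and the number of filters.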


# Pooling Layers #

The last layer type unique to a CNN is the *pooling layer*. Its responsibility is to reduce the results from a convolutional layer down to a more manageable number of dimensions. The goal in structuring these layers is to decimate the data enough that the next layer is more efficient, but not so much that it loses meaningful data. Typical pooling methods include max pooling (the maximum value contained within the filter window) and global average pooling (the average of all values in a filter layer).

In [4]:
from keras.models import Sequential
from keras.layers import MaxPooling2D

model = Sequential()
model.add(MaxPooling2D(pool_size=2, strides=2, input_shape=(100, 100, 15)))
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
max_pooling2d_2 (MaxPooling2 (None, 50, 50, 15)        0         
Total params: 0.0
Trainable params: 0.0
Non-trainable params: 0.0
_________________________________________________________________
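The two pooling methods mentioned above can be sketched in a few lines of plain Python for a single feature map. The function names and the sample values are made up for illustration; Keras applies the same operation independently to every channel.

```python
def max_pool(feature_map, pool_size=2, stride=2):
    """Keep only the largest value in each pooling window."""
    out_h = (len(feature_map) - pool_size) // stride + 1
    out_w = (len(feature_map[0]) - pool_size) // stride + 1
    return [
        [
            max(
                feature_map[i * stride + di][j * stride + dj]
                for di in range(pool_size)
                for dj in range(pool_size)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def global_average_pool(feature_map):
    """Collapse an entire feature map to its mean value."""
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)


# Made-up 4x4 feature map.
fm = [[1, 3, 2, 0],
      [4, 2, 1, 1],
      [0, 1, 5, 6],
      [2, 2, 7, 8]]

pooled = max_pool(fm)
average = global_average_pool(fm)

print(pooled)   # [[4, 2], [2, 8]]
print(average)  # 2.8125
```

With a 2x2 window and a stride of 2, each spatial dimension is halved, which matches the 100 to 50 reduction in the summary above.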


# Putting it together #

The overall goal of alternating convolutional and pooling layers is to go from spatially related data to an overall signature vector for what the image contains. The signature vector is then used as the input vector to an MLP, which outputs a set of probabilities for which category of item the image contains.

In [5]:
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential()
model.add(Conv2D(filters=16, kernel_size=2, padding='same', activation='relu', input_shape=(32, 32, 3)))
model.add(MaxPooling2D(pool_size=2))
model.add(Conv2D(filters=32, kernel_size=2, padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=2))
model.add(Conv2D(filters=64, kernel_size=2, padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=2))
model.add(Flatten())
model.add(Dense(500, activation='relu'))
model.add(Dense(10, activation='softmax'))
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_2 (Conv2D)            (None, 32, 32, 16)        208       
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 16, 16, 16)        0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 16, 16, 32)        2080      
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 8, 8, 32)          0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 8, 8, 64)          8256      
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 4, 4, 64)          0         
_________________________________________________________________
flatten_2 (Flatten)          (None, 1024)              0         
__________

Note how, despite the large number of parameters overall, the total number of trainable parameters is still smaller than in the comparatively shallow MLP shown at the beginning of these notes (670k MLP vs. 530k CNN). On top of that, you have a network that can more accurately describe the spatial data contained in the input set, since the network is connected in a way that preserves that information. This makes the CNN both easier to train and more robust to less "clean" data.
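The comparison can be checked with a little arithmetic. The sketch below recomputes the CNN total from the layer definitions in the code above; the Dense layer counts are derived from those definitions rather than read from the printed summary, and the MLP total is copied from the first summary in these notes.

```python
# (kernel_size, in_channels, filters) for the three Conv2D layers.
conv_layers = [(2, 3, 16), (2, 16, 32), (2, 32, 64)]

# Each filter has k * k * in_channels weights plus one bias.
conv_counts = [(k * k * c + 1) * f for k, c, f in conv_layers]

# After three rounds of 2x2 pooling, 32x32 shrinks to 4x4 with 64 channels.
flattened = 4 * 4 * 64

# Dense layers: weights plus biases.
dense_counts = [flattened * 500 + 500, 500 * 10 + 10]

cnn_total = sum(conv_counts) + sum(dense_counts)
mlp_total = 669706  # from the MLP summary earlier in these notes

print(conv_counts)   # [208, 2080, 8256]
print(dense_counts)  # [512500, 5010]
print(cnn_total)     # 528054
print(cnn_total < mlp_total)  # True
```

Almost all of the CNN's parameters sit in the first Dense layer; the convolutional layers themselves contribute only about ten thousand.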