# Convolutional Neural Networks #

Convolutional Neural Networks (CNNs) are deep learning architectures commonly used for image classification, though they are applied to other tasks as well. CNNs can be more useful for this task than a general deep feed-forward network because they preserve relative positioning, say between two pixels, when learning. This is because pixels "near" each other are processed together by the convolutional layer described below. Other layers are also involved in the basic CNN structure, often including pooling layers and the usual fully connected layers.

### Convolutional Layer ###

The basic component of a CNN is the filter, called a kernel. There are many kernels in a CNN, and through training they learn features useful for classification. For example, we may be trying to create a classifier that determines whether an image contains a person, a dog, or a cat. The network would have many kernels, each of which may learn to detect a specific part of an image that helps distinguish between these three classes. For example, a kernel may learn the feature "pointed ears on top of a head". If this feature is present in an image, the classifier may consider it more likely that the image is of a cat, possible that it is of a dog, and very unlikely that it is of a human.

A kernel itself is a matrix of size $k \times k$ that "sweeps" over the layer input, in this case say an image that is $n \times m \times 3$. That is, the image has a height $n$, a width $m$, and three color channels (RGB). In this case, the kernel will move over the first channel of size $n \times m$ by moving across groups of $k \times k$ pixels. For each group, it calculates the dot product of those pixel values and the kernel weights (the sum of the element-wise products), and that value is stored as one entry of an output matrix. An example of a $5 \times 5$ image with a $3 \times 3$ kernel is shown below (borrowed from: https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53). The current part of the image being processed is highlighted in yellow, and the kernel weights are shown in red in the corners of the yellow squares.

After calculating the dot product, the yellow matrix slides some number of squares to the right (determined by the kernel setting called the step size, or stride) and calculates the new dot product, which becomes the next entry of the output matrix being constructed. At the end of a row, the kernel moves down by the step size and starts again from the left. This continues over the green image matrix until the entire image has been processed (and the same is done for any additional channels). Then the output matrix is passed on to the next layer.
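The sweep described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than how a deep learning library implements it: the function name `convolve2d`, the pixel values, and the kernel weights are made up for the example, it handles a single channel, and it uses no padding. With a step size (stride) of $s$, an $n \times m$ input and a $k \times k$ kernel give an output of size $(\lfloor (n-k)/s \rfloor + 1) \times (\lfloor (m-k)/s \rfloor + 1)$.

```python
import numpy as np


def convolve2d(image, kernel, stride=1):
    """Slide a k x k kernel over a single-channel image (no padding)."""
    n, m = image.shape
    k = kernel.shape[0]
    out_h = (n - k) // stride + 1
    out_w = (m - k) // stride + 1
    output = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            patch = image[r:r + k, c:c + k]        # current k x k group of pixels
            output[i, j] = np.sum(patch * kernel)  # dot product with the weights
    return output


# Illustrative 5 x 5 image and 3 x 3 kernel (values chosen for this example).
image = np.array([
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
])
kernel = np.array([
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
])

feature_map = convolve2d(image, kernel)
print(feature_map)
# [[4. 3. 4.]
#  [2. 4. 3.]
#  [2. 3. 4.]]
```

Calling `convolve2d(image, kernel, stride=2)` on the same input visits only every second position and returns a $2 \times 2$ output.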

In one convolutional layer there may be many filters, each producing its own output matrix, which leads to a larger output.
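To see how several filters enlarge the output, here is a rough sketch of a layer with four filters applied to a random $8 \times 8 \times 3$ input. It assumes the common convention that each filter holds one $k \times k$ slice of weights per input channel and that the per-channel products are summed into a single value. The function name `conv_layer` and all array sizes are chosen for illustration only.

```python
import numpy as np


def conv_layer(image, filters, stride=1):
    """Apply several filters to a multi-channel image (no padding).

    image:   array of shape (n, m, channels)
    filters: array of shape (num_filters, k, k, channels)
    returns: array of shape (out_h, out_w, num_filters)
    """
    n, m, _ = image.shape
    num_filters, k, _, _ = filters.shape
    out_h = (n - k) // stride + 1
    out_w = (m - k) // stride + 1
    output = np.zeros((out_h, out_w, num_filters))
    for f in range(num_filters):
        for i in range(out_h):
            for j in range(out_w):
                r, c = i * stride, j * stride
                patch = image[r:r + k, c:c + k, :]
                # Sum of element-wise products over all channels at once.
                output[i, j, f] = np.sum(patch * filters[f])
    return output


rng = np.random.default_rng(0)
image = rng.random((8, 8, 3))                 # stand-in for an RGB image
filters = rng.standard_normal((4, 3, 3, 3))   # four untrained 3 x 3 filters

feature_maps = conv_layer(image, filters)
print(feature_maps.shape)  # (6, 6, 4)
```

The spatial size shrinks from $8 \times 8$ to $6 \times 6$, but the output now has one channel per filter, so the total number of values grows from $8 \cdot 8 \cdot 3 = 192$ to $6 \cdot 6 \cdot 4 = 144$ here and would exceed the input once more filters are used.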

### Pooling Layers ###

A pooling layer reduces the dimensions of its input (downsamples) in order to reduce the number of parameters required later in the model. This is also achieved with a kernel, which in this case has no weights to learn and just sweeps over the layer input to determine which parts of a given matrix are "pooled" together. There are two common ways of pooling. In max pooling, the maximum value "covered" by the kernel is passed on to the next layer. In average pooling, the average of the values "covered" by the kernel is passed on. In this way, for example, a $5 \times 5$ area of an image (25 pixels) is reduced to 1 pixel. Pooling usually happens after some convolutional layers are applied, since those layers generally increase the dimensions relative to the original input.
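Both kinds of pooling can be sketched with NumPy as follows. This is a simplified version that assumes non-overlapping windows (the window moves by its own size each step) and a single-channel input; the function name `pool2d` and the input values are invented for the example.

```python
import numpy as np


def pool2d(matrix, size=2, mode="max"):
    """Downsample a 2-D matrix with non-overlapping size x size windows."""
    if mode == "max":
        reducer = np.max
    elif mode == "average":
        reducer = np.mean
    else:
        raise ValueError("mode must be 'max' or 'average'")
    n, m = matrix.shape
    out_h, out_w = n // size, m // size
    output = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = matrix[i * size:(i + 1) * size, j * size:(j + 1) * size]
            output[i, j] = reducer(window)  # one value per window
    return output


feature_map = np.array([
    [1, 3, 2, 4],
    [5, 6, 1, 0],
    [7, 2, 9, 8],
    [3, 4, 6, 5],
])

max_pooled = pool2d(feature_map, mode="max")
avg_pooled = pool2d(feature_map, mode="average")
print(max_pooled)
# [[6. 4.]
#  [7. 9.]]
print(avg_pooled)
# [[3.75 1.75]
#  [4.   7.  ]]
```

Each $2 \times 2$ window of 4 values is reduced to a single value, so the $4 \times 4$ input becomes $2 \times 2$.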

### Fully-Connected Layer ###

These are the normal layers of a general deep neural network. They are usually applied at the end of the network to return the final classification of a given input.
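As a rough sketch of this last step, the output of the earlier layers can be flattened into a vector and passed through one fully connected layer. The weights below are random rather than trained, the shapes are arbitrary, and the softmax used to turn scores into class probabilities is a common choice that this text does not otherwise specify. The names `dense` and `softmax` are just helpers defined for the example.

```python
import numpy as np


def dense(x, weights, bias):
    """Fully connected layer: every input value feeds every output unit."""
    return x @ weights + bias


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / np.sum(exp)


rng = np.random.default_rng(0)

# Stand-in for the output of the convolutional and pooling layers.
feature_maps = rng.random((3, 3, 4))
x = feature_maps.reshape(-1)          # flatten to a vector of 36 values

num_classes = 10
weights = rng.standard_normal((x.size, num_classes)) * 0.1  # untrained
bias = np.zeros(num_classes)

probabilities = softmax(dense(x, weights, bias))
predicted_class = int(np.argmax(probabilities))
print(probabilities.shape)  # (10,)
```

In a trained network, `weights` and `bias` would be learned together with the kernel weights, and `predicted_class` would be the label returned for the input image.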

## Demonstration and Data Set ##

The implementation of a basic CNN will be shown using the built-in scikit-learn handwritten digits data set.

### Import Data Set ###

Each image is $8 \times 8$ and in grayscale (only 1 color channel). There are 10 possible classes and 1,797 images in the data set.

In [1]:
from sklearn import datasets, model_selection

# load the digits; each row of X is one 8x8 image flattened to 64 values
data = datasets.load_digits()
X = data['data']
Y = data['target']

# create a stratified 70/30 training and testing split
X_train, X_test, Y_train, Y_test = model_selection.train_test_split(X, Y, train_size=0.7, stratify=Y)

# use k-fold cross validation for hyperparameter tuning
