# TOC

__Chapter 4 - Convolutional neural networks__

1. [Import](#Import)
1. [Introduction to CNNs](#Introduction-to-CNNs)
1. [MNIST - Take 2](#MNIST)
    1. [Convolution](#Convolution)
    1. [Pooling](#Pooling)
    1. [Dropout](#Dropout)


<a id = 'Import'></a>

# Import

In [None]:
# Standard library and settings
import os
import sys
import importlib
import itertools
import warnings; warnings.simplefilter('ignore')
modulePath = os.path.abspath(os.path.join('../../CustomModules'))
sys.path.append(modulePath) if modulePath not in sys.path else None
from IPython.display import display, HTML; display(HTML("<style>.container { width:95% !important; }</style>"))


# Data extensions and settings
import numpy as np
np.set_printoptions(threshold = np.inf, suppress = True)
import pandas as pd
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.options.display.float_format = '{:,.6f}'.format

import tensorflow as tf

# Visualization extensions and settings
import seaborn as sns
import matplotlib.pyplot as plt


# Custom extensions and settings
import mlmachine as mlm


# Magic functions
%matplotlib inline


<a id = 'Introduction-to-CNNs'></a>

# Introduction to CNNs

In contrast to fully connected neural networks, each unit in a CNN is connected to a (typically small) number of nearby units in the previous layer. Further, all units are connected to the previous layer in the same way, with the exact same weights and structure. This gives rise to an operation known as convolution, which can be thought of as applying a 'window' of weights that slides across the surface of the image. This addresses the fact that an object can appear in many different locations in a picture, and that the position and perspective of an object will differ from image to image. Robustness to such shifts is known as 'invariance'. The convolutional approach to learning weights addresses this by performing the exact same computation on different parts of the image.
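
To make the weight-sharing idea concrete, here is a minimal NumPy sketch, not TensorFlow's implementation. The helper `slide_window`, the 6x6 image and the all-ones 2x2 filter are illustrative assumptions. The same weights are applied at every position, so when the object moves, the peak response moves with it but keeps the same strength.

```python
import numpy as np

def slide_window(image, weights):
    """Apply one shared weight window at every valid position of a 2-D image."""
    kh, kw = weights.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # The same weights are reused at every location (weight sharing).
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * weights)
    return out

# Hypothetical 6x6 image containing a 2x2 bright blob, plus a shifted copy.
image = np.zeros((6, 6))
image[1:3, 1:3] = 1.0
shifted = np.roll(image, shift=(2, 2), axis=(0, 1))

weights = np.ones((2, 2))  # illustrative filter that responds to 2x2 bright blobs

response = slide_window(image, weights)
response_shifted = slide_window(shifted, weights)

# Location of the strongest response in each case.
peak = tuple(int(v) for v in np.unravel_index(response.argmax(), response.shape))
peak_shifted = tuple(int(v) for v in np.unravel_index(response_shifted.argmax(), response_shifted.shape))

print(peak, response.max())                  # (1, 1) 4.0
print(peak_shifted, response_shifted.max())  # (3, 3) 4.0
```

The filter detects the blob equally well in both images; only the location of the detection changes.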



<a id = 'MNIST'></a>

# MNIST - Take 2

Modeling using the MNIST dataset, this time with a small CNN.

<a id = 'Convolution'></a>

## Convolution

The convolution operation is the fundamental means by which layers are connected in CNNs. TensorFlow has a built-in operation for this, conv2d():

```python
tf.nn.conv2d(x, W, strides = [1,1,1,1], padding = 'SAME')
```

Here, 'x' is the data: either the input image or a feature map produced further along in the network by previous convolutional layers. A feature map is the output of a layer. It can also be thought of as a 'processed' image, the result of applying a filter and perhaps some other operations.

The filter is parameterized by W, which holds the learned weights of our network. This convolutional filter is the small 'sliding window' that moves across the face of the image.

The output of this operation depends on the shapes of x and W. In this case, the output is four-dimensional. The image data x has shape [None, 28, 28, 1], meaning we have an unknown number of images, each 28 x 28 pixels with one color channel (grayscale). The weights W have shape [5, 5, 1, 32], where the initial 5 x 5 x 1 is the size of the 'window' to be convolved, which in this case is a 5 by 5 region across a single channel. The 32 is the number of feature maps. In other words, we have multiple sets of weights for the convolutional layer. A single filter computes the same feature at every position in the image; since we would like to compute many such features, we use multiple filters. With a stride of 1 and 'SAME' padding, the output therefore has shape [None, 28, 28, 32].
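
The shape bookkeeping can be checked with a naive NumPy sketch of a stride-1, zero-padded convolution. This is an illustration under stated assumptions, not TensorFlow's implementation: the function name `conv2d_same` is hypothetical and the images are random stand-ins for MNIST. Like tf.nn.conv2d, it does not flip the filter.

```python
import numpy as np

def conv2d_same(x, W):
    """Naive stride-1, zero-padded ('SAME') convolution in NHWC layout.

    x: [batch, height, width, in_channels]
    W: [filter_height, filter_width, in_channels, out_channels]
    """
    n, h, w, c_in = x.shape
    fh, fw, _, c_out = W.shape
    # Zero-pad height and width so the output keeps the input's spatial size.
    ph, pw = fh - 1, fw - 1
    x_pad = np.pad(x, ((0, 0), (ph // 2, ph - ph // 2), (pw // 2, pw - pw // 2), (0, 0)))
    out = np.zeros((n, h, w, c_out))
    for i in range(h):
        for j in range(w):
            patch = x_pad[:, i:i + fh, j:j + fw, :]  # [n, fh, fw, c_in]
            # Contract the patch against every filter at once.
            out[:, i, j, :] = np.tensordot(patch, W, axes=([1, 2, 3], [0, 1, 2]))
    return out

rng = np.random.default_rng(0)
x = rng.random((2, 28, 28, 1))      # two stand-in 28x28 grayscale images
W = rng.normal(size=(5, 5, 1, 32))  # 32 filters, each 5x5x1

feature_maps = conv2d_same(x, W)
print(feature_maps.shape)  # (2, 28, 28, 32)
print(W.size)              # 800
```

Each of the 32 filters yields one 28 x 28 feature map per image, and the whole layer uses only 5 * 5 * 1 * 32 = 800 weights regardless of image size.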

The 'strides' argument controls the spatial movement of the filter window W across the image (or feature map) x. Its four entries correspond to the batch, height, width and channel dimensions of x. The value [1,1,1,1] means that the filter is applied to the input at 1-pixel intervals, which amounts to a full convolution evaluated at every pixel position. Increasing the stride will result in a smaller feature map.

Lastly, the padding argument is set to 'SAME', which means that the borders of x are padded with zeros so that, with a stride of 1, the result of the operation has the same height and width as x. This allows the window to give similar attention to the pixels on the border of the image and the pixels in the middle of the image.
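
The combined effect of stride and padding on the output size can be sketched with a small helper. The function name `output_size` is hypothetical, and 'VALID' (no padding) is TensorFlow's other padding option, included here only for comparison with the 'SAME' setting used above.

```python
import math

def output_size(input_size, filter_size, stride, padding):
    """Output length along one spatial dimension for TensorFlow-style padding."""
    if padding == 'SAME':
        # Padding is added so that only the stride shrinks the output.
        return math.ceil(input_size / stride)
    if padding == 'VALID':
        # No padding: the window must fit entirely inside the input.
        return math.ceil((input_size - filter_size + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")

print(output_size(28, 5, 1, 'SAME'))   # 28
print(output_size(28, 5, 2, 'SAME'))   # 14
print(output_size(28, 5, 1, 'VALID'))  # 24
print(output_size(28, 5, 2, 'VALID'))  # 12
```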

<a id = 'Pooling'></a>

## Pooling

Pooling means reducing the size of the data with some local aggregation function, typically within each feature map. The practical motivation is that pooling reduces the size of the data processed downstream. This drastically reduces the number of parameters in the model, particularly if we use fully connected layers after the convolutional layer. The theoretical motivation is that we would like our features to not care too much about small changes in position in an image. This makes the model more robust to spatial variability between images.

```python
tf.nn.max_pool(x, ksize = [1,2,2,1], strides = [1,2,2,1], padding = 'SAME')
```

The ksize argument controls the size of the pooling window, and strides controls how far that window moves across x at each step, just as it does in the convolution layer. Setting both to a 2x2 grid means the output of the pooling will have exactly one-half the height and width of the original, or one-quarter of the original size overall.
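
As a rough illustration of what 2x2 max pooling with a stride of 2 computes, here is a NumPy sketch on a single feature map. The helper `max_pool_2x2` and the 4x4 values are made up for the example, and the sketch assumes even height and width so that no padding is needed.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a single [height, width] feature map.

    Assumes even height and width, so no padding is needed.
    """
    h, w = feature_map.shape
    # Split the map into non-overlapping 2x2 blocks, then take each block's maximum.
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

feature_map = np.array([[1, 3, 2, 0],
                        [4, 2, 1, 1],
                        [0, 1, 5, 6],
                        [2, 2, 7, 8]])

pooled = max_pool_2x2(feature_map)
print(pooled)
# [[4 2]
#  [2 8]]
```

The 4x4 map shrinks to 2x2: half the height, half the width, and one-quarter of the values.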

<a id = 'Dropout'></a>

## Dropout

Dropout is a regularization trick used to force the network to distribute the learned representation across all neurons. Dropout 'turns off' a random, preset fraction of units in a layer by setting their values to zero during training. The dropped neurons are chosen at random and differ for each computation, which forces the network to learn a representation that works despite the dropout. This process can be thought of as training an 'ensemble' of multiple networks that each have a different understanding of the training data, which tends to improve generalization. Dropout is not used in the test phase. In the TensorFlow 1.x call below, keep_prob is the probability of keeping each unit, so keep_prob = 0.1 drops roughly 90% of the units.

```python
tf.nn.dropout(layer, keep_prob = 0.1)
```
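
The mechanics can be sketched in NumPy as 'inverted' dropout. This is a minimal illustration rather than TensorFlow's code: the `dropout` function, its `training` flag and the 0.5 keep probability are assumptions made for the example. Surviving units are scaled by 1 / keep_prob, as tf.nn.dropout also does, so the expected activation is unchanged.

```python
import numpy as np

def dropout(layer, keep_prob, rng, training=True):
    """Inverted dropout: zero units at random and rescale the survivors."""
    if not training:
        return layer  # test phase: no units are dropped
    # Each unit is kept independently with probability keep_prob.
    mask = rng.random(layer.shape) < keep_prob
    return layer * mask / keep_prob

rng = np.random.default_rng(0)
layer = np.ones((1000, 100))  # stand-in activations
dropped = dropout(layer, keep_prob=0.5, rng=rng)

print((dropped == 0).mean())  # roughly 0.5 of the units are zeroed
print(dropped.mean())         # roughly 1.0, the original mean activation
```

Calling the function with `training=False` returns the layer untouched, which mirrors not using dropout in the test phase.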


