# TOC

__Chapter 4 - Convolutional neural networks__

1. [Import](#Import)
1. [Introduction to CNNs](#Introduction-to-CNNs)
1. [MNIST - Take 2](#MNIST)
    1. [Convolution](#Convolution)
    1. [Pooling](#Pooling)
    1. [Dropout](#Dropout)
    1. [Model](#Model)
1. [CIFAR10](#CIFAR10)


<a id = 'Import'></a>

# Import

In [6]:
# Standard library and settings
import os
import sys
import importlib
import itertools
import warnings; warnings.simplefilter('ignore')
modulePath = os.path.abspath(os.path.join('../../CustomModules'))
if modulePath not in sys.path:
    sys.path.append(modulePath)
from IPython.display import display, HTML
display(HTML("<style>.container { width:95% !important; }</style>"))


# Data extensions and settings
import numpy as np
np.set_printoptions(threshold = np.inf, suppress = True)
import pandas as pd
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.options.display.float_format = '{:,.6f}'.format

import tensorflow as tf

# Visualization extensions and settings
import seaborn as sns
import matplotlib.pyplot as plt


# Custom extensions and settings
# import mlmachine as mlm


# Magic functions
%matplotlib inline


<a id = 'Introduction-to-CNNs'></a>

# Introduction to CNNs

In contrast to fully connected neural networks, units in CNNs are connected to a (typically small) number of nearby units in the previous layer. Further, all units are connected to the previous layer in the same way, with the exact same weights and structure. This facilitates an operation known as convolution, which can be thought of as the application of a 'window' of weights that slides along the surface of the image. This helps to address the fact that an object can appear in many different locations in a picture, and the perspective of an object will certainly differ from image to image. Robustness to such changes is known as 'invariance'. The convolutional approach to learning weights addresses this by performing the exact same computation on different parts of the image.
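
As a rough illustration of the sliding window idea, the sketch below applies one shared set of weights to a 1-D signal that contains the same pattern at two different positions. The names `window` and `respond` and the toy signals are made up for this example; it uses NumPy's `np.correlate` rather than TensorFlow, so it only mirrors the concept.

```python
import numpy as np

# One shared set of weights (the 'window'), reused at every position.
window = np.array([1.0, 2.0, 1.0])

def respond(signal):
    """Slide the window along the signal and return the response at each position."""
    return np.correlate(signal, window, mode="valid")

# The same pattern placed at two different locations.
early = np.zeros(10)
early[2:5] = [1.0, 2.0, 1.0]
late = np.zeros(10)
late[5:8] = [1.0, 2.0, 1.0]

early_response = respond(early)
late_response = respond(late)

# Position and strength of the strongest response for each signal.
print(early_response.argmax(), early_response.max())
print(late_response.argmax(), late_response.max())
```

Because the weights are identical at every position, the peak response has the same strength in both signals and simply moves with the pattern.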



<a id = 'MNIST'></a>

# MNIST - Take 2

Modeling using the MNIST dataset, this time with a small CNN.

<a id = 'Convolution'></a>

## Convolution

The convolution operation is the fundamental means by which layers are connected in CNNs. TensorFlow has a built-in operation, conv2d():

```python
tf.nn.conv2d(x, W, strides = [1,1,1,1], padding = 'SAME')
```

Here, 'x' is the data - which is either the input image or a downstream feature map obtained further along in the network following previous convolutional layers. A feature map is the output of each layer. The output of each layer can also be thought of as a 'processed' image, the result of applying a filter and perhaps some other operations. 

The filter is parameterized by W, which is comprised of the learned weights of our network. This convolutional filter is the small 'sliding window' that slides across the face of the image.

The output of this operation depends on the shapes of x and W. In this case, the output is four-dimensional. The image data x has shape [None, 28, 28, 1], meaning we have an unknown number of images, each 28 x 28 pixels with one color channel (grayscale). The weight tensor W has shape [5, 5, 1, 32], where the initial 5 x 5 x 1 represents the size of the 'window' in the image to be convolved, which in this case is a 5 by 5 region that is one channel deep. The 32 represents the number of feature maps. In other words, we have multiple sets of weights for the convolutional layer. The idea of a convolutional layer is to compute the same feature all along the image. Since we would like to compute many such features, we use multiple sets of convolutional filters.
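
To make these shapes concrete, here is a minimal NumPy sketch of a stride-1, zero-padded convolution, matching the stride and padding settings in the call above. The helper `conv2d_same` is a hypothetical name for this example, not a TensorFlow function; it is a slow reference loop that assumes odd window sizes and does not flip the window.

```python
import numpy as np

def conv2d_same(x, W):
    """Naive stride-1 convolution with zero padding (odd window sizes only).

    x: [batch, height, width, in_channels]
    W: [window_height, window_width, in_channels, out_channels]
    """
    kh, kw, _, out_channels = W.shape
    ph, pw = kh // 2, kw // 2
    # Zero-pad height and width so the output keeps the input's spatial size.
    padded = np.pad(x, ((0, 0), (ph, ph), (pw, pw), (0, 0)))
    n, h, w, _ = x.shape
    out = np.zeros((n, h, w, out_channels))
    for i in range(h):
        for j in range(w):
            patch = padded[:, i:i + kh, j:j + kw, :]  # [n, kh, kw, in_channels]
            # Same weights at every (i, j): one dot product per output channel.
            out[:, i, j, :] = np.tensordot(patch, W, axes=([1, 2, 3], [0, 1, 2]))
    return out

rng = np.random.default_rng(0)
images = rng.normal(size=(3, 28, 28, 1))   # 3 stand-in grayscale images
filters = rng.normal(size=(5, 5, 1, 32))   # 32 filters, each 5 x 5 x 1
feature_maps = conv2d_same(images, filters)
print(feature_maps.shape)  # (3, 28, 28, 32)
```

Each of the 32 filters produces one 28 x 28 feature map per image, which is where the four-dimensional output comes from.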

The 'strides' argument controls the spatial movement of the filter window W across the image (or feature map) x. The value [1,1,1,1] means that the filter is applied to the input in 1-pixel intervals, which can be thought of as a full convolution. Increasing the stride will result in a smaller feature map.

Lastly, the padding argument is set to 'SAME', which means that the borders of x are padded so that the result of the operation has the same size as x. This allows the window to give similar attention to the pixels on the border of the image and the pixels in the middle of the image.
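
The effect of stride and padding on the output size can be sketched with a small helper. The function name `output_size` is made up for this example; the formulas follow the usual description of TensorFlow's 'SAME' and 'VALID' padding modes ('VALID', which uses no padding, is not discussed above and is included only for contrast).

```python
import math

def output_size(input_size, window, stride, padding):
    """Output length along one spatial dimension (hypothetical helper)."""
    if padding == "SAME":
        # Padding is added as needed, so only the stride shrinks the output.
        return math.ceil(input_size / stride)
    if padding == "VALID":
        # No padding: the window must fit entirely inside the input.
        return math.ceil((input_size - window + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")

print(output_size(28, 5, 1, "SAME"))   # 28
print(output_size(28, 5, 2, "SAME"))   # 14
print(output_size(28, 5, 1, "VALID"))  # 24
```

With a stride of 1 and 'SAME' padding, a 28 x 28 input stays 28 x 28, which is the configuration used in this chapter.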

<a id = 'Pooling'></a>

## Pooling

Pooling means reducing the size of the data with some local aggregation function, typically within each feature map. The technical aspect of this operation is that pooling reduces the size of the data processed downstream. This drastically reduces the number of parameters in the model, particularly if we use fully connected layers after the convolutional layer. The theoretical aspect of pooling is that we would like our features to not care too much about small changes in position in an image. This makes the model more robust to spatial variability between images.

```python
tf.nn.max_pool(x, ksize = [1,2,2,1], strides = [1,2,2,1], padding = 'SAME')
```

The ksize argument controls the size of the pooling and strides controls how much the pooling grid slides across x, just as it does in the convolution layer. Setting strides to a 2x2 grid means the output of the pooling will be exactly one-half of the height and width of the original - one-quarter of the original size overall.
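
A minimal NumPy sketch of 2 x 2 max pooling with a stride of 2 shows the halving directly. The helper name `max_pool_2x2_np` is hypothetical, and the reshape trick assumes the height and width are even, so no padding is needed.

```python
import numpy as np

def max_pool_2x2_np(x):
    """2 x 2 max pooling with stride 2 for x of shape [batch, height, width, channels].

    Assumes height and width are even.
    """
    n, h, w, c = x.shape
    # Split each spatial axis into (blocks, 2), then take the max inside each block.
    blocks = x.reshape(n, h // 2, 2, w // 2, 2, c)
    return blocks.max(axis=(2, 4))

feature_map = np.arange(16).reshape(1, 4, 4, 1)
pooled = max_pool_2x2_np(feature_map)

print(feature_map[0, :, :, 0])
print(pooled[0, :, :, 0])  # [[ 5  7] [13 15]]
print(pooled.shape)        # (1, 2, 2, 1)
```

The 4 x 4 input becomes 2 x 2: half the height, half the width, and one quarter of the values.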

<a id = 'Dropout'></a>

## Dropout

Dropout is a regularization trick used to force the network to distribute the learned representation across all neurons. Dropout 'turns off' a random, preset fraction of units in a layer by setting their values to zero during training. The dropped neurons are chosen at random and differ for each computation, which forces the network to learn a representation that will work despite the dropout. This process can be thought of as training an 'ensemble' of multiple networks that each have a different understanding of the training data, which tends to improve generalization. Dropout is not used in the test phase.

```python
tf.nn.dropout(layer, keep_prob = 0.1)
```
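
The mechanics can be sketched in NumPy. The helper `dropout_np` is a hypothetical name, and the sketch assumes the common 'inverted dropout' convention, in which each unit is kept with probability `keep_prob` and the kept values are scaled by 1 / keep_prob so the expected activation is unchanged.

```python
import numpy as np

def dropout_np(activations, keep_prob, rng):
    """Zero each unit with probability 1 - keep_prob and rescale the survivors."""
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
layer = np.ones((1000, 100))

dropped = dropout_np(layer, 0.5, rng)
print(np.unique(dropped))   # only zeros and rescaled survivors
print(dropped.mean())       # close to the original mean of 1.0

# With keep_prob = 1.0 nothing is dropped, which is the setting used at test time.
print(np.array_equal(dropout_np(layer, 1.0, rng), layer))
```

Note that `keep_prob` is the fraction of units kept, not the fraction dropped, so a small value such as 0.1 removes most of the layer on each step.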


<a id = 'Model'></a>

## Model

In [9]:
# helper functions

def weight_variable(shape):
    """
    Info:
        Description:
            Specifies weights for either a fully connected layer or convolutional layer. 
            Randomized initially using a truncated normal distribution with a SD of 0.1. 
            This is a pretty typical randomization method.
    """
    initial = tf.truncated_normal(shape, stddev = 0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    """
    Info:
        Description:
            Defines bias elements in either a fully connected layer or convolutional layer.
            Initialized with the constant value of 0.1
    """
    initial = tf.constant(0.1, shape = shape)
    return tf.Variable(initial)

def conv2d(x, W):
    """
    Info:
        Description:
            Specifies the convolution that will typically be used.
            This represents a full convolution (no skipping) with padding 
            that creates an output that is the same size as the input.
    """
    return tf.nn.conv2d(x, W, strides = [1,1,1,1], padding = 'SAME')

def max_pool_2x2(x):
    """
    Info:
        Description:
            Sets the max pool to half the size across both the height and width.
            In total, the output is one quarter of the size of the input feature map.
    """
    return tf.nn.max_pool(x, ksize = [1,2,2,1]
                          ,strides = [1,2,2,1], padding = 'SAME')

def conv_layer(input, shape):
    """
    Info:
        Description:
            The convolutional layer, linear convolution as defined in conv2d, with a
            bias followed by the ReLU activation function.
    """
    W = weight_variable(shape)
    b = bias_variable([shape[3]])
    return tf.nn.relu(conv2d(input, W) + b)

def full_layer(input, size):
    """
    Info:
        Description:
            A standard full layer with a bias. To be used for the final output.
    """
    in_size = int(input.get_shape()[1])
    W = weight_variable([in_size, size])
    b = bias_variable([size])
    return tf.matmul(input, W) + b


> Remarks - Random initialization, as opposed to constant initialization, helps break the symmetry between learned features, which allows the model to learn a diverse and rich representation. Using a bounded (truncated) distribution helps control the magnitude of the gradients, allowing the network to converge more efficiently.
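
The two points in this remark can be illustrated with a small NumPy sketch. The helper `truncated_normal_np` is a made-up name; it assumes the usual truncation rule of redrawing any value that lands more than two standard deviations from the mean, and the toy layer sizes are arbitrary.

```python
import numpy as np

def truncated_normal_np(shape, stddev, rng):
    """Normal samples, redrawing anything beyond 2 standard deviations (assumed rule)."""
    values = rng.normal(0.0, stddev, size=shape)
    outside = np.abs(values) > 2 * stddev
    while outside.any():
        values[outside] = rng.normal(0.0, stddev, size=int(outside.sum()))
        outside = np.abs(values) > 2 * stddev
    return values

rng = np.random.default_rng(0)
weights = truncated_normal_np((5, 5, 1, 32), 0.1, rng)
print(np.abs(weights).max() <= 0.2)  # True: every weight is bounded

# Constant initialization: every unit computes exactly the same output.
inputs = rng.normal(size=(4, 25))
constant_out = inputs @ np.full((25, 3), 0.1)
random_out = inputs @ truncated_normal_np((25, 3), 0.1, rng)
print(np.allclose(constant_out[:, 0], constant_out[:, 1]))  # True
print(np.allclose(random_out[:, 0], random_out[:, 1]))      # False
```

Units that start with identical weights produce identical outputs and so receive identical updates, which is the symmetry that random initialization breaks.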

In [10]:
# Setup model

# 28 x 28 pixel input
x = tf.placeholder(tf.float32, shape = [None, 784])
y_ = tf.placeholder(tf.float32, shape = [None, 10])

x_image = tf.reshape(x, [-1,28,28,1])

# 5 x 5 x 1 x 32 (5 by 5 tiles, 1 deep, 32 sets)
# Creates 28 x 28 x 32 feature map
# Followed by 2x2 max pooling
conv1 = conv_layer(x_image, shape = [5,5,1,32])
conv1_pool = max_pool_2x2(conv1)

# 5 x 5 x 32 x 64 (5 by 5 tiles, 32 deep, 64 sets)
# Creates 14 x 14 x 64 feature map
# Followed by 2x2 max pooling
conv2 = conv_layer(conv1_pool, shape = [5,5,32,64])
conv2_pool = max_pool_2x2(conv2)

# Flatten the 7 x 7 x 64 feature map, then apply a 1,024-unit fully connected layer
conv2_flat = tf.reshape(conv2_pool, [-1,7*7*64])
full_1 = tf.nn.relu(full_layer(conv2_flat, 1024))

# dropouts
keep_prob = tf.placeholder(tf.float32)
full1_drop = tf.nn.dropout(full_1, keep_prob = keep_prob)

y_conv = full_layer(full1_drop, 10)


> Remarks - First, placeholders are defined for the input images and correct labels. Next, the input image is reshaped into the 2D image format of size 28 x 28 x 1. In the basic logistic regression implemented earlier, this was unnecessary since all pixels were treated independently. The power of a CNN, however, comes from exploiting the spatial relationships between a pixel and its nearby pixels.

> Next come two consecutive convolution and pooling layers, each with 5 x 5 convolutions, producing 32 and 64 feature maps respectively. These are followed by a single fully connected layer with 1,024 units. Before the image arrives at this fully connected layer, we flatten it back into a single vector, since the fully connected layer derives no benefit from the spatial relationships between pixels.

> After the second convolution/pooling layer, the size of the image is 7 x 7 x 64. The original 28 x 28 pixel image is reduced to 14 x 14 by the first pooling operation, and then to 7 x 7 by the second pooling operation. The '64' in 7 x 7 x 64 is the number of feature maps created in the second convolutional layer.

> One interesting thing to note is that the number of parameters between the 7 x 7 x 64 layer and the fully connected 1 x 1 x 1,024 layer is 3.2 million. Without max pooling, we would have a 28 x 28 x 64 feature map, which would yield 51 million parameters.

> Lastly, the output is a fully connected layer with 10 units, one unit for each handwritten digit.
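
The shapes and parameter counts quoted in these remarks can be checked with a few lines of plain Python. This is only an arithmetic sketch: the helper `pooled` and the variable names are made up for the example, and the counts cover weights only, not biases.

```python
import math

def pooled(size, stride=2):
    """Spatial size after 'SAME' pooling: ceil(size / stride)."""
    return math.ceil(size / stride)

side, channels = 28, 1
trace = [("input", (side, side, channels))]
for name, feature_maps in (("1", 32), ("2", 64)):
    # 'SAME' convolution with stride 1 keeps the spatial size; depth becomes the filter count.
    channels = feature_maps
    trace.append(("conv" + name, (side, side, channels)))
    # 2 x 2 pooling with stride 2 halves the height and width.
    side = pooled(side)
    trace.append(("pool" + name, (side, side, channels)))

for layer, shape in trace:
    print(layer, shape)

units = 1024
pooled_params = side * side * channels * units   # 7 * 7 * 64 * 1024
unpooled_params = 28 * 28 * channels * units     # 28 * 28 * 64 * 1024
print(pooled_params)    # 3211264
print(unpooled_params)  # 51380224
```

The two pooling steps cut the weights feeding the fully connected layer by a factor of 16.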

In [14]:
# Execute model

# MNIST loader that ships with TensorFlow 1.x
from tensorflow.examples.tutorials.mnist import input_data

# Assumed values (not given in the source); adjust as needed
DATA_DIR = '/tmp/data'
STEPS = 5000

mnist = input_data.read_data_sets(DATA_DIR, one_hot = True)

# Labels come from the y_ placeholder defined in the model setup
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
                               logits = y_conv, labels = y_))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    
    for i in range(STEPS):
        batch = mnist.train.next_batch(50)
        
        # Report training accuracy every 50 steps, with dropout disabled
        if i % 50 == 0:
            train_accuracy = sess.run(accuracy, feed_dict = {x : batch[0]
                                                             ,y_ : batch[1]
                                                             ,keep_prob : 1.0})
            print('step {}, training accuracy {}'.format(i, train_accuracy))
            
        sess.run(train_step, feed_dict = {x : batch[0]
                                         ,y_ : batch[1]
                                         ,keep_prob : 0.5})
    
    # Evaluate the 10,000 test images in 10 batches of 1,000
    X = mnist.test.images.reshape(10, 1000, 784)
    Y = mnist.test.labels.reshape(10, 1000, 10)
    test_accuracy = np.mean([sess.run(accuracy
                            ,feed_dict = {x : X[i]
                                         ,y_ : Y[i]
                                         ,keep_prob : 1.0})
                                for i in range(10)])

print('test accuracy: {}'.format(test_accuracy))

<a id = 'CIFAR10'></a>

# CIFAR10
