## Applications of CNNs

RNNs are used more frequently in natural language processing (NLP), though CNNs can be applied there too. Example applications of CNNs:

* [Text classification](http://www.wildml.com/2015/12/implementing-a-cnn-for-text-classification-in-tensorflow/)
* [Language translation](https://code.facebook.com/posts/1978007565818999/a-novel-approach-to-neural-machine-translation/)
* Play Atari games with a CNN and reinforcement learning
* Power drones
* Self-driving cars
* [Localize breast cancer](https://research.googleblog.com/2017/03/assisting-pathologists-in-detecting.html)

## How Computers Interpret Images

* The MNIST database contains 70,000 images of handwritten digits (available in `keras.datasets`)
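To a computer, a grayscale image is a matrix of pixel intensities, typically integers from 0 (black) to 255 (white). A common preprocessing step is to rescale these values to the range [0, 1]. The minimal NumPy sketch below uses a made-up 4x4 image in place of a real 28x28 MNIST digit:

```python
import numpy as np

# Made-up 4x4 grayscale image (a real MNIST digit would be 28x28),
# stored as unsigned 8-bit integers in the range 0-255
image = np.array([[0,   0, 255,   0],
                  [0, 128, 255,   0],
                  [0,   0, 255,   0],
                  [0,  64, 255,  32]], dtype=np.uint8)

print(image.shape)  # (4, 4): height x width

# Rescale pixel values to floats in [0, 1] before feeding them to a network
scaled = image.astype("float32") / 255
print(scaled.max(), scaled.min())  # 1.0 0.0
```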

## Hyperparameter tuning

See this [guide to grid-searching hyperparameters for Keras models](http://machinelearningmastery.com/grid-search-hyperparameters-deep-learning-models-python-keras/).

## Issues with MLPs
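* Use only fully connected (densely connected) layers: each node in one layer is connected, with its own weight, to every node in the following layer, so the number of parameters grows quickly with image size
* Accept only vectors as input, so a 2D image must be flattened, which discards its spatial structure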
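The sketch below illustrates both issues with NumPy, assuming a random 28x28 array standing in for an MNIST digit and an arbitrarily chosen hidden layer of 512 nodes: the image must be flattened into a vector, and a single dense layer already needs hundreds of thousands of weights.

```python
import numpy as np

# Random 28x28 array standing in for a grayscale MNIST digit
image = np.random.rand(28, 28)

# An MLP only accepts vectors, so the 2D matrix is flattened row by row;
# pixels that were vertically adjacent end up 28 positions apart
vector = image.flatten()
print(vector.shape)  # (784,)

# A fully connected layer with 512 hidden nodes (an arbitrary choice) needs
# one weight per (input, node) pair plus one bias per node
hidden_nodes = 512
dense_params = vector.size * hidden_nodes + hidden_nodes
print(dense_params)  # 401920
```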

## How CNNs address these issues
* Use sparsely (locally) connected layers
    * Each hidden node is connected only to the pixels in a certain region (e.g., an image matrix can be split into four regions, with each of four hidden nodes connected to the pixels from one region)
* Accept matrices as input
* Share weights: the same filter weights are applied to every region of the image, so a pattern (e.g., a cat) learned in one portion of the image can be detected in any other portion

## Convolutional Layers

* Slide a convolution window (a filter) across the image matrix. At each position, the window covers a small, equally sized region of pixels, and that region is connected to a single node in the hidden layer. Together, these nodes form a convolutional layer (one feature map per filter).
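The sliding-window idea can be sketched in NumPy for one filter over a single-channel image, with stride 1 and no padding. The image, the kernel, and the function name are made up for illustration, and the bias and activation that a real layer adds are left out. As in most deep learning libraries, this is technically a cross-correlation (the kernel is not flipped).

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Each output node sees only one small region of the input...
            region = image[i:i + kh, j:j + kw]
            # ...and every position reuses the same kernel weights (weight sharing)
            feature_map[i, j] = np.sum(region * kernel)
    return feature_map

# Hypothetical 5x5 image with a vertical edge between columns 1 and 2
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)

# Hypothetical 2x2 vertical-edge detector
kernel = np.array([[-1, 1],
                   [-1, 1]], dtype=float)

# Every row of the 4x4 feature map is [0, 2, 0, 0]: it responds only at the edge
print(convolve2d(image, kernel))
```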

## Stride and Padding

* Stride: the number of pixels by which the filter moves across the image at each step
    * A stride of 1 makes the convolutional layer roughly the same height and width as the input layer; a stride of 2 makes them about half
* Padding: how to handle positions where the filter would extend past the edge of the image
    * 'same': pad the input with zeros so the filter can cover the edges and no nodes are lost from the convolutional layer
    * 'valid': skip those positions, losing some data at the edges
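A minimal NumPy sketch of how stride and zero padding change the output size of a single-channel convolution. The image, the 3x3 kernel, and the function name are made up for illustration. The padding here is added symmetrically, which is a simplification: Keras computes the 'same' padding itself and may place the zeros differently (for example with even kernel sizes or larger strides), although the output sizes in these cases agree.

```python
import numpy as np

def conv_output(image, kernel, stride=1, pad=0):
    """Single-channel convolution with a given stride and symmetric zero padding."""
    # Zero padding lets the filter reach the pixels on the edges
    padded = np.pad(image, pad, mode="constant", constant_values=0)
    k = kernel.shape[0]
    out_h = (padded.shape[0] - k) // stride + 1
    out_w = (padded.shape[1] - k) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride  # top-left corner of the current window
            out[i, j] = np.sum(padded[r:r + k, c:c + k] * kernel)
    return out

image = np.ones((6, 6))
kernel = np.ones((3, 3))  # hypothetical 3x3 filter

print(conv_output(image, kernel).shape)                   # (4, 4): no padding, edges lost
print(conv_output(image, kernel, pad=1).shape)            # (6, 6): same size as the input
print(conv_output(image, kernel, stride=2, pad=1).shape)  # (3, 3): stride 2 roughly halves it
```

With padding, a corner output only overlaps 4 real pixels (the rest are zeros), while an interior output overlaps all 9.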

## Convolutional Layers in Keras

```python
from keras.layers import Conv2D
```


### Arguments

You must pass the following arguments:

* `filters` - The number of filters.
* `kernel_size` - Number specifying both the height and width of the (square) convolution window.

There are some additional, optional arguments that you might like to tune:

* `strides` - The stride of the convolution. If you don't specify anything, strides is set to 1.
* `padding` - One of 'valid' or 'same'. If you don't specify anything, padding is set to 'valid'.
* `activation` - Typically 'relu'. If you don't specify anything, no activation is applied. You are strongly encouraged to add a ReLU activation function to every convolutional layer in your networks.

NOTE: It is possible to represent both kernel_size and strides as either a number or a tuple.

When using your convolutional layer as the first layer (appearing after the input layer) in a model, you must provide an additional input_shape argument:

* `input_shape` - Tuple specifying the height, width, and depth (in that order) of the input.

NOTE: Do not include the input_shape argument if the convolutional layer is not the first layer in your network.

There are many other tunable arguments that you can set to change the behavior of your convolutional layers. To read more about these, we recommend perusing the official documentation.

#### Example no. 1

Say I'm constructing a CNN, and my input layer accepts grayscale images that are 200 by 200 pixels (corresponding to a 3D array with height 200, width 200, and depth 1). Then, say I'd like the next layer to be a convolutional layer with 16 filters, each with a width and height of 2. When performing the convolution, I'd like the filter to jump two pixels at a time. I also don't want the filter to extend outside of the image boundaries; in other words, I don't want to pad the image with zeros. Then, to construct this convolutional layer, I would use the following line of code:

```python
Conv2D(filters=16, kernel_size=2, strides=2, activation='relu', input_shape=(200, 200, 1))
```

#### Example no. 2

Say I'd like the next layer in my CNN to be a convolutional layer that takes the layer constructed in Example 1 as input. Say I'd like my new layer to have 32 filters, each with a height and width of 3. When performing the convolution, I'd like the filter to jump 1 pixel at a time. I want the convolutional layer to see all regions of the previous layer, and so I don't mind if the filter hangs over the edge of the previous layer when it's performing the convolution. Then, to construct this convolutional layer, I would use the following line of code:

```python
Conv2D(filters=32, kernel_size=3, padding='same', activation='relu')
```

#### Example no. 3

If you look up code online, it is also common to see convolutional layers in Keras in this format:

```python
Conv2D(64, (2,2), activation='relu')
```

In this case, there are 64 filters, each with a size of 2x2, and the layer has a ReLU activation function. The other arguments in the layer use the default values, so the convolution uses a stride of 1, and the padding has been set to 'valid'.

## Dimensionality

```python
from keras.models import Sequential
from keras.layers import Conv2D

model = Sequential()
model.add(Conv2D(filters=16, kernel_size=2, strides=2, padding='valid',
                 activation='relu', input_shape=(200, 200, 1)))
model.summary()
```

```
_________________________________________________________________
Layer (type)                 Output Shape              Param #
conv2d_4 (Conv2D)            (None, 100, 100, 16)      80
Total params: 80
Trainable params: 80
Non-trainable params: 0
_________________________________________________________________
```


### Formula: Number of Parameters in a Convolutional Layer

The number of parameters in a convolutional layer depends on the supplied values of filters, kernel_size, and input_shape. Let's define a few variables:

* K - the number of filters in the convolutional layer
* F - the height and width of the convolutional filters
* D_in - the depth of the previous layer

Notice that K = filters, and F = kernel_size. Likewise, D_in is the last value in the input_shape tuple.

Since there are F*F*D_in weights per filter, and the convolutional layer is composed of K filters, the total number of weights in the convolutional layer is K*F*F*D_in. Since there is one bias term per filter, the convolutional layer has K biases. Thus, the number of parameters in the convolutional layer is given by K*F*F*D_in + K.
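The formula translates directly into a small Python function (the name is just illustrative). As a sanity check, plugging in the layer from the Dimensionality section reproduces the 80 parameters reported by `model.summary()`.

```python
def conv_params(filters, kernel_size, depth_in):
    """Number of trainable parameters in a Conv2D layer with square kernels and biases."""
    weights = filters * kernel_size * kernel_size * depth_in  # K * F * F * D_in
    biases = filters                                          # one bias per filter
    return weights + biases

# Example 1: 16 filters of size 2x2 on a grayscale (depth 1) input
print(conv_params(filters=16, kernel_size=2, depth_in=1))   # 80, as in model.summary()
# Example 2: 32 filters of size 3x3 on the 16-deep output of Example 1
print(conv_params(filters=32, kernel_size=3, depth_in=16))  # 4640
```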


### Formula: Shape of a Convolutional Layer

The shape of a convolutional layer depends on the supplied values of filters, kernel_size, input_shape, padding, and strides. Let's define a few variables:

* K - the number of filters in the convolutional layer
* F - the height and width of the convolutional filters
* S - the stride of the convolution
* H_in - the height of the previous layer
* W_in - the width of the previous layer

Notice that K = filters, F = kernel_size, and S = strides. Likewise, H_in and W_in are the first and second values of the input_shape tuple, respectively.

The depth of the convolutional layer will always equal the number of filters K.

If padding = 'same', then the spatial dimensions of the convolutional layer are the following:

* height = ceil(float(H_in) / float(S))
* width = ceil(float(W_in) / float(S))

If padding = 'valid', then the spatial dimensions of the convolutional layer are the following:

* height = ceil(float(H_in - F + 1) / float(S))
* width = ceil(float(W_in - F + 1) / float(S))
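A minimal Python version of these formulas (the function name is hypothetical) confirms the (100, 100, 16) output shape from Example 1, and the (100, 100, 32) shape that Example 2 produces with 'same' padding:

```python
import math

def conv_output_shape(h_in, w_in, filters, kernel_size, stride=1, padding="valid"):
    """Output (height, width, depth) of a Conv2D layer, following the formulas above."""
    if padding == "same":
        height = math.ceil(h_in / stride)
        width = math.ceil(w_in / stride)
    elif padding == "valid":
        height = math.ceil((h_in - kernel_size + 1) / stride)
        width = math.ceil((w_in - kernel_size + 1) / stride)
    else:
        raise ValueError("padding must be 'same' or 'valid'")
    return height, width, filters  # depth always equals the number of filters K

# Example 1: 200x200 input, 16 filters of 2x2, stride 2, 'valid' padding
print(conv_output_shape(200, 200, filters=16, kernel_size=2, stride=2))  # (100, 100, 16)
# Example 2: 100x100 input, 32 filters of 3x3, stride 1, 'same' padding
print(conv_output_shape(100, 100, filters=32, kernel_size=3, padding="same"))  # (100, 100, 32)
```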

## Pooling Layers

See [Keras documentation](https://keras.io/layers/pooling/)

Pooling layers reduce the dimensionality of the feature maps within a CNN. Two common types:

1. Max pooling layer
    * Takes stack of feature maps as input
    * Specify window size and stride
    * Take the maximum of the pixels in each window to construct the max pooling layer
2. Global average pooling
    * More extreme form of dimensionality reduction
    * Takes stack of feature maps, calculates average value for nodes in each
    * Final output is stack of feature maps where each feature map has been reduced to a single value
    * Turns 3D array into a vector
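Both pooling operations can be sketched in a few lines of NumPy. This is an illustration with made-up feature maps and hypothetical function names, not Keras' implementation; it assumes 'valid' padding and a channels-last (height, width, depth) layout.

```python
import numpy as np

def max_pool(feature_maps, pool_size=2, stride=2):
    """Max pooling over an (H, W, D) stack of feature maps ('valid' padding)."""
    h, w, d = feature_maps.shape
    out_h = (h - pool_size) // stride + 1
    out_w = (w - pool_size) // stride + 1
    out = np.zeros((out_h, out_w, d))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            window = feature_maps[r:r + pool_size, c:c + pool_size, :]
            # Keep only the largest value in each window, separately per feature map
            out[i, j, :] = window.max(axis=(0, 1))
    return out

def global_average_pool(feature_maps):
    """Reduce each feature map to its mean, turning an (H, W, D) array into a length-D vector."""
    return feature_maps.mean(axis=(0, 1))

# Hypothetical stack of two 4x4 feature maps
maps = np.stack([np.arange(16).reshape(4, 4), np.ones((4, 4))], axis=-1).astype(float)

print(max_pool(maps).shape)        # (2, 2, 2): height and width halved, depth unchanged
print(max_pool(maps)[:, :, 0])     # window maxima of the first map: 5, 7, 13, 15
print(global_average_pool(maps))   # per-map means: 7.5 and 1.0
```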

### Max Pooling Layers in Keras

To create a max pooling layer in Keras, you must first import the necessary module:

```python
from keras.layers import MaxPooling2D
```

Then, you can create a max pooling layer by using the following format:

```python
MaxPooling2D(pool_size, strides, padding)
```

#### Arguments

You must include the following argument:

* `pool_size` - Number specifying the height and width of the pooling window.

There are some additional, optional arguments that you might like to tune:

* `strides` - The vertical and horizontal stride. If you don't specify anything, strides will default to pool_size.
* `padding` - One of 'valid' or 'same'. If you don't specify anything, padding is set to 'valid'.

NOTE: It is possible to represent both pool_size and strides as either a number or a tuple.

You are also encouraged to read the official documentation.

#### Example

Say I'm constructing a CNN, and I'd like to reduce the dimensionality of a convolutional layer by following it with a max pooling layer. Say the convolutional layer has size (100, 100, 15), and I'd like the max pooling layer to have size (50, 50, 15). I can do this by using a 2x2 window in my max pooling layer, with a stride of 2, which could be constructed in the following line of code:

```python
from keras.layers import MaxPooling2D
MaxPooling2D(pool_size=2, strides=2)
```

#### Checking the Dimensionality of Max Pooling Layers

```python
from keras.models import Sequential
from keras.layers import MaxPooling2D

model = Sequential()
model.add(MaxPooling2D(pool_size=2, strides=2, input_shape=(100, 100, 15)))
model.summary()
```

```
_________________________________________________________________
Layer (type)                 Output Shape              Param #
max_pooling2d_2 (MaxPooling2 (None, 50, 50, 15)        0
Total params: 0
Trainable params: 0
Non-trainable params: 0
_________________________________________________________________
```
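The (50, 50, 15) shape above can be reproduced by hand. The sketch below assumes that pooling follows the same output-size rules as the convolutional-layer formulas in the Dimensionality section, with the pool size playing the role of F; the function name is hypothetical.

```python
import math

def pool_output_shape(h_in, w_in, depth, pool_size, strides=None, padding="valid"):
    """Output shape of a 2D pooling layer; strides default to pool_size, as in Keras."""
    strides = pool_size if strides is None else strides
    if padding == "same":
        height, width = math.ceil(h_in / strides), math.ceil(w_in / strides)
    elif padding == "valid":
        height = math.ceil((h_in - pool_size + 1) / strides)
        width = math.ceil((w_in - pool_size + 1) / strides)
    else:
        raise ValueError("padding must be 'same' or 'valid'")
    # Pooling works on each feature map independently, so the depth is unchanged
    # and there are no weights to learn (0 parameters)
    return height, width, depth

print(pool_output_shape(100, 100, 15, pool_size=2, strides=2))  # (50, 50, 15)
```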


## CNNs for Image Classification

* CNNs require fixed-size images
    * Generally, resize them all to a square
    * Every image interpreted as 3D array
        * Grayscale images have depth of 1, still technically 3D
* Convolutional layers make the array deeper (more feature maps); pooling layers decrease its spatial dimensions
* A common choice for convolutional layers: kernel size 2, stride 1, padding 'same'
    * Gives each convolutional layer the same width and height as the previous layer
* The number of filters often doubles with each convolutional layer (16, 32, 64, etc.)
* Use the 'relu' activation function in all convolutional layers
* Max pooling layers generally follow every one or two convolutional layers in the sequence
    * Generally halve the width and height (pool_size and strides both equal 2)
* By the end of this sequence, the array contains little spatial information (it no longer matters where in the image a feature was found)
* Flatten the array to a vector and feed it to one or more fully connected layers to determine what object is in the image

```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential()
model.add(Conv2D(filters=16, kernel_size=2, padding='same', activation='relu', input_shape=(32, 32, 3)))
model.add(MaxPooling2D(pool_size=2))
model.add(Conv2D(filters=32, kernel_size=2, padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=2))
model.add(Conv2D(filters=64, kernel_size=2, padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=2))
model.add(Flatten())
model.add(Dense(500, activation='relu'))
model.add(Dense(10, activation='softmax'))
```

The network begins with a sequence of three convolutional layers, each followed by a max pooling layer. These first six layers are designed to take the input array of image pixels and convert it to an array from which the spatial information has been squeezed out, leaving only information that encodes the content of the image. The array is then flattened to a vector in the seventh layer of the CNN. It is followed by two dense layers designed to further elucidate the content of the image. The final layer has one entry for each object class in the dataset and uses a softmax activation function, so that it returns probabilities.
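To see how the spatial information is squeezed out, the sketch below traces the output shape and parameter count of each layer by hand, reusing the formulas from the Dimensionality section. It assumes the defaults described earlier (stride 1 for Conv2D; strides equal to pool_size and 'valid' padding for MaxPooling2D). Calling `model.summary()` on the Keras model should report the same numbers, but `trace_model` itself is only an illustration.

```python
def trace_model(input_shape, conv_filters, kernel_size=2, dense_units=(500, 10)):
    """Trace shapes and parameter counts through the Conv2D/MaxPooling2D model above.

    Assumes each Conv2D uses padding='same' and stride 1, and each MaxPooling2D
    uses pool_size=2 with the default strides (also 2) and 'valid' padding.
    """
    h, w, d = input_shape
    total = 0
    for k in conv_filters:
        params = k * kernel_size * kernel_size * d + k  # K*F*F*D_in + K
        d = k                                           # 'same' padding keeps h and w
        total += params
        print(f"Conv2D        -> ({h}, {w}, {d})  params={params}")
        h, w = (h - 2) // 2 + 1, (w - 2) // 2 + 1       # 2x2 max pooling halves h and w
        print(f"MaxPooling2D  -> ({h}, {w}, {d})  params=0")
    units_in = h * w * d
    print(f"Flatten       -> ({units_in},)")
    for units in dense_units:
        params = units_in * units + units               # one weight per input, one bias per node
        total += params
        print(f"Dense         -> ({units},)  params={params}")
        units_in = units
    return total

total = trace_model((32, 32, 3), conv_filters=(16, 32, 64))
print("Total params:", total)  # 528054
```

The spatial size shrinks from 32x32 to 4x4 while the depth grows from 3 to 64, and most of the parameters end up in the first dense layer.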

#### Things to Remember

* Always add a ReLU activation function to the Conv2D layers in your CNN. With the exception of the final layer in the network, Dense layers should also have a ReLU activation function.
* When constructing a network for classification, the final layer in the network should be a Dense layer with a softmax activation function. The number of nodes in the final layer should equal the total number of classes in the dataset.
* Have fun! If you start to feel discouraged, we recommend that you check out [Andrej Karpathy's tumblr](https://lossfunctions.tumblr.com/) with user-submitted loss functions, corresponding to models that gave their owners some trouble. Recall that the loss is supposed to decrease during training. These plots show very different behavior :).


## Image Augmentation in Keras

We're only interested in whether an object is present in the image, therefore:
* Scale invariance: we don't want the size to matter
* Rotation invariance: we don't want the rotation to matter
* Translation invariance: we don't want the location to matter


* To make your CNN rotation and translation invariant, augment the training set with randomly rotated and translated copies of the training images
* Augmentation also helps reduce overfitting, since the network sees more varied examples
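Keras provides `ImageDataGenerator` (in `keras.preprocessing.image`) for this, with options such as `rotation_range`, `width_shift_range`, `height_shift_range`, and `horizontal_flip`. To show what such augmentation does to a single image, here is a plain NumPy sketch of two of those transformations, a random horizontal flip and a random translation with zero fill. Rotation is left out for brevity, and the function names are hypothetical.

```python
import numpy as np

def translate(image, dy, dx):
    """Shift an (H, W, C) image by dy rows and dx columns, filling vacated pixels with 0."""
    h, w = image.shape[:2]
    shifted = np.zeros_like(image)
    # Rows/columns of the source that remain inside the frame after the shift
    src_y = slice(max(0, -dy), max(0, min(h, h - dy)))
    src_x = slice(max(0, -dx), max(0, min(w, w - dx)))
    # Where those rows/columns land in the output
    dst_y = slice(max(0, dy), max(0, min(h, h + dy)))
    dst_x = slice(max(0, dx), max(0, min(w, w + dx)))
    shifted[dst_y, dst_x] = image[src_y, src_x]
    return shifted

def random_augment(image, max_shift, rng):
    """Randomly mirror the image left-right, then shift it by up to max_shift pixels."""
    if rng.random() < 0.5:
        image = image[:, ::-1]
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    return translate(image, int(dy), int(dx))

rng = np.random.default_rng(seed=0)
image = np.arange(16, dtype=float).reshape(4, 4, 1)  # hypothetical 4x4 single-channel image
augmented = [random_augment(image, max_shift=1, rng=rng) for _ in range(8)]
print([a.shape for a in augmented])  # every copy keeps the original (4, 4, 1) shape
```

Each call produces a slightly different version of the same image, so the network sees the object in varied positions and orientations during training.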

## Groundbreaking CNN Architectures

* AlexNet Architecture
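    * Training the network took about a week
    * Pioneered the use of the ReLU activation function and of dropout to avoid overfitting
* VGG Architecture
    * VGG-16 and VGG-19 (the number is the count of weight layers)
    * Long sequence of 3x3 convolutions broken up by 2x2 pooling layers, finished with 3 fully connected layers
* ResNet Architecture
    * Uses skip (residual) connections, which make it possible to train very deep networks (e.g., 152 layers)

VGG and ResNet, along with several other architectures, are available as pre-trained models in Keras (`keras.applications`); AlexNet is not.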

## Visualizing CNNs

See https://classroom.udacity.com/nanodegrees/nd009/parts/99115afc-e849-48cf-a580-cb22eea2ba1b/modules/777db663-2b0d-4040-9ae4-bf8c6ab8f157/lessons/52fc79a7-13ff-4065-b3c6-8203ec9ef60c/concepts/cbf65dc4-c0b4-44c5-81c6-5997e409cb75

## Transfer Learning

Transfer learning involves taking a pre-trained neural network and adapting it to a new, different dataset.

Remove the layers that are specific to the original dataset, keep the other layers (with their pre-trained weights), add one or more new layers, and train only the new layers.

See [here](https://classroom.udacity.com/nanodegrees/nd009/parts/99115afc-e849-48cf-a580-cb22eea2ba1b/modules/777db663-2b0d-4040-9ae4-bf8c6ab8f157/lessons/52fc79a7-13ff-4065-b3c6-8203ec9ef60c/concepts/8c202ff3-aab5-46c3-8ed1-0154fa7b566b) for more info
