## Convolutional Neural Networks

CNNs are neural networks that are commonly used to classify images. They do so by using `filters` and `convolutions`. `filters` are the weights in CNNs. In an ordinary NN the weights feeding a layer are vectors (e.g. one weight vector per node of a hidden layer with 3 nodes), whereas in a CNN each filter is a small matrix (e.g. a 3x3 filter). Another way CNNs differ from ordinary NNs is that they use shared weights.

### Shared Weights
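CNNs use shared weights: the same filter is slid across the whole image, so one set of weights is reused at every location. This makes the convolution translation equivariant (a shifted input gives a correspondingly shifted feature map), and combined with pooling it gives CNNs a degree of translation invariance. The basic idea is that if a filter detects a horizontal line, it should detect that line anywhere in the image, irrespective of its location, so there is no need to learn line detection again and again for every position. Also: __it enormously decreases the number of parameters to learn__. In a fully connected network, every node has one weight for every pixel in the image, which quickly becomes far too many parameters.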
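To make the parameter saving concrete, here is a back-of-the-envelope count. It assumes a 64x64 RGB input, a fully connected layer of 32 units, and a convolutional layer of 32 filters of size 3x3; these sizes are chosen only for illustration.

```python
def dense_params(height, width, channels, units):
    # Every unit is connected to every input value, plus one bias per unit.
    return height * width * channels * units + units


def conv_params(filter_size, in_channels, num_filters):
    # Each filter has filter_size x filter_size x in_channels weights plus one bias.
    # The same filter is reused (shared) at every position, so the image size
    # does not appear in this count at all.
    return filter_size * filter_size * in_channels * num_filters + num_filters


fc = dense_params(64, 64, 3, 32)
conv = conv_params(3, 3, 32)
print(fc, conv)  # 393248 896
```

The fully connected layer needs roughly 440 times as many parameters, and its count grows with the image size while the convolutional layer's does not.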

Steps:
1. Convolution
2. Max Pooling
3. ReLU
4. Flattening
5. Full Connection


### Convolution layer
For an image, we create a small filter/kernel and slide it across the image. At each position the filter is multiplied element-wise with the patch of the image underneath it, and the products are summed into a single output value (the convolution). Using such filters we can detect edges and various other information about the image. The filter's values are initialised with random zero-centred numbers; during training the filter learns to pick out useful aspects of the image.

Important thing to remember: the depth of each filter in the current layer equals the number of channels of its input, i.e. the number of channels in the previous layer (or 3 for an RGB image).
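The slide, multiply and sum operation can be sketched in NumPy. This is a minimal, unoptimised version for a single filter over a multi-channel input, assuming square filters and no padding. Like deep learning libraries, it computes cross-correlation (the filter is not flipped), which is what CNNs call "convolution". The function name `conv2d_single_filter` is hypothetical.

```python
import numpy as np


def conv2d_single_filter(image, kernel, bias=0.0, stride=1):
    """Valid (no padding) 2D convolution of one filter over a multi-channel image.

    image:  (H, W, C) array
    kernel: (F, F, C) array; its depth C must equal the input's channel count
    """
    h, w, c = image.shape
    f, f2, kc = kernel.shape
    assert f == f2, "this sketch assumes square filters"
    assert kc == c, "filter depth must equal the number of input channels"
    out_h = (h - f) // stride + 1
    out_w = (w - f) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + f, j * stride:j * stride + f, :]
            # element-wise multiply the patch with the filter, then sum everything
            out[i, j] = np.sum(patch * kernel) + bias
    return out


rng = np.random.default_rng(0)
image = rng.random((6, 6, 3))                  # toy 6x6 RGB "image"
kernel = rng.normal(0.0, 0.1, size=(3, 3, 3))  # small zero-centred random init
feature_map = conv2d_single_filter(image, kernel)
print(feature_map.shape)  # (4, 4)
```

Note that one filter produces one 2D feature map even though it spans all 3 input channels.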

### Interpreting Convolutions
Initial convolution layers learn basic features such as horizontal lines, vertical lines and small shapes, and deeper layers learn increasingly high-level features. For example, a facial recognition model learns basic lines in the initial layers, then noses, eyes, etc. in the middle layers, then whole faces in the final layers.

### Filters
Filters are the weights in CNNs. Each filter is a small array (height x width x input depth) that can be interpreted as a detector for one shape or pattern. A layer uses several such filters, and stacking their outputs gives the layer's output its depth; the number of filters is another hyperparameter. Each filter learns a different element: one might learn horizontal lines, another a basic circle shape, etc.
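To illustrate how a bank of filters adds depth and how each filter responds to a different element, the sketch below applies two filters to a toy grayscale image with a single horizontal edge. The filters are hand-made edge detectors chosen for illustration, not learned ones; in a real CNN their values come from training. The helper `conv2d_bank` is hypothetical.

```python
import numpy as np


def conv2d_bank(image, filters):
    """Apply a bank of filters to a grayscale image (no padding, stride 1).

    image:   (H, W) array
    filters: (K, F, F) array of K filters
    returns: (H-F+1, W-F+1, K) array, one output channel (depth) per filter
    """
    k, f, _ = filters.shape
    h, w = image.shape
    out = np.zeros((h - f + 1, w - f + 1, k))
    for n in range(k):
        for i in range(h - f + 1):
            for j in range(w - f + 1):
                out[i, j, n] = np.sum(image[i:i + f, j:j + f] * filters[n])
    return out


# Toy image: dark top half, bright bottom half, i.e. one horizontal edge
image = np.zeros((6, 6))
image[3:, :] = 1.0

# Hand-made filters (illustrative, not learned)
horizontal_edge = np.array([[-1, -1, -1],
                            [ 0,  0,  0],
                            [ 1,  1,  1]], dtype=float)
vertical_edge = horizontal_edge.T

maps = conv2d_bank(image, np.stack([horizontal_edge, vertical_edge]))
print(maps.shape)                  # (4, 4, 2): two filters give an output depth of 2
print(np.abs(maps[..., 0]).max())  # 3.0: strong response to the horizontal edge
print(np.abs(maps[..., 1]).max())  # 0.0: the image has no vertical edge
```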


### Padding
Padding adds a border of zeros around the height and width of the input so that stacking multiple convolutions doesn't shrink the spatial dimensions too quickly.
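Zero padding can be shown with NumPy's `np.pad`. With a padding of P = 1, a 3x3 filter and stride 1, the output keeps the input's height and width. For a multi-channel input only the two spatial axes are padded, never the channel axis.

```python
import numpy as np

feature_map = np.arange(1, 10, dtype=float).reshape(3, 3)
# P = 1: one row/column of zeros on every side
padded = np.pad(feature_map, pad_width=1, mode="constant", constant_values=0)
print(padded.shape)  # (5, 5)

rgb = np.ones((4, 4, 3))
# pad height and width by 1 on each side, leave the 3 channels untouched
padded_rgb = np.pad(rgb, pad_width=((1, 1), (1, 1), (0, 0)))
print(padded_rgb.shape)  # (6, 6, 3)
```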


### Output Dimensions
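For a convolution with filter size $F$, padding $P$ and stride $S$, the output dimensions are:

$$W_o = \frac{W_i - F + 2P}{S} + 1$$


$$H_o = \frac{H_i - F + 2P}{S} + 1$$

Where $W_i$ is the input width, $H_i$ is the input height, $F$ is the filter size (filters are assumed square), $P$ is the padding and $S$ is the stride. When the division is not exact, the result is rounded down.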
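The formula translates directly into a small helper; floor division implements the rounding down. The function name `conv_output_size` and the example sizes are illustrative. The same formula also gives the output size of a pooling layer, with $F$ as the pooling window.

```python
def conv_output_size(input_size, filter_size, padding=0, stride=1):
    """Spatial output size of a convolution (or pooling) layer along one axis."""
    return (input_size - filter_size + 2 * padding) // stride + 1


print(conv_output_size(32, 5, padding=2, stride=1))  # 32: padding keeps the size
print(conv_output_size(64, 3, padding=0, stride=1))  # 62: no padding shrinks it
print(conv_output_size(62, 2, padding=0, stride=2))  # 31: 2x2 pooling with stride 2
```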

### Structuring (selecting filter size and depth)
A common suggestion is to use larger filters and a smaller depth in the initial layers, then reduce the filter size and increase the depth in subsequent layers. The final layers hold high-level representations, so increasing the depth there lets the network learn more high-level features. Filter sizes are usually odd (e.g. 3x3 or 5x5) so that each filter has a centre pixel, while depths are commonly chosen as powers of 2 (32, 64, 128, ...).

### Pooling
Pooling is used for the following reasons:
1. Translation invariance: a facial classifier shouldn't care where in the image the face is.
2. Reduction of the feature maps' spatial size, and hence of the parameters and computation in later layers, while respecting the spatial layout.
3. Reduced overfitting.

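There are a couple of common ways of pooling: mean (average) and max. Max pooling is the most commonly used, typically with a 2x2 kernel (ksize) and a stride of 2. Pooling windows can also overlap, when the stride is smaller than the kernel size (e.g. a 3x3 kernel with stride 2).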
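Max pooling can be sketched in a few lines of NumPy, assuming a single 2D feature map and no padding. The helper `max_pool2d` is hypothetical; swapping `.max()` for `.mean()` would give average pooling.

```python
import numpy as np


def max_pool2d(x, size=2, stride=2):
    """Max pooling over a (H, W) array without padding."""
    h, w = x.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = window.max()  # keep only the strongest activation
    return out


x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 0],
              [7, 2, 9, 8],
              [0, 1, 3, 4]])
pooled = max_pool2d(x)                        # 2x2 windows, stride 2
overlapping = max_pool2d(x, size=3, stride=1)  # windows overlap since stride < size
print(pooled)
print(overlapping.shape)  # (2, 2)
```

Each 2x2 block of the 4x4 input is reduced to its maximum, giving `[[6, 4], [7, 9]]`.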

Pooling has become less common in some recent architectures because:
1. Datasets are so large and diverse that underfitting is more of a concern than overfitting.
2. Dropout is often a better regularizer.
3. Downsampling the image results in information loss.

### ReLU (Activation Function)
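This operation adds non-linearity to the network by computing $\max(0, x)$ element-wise. It is computationally cheap and, unlike sigmoid or tanh, it does not saturate for positive inputs, which helps reduce the vanishing gradient problem.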
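A NumPy sketch of ReLU follows. It also checks a related property: because ReLU is monotonic (non-decreasing), applying it before or after max pooling gives the same result, so the relative order of those two steps does not matter when the pooling is max pooling. A single pooling window covering the whole array is assumed for brevity.

```python
import numpy as np


def relu(x):
    # element-wise max(0, x): negatives become 0, positives pass through unchanged
    return np.maximum(0, x)


z = np.array([[-2.0, 1.5],
              [0.5, -0.3]])
print(relu(z))

# Treat the whole 2x2 array as one max-pooling window:
pool_then_relu = relu(z.max())
relu_then_pool = relu(z).max()
print(pool_then_relu == relu_then_pool)  # True
```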

### Flattening
This operation simply flattens the feature maps into a single vector by appending all the rows one after another (and, for multi-channel feature maps, all channels as well), so the result can be fed to the fully connected layers.
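In NumPy, flattening is a `reshape` in row-major (C) order. For a batch of feature maps, the batch axis is kept and everything else is flattened, which is also how a Keras `Flatten` layer treats its input. The array sizes below are illustrative.

```python
import numpy as np

m = np.array([[1, 2],
              [3, 4]])
print(m.reshape(-1))  # [1 2 3 4]: rows appended one after another

feature_maps = np.arange(2 * 2 * 3).reshape(2, 2, 3)  # (height, width, channels)
flat = feature_maps.reshape(-1)
print(flat.shape)  # (12,)

batch = np.zeros((8, 2, 2, 3))                    # (batch, height, width, channels)
flat_batch = batch.reshape(batch.shape[0], -1)    # keep the batch axis
print(flat_batch.shape)  # (8, 12)
```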

### Fully connected layer
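Once we have the flattened input, this step works like an ordinary neural network with hidden layers: the flattened vector is the input layer, followed by one or more fully connected (dense) layers and an output layer.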
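A minimal NumPy sketch of this step: a forward pass through one ReLU hidden layer of 128 units and a single sigmoid output unit for binary classification. The 3200-value input and the random weights are hypothetical placeholders; in a real network the weights are learned.

```python
import numpy as np

rng = np.random.default_rng(0)
flat = rng.random(3200)  # hypothetical flattened feature vector

# hidden layer: every input is connected to every one of the 128 units
W1 = rng.normal(0.0, 0.01, size=(3200, 128))
b1 = np.zeros(128)
hidden = np.maximum(0, flat @ W1 + b1)  # ReLU activation

# output layer: 1 unit with sigmoid activation gives a probability
W2 = rng.normal(0.0, 0.01, size=(128, 1))
b2 = np.zeros(1)
prob = 1.0 / (1.0 + np.exp(-(hidden @ W2 + b2)))
print(hidden.shape, prob.shape)  # (128,) (1,)
```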


## Convolution Layer in TensorFlow

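```python
# TensorFlow 1.x graph-style API; under TensorFlow 2 it is available as tf.compat.v1
import tensorflow.compat.v1 as tf

# placeholders only work in graph mode, so turn off TF2 eager execution
tf.disable_eager_execution()

# output depth (number of filters)
k_output = 64

# Image properties
image_width = 10
image_height = 10
color_channels = 3

# convolutional filter
filter_width = 5
filter_height = 5

# input image, shape (batch, height, width, channels)
# the batch size is left as None so any batch size can be fed
input_image = tf.placeholder(tf.float32, shape=[None, image_height, image_width, color_channels])

# weights: (filter height, filter width, input depth, output depth); one bias per filter
weights = tf.Variable(tf.truncated_normal([filter_height, filter_width, color_channels, k_output]))
bias = tf.Variable(tf.zeros(k_output))

# apply convolution: stride 2 in height and width, "SAME" padding -> (None, 5, 5, 64)
conv = tf.nn.conv2d(input_image, weights, strides=[1, 2, 2, 1], padding="SAME")
conv = tf.nn.bias_add(conv, bias)

# activation (for max pooling, applying ReLU before or after pooling gives the same result)
conv = tf.nn.relu(conv)

# max pooling: 2x2 window, stride 2, "SAME" padding -> (None, 3, 3, 64)
conv = tf.nn.max_pool(conv, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")
```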

## Image Classifier in Keras


```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

classifier = Sequential()

# 1. Convolution
# filters = number of filters (the output depth)
# kernel_size = size of each filter (3x3 here)
# strides defaults to 1, i.e. the filter moves one pixel at a time
# input_shape = (rows, cols, channels) for the default "channels_last" format;
# the old Theano backend used the reverse order (channels, rows, cols)
classifier.add(Conv2D(filters=32, kernel_size=(3, 3), input_shape=(64, 64, 3), activation='relu'))

# 2. Pooling
# pool_size is the dimension of the pooling window
classifier.add(MaxPooling2D(pool_size=(2, 2)))

# 3. Flattening
classifier.add(Flatten())

# 4. Fully connected layer
# A rule of thumb for the number of units is the mean of the
# number of inputs and the number of outputs.
# Good practice: pick a power of 2
classifier.add(Dense(units=128, activation='relu'))

# 5. Output layer
# binary classification: 1 unit with sigmoid activation
# more than 2 classes: one unit per class with softmax activation
classifier.add(Dense(units=1, activation='sigmoid'))

classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
```

```python
from keras.preprocessing.image import ImageDataGenerator

# Import the images and augment the training set.
# rescale normalises pixel values from [0, 255] to [0, 1].
# Data augmentation generates modified copies of the training images with:
# 1. Shearing
# 2. Random zooming
# 3. Horizontal flipping
# This effectively gives us additional training data
# and helps reduce overfitting.
train_datagen = ImageDataGenerator(
    rescale = 1. / 255,
    shear_range = 0.2,
    zoom_range = 0.2,
    horizontal_flip = True)

# test images are only rescaled, never augmented
test_datagen = ImageDataGenerator(rescale = 1. / 255)

# target_size is the size of the image expected by your model
training_set = train_datagen.flow_from_directory(
    'dataset/training_set',
    target_size = (64, 64),
    batch_size = 32,
    class_mode = 'binary')

test_set = test_datagen.flow_from_directory(
    'dataset/test_set',
    target_size = (64, 64),
    batch_size = 32,
    class_mode = 'binary')

# steps_per_epoch and validation_steps count batches, not images;
# len() of a directory iterator is the number of batches in one pass over the data
classifier.fit(
    training_set,
    steps_per_epoch = len(training_set),
    epochs = 25,
    validation_data = test_set,
    validation_steps = len(test_set))
```