The first thing we need to understand is what *convolution* means in the context of deep learning. 

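Generally, a convolution is an integral that expresses the overlap of one function with another as one is shifted across the other. It can be thought of as *blending* two functions together. Put another way, a convolution acts like a filter: in an image model, a small grid of weights (the kernel) slides over the image and responds most strongly wherever the pixels beneath it match the pattern the weights encode.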
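To make the filter idea concrete, here is a minimal NumPy sketch of a single filter sliding over a tiny image. The function name `apply_filter`, the toy image, and the vertical-edge kernel are all illustrative assumptions, not part of the notebook. Note that deep learning layers such as `Conv2D` slide the kernel without flipping it (strictly speaking, a cross-correlation), and that is what this sketch does.

```python
import numpy as np

def apply_filter(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the elementwise products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    output = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            output[i, j] = np.sum(patch * kernel)
    return output

# Toy 5x5 image: dark (0) on the left, bright (1) on the right.
image = np.array([
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
], dtype=float)

# Hypothetical vertical-edge filter: responds where brightness increases from left to right.
kernel = np.array([
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
], dtype=float)

feature_map = apply_filter(image, kernel)
print(feature_map)
```

The output is zero over the flat dark region and positive wherever the window straddles the dark-to-bright edge. In a convolutional layer the kernel values are not hand-picked like this; they are the weights the network learns.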

In [1]:
from tensorflow.keras import layers, models

input_layer = layers.Input(shape=(64,64,1))
conv_layer_1 = layers.Conv2D( filters=2,
                              kernel_size = (3,3),
                              strides = 1,
                              padding= "same")(input_layer)
# *strides* is the step size used by the layer to move the filters across the input.
# Increasing the stride therefore reduces the size of the output tensor. For example,
# if strides = 2, the height and width of the output tensor are half those of the input.

# padding = "same" pads the input data with zeros so that the output size from the
# layer equals the input size when strides = 1.
conv_layer_2 = layers.Conv2D(
    filters = 20
    , kernel_size = (3,3)
    , strides = 2
    , padding = 'same'
    )(conv_layer_1)
flatten_layer = layers.Flatten()(conv_layer_2)
output_layer = layers.Dense(units=10, activation = 'softmax')(flatten_layer)
model = models.Model(input_layer, output_layer)
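The effect of `strides` and `padding` on the output size can be worked out by hand. The sketch below is a plain-Python helper (the name `conv_output_size` is hypothetical) that applies the usual rules along one spatial dimension: with `"same"` padding the size is the input size divided by the stride, rounded up; with `"valid"` padding (no padding) the kernel must fit entirely inside the input.

```python
import math

def conv_output_size(input_size, kernel_size, stride, padding):
    """Spatial output size of a convolution along one dimension."""
    if padding == "same":
        return math.ceil(input_size / stride)
    if padding == "valid":
        return (input_size - kernel_size) // stride + 1
    raise ValueError(f"unknown padding: {padding!r}")

first_layer = conv_output_size(64, kernel_size=3, stride=1, padding="same")
second_layer = conv_output_size(first_layer, kernel_size=3, stride=2, padding="same")
no_padding = conv_output_size(64, kernel_size=3, stride=1, padding="valid")

print(first_layer, second_layer, no_padding)
```

For the network above this gives 64 for the first layer and 32 for the second. Without padding, a 3x3 kernel would shrink a 64-pixel side to 62.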

### Model of the above network
![Model of the Network](convolution_network.png)

In [2]:
model.summary()

Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, 64, 64, 1)]       0         
                                                                 
 conv2d (Conv2D)             (None, 64, 64, 2)         20        
                                                                 
 conv2d_1 (Conv2D)           (None, 32, 32, 20)        380       
                                                                 
 flatten (Flatten)           (None, 20480)             0         
                                                                 
 dense (Dense)               (None, 10)                204810    
                                                                 
Total params: 205210 (801.60 KB)
Trainable params: 205210 (801.60 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
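The parameter counts in the summary can be checked by hand. Each convolutional filter has one weight per kernel cell per input channel plus one bias, and each dense unit has one weight per input plus one bias. The helper names below are hypothetical; the numbers are the ones from the model above.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights for every kernel cell and input channel, plus one bias, per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(in_units, out_units):
    """One weight per input plus one bias, per output unit."""
    return (in_units + 1) * out_units

conv_1 = conv2d_params(3, 3, in_channels=1, filters=2)
conv_2 = conv2d_params(3, 3, in_channels=2, filters=20)
flattened = 32 * 32 * 20           # output of the second convolution, flattened
dense = dense_params(flattened, 10)

print(conv_1, conv_2, flattened, dense, conv_1 + conv_2 + dense)
```

Almost all of the parameters sit in the final Dense layer; the two convolutional layers together account for only 400.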


## Remember the point!
Remember that the whole point of this is that one of the reasons the network (from **mlp.ipynb**) isn't yet performing as well as it might is that nothing in it takes the spatial structure of the input images into account. In fact, its first step is to flatten the image into a single vector so that it can be passed to the first Dense layer!

So the convolution filters blend adjacent pixels into local features before the data is flattened into a vector, which keeps the spatial information that flattening the raw image would throw away.

### Batch Normalization

One of the more common problems when training a deep neural network is ensuring that the weights of the network remain within a reasonable range of values. If they become too large, this is a sign that the network is suffering from the *exploding gradient* problem, meaning that the weight values can fluctuate wildly.

##### Warning 
If the loss function returns a NaN at any point, chances are that the weights have grown too large, and caused an overflow.
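As a small illustration of how an overflow turns into a NaN, the sketch below uses NumPy's 32-bit floats (the default precision for Keras models). The specific values are arbitrary assumptions chosen to sit near the top of the float32 range.

```python
import numpy as np

# Silence NumPy's overflow/invalid warnings so the example only prints the values.
with np.errstate(over="ignore", invalid="ignore"):
    big = np.float32(3e38)                 # close to the float32 maximum (about 3.4e38)
    overflowed = big * np.float32(10)      # exceeds the representable range, becomes inf
    result = overflowed - overflowed       # inf - inf is undefined, becomes nan

print(overflowed, result)
```

Once a single value in the network becomes `inf` or `nan`, it propagates through every later calculation that touches it, which is why the loss itself ends up as NaN.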

So what is the root cause of an exploding gradient?

The technical term here is *covariate shift*, but a plainer term might be *cumulative shift*. If we scale the input, say from pixel values of 0 to 255 down to the range -1 to 1, the input layer starts out well behaved. But as training updates the weights, the values feeding each subsequent layer can drift away from that scale. These shifts accumulate from layer to layer and can result in an exploding gradient (essentially, the neural network overcorrects because each layer has become too sensitive, and the network as a whole is not flexible enough).

**Batch normalization** solves this by inserting a layer -- a batch normalization layer -- that normalizes all of its inputs. From the text:

During training, a batch normalization layer calculates the mean and standard deviation of each of its input channels across the batch and normalizes by subtracting the mean and dividing by the standard deviation. There are then two learned parameters for each channel, the scale (gamma) and shift (beta). 

Essentially, it keeps the outputs of the preceding layer on a consistent scale before they reach the next one.
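The training-time calculation described in the quote can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the Keras implementation: the function name `batch_norm_train` is hypothetical, channels are assumed to be on the last axis, and a small `epsilon` is added to the variance to avoid dividing by zero.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, epsilon=1e-3):
    """Normalize each channel (last axis) over all other axes, then scale and shift."""
    axes = tuple(range(x.ndim - 1))              # batch and spatial axes
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + epsilon)  # zero mean, unit variance per channel
    return gamma * x_hat + beta                  # learned scale and shift

rng = np.random.default_rng(0)
# Hypothetical activations: batch of 8, 4x4 spatial grid, 3 channels, far from zero mean.
x = rng.normal(loc=50.0, scale=10.0, size=(8, 4, 4, 3))
gamma = np.ones(3)    # scale, initialised to 1
beta = np.zeros(3)    # shift, initialised to 0

y = batch_norm_train(x, gamma, beta)
print(y.mean(axis=(0, 1, 2)))
print(y.std(axis=(0, 1, 2)))
```

Each channel of `y` comes out with a mean close to 0 and a standard deviation close to 1, whatever the scale of `x`. At prediction time the layer uses moving averages of the mean and variance collected during training instead of the statistics of the current batch. Together with gamma and beta that makes four values per channel, which is why a batch normalization layer over 32 channels reports 128 parameters.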



### Dropouts

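Dropout is a way of fighting over-fitting: during training, a dropout layer randomly sets a fraction of the preceding layer's outputs to zero (the `rate`, 0.5 in the model below), so that the network cannot become over-reliant on any single unit. At prediction time nothing is dropped.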
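Here is a minimal NumPy sketch of what a dropout layer does, assuming the common "inverted dropout" formulation in which the surviving values are scaled up by `1 / (1 - rate)` during training so that their expected sum is unchanged. The function name `dropout` and the toy activations are illustrative assumptions.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Randomly zero a fraction `rate` of the values in `x` during training."""
    if not training:
        return x                               # nothing is dropped at prediction time
    keep_mask = rng.random(x.shape) >= rate    # True for the units that survive
    return x * keep_mask / (1.0 - rate)        # rescale survivors to keep the expected sum

rng = np.random.default_rng(42)
activations = np.ones(1000)

train_out = dropout(activations, rate=0.5, training=True, rng=rng)
test_out = dropout(activations, rate=0.5, training=False, rng=rng)

print(np.unique(train_out))   # only zeros and rescaled survivors
print(train_out.mean())       # stays close to the original mean of 1
```

Because a different random set of units is dropped on every training step, no single unit can be relied on, which pushes the network toward more distributed, and usually more general, representations.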

In [3]:
from tensorflow.keras import layers, models

input_layer = layers.Input((32,32,3))

x = layers.Conv2D(filters = 32, kernel_size = 3
	, strides = 1, padding = 'same')(input_layer)
x = layers.BatchNormalization()(x)
x = layers.LeakyReLU()(x)

x = layers.Conv2D(filters = 32, kernel_size = 3, strides = 2, padding = 'same')(x)
x = layers.BatchNormalization()(x)
x = layers.LeakyReLU()(x)

x = layers.Conv2D(filters = 64, kernel_size = 3, strides = 1, padding = 'same')(x)
x = layers.BatchNormalization()(x)
x = layers.LeakyReLU()(x)

x = layers.Conv2D(filters = 64, kernel_size = 3, strides = 2, padding = 'same')(x)
x = layers.BatchNormalization()(x)
x = layers.LeakyReLU()(x)

x = layers.Flatten()(x)

x = layers.Dense(128)(x)
x = layers.BatchNormalization()(x)
x = layers.LeakyReLU()(x)
x = layers.Dropout(rate = 0.5)(x)

output_layer = layers.Dense(10, activation = 'softmax')(x)

model = models.Model(input_layer, output_layer)

Let's get the model summary.

In [4]:
model.summary()

Model: "model_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_2 (InputLayer)        [(None, 32, 32, 3)]       0         
                                                                 
 conv2d_2 (Conv2D)           (None, 32, 32, 32)        896       
                                                                 
 batch_normalization (Batch  (None, 32, 32, 32)        128       
 Normalization)                                                  
                                                                 
 leaky_re_lu (LeakyReLU)     (None, 32, 32, 32)        0         
                                                                 
 conv2d_3 (Conv2D)           (None, 16, 16, 32)        9248      
                                                                 
 batch_normalization_1 (Bat  (None, 16, 16, 32)        128       
 chNormalization)                                          

Let's bring back our previous example and see how this convolutional neural network now performs.

In [5]:
import numpy as np
from tensorflow.keras import datasets, utils
from tensorflow.keras import layers, models
from tensorflow.keras import optimizers

(x_train, y_train), (x_test, y_test) = datasets.cifar10.load_data()

NUM_CLASSES = 10

x_train = x_train.astype('float32') / 255.0
x_test  = x_test.astype('float32') / 255.0

y_train = utils.to_categorical(y_train, NUM_CLASSES)
y_test  = utils.to_categorical(y_test, NUM_CLASSES)
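For reference, the two preprocessing steps above (scaling pixels to the range 0 to 1 and one-hot encoding the labels) can be reproduced on a handful of values with plain NumPy. The sample pixels and labels below are made up for illustration; the labels use the `(N, 1)` shape in which CIFAR-10 labels are loaded.

```python
import numpy as np

NUM_CLASSES = 10

# Hypothetical sample data standing in for the real dataset.
pixels = np.array([0, 128, 255], dtype="uint8")
labels = np.array([[6], [9], [0]])

# Same scaling as above: integers 0..255 become floats 0.0..1.0.
scaled = pixels.astype("float32") / 255.0

# One-hot encoding: row i of the identity matrix is the vector for class i.
one_hot = np.eye(NUM_CLASSES, dtype="float32")[labels.reshape(-1)]

print(scaled)
print(one_hot)
```

Each label becomes a vector of ten values with a single 1 at the index of its class, which is the format the `categorical_crossentropy` loss expects.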

### Evaluating the CNN

In [11]:
from tensorflow.keras import optimizers
opt = optimizers.Adam(learning_rate=0.0005)
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])

### Training the Model
model.fit(x_train,
          y_train,
          batch_size= 32,
          epochs = 10,
          shuffle = True)
model.evaluate(x_test, y_test, batch_size=1000)



Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


[0.8270503878593445, 0.7192999720573425]

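A vast improvement! `model.evaluate` returns the test loss followed by the test accuracy, so the network now classifies about 72% of the test images correctly.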

### Let's investigate
Let's try again without batch normalization (and then without dropout), just to see what issues can arise. Note that the code below comments out every batch normalization layer except the first.

In [12]:
input_layer = layers.Input((32,32,3))

x = layers.Conv2D(filters = 32, kernel_size = 3
	, strides = 1, padding = 'same')(input_layer)
x = layers.BatchNormalization()(x)
x = layers.LeakyReLU()(x)

x = layers.Conv2D(filters = 32, kernel_size = 3, strides = 2, padding = 'same')(x)
# x = layers.BatchNormalization()(x)
x = layers.LeakyReLU()(x)

x = layers.Conv2D(filters = 64, kernel_size = 3, strides = 1, padding = 'same')(x)
# x = layers.BatchNormalization()(x)
x = layers.LeakyReLU()(x)

x = layers.Conv2D(filters = 64, kernel_size = 3, strides = 2, padding = 'same')(x)
# x = layers.BatchNormalization()(x)
x = layers.LeakyReLU()(x)

x = layers.Flatten()(x)

x = layers.Dense(128)(x)
# x = layers.BatchNormalization()(x)
x = layers.LeakyReLU()(x)
x = layers.Dropout(rate = 0.5)(x)

output_layer = layers.Dense(10, activation = 'softmax')(x)

model = models.Model(input_layer, output_layer)
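# Create a fresh optimizer: reusing an instance that has already trained another
# model can fail in recent Keras versions.
opt = optimizers.Adam(learning_rate=0.0005)
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])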

### Training the Model
model.fit(x_train,
          y_train,
          batch_size= 32,
          epochs = 10,
          shuffle = True)
model.evaluate(x_test, y_test, batch_size=1000)



Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


[0.8935840129852295, 0.7056999802589417]