# Part 2 - Convolutional and ResNets

## Two parts
A convolutional neural network (CNN) can be viewed as consisting of two parts, a frontend and
a backend. The backend is a deep neural network (DNN), which we have already covered. The name
convolutional neural network comes from the frontend, which is made up of one or more
convolutional layers. The frontend acts as a preprocessor for the backend.

## Downsampling (resize)

If we reduce the image resolution too far, at some point we lose the
ability to distinguish clearly what is in the image: it becomes fuzzy and/or has artifacts. So the first
step is to reduce the resolution only down to a level where we still have enough detail. The common
convention for everyday computer vision is around 224 x 224 pixels.

## Convolutions and Strides
Typical filter sizes are 3x3 and 5x5, with 3x3 the most
common. The number of filters varies more, but it is typically a multiple of 16, with 16, 32
and 64 the most common. Additionally, one specifies a stride. The stride is the step by which the
filter is slid across the image. With a 3x3 filter and a stride of 3, there would be no overlap
between successive filter positions. The most common practice is to use strides of 1 and 2.

The common practice is to keep the same or
increase the number of filters on deeper layers, and to use a stride of 1 on the first layer and 2 on
deeper layers. The increase in filters provides the means to go from coarse detection of features
to more detailed detection within coarse features, while the increase in stride offsets the
increase in size of retained data.

More Filters => More Data

Bigger Strides => Less Data
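
To make the trade-off between filters and strides concrete, here is a minimal sketch that counts how many values a convolutional layer retains. The helper names `conv_output_size` and `feature_map_values` are hypothetical (they are not part of Keras). The size formulas are the standard ones for "same" and "valid" padding, and the 128 x 128 input is only an assumed example.

```python
import math

def conv_output_size(input_size, kernel_size, stride, padding="same"):
    """Return the output width (or height) of a convolution along one axis."""
    if padding == "same":
        # 'same' pads the input, so the output size depends only on the stride
        return math.ceil(input_size / stride)
    if padding == "valid":
        # 'valid' uses no padding, so the filter must fit entirely inside the input
        return (input_size - kernel_size) // stride + 1
    raise ValueError("padding must be 'same' or 'valid'")

def feature_map_values(input_size, n_filters, kernel_size, stride, padding="same"):
    """Number of values retained: one square feature map per filter."""
    side = conv_output_size(input_size, kernel_size, stride, padding)
    return side * side * n_filters

print(feature_map_values(128, 16, 3, 1))  # 262144
print(feature_map_values(128, 32, 3, 1))  # 524288 (more filters, more data)
print(feature_map_values(128, 32, 3, 2))  # 131072 (bigger stride, less data)
```

Doubling the filters doubles the retained data, while moving from a stride of 1 to a stride of 2 cuts it to a quarter.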

## Pooling

The next step is to reduce the total amount of data, while retaining the features detected and
the corresponding spatial relationships between the detected features.
This step is referred to as pooling. Pooling is a form of downsampling (or sub-sampling), whereby
the feature maps are resized to a smaller dimension using either the max (downsampling) or the mean
(sub-sampling) of the pixels in each pooled area of the feature map. In pooling, we set the size of
the area to pool as an NxM matrix, as well as a stride. The common practice is a 2x2 pool size with a
stride of 2. This results in a 75% reduction in pixel data, while still preserving enough resolution
that the detected features are not lost through pooling.
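
As an illustration of the 2x2 pool size with a stride of 2, the following pure-Python sketch max pools a small feature map. The function `max_pool_2d` and the 4x4 values are made up for this example and are not how Keras implements `MaxPooling2D` internally.

```python
def max_pool_2d(feature_map, pool_size=2, stride=2):
    """Max pool a 2D feature map given as a list of equal-length rows."""
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for top in range(0, height - pool_size + 1, stride):
        row = []
        for left in range(0, width - pool_size + 1, stride):
            # Collect the values inside the current pooling window
            window = [feature_map[top + i][left + j]
                      for i in range(pool_size) for j in range(pool_size)]
            row.append(max(window))
        pooled.append(row)
    return pooled

feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]
pooled = max_pool_2d(feature_map)
print(pooled)  # [[4, 2], [2, 8]]
```

The 16 input values become 4 pooled values, which is the 75% reduction, and the strongest response in each quadrant keeps its relative position.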

## Flattening
For example, if we have 16 pooled maps of size 20x20 and three channels per pooled map
(e.g., RGB channels in color image), our 1D vector size will be 16 x 20 x 20 x 3 = 19,200
elements.
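
The arithmetic above can be checked with a short sketch. The `flatten` helper is a hypothetical stand-in for what a flattening layer does to nested data, and the tiny nested list is invented for illustration.

```python
from math import prod

def flatten(nested):
    """Recursively flatten nested lists into a single 1D list."""
    flat = []
    for item in nested:
        if isinstance(item, list):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat

# Size of the 1D vector for 16 pooled maps of 20x20 with three channels each
shape = (16, 20, 20, 3)
print(prod(shape))  # 19200

# Flattening a tiny 2x2x2 structure keeps every element, in order
tiny = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
print(flatten(tiny))  # [1, 2, 3, 4, 5, 6, 7, 8]
```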

# Basic CNN

In [26]:
# Keras's Neural Network components
from keras.models import Sequential
from keras.layers import Dense, ReLU, Activation
# Keras's Convolutional Neural Network components
from keras.layers import Conv2D, MaxPooling2D, Flatten, add, GlobalAveragePooling2D, BatchNormalization, ZeroPadding2D, MaxPool2D

In [3]:
model = Sequential()
# Create a convolutional layer with 16 3x3 filters and stride of two as the input
# layer

# Frontend
model.add(Conv2D(16, kernel_size=(3, 3), strides=(2, 2), padding="same",
input_shape=(128,128,1)))
# Pass the output (feature maps) from the input layer (convolution) through a
# rectified linear unit activation function.
model.add(ReLU())
# Add a pooling layer to max pool (downsample) the feature maps into smaller pooled
# feature maps
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
# Add a flattening layer to flatten the pooled feature maps to a 1D input vector
# for the DNN classifier

# Backend
model.add(Flatten())
# Add the input layer for the DNN, which is connected to the flattening layer of
# the convolutional frontend
model.add(Dense(512))
model.add(ReLU())
# Add the output layer for classifying the 26 hand signed letters
model.add(Dense(26))
model.add(Activation('softmax'))
# Use the Categorical Cross Entropy loss function for a Multi-Class Classifier.
model.compile(loss='categorical_crossentropy', optimizer='adam',
metrics=['accuracy'])

In [4]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 64, 64, 16)        160       
_________________________________________________________________
re_lu_1 (ReLU)               (None, 64, 64, 16)        0         
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 32, 32, 16)        0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 16384)             0         
_________________________________________________________________
dense_1 (Dense)              (None, 512)               8389120   
_________________________________________________________________
re_lu_2 (ReLU)               (None, 512)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 26)                13338     
__________

## With activation specified in the layers

In [5]:
# Keras's Neural Network components
from keras.models import Sequential
from keras.layers import Dense
# Keras's Convolutional Neural Network components
from keras.layers import Conv2D, MaxPooling2D, Flatten
model = Sequential()
# Create a convolutional layer with 16 3x3 filters and stride of two as the input
# layer
model.add(Conv2D(16, kernel_size=(3, 3), strides=(2, 2), padding="same",
activation='relu', input_shape=(128,128, 1)))
# Add a pooling layer to max pool (downsample) the feature maps into smaller pooled
# feature maps
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
# Add a flattening layer to flatten the pooled feature maps to a 1D input vector
# for the DNN
model.add(Flatten())
# Create the input layer for the DNN, which is connected to the flattening layer of
# the convolutional front-end
model.add(Dense(512, activation='relu'))
model.add(Dense(26, activation='softmax'))
# Use the Categorical Cross Entropy loss function for a Multi-Class Classifier.
model.compile(loss='categorical_crossentropy',
optimizer='adam',
metrics=['accuracy'])

## Functional method

In [1]:
from keras import Input, Model
from keras.layers import Dense
from keras.layers import Conv2D, MaxPooling2D, Flatten
# Create the input vector (128 x 128).
inputs = Input(shape=(128, 128, 1))
layer = Conv2D(16, kernel_size=(3, 3), strides=(2, 2), padding="same",
activation='relu')(inputs)
layer = MaxPooling2D(pool_size=(2, 2), strides=(2, 2))(layer)
layer = Flatten()(layer)
layer = Dense(512, activation='relu')(layer)
output = Dense(26, activation='softmax')(layer)
# Now let's create the neural network, specifying the input layer and output layer.
model = Model(inputs, output)



In [2]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 128, 128, 1)       0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 64, 64, 16)        160       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 32, 32, 16)        0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 16384)             0         
_________________________________________________________________
dense_1 (Dense)              (None, 512)               8389120   
_________________________________________________________________
dense_2 (Dense)              (None, 26)                13338     
Total params: 8,402,618
Trainable params: 8,402,618
Non-trainable params: 0
_________________________________________________________________
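
The parameter counts in the summary can be reproduced by hand. The sketch below assumes the usual counting rule of one weight per input connection plus one bias per filter or unit. The helper names `conv2d_params` and `dense_params` are hypothetical.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, n_filters):
    """Weights per filter (kernel_h * kernel_w * in_channels) plus one bias, per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * n_filters

def dense_params(n_inputs, n_units):
    """One weight per input plus one bias, per unit."""
    return (n_inputs + 1) * n_units

conv = conv2d_params(3, 3, 1, 16)          # 3x3 filters on a 1-channel input
dense_1 = dense_params(32 * 32 * 16, 512)  # flattened 32x32x16 pooled maps
dense_2 = dense_params(512, 26)            # output layer for 26 letters
total = conv + dense_1 + dense_2
print(conv, dense_1, dense_2, total)  # 160 8389120 13338 8402618
```

Nearly all of the parameters sit in the first dense layer, because every one of the 16,384 flattened values is connected to each of its 512 units.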


# ResNet

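Residual blocks are the building blocks of ResNets. A residual block adds the input of the block to its output before the final activation: a[l+2] = g(z[l+2] + a[l]). In the worst case, where z[l+2] is zero, it simply returns the identity a[l] (assuming ReLU activations, g(a[l]) = a[l]).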

Residual blocks allow neural networks to be built with deeper layers without a degradation in
performance.
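
The worst-case behaviour can be sketched in a few lines of plain Python. This is only an illustration of g(z[l+2] + a[l]) with g assumed to be ReLU. The function names and the sample values are hypothetical.

```python
def relu(values):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in values]

def residual_output(z, shortcut):
    """a[l+2] = g(z[l+2] + a[l]) with g = ReLU."""
    return relu([zi + si for zi, si in zip(z, shortcut)])

a_l = relu([0.5, -1.0, 2.0])  # activations of layer l: [0.5, 0.0, 2.0]
z_zero = [0.0, 0.0, 0.0]      # the block's layers contribute nothing

print(residual_output(z_zero, a_l))  # [0.5, 0.0, 2.0]
```

Because a[l] is already non-negative, the block passes it through unchanged, so adding such a block cannot make the network worse than it was without it.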

## ResNet 34

The variable x represents the output of a layer, which is the
input to the next layer. At the beginning of the block, we retain a copy of the previous block/layer
output in the variable shortcut. We then pass the previous block/layer output (x) through two
convolutional layers, each time taking the output from the previous layer as input into the next
layer. Finally, the last output from the block (retained in the variable x) is added (matrix addition)
to the original value of x (shortcut). This is the identity link.

In [None]:
shortcut = x
x = Conv2D(64, (3, 3), padding="same")(x)
x = ReLU()(x)
x = Conv2D(64, (3, 3), padding="same")(x)
x = ReLU()(x)
x = add([shortcut, x])

If a residual block doubled the number of filters, the identity link would attempt to add the
input matrix (X) and the output matrix (2X). Yikes, we get an error, indicating that we can’t
broadcast (for the add operation) matrices of different sizes.

For ResNet, this is solved by **adding a convolutional block between each “doubling” group of
residual blocks.** The convolutional block doubles the number of filters to reshape the size, and
uses a stride of 2 to reduce the feature map size by 75%.
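
The shape problem and its fix can be followed without running a model. The sketch below tracks (height, width, channels) tuples through "same"-padded convolutions. `conv_same` and `can_add` are hypothetical helpers, and the 56 x 56 x 64 starting shape is an assumed example.

```python
import math

def conv_same(shape, n_filters, stride=1):
    """Output shape (height, width, channels) of a 'same'-padded convolution."""
    height, width, _ = shape
    return (math.ceil(height / stride), math.ceil(width / stride), n_filters)

def can_add(shape_a, shape_b):
    """Element-wise addition of two feature map tensors needs identical shapes."""
    return shape_a == shape_b

x = (56, 56, 64)  # assumed output of the 64-filter residual group

# Naive residual block with 128 filters: the shortcut still has 64 channels
naive = conv_same(conv_same(x, 128), 128)
print(can_add(x, naive))  # False

# Convolutional block first: double the filters and use a stride of 2
x = conv_same(x, 128, stride=2)
print(x)  # (28, 28, 128)

# Now a 128-filter residual block sees matching shapes for its identity link
out = conv_same(conv_same(x, 128), 128)
print(can_add(x, out))  # True
```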

## Full code using helper function

In [10]:
def residual_block(n_filters, x):
    
    shortcut = x
    x = Conv2D(n_filters, (3, 3),  padding="same", activation="relu")(x)
    x = Conv2D(n_filters, (3, 3), padding="same", activation="relu")(x)
    x = add([shortcut, x])
    
    return x

In [12]:
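def conv_block(n_filters, x):
    # Downsample once: a stride of 2 halves height and width (75% fewer values)
    x = Conv2D(n_filters, (3, 3), strides=(2, 2), padding="same", activation="relu")(x)
    # Stride of 1 here; a second stride of 2 would shrink the maps 16x in total
    x = Conv2D(n_filters, (3, 3), strides=(1, 1), padding="same", activation="relu")(x)

    return x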

In [15]:
# input tensor
inputs = Input(shape=(224, 224, 3))

x = Conv2D(64, kernel_size=(7, 7), strides=(2, 2), padding="same", activation="relu")(inputs)
x = MaxPooling2D(pool_size=(3, 3), strides=(2, 2), padding="same")(x)

# First Residual Block Group of 64 filters
for _ in range(2):
    x = residual_block(64, x)
# Double the number of filters and reduce feature maps by 75% (strides=2, 2) to fit the next Residual Group
x = conv_block(128, x)
# Second Residual Block Group of 128 filters
for _ in range(3):
    x = residual_block(128, x)
# Double the number of filters and reduce feature maps by 75% (strides=2, 2) to fit the next Residual Group
x = conv_block(256, x)
# Third Residual Block Group of 256 filters
for _ in range(5):
    x = residual_block(256, x)
# Double the number of filters and reduce feature maps by 75% (strides=2, 2) to fit the next Residual Group
x = conv_block(512, x)

# Fourth Residual Block Group of 512 filters
for _ in range(2):
    x = residual_block(512, x)
# Now Pool at the end of all the convolutional residual blocks
x = GlobalAveragePooling2D()(x)
# Final Dense Outputting Layer for 1000 outputs
outputs = Dense(1000, activation='softmax')(x)
model = Model(inputs, outputs)

In [16]:
model.summary()

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_4 (InputLayer)            (None, 224, 224, 3)  0                                            
__________________________________________________________________________________________________
conv2d_38 (Conv2D)              (None, 112, 112, 64) 9472        input_4[0][0]                    
__________________________________________________________________________________________________
max_pooling2d_4 (MaxPooling2D)  (None, 56, 56, 64)   0           conv2d_38[0][0]                  
__________________________________________________________________________________________________
conv2d_39 (Conv2D)              (None, 56, 56, 64)   36928       max_pooling2d_4[0][0]            
__________________________________________________________________________________________________
conv2d_40 

So the ResNet architecture is 6 times computationally faster than VGG. This reduction is mostly
achieved by the construction of the residual blocks. Notice how the DNN backend is just a single
output Dense layer. In effect, there is no backend. The top residual block groups act as the CNN
frontend doing the feature detection, while the bottom residual blocks perform the classification.
In doing so, unlike VGG, there is no need for several fully connected dense layers, which would
substantially increase the number of parameters.

Another advantage is the identity link, which provides the ability to add deeper layers, without
degradation, for higher accuracy.

# ResNet 50

ResNet 50 uses a variation of the residual block referred to as the bottleneck residual
block. In this version, the group of two 3x3 convolution layers is replaced by a group of a 1x1,
then a 3x3, and then a 1x1 convolution layer. The first 1x1 convolution performs a dimension
reduction, reducing the computational complexity, and the last 1x1 convolution restores the
dimensionality, increasing the number of filters by a factor of 4. The bottleneck residual group
allows for deeper neural networks, without degradation, and a further reduction in computational
complexity.
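
A rough parameter count shows where the saving comes from. The sketch below compares two 3x3 convolutions with a 1x1, 3x3, 1x1 bottleneck on an assumed 256-channel input with 64 bottleneck filters. These channel sizes and the helper `conv2d_params` are illustrative assumptions, not values taken from the code in this section.

```python
def conv2d_params(kernel_size, in_channels, n_filters):
    """Weights per filter plus one bias, for a square kernel."""
    return (kernel_size * kernel_size * in_channels + 1) * n_filters

in_channels = 256

# Plain residual block: two 3x3 convolutions that keep all 256 channels
plain = conv2d_params(3, in_channels, 256) + conv2d_params(3, 256, 256)

# Bottleneck block: reduce to 64 channels, convolve, then restore 4 * 64 = 256
bottleneck = (conv2d_params(1, in_channels, 64)
              + conv2d_params(3, 64, 64)
              + conv2d_params(1, 64, 256))

print(plain)       # 1180160
print(bottleneck)  # 70016
```

Under these assumptions the bottleneck needs roughly 17 times fewer parameters, because the expensive 3x3 convolution only ever sees the reduced 64 channels.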

In [17]:
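def bottleneck_block(n_filters, x):
    # The input x must already have n_filters * 4 channels so the add below matches
    shortcut = x
    # 1x1 convolution reduces the number of channels
    x = Conv2D(n_filters, (1, 1), strides=(1, 1), padding="same", activation="relu")(x)
    # 3x3 convolution operates on the reduced number of channels
    x = Conv2D(n_filters, (3, 3), strides=(1, 1), padding="same", activation="relu")(x)
    # 1x1 convolution restores the channels (4X the number of filters)
    x = Conv2D(n_filters * 4, (1, 1), strides=(1, 1), padding="same", activation="relu")(x)
    x = add([shortcut, x])

    return x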

## Batch Normalization

In [20]:
model = Sequential()

model.add(Conv2D(64, (3, 3), strides=(1, 1), padding="same", input_shape=(128, 128, 3)))

# Batch normalization applied to the output before the activation function

model.add(BatchNormalization())
model.add(ReLU())

model.add(Flatten())

model.add(Dense(4096))
model.add(ReLU())
model.add(BatchNormalization())

# ResNet 50 with Batch Normalization

In [21]:
def bottleneck_block(n_filters, x):
    # using the bottleneck structure of filter size 1x1 , 3x3, and 1x1

    shortcut = x
    x = Conv2D(n_filters, (1,1), strides=(1, 1))(x)
    x = BatchNormalization()(x)
    x = ReLU()(x)
    
    x = Conv2D(n_filters, (3, 3), strides=(1, 1), padding="same")(x)
    x = BatchNormalization()(x)
    x = ReLU()(x)
    
    x = Conv2D(n_filters * 4, (1, 1), strides=(1, 1))(x)
    x = BatchNormalization()(x)

    x = add([shortcut, x])
    x = ReLU()(x)
    
    return x

In [31]:
def conv_block(n_filters, x, strides=(2, 2)):
    """ Create Block of Convolutions with feature pooling
    Increase the number of filters by 4X
    n_filters: number of filters
    x : input into the block
    """
    # construct the identity link
    # increase filters by 4X to match shape when added to output of block
    shortcut = Conv2D(4 * n_filters, (1, 1), strides=strides)(x)
    shortcut = BatchNormalization()(shortcut)
    # construct the 1x1, 3x3, 1x1 convolution block
    # feature pooling when strides=(2, 2)
    x = Conv2D(n_filters, (1, 1), strides=strides)(x)
    x = BatchNormalization()(x)
    x = ReLU()(x)
    x = Conv2D(n_filters, (3, 3), strides=(1, 1), padding='same')(x)
    x = BatchNormalization()(x)
    x = ReLU()(x)
    # increase the number of filters by 4X
    x = Conv2D(4 * n_filters, (1, 1), strides=(1, 1))(x)
    x = BatchNormalization()(x)
    # add the identity link to the output of the convolution block
    x = add([x, shortcut])
    x = ReLU()(x)

    return x

In [32]:
inputs = Input(shape=(224, 224, 3))
# First Convolutional layer, where pooled feature maps will be reduced by 75%
x = ZeroPadding2D(padding=(3, 3))(inputs)
x = Conv2D(64, kernel_size=(7, 7), strides=(2, 2), padding='valid')(x)
x = BatchNormalization()(x)
x = ReLU()(x)
x = ZeroPadding2D(padding=(1, 1))(x)
x = MaxPool2D(pool_size=(3, 3), strides=(2, 2))(x)
x = conv_block(64, x, strides=(1, 1))

# First Residual Block Group of 64 filters
for _ in range(2):
    x = bottleneck_block(64, x)
# Double the number of filters and reduce feature maps by 75% (strides=2, 2) to fit the next Residual Group
x = conv_block(128, x)
# Second Residual Block Group of 128 filters
for _ in range(3):
    x = bottleneck_block(128, x)
# Double the number of filters and reduce feature maps by 75% (strides=2, 2) to fit the next Residual Group
x = conv_block(256, x)
# Third Residual Block Group of 256 filters
for _ in range(5):
    x = bottleneck_block(256, x)
# Double the number of filters and reduce feature maps by 75% (strides=2, 2) to fit the next Residual Group
x = conv_block(512, x)
# Fourth Residual Block Group of 512 filters
for _ in range(2):
    x = bottleneck_block(512, x)
# Now Pool at the end of all the convolutional residual blocks
x = GlobalAveragePooling2D()(x)
# Final Dense Outputting Layer for 1000 outputs
outputs = Dense(1000, activation='softmax')(x)
model = Model(inputs, outputs)

In [33]:
model.summary()

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_9 (InputLayer)            (None, 224, 224, 3)  0                                            
__________________________________________________________________________________________________
zero_padding2d_9 (ZeroPadding2D (None, 230, 230, 3)  0           input_9[0][0]                    
__________________________________________________________________________________________________
conv2d_74 (Conv2D)              (None, 112, 112, 64) 9472        zero_padding2d_9[0][0]           
__________________________________________________________________________________________________
batch_normalization_7 (BatchNor (None, 112, 112, 64) 256         conv2d_74[0][0]                  
__________________________________________________________________________________________________
re_lu_7 (R