# building convolution neural networks

Convolution neural networks are a bit 'trickier' than the simple densely-connected networks we've been looking at so far. This is because convolutions:

* work with data having a more complex shape, such as 2D image data with multiple channels
* typically reduce the size of the input
* generate multiple outputs

In this jupyter notebook, we'll build some convolution neural networks, in order to get a practical understanding of how to specify various features of the network, and how design decisions impact the network's shape and parameter count.

First, let's assume we want to build a convolution neural network that can process color images of size 64x64 pixels.

For a color image, we typically assume the data are encoded in RGB format, with 3 image channels. So, the shape of our input image data will be:

    (64,64,3)

for a 64x64 pixel image in RGB format.

We therefore need to specify the input shape for the first layer in our network as:

    input_shape=[64,64,3]

to match the shape of the input image data (remember the batch dimension does not need to be specified for tensorflow).
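
As a quick sanity check on these shapes, here is a minimal sketch in plain Python (no tensorflow required). The `batch_size` of 32 is an arbitrary value chosen only for illustration. The point is that the batch dimension is simply placed in front of the per-image shape, which is why it can be left out of `input_shape`.

```python
# Shape of a single 64x64 RGB image: (height, width, channels)
image_shape = (64, 64, 3)

# Hypothetical batch size, chosen only for illustration
batch_size = 32

# A batch of images adds one leading dimension in front of the image shape
batch_shape = (batch_size,) + image_shape

# Number of individual values stored for one image
values_per_image = image_shape[0] * image_shape[1] * image_shape[2]

print(batch_shape)       # (32, 64, 64, 3)
print(values_per_image)  # 12288
```

In the output of model.summary(), the batch dimension is reported as None, because it is not fixed until data are actually passed to the model.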

For image data, we will use a "2-dimensional" convolution, even though there are multiple 'channels' in the image data. Two-dimensional convolutions are implemented in tensorflow as a tf.keras.layers.Conv2D object.

To create a Conv2D object, we need to specify the number of "filters" we want to use, and the "kernel_size" of the filters.

The number of filters is a user-selected option, and it can really be anything you like. A typical convolution layer might have anywhere from 32 to 512 filters, or more. For now, we'll just specify a single filter, so we'll need to set the option:

    filters=1

when we create the Conv2D object.

The kernel size is also up to the user. However, nearly all Conv2D kernels are square, having equal height and width, and the height and width of the kernel are almost always odd numbers. Most convolution networks use a kernel size of (3,3), (5,5) or (7,7). Early in the development of convolution networks, there was more diversity in kernel sizes, and a trend toward using *larger* kernels. More recently, more emphasis has been placed on building *deep* networks consisting of dozens or even hundreds of convolution layers, and kernel sizes tend to be *smaller* to reduce parameter counts. In contemporary convolution networks, (3,3) convolutions are the most commonly used 'default'.

To specify the kernel size, we just set the option:

    kernel_size=(3,3)

when we create the Conv2D object.

The following code cell creates a simple 1-filter convolution neural network layer with a (3,3) kernel size.

In [None]:
import tensorflow as tf

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Conv2D(filters=1, kernel_size=(3,3), input_shape=[64,64,3]))

model.summary()

We can see that the output shape of the convolution layer (ignoring the batch dimension) is (62,62,1). The last "1" is the number of filters, which we specified when creating the layer.

The height and the width of the layer's output are both 62. These values follow from the shape of the input image (64x64) and the size of the kernel (3,3).

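Specifically, with the default settings used here (a stride of 1 and no padding), we can calculate the output width of a 2D convolution layer as follows (the output height is calculated the same way):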

    outwidth = (inwidth - kernelwidth) + 1  = (64 - 3) + 1  = 61 + 1 = 62
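
The same arithmetic can be written as a small helper function. This is only a sketch: the function name `conv_output_size` is made up for this notebook, and it assumes a stride of 1 and no padding, as in the layer above.

```python
def conv_output_size(in_size, kernel_size):
    """Output height (or width) of a convolution with stride 1 and no padding."""
    return (in_size - kernel_size) + 1

# 64x64 input with a (3,3) kernel
print(conv_output_size(64, 3))  # 62

# 64x64 input with a (5,5) kernel
print(conv_output_size(64, 5))  # 60
```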

There are 28 trainable parameters in our 2D convolution layer. Given a kernel size of (3,3), each filter takes inputs from a 3x3 'grid' of image pixels. So, there are 3x3=9 inputs to each filter from a single channel. Each input requires an input weight, so there are 9 trainable input weights per channel for each filter in our convolution layer. However, there are *3* input channels in our image data, and each input channel gets its own *independent* 3x3 set of weights within the filter. So, given the 3 input channels, our filter has 9x3=27 trainable weights. But let's not forget the bias term! In tensorflow, each filter has a single bias term that is shared among all input channels, so it only adds a single trainable parameter: 27+1=28.

If we increased the number of filters to 2, we'd have 28x2=56 trainable parameters, and the output shape of the convolution layer would be (62,62,2).
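
The parameter count described above can be reproduced with a few lines of plain Python. This is a sketch of the arithmetic only: `conv2d_param_count` is a made-up helper name, not a tensorflow function, and it assumes one bias term per filter.

```python
def conv2d_param_count(kernel_height, kernel_width, in_channels, filters, use_bias=True):
    """Trainable parameters in a 2D convolution layer (one bias per filter)."""
    # Each filter has one weight per kernel position per input channel
    weights_per_filter = kernel_height * kernel_width * in_channels
    bias_per_filter = 1 if use_bias else 0
    return filters * (weights_per_filter + bias_per_filter)

# One (3,3) filter on a 3-channel image: 9*3 + 1
print(conv2d_param_count(3, 3, in_channels=3, filters=1))  # 28

# Two (3,3) filters on a 3-channel image: 2 * (9*3 + 1)
print(conv2d_param_count(3, 3, in_channels=3, filters=2))  # 56
```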



Larger kernel sizes combine more information from a larger 'block' of the input image, but they also require more trainable parameters, and they 'shrink' the layer's output to a greater degree.

For example, the following code cell uses a (7,7) kernel to analyze the same 64x64 image data.

In [None]:
import tensorflow as tf

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Conv2D(filters=1, kernel_size=(7,7), input_shape=[64,64,3]))

model.summary()

There are now 148 trainable parameters in the model. A 7x7 convolution kernel has 7x7=49 trainable input weights per channel. There are 3 input channels, so incorporating information from all 3 channels takes 49x3=147 weights. Adding the single shared bias term gives 148 trainable parameters in total.

Also notice that the *output* shape of the convolution layer has gotten smaller:

    (64 - 7) + 1 = 58
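
To see the trade-off between kernel size, parameter count and output size side by side, the following sketch tabulates the three common kernel sizes for a single filter applied to a 64x64 RGB image. It is plain arithmetic, assuming a stride of 1, no padding, and one bias term per filter.

```python
in_size = 64      # input height and width
in_channels = 3   # RGB image
filters = 1       # a single filter, as in the cells above

rows = []
for k in (3, 5, 7):
    out_size = (in_size - k) + 1                  # output height and width
    params = filters * (k * k * in_channels + 1)  # weights plus one bias
    rows.append((k, out_size, params))
    print(f"kernel ({k},{k}): output {out_size}x{out_size}, {params} trainable parameters")
```

Moving from a (3,3) to a (7,7) kernel increases the parameter count from 28 to 148, while the output shrinks from 62x62 to 58x58.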

If we start combining *many* convolution layers in a feed-forward network, the resulting output can shrink pretty quickly:

In [None]:
import tensorflow as tf

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Conv2D(filters=1, kernel_size=(7,7), input_shape=[64,64,3]))
model.add(tf.keras.layers.Conv2D(filters=1, kernel_size=(7,7)))
model.add(tf.keras.layers.Conv2D(filters=1, kernel_size=(7,7)))
model.add(tf.keras.layers.Conv2D(filters=1, kernel_size=(7,7)))

model.summary()
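
The shrinking reported in the summary can be checked by applying the output-size formula once per layer. The helper name `stacked_output_sizes` is made up for this sketch, which again assumes a stride of 1 and no padding.

```python
def stacked_output_sizes(in_size, kernel_sizes):
    """Height (or width) after each layer in a stack of unpadded convolutions."""
    sizes = [in_size]
    for k in kernel_sizes:
        # Each layer shrinks the previous output by (k - 1)
        sizes.append((sizes[-1] - k) + 1)
    return sizes

# Four stacked (7,7) convolutions on a 64x64 input
print(stacked_output_sizes(64, [7, 7, 7, 7]))  # [64, 58, 52, 46, 40]
```

Each (7,7) layer removes 6 rows and 6 columns, so after four layers the 64x64 input has been reduced to 40x40.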

To combat this 'shrinking data' problem, we can ask tensorflow to 'draw' an appropriate 'border' around our input data, so that the shape of the convolution layer's output remains the *same* as the shape of the input (except the channel dimension, which is always specified by the number of convolution filters in the layer).

This 'border' is called "padding". Padding essentially 'extends' the height and width of the 'image' with zeros on all sides. The number of extra rows and columns of zeros is determined automatically by the kernel size.

To initiate padding, all we need to do is set the:

    padding='same'

option when we create a convolution layer.

This is demonstrated in the following code cell.

In [None]:
import tensorflow as tf

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Conv2D(filters=1, kernel_size=(7,7), padding='same', input_shape=[64,64,3]))
model.add(tf.keras.layers.Conv2D(filters=1, kernel_size=(7,7), padding='same'))
model.add(tf.keras.layers.Conv2D(filters=1, kernel_size=(7,7), padding='same'))
model.add(tf.keras.layers.Conv2D(filters=1, kernel_size=(7,7), padding='same'))

model.summary()

With "same" padding on *all* the Conv2D layers, the 'height' and 'width' of the image data remain constant at 64x64, no matter how many convolution layers we process the image data through. This can be very useful, particularly if we want to build a very 'deep' convolution network consisting of many layers.
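
To see why the size stays constant, here is a sketch of the padding arithmetic. It assumes a stride of 1 and an odd kernel size, which covers the examples in this notebook; even kernel sizes and larger strides are handled differently and are not covered here. The helper names are made up for illustration.

```python
def same_padding_per_side(kernel_size):
    """Rows (or columns) of zeros added on each side, for stride 1 and an odd kernel."""
    return (kernel_size - 1) // 2

def padded_output_size(in_size, kernel_size):
    """Output height (or width) after zero padding followed by the convolution."""
    pad = same_padding_per_side(kernel_size)
    padded_size = in_size + 2 * pad
    return (padded_size - kernel_size) + 1

print(same_padding_per_side(3))   # 1
print(same_padding_per_side(7))   # 3
print(padded_output_size(64, 7))  # 64
```

For a (7,7) kernel, 3 rows or columns of zeros on each side turn the 64x64 input into a 70x70 input, and (70 - 7) + 1 = 64.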

## convolution image classification

Now that we have some basic understanding of how to build convolution layers in tensorflow, let's put this into 'practice' and build an image classifier.

For this problem, we are going to classify images from a 'standard' neural-network training and evaluation data set called "CIFAR10". The CIFAR10 data set comes from the Canadian Institute for Advanced Research; it has been widely used to develop, train and evaluate image-classification AI systems.

The CIFAR10 data set consists of 60,000 32x32 color images in RGB format. Each image comes from 1 of 10 possible classes: airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks. There are 6,000 images from each of the 10 possible classes.

The CIFAR10 data set is so widely used that tensorflow has a function that will *automatically* download the data set for us, already split into two sub-sets: 50,000 training images and 10,000 held-out images, which we will use for validation.

The following code cell will download the training and validation data from the CIFAR10 data set, normalize the images so the pixel values are between 0 and 1, and display a few example images.

In [None]:
import tensorflow as tf
import matplotlib.pyplot as plt

# download CIFAR10 data
(train_images, train_labels), (valid_images, valid_labels) = tf.keras.datasets.cifar10.load_data()

# Normalize pixel values to be between 0 and 1
train_images = train_images / 255.0 
valid_images = valid_images / 255.0

# plot example images
class_names = ['airplane', 'automobile', 'bird',  'cat',  'deer',
               'dog',      'frog',       'horse', 'ship', 'truck']

plt.figure(figsize=(10,10))
for i in range(25):
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(train_images[i])
    plt.xlabel(class_names[train_labels[i][0]])
plt.show()

Downloading all 60,000 images may take a few seconds the first time this cell is run.

As you can see from the examples, there is *not* much 'high-def' information in a 32x32 color image! You can probably make out the 'true' class assigned to each image, but it can be a bit tough, as the images are quite blurry and 'pixelated' at such low resolution.
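
The normalization and label lookup used in the cell above can be tried out on a small stand-in array, without downloading anything. The `fake_images` and `fake_labels` arrays below are made-up data, shaped like CIFAR10 (8-bit images of shape (n,32,32,3) and labels of shape (n,1)); they are not real CIFAR10 samples.

```python
import numpy as np

# Stand-in for 4 CIFAR10 images: 8-bit integers covering the range 0..255
fake_images = (np.arange(4 * 32 * 32 * 3) % 256).astype(np.uint8).reshape(4, 32, 32, 3)

# Stand-in for the labels: one integer class index per image, stored as a column
fake_labels = np.array([[6], [9], [9], [4]], dtype=np.uint8)

# Dividing by 255.0 converts to floating point and rescales the values to [0, 1]
scaled = fake_images / 255.0
print(scaled.dtype)                # float64
print(scaled.min(), scaled.max())  # 0.0 1.0

class_names = ['airplane', 'automobile', 'bird',  'cat',  'deer',
               'dog',      'frog',       'horse', 'ship', 'truck']

# Each label is a length-1 array, so [0] extracts the integer class index
label_name = class_names[fake_labels[0][0]]
print(label_name)                  # frog
```

This is why the plotting loop uses train_labels[i][0] rather than train_labels[i] when looking up the class name.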

But let's see how a convolution network does.
