# Building convolution neural networks

Convolution neural networks are a bit 'trickier' than the simple densely-connected networks we've been looking at so far. This is because convolutions:

* work with data having a more complex shape, such as 2D image data with multiple channels
* typically reduce the size of the input
* generate multiple outputs (one output channel per filter)

In this jupyter notebook, we'll build some convolution neural networks, in order to get a practical understanding of how to specify various features of the network, and how design decisions impact the network's shape and parameter count.

First, let's assume we want to build a convolution neural network that can process color images of size 64x64 pixels.

For a color image, we typically assume the data are encoded in RGB format, with 3 image channels. So, the shape of our input image data will be:

    (64,64,3)

for a 64x64 pixel image in RGB format.

We therefore need to specify the input shape for the first layer in our network as:

    input_shape=[64,64,3]

to match the shape of the input image data (remember that the batch dimension does not need to be specified in tensorflow).
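To see what the batch dimension is, here is a small NumPy sketch. It is only an illustration: the batch size of 8 and the random pixel values are arbitrary assumptions. A batch of images is a 4D array whose first axis indexes the images, while the `input_shape` we give the first layer describes a single image, i.e. everything after that first axis.

```python
import numpy as np

# Hypothetical batch of 8 RGB images, 64x64 pixels each (random placeholder values)
batch = np.random.rand(8, 64, 64, 3).astype("float32")

# The full array has the batch dimension first: (batch, height, width, channels)
print(batch.shape)  # (8, 64, 64, 3)

# input_shape describes a single image, so the batch axis is dropped
input_shape = list(batch.shape[1:])
print(input_shape)  # [64, 64, 3]
```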

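For image data, we will use a "2-dimensional" convolution, even though there are multiple 'channels' in the image data. The convolution is 2-dimensional because the kernel slides across only the two spatial dimensions (height and width); at every position it covers all of the input channels at once. Two-dimensional convolutions are implemented in tensorflow as a tf.keras.layers.Conv2D object.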
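The following naive NumPy sketch shows what a single 2D convolution filter computes on a multi-channel image. It assumes a stride of 1 and no padding, and, like Keras, it computes a cross-correlation (the kernel is not flipped). The function name `conv2d_single_filter` is hypothetical, and the random image and kernel are placeholders. Note that the kernel has one 3x3 slice per input channel, and that it only moves along the height and width axes.

```python
import numpy as np

def conv2d_single_filter(image, kernel, bias=0.0):
    """Naive 'valid' 2D convolution (stride 1) of one filter over a multi-channel image.

    image:  array of shape (height, width, channels)
    kernel: array of shape (kh, kw, channels), i.e. one kh x kw slice per input channel
    """
    h, w, c = image.shape
    kh, kw, kc = kernel.shape
    assert c == kc, "the kernel must cover every input channel"
    out_h, out_w = h - kh + 1, w - kw + 1
    out = np.empty((out_h, out_w), dtype=image.dtype)
    # The kernel slides over the two spatial axes only; at each position it
    # multiplies a kh x kw x c patch elementwise and sums over all three axes.
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw, :]
            out[i, j] = np.sum(patch * kernel) + bias
    return out

image = np.random.rand(64, 64, 3)   # placeholder 64x64 RGB image
kernel = np.random.rand(3, 3, 3)    # placeholder 3x3 kernel spanning 3 channels
result = conv2d_single_filter(image, kernel)
print(result.shape)  # (62, 62)
```

The channels are summed away inside each output value, which is why one filter produces one 2D output map regardless of how many input channels there are.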

To create a Conv2D object, we need to specify the number of "filters" we want to use, and the "kernel_size" of the filters.

The number of filters is a user-selected option, and it can really be anything you like. A typical convolution layer might have anywhere from 32 to 512 filters, or more. For now, we'll just specify a single filter, so we'll need to set the option:

    filters=1

when we create the Conv2D object.
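Each filter produces its own 2D output map, and the maps are stacked along the last axis, so the number of filters becomes the number of output channels. The NumPy sketch below illustrates this with a vectorized 'valid', stride-1 convolution. The choice of 4 filters, the random weights and the zero biases are arbitrary; storing the kernels as (kernel_height, kernel_width, in_channels, filters) is chosen to mirror the layout Keras uses for Conv2D kernels. It needs NumPy 1.20 or later for `sliding_window_view`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

num_filters = 4                                   # arbitrary choice for illustration
image = np.random.rand(64, 64, 3)                 # one placeholder 64x64 RGB image
kernels = np.random.rand(3, 3, 3, num_filters)    # (kernel_h, kernel_w, in_channels, filters)
biases = np.zeros(num_filters)                    # one bias per filter

# All 3x3 patches: shape (62, 62, 3, 3, 3) = (out_h, out_w, channels, kernel_h, kernel_w)
patches = sliding_window_view(image, window_shape=(3, 3), axis=(0, 1))

# For every output position and every filter, multiply the patch by that
# filter's weights and sum over kernel height, kernel width and channels
output = np.einsum("ijcab,abcf->ijf", patches, kernels) + biases
print(output.shape)  # (62, 62, 4)
```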

The kernel size is also up to the user. However, nearly all Conv2D kernels are square, having equal height and width, and the height and width of the kernel are almost always an odd number. Most convolution networks use a kernel size of (3,3), (5,5) or (7,7). Early in the development of convolution networks, there was more diversity in kernel sizes, and a trend toward using *larger* kernels. More recently, more emphasis has been placed on building *deep* networks consisting of hundreds or thousands of convolution layers, and kernel sizes tend to be *smaller* to reduce parameter counts. In contemporary convolution networks, (3,3) convolutions are the most commonly used 'default'.
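To make the parameter-count argument concrete, the short calculation below counts the trainable parameters of one convolution layer for the three common kernel sizes. The layer dimensions (64 input channels, 64 filters) are a hypothetical example, and the count assumes one weight per kernel position per input channel, plus one bias per filter.

```python
def conv_layer_params(kernel_size, in_channels, filters):
    """Trainable parameters of a square-kernel 2D convolution layer with biases."""
    weights_per_filter = kernel_size * kernel_size * in_channels
    return (weights_per_filter + 1) * filters   # +1 for each filter's bias

# Hypothetical layer: 64 input channels, 64 filters
for k in (3, 5, 7):
    print(f"({k},{k}) kernel: {conv_layer_params(k, 64, 64):,} parameters")
# (3,3) kernel: 36,928 parameters
# (5,5) kernel: 102,464 parameters
# (7,7) kernel: 200,768 parameters
```

Because the count grows with the square of the kernel width, the (7,7) layer in this example has over five times as many parameters as the (3,3) layer.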

To specify the kernel size, we just set the option:

    kernel_size=(3,3)

when we create the Conv2D object.

The following code cell creates a simple 1-filter convolution neural network layer with a (3,3) kernel size.

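```python
import tensorflow as tf

# A minimal model with a single Conv2D layer:
# 1 filter, a 3x3 kernel, and 64x64 RGB input images (batch dimension omitted)
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Conv2D(filters=1, kernel_size=(3,3), input_shape=[64,64,3]))

# Print each layer's output shape and trainable parameter count
model.summary()
```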

We can see that the output shape of the convolution layer (ignoring the batch dimension) is (62,62,1). The last "1" is the number of filters, which we specified when creating the layer.

The height and the width of the layer's output are both 62, which are calculated from the shape of the input image (64x64) and the size of the kernel (3,3).

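Specifically, with the default settings (a stride of 1 and no padding, which tensorflow calls padding='valid'), we can calculate the output width of a 2D convolution layer as follows (the height is calculated the same way):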

    outwidth = (inwidth - kernelwidth) + 1  = (64 - 3) + 1  = 61 + 1 = 62
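The formula is easy to wrap in a small helper. This is a sketch for the default settings assumed above (stride 1, no padding), and the function names are hypothetical. Since the formula applies separately to the height and the width, it also handles non-square inputs, such as the hypothetical 64x48 image below.

```python
def conv_output_size(in_size, kernel_size):
    """Output length along one spatial axis for a stride-1, unpadded ('valid') convolution."""
    return in_size - kernel_size + 1

def conv_output_shape(in_height, in_width, kernel, filters):
    """Output shape (height, width, channels) of a stride-1, 'valid' 2D convolution."""
    kh, kw = kernel
    return (conv_output_size(in_height, kh), conv_output_size(in_width, kw), filters)

print(conv_output_shape(64, 64, (3, 3), 1))  # (62, 62, 1)
print(conv_output_shape(64, 64, (5, 5), 1))  # (60, 60, 1)
print(conv_output_shape(64, 48, (3, 3), 2))  # (62, 46, 2)
```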

There are 28 trainable parameters in our 2D convolution layer. Given a kernel size of (3,3), each filter takes its inputs from a 3x3 'grid' of image pixels, so there are 3x3=9 inputs to the filter from each channel, and each input requires its own weight. However, there are *3* input channels in our image data, and the filter has an *independent* set of 3x3 weights for each input channel. So, given the 3 input channels, each filter has 9x3=27 trainable weights. But let's not forget the bias term! Each filter has a single bias, which is shared across all input channels (and all positions in the image), so it adds only one trainable parameter: 27+1=28.

If we increased the number of filters to 2, we'd have 28x2=56 trainable parameters, and the output shape of the convolution layer would be (62,62,2).
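One way to double-check these counts is to build arrays with the shapes of the layer's weights. The sketch below assumes the layout Keras uses for Conv2D: a kernel tensor of shape (kernel_height, kernel_width, in_channels, filters) and a bias vector with one entry per filter. It uses NumPy only, so the zero-filled arrays are placeholders rather than the model's actual weights, and `conv2d_weight_shapes` is a hypothetical helper; with TensorFlow, `model.count_params()` on the model above should report the same total.

```python
import numpy as np

def conv2d_weight_shapes(kernel_size, in_channels, filters):
    """Placeholder kernel and bias arrays in a Keras-style Conv2D layout (assumed here)."""
    kh, kw = kernel_size
    kernel = np.zeros((kh, kw, in_channels, filters))  # one kh x kw slice per input channel, per filter
    bias = np.zeros((filters,))                         # one bias per filter
    return kernel, bias

for filters in (1, 2):
    kernel, bias = conv2d_weight_shapes((3, 3), in_channels=3, filters=filters)
    total = kernel.size + bias.size
    print(filters, kernel.shape, bias.shape, total)
# 1 (3, 3, 3, 1) (1,) 28
# 2 (3, 3, 3, 2) (2,) 56
```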


