<a href="https://colab.research.google.com/github/rahiakela/deep-learning--from-basics-to-practice/blob/24-keras-part-2/convolution_networks.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Convolution Networks

Let’s build some convolutional neural networks, also called convnets,
or more commonly, CNNs.

Each convolution layer holds a collection of
filters, or kernels, which are rectangles of numbers (often a small
square that is 3, 5, or 7 elements on a side). When we use 2D convolution
layers with images, each filter in the first layer is applied in turn
to every pixel in the input. The output of the filter becomes the value
of that element in a new tensor produced at the layer’s output. If there
are multiple filters, then the output tensor contains multiple channels,
just like the red, green, and blue channels of a color image.
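
To make the idea concrete, here is a minimal NumPy sketch of sliding several small filters over a single-channel image. It is not how Keras implements convolution internally. The helper name `apply_filters` and the toy values are assumptions made for illustration, and no padding is used, so the output is slightly smaller than the input.

```python
import numpy as np

def apply_filters(image, filters):
    """Slide each 2D filter over a 2D image (no padding, stride 1).

    Hypothetical helper for illustration only.
    Returns an array of shape (out_height, out_width, number_of_filters).
    """
    image_h, image_w = image.shape
    num_filters, filter_h, filter_w = filters.shape
    out_h = image_h - filter_h + 1
    out_w = image_w - filter_w + 1
    output = np.zeros((out_h, out_w, num_filters))
    for f in range(num_filters):
        for row in range(out_h):
            for col in range(out_w):
                patch = image[row:row + filter_h, col:col + filter_w]
                # Multiply element by element, then add everything up
                output[row, col, f] = np.sum(patch * filters[f])
    return output

toy_image = np.arange(36, dtype=float).reshape(6, 6)  # a 6x6 stand-in "image"
toy_filters = np.ones((4, 3, 3))                      # four 3x3 filters
toy_output = apply_filters(toy_image, toy_filters)
print(toy_output.shape)
```

The last axis of the result has one entry per filter, which is the sense in which the output "contains multiple channels."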

We can characterize a convolution layer by
the number of dimensions in which the filters are moved. If the filter is
moved in just one dimension (for example, down), then we call it a 1D
convolution layer. Typically, when we work with images, we slide our
filters over the 2D width and height of the tensor, so we usually use 2D
convolution layers for image processing. Keras also offers 3D convolution
layers for working with volumetric data.

In practice we don’t often build and train
a new CNN from the ground up. Instead, we usually try to start with
an existing network whenever possible, and specialize it for our task
by perhaps modifying it, and then training it some more with our own
data. Such transfer learning is appealing because we get to start
with an existing architecture that is known to work well, and we save
the time (sometimes days or weeks) that was invested in training the
model we’re building upon. We also get the benefit of the data that
network was trained on, which might not be available to us.

But it’s important to know how to build our own from scratch. This
lets us start fresh when we need to, and gives us the tools to modify an
existing network when we want to. Whether we’re working with our
own model or one we’ve adopted, knowing what’s going on inside will
help us diagnose problems and get the best performance out of our
model.

## Setup

In [0]:
from __future__ import absolute_import, division, print_function, unicode_literals

try:
  # %tensorflow_version only exists in Colab.
  %tensorflow_version 2.x
except Exception:
  pass
import tensorflow as tf
from tensorflow import keras
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt

from tensorflow.keras.datasets import mnist
from tensorflow.keras import backend as keras_backend
from tensorflow.keras.utils import to_categorical

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation, Flatten, Conv2D, MaxPooling2D, BatchNormalization
from tensorflow.keras.constraints import max_norm
from tensorflow.keras.optimizers import Adam, SGD, RMSprop
from tensorflow.keras.preprocessing.image import ImageDataGenerator

keras_backend.set_image_data_format('channels_last')

## Preparing the Data for a CNN

We prepare our MNIST data for a convnet with almost the same process
as we’ve been doing so far when the first layer was a Dense, or
fully-connected, layer.

The difference is in the shape of the feature data. So far, we’ve been
shaping our feature data in a 2D grid, with one row per image. Each
row held all the pixels for that image.

Something important happened when we flattened out our image
to make that grid: we lost the spatial information that tells us which
pixels are near one another vertically (technically, it’s still there, but
definitely not in a structure that’s easily useful). A great thing about
CNNs is that they work with inputs as multidimensional tensors, not
long 1D lists. For instance, the receptive field for a filter covers a group
of spatially-related elements.

When working with CNNs there’s no need to flatten out input 2D grids
of pixels. We’ll maintain them instead as three-dimensional volumes,
where each input image has a height, width, and depth.

The MNIST data is grayscale, so we have just a single channel of
pixel data. But we still have to explicitly tell Keras that we have just that
one channel, by making it one of the dimensions of our input tensor.

Each image in the input will be reshaped as a 3D block
with dimensions 28 by 28 by 1, since we’re using the channels_last
convention. Then we stack all 60,000 of these 3D blocks together to
make a 4D tensor of shape 60,000 by 28 by 28 by 1, which will serve as
input to our CNN.

<img src='https://github.com/rahiakela/img-repo/blob/master/image-inputs.PNG?raw=1' width='800'/>

It can be easier to think of this not as a 4D
structure but instead as a sequence of nested lists: the outermost list
contains 60,000 images, each image contains 28 rows, each row
contains 28 pixels, and each pixel contains one channel value.
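
With the channels_last convention chosen in the Setup cell, the nesting can be checked on a small stand-in batch. This is only a sketch: the array below holds 5 blank images rather than the real 60,000, and the name `batch` is made up for illustration.

```python
import numpy as np

# Stand-in batch: 5 blank grayscale images instead of the full 60,000
batch = np.zeros((5, 28, 28, 1))

print(len(batch))            # number of images
print(len(batch[0]))         # rows in one image
print(len(batch[0][0]))      # pixels in one row
print(len(batch[0][0][0]))   # channel values in one pixel

# The same data rearranged into channels_first order, for comparison
batch_channels_first = np.transpose(batch, (0, 3, 1, 2))
print(batch_channels_first.shape)
```

The two layouts hold exactly the same numbers. Only the position of the channel axis differs, which is why the convention has to be stated explicitly.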

Convnets often work best with input data scaled from −1 to 1. This means we can’t just divide every pixel by 255, which would give the range [0,1]. Instead, we’ll use
the NumPy function interp() to convert each input value in the range
[0,255] to the range [−1,1].

```python
X_train = np.interp(X_train, [0, 255], [-1, 1])
X_test = np.interp(X_test, [0, 255], [-1, 1])
```
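
As a quick check of what this scaling does, the sketch below applies `np.interp` to a few hand-picked pixel values (the sample values are arbitrary). With only two endpoints, the mapping is a straight line, so it should agree with the formula `x / 127.5 - 1`.

```python
import numpy as np

sample_pixels = np.array([0.0, 64.0, 127.5, 255.0])

# Map the range [0, 255] linearly onto [-1, 1]
scaled = np.interp(sample_pixels, [0, 255], [-1, 1])
print(scaled)

# The same mapping written out as arithmetic
by_hand = sample_pixels / 127.5 - 1
print(np.allclose(scaled, by_hand))
```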

Now we’ll re-shape the data into the shape we just discussed. We just
tell NumPy how to take our original version of X_train, which was
60,000 by 28 by 28, and reshape it into a 4D tensor that’s 60,000 by 28 by 28 by 1. We’re not changing the total number of elements, so
NumPy can do this for us.
```python
X_train = X_train.reshape(X_train.shape[0], image_height, image_width, 1)
X_test = X_test.reshape(X_test.shape[0], image_height, image_width, 1)
```
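
The following sketch checks, on two made-up images, that adding the trailing channel axis keeps every value in place. It also shows `np.newaxis` indexing as an equivalent shorthand. The array names are hypothetical.

```python
import numpy as np

# Two fake 28x28 "images" filled with distinct values
images = np.arange(2 * 28 * 28, dtype=float).reshape(2, 28, 28)
reshaped = images.reshape(images.shape[0], 28, 28, 1)

print(images.size == reshaped.size)                       # same number of elements
print(np.array_equal(reshaped[..., 0], images))           # same values, same order
print(np.array_equal(reshaped, images[..., np.newaxis]))  # equivalent shorthand
```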

We’ll place these re-shaping lines right after the scaling step. For completeness, here is all the pre-processing in one place.

In [0]:
random_seed = 42
np.random.seed(random_seed)

In [0]:
# load MNIST data and save sizes
(X_train, y_train), (X_test, y_test) = mnist.load_data()
image_height = X_train.shape[1]
image_width = X_train.shape[2]
number_of_pixels = image_height * image_width

# convert to floating-point
X_train = keras_backend.cast_to_floatx(X_train)
X_test = keras_backend.cast_to_floatx(X_test)

# scale data to range [-1, 1]
X_train = np.interp(X_train, [0, 255], [-1,1])
X_test = np.interp(X_test, [0, 255], [-1,1])

# save original y_train and y_test
original_y_train = y_train
original_y_test = y_test

# replace label data with one-hot encoded versions
number_of_classes = 1 + max(np.append(y_train, y_test))
y_train = to_categorical(y_train, num_classes=number_of_classes)
y_test = to_categorical(y_test, num_classes=number_of_classes)

# reshape sample data to 4D tensor using channels_last convention 
X_train = X_train.reshape(X_train.shape[0], image_height, image_width, 1)
X_test = X_test.reshape(X_test.shape[0], image_height, image_width, 1)

Shaping the feature data into these 4D tensors is a necessary pre-processing
step. It puts the data into the structure that is expected by the
convolution layer that will sit at the start of our convnet.
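
The pre-processing cell above also one-hot encodes the labels. As a rough illustration of the kind of array that step produces, here is a plain NumPy sketch. Indexing into `np.eye` is a stand-in chosen for this example, not a description of how `to_categorical` works internally, and the three labels are made up.

```python
import numpy as np

labels = np.array([3, 0, 9])   # three made-up digit labels
number_of_classes = 10

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(number_of_classes)[labels]
print(one_hot.shape)
print(one_hot[0])

# argmax undoes the encoding and recovers the original labels
recovered = np.argmax(one_hot, axis=1)
print(recovered)
```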

In [0]:
# A little utility to draw accuracy and loss plots
def plot_accuracy_and_loss(history, plot_title, filename):
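    # Keras stores the metric under the name given in compile():
    # either 'accuracy' or the shorthand 'acc'
    acc_key = 'accuracy' if 'accuracy' in history.history else 'acc'
    xs = range(len(history.history[acc_key]))

    plt.figure(figsize=(10,3))
    plt.subplot(1, 2, 1)
    plt.plot(xs, history.history[acc_key], label='train')
    plt.plot(xs, history.history['val_' + acc_key], label='validation')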
    plt.legend(loc='lower left')
    plt.xlabel('epochs')
    plt.ylabel('accuracy')
    plt.title(plot_title+', Accuracy')

    plt.subplot(1, 2, 2)
    plt.plot(xs, history.history['loss'], label='train')
    plt.plot(xs, history.history['val_loss'], label='validation')
    plt.legend(loc='upper left')
    plt.xlabel('epochs')
    plt.ylabel('loss')
    plt.title(plot_title+', Loss')

    #plt.tight_layout()
    plt.show()

## Convolution Layers

The Conv2D layer takes two unnamed, mandatory arguments at the
start of its argument list, followed by a variety of optional arguments.

The first mandatory argument is an integer specifying the number of
filters the layer should manage. Each filter
is applied to the input independently, and produces its own output. So
if our input has one channel (as our input does), and we use 5 filters
in a convolution layer, the output will have 5 channels.

<img src='https://github.com/rahiakela/img-repo/blob/master/image-channel-inputs.PNG?raw=1' width='800'/>

The second argument to Conv2D is a list or tuple that gives the dimensions of
the filters on this layer. If we have
5 filters and each is 3 by 3, then these arguments would be 5, [3,3].
This tells the layer to automatically allocate and initialize 5 volumes,
each of shape 3 by 3 by 1 (the trailing 1 is the number of channels in our input).

In practice, we almost always use square kernels, often of 3 or 5 elements
on a side. Experience has shown that these sizes, coupled
with reduction in the size of the input (either by pooling or convolution
striding), represent a good tradeoff between computation and results.

Keep in mind that these filter kernels are 3D volumes, since there’s
one channel in the kernel for each channel in the input.

<img src='https://github.com/rahiakela/img-repo/blob/master/image-channel-inputs-1.PNG?raw=1' width='800'/>

Each filter automatically holds as many channels as there
are in the input. Here a 5 by 5 filter is being applied to a 3-channel input,
so the system automatically gives the filter 3 channels as well. Each of
the 75 values in the input (bottom) is multiplied by its corresponding
value in the filter (middle), and all of those products are added together
to produce a single number (top), the output of that filter for that location
of the input.
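
The arithmetic at a single location can be sketched in a few lines of NumPy. The random values below are placeholders, not trained weights, and the variable names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
input_patch = rng.random((5, 5, 3))     # a 5x5 region of a 3-channel input
filter_weights = rng.random((5, 5, 3))  # one 5x5 filter, 3 channels deep

products = input_patch * filter_weights  # element-wise products
output_value = products.sum()            # one number for this location

print(products.size)
print(output_value)
```

By default a Keras Conv2D layer would also add a bias to this sum and then apply the layer's activation function, if one is given.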

If we apply several filters to a multi-channel input, then
each filter will also have multiple channels. The number of channels in
the output is given by the number of filters that were used.

<img src='https://github.com/rahiakela/img-repo/blob/master/image-channel-inputs-2.PNG?raw=1' width='800'/>

Let’s suppose that we’ve made a convolution layer with 5
filters, each 5 by 5. Then every output it produces will have 5 channels.
If the next layer is also a convolution layer, and we say that we want 2
filters that are 3 by 3, Keras will automatically know to make each filter
5 channels deep, since that’s what’s coming out of the previous layer.
In short, the number of channels in each filter is equal to the number
of channels in the input, which in turn is the number of filters used in
the previous convolution layer.
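
One consequence of this rule is that the number of weights in a layer depends on the layer before it. The sketch below counts them for the two layers just described, assuming the first layer sees a 1-channel input (as with MNIST) and that each filter has one bias, which is the Keras default. The helper name `conv2d_weight_count` is hypothetical.

```python
def conv2d_weight_count(num_filters, kernel_height, kernel_width, input_channels):
    """Hypothetical helper: filter weights plus one bias per filter."""
    weights_per_filter = kernel_height * kernel_width * input_channels
    return num_filters * (weights_per_filter + 1)

# 5 filters of 5x5 applied to a 1-channel input
first_layer = conv2d_weight_count(5, 5, 5, input_channels=1)

# 2 filters of 3x3 applied to the 5 channels coming out of the first layer
second_layer = conv2d_weight_count(2, 3, 3, input_channels=5)

print(first_layer, second_layer)
```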

<img src='https://github.com/rahiakela/img-repo/blob/master/image-channel-inputs-3.PNG?raw=1' width='800'/>

Let’s make a convolution layer with 15 filters, each 3 by 3.

```python
convolution_layer = Conv2D(15, (3, 3))
model.add(convolution_layer)
```

In practice, we usually do this in one step.

```python
model.add(Conv2D(15, (3, 3)))
```

Now that we’ve covered all the background, let’s construct a 2D convolution
layer.

```python
model.add(Conv2D(16, (5, 5), activation='relu', strides=(2, 2), padding='same',
                 input_shape=(image_height, image_width, 1)))
```

Explanation:

* **padding**: To apply zero padding, we set the optional argument padding to the
string 'same', meaning “make the output the same size as the input”
(which holds when the stride is 1).
The default value of padding is the string 'valid', which means “only
place the filter where there is valid data available.” This is a long-winded
way of saying, “no padding.”

* **strides**: If we set the stride values to (2,2), then, with 'same' padding, our output
will have half the width and height of the input.

* **input_shape**: A CNN layer wants the input_shape to describe not
the shape of the whole data set, but just one sample. Therefore, the value of input_shape is the tuple (28,28,1), describing
one image.
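
The effect of padding and strides on the output size can be worked out with a little arithmetic. The helper below is a hypothetical sketch of the sizing rules TensorFlow documents for 'same' and 'valid' padding, applied along one spatial dimension. It does not call Keras.

```python
import math

def conv_output_size(input_size, kernel_size, stride, padding):
    """Hypothetical helper: output length along one spatial dimension."""
    if padding == 'same':
        return math.ceil(input_size / stride)
    if padding == 'valid':
        return math.ceil((input_size - kernel_size + 1) / stride)
    raise ValueError("padding must be 'same' or 'valid'")

same_stride_2 = conv_output_size(28, 5, 2, 'same')    # the layer built above
valid_stride_2 = conv_output_size(28, 5, 2, 'valid')  # same layer without padding
same_stride_1 = conv_output_size(28, 5, 1, 'same')    # padding only, no striding

print(same_stride_2, valid_stride_2, same_stride_1)
```

For our 28 by 28 images, this gives a 14 by 14 output for the layer above. The same layer without padding would give 12 by 12, and 'same' padding with a stride of 1 keeps the full 28 by 28.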


## Using Convolution for MNIST