## Convolution layers learn local patterns

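- Because they learn local patterns, conv layers have two main properties:
    * The patterns they learn are translation invariant: a pattern learned in one part of the input can be recognized anywhere else.
    * They can learn spatial hierarchies: a second layer can learn larger patterns built from the features of the first layer.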

2D convolutions operate over 3D tensors called feature maps. A feature map has two spatial axes (height, width) and a third axis for channels. For example, RGB images have 3 channels and grayscale images have only 1.

The convolution operation is applied to substructures (small windows) of its input feature map and generates an output feature map. The output is also a 3D tensor with two spatial axes, but its third axis no longer stands for color channels. Each entry along that axis corresponds to a filter, which encodes a feature learned from the input.

- Convolutional layers are defined by two parameters:
     * The size of each substructure (window) the convolution operation is applied to.
     * The depth of the output feature map, which is the number of filters computed by the convolution.

- Convolution pseudocode:
    - For every possible location in the input:
        * Slide a window over the input feature map and extract a patch of features with shape (substructure_height, substructure_width, substructure_depth)
        * Take the dot product of this patch with the learned weight matrix (the convolution kernel) to produce a 1D vector of shape (output_depth,)
    - Reassemble these vectors into a 3D output feature map, keeping each vector at the spatial location of the patch it came from
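
The pseudocode above can be sketched in NumPy. This is a minimal, unoptimized illustration that assumes no padding and a stride of 1; the function name `conv2d_naive` and the random inputs are hypothetical, and real frameworks use much faster implementations.

```python
import numpy as np

def conv2d_naive(feature_map, kernel):
    """Naive convolution with no padding and a stride of 1.

    feature_map: array of shape (height, width, input_depth)
    kernel: array of shape (window_h, window_w, input_depth, output_depth)
    """
    height, width, input_depth = feature_map.shape
    window_h, window_w, kernel_depth, output_depth = kernel.shape
    assert kernel_depth == input_depth, "kernel depth must match input depth"

    out_h = height - window_h + 1
    out_w = width - window_w + 1
    output = np.zeros((out_h, out_w, output_depth))

    for i in range(out_h):
        for j in range(out_w):
            # Extract a patch of shape (window_h, window_w, input_depth).
            patch = feature_map[i:i + window_h, j:j + window_w, :]
            # Contract the three patch axes against the kernel,
            # leaving a 1D vector of shape (output_depth,).
            output[i, j, :] = np.tensordot(patch, kernel, axes=3)
    return output

rng = np.random.default_rng(0)
image = rng.random((28, 28, 1))       # one grayscale 28x28 input
kernel = rng.random((3, 3, 1, 32))    # 32 filters over 3x3 windows
result = conv2d_naive(image, kernel)
print(result.shape)  # (26, 26, 32)
```

The output depth (32) comes from the kernel, not from the input, which is why the third axis of the output stands for filters rather than channels.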

After a convolution operation, the spatial dimensions (height and width) of the output feature map may differ from those of the input. This can happen because of border effects or because of the use of strides.

- Border effects can be countered with padding, which consists of adding an appropriate number of rows and columns on each side of the input feature map so that a convolution window can be centered on every input tile.
- When extracting patches, the default stride is 1, meaning the center tiles of successive windows are contiguous. With a stride greater than 1, the window skips tiles as it slides. With a stride of 2, for example, it skips every other center tile, so the output height and width are downsampled by a factor of about 2.
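
The effect of padding and strides on one spatial dimension can be sketched with a small helper. This is an illustration only: `conv_output_size` is a hypothetical function, and the `"valid"` (no padding) and `"same"` (pad so the output matches the input at stride 1) names are borrowed from the padding modes of Keras convolution layers.

```python
def conv_output_size(input_size, window_size, stride=1, padding="valid"):
    """Output length along one spatial axis of a convolution."""
    if padding == "valid":
        # No padding: only windows that fit entirely inside the input count.
        return (input_size - window_size) // stride + 1
    if padding == "same":
        # Enough padding to center a window on every input tile;
        # the result is ceil(input_size / stride).
        return -(-input_size // stride)
    raise ValueError(f"unknown padding mode: {padding!r}")

print(conv_output_size(28, 3))                  # 26: border effect, no padding
print(conv_output_size(28, 3, padding="same"))  # 28: padding removes the border effect
print(conv_output_size(28, 3, stride=2))        # 13: stride 2 roughly halves the output
```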

## Max pooling operation

Max pooling is used to downsample feature maps. It is similar to the convolution operation, but instead of applying a learned linear transformation (the dot product with the convolution kernel) to each patch, it applies a hardcoded max operation that extracts the maximum value of each channel in each input patch.

The main objective of max pooling is to downsample the features. This makes CNNs more tolerant of small translations of the input and, because later layers have fewer values to process, less complex.
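
As a rough illustration, max pooling with non-overlapping windows can be written in a few lines of NumPy. The helper `max_pool_2d` is hypothetical and assumes the stride equals the pool size; rows and columns that do not fill a complete window are dropped.

```python
import numpy as np

def max_pool_2d(feature_map, pool_size=2):
    """Max pooling over non-overlapping pool_size x pool_size windows.

    feature_map: array of shape (height, width, depth)
    """
    height, width, depth = feature_map.shape
    out_h, out_w = height // pool_size, width // pool_size
    # Drop trailing rows/columns that do not fill a complete window.
    cropped = feature_map[:out_h * pool_size, :out_w * pool_size, :]
    # Split each spatial axis into (window index, position inside window).
    windows = cropped.reshape(out_h, pool_size, out_w, pool_size, depth)
    # Take the max inside each window, independently for every channel.
    return windows.max(axis=(1, 3))

feature_map = np.arange(16, dtype=float).reshape(4, 4, 1)
pooled = max_pool_2d(feature_map)
print(pooled[:, :, 0])
# [[ 5.  7.]
#  [13. 15.]]
```

Unlike a convolution, nothing here is learned: the operation has no weights, and the depth of the feature map is unchanged.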

In [9]:
import plaidml.keras
plaidml.keras.install_backend()

In [11]:
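from keras import layers
from keras import models

# Small convnet: alternating Conv2D and MaxPooling2D layers.
# input_shape is (height, width, channels) for 28x28 grayscale images.
model = models.Sequential()
# 32 filters computed over 3x3 windows
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
# Downsample height and width by a factor of 2
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))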

In [12]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_7 (Conv2D)            (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 13, 13, 32)        0         
_________________________________________________________________
conv2d_8 (Conv2D)            (None, 11, 11, 64)        18496     
_________________________________________________________________
max_pooling2d_6 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
conv2d_9 (Conv2D)            (None, 3, 3, 64)          36928     
Total params: 55,744
Trainable params: 55,744
Non-trainable params: 0
_________________________________________________________________
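
The `Param #` column can be reproduced by hand: each filter has one weight per element of its input patch, plus one bias. A minimal sketch, where `conv2d_param_count` is a hypothetical helper rather than a Keras function:

```python
def conv2d_param_count(window_h, window_w, input_depth, output_depth):
    """Weights plus biases of a convolution layer."""
    weights = window_h * window_w * input_depth * output_depth
    biases = output_depth  # one bias per filter
    return weights + biases

counts = [
    conv2d_param_count(3, 3, 1, 32),    # first Conv2D: 1 input channel, 32 filters
    conv2d_param_count(3, 3, 32, 64),   # second Conv2D: 32 input filters, 64 filters
    conv2d_param_count(3, 3, 64, 64),   # third Conv2D: 64 input filters, 64 filters
]
print(counts)       # [320, 18496, 36928]
print(sum(counts))  # 55744
```

The max pooling layers contribute no parameters, so the three convolution layers account for the whole total.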


In [13]:
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))

In [14]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_7 (Conv2D)            (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 13, 13, 32)        0         
_________________________________________________________________
conv2d_8 (Conv2D)            (None, 11, 11, 64)        18496     
_________________________________________________________________
max_pooling2d_6 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
conv2d_9 (Conv2D)            (None, 3, 3, 64)          36928     
_________________________________________________________________
flatten_2 (Flatten)          (None, 576)               0         
_________________________________________________________________
dense_3 (Dense)              (None, 64)                36928     
__________
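
The `Flatten` and first `Dense` rows can be checked with the same kind of arithmetic. A small sketch based on the model defined above; `dense_param_count` is a hypothetical helper, not part of Keras.

```python
def dense_param_count(input_units, output_units):
    """One weight per (input, output) pair plus one bias per output unit."""
    return input_units * output_units + output_units

# Flatten unrolls the last conv output of shape (3, 3, 64) into a vector.
flattened_units = 3 * 3 * 64
first_dense_params = dense_param_count(flattened_units, 64)
print(flattened_units)     # 576
print(first_dense_params)  # 36928
```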

In [15]:
from keras.datasets import mnist
from keras.utils import to_categorical

# Load MNIST: 60,000 training and 10,000 test images of 28x28 grayscale digits
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Add the channel axis expected by Conv2D and scale pixel values to [0, 1]
train_images = train_images.reshape((60000, 28, 28, 1))
train_images = train_images.astype('float32') / 255

test_images = test_images.reshape((10000, 28, 28, 1))
test_images = test_images.astype('float32') / 255

# One-hot encode the labels to match the 10-way softmax output
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=5, batch_size=64)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x12cf83208>

In [16]:
test_loss, test_acc = model.evaluate(test_images, test_labels)
test_acc



0.9898