# Properties of Convnets

The major difference between a densely connected layer and a convolution layer is that dense layers learn global patterns in their input feature space (patterns involving all the pixels), whereas convolution layers learn local patterns (patterns found in small 2D windows of the input, such as the 3x3 windows used below).

This characteristic gives convnets two interesting properties:
* *The patterns they learn are translation invariant.* For example, after learning a certain pattern in the lower-right corner of an image, a convnet can recognize it anywhere, such as in the upper-left corner, without having to learn it again. This makes convnets data efficient when processing images.

* *Convnets can learn spatial hierarchies of patterns.* For example, the first convolution layer can learn small local patterns such as edges, the second layer can learn larger patterns made of the features of the first layer, and so on. Again, this makes convnets efficient.
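The translation-invariance point can be checked with a few lines of NumPy. The sketch below is illustrative only: the 3x3 diagonal filter is a hand-picked stand-in for a learned one, and `correlate2d_valid` and `peak` are hypothetical helpers, not Keras functions. Sliding one fixed filter over two images shows that the same weights detect the stroke equally strongly in either corner; strictly speaking, the response map moves along with the pattern while the strength of the detection stays the same.

```python
import numpy as np

def correlate2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the response map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def peak(response):
    """(row, col) of the strongest response, as plain Python ints."""
    row, col = np.unravel_index(response.argmax(), response.shape)
    return int(row), int(col)

# Hypothetical "learned" filter: it matches a 3x3 diagonal stroke.
kernel = np.eye(3)

# The same stroke drawn in the lower-right corner of one image and the upper-left of another.
img_a = np.zeros((10, 10))
img_a[7:10, 7:10] = np.eye(3)
img_b = np.zeros((10, 10))
img_b[0:3, 0:3] = np.eye(3)

resp_a = correlate2d_valid(img_a, kernel)
resp_b = correlate2d_valid(img_b, kernel)

print(resp_a.max(), resp_b.max())  # 3.0 3.0  (same weights, same detection strength)
print(peak(resp_a), peak(resp_b))  # (7, 7) (0, 0)  (the response moves with the pattern)
```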

Convolutions operate over 3D tensors called *feature maps*, with two spatial axes (*height* and *width*) as well as a depth axis (also called the *channel axis*). The convolution operation extracts patches from its input feature map and applies the same transformation to all of these patches, producing an *output feature map*. This output feature map is still a 3D tensor: it has a width and a height. Its depth can be arbitrary, because the output depth is a parameter of the layer, and the different channels in that depth axis no longer stand for specific colors as in RGB input; rather, they stand for *filters*. Filters encode specific aspects of the input data.

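A convolution works by sliding a window (for example, 3x3) over the 3D input feature map, stopping at every possible location, and extracting the 3D patch of surrounding features (shape (*window_height*, *window_width*, *input_depth*)). Each 3D patch is then transformed into a 1D vector of shape (*output_depth*,). All these vectors are then spatially reassembled into a 3D output map of shape (*height*, *width*, *output_depth*). Without padding, the output height and width are slightly smaller than the input's, because the window cannot be centered on the border pixels. Every spatial location in the output feature map corresponds to the same location in the input feature map (for example, the lower-right corner of the output contains information about the lower-right corner of the input).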
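To make the sliding-window description concrete, here is a deliberately naive NumPy sketch of one convolution layer with no padding and stride 1 (assumptions chosen to match the `Conv2D` defaults used below). `conv2d_valid` is a hypothetical helper, not the Keras implementation, and it leaves out the activation function. Like most deep-learning libraries, it computes a cross-correlation: the kernel is not flipped.

```python
import numpy as np

def conv2d_valid(feature_map, kernels, bias):
    """Naive 2D convolution: no padding ('valid'), stride 1.

    feature_map: (height, width, input_depth)
    kernels:     (window_height, window_width, input_depth, output_depth)
    bias:        (output_depth,)
    returns:     (height - window_height + 1, width - window_width + 1, output_depth)
    """
    h, w, _ = feature_map.shape
    kh, kw, _, out_depth = kernels.shape
    out = np.zeros((h - kh + 1, w - kw + 1, out_depth))
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            # Extract the 3D patch at this location: (kh, kw, input_depth).
            patch = feature_map[i:i + kh, j:j + kw, :]
            # The same weights transform every patch into a vector of shape (output_depth,).
            out[i, j, :] = np.tensordot(patch, kernels, axes=([0, 1, 2], [0, 1, 2])) + bias
    return out

rng = np.random.default_rng(0)
image = rng.random((28, 28, 1))               # one MNIST-sized grayscale input
kernels = rng.standard_normal((3, 3, 1, 32))  # 32 filters of size 3x3
bias = np.zeros(32)
print(conv2d_valid(image, kernels, bias).shape)  # (26, 26, 32)
```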

# Instantiate the Network
A convolutional neural network (convnet) takes as input tensors of shape (image_height, image_width, image_channels). This shape does not include the batch dimension, that is, the number of images processed at once.

In this case, since we are working with the MNIST dataset, each image is 28x28 pixels with a single grayscale channel, so its shape is (28, 28, 1). Therefore we pass the argument `input_shape=(28, 28, 1)` to the first layer.

The output of every `Conv2D` and `MaxPooling2D` layer is a 3D tensor of shape (height, width, channels). The width and height dimensions tend to shrink as we go deeper in the network, while the number of channels is controlled by the first argument passed to each `Conv2D` layer (the number of filters, such as 32 or 64).
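`MaxPooling2D((2, 2))` keeps the largest value in each non-overlapping 2x2 window of each channel, halving height and width. Below is a minimal NumPy sketch, assuming stride 2 and no padding (the defaults for this call); `max_pool_2x2` is a hypothetical helper. The floor division explains why an 11x11 map becomes 5x5 in the model summary.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 and no padding.

    feature_map: (height, width, channels) -> (height // 2, width // 2, channels).
    A trailing odd row/column is dropped, which is why 11x11 becomes 5x5.
    """
    h, w, c = feature_map.shape
    h2, w2 = h // 2, w // 2
    trimmed = feature_map[:h2 * 2, :w2 * 2, :]
    # Group pixels into non-overlapping 2x2 blocks and keep the max of each block.
    blocks = trimmed.reshape(h2, 2, w2, 2, c)
    return blocks.max(axis=(1, 3))

x = np.arange(16, dtype=float).reshape(4, 4, 1)     # values 0..15, one channel
print(max_pool_2x2(x)[:, :, 0].tolist())            # [[5.0, 7.0], [13.0, 15.0]]
print(max_pool_2x2(np.zeros((11, 11, 64))).shape)   # (5, 5, 64)
```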

```python
from keras import layers, models
from keras.datasets import mnist
from keras.utils import to_categorical
```

* The first `Conv2D` layer outputs 32 filters of size 3x3.
* The second `Conv2D` layer outputs 64 filters of size 3x3.
* The third `Conv2D` layer outputs 63 filters of size 3x3.
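This stack also illustrates the spatial-hierarchy property: each layer looks at a small window of the previous layer, so deeper units cover larger regions of the original image. The sketch below uses the standard receptive-field recurrence; `receptive_field` is a hypothetical helper, and the (kernel size, stride) pairs are read off the layer definitions in the next cell (3x3 convolutions with stride 1, 2x2 pooling with stride 2).

```python
def receptive_field(layers):
    """Receptive field (in input pixels, per side) of one unit in the last layer.

    layers: list of (kernel_size, stride) pairs, applied in order.
    Uses r_out = r_in + (k - 1) * j_in and j_out = j_in * s, where j is the
    distance in input pixels between adjacent units of the current layer.
    """
    r, j = 1, 1
    for k, s in layers:
        r += (k - 1) * j
        j *= s
    return r

# conv 3x3 -> pool 2x2 -> conv 3x3 -> pool 2x2 -> conv 3x3
stack = [(3, 1), (2, 2), (3, 1), (2, 2), (3, 1)]
for depth in range(1, len(stack) + 1):
    print(depth, receptive_field(stack[:depth]))
# 1 3
# 2 4
# 3 8
# 4 10
# 5 18
```

By the third convolution, each unit responds to an 18x18 region of the 28x28 input, compared with 3x3 for the first layer.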

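```python
model = models.Sequential()
# Block 1: 32 filters of size 3x3 on 28x28 grayscale images, then 2x2 max pooling.
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2)))
# Block 2: 64 filters of size 3x3, then 2x2 max pooling.
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
# Block 3: 63 filters of size 3x3, with no pooling afterwards.
model.add(layers.Conv2D(63, (3, 3), activation='relu'))
```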

```python
model.summary()
```

```
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_4 (Conv2D)            (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 13, 13, 32)        0         
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 11, 11, 64)        18496     
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 3, 3, 63)          36351     
Total params: 55,167
Trainable params: 55,167
Non-trainable params: 0
_________________________________________________________________
```
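The shapes and parameter counts in this summary can be recomputed by hand. Here is a small pure-Python sketch, assuming 'valid' padding and stride 1 for `Conv2D` and a pooling stride equal to the pool size (the defaults in the code above); `conv_stack_summary` is hypothetical, not part of Keras.

```python
def conv_stack_summary(input_shape, layers):
    """Recompute 'Output Shape' and 'Param #' for a Conv2D/MaxPooling2D stack.

    Assumes 'valid' padding and stride 1 for Conv2D, and a pooling stride equal
    to the pool size for MaxPooling2D.
    `layers` holds ('conv', filters, kernel_size) or ('pool', pool_size) tuples.
    """
    h, w, c = input_shape
    rows = []
    for layer in layers:
        if layer[0] == 'conv':
            _, filters, k = layer
            # k*k*c weights per filter, plus one bias per filter.
            params = k * k * c * filters + filters
            # Without padding, a k x k window fits (size - k + 1) times per axis.
            h, w, c = h - k + 1, w - k + 1, filters
        else:
            _, size = layer
            params = 0  # pooling has no trainable weights
            h, w = h // size, w // size
        rows.append(((h, w, c), params))
    return rows

rows = conv_stack_summary(
    (28, 28, 1),
    [('conv', 32, 3), ('pool', 2), ('conv', 64, 3), ('pool', 2), ('conv', 63, 3)],
)
for shape, params in rows:
    print(shape, params)
print('total', sum(p for _, p in rows))
# (26, 26, 32) 320
# (13, 13, 32) 0
# (11, 11, 64) 18496
# (5, 5, 64) 0
# (3, 3, 63) 36351
# total 55167
```

The printed rows match the Output Shape and Param # columns above, including the total of 55,167.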


The next step is to feed the last output tensor of the network above, which has shape (3, 3, 63), into a densely connected classifier: a stack of `Dense` layers. These classifiers process 1D vectors, whereas the current output is a 3D tensor, so we first flatten the 3D output to 1D and then add the `Dense` layers on top.

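```python
# Flatten the (3, 3, 63) output into a 1D vector, then classify with Dense layers.
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))  # one probability per digit class
```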

Because we are working with the MNIST dataset, this is a 10-way classification, so the last layer has 10 outputs with a softmax activation. From the model summary we can see that the (3, 3, 63) outputs are flattened into vectors of shape (567,), since 3 * 3 * 63 = 567.

```python
model.summary()
```

```
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_4 (Conv2D)            (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 13, 13, 32)        0         
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 11, 11, 64)        18496     
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 3, 3, 63)          36351     
_________________________________________________________________
flatten_4 (Flatten)          (None, 567)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 64)                36352     
```
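What `Flatten` and the two `Dense` layers compute can be sketched in NumPy with random weights. This only illustrates the shapes and the math (flatten, ReLU, then softmax); it is not how Keras initializes or stores its weights, and the batch size of 2 and the weight scale are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
conv_out = rng.random((2, 3, 3, 63))  # last conv output for a batch of 2 images

# Flatten: keep the batch axis and merge the rest -> (2, 567).
flat = conv_out.reshape(conv_out.shape[0], -1)

# Dense(64, relu): a 567x64 weight matrix plus 64 biases.
w1, b1 = rng.standard_normal((567, 64)) * 0.01, np.zeros(64)
hidden = np.maximum(flat @ w1 + b1, 0.0)

# Dense(10, softmax): one probability per digit class.
w2, b2 = rng.standard_normal((64, 10)) * 0.01, np.zeros(10)
logits = hidden @ w2 + b2
probs = np.exp(logits - logits.max(axis=1, keepdims=True))  # subtract max for stability
probs /= probs.sum(axis=1, keepdims=True)

print(flat.shape, probs.shape)              # (2, 567) (2, 10)
print(w1.size + b1.size)                    # 36352, the dense_1 Param # above
print(np.allclose(probs.sum(axis=1), 1.0))  # True
```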

# Data Prep

```python
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
```

```python
# Add the channel axis and scale pixel values from [0, 255] to [0, 1].
train_images = train_images.reshape((60000, 28, 28, 1))
train_images = train_images.astype('float32') / 255
test_images = test_images.reshape((10000, 28, 28, 1))
test_images = test_images.astype('float32') / 255

# One-hot encode the labels.
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
```
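`to_categorical` turns each integer label into a one-hot vector. Below is a NumPy sketch of the same idea, using a hypothetical `one_hot` helper and made-up labels; it is meant to show the encoding, not to replace the Keras utility.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Map integer labels to one-hot rows: label k becomes a row with a 1 at index k."""
    # Row k of the identity matrix is exactly the one-hot vector for class k.
    return np.eye(num_classes, dtype=np.float32)[labels]

labels = np.array([5, 0, 4])            # hypothetical digit labels
encoded = one_hot(labels)
print(encoded.shape)                    # (3, 10)
print(encoded[0].tolist())              # [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
print(encoded.argmax(axis=1).tolist())  # [5, 0, 4]  (argmax undoes the encoding)
```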

# Train the Convnet

```python
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=5, batch_size=64)
```

```
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
```

```python
test_loss, test_acc = model.evaluate(test_images, test_labels)
```



We can see that the model reaches a test accuracy of 98.85%.

```python
test_acc
```

```
0.9885
```
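The accuracy reported here is, roughly, the fraction of test images whose highest-probability class matches the true class. Below is a minimal NumPy sketch of that computation with hypothetical two-class softmax outputs; `accuracy` is an illustrative helper, not a Keras function.

```python
import numpy as np

def accuracy(probs, one_hot_labels):
    """Fraction of samples whose highest-probability class matches the true class."""
    return float(np.mean(probs.argmax(axis=1) == one_hot_labels.argmax(axis=1)))

# Hypothetical softmax outputs for 4 images and their one-hot labels.
probs = np.array([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]])
labels = np.array([[0, 1], [1, 0], [1, 0], [1, 0]])
print(accuracy(probs, labels))  # 0.75  (3 of the 4 predictions match)
```

Applied to the trained model, `accuracy(model.predict(test_images), test_labels)` should agree with `test_acc` up to floating-point rounding.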