In [1]:
from tensorflow.keras import layers
from tensorflow.keras import models

In [11]:
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

In [12]:
model = models.Sequential()
model.add(layers.Conv2D(32,(3,3),activation='relu',input_shape=(28,28,1)))
model.add(layers.MaxPooling2D(2,2))
model.add(layers.Conv2D(64,(3,3),activation='relu'))
model.add(layers.MaxPooling2D(2,2))
model.add(layers.Conv2D(64,(3,3),activation='relu'))

The input is a tensor of shape (image_height, image_width, image_channels). For MNIST this is (28, 28, 1).

The three convolutional layers use 32, 64, and 64 filters.

The 3D output tensors must first be flattened into 1D vectors before they can be fed into the dense layers.

In [13]:
model.add(layers.Flatten())
model.add(layers.Dense(64,activation='relu'))
model.add(layers.Dense(10,activation='softmax'))

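The last convolutional layer outputs a (3, 3, 64) feature map, so Flatten feeds 3*3*64 = 576 values into the first dense layer. The output layer has 10 neurons, one per digit class, and uses softmax so that the outputs form a probability distribution over the classes.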
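The 576 figure can be checked by following the spatial size through each layer. The sketch below is plain Python and assumes no padding, stride 1 for the convolutions, and 2 × 2 windows with stride 2 for pooling, which matches the layers defined above. The helper names (conv_out, pool_out, trace_shapes) are made up for illustration and are not part of Keras.

```python
def conv_out(size, kernel=3, stride=1):
    """Output length along one spatial axis for a convolution with no padding."""
    return (size - kernel) // stride + 1


def pool_out(size, window=2, stride=2):
    """Output length along one spatial axis for max pooling with no padding."""
    return (size - window) // stride + 1


def trace_shapes(height=28, width=28):
    """Follow the (height, width, channels) shape through the conv/pool stack."""
    layers_spec = [("conv", 32), ("pool", 32), ("conv", 64), ("pool", 64), ("conv", 64)]
    shapes = []
    h, w = height, width
    for kind, channels in layers_spec:
        step = conv_out if kind == "conv" else pool_out
        h, w = step(h), step(w)
        shapes.append((h, w, channels))
    return shapes


shapes = trace_shapes()
last_h, last_w, last_c = shapes[-1]
flattened = last_h * last_w * last_c

print(shapes)     # [(26, 26, 32), (13, 13, 32), (11, 11, 64), (5, 5, 64), (3, 3, 64)]
print(flattened)  # 576
```

Note that pooling an 11 × 11 map with stride 2 gives 5 × 5, because the last row and column do not fill a complete window and are dropped.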

The first convolutional layer takes a feature map of size (28,28,1) and outputs a feature map of size (26,26,32): it computes 32 filters over its input. Each of these 32 output channels contains a 26*26 grid of values, which is the response map of one filter over the input, indicating how strongly that filter's pattern responds at different locations of the input. This is what the term feature map means.

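A convolution is defined by two key parameters. The first is the size of the patches extracted from the inputs, typically (3 * 3) or (5 * 5). Here it is (3 * 3).

The second is the depth of the output feature map, that is, the number of filters computed by the convolution. Here it starts at 32 and ends at 64.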

Conv2D(output_depth, (window_height, window_width))
A convolution works by sliding these windows of size 3 × 3 or 5 × 5 over the 3D input feature map, stopping at every possible location, and extracting the 3D patch of surrounding features (shape (window_height, window_width, input_depth)). Each such 3D patch is then transformed (via a tensor product with the same learned weight matrix, called the convolution kernel) into a 1D vector of shape (output_depth,). All of these vectors are then spatially reassembled into a 3D output map of shape (height, width, output_depth). Every spatial location in the output feature map corresponds to the same location in the input feature map (for example, the lower-right corner of the output contains information about the lower-right corner of the input). For instance, with 3 × 3 windows and no padding, as in this model, the vector output[i, j, :] comes from the 3D patch input[i:i+3, j:j+3, :].
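The sliding-window description above can be written out as a naive NumPy loop. This is only a sketch for illustration: it assumes no padding and stride 1, leaves out the bias and the relu activation, and is far slower than what Keras actually runs. The function name conv2d_valid and the random test data are assumptions, not taken from the notebook.

```python
import numpy as np


def conv2d_valid(inputs, kernel):
    """Naive 2D convolution with no padding and stride 1.

    inputs: array of shape (height, width, input_depth)
    kernel: array of shape (window_height, window_width, input_depth, output_depth)
    """
    height, width, _ = inputs.shape
    kh, kw, _, output_depth = kernel.shape
    out = np.zeros((height - kh + 1, width - kw + 1, output_depth))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # 3D patch of shape (window_height, window_width, input_depth)
            patch = inputs[i:i + kh, j:j + kw, :]
            # Contract the patch with the kernel over all three patch axes,
            # leaving a 1D vector of shape (output_depth,)
            out[i, j, :] = np.tensordot(patch, kernel, axes=3)
    return out


rng = np.random.default_rng(0)
image = rng.random((28, 28, 1))       # one MNIST-sized input
kernel = rng.random((3, 3, 1, 32))    # 32 filters of size 3 x 3
feature_map = conv2d_valid(image, kernel)
print(feature_map.shape)  # (26, 26, 32)
```

The same kernel is reused at every location, which is why the parameter count of a convolutional layer does not depend on the image size.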

Before the first MaxPooling2D layer the feature map is 26 × 26; the max-pooling operation halves it to 13 × 13. Max pooling downsamples feature maps.

Max pooling consists of extracting windows from the input feature maps and outputting the max value of each channel within the window. It is usually done with 2 × 2 windows and stride 2, in order to downsample the feature maps by a factor of 2, whereas convolution is typically done with 3 × 3 windows and stride 1 (no striding).
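As a rough illustration of the windowing described above, here is a small NumPy sketch of 2 × 2 max pooling with stride 2. The function name max_pool2d and the toy 4 × 4 input are assumptions made for the example; Keras's MaxPooling2D is the real implementation.

```python
import numpy as np


def max_pool2d(inputs, window=2, stride=2):
    """Max pooling over an array of shape (height, width, channels), no padding."""
    height, width, channels = inputs.shape
    out_h = (height - window) // stride + 1
    out_w = (width - window) // stride + 1
    out = np.empty((out_h, out_w, channels), dtype=inputs.dtype)
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            # Take the max over the window's rows and columns, separately per channel
            out[i, j, :] = inputs[r:r + window, c:c + window, :].max(axis=(0, 1))
    return out


toy_map = np.arange(16, dtype="float32").reshape(4, 4, 1)
pooled = max_pool2d(toy_map)
print(pooled[:, :, 0])
# [[ 5.  7.]
#  [13. 15.]]
```

Each output value is the largest entry of one non-overlapping 2 × 2 block, so both spatial dimensions are halved while the channel count is unchanged.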

In [14]:
model.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_3 (Conv2D)           (None, 26, 26, 32)        320       
                                                                 
 max_pooling2d_2 (MaxPooling  (None, 13, 13, 32)       0         
 2D)                                                             
                                                                 
 conv2d_4 (Conv2D)           (None, 11, 11, 64)        18496     
                                                                 
 max_pooling2d_3 (MaxPooling  (None, 5, 5, 64)         0         
 2D)                                                             
                                                                 
 conv2d_5 (Conv2D)           (None, 3, 3, 64)          36928     
                                                                 
 flatten_1 (Flatten)         (None, 576)              
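
The Param # values for the three convolutional layers in the summary above can be reproduced by hand. The sketch below assumes the standard Conv2D layout of one weight per (kernel row, kernel column, input channel, filter) plus one bias per filter; the helper name conv2d_params is hypothetical. Pooling layers have no trainable parameters, which is why they show 0.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Trainable parameters of a Conv2D layer: kernel weights plus one bias per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


conv_counts = [
    conv2d_params(3, 3, 1, 32),   # first Conv2D: 1 input channel, 32 filters
    conv2d_params(3, 3, 32, 64),  # second Conv2D: 32 input channels, 64 filters
    conv2d_params(3, 3, 64, 64),  # third Conv2D: 64 input channels, 64 filters
]
print(conv_counts)  # [320, 18496, 36928]
```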

Reshaping and Scaling

In [15]:
train_images = train_images.reshape(60000,28,28,1)
train_images = train_images.astype('float32')/255

In [16]:
test_images = test_images.reshape(10000,28,28,1)
test_images = test_images.astype('float32')/255

In [17]:
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
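
To make the label step concrete, here is a minimal NumPy sketch of the one-hot encoding that to_categorical performs on integer class labels. The function name one_hot and the three sample labels are assumptions for illustration; the notebook itself relies on the Keras utility.

```python
import numpy as np


def one_hot(labels, num_classes=10):
    """Turn integer class labels into rows with a single 1.0 at the label's index."""
    labels = np.asarray(labels, dtype="int64")
    encoded = np.zeros((labels.shape[0], num_classes), dtype="float32")
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded


example = one_hot([5, 0, 4])
print(example.shape)  # (3, 10)
print(example[0])     # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
```

This layout is what categorical_crossentropy expects: one row per sample, one column per class, matching the 10 softmax outputs of the model.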

In [18]:
model.compile(optimizer='rmsprop',loss='categorical_crossentropy', metrics=['accuracy'])

In [19]:
model.fit(train_images,train_labels,epochs=5,batch_size=64)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7fc61c5eb090>

Evaluate the model on the test data

In [20]:
test_loss, test_accuracy = model.evaluate(test_images, test_labels)
test_accuracy



0.9890999794006348