In [8]:
from keras import layers
from keras import models
from keras.datasets import mnist
from keras.utils import to_categorical

## MNIST example using a convnet instead of a densely connected network

In [5]:
model = models.Sequential()
# A convnet takes input tensors of shape (image_height, image_width, image_channels), not including the batch dimension
model.add(layers.Conv2D(32, (3,3), activation='relu', input_shape=(28,28,1)))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Conv2D(64, (3,3), activation='relu'))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Conv2D(64, (3,3), activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))

In [6]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_7 (Conv2D)            (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 13, 13, 32)        0         
_________________________________________________________________
conv2d_8 (Conv2D)            (None, 11, 11, 64)        18496     
_________________________________________________________________
max_pooling2d_6 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
conv2d_9 (Conv2D)            (None, 3, 3, 64)          36928     
_________________________________________________________________
flatten_1 (Flatten)          (None, 576)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 64)                36928     
__________
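
The Param # column above can be checked by hand. A Conv2D layer has one weight per kernel cell per input channel plus one bias, for each filter; a Dense layer has one weight per input plus one bias, for each unit. A minimal sketch of that arithmetic (the helper names `conv2d_params` and `dense_params` are made up for this note, they are not Keras functions):

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # Weights per filter: kernel_h * kernel_w * in_channels, plus 1 bias.
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(inputs, units):
    # Weights per unit: one per input, plus 1 bias.
    return (inputs + 1) * units


print(conv2d_params(3, 3, 1, 32))    # 320   (first Conv2D)
print(conv2d_params(3, 3, 32, 64))   # 18496 (second Conv2D)
print(conv2d_params(3, 3, 64, 64))   # 36928 (third Conv2D)
print(dense_params(576, 64))         # 36928 (Dense after Flatten)
```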

In [9]:
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Preprocessing: add a channel axis, scale pixels to [0, 1], one-hot encode labels
train_images = train_images.reshape((60000, 28, 28, 1))
train_images = train_images.astype('float32') / 255
test_images = test_images.reshape((10000, 28, 28, 1))
test_images = test_images.astype('float32') / 255
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

In [10]:
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])

In [11]:
model.fit(train_images, train_labels, epochs=5, batch_size=64)

Instructions for updating:
Use tf.cast instead.
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f052a795be0>

## 5.1.1 The convolution operation
Dense layers learn global patterns in their input feature space (for MNIST, patterns involving all pixels).<br>
Convolution layers learn local patterns (for images, patterns found in small 2D windows of the input).

Properties of convnets:
 - The patterns they learn are <i>translation invariant</i>: after learning a pattern in the bottom-right corner of an image, a convnet can recognize it anywhere else, such as the top-left corner => efficient for processing images.
 - They learn spatial hierarchies of patterns: the first convolution layer learns small local patterns, the second learns larger patterns made of the features of the first layer, and so on.
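
The first property can be illustrated with plain NumPy: because the same kernel is applied at every position, it responds equally strongly to a pattern wherever the pattern sits, and only the location of the peak changes. This is a rough sketch; the plus-shaped `pattern`, the 8x8 image size, and the helper names are assumptions made up for the demo.

```python
import numpy as np


def correlate2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the response map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


# Hypothetical 3x3 pattern; it is also used as the detector kernel.
pattern = np.array([[0., 1., 0.],
                    [1., 1., 1.],
                    [0., 1., 0.]])


def image_with_pattern(row, col, size=8):
    """Blank image with the pattern pasted at (row, col)."""
    image = np.zeros((size, size))
    image[row:row + 3, col:col + 3] = pattern
    return image


top_left = correlate2d_valid(image_with_pattern(0, 0), pattern)
bottom_right = correlate2d_valid(image_with_pattern(5, 5), pattern)

# Same peak value in both cases; only the peak position differs.
print(top_left.max(), np.unravel_index(top_left.argmax(), top_left.shape))
print(bottom_right.max(), np.unravel_index(bottom_right.argmax(), bottom_right.shape))
```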

Operation:
 - start from an input of shape (input_height, input_width, input_channels)
 - choose the number of filters (the first argument of layers.Conv2D(...)); the first layer of the MNIST model uses 32
 - choose the window (kernel) size; the MNIST model uses 3x3 windows<br>
 
The convolution slides the window over the input, extracts a patch at each position, takes the dot product of each patch with the kernel of every filter, and assembles the results into the output.<br>
The output has shape (output_height, output_width, number_of_filters).
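
The sliding-window operation described above can be sketched naively in NumPy. This is not how Keras implements Conv2D; it is a slow loop over positions with no padding and stride 1, and it leaves out the relu activation. The function name `conv2d_valid` and the random inputs are assumptions for the demo.

```python
import numpy as np


def conv2d_valid(inputs, kernels, bias):
    """Naive convolution: no padding, stride 1.

    inputs:  (height, width, in_channels)
    kernels: (kernel_h, kernel_w, in_channels, filters)
    bias:    (filters,)
    """
    kh, kw, _, filters = kernels.shape
    out_h = inputs.shape[0] - kh + 1
    out_w = inputs.shape[1] - kw + 1
    output = np.zeros((out_h, out_w, filters))
    for i in range(out_h):
        for j in range(out_w):
            patch = inputs[i:i + kh, j:j + kw, :]  # (kh, kw, in_channels)
            # Dot product of the patch with every filter at once -> (filters,)
            output[i, j, :] = np.tensordot(patch, kernels, axes=3) + bias
    return output


rng = np.random.default_rng(0)
image = rng.random((28, 28, 1))            # one MNIST-sized image with random values
kernels = rng.normal(size=(3, 3, 1, 32))   # 32 filters, each 3x3 over 1 channel
bias = np.zeros(32)

feature_map = conv2d_valid(image, kernels, bias)
print(feature_map.shape)  # (26, 26, 32)
```

The resulting shape matches the first row of model.summary(): a 28x28 input with 3x3 windows leaves 26x26 positions, one output channel per filter.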

## Understanding border effect and padding
With a 5x5 input and a 3x3 window, the output is 3x3, because there are only 9 positions where a 3x3 window fits inside the input.<br>
=> To get a 5x5 output, use padding: add a border of one row/column on each side so the input becomes 7x7; the output is then 5x5.<br>
Conv2D layers have a <i>padding</i> argument, which takes one of two values:
 - "valid": the default, no padding
 - "same": pad so that the output has the same width and height as the input (with stride 1)
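
The 5x5 example can be reproduced with `np.pad`. This is a sketch of the size arithmetic only, assuming stride 1 and zero padding; `conv_output_size` is a helper made up for this note.

```python
import numpy as np


def conv_output_size(input_size, kernel_size, padding="valid"):
    """Output length along one spatial axis, assuming stride 1."""
    if padding == "same":
        return input_size
    return input_size - kernel_size + 1


image = np.arange(25, dtype="float32").reshape(5, 5)
# For a 3x3 window, "same" padding adds (3 - 1) // 2 = 1 row/column of zeros on each side.
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)

print(padded.shape)                                   # (7, 7)
print(conv_output_size(5, 3, "valid"))                # 3
print(conv_output_size(padded.shape[0], 3, "valid"))  # 5
print(conv_output_size(5, 3, "same"))                 # 5
```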

## Understanding convolution strides
Conv2D has a "strides" parameter: the distance between two successive window positions; the default is 1.<br>
=> A stride greater than 1 downsamples the feature map.<br>
Example (3x3 window, no padding):
 - 5x5 input, strides = 1 (default) => 9 windows, 3x3 output
 - 5x5 input, strides = 2 => 4 windows, 2x2 output
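
The window counts in the example follow from the usual output-size formula for an unpadded convolution, floor((input - kernel) / stride) + 1 along each axis. A small sketch (`strided_output_size` is a hypothetical helper, not a Keras function):

```python
def strided_output_size(input_size, kernel_size, stride=1):
    """Output length along one axis with no padding."""
    return (input_size - kernel_size) // stride + 1


for stride in (1, 2):
    side = strided_output_size(5, 3, stride)
    print(f"stride={stride}: {side}x{side} output, {side * side} windows")
# stride=1: 3x3 output, 9 windows
# stride=2: 2x2 output, 4 windows
```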

## 5.1.2 The max-pooling operation
Passing a feature map through a MaxPooling2D((2,2)) layer halves its width and height.<br>
Example: 26x26 + MaxPooling2D => 13x13.<br>
How:
 - slide a 2x2 window with stride = 2
 - for each window, output the maximum value in each channel instead of applying a learned convolution kernel
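
A minimal NumPy sketch of 2x2 max pooling with stride 2, assuming a single (height, width, channels) feature map without a batch axis. The function name `max_pool_2x2` is made up for this note; an odd trailing row or column is dropped, which matches the 11x11 => 5x5 step in the model summary.

```python
import numpy as np


def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a (height, width, channels) array."""
    h, w, c = feature_map.shape
    h2, w2 = h // 2, w // 2
    # Drop a trailing odd row/column, then group pixels into 2x2 blocks.
    blocks = feature_map[:h2 * 2, :w2 * 2, :].reshape(h2, 2, w2, 2, c)
    # Take the maximum over the two axes that index positions inside each block.
    return blocks.max(axis=(1, 3))


x = np.arange(16, dtype="float32").reshape(4, 4, 1)
print(max_pool_2x2(x)[:, :, 0])
# [[ 5.  7.]
#  [13. 15.]]
print(max_pool_2x2(np.zeros((26, 26, 32))).shape)  # (13, 13, 32)
print(max_pool_2x2(np.zeros((11, 11, 64))).shape)  # (5, 5, 64)
```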