# Basics of convolutional neural networks

<strong>Abstract</strong>  
In this notebook, a convolutional neural network is built using [Keras](https://keras.io), and the following basics are described:
- Number of parameters.
- Sizes of inputs and outputs.
- [Convolution](https://keras.io/layers/convolutional/)
- [Max pooling](https://keras.io/layers/pooling/)
- Padding

<strong>Reference</strong>  
See pages 120-129 of "<strong>Deep Learning with Python</strong>" by Francois Chollet (2018). 

<strong>Summary</strong>  
- Convolution layers learn local patterns, whereas dense layers learn global patterns (pages 122-123; see also Fig. 5.2).
- The patterns that convolutional neural networks learn are translation invariant (page 123).
- Convolution is typically done with 3 x 3 or 5 x 5 windows (page 124) and stride 1, i.e., without striding (page 128).
- Max pooling is used to downsample feature maps (page 127), which reduces the number of feature-map coefficients to process (page 128). It is usually done with 2 x 2 windows and stride 2, which downsamples the feature maps by a factor of 2 (pages 127-128).
- Even though there are other ways to achieve downsampling, max pooling tends to work better (page 129).

### Number of parameters

<strong>model.add(layers.Conv2D(filters=32,kernel_size=(3,3), strides= 1, activation='relu', input_shape=(28,28,1)))</strong>   
- filters=32: The number of output filters in the convolution. In other words, it is the depth of the output feature map. This layer computes 32 filters over its input. 
- kernel_size=(3,3): Size of a filter (i.e., the height and width of the 2D convolution window). This is typically 3 x 3 or 5 x 5. 
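- strides=1: Stride of the convolution, i.e., the step (in pixels) between successive positions of the window.
- input_shape=(28,28,1): Height, width, and depth (number of channels) of the input image.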
- More information: [keras documentation](https://keras.io/layers/convolutional/)    
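To make the Conv2D arguments above more concrete, here is a naive NumPy sketch of what one such layer computes on a single image with padding='valid'. It is an illustrative loop, not how Keras implements convolution, and the helper name `conv2d_valid` is hypothetical. As in most deep-learning libraries, the kernel is not flipped (strictly speaking, this is a cross-correlation). ReLU is applied afterwards with `np.maximum` to mimic `activation='relu'`.

```python
import numpy as np

def conv2d_valid(image, kernel, bias, stride=1):
    """Naive 2D convolution (cross-correlation) with 'valid' padding (hypothetical helper).

    image:  array of shape (height, width, in_channels)
    kernel: array of shape (kh, kw, in_channels, filters)
    bias:   array of shape (filters,)
    Returns an array of shape (out_h, out_w, filters).
    """
    h, w, _ = image.shape
    kh, kw, _, filters = kernel.shape
    out_h = (h - kh) // stride + 1
    out_w = (w - kw) // stride + 1
    out = np.zeros((out_h, out_w, filters))
    for i in range(out_h):
        for j in range(out_w):
            # Local window (patch) seen by every filter at this output position
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw, :]
            # Multiply the patch with each filter and sum over height, width and channels
            out[i, j, :] = np.tensordot(patch, kernel, axes=([0, 1, 2], [0, 1, 2])) + bias
    return out

rng = np.random.default_rng(0)
image = rng.random((28, 28, 1))              # one grayscale 28 x 28 image
kernel = rng.standard_normal((3, 3, 1, 32))  # kernel_size=(3,3), 1 input channel, filters=32
bias = np.zeros(32)                          # one bias per filter

features = np.maximum(conv2d_valid(image, kernel, bias), 0.0)  # activation='relu'
print(features.shape)           # (26, 26, 32)
print(kernel.size + bias.size)  # 320
```

Each output position only sees a 3 x 3 patch of the input, and the same kernel is reused at every position. As a result, the parameter count (320) does not depend on the image size.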
         
<strong>model.add(layers.MaxPooling2D(pool_size=(2,2),strides=2))</strong>  
- pool_size=(2,2): Size of the max pooling windows
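- strides=2: Step between successive pooling windows. With 2 x 2 windows, this downscales the feature map by a factor of 2. If omitted, it defaults to pool_size.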
- More information: [keras documentation](https://keras.io/layers/pooling/)
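Max pooling with 2 x 2 windows and stride 2 can also be sketched in NumPy. The helper `max_pool2d` below is hypothetical and only covers 'valid' padding. It shows why the layer has no trainable parameters, since it only takes a maximum. It also reproduces the 26 to 13 and 11 to 5 downsampling seen in the model summaries of this notebook.

```python
import numpy as np

def max_pool2d(feature_map, pool_size=2, stride=2):
    """Naive max pooling over a (height, width, channels) feature map ('valid' padding)."""
    h, w, c = feature_map.shape
    out_h = (h - pool_size) // stride + 1
    out_w = (w - pool_size) // stride + 1
    out = np.zeros((out_h, out_w, c), dtype=feature_map.dtype)
    for i in range(out_h):
        for j in range(out_w):
            window = feature_map[i * stride:i * stride + pool_size,
                                 j * stride:j * stride + pool_size, :]
            # Keep the largest value in each window, separately for every channel
            out[i, j, :] = window.max(axis=(0, 1))
    return out

x = np.arange(16).reshape(4, 4, 1)  # a 4 x 4 single-channel feature map
print(max_pool2d(x)[:, :, 0])
# [[ 5  7]
#  [13 15]]
print(max_pool2d(np.zeros((26, 26, 32))).shape)  # (13, 13, 32)
print(max_pool2d(np.zeros((11, 11, 64))).shape)  # (5, 5, 64)
```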

```python
from keras import layers
from keras import models

model = models.Sequential()
model.add(layers.Conv2D(filters=32, kernel_size=(3, 3), strides=1, activation='relu', input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D(pool_size=(2, 2), strides=2))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))

model.summary()
```

```
Model: "sequential_29"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
conv2d_83 (Conv2D)           (None, 26, 26, 32)        320
_________________________________________________________________
max_pooling2d_56 (MaxPooling (None, 13, 13, 32)        0
_________________________________________________________________
conv2d_84 (Conv2D)           (None, 11, 11, 64)        18496
_________________________________________________________________
max_pooling2d_57 (MaxPooling (None, 5, 5, 64)          0
_________________________________________________________________
conv2d_85 (Conv2D)           (None, 3, 3, 64)          36928
_________________________________________________________________
flatten_27 (Flatten)         (None, 576)               0
_________________________________________________________________
dense_53 (Dense)             (None, 64)
```

<strong>model.summary()</strong> shows the structure of a neural network. The number of parameters (Param #) is shown in the table above and calculated in the next cell. Note that the max pooling (and flatten) layers have no parameters.
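The Param # values in the table follow two formulas, which can be wrapped in small helpers. The sketch below rebuilds the Param # column for this model, including the dense layers that the truncated summary does not show, and adds up a total. The helper names `conv2d_params` and `dense_params` are hypothetical, and the total is our own arithmetic from these formulas, not a value printed in this notebook.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # One weight per kernel position, input channel and filter, plus one bias per filter
    return kernel_h * kernel_w * in_channels * filters + filters

def dense_params(in_units, out_units):
    # Weight matrix (in_units x out_units) plus one bias per output unit
    return in_units * out_units + out_units

layer_params = {
    "conv2d (1)": conv2d_params(3, 3, 1, 32),
    "max_pooling2d (1)": 0,  # pooling has no weights
    "conv2d (2)": conv2d_params(3, 3, 32, 64),
    "max_pooling2d (2)": 0,
    "conv2d (3)": conv2d_params(3, 3, 64, 64),
    "flatten": 0,            # flatten only reshapes
    "dense (1)": dense_params(3 * 3 * 64, 64),  # 576 inputs from Flatten
    "dense (2)": dense_params(64, 10),
}
for name, n in layer_params.items():
    print(f"{name}: {n}")
print("total:", sum(layer_params.values()))  # 93322
```

If the Keras model defined above is still in scope, `model.count_params()` should report the same total.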

```python
print("Number of parameters")
# Conv2D: (filter width) x (filter height) x (input channels) x (filters) + (biases, one per filter)
print("conv2d (1):", 3*3*1*32+32)
print("conv2d (2):", 3*3*32*64+64)
print("conv2d (3):", 3*3*64*64+64)
# Dense: (input units) x (output units) + (biases, one per output unit); 576 = 3 x 3 x 64 from Flatten
print("dense (1):", 576*64+64)
print("dense (2):", 64*10+10)
```

```
Number of parameters
conv2d (1): 320
conv2d (2): 18496
conv2d (3): 36928
dense (1): 36928
dense (2): 650
```


Let us calculate the number of parameters for another neural network to get more familiar with how they are counted.

```python
model = models.Sequential()
model.add(layers.Conv2D(128, (5, 5), activation='relu', input_shape=(28, 28, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))

model.summary()
```

```
Model: "sequential_9"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
conv2d_25 (Conv2D)           (None, 24, 24, 128)       9728
_________________________________________________________________
max_pooling2d_17 (MaxPooling (None, 12, 12, 128)       0
_________________________________________________________________
conv2d_26 (Conv2D)           (None, 10, 10, 128)       147584
_________________________________________________________________
max_pooling2d_18 (MaxPooling (None, 5, 5, 128)         0
_________________________________________________________________
conv2d_27 (Conv2D)           (None, 3, 3, 64)          73792
_________________________________________________________________
flatten_9 (Flatten)          (None, 576)               0
_________________________________________________________________
dense_17 (Dense)             (None, 64)
```

```python
print("Number of parameters")
# Conv2D: (filter width) x (filter height) x (input channels) x (filters) + (biases, one per filter)
print("conv2d (1):", 5*5*3*128+128)  # 3 input channels (RGB)
print("conv2d (2):", 3*3*128*128+128)
print("conv2d (3):", 3*3*128*64+64)
# Dense: (input units) x (output units) + (biases, one per output unit); 576 = 3 x 3 x 64 from Flatten
print("dense (1):", 576*64+64)
print("dense (2):", 64*10+10)
```

```
Number of parameters
conv2d (1): 9728
conv2d (2): 147584
conv2d (3): 73792
dense (1): 36928
dense (2): 650
```


### Dependence of the output shape on the stride and padding

Let us check how stride and padding affect the output shape. For background on the explanations below, see Figures 5.5 and 5.6 on page 126.
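The four cases can be predicted with the standard output-size formula for one spatial axis. Here n is the input size, k is the window size and s is the stride. With padding='valid' the output size is floor((n - k) / s) + 1, and with padding='same' it is ceil(n / s). The sketch assumes the TensorFlow-backed Keras conventions, and the function name `conv_output_size` is hypothetical. The same formula applies to pooling windows.

```python
import math

def conv_output_size(n, kernel_size, stride=1, padding="valid"):
    """Output length along one spatial axis for a sliding window (hypothetical helper)."""
    if padding == "valid":
        # Only positions where the whole window fits inside the input
        return (n - kernel_size) // stride + 1
    if padding == "same":
        # The input is zero-padded so that every stride step yields an output
        return math.ceil(n / stride)
    raise ValueError(f"unknown padding: {padding}")

print(conv_output_size(28, 3))                  # 26  (case 1)
print(conv_output_size(28, 3, stride=2))        # 13  (case 2)
print(conv_output_size(28, 3, padding="same"))  # 28  (case 3)
print(conv_output_size(28, 5))                  # 24  (case 4)
```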

1. With padding='valid' and stride 1, a (3,3) convolution window fits into 28 - 3 + 1 = 26 positions along each axis. The height and width are therefore each reduced by 2, and the output shape is (28-2, 28-2, 32) = (26, 26, 32).

```python
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), strides=1, padding='valid', activation='relu', input_shape=(28, 28, 1)))
model.summary()
```

```
Model: "sequential_41"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
conv2d_97 (Conv2D)           (None, 26, 26, 32)        320
Total params: 320
Trainable params: 320
Non-trainable params: 0
_________________________________________________________________
```


2. With strides=2, the window moves two pixels at a time, so the output height and width are roughly half those of the input: floor((28 - 3) / 2) + 1 = 13.

```python
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), strides=2, padding='valid', activation='relu', input_shape=(28, 28, 1)))
model.summary()
```

```
Model: "sequential_39"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
conv2d_95 (Conv2D)           (None, 13, 13, 32)        320
Total params: 320
Trainable params: 320
Non-trainable params: 0
_________________________________________________________________
```


3. With padding='same', the input is zero-padded so that, with stride 1, the output height and width are the same as those of the input.

```python
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), strides=1, padding='same', activation='relu', input_shape=(28, 28, 1)))
model.summary()
```

```
Model: "sequential_42"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
conv2d_98 (Conv2D)           (None, 28, 28, 32)        320
Total params: 320
Trainable params: 320
Non-trainable params: 0
_________________________________________________________________
```


4. With a (5,5) window, padding='valid' and stride 1, the height and width are each reduced by 4, i.e., (28-4, 28-4, 32) = (24, 24, 32). The number of parameters also grows to 5 x 5 x 1 x 32 + 32 = 832.

```python
model = models.Sequential()
model.add(layers.Conv2D(32, (5, 5), strides=1, padding='valid', activation='relu', input_shape=(28, 28, 1)))
model.summary()
```

```
Model: "sequential_44"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
conv2d_100 (Conv2D)          (None, 24, 24, 32)        832
Total params: 832
Trainable params: 832
Non-trainable params: 0
_________________________________________________________________
```
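To see how padding='same' keeps the 28 x 28 size in case 3, the sketch below computes how many zeros are added along one axis. It assumes TensorFlow's convention, which splits the total padding evenly and puts any odd extra zero at the end (bottom/right). The helper `same_padding` is hypothetical, not a Keras function.

```python
import math
import numpy as np

def same_padding(n, kernel_size, stride=1):
    """Zeros added before/after one spatial axis for padding='same' (TensorFlow-style split)."""
    out = math.ceil(n / stride)
    total = max((out - 1) * stride + kernel_size - n, 0)
    before = total // 2
    after = total - before  # an odd extra zero goes at the end
    return before, after

print(same_padding(28, 3))            # (1, 1)
print(same_padding(28, 5))            # (2, 2)
print(same_padding(28, 3, stride=2))  # (0, 1)

# Padding a 28 x 28 input with one zero on each side gives 30 x 30;
# a 3 x 3 'valid' window over that yields 30 - 3 + 1 = 28 positions per axis.
before, after = same_padding(28, 3)
padded = np.pad(np.ones((28, 28, 1)), ((before, after), (before, after), (0, 0)))
print(padded.shape)  # (30, 30, 1)
```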


## Application to MNIST data

Let us load and preprocess the MNIST data.

```python
from keras.datasets import mnist
from keras.utils import to_categorical

# get data
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# reshaping and normalization
train_images = train_images.reshape((60000, 28, 28, 1))  # add a channel axis (MNIST is grayscale, so depth = 1)
train_images = train_images.astype('float32') / 255       # scale pixel values to [0, 1]

test_images = test_images.reshape((10000, 28, 28, 1))    # add a channel axis
test_images = test_images.astype('float32') / 255         # scale pixel values to [0, 1]

# show sizes of data
print("train_images shape: {}".format(train_images.shape))
print("train_labels shape: {}".format(train_labels.shape))
print("test_images shape: {}".format(test_images.shape))
print("test_labels shape: {}".format(test_labels.shape))

# one-hot encoding
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
print("\nAfter implementing one-hot encoding, the sizes of labels are:")
print("train_labels shape: ", train_labels.shape)
print("test_labels shape: ", test_labels.shape)
```

```
Downloading data from https://s3.amazonaws.com/img-datasets/mnist.npz
train_images shape: (60000, 28, 28, 1)
train_labels shape: (60000,)
test_images shape: (10000, 28, 28, 1)
test_labels shape: (10000,)

After implementing one-hot encoding, the sizes of labels are:
train_labels shape:  (60000, 10)
test_labels shape:  (10000, 10)
```
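`to_categorical` turns each integer label into a one-hot row vector, which is why the label shapes change from (60000,) to (60000, 10). A minimal NumPy equivalent is sketched below. It assumes that, when not given, the number of classes is the largest label plus one. `one_hot` is a hypothetical helper, not the Keras implementation.

```python
import numpy as np

def one_hot(labels, num_classes=None):
    """Minimal one-hot encoder: row i has a 1 in column labels[i] and 0 elsewhere."""
    labels = np.asarray(labels, dtype=int)
    if num_classes is None:
        num_classes = labels.max() + 1
    # Selecting rows of the identity matrix gives the one-hot vectors
    return np.eye(num_classes, dtype="float32")[labels]

encoded = one_hot([5, 0, 4], num_classes=10)
print(encoded.shape)  # (3, 10)
print(encoded[0])     # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
```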


A model will be defined and trained in the next cell.

```python
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))

model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
history = model.fit(train_images, train_labels, epochs=5, batch_size=64)
```

```
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
```


The loss and accuracy on the test dataset are computed below.

```python
results = model.evaluate(test_images, test_labels)
results  # [test loss, test accuracy]
```

```
[0.032889669519441486, 0.99]
```
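`model.evaluate` returns the loss followed by the metrics passed to `compile`, so `results` holds the test categorical cross-entropy and the test accuracy. The sketch below spells out both quantities in NumPy on a made-up 3-sample example. The probabilities are invented for illustration and are not MNIST predictions. The helper names are hypothetical, and Keras' own implementation differs in details such as the clipping epsilon and per-batch averaging.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_pred)); probabilities are clipped to avoid log(0)."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

def accuracy(y_true, y_pred):
    """Fraction of samples whose highest-probability class matches the one-hot label."""
    return float(np.mean(np.argmax(y_pred, axis=1) == np.argmax(y_true, axis=1)))

# Toy example: 3 samples, 3 classes (made-up probabilities, each row sums to 1)
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
y_pred = np.array([[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.5, 0.3, 0.2]])

print(round(categorical_crossentropy(y_true, y_pred), 4))  # 0.7298
print(accuracy(y_true, y_pred))  # 0.6666666666666666 (third sample is misclassified)
```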