# A Quick Guide to Deep Learning with Python

Kai Zhang, Duke Kunshan University, 2022

# Lecture 6 Convolutional Neural Network (CNN or ConvNet)

**References**:

[Wikipedia] https://en.wikipedia.org/wiki/Convolution

[Amidi] https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-convolutional-neural-networks

[Fchollet] https://keras.io/examples/vision/mnist_convnet/

[Jordan] https://www.jeremyjordan.me/convnet-architectures/

[Dumoulin] https://arxiv.org/abs/1603.07285

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import math

In [None]:
#X_train = np.rint(train_images.reshape(60000, 28*28).astype(np.float32) / 255)
#X_test = np.rint(test_images.reshape(10000, 28*28).astype(np.float32) / 255)

# Convolutional layer

**receptive field**

**weight sharing**

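Advantages of CNNs

* the patterns they learn are translation-invariant: a pattern learned at one location can be recognized anywhere in the image
* they can learn spatial hierarchies of patterns: early layers detect small local features such as edges, and later layers combine them into larger structures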

<figure>
 <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/2/21/Comparison_convolution_correlation.svg/1280px-Comparison_convolution_correlation.svg.png"  width = "600" >
 <figcaption align="left">Figure 1. Visual comparison of convolution, cross-correlation, and autocorrelation (Wikipedia). 
 </figcaption>
</figure>

**convolution**
\begin{equation}
(f*g)(t) = \int_{-\infty}^{\infty} f(\tau) g(t - \tau) d\tau =\int_{-\infty}^{\infty}  f(t-\tau) g(\tau) d\tau = (g*f)(t)
\end{equation}

**cross-correlation**
\begin{equation}
(f \star g)(t) = \int_{-\infty}^{\infty} f(\tau) g(t + \tau) d\tau =(g \star f)(-t) \ne \int_{-\infty}^{\infty} f(t+\tau) g(\tau) d\tau =(g \star f)(t)
\end{equation}
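For discrete, real-valued signals the same relations can be checked numerically. The sketch below uses NumPy's `np.convolve` and `np.correlate`; the arrays `f` and `g` are made-up examples. Note that `np.correlate(a, v)` slides `v` over `a`, so in the notation above `np.correlate(g, f, "full")` plays the role of $(f \star g)$.

```python
import numpy as np

# Made-up example signals (assumption: real-valued, same length)
f = np.array([1.0, 2.0, 3.0])
g = np.array([0.0, 1.0, 0.5])

# Convolution is commutative: f * g == g * f
conv_fg = np.convolve(f, g, mode="full")
conv_gf = np.convolve(g, f, mode="full")

# Cross-correlation is not commutative in general:
# (f ⋆ g)(t) equals (g ⋆ f)(-t), i.e. the reversed sequence
corr_fg = np.correlate(g, f, mode="full")  # (f ⋆ g)
corr_gf = np.correlate(f, g, mode="full")  # (g ⋆ f)

print("f * g     =", conv_fg)
print("g * f     =", conv_gf)
print("f ⋆ g     =", corr_fg)
print("g ⋆ f     =", corr_gf)

# Cross-correlation is convolution with a flipped second argument
print("f * g[::-1] =", np.convolve(f, g[::-1], mode="full"))
```

Reversing the output of a `"full"` correlation corresponds to replacing $t$ by $-t$, which is why `corr_fg` is `corr_gf` read backwards.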



rank-3 **tensor**

**input layer**: size = [input_height, input_width, input_depth (color channel)]

**(convolution) kernel** (**filter**) (patch):  size = [kernel_size, kernel_size, input_depth, output_depth (filter number)]

**output layer**: size = [output_height, output_width, output_depth]

**feature map**: size = [height, width, 1] (one depth slice of a layer's output, produced by a single filter)

**stride**

**padding**
* same: pad the input with zeros so that, for stride 1, the output feature map has the same height and width as the input feature map
* valid (default): no padding, so kernel windows never extend beyond the boundary of the input feature map
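The tensor shapes, stride, and padding described above can be illustrated with a naive NumPy sketch. The function name `conv2d_naive` and the random inputs are assumptions made for illustration, and the loops are written for clarity rather than speed. Like most deep learning libraries, the sketch slides the kernel without flipping it, so strictly speaking it computes a cross-correlation.

```python
import numpy as np

def conv2d_naive(x, kernel, bias, stride=1, padding_size=0):
    """Naive 2D convolutional layer (no activation).

    x:      [input_height, input_width, input_depth]
    kernel: [kernel_size, kernel_size, input_depth, output_depth]
    bias:   [output_depth]
    """
    k = kernel.shape[0]
    output_depth = kernel.shape[3]
    # Zero-pad height and width only, never the depth axis
    x = np.pad(x, ((padding_size, padding_size),
                   (padding_size, padding_size),
                   (0, 0)))
    out_h = (x.shape[0] - k) // stride + 1
    out_w = (x.shape[1] - k) // stride + 1
    out = np.zeros((out_h, out_w, output_depth))
    for i in range(out_h):
        for j in range(out_w):
            # Receptive field of output position (i, j)
            patch = x[i * stride:i * stride + k, j * stride:j * stride + k, :]
            # Sum over kernel height, kernel width, and input depth;
            # the same weights are reused at every position (weight sharing)
            out[i, j, :] = np.tensordot(patch, kernel, axes=([0, 1, 2], [0, 1, 2])) + bias
    return out

rng = np.random.default_rng(0)
image = rng.random((28, 28, 1))        # one grayscale image
kernel = rng.random((3, 3, 1, 32))     # 32 filters of size 3x3
bias = np.zeros(32)

valid_out = conv2d_naive(image, kernel, bias)                  # "valid" padding
same_out = conv2d_naive(image, kernel, bias, padding_size=1)   # "same" padding for a 3x3 kernel
strided_out = conv2d_naive(image, kernel, bias, stride=2)
print(valid_out.shape, same_out.shape, strided_out.shape)
```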

\begin{equation}
output\_width = \left\lfloor \frac{input\_width + 2\times padding\_size - kernel\_size}{stride} \right\rfloor + 1
\end{equation}
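As a quick check of this formula, the helper below (a hypothetical function, not part of any library) evaluates it for a 28-pixel-wide input. The same expression also describes a pooling layer if the pool size is used as the kernel size.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding_size=0):
    """Output width (or height) of a convolution or pooling window."""
    return (input_size + 2 * padding_size - kernel_size) // stride + 1

# 3x3 kernel, stride 1, "valid" padding
print(conv_output_size(28, 3))                    # 26
# 3x3 kernel, stride 1, one pixel of zero padding ("same")
print(conv_output_size(28, 3, padding_size=1))    # 28
# 2x2 pooling window with stride 2
print(conv_output_size(26, 2, stride=2))          # 13
print(conv_output_size(11, 2, stride=2))          # 5
```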

Number of parameters in each convolutional layer
\begin{equation}
parameter\_number = (kernel\_size\times kernel\_size\times input\_depth + 1) \times output\_depth
\end{equation}
where "$+1$" is for the bias.
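The formula can be evaluated directly. The helper name `conv2d_param_count` is an assumption for illustration; the two calls use the settings of the two `Conv2D` layers in the Keras model for MNIST later in this lecture.

```python
def conv2d_param_count(kernel_size, input_depth, output_depth):
    """Weights plus one bias per filter."""
    return (kernel_size * kernel_size * input_depth + 1) * output_depth

# 3x3 kernels, 1 input channel, 32 filters
print(conv2d_param_count(3, 1, 32))    # 320
# 3x3 kernels, 32 input channels, 64 filters
print(conv2d_param_count(3, 32, 64))   # 18496
```

The count does not depend on the height or width of the input, which is a direct consequence of weight sharing.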


# Pooling layer

Pooling layers downsample feature maps; with a 2×2 window and stride 2, the height and width are halved. They have no trainable parameters.

* max pooling
* average pooling
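Both pooling types can be sketched in NumPy for a single feature map. The function `pool2d` is a hypothetical helper that assumes non-overlapping windows (stride equal to the pool size) and drops any rows or columns that do not fill a complete window.

```python
import numpy as np

def pool2d(feature_map, pool_size=2, mode="max"):
    """Non-overlapping pooling of a [height, width] feature map."""
    h, w = feature_map.shape
    out_h, out_w = h // pool_size, w // pool_size
    # Crop so that height and width are multiples of pool_size
    cropped = feature_map[:out_h * pool_size, :out_w * pool_size]
    # Split into blocks of shape [pool_size, pool_size]
    blocks = cropped.reshape(out_h, pool_size, out_w, pool_size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    if mode == "average":
        return blocks.mean(axis=(1, 3))
    raise ValueError("mode must be 'max' or 'average'")

fmap = np.arange(16, dtype=float).reshape(4, 4)
max_pooled = pool2d(fmap, mode="max")
avg_pooled = pool2d(fmap, mode="average")
print(max_pooled)   # [[ 5.  7.] [13. 15.]]
print(avg_pooled)   # [[ 2.5  4.5] [10.5 12.5]]
```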


<figure>
 <img src="https://www.jeremyjordan.me/content/images/2018/04/AlexNet-CNN-architecture-layers.png"  width = "800" >
 <figcaption align="left">Figure 2. Architecture of AlexNet (ImageNet Classification with Deep Convolutional Neural Networks). 
 </figcaption>
</figure>

<figure>
 <img src="https://www.jeremyjordan.me/content/images/2018/04/vgg16.png"  width = "800" >
 <figcaption align="left">Figure 3. Architecture for Very Deep Convolutional Networks for Large-Scale Image Recognition (VGG16). 
 </figcaption>
</figure>

# Keras implementation for MNIST problem

In [2]:
from tensorflow import keras
from tensorflow.keras import layers

In [3]:
from tensorflow.keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz


In [4]:
# Scale images to the [0, 1] range
x_train = x_train.astype("float32") / 255
x_test = x_test.astype("float32") / 255
print("x_train shape:", x_train.shape)

x_train shape: (60000, 28, 28)


In [5]:
# Make sure images have shape (28, 28, 1)
x_train = np.expand_dims(x_train, -1)
x_test = np.expand_dims(x_test, -1)
print("x_train shape:", x_train.shape)
print(x_train.shape[0], "train samples")
print(x_test.shape[0], "test samples")

x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples


In [6]:
# Model / data parameters
num_classes = 10
input_shape = (28, 28, 1)

# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
print(y_train[0].shape, y_train[0])

(10,) [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
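The one-hot encoding produced by `to_categorical` can be reproduced with plain NumPy, which may help in seeing what the "binary class matrix" contains. The label array below is a made-up example, not the actual training labels.

```python
import numpy as np

num_classes = 10
labels = np.array([5, 0, 4])   # hypothetical integer class labels

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(num_classes, dtype="float32")[labels]
print(one_hot.shape)
print(one_hot[0])

# argmax recovers the integer labels
recovered = one_hot.argmax(axis=1)
print(recovered)
```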


In [7]:
model = keras.Sequential(
    [
        keras.Input(shape=input_shape),
        layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ]
)

model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 26, 26, 32)        320       
                                                                 
 max_pooling2d (MaxPooling2D  (None, 13, 13, 32)       0         
 )                                                               
                                                                 
 conv2d_1 (Conv2D)           (None, 11, 11, 64)        18496     
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 5, 5, 64)         0         
 2D)                                                             
                                                                 
 flatten (Flatten)           (None, 1600)              0         
                                                                 
 dropout (Dropout)           (None, 1600)              0

Note that this CNN has far fewer parameters than a typical fully connected network for the same task.
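To make the comparison with a fully connected network concrete, the parameter counts can be worked out by hand. The CNN numbers follow from the model defined above; the fully connected baseline with one hidden layer of 512 units is a hypothetical choice made only for this comparison.

```python
# CNN from the model above
conv1 = (3 * 3 * 1 + 1) * 32      # 320
conv2 = (3 * 3 * 32 + 1) * 64     # 18496
dense = (5 * 5 * 64 + 1) * 10     # flattened 1600 features -> 10 classes
cnn_total = conv1 + conv2 + dense

# Hypothetical fully connected baseline: 784 -> 512 -> 10
hidden_units = 512                # assumption, not from the notebook
fc_total = (28 * 28 + 1) * hidden_units + (hidden_units + 1) * 10

print("CNN parameters:            ", cnn_total)   # 34826
print("Fully connected parameters:", fc_total)    # 407050
print("Ratio: %.1f" % (fc_total / cnn_total))
```

Pooling, flatten, and dropout layers contribute no parameters, so they do not appear in the sum.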

In [8]:
batch_size = 128
epochs = 15

model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_split=0.1)

Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15


<keras.callbacks.History at 0x7f9659d2b390>

In [9]:
score = model.evaluate(x_test, y_test, verbose=0)
print("Test loss:", score[0])
print("Test accuracy:", score[1])

Test loss: 0.02420741319656372
Test accuracy: 0.9919000267982483
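The two reported numbers are the mean categorical cross-entropy and the fraction of correctly classified test images. A minimal NumPy sketch of both definitions follows, using made-up softmax outputs for four samples and three classes. It is an illustration of the definitions, not Keras's own implementation, which adds numerical safeguards.

```python
import numpy as np

# Hypothetical softmax outputs (each row sums to 1) and one-hot labels
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.6, 0.3, 0.1]])
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1],
                   [0, 1, 0]])

# Categorical cross-entropy averaged over samples
loss = -np.mean(np.sum(y_true * np.log(probs), axis=1))

# Accuracy: predicted class is the one with the highest probability
predicted = probs.argmax(axis=1)
actual = y_true.argmax(axis=1)
accuracy = np.mean(predicted == actual)

print("loss:", loss)
print("accuracy:", accuracy)   # 0.75
```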


# Exercise: analyze the architecture and parameters of the above CNN for MNIST