# CNN with MNIST dataset
> In this post, we will implement several types of CNN models for the MNIST dataset. In TensorFlow, there are various ways to define a CNN model, such as the Sequential API, the Functional API, and model subclassing. We'll implement each type and test it.

- toc: true 
- badges: true
- comments: true
- author: Chanseok Kang
- categories: [Python, Deep_Learning, Tensorflow-Keras]
- image: images/CNN_MNIST.png

```python
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

plt.rcParams['figure.figsize'] = (16, 10)
plt.rc('font', size=15)
```

## CNN model with sequential API
Previously, we learned the basic operations of convolution and max-pooling. In fact, we already implemented a simple CNN model for MNIST classification by manually combining 2D convolution layers and max-pooling layers. But there are other ways to define a CNN model. In this section, we will implement the CNN model with the Sequential API.

In brief, we will build the model as follows:

![CNN](image/CNN_MNIST.png)

The input goes through a 3x3 2D convolution layer followed by 2x2 max-pooling, and this convolution/pooling pair is repeated 3 times. The result is then flattened and passed through a fully-connected layer (with dropout), and a final fully-connected layer produces the output for classification. In each convolution layer, the stride is 1 and the padding is `same` (for a 3x3 kernel this is half padding: one pixel of zeros on each side). In each max-pooling layer, the stride is 2 and the padding is also `same`.
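With `same` padding, TensorFlow pads the input so that each spatial output size is `ceil(input_size / stride)`. The sketch below (plain Python, with a hypothetical helper `same_output_size`) applies that rule to the three convolution/pooling stages to trace the feature-map shapes and the number of values left after flattening:

```python
import math

def same_output_size(size, stride):
    # With padding='same', the output spatial size is ceil(size / stride)
    return math.ceil(size / stride)

size = 28
shapes = []
for filters in (32, 64, 128):
    size = same_output_size(size, stride=1)  # 3x3 conv, stride 1: size unchanged
    size = same_output_size(size, stride=2)  # 2x2 max-pool, stride 2: size halved, rounded up
    shapes.append((size, size, filters))

flat_size = size * size * shapes[-1][2]
print(shapes)     # [(14, 14, 32), (7, 7, 64), (4, 4, 128)]
print(flat_size)  # 2048
```

Note the last pooling step: 7 / 2 rounds up to 4, because `same` padding adds a row and column of padding instead of dropping the leftover pixels.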

### Hyperparameter setting
First, we need to define the hyperparameters that affect model training. As a reminder, a **hyperparameter** is a parameter whose value is used to control the learning process, such as the learning rate, the number of epochs, and the batch size.

```python
learning_rate = 0.001
training_epochs = 15
batch_size = 100
```

To keep track of model training, it is helpful to save checkpoints while training. That way, if training fails for some unexpected reason, we can resume it from the last checkpoint instead of starting over.

```python
import os

cur_dir = os.getcwd()
checkpoint_dir = os.path.join(cur_dir, 'checkpoints', 'mnist_cnn_seq')
os.makedirs(checkpoint_dir, exist_ok=True)
checkpoint_prefix = os.path.join(checkpoint_dir, 'mnist_cnn_seq')
```

### Data Pipelining
Before implementing the model, we need a data pipeline that preprocesses the data and feeds it to the model. As you saw in the previous example, the raw data can hardly be used directly: we need to normalize it and convert it to the right shape and type. We can call this whole process "data preprocessing".

Note that the label of each example is a class index (an integer from 0 to 9). To use it with a neural network trained with softmax cross-entropy, we need to encode it as a binary vector. As some readers may already know, this is called **one-hot** encoding. Luckily, tf.keras provides `to_categorical` for one-hot encoding.
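In plain Python, one-hot encoding amounts to the sketch below. `one_hot` is a hypothetical helper; `to_categorical(y, 10)` produces the same layout but returns a NumPy float array instead of nested lists.

```python
def one_hot(labels, num_classes):
    # Each integer label becomes a vector with 1.0 at the label's index and 0.0 elsewhere
    encoded = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0
        encoded.append(row)
    return encoded

example = one_hot([5, 0, 4], 10)  # example labels, chosen for illustration
print(example[0])  # [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
```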

```python
# One-hot encoding
from tensorflow.keras.utils import to_categorical

# MNIST dataset
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()

# Normalization
X_train = X_train.astype(np.float32) / 255.
X_test = X_test.astype(np.float32) / 255.

# Convert it to 4D array (or we can use np.expand_dims for dimension expansion)
X_train = X_train[..., tf.newaxis]
X_test = X_test[..., tf.newaxis]

# one-hot encoding
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

# Build dataset pipeline
train_ds = tf.data.Dataset.from_tensor_slices((X_train, y_train)).shuffle(buffer_size=100000).batch(batch_size)
test_ds = tf.data.Dataset.from_tensor_slices((X_test, y_test)).batch(batch_size)
```
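`batch(batch_size)` groups consecutive examples into batches, and by default (`drop_remainder=False`) a final, smaller batch is kept. With MNIST's 60,000 training and 10,000 test images and `batch_size = 100`, that gives 600 training steps and 100 test steps per epoch. A small sketch of the count, using a hypothetical helper:

```python
import math

def num_batches(num_examples, batch_size, drop_remainder=False):
    # A final partial batch is kept unless drop_remainder=True
    if drop_remainder:
        return num_examples // batch_size
    return math.ceil(num_examples / batch_size)

print(num_batches(60000, 100))  # 600 training steps per epoch
print(num_batches(10000, 100))  # 100 test steps
print(num_batches(60001, 100))  # 601: the extra example forms its own batch
```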

### Build model with Sequential API
Building a model with the Sequential API is similar to the previous example. The difference is that we first create an empty `tf.keras.Sequential` model and then add each layer to it in order. Here, we wrap the whole model construction in a single function, `create_model`.

```python
def create_model():
    model = tf.keras.Sequential(name='CNN_Sequential')
    model.add(tf.keras.layers.Conv2D(filters=32, kernel_size=(3, 3), activation=tf.keras.activations.relu,
                                     padding='SAME', input_shape=(28, 28, 1)))
    model.add(tf.keras.layers.MaxPool2D(padding='SAME'))
    model.add(tf.keras.layers.Conv2D(filters=64, kernel_size=(3, 3), activation=tf.keras.activations.relu,
                                     padding='SAME'))
    model.add(tf.keras.layers.MaxPool2D(padding='SAME'))
    model.add(tf.keras.layers.Conv2D(filters=128, kernel_size=(3, 3), activation=tf.keras.activations.relu,
                                     padding='SAME'))
    model.add(tf.keras.layers.MaxPool2D(padding='SAME'))
    model.add(tf.keras.layers.Flatten())
    model.add(tf.keras.layers.Dense(256, activation=tf.keras.activations.relu))
    model.add(tf.keras.layers.Dropout(0.4))
    model.add(tf.keras.layers.Dense(10))
    return model

# Create model
model = create_model()
model.summary()
```

```
Model: "CNN_Sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
conv2d (Conv2D)              (None, 28, 28, 32)        320
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 14, 14, 32)        0
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 14, 14, 64)        18496
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 7, 7, 64)          0
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 7, 7, 128)         73856
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 4, 4, 128)         0
_________________________________________________________________
flatten (Flatten)            (None, 2048)
```
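The `Param #` column follows directly from the layer shapes. As a hand check (plain Python, not Keras code), a `Conv2D` layer has `(kernel_h * kernel_w * in_channels + 1) * filters` parameters and a `Dense` layer has `(in_features + 1) * units`, where the `+ 1` accounts for the bias:

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # One kernel_h x kernel_w x in_channels kernel per filter, plus one bias per filter
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(in_features, units):
    # A full in_features x units weight matrix, plus one bias per unit
    return (in_features + 1) * units

param_counts = {
    'conv 3x3, 1 -> 32': conv2d_params(3, 3, 1, 32),
    'conv 3x3, 32 -> 64': conv2d_params(3, 3, 32, 64),
    'conv 3x3, 64 -> 128': conv2d_params(3, 3, 64, 128),
    'dense 2048 -> 256': dense_params(4 * 4 * 128, 256),
    'dense 256 -> 10': dense_params(256, 10),
}
for name, count in param_counts.items():
    print(name, count)
print('total', sum(param_counts.values()))
```

The three convolution values (320, 18496, 73856) match the summary rows above. Pooling, flatten, and dropout layers have no weights, so by this arithmetic the two dense layers add 524,544 and 2,570 parameters, for 619,786 in total.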

Note that when we call layers directly, we have to pass each layer's input ourselves to produce its output. In a Sequential model, the output of each layer is automatically fed into the next layer, so all we need to do is pass the data into the model, and the output is produced by the whole stack.
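To make "automatically connected" concrete, here is a toy, pure-Python sketch. `TinySequential` is a hypothetical class, not the Keras implementation; it only captures the idea that each added layer receives the previous layer's output.

```python
class TinySequential:
    """Hypothetical toy: chains callables so each one's output feeds the next."""

    def __init__(self):
        self.layers = []

    def add(self, layer):
        self.layers.append(layer)

    def __call__(self, x):
        # The output of one layer becomes the input of the next
        for layer in self.layers:
            x = layer(x)
        return x

toy = TinySequential()
toy.add(lambda x: [v * 2 for v in x])        # stand-in "layer": scale by 2
toy.add(lambda x: [max(0.0, v) for v in x])  # stand-in ReLU
toy.add(sum)                                 # stand-in reduction
print(toy([1.0, -2.0, 3.0]))  # 8.0
```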

### Loss Function and Gradient 
As with the MLP, we need to define a loss function and use gradient descent to find the parameters that minimize the loss.

```python
# Loss function
def loss_fn(model, images, labels):
    logits = model(images, training=True)
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
    return loss

# Gradient Function
def grad(model, images, labels):
    with tf.GradientTape() as tape:
        loss = loss_fn(model, images, labels)
    return tape.gradient(loss, model.variables)
```
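For reference, `tf.nn.softmax_cross_entropy_with_logits` computes `-sum_i y_i * log(softmax(z)_i)` for each example from the raw logits `z` and the one-hot label `y`, and `tf.reduce_mean` then averages over the batch. The pure-Python sketch below (hypothetical helper names) computes the same quantity for small lists, using the numerically stable log-sum-exp form:

```python
import math

def softmax_cross_entropy(logits, labels):
    # Stable log-softmax: subtract the max logit before exponentiating
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    log_probs = [z - log_sum_exp for z in logits]
    # Cross-entropy between the label distribution and softmax(logits)
    return -sum(y * lp for y, lp in zip(labels, log_probs))

def mean_loss(batch_logits, batch_labels):
    # Average the per-example losses, like tf.reduce_mean over the batch
    losses = [softmax_cross_entropy(z, y) for z, y in zip(batch_logits, batch_labels)]
    return sum(losses) / len(losses)

print(softmax_cross_entropy([0.0, 0.0], [1.0, 0.0]))  # log(2), about 0.6931
```

Subtracting the maximum logit does not change the result mathematically, but it keeps `math.exp` from overflowing for large logits such as 1000.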

### Optimizer and Evaluation
To find the optimal parameters, we will use the Adam optimizer with the predefined `learning_rate`. We also need an evaluation function so that we can check the performance (accuracy) of the model.
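For intuition about what Adam does with `learning_rate`, here is a sketch of the textbook Adam update for a single scalar parameter, using the Keras default values `beta_1=0.9`, `beta_2=0.999`, and `epsilon=1e-7`. It is illustrative only: `adam_minimize` is a hypothetical helper, and Keras arranges the bias correction slightly differently in its implementation.

```python
import math

def adam_minimize(grad_fn, theta, learning_rate=0.001, beta1=0.9, beta2=0.999,
                  epsilon=1e-7, steps=1000):
    m, v = 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        m = beta1 * m + (1 - beta1) * g      # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g  # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)         # bias correction for the zero initialization
        v_hat = v / (1 - beta2 ** t)
        theta -= learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return theta

# Minimize f(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3).
# A larger learning rate than the post's 0.001 is used so the toy converges quickly.
result = adam_minimize(lambda th: 2 * (th - 3.0), theta=0.0, learning_rate=0.1, steps=500)
print(result)  # close to 3.0
```

Because each step divides by the root of the squared-gradient average, the step size is roughly bounded by `learning_rate` regardless of the gradient's scale.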

One more thing: we already mentioned that checkpoints are useful for tracking training, so we define one here.


```python
# Optimizer
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)

# Evaluation function
def evaluate(model, images, labels):
    logits = model(images, training=False)
    correct_predict = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_predict, tf.float32))
    return accuracy

# Checkpoint
checkpoint = tf.train.Checkpoint(cnn=model)
```
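The `evaluate` function counts a prediction as correct when the index of the largest logit equals the index of the 1 in the one-hot label, then averages over the batch. The same computation in plain Python, with hypothetical helper names:

```python
def argmax(values):
    # Index of the largest entry
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(batch_logits, batch_labels):
    # Fraction of examples whose predicted class matches the labeled class
    correct = [argmax(z) == argmax(y) for z, y in zip(batch_logits, batch_labels)]
    return sum(correct) / len(correct)

logits = [[2.0, 0.1, -1.0], [0.3, 0.2, 0.9], [0.0, 1.5, 0.2]]
labels = [[1, 0, 0], [0, 1, 0], [0, 1, 0]]
print(accuracy(logits, labels))  # 2 of 3 correct, about 0.667
```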

### Training and Validation
Finally, we can train the model on our training dataset. We also want to monitor performance during training, so after each training epoch we evaluate the model on the test dataset.

```python
for e in range(training_epochs):
    avg_loss = 0.
    avg_train_acc = 0.
    avg_test_acc = 0.
    train_step = 0
    test_step = 0
    
    # Training: one gradient update per mini-batch
    for images, labels in train_ds:
        grads = grad(model, images, labels)
        optimizer.apply_gradients(zip(grads, model.variables))
        # Recompute loss and accuracy on this batch after the update (for logging)
        loss = loss_fn(model, images, labels)
        acc = evaluate(model, images, labels)
        avg_loss = avg_loss + loss
        avg_train_acc = avg_train_acc + acc
        train_step += 1
    avg_loss = avg_loss / train_step
    avg_train_acc = avg_train_acc / train_step
    
    # Evaluation on the test set
    for images, labels in test_ds:
        acc = evaluate(model, images, labels)
        avg_test_acc = avg_test_acc + acc
        test_step += 1
    avg_test_acc = avg_test_acc / test_step
    
    print("Epoch: {}".format(e + 1),
          "loss: {:.8f}".format(avg_loss),
          "train acc: {:.4f}".format(avg_train_acc),
          "test acc: {:.4f}".format(avg_test_acc))
    
    # Save a checkpoint at the end of every epoch
    checkpoint.save(file_prefix=checkpoint_prefix)
```

```
Epoch: 1 loss: 0.18465614 train acc: 0.9539 test acc: 0.9866
Epoch: 2 loss: 0.04830704 train acc: 0.9901 test acc: 0.9898
Epoch: 3 loss: 0.03166568 train acc: 0.9929 test acc: 0.9918
Epoch: 4 loss: 0.02311547 train acc: 0.9953 test acc: 0.9916
Epoch: 5 loss: 0.01861383 train acc: 0.9964 test acc: 0.9931
Epoch: 6 loss: 0.01619163 train acc: 0.9972 test acc: 0.9935
Epoch: 7 loss: 0.01229716 train acc: 0.9979 test acc: 0.9908
Epoch: 8 loss: 0.01014238 train acc: 0.9984 test acc: 0.9930
Epoch: 9 loss: 0.00907810 train acc: 0.9986 test acc: 0.9932
Epoch: 10 loss: 0.00838505 train acc: 0.9989 test acc: 0.9935
Epoch: 11 loss: 0.00753370 train acc: 0.9991 test acc: 0.9941
Epoch: 12 loss: 0.00646839 train acc: 0.9992 test acc: 0.9937
Epoch: 13 loss: 0.00608160 train acc: 0.9995 test acc: 0.9916
Epoch: 14 loss: 0.00588523 train acc: 0.9994 test acc: 0.9924
Epoch: 15 loss: 0.00493233 train acc: 0.9995 test acc: 0.9924
```
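The training loop follows the usual mini-batch pattern: for each batch, compute gradients, apply them, and accumulate metrics; then average over the number of steps. As a hedged illustration of just that loop shape (not the CNN code), here is a pure-Python toy that fits `y = w * x` with mini-batch gradient descent on a hypothetical four-point dataset:

```python
def train(xs, ys, epochs=20, batch_size=2, learning_rate=0.05):
    w = 0.0
    avg_loss = None
    for epoch in range(epochs):
        total_loss, steps = 0.0, 0
        for start in range(0, len(xs), batch_size):
            bx, by = xs[start:start + batch_size], ys[start:start + batch_size]
            # Gradient of the batch mean squared error for y_hat = w * x
            grad = sum(2 * (w * x - y) * x for x, y in zip(bx, by)) / len(bx)
            w -= learning_rate * grad
            # Recompute the batch loss after the update, as the CNN loop does
            total_loss += sum((w * x - y) ** 2 for x, y in zip(bx, by)) / len(bx)
            steps += 1
        avg_loss = total_loss / steps
    return w, avg_loss

w, avg_loss = train([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
print(w, avg_loss)  # w close to 2.0, loss close to 0
```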


### Build model with Functional API
We have confirmed that the Sequential API works. Now let's implement the same model with another approach, the Functional API. The whole process is the same except for the model-building part.

The Sequential API has some limitations. As you can see from `create_model`, all the layers are connected in a single linear pipeline. But what if we want multiple inputs or multiple outputs? The Sequential API also does not let us build custom layer blocks with branches. For instance, [ResNet](https://arxiv.org/abs/1512.03385) uses a specific block named the **residual block**, which contains a **skip connection**, and such a block cannot be expressed as a Sequential model. Nor can we build shared layers, where the same layer is called several times.

The building process is very similar to that of the Sequential API. All we need to do is define the input and the output, and connect the layers like this:

```python
def create_model_functional():
    inputs = tf.keras.Input(shape=(28, 28, 1))
    conv1 = tf.keras.layers.Conv2D(filters=32, kernel_size=(3, 3), padding='SAME', 
                                   activation=tf.keras.activations.relu)(inputs)
    pool1 = tf.keras.layers.MaxPool2D(padding='SAME')(conv1)
    conv2 = tf.keras.layers.Conv2D(filters=64, kernel_size=(3, 3), padding='SAME',
                                   activation=tf.keras.activations.relu)(pool1)
    pool2 = tf.keras.layers.MaxPool2D(padding='SAME')(conv2)
    conv3 = tf.keras.layers.Conv2D(filters=128, kernel_size=(3, 3), padding='SAME',
                                   activation=tf.keras.activations.relu)(pool2)
    pool3 = tf.keras.layers.MaxPool2D(padding='SAME')(conv3)
    pool3_flat = tf.keras.layers.Flatten()(pool3)
    dense4 = tf.keras.layers.Dense(units=256, activation=tf.keras.activations.relu)(pool3_flat)
    drop4 = tf.keras.layers.Dropout(rate=0.4)(dense4)
    logits = tf.keras.layers.Dense(units=10)(drop4)
    return tf.keras.Model(inputs=inputs, outputs=logits)
```

```python
model = create_model_functional()
model.summary()
```

```
Model: "functional_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
input_1 (InputLayer)         [(None, 28, 28, 1)]       0
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 28, 28, 32)        320
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 14, 14, 32)        0
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 14, 14, 64)        18496
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 7, 7, 64)          0
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 7, 7, 128)         73856
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 4, 4, 128)
```

As the model summary shows, the parameter counts are the same as in the previous model. Interestingly, the default model name is "functional_x" (in the TensorFlow version used here), which tells us that this model was built with the Functional API.

As another example, here is how we can implement the skip connection in a residual block:

![skip connection](image/skip_connection.png)

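```python
import tensorflow as tf

# Bottleneck residual block: 1x1 -> 3x3 -> 1x1 convolutions plus a skip connection
inputs = tf.keras.Input(shape=(28, 28, 256))
conv1 = tf.keras.layers.Conv2D(filters=64, kernel_size=(1, 1), padding='SAME', activation=tf.keras.activations.relu)(inputs)
conv2 = tf.keras.layers.Conv2D(filters=64, kernel_size=(3, 3), padding='SAME', activation=tf.keras.activations.relu)(conv1)
# The last 1x1 convolution restores 256 channels so its output can be added to the input
conv3 = tf.keras.layers.Conv2D(filters=256, kernel_size=(1, 1), padding='SAME')(conv2)
# skip connection: element-wise sum of the block output and the block input
add3 = tf.keras.layers.add([conv3, inputs])
# Apply ReLU as a Keras layer so it is part of the functional graph
relu3 = tf.keras.layers.ReLU()(add3)
model = tf.keras.Model(inputs=inputs, outputs=relu3)
```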
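Numerically, the block computes `relu(F(x) + x)`, where `F` is the stack of three convolutions. The plain-Python sketch below (a hypothetical helper, using vectors in place of feature maps) shows why `F(x)` must have the same shape as `x`: the skip connection is an element-wise sum.

```python
def relu(vector):
    return [max(0.0, v) for v in vector]

def residual_block(x, transform):
    # F(x) and x are added element-wise, so their shapes must match
    fx = transform(x)
    if len(fx) != len(x):
        raise ValueError("skip connection needs F(x) and x to have the same shape")
    return relu([a + b for a, b in zip(fx, x)])

x = [1.0, -2.0, 3.0]
print(residual_block(x, lambda v: [0.0] * len(v)))        # [1.0, 0.0, 3.0]
print(residual_block(x, lambda v: [-0.5 * e for e in v]))  # [0.5, 0.0, 1.5]
```

When `F` outputs zeros, the block reduces to `relu(x)`, so the input passes through unchanged wherever it is non-negative; this identity path is what the skip connection adds.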

### Build model with Model Subclassing
The other way to build a model is subclassing, where we define the model as a Python class. Model subclassing builds a fully customizable model by subclassing `tf.keras.Model`: we create the layers (and any other state) in the `__init__` method and define the forward pass in the `call` method.

```python
class CNNModel(tf.keras.Model):
    def __init__(self):
        super(CNNModel, self).__init__()
        # Layers are created here; weights are created later, on the first call
        self.conv1 = tf.keras.layers.Conv2D(filters=32, kernel_size=(3, 3), padding='SAME',
                                            activation=tf.keras.activations.relu)
        self.pool1 = tf.keras.layers.MaxPool2D(padding='SAME')
        self.conv2 = tf.keras.layers.Conv2D(filters=64, kernel_size=(3, 3), padding='SAME',
                                            activation=tf.keras.activations.relu)
        self.pool2 = tf.keras.layers.MaxPool2D(padding='SAME')
        self.conv3 = tf.keras.layers.Conv2D(filters=128, kernel_size=(3, 3), padding='SAME',
                                            activation=tf.keras.activations.relu)
        self.pool3 = tf.keras.layers.MaxPool2D(padding='SAME')
        self.pool3_flat = tf.keras.layers.Flatten()
        self.dense4 = tf.keras.layers.Dense(units=256, activation=tf.keras.activations.relu)
        self.drop4 = tf.keras.layers.Dropout(rate=0.4)
        self.dense5 = tf.keras.layers.Dense(units=10)
    
    def call(self, inputs, training=False):
        net = self.conv1(inputs)
        net = self.pool1(net)
        net = self.conv2(net)
        net = self.pool2(net)
        net = self.conv3(net)
        net = self.pool3(net)
        net = self.pool3_flat(net)
        net = self.dense4(net)
        # Pass `training` explicitly so dropout is active only during training
        net = self.drop4(net, training=training)
        net = self.dense5(net)
        return net
```

```python
model = CNNModel()
```

Here we have only instantiated the `CNNModel` class, so the layers are not connected yet and no weights exist; Keras layers create their weights the first time they receive an input. To see the summary of this network, we first need to build it (or call it or fit it on some data).
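A toy, pure-Python illustration of this lazy building is below. The `LazyDense` class is hypothetical and is not Keras code: it cannot know how many weights it needs until it sees its first input, so it creates them on the first call (all weights are set to 0.1 here purely for simplicity).

```python
class LazyDense:
    """Hypothetical toy layer that creates its weights on the first call."""

    def __init__(self, units):
        self.units = units
        self.weights = None  # not built yet: the input size is unknown
        self.bias = None

    def build(self, input_size):
        # One weight per (input, unit) pair plus one bias per unit
        self.weights = [[0.1] * self.units for _ in range(input_size)]
        self.bias = [0.0] * self.units

    def __call__(self, x):
        if self.weights is None:
            self.build(len(x))
        return [sum(x[i] * self.weights[i][j] for i in range(len(x))) + self.bias[j]
                for j in range(self.units)]

    def count_params(self):
        if self.weights is None:
            raise ValueError("layer has not been built yet")
        return len(self.weights) * self.units + self.units

layer = LazyDense(3)
print(layer.weights is None)  # True: nothing to summarize yet
layer([1.0, 2.0])             # first call builds a 2 x 3 weight matrix
print(layer.count_params())   # 9
```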

```python
model.build(input_shape=(1, 28, 28, 1))
model.summary()
```

```
Model: "cnn_model_5"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
conv2d_18 (Conv2D)           multiple                  320
_________________________________________________________________
max_pooling2d_18 (MaxPooling multiple                  0
_________________________________________________________________
conv2d_19 (Conv2D)           multiple                  18496
_________________________________________________________________
max_pooling2d_19 (MaxPooling multiple                  0
_________________________________________________________________
conv2d_20 (Conv2D)           multiple                  73856
_________________________________________________________________
max_pooling2d_20 (MaxPooling multiple                  0
_________________________________________________________________
flatten_6 (Flatten)          multiple
```

The parameter counts are the same as in the previous models.

### Build model with Model Ensemble
The last method for building a model is the ensemble method. Actually, the keyword **ensemble**