# MNIST convnet

This notebook re-creates the PyTorch model and training from https://github.com/pete88b/data-science/blob/master/myohddac/notebooks/010_mnist_training.ipynb using Keras.

In [1]:
%load_ext autoreload
%autoreload 2
from utils.all import *
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.datasets import mnist
import numpy as np

In [2]:
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
train_images = train_images.reshape((60000, 28, 28, 1)).astype("float32") / 255
test_images = test_images.reshape((10000, 28, 28, 1)).astype("float32") / 255

Here's the PyTorch model that we want to reproduce:

```
from torch import nn

def conv_block(in_channels, out_channels, padding=1):
    # 3x3 conv -> ReLU -> 2x2 max pool (halves height and width)
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, 3, padding=padding),
        nn.ReLU(),
        nn.MaxPool2d((2,2)))

def fc_block(in_features, out_features):
    return nn.Sequential(
        nn.Linear(in_features, out_features),
        nn.ReLU())

def new_model(c_out=10):
    return nn.Sequential(
        conv_block(3, 32, padding=2),  # 3-channel input
        conv_block(32, 32),
        conv_block(32, 32),
        nn.Flatten(),
        fc_block(288, 84),  # 288 = 3 * 3 * 32 after three pools
        nn.Linear(84, c_out))
```
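
The `288` passed to `fc_block` can be checked by following the spatial size through the three conv blocks. The sketch below is plain Python arithmetic rather than PyTorch code, and the helper names (`conv_out`, `pool_out`, `flattened_features`) are made up for illustration. It assumes 28x28 inputs, stride-1 3x3 convolutions and 2x2 max pooling that floors odd sizes.

```python
def conv_out(size, kernel=3, padding=1, stride=1):
    # Standard convolution output size: floor((n + 2p - k) / s) + 1
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, window=2):
    # 2x2 max pooling with stride 2 floors odd sizes (15 -> 7, 7 -> 3)
    return size // window

def flattened_features(size=28, first_padding=2, channels=32):
    size = pool_out(conv_out(size, padding=first_padding))  # 1st conv block
    size = pool_out(conv_out(size))                         # 2nd conv block
    size = pool_out(conv_out(size))                         # 3rd conv block
    return size * size * channels

print(flattened_features(first_padding=2))  # 28 -> 30 -> 15 -> 7 -> 3, so 3 * 3 * 32 = 288
print(flattened_features(first_padding=1))  # 28 -> 14 -> 7 -> 3, also 288
```

Both paddings end at a 3x3 feature map, so the flattened size is the same whether or not the first conv gets the extra pixel of padding.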

In [3]:
def new_model():
    inputs = layers.Input(shape=train_images.shape[1:])
    x = layers.Conv2D(filters=32, kernel_size=3, padding='same', activation='relu')(inputs)
    x = layers.MaxPool2D(2)(x)
    x = layers.Conv2D(filters=32, kernel_size=3, padding='same', activation='relu')(x)
    x = layers.MaxPool2D(2)(x)
    x = layers.Conv2D(filters=32, kernel_size=3, padding='same', activation='relu')(x)
    x = layers.MaxPool2D(2)(x)
    x = layers.Flatten()(x)
    x = layers.Dense(units=84, activation='relu')(x)
    outputs = layers.Dense(units=10, activation='softmax')(x)
    return keras.Model(inputs=inputs, outputs=outputs)

The 1st conv layer has fewer params than in the PyTorch model (320 rather than 896) because the Keras model takes 1-channel input rather than 3-channel input.
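
The parameter counts can be reproduced with a little arithmetic. This is a minimal sketch; `conv2d_params` is a hypothetical helper, not part of Keras or PyTorch, and it assumes square kernels with one bias per output channel.

```python
def conv2d_params(in_channels, out_channels, kernel_size=3):
    # One kernel_size x kernel_size filter per (input, output) channel pair,
    # plus one bias per output channel
    return kernel_size * kernel_size * in_channels * out_channels + out_channels

keras_first_conv = conv2d_params(1, 32)    # 1-channel MNIST input
pytorch_first_conv = conv2d_params(3, 32)  # 3-channel input in the PyTorch model
later_conv = conv2d_params(32, 32)         # 2nd and 3rd conv layers

print(keras_first_conv, pytorch_first_conv, later_conv)  # 320 896 9248
```

The 320 and 9248 values match the `Param #` column of `model.summary()`.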

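We used `padding=2` for the 1st conv in the PyTorch model. Because the Keras conv already uses `padding='same'` (1 pixel of zero padding for a 3x3 kernel), we can replicate this by adding 1 more pixel of zeros around each image: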
```
train_images, test_images = [np.pad(i, [[0,0], [1,1], [1,1], [0,0]]) for i in [train_images, test_images]]
```
before we create the model.
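
To see what the `np.pad` call does to the array shape, here is a small check on dummy data. The array `dummy_images` is a stand-in for a batch of MNIST images, not the real dataset.

```python
import numpy as np

# Hypothetical stand-in for a small batch: (batch, height, width, channels)
dummy_images = np.ones((4, 28, 28, 1), dtype="float32")

# Pad height and width by 1 pixel on each side; leave batch and channel axes alone
padded = np.pad(dummy_images, [[0, 0], [1, 1], [1, 1], [0, 0]])

print(padded.shape)           # (4, 30, 30, 1)
print(padded[:, 0].sum())     # 0.0, the new top row is all zeros
```

A `padding='same'` conv on the 30x30 input then produces a 30x30 output, which is the same size a PyTorch 3x3 conv with `padding=2` produces from a 28x28 input.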

`padding=2` for the 1st conv can help if you have useful info all the way to the edges of your images (padding with reflection rather than zeros often works better in that case). As MNIST images are centered, the edge pixels are all zero, which is probably why padding the images doesn't make a difference to accuracy.
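
The difference between zero padding and reflection padding, and why it disappears for images with an empty border, can be illustrated with `np.pad`. This is a toy sketch on made-up arrays, not on MNIST itself.

```python
import numpy as np

row = np.array([[1, 2, 3, 4]])
zero_padded = np.pad(row, [[0, 0], [2, 2]], mode="constant")
reflect_padded = np.pad(row, [[0, 0], [2, 2]], mode="reflect")
print(zero_padded)     # [[0 0 1 2 3 4 0 0]]
print(reflect_padded)  # [[3 2 1 2 3 4 3 2]]

# Toy "centered" image: content in the middle, zeros around the edge
image = np.zeros((6, 6))
image[2:4, 2:4] = 1.0
same = np.array_equal(np.pad(image, 1, mode="constant"),
                      np.pad(image, 1, mode="reflect"))
print(same)  # True, reflecting a zero border just gives more zeros
```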

In [4]:
model = new_model()
model.summary()

Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, 28, 28, 1)]       0         
                                                                 
 conv2d (Conv2D)             (None, 28, 28, 32)        320       
                                                                 
 max_pooling2d (MaxPooling2D  (None, 14, 14, 32)       0         
 )                                                               
                                                                 
 conv2d_1 (Conv2D)           (None, 14, 14, 32)        9248      
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 7, 7, 32)         0         
 2D)                                                             
                                                                 
 conv2d_2 (Conv2D)           (None, 7, 7, 32)          9248  

In [5]:
%%time
model = new_model()
model.compile(optimizer='rmsprop', # increasing learning rate doesn't seem to help
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# epochs defaults to 1, so this trains for a single pass over the data
history = model.fit(train_images, train_labels,
                    validation_data=(test_images, test_labels),
                    batch_size=32)

CPU times: total: 3min 23s
Wall time: 48 s


We probably need one-cycle training to match the accuracy of the fastai-trained model, but reducing the batch size helps:
- `batch_size=128` gives us `val_accuracy: 0.9776`
    - 469 training steps in 21s (training on a CPU)
- `batch_size=32` gives us `val_accuracy: 0.9858`
    - 1875 training steps in 26s
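
The step counts above follow from the 60,000 training images and the batch size. A minimal sketch, where `steps_per_epoch` is a hypothetical helper rather than a Keras function:

```python
import math

def steps_per_epoch(num_samples, batch_size):
    # A final, smaller batch is run when the batch size does not divide the dataset
    return math.ceil(num_samples / batch_size)

print(steps_per_epoch(60000, 128))  # 469
print(steps_per_epoch(60000, 32))   # 1875
```

So the smaller batch size gives the optimizer about four times as many weight updates in the same single epoch.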
    
Note: fastai also defaults to the Adam optimizer, but we're sticking with the Keras recommendation of rmsprop.
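
As a rough illustration of what "one cycle" refers to, here is a minimal sketch of a one-cycle learning rate schedule: a cosine warm-up to a peak followed by a cosine anneal to a very small value. The function names and default values (`lr_max`, `pct_start`, `div`, `div_final`) are assumptions chosen to resemble fastai's `fit_one_cycle`; they are not taken from the original notebook, and nothing here is wired into Keras.

```python
import math

def cosine_interp(start, end, pct):
    # Smoothly move from start (pct=0) to end (pct=1) along half a cosine wave
    return end + (start - end) / 2 * (1 + math.cos(math.pi * pct))

def one_cycle_lr(step, total_steps, lr_max=1e-3, pct_start=0.25, div=25.0, div_final=1e5):
    # Fraction of training completed, in [0, 1]
    pct = step / max(1, total_steps - 1)
    if pct < pct_start:
        # Warm-up phase: lr_max / div -> lr_max
        return cosine_interp(lr_max / div, lr_max, pct / pct_start)
    # Annealing phase: lr_max -> lr_max / div_final
    return cosine_interp(lr_max, lr_max / div_final, (pct - pct_start) / (1 - pct_start))

total_steps = 1875  # one epoch at batch_size=32
schedule = [one_cycle_lr(s, total_steps) for s in range(total_steps)]
print(schedule[0], max(schedule), schedule[-1])
```

In Keras, a function like this could in principle drive a per-batch callback or a custom learning rate schedule, but that wiring is left out of this sketch.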

You can get this model to >99% validation accuracy by training for 5 epochs with batch size 64.

Most of the time, I think normalizing images (subtracting the mean and dividing by the standard deviation) helps when training a model from random weights. I'm not sure why, but for MNIST we seem to get better results when pixel values are just scaled to be between 0 and 1.

If we want to normalize inputs, we could do something like this:

In [6]:
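(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
# Compute the statistics on the raw training pixels only and reuse them for the test set
mean, std = train_images.mean(), train_images.std()
def prep_images(images):
    # Add a trailing channel axis, then standardize to roughly zero mean and unit variance
    images = images[..., None].astype("float32")
    return (images - mean) / std
train_images, test_images = prep_images(train_images), prep_images(test_images)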
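
To check what this normalization produces without downloading MNIST, the same steps can be run on random data. The arrays `fake_train` and `fake_test` below are made-up stand-ins for the real images, so only the shape and the summary statistics are meaningful.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-ins for MNIST: uint8 pixels in [0, 255]
fake_train = rng.integers(0, 256, size=(100, 28, 28), dtype=np.uint8)
fake_test = rng.integers(0, 256, size=(20, 28, 28), dtype=np.uint8)

# Statistics come from the training data only
mean, std = fake_train.mean(), fake_train.std()

def prep_images(images):
    images = images[..., None].astype("float32")  # add the channel axis
    return (images - mean) / std

train_x, test_x = prep_images(fake_train), prep_images(fake_test)

print(train_x.shape)                  # (100, 28, 28, 1)
print(train_x.mean(), train_x.std())  # close to 0 and 1
print(test_x.mean(), test_x.std())    # near, but not exactly, 0 and 1
```

The test set ends up only approximately standardized because it reuses the training mean and standard deviation, which is the usual way to keep test data out of the preprocessing statistics.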