In [None]:
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

[![View on GitHub][github-badge]][github-notebook] [![Open In Colab][colab-badge]][colab-notebook] [![Open in Binder][binder-badge]][binder-notebook]

[github-badge]: https://img.shields.io/badge/View-on%20GitHub-blue?logo=GitHub
[colab-badge]: https://colab.research.google.com/assets/colab-badge.svg
[binder-badge]: https://static.mybinder.org/badge_logo.svg

[github-notebook]: https://github.com/mbrukman/stackexchange-answers/blob/main/stackoverflow/74679315/Training_and_testing_LeNet_on_MNIST_using_Keras.ipynb
[colab-notebook]: https://colab.research.google.com/github/mbrukman/stackexchange-answers/blob/main/stackoverflow/74679315/Training_and_testing_LeNet_on_MNIST_using_Keras.ipynb
[binder-notebook]: https://mybinder.org/v2/gh/mbrukman/stackexchange-answers/main?filepath=stackoverflow/74679315/Training_and_testing_LeNet_on_MNIST_using_Keras.ipynb

This notebook helps investigate and answer [this Stack Overflow question][1]. Let's start by downloading the same MNIST dataset that's used in the question.

[1]: https://stackoverflow.com/q/74679315/3618671

In [None]:
%%bash

MNIST_PNG="mnist_png.tar.gz"
if ! [ -e "${MNIST_PNG}" ]; then
  curl -sO "https://raw.githubusercontent.com/myleott/mnist_png/master/${MNIST_PNG}"
fi

if ! [ -d "mnist_png" ]; then
  tar zxf "${MNIST_PNG}"
fi

Optionally, you can uncomment the command below to simulate missing data, since the output in the SO question shows only 7 classes of training inputs instead of 10.

For example, with the classes {7, 8, 9} deleted, we find that the training accuracy is still rather high:

* loss: 0.1568
* sparse_categorical_accuracy: 0.9541
* val_loss: 0.0616
* val_sparse_categorical_accuracy: 0.9801

while the test accuracy is much lower:

* loss: 2.1828
* sparse_categorical_accuracy: 0.6873
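
The test accuracy above is close to the most that the class balance allows. As a rough sanity check, the sketch below computes the best possible test accuracy for a model that never sees some classes during training. The per-class counts are the commonly reported MNIST test-set counts and are an assumption here (they can be re-derived by counting the files under `mnist_png/testing/`), and `accuracy_ceiling` is a hypothetical helper name.

```python
# Commonly reported per-digit counts for the 10,000-image MNIST test set.
# Assumption: not taken from this notebook; verify by counting the PNG files.
MNIST_TEST_COUNTS = {
    0: 980, 1: 1135, 2: 1032, 3: 1010, 4: 982,
    5: 892, 6: 958, 7: 1028, 8: 974, 9: 1009,
}


def accuracy_ceiling(test_counts, missing_classes):
    """Best-case accuracy if every image of a missing class is misclassified."""
    total = sum(test_counts.values())
    seen = sum(n for label, n in test_counts.items()
               if label not in missing_classes)
    return seen / total


ceiling = accuracy_ceiling(MNIST_TEST_COUNTS, missing_classes={7, 8, 9})
print(f"accuracy ceiling without 7, 8, 9: {ceiling:.4f}")
```

Under these counts the ceiling is about 0.70, so a test accuracy of 0.6873 suggests the model is nearly perfect on the seven digits it was trained on and wrong on essentially all of the others.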

In [None]:
%%bash

# (Optional) Delete some of the training data dirs to simulate missing data.
# rm -rf mnist_png/training/[789]

Now that we've downloaded the MNIST dataset, let's see what the sizes of the images are.

In [None]:
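from collections import defaultdict
import glob
# Import the submodules explicitly: `import matplotlib` and `import PIL` alone
# do not guarantee that `matplotlib.image` and `PIL.Image` are loaded.
import matplotlib.image
import PIL.Image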

pil_modes = defaultdict(int)
pil_sizes = defaultdict(int)
mpl_sizes = defaultdict(int)

files = (glob.glob('mnist_png/training/[0-9]/*.png') +
         glob.glob('mnist_png/testing/[0-9]/*.png'))
for file in files:
    with PIL.Image.open(file) as pil_image:
        pil_modes[pil_image.mode] += 1
        pil_sizes[pil_image.size] += 1
        mpl_image = matplotlib.image.pil_to_array(pil_image)
        mpl_sizes[mpl_image.shape] += 1

print('PIL modes:', pil_modes.items())
print('PIL sizes:', pil_sizes.items())
print('matplotlib sizes: ', mpl_sizes.items())

PIL modes: dict_items([('L', 70000)])
PIL sizes: dict_items([((28, 28), 70000)])
matplotlib sizes:  dict_items([((28, 28), 70000)])


Both PIL and `matplotlib` agree that the image size is `(28, 28)`.

The PIL image mode `L` is greyscale, so there's only 1 color channel. If the images were RGB, they would have 3 channels, but MNIST has only 1 channel.

So, if the MNIST dataset has $28 \times 28$ images, but the LeNet model says it takes $32 \times 32$ images, how does that work? Well, we can either adjust the images (training, validation, and testing datasets):

* resize the images before passing them to the LeNet model
* pad the images with zeroes before passing them to the LeNet model

or handle this in the model itself, e.g.,

* use the `padding` feature of the [`Conv2D` layer][conv2d] (via `padding='same'` parameter) to zero-pad images during training and testing

Note that these options are mutually exclusive, so we can do only one of the following, not both:

1. resize the images when loading them with the [`image_dataset_from_directory()`][image_dataset_from_directory] function by specifying `image_size=(32, 32)`
1. specify `image_size=(28, 28)` when loading, as that is their native size, and then use `Conv2D(..., padding='same')` in the model to zero-pad dynamically

Below, we're using option $(2)$.
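
To see why the two options are interchangeable but should not be combined, here is a minimal sketch of the output-size arithmetic for a stride-1 convolution. `conv_output_size` is a hypothetical helper written for this illustration, not a Keras function.

```python
def conv_output_size(size, kernel, padding):
    """Spatial output size of a stride-1 convolution along one dimension."""
    if padding == 'same':
        return size               # zero-padded so the size is preserved
    if padding == 'valid':
        return size - kernel + 1  # no padding: the kernel must fit entirely
    raise ValueError(f"unknown padding: {padding!r}")


# Option 1: 32x32 input, unpadded 5x5 convolution.
resized = conv_output_size(32, kernel=5, padding='valid')
# Option 2: native 28x28 input, zero-padded 5x5 convolution.
padded = conv_output_size(28, kernel=5, padding='same')
# Both at once: 32x32 input *and* a zero-padded convolution.
both = conv_output_size(32, kernel=5, padding='same')

print(resized, padded, both)  # 28 28 32
```

Either option alone yields $28 \times 28$ feature maps after the first convolution, so the rest of the network sees the same shapes. Applying both yields $32 \times 32$ maps, which changes the shape of every later layer.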

Additionally, we need to consider the color channels. Since the MNIST images have a single color channel, the source images have the dimensions `(28, 28, 1)`. However, [`image_dataset_from_directory()`][image_dataset_from_directory] defaults to `color_mode='rgb'`, so if we do nothing, it will convert each image from 1 color channel (grayscale) to 3 color channels (RGB). We don't want that, so we explicitly specify `color_mode='grayscale'` below.

[conv2d]: https://keras.io/api/layers/convolution_layers/convolution2d/
[image_dataset_from_directory]: https://www.tensorflow.org/api_docs/python/tf/keras/utils/image_dataset_from_directory

In [None]:
# This code was adapted from SO question: https://stackoverflow.com/q/74679315
# and adjusted with the change as described above.

import tensorflow as tf

train_ds = tf.keras.utils.image_dataset_from_directory(
    'mnist_png/training/',
    validation_split=0.2,
    subset="training",
    seed=123,
    image_size=(28, 28),
    color_mode='grayscale',
    batch_size=100)

val_ds = tf.keras.utils.image_dataset_from_directory(
    'mnist_png/training/',
    validation_split=0.2,
    subset="validation",
    seed=123,
    image_size=(28, 28),
    color_mode='grayscale',
    batch_size=100)

test_ds = tf.keras.utils.image_dataset_from_directory(
    'mnist_png/testing/',
    seed=123,
    image_size=(28, 28),
    color_mode='grayscale',
    batch_size=1000)

Found 60000 files belonging to 10 classes.
Using 48000 files for training.
Found 60000 files belonging to 10 classes.
Using 12000 files for validation.
Found 10000 files belonging to 10 classes.


Here, we see that there are 60000 training images and 10000 test images across 10 classes.
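
As a quick check of the split, the arithmetic below reproduces the file counts reported above and derives how many batches each dataset yields per pass. This is only a sketch of the arithmetic; the variable names are illustrative.

```python
import math

total_training_files = 60000
validation_split = 0.2

val_files = round(total_training_files * validation_split)
train_files = total_training_files - val_files

# Batch sizes match the image_dataset_from_directory() calls in this notebook.
train_steps = math.ceil(train_files / 100)
val_steps = math.ceil(val_files / 100)
test_steps = math.ceil(10000 / 1000)

print(train_files, val_files)              # 48000 12000
print(train_steps, val_steps, test_steps)  # 480 120 10
```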

As an aside, since we're already using Keras, there's a much easier way to get the MNIST dataset directly [via Keras][keras-mnist] in just a single line:

```python
from tensorflow import keras

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
```

It's the same data as in the PNG repo above: the images are also $28 \times 28$ and greyscale, with 1 color channel.

[keras-mnist]: https://keras.io/api/datasets/mnist/

Let's define the LeNet-5 model as per the SO question, using `AveragePooling2D` in place of the custom Subsampling layer the paper talks about, since it's not provided by Keras.

In [None]:
from tensorflow import keras
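# Import everything from `tensorflow.keras` so that we don't mix it with a
# separately installed `keras` package.
from tensorflow.keras import Input, Sequential
from tensorflow.keras.layers import Activation, AveragePooling2D, Conv2D, Dense, Flatten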


tanh = keras.activations.tanh
softmax = keras.activations.softmax

model = Sequential([
    Input(shape=(28, 28, 1)),
    Conv2D(filters=6, kernel_size=(5, 5), padding='same', activation=tanh, name='C1'),
    AveragePooling2D(pool_size=(2, 2), strides=(2, 2), name='S2'),
    Activation(tanh, name='S2_act'),
    Conv2D(filters=16, kernel_size=(5, 5), activation=tanh, name='C3'),
    AveragePooling2D(pool_size=(2, 2), strides=(2, 2), name='S4'),
    Activation(tanh, name='S4_act'),
    Conv2D(filters=120, kernel_size=(5, 5), activation=tanh, name='C5'),
    Flatten(name='Flatten'),
    Dense(84, activation=tanh, name='F6'),
    Dense(10, activation=softmax, name='Output'),
], name='LeNet-5')

Above, we constructed the LeNet model using the `AveragePooling2D` layer followed by a `tanh` activation. We could also use the `MaxPooling2D` layer instead, or implement the [`Subsampling`][subsampling] layer as described in the paper.

[subsampling]: https://github.com/mbrukman/reimplementing-ml-papers/blob/main/lenet/subsampling.py
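
For intuition, here is a minimal pure-Python sketch of the subsampling step, assuming the paper's description of S2: the four inputs in each $2 \times 2$ neighbourhood are summed, multiplied by one trainable coefficient, and offset by one trainable bias per feature map. `subsample_2x2` is a hypothetical name, and this is not the linked implementation.

```python
def subsample_2x2(feature_map, coefficient, bias):
    """Sum each non-overlapping 2x2 block, then scale and shift the result.

    The activation function is left out; it would be applied afterwards.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            coefficient * (feature_map[r][c] + feature_map[r][c + 1] +
                           feature_map[r + 1][c] + feature_map[r + 1][c + 1])
            + bias
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


feature_map = [
    [1.0, 3.0, 0.0, 2.0],
    [5.0, 7.0, 4.0, 6.0],
    [2.0, 2.0, 1.0, 1.0],
    [2.0, 2.0, 1.0, 1.0],
]

# With coefficient=0.25 and bias=0 this reduces to plain average pooling.
average_pooled = subsample_2x2(feature_map, coefficient=0.25, bias=0.0)
print(average_pooled)  # [[4.0, 3.0], [2.0, 1.0]]
```

Seen this way, `AveragePooling2D` is the special case with a fixed coefficient and no bias, which is why it has no trainable parameters.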

In [None]:
model.summary()

Model: "LeNet-5"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 C1 (Conv2D)                 (None, 28, 28, 6)         156       
                                                                 
 S2 (AveragePooling2D)       (None, 14, 14, 6)         0         
                                                                 
 S2_act (Activation)         (None, 14, 14, 6)         0         
                                                                 
 C3 (Conv2D)                 (None, 10, 10, 16)        2416      
                                                                 
 S4 (AveragePooling2D)       (None, 5, 5, 16)          0         
                                                                 
 S4_act (Activation)         (None, 5, 5, 16)          0         
                                                                 
 C5 (Conv2D)                 (None, 1, 1, 120)         48120     
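
The parameter counts can be checked by hand. The sketch below reproduces the counts shown for C1 and C3 and applies the same formulas to the remaining layers of the model defined earlier; the helper names are hypothetical. The pooling and activation layers have no trainable parameters.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(in_features, units):
    """Weights plus one bias per unit."""
    return (in_features + 1) * units


params = {
    'C1': conv2d_params(5, 5, 1, 6),
    'C3': conv2d_params(5, 5, 6, 16),
    'C5': conv2d_params(5, 5, 16, 120),
    'F6': dense_params(120, 84),
    'Output': dense_params(84, 10),
}
for name, count in params.items():
    print(f"{name}: {count}")
print("total:", sum(params.values()))
```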

In [None]:
from tensorflow import keras

model.compile(optimizer=keras.optimizers.Adam(),
              loss=keras.losses.SparseCategoricalCrossentropy(),
              metrics=[keras.metrics.SparseCategoricalAccuracy()])

In [None]:
model.fit(train_ds, epochs=10, validation_data=val_ds)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7f1cdf6067c0>

In [None]:
model.evaluate(test_ds)



[0.042519859969615936, 0.9871000051498413]
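
`evaluate()` returns the loss followed by the compiled metrics, so the list above is `[loss, sparse_categorical_accuracy]`. A small sketch of how to read it, using the values printed above:

```python
# Values copied from the evaluate() output above: [loss, accuracy].
test_loss, test_accuracy = [0.042519859969615936, 0.9871000051498413]
test_images = 10000  # size of the MNIST test set loaded earlier

misclassified = round((1 - test_accuracy) * test_images)
print(f"loss={test_loss:.4f}, accuracy={test_accuracy:.2%}, "
      f"misclassified={misclassified}")
```

That is, the model misclassifies 129 of the 10000 test images.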