# **First Example: MNIST Again**

Recall that the very first example we worked on was predicting digits from handwriting, using the MNIST dataset. Let's refresh on how that simple model performed. It was pretty impressive: it exceeds 98% accuracy on the test set. But would you be comfortable deploying this at the USPS?

What does a 2% error mean on this problem? It's worse than it appears. Think about the number of digits in a single address. There is a ZIP code, with 5 digits in it. If I make a mistake on any of the 5 digits, a sorting error will result. That means an error of 2% on a single prediction translates to roughly a 10% error rate at the ZIP code level (2+2+2+2+2; adding the per-digit rates is a slight overestimate, but it is close when the rates are small). Even worse, addresses have house numbers in them. Say there are 3 digits in the average house number; that means another 2+2+2, so our error rate is actually more like 16% at the address level. That's really bad!
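
To make the back-of-the-envelope sum precise, assume each digit is misread independently with probability $p$. This is an assumption for illustration only; real errors are probably correlated within one person's handwriting. The chance that at least one of $n$ digits is wrong is then:

$$
P(\text{at least one error}) = 1 - (1 - p)^n \le n\,p
$$

$$
n = 5:\; 1 - 0.98^5 \approx 9.6\%, \qquad n = 8:\; 1 - 0.98^8 \approx 14.9\%
$$

For small $p$ this is close to $n\,p$, which is why simply adding the per-digit error rates gives a reasonable estimate.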

In [None]:
from tensorflow.keras.datasets import mnist
from tensorflow import keras
from tensorflow.keras import layers

# Load the data.
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Pre-process the data, to flatten the images into vectors and scale the values.
train_images = train_images.reshape((len(train_images), 28*28))
train_images = train_images.astype("float32") / 255
test_images = test_images.reshape((len(test_images), 28*28))
test_images = test_images.astype("float32") / 255

# Reshape the labels into an explicit column of shape (n, 1). This is typically optional: sparse_categorical_crossentropy also accepts rank-1 integer labels of shape (n,).
train_labels = train_labels.reshape(len(train_labels),1)
test_labels = test_labels.reshape(len(test_labels),1)

model = keras.Sequential([
    layers.Dense(512, activation="relu"),
    layers.Dense(10, activation="softmax")
])

model.compile(optimizer="rmsprop", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

history = model.fit(train_images, train_labels, epochs=10, batch_size=128, validation_split=0.2)

Epoch 1/10
375/375 ━━━━━━━━━━━━━━━━━━━━ 5s 11ms/step - accuracy: 0.8570 - loss: 0.4836 - val_accuracy: 0.9563 - val_loss: 0.1511
Epoch 2/10
375/375 ━━━━━━━━━━━━━━━━━━━━ 4s 11ms/step - accuracy: 0.9622 - loss: 0.1327 - val_accuracy: 0.9683 - val_loss: 0.1094
Epoch 3/10
375/375 ━━━━━━━━━━━━━━━━━━━━ 5s 14ms/step - accuracy: 0.9742 - loss: 0.0861 - val_accuracy: 0.9714 - val_loss: 0.0939
Epoch 4/10
375/375 ━━━━━━━━━━━━━━━━━━━━ 4s 12ms/step - accuracy: 0.9834 - loss: 0.0572 - val_accuracy: 0.9753 - val_loss: 0.0850
Epoch 5/10
375/375 ━━━━━━━━━━━━━━━━━━━━ 4s 11ms/step - accuracy: 0.9877 - loss: 0.0428 - val_accuracy: 0.9746 - val_loss: 0.0812
Epoch 6/10
375/375 ━━━━━━━━━━━━━━━━━━━━ 5s 14ms/step - accuracy: 0.9913 - loss: 0.0314 - val_accuracy: 0.9782 - val_loss: 0.0751
Epoch 7/10
375/375

And now we evaluate performance on the test set...

In [None]:
model.evaluate(test_images, test_labels)

313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9776 - loss: 0.0738


[0.06304054707288742, 0.9812999963760376]

So, we really need extremely high accuracy for this use case. Let's try a simple convnet and see how it performs in comparison. It stacks more layers and does more computation per image, although it actually has fewer trainable parameters than the dense model above, because each small filter is reused across the whole image. The effort will be worth it. This model can get up to about 99.5% accuracy, that is, a 0.5% error rate per digit. Using similar logic to the above, this translates to an error rate of about 0.5*8 = 4% at the address level. Still not great, but it's much better.

# **Now Let's Try a ConvNet**

In [1]:
from tensorflow.keras.datasets import mnist
from tensorflow import keras
from tensorflow.keras import layers

# Re-load the data so each image is back in 'square' format
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Pre-process the data - notice we are not flattening the images into vectors anymore! We are keeping them in a higher-rank tensor format.
# This is important, because the Conv2D layer is designed specifically for image data! It scans over small patches of the image to identify features.
# The format here is (observations, image_height, image_width, channels); we have one channel, gray.
# I am reshaping the tensors to explicitly declare a channel dimension of size 1, and again reshaping the labels into a column of shape (n, 1).
train_images = train_images.reshape((60000, 28, 28, 1))
train_images = train_images.astype("float32") / 255
test_images = test_images.reshape((10000, 28, 28, 1))
test_images = test_images.astype("float32") / 255
train_labels = train_labels.reshape(60000,1)
test_labels = test_labels.reshape(10000,1)

print(train_images.shape)

# Define the convnet. This uses the Functional API, so we have to declare the input layer. We don't declare the batch size here; that is set when we call fit(). Only height, width, and channels.
inputs = keras.Input(shape=(28, 28, 1))
cnn_1 = layers.Conv2D(filters=32, kernel_size=3, activation="relu")(inputs)
pool_1 = layers.MaxPooling2D(pool_size=2)(cnn_1)
cnn_2 = layers.Conv2D(filters=64, kernel_size=3, activation="relu")(pool_1)
pool_2 = layers.MaxPooling2D(pool_size=2)(cnn_2)
cnn_3 = layers.Conv2D(filters=128, kernel_size=3, activation="relu")(pool_2)
flatten = layers.Flatten()(cnn_3)
outputs = layers.Dense(10, activation="softmax")(flatten)
model = keras.Model(inputs=inputs, outputs=outputs)

# Compile the network model.
model.compile(optimizer="rmsprop",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

#keras.utils.plot_model(model, show_shapes=True, dpi=60)
model.summary()


Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 ━━━━━━━━━━━━━━━━━━━━ 1s 0us/step
(60000, 28, 28, 1)
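
If you want to sanity-check what model.summary() reports, the feature-map sizes and parameter counts can be worked out by hand. Here is a minimal sketch, assuming the Keras defaults for these layers (padding="valid" and stride 1 for Conv2D, and a pooling stride equal to pool_size). The helper names are made up for illustration and are not part of Keras.

```python
def conv_out(size, kernel=3):
    # 'valid' convolution with stride 1 shrinks each side by kernel - 1
    return size - kernel + 1

def pool_out(size, pool=2):
    # non-overlapping max pooling divides each side by the pool size (rounding down)
    return size // pool

def conv_params(in_channels, filters, kernel=3):
    # one kernel x kernel x in_channels weight block per filter, plus one bias per filter
    return kernel * kernel * in_channels * filters + filters

def dense_params(n_in, n_out):
    # full weight matrix plus one bias per output unit
    return n_in * n_out + n_out

# Follow the side length of the feature map through the convnet.
size = 28
size = conv_out(size)   # Conv2D(32)   -> 26
size = pool_out(size)   # MaxPooling2D -> 13
size = conv_out(size)   # Conv2D(64)   -> 11
size = pool_out(size)   # MaxPooling2D -> 5
size = conv_out(size)   # Conv2D(128)  -> 3
flat = size * size * 128  # Flatten -> 1152 values

cnn_total = (conv_params(1, 32) + conv_params(32, 64)
             + conv_params(64, 128) + dense_params(flat, 10))
dense_total = dense_params(28 * 28, 512) + dense_params(512, 10)

print(cnn_total)    # 104202
print(dense_total)  # 407050
```

So the convnet ends up with roughly a quarter of the dense model's parameters, even though it has more layers.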


In [2]:
# Fit the model.
history = model.fit(train_images, train_labels, epochs=10, batch_size=128, validation_split=0.2)

Epoch 1/10
375/375 ━━━━━━━━━━━━━━━━━━━━ 8s 6ms/step - accuracy: 0.8433 - loss: 0.5210 - val_accuracy: 0.9786 - val_loss: 0.0724
Epoch 2/10
375/375 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9798 - loss: 0.0641 - val_accuracy: 0.9833 - val_loss: 0.0568
Epoch 3/10
375/375 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9866 - loss: 0.0425 - val_accuracy: 0.9859 - val_loss: 0.0487
Epoch 4/10
375/375 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9910 - loss: 0.0310 - val_accuracy: 0.9890 - val_loss: 0.0387
Epoch 5/10
375/375 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9939 - loss: 0.0221 - val_accuracy: 0.9893 - val_loss: 0.0372
Epoch 6/10
375/375 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9944 - loss: 0.0169 - val_accuracy: 0.9900 - val_loss: 0.0359
Epoch 7/10
375/375

And how did we do on the test data?

In [3]:
model.evaluate(test_images, test_labels)

313/313 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.9904 - loss: 0.0362


[0.029544537886977196, 0.9926999807357788]
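
As a rough check, we can plug the two test accuracies reported by evaluate() into the same address-level reasoning as before. This is a minimal sketch, assuming 8 digits per address and independent per-digit errors; `address_error_rate` is a hypothetical helper written for this illustration.

```python
def address_error_rate(digit_accuracy, n_digits=8):
    """Chance that at least one of n_digits is misread, assuming independent errors."""
    return 1.0 - digit_accuracy ** n_digits

dense_acc = 0.9813  # test accuracy reported for the dense model
cnn_acc = 0.9927    # test accuracy reported for the convnet

dense_err = address_error_rate(dense_acc)
cnn_err = address_error_rate(cnn_acc)

print(f"Dense model, address-level error: {dense_err:.1%}")
print(f"ConvNet, address-level error:     {cnn_err:.1%}")
```

Under these assumptions, the convnet brings the address-level error rate down to less than half that of the dense model (roughly 6% versus 14%).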