Image Classification with the MNIST Dataset
In this section we will do the "Hello World" of deep learning: training a deep learning model to correctly classify hand-written digits.

Objectives
Understand how deep learning can solve problems traditional programming methods cannot
Learn about the MNIST handwritten digits dataset
Use the Keras API to load the MNIST dataset and prepare it for training
Create a simple neural network to perform image classification
Train the neural network using the prepped MNIST dataset
Observe the performance of the trained neural network
The Problem: Image Classification
In traditional programming, the programmer is able to articulate rules and conditions in their code that their program can then use to act in the correct way. This approach continues to work exceptionally well for a huge variety of problems.

Image classification, which asks a program to assign an image it has never seen before to its correct class, is nearly impossible to solve with traditional programming techniques. How could a programmer possibly define the rules and conditions to correctly classify a huge variety of images, especially taking into account images that they have never seen?

The Solution: Deep Learning
Deep learning excels at pattern recognition by trial and error. By training a deep neural network with sufficient data, and providing the network with feedback on its performance via training, the network can identify, through a huge amount of iteration, its own set of conditions by which it can act in the correct way.

The MNIST Dataset
In the history of deep learning, the accurate image classification of the MNIST dataset, a collection of 70,000 grayscale images of handwritten digits from 0 to 9, was a major development. While today the problem is considered trivial, doing image classification with MNIST has become a kind of "Hello World" for deep learning.

Here are 40 of the images included in the MNIST dataset:

Image
Training and Validation Data and Labels
When working with images for deep learning, we need both the images themselves, usually denoted as X, and the correct labels for these images, usually denoted as Y. Furthermore, we need one set of X and Y values for training the model, and a separate set of X and Y values for validating the performance of the model after it has been trained. Therefore, we need 4 segments of data for the MNIST dataset:

x_train: Images used for training the neural network
y_train: Correct labels for the x_train images, used to evaluate the model's predictions during training
x_valid: Images set aside for validating the performance of the model after it has been trained
y_valid: Correct labels for the x_valid images, used to evaluate the model's predictions after it has been trained
The process of preparing data for analysis is called Data Engineering. To learn more about the differences between training data and validation data (as well as test data), check out this article by Jason Brownlee.

Loading the Data Into Memory (with Keras)
There are many deep learning frameworks, each with its own merits. In this workshop we will be working with TensorFlow 2, and specifically with the Keras API. Keras has many useful built-in functions designed for computer vision tasks. It is also a legitimate choice for deep learning in a professional setting due to its readability and efficiency, though it is not alone in this regard, and it is worth investigating a variety of frameworks when beginning a deep learning project.

One of the many helpful features that Keras provides is a set of modules containing helper methods for many common datasets, including MNIST.

We will begin by loading the Keras dataset module for MNIST:

In [None]:
from tensorflow.keras.datasets import mnist

With the mnist module, we can easily load the MNIST data, already partitioned into images and labels for both training and validation:

In [None]:
# the data, split between train and validation sets
(x_train, y_train), (x_valid, y_valid) = mnist.load_data()

Exploring the MNIST Data
We stated above that the MNIST dataset contains 70,000 grayscale images of handwritten digits. By executing the following cells, we can see that Keras has partitioned 60,000 of these images for training and 10,000 for validation (after training), and that each image is a 2D array with the dimensions 28x28:

In [None]:
x_train.shape

In [None]:
x_valid.shape
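As a minimal sketch of how to read these shapes, the snippet below unpacks a shape tuple into an image count, a height, and a width, and checks that there is exactly one label per image. The helper name `summarize_split` and the zero-filled `*_demo` arrays are hypothetical stand-ins that only mimic the shapes described above. They are not part of Keras or of the real dataset.

```python
import numpy as np

def summarize_split(images, labels):
    """Return the image count and per-image size, checking one label per image."""
    count, height, width = images.shape
    assert labels.shape == (count,), "expected exactly one label per image"
    return {"count": count, "height": height, "width": width}

# Zero-filled stand-ins with the shapes described above (not real MNIST data).
x_train_demo = np.zeros((60000, 28, 28), dtype=np.uint8)
y_train_demo = np.zeros(60000, dtype=np.uint8)
x_valid_demo = np.zeros((10000, 28, 28), dtype=np.uint8)
y_valid_demo = np.zeros(10000, dtype=np.uint8)

train_summary = summarize_split(x_train_demo, y_train_demo)
valid_summary = summarize_split(x_valid_demo, y_valid_demo)
print(train_summary)
print(valid_summary)
```

With the real data, the same helper could be called on `x_train, y_train` and `x_valid, y_valid`.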

Furthermore, we can see that these 28x28 images are represented as a collection of unsigned 8-bit integer values between 0 and 255, each value corresponding to a pixel's grayscale value, where 0 is black, 255 is white, and all other values are in between:

In [None]:
x_train.dtype

In [None]:
x_train.min()

In [None]:
x_train.max()

In [None]:
x_train[0]

Using Matplotlib, we can render one of these grayscale images in our dataset:

In [None]:
import matplotlib.pyplot as plt

image = x_train[0]
plt.imshow(image, cmap='gray')

In this way we can now see that this is a 28x28 pixel image of a 5. Or is it a 3? The answer is in the y_train data, which contains correct labels for the data. Let's take a look:

In [None]:
y_train[0]

Preparing the Data for Training
In deep learning, data commonly needs to be transformed to put it in the ideal state for training. For this particular image classification problem, there are 3 tasks we should perform with the data in preparation for training:

Flatten the image data, to simplify the image input into the model
Normalize the image data, to make the image input values easier to work with for the model
Categorize the labels, to make the label values easier to work with for the model
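The three tasks above can be sketched end to end on a handful of made-up images. This is an illustration under stated assumptions, not the notebook's own method: the `images` and `labels` arrays are random stand-ins for the real data, and the categorization step uses a plain NumPy identity-matrix lookup, which may differ from the helper used for the real labels.

```python
import numpy as np

# Stand-in data with MNIST-like shapes; the real arrays come from mnist.load_data().
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)
labels = np.array([5, 0, 4, 1, 9])

# 1. Flatten: each 28x28 image becomes one row of 784 values.
flat = images.reshape(images.shape[0], -1)

# 2. Normalize: scale the 0-255 integers to floats between 0 and 1.
normalized = flat / 255

# 3. Categorize: one row per label, with a 1 in the column for that digit.
num_classes = 10
one_hot = np.eye(num_classes)[labels]

print(flat.shape, normalized.dtype, one_hot.shape)
```

Each row of `one_hot` contains a single 1, so the label 5 becomes a row of ten values with a 1 at index 5 and 0 everywhere else.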
Flattening the Image Data
Though it's possible for a deep learning model to accept a 2-dimensional image (in our case 28x28 pixels), we're going to simplify things to start and reshape each image into a single array of 784 contiguous pixels (note: 28x28 = 784). This is also called flattening the image.

Here we accomplish this using the helper method reshape:

In [None]:
x_train = x_train.reshape(60000, 784)
x_valid = x_valid.reshape(10000, 784)

We can confirm that the image data has been reshaped and is now a collection of 1D arrays containing 784 pixel values each:

In [None]:
x_train.shape

In [None]:
x_train[0]
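The cell above hard-codes the number of images (60000 and 10000). As a hedged alternative, NumPy's `reshape` accepts `-1` for one dimension and infers its size, which keeps the code working if the number of images changes. The `images` array below is a zero-filled stand-in with an arbitrary count of 12, not real MNIST data.

```python
import numpy as np

# Stand-in for x_train with a smaller, arbitrary number of images.
images = np.zeros((12, 28, 28), dtype=np.uint8)

# -1 asks NumPy to infer that dimension (28 * 28 = 784) from the array's size.
flat = images.reshape(images.shape[0], -1)
print(flat.shape)  # (12, 784)

# Reshaping back recovers the original 2D layout, so no pixel data is lost.
restored = flat.reshape(-1, 28, 28)
print(np.array_equal(restored, images))  # True
```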

Normalizing the Image Data
Deep learning models are better at dealing with floating point numbers between 0 and 1 (more on this topic later). Converting integer values to floating point values between 0 and 1 is called normalization, and the simple approach we will take here is to divide all the pixel values (which, if you recall, are between 0 and 255) by 255: