# Classifying images of handwritten digits

So far we have only applied machine learning algorithms to tabular data. In this notebook we will classify unstructured data, where the individual features have no meaningful names. As an example, we will classify handwritten digits in images. Here the features are the pixel values of the images. We will use the MNIST dataset, which consists of 28x28 grayscale images of handwritten digits. It has 60,000 training images and 10,000 test images.

Neural networks are a great choice for such tasks, as they automatically find internal representations of the data that are more meaningful than the raw pixel values. 

 



We start by importing the usual libraries.

In [None]:
import numpy as np 
import pandas as pd

import matplotlib.pyplot as plt 
import seaborn as sns
%matplotlib inline

from sklearn.model_selection import train_test_split

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Reshape, Dense, Activation, Dropout
from tensorflow.keras.optimizers import Adam, RMSprop


In [None]:
import tensorflow as tf
tf.random.set_seed(123)
np.random.seed(123)

## Loading the data

MNIST is a very popular dataset and is available in many libraries. We will use the one available directly in TensorFlow. The dataset is already split into training and test sets. We create a separate validation set by splitting the training set.

In [None]:
# Load data:
mnist = tf.keras.datasets.mnist
(train_val_images, train_val_labels), (test_images, test_labels) = mnist.load_data()

# Split into training / validation
train_images, val_images, train_labels, val_labels = train_test_split(train_val_images, train_val_labels,
                                                                      test_size=0.20, random_state=42)

Now, let's take a look at the data we loaded. The first important thing to check is the shape of the data. We can see that the training data has 48,000 images of 28x28 pixels. The labels are integers from 0 to 9.

In [None]:
print("the shape of the training images:", train_images.shape)
print("the shape of the training labels:", train_labels.shape)
print("labels ", np.unique(train_labels))

Images are best looked at visually instead of just looking at the pixel values. We write a function to plot the image. The function also directly prints the label of the image.

In [None]:
import matplotlib.pyplot as plt

def show_image(i, images, labels):
  """Plot image i with a grid along the pixel borders and its label as title."""
  image = images[i,:,:]
  label = labels[i]
  # we use the color-map (cm) binary in order to see the numbers as
  # black-on-white and not white-on-black.
  plt.imshow(image, cmap=plt.cm.binary)
  # draw grid lines on the pixel borders and hide the tick labels
  plt.grid(True)
  plt.gca().set_xticks(np.arange(-0.5, image.shape[1], 1))
  plt.gca().set_yticks(np.arange(-0.5, image.shape[0], 1))
  plt.gca().set_xticklabels([])
  plt.gca().set_yticklabels([])
  plt.title('the image has the label ' + str(label))

show_image(9, train_images, train_labels) # show the training image at index 9

When we look at the values of the pixels, we see that they are in the range of 0 to 255. We will scale the values to be between 0 and 1. Controlling the scale of the input data is important for neural networks as it helps the optimization algorithm to converge faster.

In [None]:
print("range of pixel values: ", train_images.min(), train_images.max())


In [None]:

# Scale image data:
train_images = train_images / 255.0
val_images = val_images / 255.0
test_images = test_images / 255.0
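
To see what the division does to a single image, here is a minimal sketch on a tiny made-up array. The name `toy_image` and its values are illustrative assumptions, not part of the dataset. It shows that the unsigned 8-bit pixel values become floating-point numbers between 0 and 1.

```python
import numpy as np

# A made-up "image" with the smallest, a middle and the largest pixel value.
toy_image = np.array([[0, 128, 255]], dtype=np.uint8)

# Same operation as in the cell above: divide by the maximum pixel value.
toy_scaled = toy_image / 255.0

print("dtype before:", toy_image.dtype)
print("dtype after: ", toy_scaled.dtype)
print("range after: ", toy_scaled.min(), toy_scaled.max())
```

If memory becomes a concern, one option is to convert the result with `.astype("float32")`, since the division produces 64-bit floats by default.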

## The neural network 

Next we build a neural network. We will use a simple network with one hidden layer. The input layer has 784 neurons, one for each pixel of the flattened 28x28 image, the hidden layer has 100 neurons, and the output layer has 10 neurons, one for each of the 10 classes. The activation function is ReLU for the hidden layer and softmax for the output layer, as we would like each output neuron to represent the probability of the image belonging to that class.

In [None]:
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape = (28, 28)), # Images are 28x28 pixels 
    tf.keras.layers.Flatten(), # Flatten the images to a 1D vector
    tf.keras.layers.Dense(100, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])
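
To get a feeling for the size of this network, the following sketch counts its trainable parameters by hand. The helper `dense_param_count` is a hypothetical function written for this illustration and is not part of Keras. It assumes fully connected layers with one bias per neuron.

```python
# Hypothetical helper (not a Keras function): number of weights and biases
# in a fully connected layer with n_in inputs and n_out neurons.
def dense_param_count(n_in, n_out):
    return n_in * n_out + n_out  # one weight per connection plus one bias per neuron

n_pixels = 28 * 28  # 784 inputs after flattening the image

hidden_params = dense_param_count(n_pixels, 100)  # input -> hidden layer
output_params = dense_param_count(100, 10)        # hidden -> output layer
total_params = hidden_params + output_params

print("hidden layer parameters:", hidden_params)
print("output layer parameters:", output_params)
print("total parameters:", total_params)
```

Calling `model.summary()` should list the same numbers per layer, which is a quick way to check that the model was built as intended.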

Let's configure the model for training. We will use the Adam optimizer and the `sparse_categorical_crossentropy` loss function, which expects integer class labels (0 to 9) rather than one-hot encoded vectors. We will also monitor the accuracy of the model.

In [None]:
model.compile(
  optimizer="adam",
  loss="sparse_categorical_crossentropy",
  metrics=["accuracy"]
)
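
The loss used above can be illustrated with a few lines of NumPy. This is a minimal sketch of the idea and not the Keras implementation, which may differ in numerical details. The function name `sparse_categorical_crossentropy_np` and the toy values are assumptions made for this example. For each sample, the loss is the negative logarithm of the probability that the model assigns to the true class.

```python
import numpy as np

def sparse_categorical_crossentropy_np(probs, labels):
    """Mean cross-entropy for integer labels, given per-class probabilities."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    # For every sample, pick the probability assigned to its true class.
    true_class_probs = probs[np.arange(len(labels)), labels]
    return -np.mean(np.log(true_class_probs))

# Two made-up samples with three classes each (rows sum to 1).
toy_probs = np.array([
    [0.90, 0.05, 0.05],  # confident and correct, true class is 0
    [0.20, 0.30, 0.50],  # unsure and wrong, true class is 1
])
toy_labels = np.array([0, 1])

toy_loss = sparse_categorical_crossentropy_np(toy_probs, toy_labels)
print("toy loss:", toy_loss)
```

The second sample contributes most of the loss because the model gives its true class a probability of only 0.3. A perfect prediction (probability 1 for the true class) would contribute zero.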

In [None]:
history = model.fit(train_images, 
                    train_labels, 
                    epochs=5, 
                    batch_size=128, 
                    validation_data=(val_images, val_labels)
                    )

In [None]:
plt.plot(history.history["loss"], label="loss")
plt.plot(history.history["val_loss"], label="val_loss")
plt.legend()

Let's look at the accuracy. We see that the accuracy is around 97% on the training set and 96% on the validation set. This is a good accuracy for a simple model.

In [None]:
print("train accuracy: ", history.history["accuracy"][-1])
print("validation accuracy: ", history.history["val_accuracy"][-1])

## Evaluating and predicting

After the network has been trained, we want to examine how well our model can classify new digits that were not seen during training.

To do this, we will first look at how we can use our model and what it returns for an image from the test data.

In [None]:
predictions = model.predict(test_images)
print(predictions.shape)
print('Prediction for first image ',predictions[0,:])

Which class has the highest probability? We can use the `argmax` function to find out.

In [None]:
predicted = predictions.argmax(axis=1)
print('Predicted label', predicted[0])

In [None]:
show_image(0, test_images, test_labels)

### Exercise

- Show the prediction of the class of the first 20 images in the test set and compare it with the actual class.
- Which is the first image that is not classified correctly?
- Visualize this image using the `show_image` function.
- Plot a confusion matrix for the test set. Which digits are often confused with each other?
- Evaluate the model on the test set using the model's `evaluate` method. Print the accuracy.
- Train the model for more epochs.
- Expand the model and add more layers. Do the errors in the test set decrease with more layers?
- Change the number of neurons in the layers. What happens when you use more neurons?
- Keep the total number of neurons constant and change the number of layers. What happens when you use more layers? Is it better to have more neurons in one layer or more layers with fewer neurons?
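
As a hint for the first exercises, the sketch below shows on made-up labels how misclassified samples can be located and how a confusion matrix is counted. The arrays `toy_true` and `toy_pred` are illustrative assumptions. For the exercise, you would use `test_labels` and the `predicted` array from above instead.

```python
import numpy as np

# Made-up true and predicted labels for six samples and three classes.
toy_true = np.array([0, 1, 2, 2, 1, 0])
toy_pred = np.array([0, 2, 2, 2, 1, 1])

# Indices of all samples where the prediction differs from the true label.
wrong = np.flatnonzero(toy_true != toy_pred)
print("misclassified indices:", wrong)
print("first misclassified index:", wrong[0])

# Confusion matrix: rows are the true classes, columns the predicted classes.
n_classes = 3
cm = np.zeros((n_classes, n_classes), dtype=int)
np.add.at(cm, (toy_true, toy_pred), 1)  # add 1 for every (true, predicted) pair
print(cm)
```

The off-diagonal entries count the errors. `sklearn.metrics.confusion_matrix(y_true, y_pred)` performs the same counting, and `sns.heatmap` is one way to plot the resulting matrix.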