__How to Create a Neural Network for Handwritten Digit Recognition Using TensorFlow__

Neural networks are computational models inspired by the human brain's structure and function. They are used in machine learning and artificial intelligence to recognize patterns, make decisions, and solve complex problems. Here's a detailed breakdown of neural networks:

__Basic Concepts__

1. __Neurons:__
   - The basic units of neural networks, analogous to neurons in the brain.
   - Each neuron receives one or more inputs, processes them, and outputs a result.

2. __Layers:__
   - __Input Layer:__ The first layer that receives the initial data (e.g., pixel values of an image).
   - __Hidden Layers:__ Intermediate layers that perform various transformations and computations on the input data.
   - __Output Layer:__ The final layer that produces the network's output (e.g., classifying an image as a digit from 0 to 9).

3. __Weights and Biases:__
   - __Weights:__ Parameters that adjust the input's importance in the neuron’s computation.
   - __Biases:__ Additional parameters that adjust the output along with the weights.

4. __Activation Functions:__
   - Functions that introduce non-linearity into the network, enabling it to learn complex patterns.
   - Common activation functions include ReLU (Rectified Linear Unit), sigmoid, and tanh.
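
As a minimal sketch of the concepts above, the following NumPy snippet computes the output of a single neuron: a weighted sum of the inputs plus a bias, passed through an activation function. The helper names and the input, weight, and bias values are made up purely for illustration.

```python
import numpy as np

def relu(z):
    """Rectified Linear Unit: zero for negative inputs, identity otherwise."""
    return np.maximum(0.0, z)

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, followed by an activation."""
    return activation(np.dot(inputs, weights) + bias)

# Illustrative values only (not taken from any trained network)
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.4, 0.3, -0.2])
b = 0.1

relu_out = neuron(x, w, b, relu)        # weighted sum is -0.4, so ReLU gives 0.0
sigmoid_out = neuron(x, w, b, sigmoid)  # sigmoid(-0.4), a value just above 0.4
tanh_out = neuron(x, w, b, np.tanh)     # tanh(-0.4), a negative value
```

The same weighted sum produces three different outputs, which is the sense in which the activation function shapes what a neuron passes on.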

__How Neural Networks Work:__

1. __Forward Propagation:__
   - Data is passed from the input layer through the hidden layers to the output layer.
   - Each neuron computes a weighted sum of its inputs, adds a bias, and applies an activation function to produce its output.

2. __Loss Function:__
   - A function that measures the difference between the network's predictions and the actual target values.
   - Common loss functions include Mean Squared Error (MSE) for regression and Cross-Entropy Loss for classification.

3. __Backward Propagation:__
   - The process of adjusting the network's weights and biases to minimize the loss function.
   - Uses algorithms like gradient descent to update the parameters in the direction that reduces the error.

__Types of Neural Networks:__

1. __Feedforward Neural Networks:__
   - The simplest type where connections between the nodes do not form cycles.
   - Typically used for tasks like image classification and regression.

2. __Convolutional Neural Networks (CNNs):__
   - Specialized for processing grid-like data such as images.
   - Uses convolutional layers to automatically and adaptively learn spatial hierarchies of features.

3. __Recurrent Neural Networks (RNNs):__
   - Designed for sequential data processing, such as time series or natural language.
   - Has connections that form directed cycles, allowing information to persist.

4. __Generative Adversarial Networks (GANs):__
   - Consists of two networks (a generator and a discriminator) that compete against each other to generate new, synthetic data samples.

__Applications:__

- __Image and Speech Recognition:__ Identifying objects in images or transcribing spoken words to text.
- __Natural Language Processing:__ Tasks like sentiment analysis, language translation, and chatbots.
- __Autonomous Vehicles:__ Enabling self-driving cars to recognize and react to their environment.
- __Healthcare:__ Assisting in diagnosing diseases and personalizing treatments based on patient data.

Neural networks have revolutionized many fields by enabling computers to perform complex tasks with high accuracy. As research and technology continue to advance, their capabilities and applications are expanding rapidly.

In this guide, I will implement a small subsection of object recognition: digit recognition. Using TensorFlow, an open-source Python library developed by the Google Brain team for deep learning research, I will take hand-drawn images of the digits 0-9 and build and train a neural network to recognize and predict the correct label for the digit displayed.

__Importing the MNIST Dataset__

The dataset I will be using in this guide is called the __MNIST__ dataset, and it is a classic in the machine learning community. This dataset is made up of images of handwritten digits, 28*28 pixels in size.

I'll create a Python program to work with this dataset. 

In [28]:
import tensorflow as tf
from sklearn.model_selection import train_test_split
import numpy as np

# Load your MNIST data
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()

# Normalize the images to [0, 1]
train_images = train_images.astype('float32') / 255.0
test_images = test_images.astype('float32') / 255.0

# Reshape images to add a channel dimension (required for some models)
train_images = train_images[..., np.newaxis]
test_images = test_images[..., np.newaxis]

# Define the number of examples for each dataset
n_train = 55000
n_validation = 5000
n_test = 10000  # MNIST test set has 10,000 examples

# Split the training data into training and validation sets
train_images, val_images, train_labels, val_labels = train_test_split(
    train_images, train_labels, test_size=n_validation, train_size=n_train, stratify=train_labels
)

# Create TensorFlow datasets
batch_size = 32

train_dataset = tf.data.Dataset.from_tensor_slices((train_images, train_labels))
train_dataset = train_dataset.shuffle(n_train).batch(batch_size).prefetch(tf.data.AUTOTUNE)

validation_dataset = tf.data.Dataset.from_tensor_slices((val_images, val_labels))
validation_dataset = validation_dataset.batch(batch_size).prefetch(tf.data.AUTOTUNE)

test_dataset = tf.data.Dataset.from_tensor_slices((test_images, test_labels))
test_dataset = test_dataset.batch(batch_size).prefetch(tf.data.AUTOTUNE)

# Print the number of examples in each set (measured from the arrays themselves)
print(f"Number of training examples: {len(train_images)}")
print(f"Number of validation examples: {len(val_images)}")
print(f"Number of test examples: {len(test_images)}")

# Print dataset shapes to verify
for image, label in train_dataset.take(1):
    print(f"Train - Image shape: {image.shape}, Label shape: {label.shape}")

for image, label in validation_dataset.take(1):
    print(f"Validation - Image shape: {image.shape}, Label shape: {label.shape}")

for image, label in test_dataset.take(1):
    print(f"Test - Image shape: {image.shape}, Label shape: {label.shape}")





Number of training examples: 55000
Number of validation examples: 5000
Number of test examples: 10000
Train - Image shape: (32, 28, 28, 1), Label shape: (32,)
Validation - Image shape: (32, 28, 28, 1), Label shape: (32,)
Test - Image shape: (32, 28, 28, 1), Label shape: (32,)


The labels (the actual digit drawn, e.g. "3") are loaded as plain integers, which is why each batch of labels above has shape (32,). Later in this guide we convert them to one-hot encoding with __to_categorical__. One-hot encoding uses a vector of binary values to represent numeric or categorical values. As our labels are for the digits 0-9, the vector contains ten values, one for each possible digit. One of these values is set to 1, to represent the digit at that index of the vector, and the rest are set to 0. For example, the digit 3 is represented using the vector [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]: the value at index 3 is 1, so the vector represents the digit 3.
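
A minimal NumPy sketch of one-hot encoding follows. The `one_hot` helper is a hypothetical function written for illustration; Keras offers `to_categorical` for the same purpose.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Turn integer class labels into one-hot row vectors."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

encoded = one_hot([3, 0, 9])
print(encoded[0])  # the row for digit 3 has its single 1 at index 3

# Taking the index of the largest value recovers the original labels
decoded = np.argmax(encoded, axis=1)
```

Because the encoding is reversible with `argmax`, the same trick is used later to turn a vector of predicted probabilities back into a digit.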

To represent the actual images themselves for a fully connected network, the 28×28 pixels are later flattened into a 1D vector of 784 values. In the raw dataset, each of the 784 pixels is stored as a value between 0 and 255, which the code above rescales to the range [0, 1]. This value determines the grayscale of the pixel, as our images are presented in black and white only. Thinking of the digit as dark ink on white paper, a black pixel is represented by 255 and a white pixel by 0, with the various shades of gray somewhere in between.
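
The flattening and rescaling can be sketched with NumPy alone. The image below is filled with random values as a stand-in, not a real MNIST digit.

```python
import numpy as np

# A stand-in for one 28x28 grayscale image with integer values in 0..255
# (random values for illustration, not a real MNIST digit)
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(28, 28), dtype=np.uint8)

flat = image.reshape(-1)                  # 28 * 28 = 784 values in one vector
scaled = flat.astype(np.float32) / 255.0  # rescale to the range [0, 1]

print(flat.shape)
```

Flattening only changes the layout: reshaping the vector back to 28 by 28 returns the original image.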

The printed output confirms the size of each subset: the dataset has been split into 55,000 images for training, 5,000 for validation, and 10,000 for testing.

Now that we have our data imported, it's time to think about the neural network.

__Defining the Neural Network Architecture__

The architecture of the neural network refers to elements such as the number of layers in the network, the number of units in each layer, and how the units are connected between layers. As neural networks are loosely inspired by the workings of the human brain, here the term unit is used to represent what we would biologically think of as a neuron. Like neurons passing signals around the brain, units take some values from previous units as input, perform a computation, and then pass on the new value as output to other units. These units are layered to form the network, starting at a minimum with one layer to output values. The term hidden layer is used for all of the layers in between the input and output layers, i.e. those "hidden" from the real world.

Different architectures can yield dramatically different results, as the performance can be thought of as a function of the architecture among other things, such as the parameters, the data, and the duration of training.

I'll add the following lines of code to my file to store the number of units per layer in global variables. This allows us to alter the network architecture in one place, and at the end of the guide we can test for ourselves how different numbers of layers and units impact the results of our model:

__Define the architecture:__

- Input layer: 784 neurons (28x28 pixels, flattened).
- First hidden layer: 512 neurons.
- Second hidden layer: 256 neurons.
- Third hidden layer: 128 neurons.
- Output layer: 10 neurons (for the 10 digit classes).

In [29]:
n_input = 784  # input layer (28x28 pixels)
n_hidden1 = 512  # 1st hidden layer
n_hidden2 = 256  # 2nd hidden layer
n_hidden3 = 128  # 3rd hidden layer
n_output = 10  # output layer (0-9 digits)

The term "deep neural network" relates to the number of hidden layers, with "shallow" usually meaning just one hidden layer, and "deep" referring to multiple hidden layers. Given enough training data, a shallow neural network with a sufficient number of units should theoretically be able to represent any function that a deep neural network can. But it is often more computationally efficient to use a smaller deep neural network to achieve the same task that would require a shallow network with exponentially more hidden units. Shallow neural networks also often encounter overfitting, where the network essentially memorizes the training data that it has seen and is not able to generalize the knowledge to new data. This is why deep neural networks are more commonly used: the multiple layers between the raw input data and the output label allow the network to learn features at various levels of abstraction, making the network itself better able to generalize.

Other elements of the neural network that need to be defined here are the hyperparameters. Unlike the parameters that will get updated during training, these values are set initially and remain constant throughout the process. I will add the following variables and values:

In [30]:
learning_rate = 1e-4
n_iterations = 1000
batch_size = 128
dropout = 0.5

The learning rate represents how much the parameters will adjust at each step of the learning process. These adjustments are a key component of training: after each pass through the network we tune the weights slightly to try and reduce the loss. Larger learning rates can converge faster, but also have the potential to overshoot the optimal values as they are updated. The number of iterations refers to how many times we go through the training step, and the batch size refers to how many training examples we are using at each step. The __dropout__ variable represents a threshold at which we eliminate some units at random. We will be using __dropout__ in our final hidden layer to give each unit a 50% chance of being eliminated at every training step. This helps prevent overfitting.
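
The dropout step described above can be sketched in NumPy. This is an illustrative version of the common "inverted dropout" formulation, in which surviving units are scaled up during training so nothing needs to change at test time. The `dropout` function and its arguments are hypothetical names, not part of TensorFlow.

```python
import numpy as np

def dropout(activations, rate, training, rng):
    """Zero each unit with probability `rate` during training.

    Surviving units are scaled by 1 / (1 - rate) so the expected
    activation stays the same; at test time the input passes through.
    """
    if not training or rate == 0.0:
        return activations
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(42)
hidden = np.ones((4, 8))  # illustrative activations for 4 examples, 8 units

train_out = dropout(hidden, rate=0.5, training=True, rng=rng)
test_out = dropout(hidden, rate=0.5, training=False, rng=rng)
```

With a rate of 0.5, every value in `train_out` is either 0 (dropped) or 2 (kept and rescaled), while `test_out` is identical to the input.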

I have defined the architecture of our neural network, and the hyperparameters that impact the learning process. The next step is to build the network as a TensorFlow graph.

__Building the TensorFlow Graph__

To build our network, we will set it up as a computational graph for TensorFlow to execute. The core concept of TensorFlow is the tensor, a data structure similar to an array or list. Tensors are initialized, manipulated as they are passed through the graph, and updated through the learning process. I'll start by defining a symbolic input with __tf.keras.Input__, which stands in for the values we'll feed in later, and connecting it to a minimal model.

In [31]:
# Define the input and output dimensions
n_input = 784  # Input dimension (28x28 pixels, flattened)
n_output = 10  # Output dimension (10 digit classes)
# Smaller hidden layers than before; these values are the ones used from here on
n_hidden1 = 256  # Number of neurons in the first hidden layer
n_hidden2 = 128  # Number of neurons in the second hidden layer
n_hidden3 = 64  # Number of neurons in the third hidden layer

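# Define the input layer
inputs = tf.keras.Input(shape=(n_input,))

# Define a dense output layer with a softmax over the 10 classes
dense_output = tf.keras.layers.Dense(n_output, activation='softmax')(inputs)

# Define a dropout layer (only active during training). It is attached to the
# output here purely to show the API; the full model later in this guide
# applies dropout before the output layer instead.
dropout_output = tf.keras.layers.Dropout(rate=0.5)(dense_output)

# Create the model
model = tf.keras.Model(inputs=inputs, outputs=dropout_output)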

# Compile the model
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Print the model summary
model.summary()

Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None, 784)]             0         
_________________________________________________________________
dense (Dense)                (None, 10)                7850      
_________________________________________________________________
dropout (Dropout)            (None, 10)                0         
Total params: 7,850
Trainable params: 7,850
Non-trainable params: 0
_________________________________________________________________


The only parameter that needs to be specified when declaring the input is the shape of the data we will be feeding in. The input has a shape of [None, 784], where None represents any amount, as we will be feeding in an undefined number of 784-pixel images. The output has a shape of [None, 10], as it holds an undefined number of label outputs, with 10 possible classes. Dropout is controlled by the __rate__ argument of the __Dropout__ layer: Keras applies it only during training (here with a rate of 0.5) and switches it off automatically during evaluation and prediction, so no separate keep-probability input is needed.

The parameters that the network will update in the training process are the __weight__ and __bias__ values, so for these we need to set an initial value rather than an empty placeholder. These values are essentially where the network does its learning, as they are used in the activation functions of the neurons, representing the strength of the connections between units.

Since the values are optimized during training, we could set them to zero for now. But the initial value actually has a significant impact on the final accuracy of the model. We'll use random values from a truncated normal distribution for the weights. We want them to be close to zero, so they can adjust in either a positive or negative direction, and slightly different, so they generate different errors. This will ensure that the model learns something useful.

In [32]:
weights = {
    'w1': tf.Variable(tf.random.truncated_normal([n_input, n_hidden1], stddev=0.1)),
    'w2': tf.Variable(tf.random.truncated_normal([n_hidden1, n_hidden2], stddev=0.1)),
    'w3': tf.Variable(tf.random.truncated_normal([n_hidden2, n_hidden3], stddev=0.1)),
    'out': tf.Variable(tf.random.truncated_normal([n_hidden3, n_output], stddev=0.1))
}

# Print the weights to verify (optional)
for key, value in weights.items():
    print(f"{key}: {value}")

w1: <tf.Variable 'Variable:0' shape=(784, 256) dtype=float32, numpy=
array([[ 0.10912701, -0.06028754, -0.08777511, ...,  0.16331856,
         0.02426625, -0.14737217],
       [ 0.08802983,  0.01761909,  0.00662895, ...,  0.06846601,
         0.03951794, -0.00520563],
       [ 0.06956836, -0.09908468,  0.07674159, ...,  0.10094736,
        -0.08234289, -0.09580178],
       ...,
       [-0.0915952 , -0.03328989, -0.12329942, ..., -0.09815805,
        -0.03031763,  0.04160902],
       [-0.06959183, -0.00157581,  0.0276113 , ..., -0.03845313,
        -0.02102679, -0.03816045],
       [ 0.02080665,  0.12605421, -0.03502177, ..., -0.04341035,
        -0.05843858, -0.03363249]], dtype=float32)>
w2: <tf.Variable 'Variable:0' shape=(256, 128) dtype=float32, numpy=
array([[ 7.9279914e-02,  2.6675394e-02, -7.4648432e-02, ...,
        -8.3644018e-03, -1.4972430e-01,  6.9708094e-02],
       [ 1.1843636e-01, -8.3254322e-02,  1.7807755e-01, ...,
        -1.3897742e-01, -3.5175565e-04, -2.0673510e-02

In [33]:
biases = {
    'b1': tf.Variable(tf.constant(0.1, shape=[n_hidden1])),
    'b2': tf.Variable(tf.constant(0.1, shape=[n_hidden2])),
    'b3': tf.Variable(tf.constant(0.1, shape=[n_hidden3])),
    'out': tf.Variable(tf.constant(0.1, shape=[n_output]))
}

Next, I will set up the layers of the network by defining the operations that will manipulate the tensors. 

In [34]:
import tensorflow as tf
from tensorflow.keras import layers
# Define a symbolic input for batches of flattened 784-pixel images
X = tf.keras.Input(shape=(n_input,))

# Define the neural network layers
layer_1 = tf.add(tf.matmul(X, weights['w1']), biases['b1'])
layer_2 = tf.add(tf.matmul(layer_1, weights['w2']), biases['b2'])
layer_3 = tf.add(tf.matmul(layer_2, weights['w3']), biases['b3'])

layer_drop = layers.Dropout(rate=0.5)(layer_3)
output_layer = tf.add(tf.matmul(layer_drop, weights['out']), biases['out'])

The following Variables were used a Lambda layer's call (tf.linalg.matmul), but
are not present in its tracked objects:
  <tf.Variable 'Variable:0' shape=(784, 256) dtype=float32>
It is possible that this is intended behavior, but it is more likely
an omission. This is a strong indication that this layer should be
formulated as a subclassed Layer rather than a Lambda layer.
The following Variables were used a Lambda layer's call (tf.math.add), but
are not present in its tracked objects:
  <tf.Variable 'Variable:0' shape=(256,) dtype=float32>
It is possible that this is intended behavior, but it is more likely
an omission. This is a strong indication that this layer should be
formulated as a subclassed Layer rather than a Lambda layer.
The following Variables were used a Lambda layer's call (tf.linalg.matmul_1), but
are not present in its tracked objects:
  <tf.Variable 'Variable:0' shape=(256, 128) dtype=float32>
It is possible that this is intended behavior, but it is more likely
an o

For the biases defined above, we use a small constant value (0.1) to ensure that the units activate in the initial stages and therefore contribute to the propagation. The weights and bias tensors are stored in dictionary objects for ease of access.

Each hidden layer will execute matrix multiplication on the previous layer's outputs and the current layer's weights, and add the bias to these values. At the last hidden layer, we will apply a dropout operation using our value of 0.5.
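
To make the shapes concrete, here is a rough NumPy sketch of the same chain of matrix multiplications and bias additions. It assumes the layer sizes 784, 256, 128, 64 and 10, uses a plain normal distribution rather than a truncated one, and leaves out the dropout step. The batch of 32 random "images" is a stand-in for real data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_input, n_hidden1, n_hidden2, n_hidden3, n_output = 784, 256, 128, 64, 10
sizes = [n_input, n_hidden1, n_hidden2, n_hidden3, n_output]

# Small random weights and constant biases of 0.1, mirroring the setup above
# (a plain normal distribution is used here instead of a truncated one)
weights = [rng.normal(0.0, 0.1, size=(a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
biases = [np.full(b, 0.1) for b in sizes[1:]]

batch = rng.random((32, n_input))  # stand-in for 32 flattened images

activations = batch
shapes = [activations.shape]
for W, b in zip(weights, biases):
    activations = activations @ W + b  # matrix multiply, then add the bias
    shapes.append(activations.shape)

print(shapes)
```

Each multiplication replaces the second dimension with the size of the next layer, so a batch of 784-value images ends up as a batch of 10 scores, one per digit.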

The final step in building the graph is to define the loss function that we want to optimize. A popular choice of loss function in TensorFlow programs is cross-entropy, also known as log-loss, which quantifies the difference between two probability distributions (the predictions and the labels). A perfect classification would result in a cross-entropy of 0, with the loss completely minimized.
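
As an illustration of how cross-entropy behaves, the sketch below computes it by hand for three made-up predictions of the same label. The `cross_entropy` helper and the small `eps` clipping value are assumptions for this example, not TensorFlow's implementation.

```python
import numpy as np

def cross_entropy(probs, one_hot_labels, eps=1e-12):
    """Mean cross-entropy between predicted probabilities and one-hot labels."""
    probs = np.clip(probs, eps, 1.0)  # avoid taking log(0)
    return float(-np.mean(np.sum(one_hot_labels * np.log(probs), axis=1)))

label = np.array([[0.0, 0.0, 1.0]])         # true class is index 2
confident = np.array([[0.05, 0.05, 0.90]])  # mostly right
unsure = np.array([[0.30, 0.30, 0.40]])     # right, but barely
perfect = np.array([[0.0, 0.0, 1.0]])       # exactly right

loss_confident = cross_entropy(confident, label)  # -ln(0.9), about 0.105
loss_unsure = cross_entropy(unsure, label)        # -ln(0.4), about 0.916
loss_perfect = cross_entropy(perfect, label)      # 0.0
```

The loss shrinks as more probability is placed on the correct class, and reaches 0 only for a perfect prediction.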

We also need to choose the optimization algorithm which will be used to minimize the loss function. Gradient descent optimization is a common method for finding the (local) minimum of a function by taking iterative steps along the gradient in a negative (descending) direction. There are several choices of gradient descent optimization algorithms already implemented in TensorFlow, and in this guide we will be using the __Adam Optimizer__. This extends gradient descent optimization by using momentum to speed up the process through computing an exponentially weighted average of the gradients and using that in the adjustments.
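
For intuition, here is a small sketch of the Adam update rule applied to a one-dimensional toy function. It is not TensorFlow's implementation: the `adam_minimize` helper, the toy objective, and the learning rate of 0.1 are all choices made for this illustration (the guide itself uses 1e-4 on the real network).

```python
import numpy as np

def adam_minimize(grad_fn, x0, learning_rate=0.1, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=500):
    """Minimize a function with Adam, given a function returning its gradient."""
    x = float(x0)
    m = 0.0  # exponentially weighted average of the gradients (momentum)
    v = 0.0  # exponentially weighted average of the squared gradients
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1 ** t)  # correct the bias toward zero
        v_hat = v / (1.0 - beta2 ** t)
        x -= learning_rate * m_hat / (np.sqrt(v_hat) + eps)
    return x

# Toy objective f(x) = (x - 3)^2 with gradient 2 * (x - 3); its minimum is at x = 3
x_min = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
```

Starting from 0, the estimate settles close to the true minimum at 3. Dividing by the running average of squared gradients is what lets Adam adapt the step size for each parameter.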

In [38]:
# Compile the model
from tensorflow.keras import layers, optimizers
model.compile(optimizer=optimizers.Adam(learning_rate=1e-4),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

We've now defined the network and built it out with TensorFlow. The next step is to feed the data through the graph to train it, and then test that it has actually learned something.

The training process involves feeding the training dataset through the graph and optimizing the loss function. Every time the network iterates through another batch of training images, it updates the parameters to reduce the loss in order to more accurately predict the digits shown. The testing process involves running our testing dataset through the trained graph, and keeping track of the number of images that are correctly predicted, so that we can calculate the accuracy.

Before starting the training process, we will define our method of evaluating the accuracy so we can print it out on mini-batches of data while we train. These printed statements will allow us to check that from the first iteration to the last, loss decreases and accuracy increases; they will also allow us to track whether or not we have run enough iterations to reach a consistent and optimal result.

In __correct_pred__, we use the __argmax__ function to find the predicted class and the true class for each image by looking at the network's predictions and the labels, and we use the __equal__ function to return the comparison as a list of __Booleans__. We can then cast this list to floats and calculate the mean to get a total accuracy score.
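
The accuracy calculation can be tried on a handful of made-up values. The probabilities and labels below are illustrative only, with three classes instead of ten to keep the arrays readable.

```python
import numpy as np

# Illustrative predicted probabilities for 4 images and 3 classes
predictions = np.array([
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
])
# One-hot labels: the true classes are 1, 0, 2, 1
labels = np.array([
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [0, 1, 0],
])

correct_pred = np.argmax(predictions, axis=1) == np.argmax(labels, axis=1)
accuracy = correct_pred.astype(np.float32).mean()
print(accuracy)  # 3 of the 4 predictions match, so 0.75
```

Casting the Booleans to floats turns each correct prediction into 1.0 and each miss into 0.0, so the mean is the fraction of correct predictions.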

We are now ready to train the network. We will feed it with our training examples using __model.fit__, and once it is trained, we feed the same model new test examples to determine its accuracy.

In [40]:
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, optimizers
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.datasets import mnist
# Define the dimensions of each layer
n_input = 784       # Example input size, like for MNIST dataset (28x28 images)
n_hidden1 = 256     # Example size of the first hidden layer
n_hidden2 = 128     # Example size of the second hidden layer
n_hidden3 = 64      # Example size of the third hidden layer
n_output = 10       # Example output size, like for MNIST dataset (10 classes)

# Define the neural network model using Keras Sequential API
model = tf.keras.Sequential([
    layers.Input(shape=(n_input,)),
    layers.Dense(n_hidden1, activation='relu', kernel_initializer=tf.initializers.TruncatedNormal(stddev=0.1)),
    layers.Dense(n_hidden2, activation='relu', kernel_initializer=tf.initializers.TruncatedNormal(stddev=0.1)),
    layers.Dense(n_hidden3, activation='relu', kernel_initializer=tf.initializers.TruncatedNormal(stddev=0.1)),
    layers.Dropout(rate=0.5),
    layers.Dense(n_output, activation='softmax', kernel_initializer=tf.initializers.TruncatedNormal(stddev=0.1))
])

# Compile the model
model.compile(optimizer=optimizers.Adam(learning_rate=1e-4),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Placeholder data: random features with random labels, used only to check
# that the pipeline runs end to end. A model cannot learn anything from random
# labels, so expect an accuracy near chance (about 10% for 10 classes).
# train_X: Training features
# train_Y: Training labels (one-hot encoded)
# test_X: Testing features
# test_Y: Testing labels (one-hot encoded)
train_X = np.random.rand(60000, n_input)  # Replace with actual training features
train_Y = to_categorical(np.random.randint(0, n_output, 60000), n_output)  # Replace with actual training labels
test_X = np.random.rand(10000, n_input)  # Replace with actual testing features
test_Y = to_categorical(np.random.randint(0, n_output, 10000), n_output)  # Replace with actual testing labels

# Train the model
num_epochs = 10
batch_size = 100

# Fit the model on the training data
history = model.fit(train_X, train_Y, epochs=num_epochs, batch_size=batch_size, validation_data=(test_X, test_Y))

# Evaluate the model on the test data
test_loss, test_accuracy = model.evaluate(test_X, test_Y)
print(f'Test Accuracy: {test_accuracy:.4f}')

# Making predictions and calculating accuracy
predictions = model.predict(test_X)
predicted_classes = np.argmax(predictions, axis=1)
true_classes = np.argmax(test_Y, axis=1)

# Calculate accuracy using TensorFlow operations
correct_pred = tf.equal(predicted_classes, true_classes)
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

print(f'Accuracy calculated using TensorFlow operations: {accuracy.numpy():.4f}')

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Test Accuracy: 0.0929
Accuracy calculated using TensorFlow operations: 0.0929


The essence of the training process in deep learning is to optimize the loss function. Here we are aiming to minimize the difference between the predicted labels of the images, and the true label of the images. The process involves four steps which are repeated for a set number of iterations:

- Propagate values forward through the network
- Compute the loss
- Propagate values backward through the network
- Update the parameters

At each training step, the parameters are adjusted slightly to try and reduce the loss for the next step. As the learning progresses, we should see a reduction in loss, and eventually we can stop training and use the network as a model for testing our new data.
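
The four steps can be written out by hand for a very small model. The sketch below trains a single softmax layer on synthetic two-dimensional data with plain gradient descent. The data, the learning rate of 0.5, and the 200 steps are arbitrary choices for illustration, and Keras performs the equivalent work inside `model.fit`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: 200 points in 2-D, labeled by which side of a line they fall on
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
Y = np.eye(2)[y]  # one-hot labels

# Zero initialization is acceptable for this single-layer model
W = np.zeros((2, 2))
b = np.zeros(2)
learning_rate = 0.5
losses = []

for step in range(200):
    # 1. Propagate values forward through the network
    logits = X @ W + b
    logits -= logits.max(axis=1, keepdims=True)  # for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

    # 2. Compute the loss (mean cross-entropy)
    losses.append(-np.mean(np.sum(Y * np.log(probs + 1e-12), axis=1)))

    # 3. Propagate values backward: gradients of the loss
    grad_logits = (probs - Y) / len(X)
    grad_W = X.T @ grad_logits
    grad_b = grad_logits.sum(axis=0)

    # 4. Update the parameters
    W -= learning_rate * grad_W
    b -= learning_rate * grad_b

accuracy = np.mean(np.argmax(X @ W + b, axis=1) == y)
```

Comparing `losses[0]` with `losses[-1]` shows the loss falling as training proceeds, which is the same signal to watch for in the Keras training output.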

In [2]:
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, optimizers
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.datasets import mnist

# Load and preprocess the MNIST dataset
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
train_images = train_images.reshape((60000, 28 * 28)).astype('float32') / 255
test_images = test_images.reshape((10000, 28 * 28)).astype('float32') / 255

train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

# Create TensorFlow datasets
batch_size = 100
train_dataset = tf.data.Dataset.from_tensor_slices((train_images, train_labels)).shuffle(60000).batch(batch_size)
test_dataset = tf.data.Dataset.from_tensor_slices((test_images, test_labels)).batch(batch_size)

# Define the neural network model
model = tf.keras.Sequential([
    layers.Dense(256, activation='relu', kernel_initializer=tf.initializers.TruncatedNormal(stddev=0.1)),
    layers.Dense(128, activation='relu', kernel_initializer=tf.initializers.TruncatedNormal(stddev=0.1)),
    layers.Dense(64, activation='relu', kernel_initializer=tf.initializers.TruncatedNormal(stddev=0.1)),
    layers.Dropout(rate=0.5),
    layers.Dense(10, activation='softmax', kernel_initializer=tf.initializers.TruncatedNormal(stddev=0.1))
])

# Compile the model
optimizer = optimizers.Adam(learning_rate=1e-4)
loss_object = tf.keras.losses.CategoricalCrossentropy()
model.compile(optimizer=optimizer, loss=loss_object, metrics=['accuracy'])

# Train the model
epochs = 10
history = model.fit(train_dataset, epochs=epochs, validation_data=test_dataset)

# Evaluate the model on the test set
test_loss, test_accuracy = model.evaluate(test_dataset)
print(f'\nTest accuracy: {test_accuracy:.4f}')


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10

Test accuracy: 0.9654


To try and improve the accuracy of our model, or to learn more about the impact of tuning hyperparameters, we can test the effect of changing the learning rate, the dropout threshold, the batch size, and the number of iterations. We can also change the number of units in our hidden layers, and change the number of hidden layers themselves, to see how different architectures increase or decrease the model accuracy.

To demonstrate that the network is actually recognizing the hand-drawn images, let's test it on a single image of our own. I will be using a sample image downloaded with curl.

In [44]:
from PIL import Image

file_path = 'test_img.png'

try:
    with open(file_path, 'rb') as f:
        img = Image.open(f)
        img.verify()  # Verify the image
        print("Image verified successfully.")
    # Reopen the image to display it
    img = Image.open(file_path)
    img.show()  # This will display the image if you have an image viewer available
except (IOError, SyntaxError) as e:
    print('Cannot identify image file:', e)



Image verified successfully.


In [45]:
from PIL import Image
import numpy as np

file_path = 'test_img.png'

try:
    # Open the image and convert to grayscale
    img = Image.open(file_path).convert('L')
    
    # Convert the image to a NumPy array
    img_array = np.array(img)
    
    # Flatten the array
    img_flat = img_array.ravel()
    
    # Invert the flattened array
    img_inverted = np.invert(img_flat)
    
    print("Image processed successfully.")
    # If you need to see the result as an image
    img_inverted_reshaped = img_inverted.reshape(img_array.shape)
    inverted_image = Image.fromarray(img_inverted_reshaped)
    inverted_image.show()
except Exception as e:
    print('Error processing image:', e)


Image processed successfully.


The __open__ function of the __Image__ module loads the test image with four channels: the three RGB color channels and the alpha (transparency) channel. This is not the same representation we used previously when reading in the dataset with TensorFlow, so we'll need to do some extra work to match the format.

First, we use the __convert__ function with the __L__ parameter to reduce the four-channel RGBA representation to one grayscale color channel. We store this as a __numpy__ array, call __ravel__ to flatten it, and then invert it using __np.invert__, because the current matrix represents black as 0 and white as 255, whereas we need the opposite.
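
A short sketch of what the inversion does to 8-bit pixel values is shown below. The three sample values are illustrative.

```python
import numpy as np

# Illustrative 8-bit grayscale values: black, dark gray, white
pixels = np.array([0, 100, 255], dtype=np.uint8)

inverted = np.invert(pixels)  # bitwise NOT; for uint8 this equals 255 - value
print(inverted)
```

Note that `np.invert` is a bitwise operation, so it only works on integer (or Boolean) arrays. The inversion has to happen before the image is divided by 255 and becomes a float array.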

Now that the image is structured correctly, we can ask the trained model for a prediction in the same way as before, but this time feeding in only the single image for testing.

In [5]:
import tensorflow as tf
import numpy as np
from PIL import Image

# Load and preprocess the image
file_path = 'test_img.png'
img = Image.open(file_path).convert('L')  # Convert to grayscale
img = img.resize((28, 28))  # Resize to 28x28 if necessary
img = np.invert(np.array(img))  # Convert to a NumPy array and invert, as described above
img = img / 255.0  # Normalize the image to [0, 1]
img = img.reshape((1, 28 * 28))  # Flatten the image and add batch dimension

# Load the model (ensure the model variable is defined and loaded properly)
# model = ... (Load your model here, e.g., using tf.keras.models.load_model)

# Make the prediction
predictions = model.predict(img)
predicted_class = np.argmax(predictions, axis=1)

print("Prediction for test image:", predicted_class[0])


Prediction for test image: 3


__Conclusion__

Neural networks, particularly CNNs, have revolutionized the field of handwritten digit recognition. They offer high accuracy, robustness, and adaptability, making them the method of choice for this task. The success of neural networks on the MNIST dataset has not only provided a benchmark for evaluating new algorithms but has also spurred advancements in deep learning techniques applied to a wide array of image recognition tasks.

__Future Directions__

1. Continued Improvement: Research continues to improve the accuracy and efficiency of neural networks, leveraging newer architectures and optimization techniques.
2. Transfer Learning: Applying pretrained models on larger datasets to handwritten digit recognition can further enhance performance.
3. Application Expansion: Extending these techniques to more complex datasets and real-world applications, such as recognizing handwritten text in natural scenes, remains an active area of development.

In summary, neural networks have set a high standard in handwritten digit recognition, demonstrating their powerful capabilities and laying the groundwork for further innovations in the field of image recognition and beyond.