# Training Neural Networks on the MNIST Data Set using Keras

## Introduction

In this notebook, we train a neural network to recognize handwritten digits in 28x28 images. The network is built with Keras, a popular high-level API for building and training neural networks that is developed at Google and ships with TensorFlow as `tf.keras`. The dataset is the Modified National Institute of Standards and Technology database (MNIST), a large, widely used collection of handwritten digits stored as 28x28 grayscale images. We use MNIST because it is easy to obtain and work with, which makes it a good first dataset for learning to program neural networks. Here is a sample of the training images:

<img src="mnist_sample.png" alt="Sample handwritten digits from the MNIST training set" />

## Getting the dataset

Next, we load the dataset. Fortunately, Keras ships with MNIST, and `load_data()` already returns it split into two parts: training data, which we use to fit the network, and test data, which the network never sees during training. The test data tells us how accurate the model is on new images and, most importantly, whether it is overfitting, that is, performing much better on the training images than on unseen ones.

```python
import tensorflow as tf
from tensorflow import keras

# MNIST ships with Keras; load_data() returns (train, test) tuples of NumPy arrays.
data = tf.keras.datasets.mnist
(input_train, output_train), (input_test, output_test) = data.load_data()
```

The dataset provides 60,000 training samples and 10,000 test samples. Next, we create the neural network, which takes an image as input and predicts which digit it shows. Because each image is 28x28, the input layer takes 28 × 28 = 784 pixels, and the output layer has 10 nodes, one for each digit from 0 to 9.
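The shapes described above can be checked without downloading anything. The sketch below assumes a small random `uint8` array as a stand-in for `input_train` (Keras returns MNIST images as 28x28 arrays of `uint8` pixel values from 0 to 255), reproduces with NumPy what `Flatten` does to each image, and also shows the common practice of scaling pixels to [0, 1]. That scaling step is an addition for illustration; the notebook itself does not perform it.

```python
import numpy as np

# Hypothetical stand-in for a small batch of MNIST images:
# 5 images, each 28x28, with uint8 pixel values in [0, 255].
rng = np.random.default_rng(seed=0)
images = rng.integers(0, 256, size=(5, 28, 28)).astype(np.uint8)

# Flatten each 28x28 image into a 784-element vector, which is
# what keras.layers.Flatten does to every sample in a batch.
flat = images.reshape(images.shape[0], -1)

# Optional preprocessing (not used in this notebook): scale pixel
# values from [0, 255] to [0.0, 1.0] as 32-bit floats.
scaled = flat.astype(np.float32) / 255.0

print(images.shape, "->", flat.shape)  # (5, 28, 28) -> (5, 784)
```

Applied to the notebook's own arrays, the equivalent step would be `input_train = input_train / 255.0` and `input_test = input_test / 255.0` before calling `model.fit`; the results shown later were produced without it.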

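```python
model = keras.Sequential([
    # Flatten each 28x28 image into a 784-element vector.
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    # Hidden layers, halving the width each time: 392 -> 196 -> 98 -> 49.
    tf.keras.layers.Dense(392, activation="relu"),
    tf.keras.layers.Dense(196, activation="relu"),
    tf.keras.layers.Dense(98, activation="relu"),
    tf.keras.layers.Dense(49, activation="relu"),
    # During training, randomly zero 20% of the activations to reduce overfitting.
    tf.keras.layers.Dropout(0.2),
    # One output per digit (0-9); softmax turns the outputs into probabilities.
    tf.keras.layers.Dense(10, activation="softmax")
])
```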
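The `Dropout(0.2)` layer is the only layer in this model that behaves differently during training and inference. Keras documents its `Dropout` layer as zeroing inputs with frequency `rate` during training and scaling the remaining inputs by `1 / (1 - rate)`, leaving inputs unchanged at inference. The NumPy sketch below illustrates that scheme (often called inverted dropout); the helper `dropout` is hypothetical and is not the Keras implementation.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Hypothetical helper illustrating inverted dropout."""
    if not training or rate == 0.0:
        # At inference time dropout is the identity function.
        return x
    # Keep each unit independently with probability (1 - rate).
    keep_mask = rng.random(x.shape) >= rate
    # Zero the dropped units and rescale the survivors so the
    # expected value of each activation matches the input.
    return np.where(keep_mask, x / (1.0 - rate), 0.0)

rng = np.random.default_rng(seed=0)
# Example: a batch of 2 outputs from the 49-unit layer, all equal to 1.0.
activations = np.ones((2, 49))

train_out = dropout(activations, rate=0.2, training=True, rng=rng)
infer_out = dropout(activations, rate=0.2, training=False, rng=rng)

print("fraction dropped:", np.mean(train_out == 0.0))
print("inference unchanged:", np.array_equal(infer_out, activations))
```

With `rate=0.2`, each surviving activation of 1.0 becomes 1.25 during training, so the layer's output has the same expected value in both modes.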

The first layer, `Flatten`, reshapes each 28x28 matrix into a single vector of 784 values. The data then passes through hidden layers with progressively fewer neurons, halving the width each time (392, 196, 98, 49), followed by a `Dropout` layer that randomly zeroes 20% of its inputs during training to reduce overfitting, and finally an output layer with 10 neurons, one per digit. We can now compile, train, and test the model. We use the Adam optimizer, a popular and efficient choice, and sparse categorical crossentropy as the loss, because this is a multiclass classification problem whose labels are integers (0 to 9) rather than one-hot vectors. We track accuracy as the metric.
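The halving scheme and the size of each `Dense` layer can be worked out by hand: a `Dense` layer with `n_in` inputs and `n_out` units holds an `n_in x n_out` weight matrix plus `n_out` biases, while `Flatten` and `Dropout` have no parameters. The pure-Python sketch below (the helper names are hypothetical) reproduces the widths 784, 392, 196, 98, 49 and counts the trainable parameters of the model; the total should agree with what `model.summary()` prints for it.

```python
def halving_widths(start, stop):
    """Return layer widths obtained by repeatedly halving `start` down to `stop`."""
    widths = [start]
    while widths[-1] > stop:
        widths.append(widths[-1] // 2)
    return widths

def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

# 28 * 28 = 784 values after Flatten, halved until the 49-unit layer.
hidden = halving_widths(28 * 28, 49)
sizes = hidden + [10]  # 10 output units, one per digit

total = 0
for n_in, n_out in zip(sizes[:-1], sizes[1:]):
    params = dense_params(n_in, n_out)
    total += params
    print(f"Dense {n_in:>3} -> {n_out:>3}: {params:>7,} parameters")

print(f"Total trainable parameters: {total:,}")  # 409,405
```

Most of the parameters (307,720) sit in the first hidden layer, which is typical when a wide flattened input feeds a dense layer.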

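```python
model.compile(
    optimizer="adam",                        # Adam with Keras's default settings
    loss="sparse_categorical_crossentropy",  # labels are integers 0-9, not one-hot
    metrics=["accuracy"]
)

# Train for 10 passes (epochs) over the 60,000 training images...
model.fit(input_train, output_train, epochs=10)
# ...then report [loss, accuracy] on the 10,000 held-out test images.
model.evaluate(input_test, output_test)
```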

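```
Epoch 1/10
   1/1875 [..............................] - ETA: 4:45 - loss: 80.3927 - accuracy: 0.1562
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10

[0.16202551126480103, 0.9726999998092651]
```

Each epoch runs 1875 steps because `fit` uses a default batch size of 32, and 60,000 / 32 = 1875. The final list is what `model.evaluate` returns: the test loss (about 0.162) followed by the test accuracy (about 0.973), meaning the model classifies roughly 97.3% of the 10,000 test digits correctly.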
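To make those two numbers concrete, the NumPy sketch below computes softmax outputs, sparse categorical crossentropy, and accuracy for a tiny batch. The logits and labels are invented for illustration and do not come from the trained model. The loss is the mean of `-log p[label]`, and accuracy is the fraction of samples whose highest-probability class equals the integer label; Keras computes equivalent quantities over the whole test set, up to numerical details such as clipping.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row max keeps exp() from overflowing."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def sparse_categorical_crossentropy(probs, labels, eps=1e-7):
    """Mean of -log(probability assigned to the true integer label)."""
    picked = probs[np.arange(len(labels)), labels]
    return float(np.mean(-np.log(np.clip(picked, eps, 1.0))))

def accuracy(probs, labels):
    """Fraction of rows whose arg-max class equals the integer label."""
    return float(np.mean(np.argmax(probs, axis=1) == labels))

# Invented logits for 3 samples over the 10 digit classes (illustration only).
logits = np.zeros((3, 10))
logits[0, 7] = 5.0  # confident and correct (label 7)
logits[1, 2] = 3.0  # correct but less confident (label 2)
logits[2, 4] = 4.0  # wrong: predicts 4, but the label is 9
labels = np.array([7, 2, 9])  # integer labels, like output_test

probs = softmax(logits)
print("loss:", sparse_categorical_crossentropy(probs, labels))
print("accuracy:", accuracy(probs, labels))  # 2 of 3 correct
```

The one wrong prediction dominates the loss, because the true class receives a very small probability and `-log` of a small number is large; accuracy, by contrast, only counts whether the top prediction is right.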