# MNIST
### Handwritten digit prediction using neural networks

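The MNIST database of handwritten digits contains 60,000 training images and 10,000 test images, each 28 × 28 pixels. After the images are flattened and scaled to the range [0, 1], they are float32 NumPy tensors of shapes:

- (60000, 784) for training
- (10000, 784) for testing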

The goal of the neural network is to accurately classify handwritten digits by minimizing the discrepancy between the predicted labels and the true labels. Mini-batch Stochastic Gradient Descent (SGD) is a commonly used optimization method for this purpose.
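The "discrepancy" being minimized is measured by a loss function. As a minimal sketch, assuming a one-hot true label and a vector of predicted class probabilities, categorical cross-entropy can be written in plain Python as below. The function name and the small `eps` clamp are illustrative choices, not the Keras implementation.

```python
import math

def categorical_crossentropy(true_one_hot, predicted_probs, eps=1e-7):
    """Cross-entropy between a one-hot label and predicted class probabilities."""
    # Clamp probabilities away from 0 so that log() stays finite.
    return -sum(t * math.log(max(p, eps))
                for t, p in zip(true_one_hot, predicted_probs))

# Hypothetical 3-class example: the true class is index 1.
label = [0.0, 1.0, 0.0]
confident = [0.05, 0.90, 0.05]   # most probability on the true class
unsure = [0.40, 0.20, 0.40]      # little probability on the true class

print(categorical_crossentropy(label, confident))
print(categorical_crossentropy(label, unsure))
```

The loss shrinks as more probability is assigned to the true class, which is what each optimizer update tries to achieve on average over a mini-batch.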

**Intermediate Dimensions:** A higher number of dimensions usually allows the network to learn more complex representations, but it can also lead to overfitting and longer training times. A smaller number might not capture the complexity of the data.

**Learning Rates:** A larger learning rate can lead to faster convergence but might overshoot the minimum, while a smaller learning rate converges more slowly and might get stuck in local minima.

**Momentum:** A higher momentum can help to converge faster and escape local minima but might lead to instability in the training process if it's too high.
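To make the learning-rate and momentum trade-offs concrete, here is a small sketch of SGD with classical momentum on a toy one-dimensional loss, f(w) = w². It is an illustration only: the helper `sgd_with_momentum` is hypothetical, and it shows plain SGD with momentum rather than the RMSprop optimizer used in this project.

```python
def sgd_with_momentum(grad_fn, w0, learning_rate, momentum, steps):
    """Run `steps` updates of SGD with classical momentum on a scalar weight."""
    w, velocity = w0, 0.0
    for _ in range(steps):
        # The velocity accumulates past gradients, scaled by the momentum.
        velocity = momentum * velocity - learning_rate * grad_fn(w)
        w += velocity
    return w

def grad(w):
    # Gradient of the toy loss f(w) = w**2, whose minimum is at w = 0.
    return 2.0 * w

slow = sgd_with_momentum(grad, w0=1.0, learning_rate=0.01, momentum=0.0, steps=10)
faster = sgd_with_momentum(grad, w0=1.0, learning_rate=0.01, momentum=0.9, steps=10)
diverged = sgd_with_momentum(grad, w0=1.0, learning_rate=1.1, momentum=0.0, steps=10)

print(slow, faster, diverged)
```

With the same small learning rate, the run with momentum ends closer to the minimum after 10 steps, while the run with an overly large learning rate moves away from it on every step.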

By importing an optimizer (here RMSprop, an adaptive variant of mini-batch SGD), we can tweak the learning rate and momentum parameters:

```
from tensorflow.keras import optimizers

# `learning_rate` replaces the deprecated `lr` argument
network.compile(optimizer=optimizers.RMSprop(learning_rate=0.001, momentum=0.0),
                loss='categorical_crossentropy',
                metrics=['accuracy'])
```

The first layer shown below transforms the 784-element image vector into a 512-dimensional intermediate representation. We'll iterate using different intermediate dimensions.

We'll also experiment with different learning rates and momentum values.
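One way to see what changing the intermediate dimension costs is to count trainable parameters. The sketch below assumes the two-dense-layer architecture used in this document (784 inputs, one hidden layer, 10 outputs); the helper names are illustrative.

```python
def dense_param_count(inputs, units):
    # A dense layer has one weight per (input, unit) pair plus one bias per unit.
    return inputs * units + units

def network_param_count(hidden_units, inputs=28 * 28, classes=10):
    # Hidden dense layer followed by the 10-way output layer.
    return (dense_param_count(inputs, hidden_units)
            + dense_param_count(hidden_units, classes))

for hidden_units in (64, 128, 512, 1024):
    print(hidden_units, network_param_count(hidden_units))
```

The parameter count grows roughly linearly with the intermediate dimension, which is one reason wider layers train more slowly and overfit more easily.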

## Build the network

The network is a chain of two dense layers that apply tensor transformations. The last one is a 10-way softmax layer that provides the classification structure for our output: it returns 10 probabilities that sum to 1, one for each digit from 0 to 9.

````
from tensorflow.keras import models, layers

network = models.Sequential()
network.add(layers.Dense(512, activation='relu', input_shape=(28 * 28, )))
network.add(layers.Dense(10, activation='softmax'))
````
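To illustrate what the softmax output layer computes, the following plain-Python sketch turns ten raw scores into probabilities and picks the most likely digit. The scores are made up for the example, and the function is a simplified stand-in for the Keras activation, not its actual implementation.

```python
import math

def softmax(scores):
    # Subtract the maximum score for numerical stability before exponentiating.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for the digits 0-9.
scores = [0.1, 0.3, 2.5, 0.2, 0.0, 0.4, 0.1, 1.2, 0.3, 0.2]
probs = softmax(scores)

# The predicted digit is the index with the highest probability.
predicted_digit = probs.index(max(probs))
print(predicted_digit, sum(probs))
```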


```
# initial values
dimensions = 512
learning_rate = 0.001
momentum = 0.0
```
```
# load the data
from tensorflow.keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# preprocess
# image data is vectorized and scaled to floats between 0 and 1
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype('float32') / 255
test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype('float32') / 255

# one-hot encode the integer labels
from tensorflow.keras.utils import to_categorical
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

# build the network (uses the initial values defined above)
from tensorflow.keras import models, layers
network = models.Sequential()
network.add(layers.Dense(dimensions, activation='relu', input_shape=(28 * 28, )))
network.add(layers.Dense(10, activation='softmax'))

# optimizer, loss function, and a monitoring metric
# `learning_rate` replaces the deprecated `lr` argument
from tensorflow.keras import optimizers
network.compile(optimizer=optimizers.RMSprop(learning_rate=learning_rate,
                                             momentum=momentum),
                loss='categorical_crossentropy',
                metrics=['accuracy'])

# train
network.fit(train_images, train_labels, epochs=20, batch_size=128)
```
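The preprocessing steps can be illustrated without TensorFlow. The sketch below mimics, on a made-up 2 × 2 "image", what the reshape/scale and `to_categorical` calls do. The helpers `flatten_and_scale` and `one_hot` are hypothetical stand-ins written for this example.

```python
def flatten_and_scale(image):
    # Turn a 2-D grid of 0-255 pixel values into a flat list of floats in [0, 1].
    return [pixel / 255 for row in image for pixel in row]

def one_hot(label, num_classes=10):
    # Same idea as to_categorical, for a single integer label.
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

tiny_image = [[0, 255], [51, 102]]   # hypothetical 2x2 "image"
print(flatten_and_scale(tiny_image))
print(one_hot(3))
```

For a real MNIST image the flattened list has 784 entries instead of 4, and each label becomes a 10-element vector with a single 1.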


```
# evaluate on the test set
test_loss, test_acc = network.evaluate(test_images, test_labels)
print('test_acc:', test_acc)
```
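The accuracy metric reported by the evaluation step is the fraction of samples whose most probable predicted class matches the true label. A minimal sketch, assuming predictions are probability vectors and labels are one-hot vectors (the data below is invented for illustration):

```python
def accuracy(predicted_probs, true_one_hot):
    # Count samples whose most probable class matches the one-hot label.
    correct = 0
    for probs, label in zip(predicted_probs, true_one_hot):
        if probs.index(max(probs)) == label.index(max(label)):
            correct += 1
    return correct / len(true_one_hot)

# Hypothetical predictions for three samples over three classes.
preds = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.5, 0.4, 0.1]]
labels = [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
print(accuracy(preds, labels))
```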



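```
import math

# Keras rounds up: the last, smaller batch of each epoch still triggers an update
updates = math.ceil(60000 / 128) * 20
print("there are", updates, "updates for 20 epochs of 128-sample mini-batches")
```

```
there are 9380 updates for 20 epochs of 128-sample mini-batches
```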


After tuning the network, we're ready to test it on new, unseen data: