# Neural network for recognizing handwritten digits

Source: [How To Build a Neural Network to Recognize Handwritten Digits with TensorFlow](https://www.digitalocean.com/community/tutorials/how-to-build-a-neural-network-to-recognize-handwritten-digits-with-tensorflow) by **Ellie Birbeck**

### Imports

In [0]:
import tensorflow as tf
import numpy as np
from PIL import Image
# requires TensorFlow 1.x; this module is not available in TensorFlow 2
from tensorflow.examples.tutorials.mnist import input_data
from google.colab import files

***One-hot encoding*** represents the labels, i.e. the actual digits drawn. Each label is a 1D vector of size 10 in which the element at the index of the digit is set to `1` and all other elements are set to `0`.
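
As a small illustration of this encoding, the following NumPy sketch builds the label vector for one digit. The helper name `one_hot` is hypothetical and not part of the tutorial's code.

```python
import numpy as np

def one_hot(digit, n_classes=10):
    """Return a 1D vector with 1.0 at index `digit` and 0.0 elsewhere (hypothetical helper)."""
    vec = np.zeros(n_classes, dtype=np.float32)
    vec[digit] = 1.0
    return vec

label = one_hot(3)
print(label)             # the only non-zero entry is at index 3
print(np.argmax(label))  # argmax recovers the digit from the vector
```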

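Each image is represented as a 1D vector of 784 values, one per pixel of the flattened 28x28 picture. Because the pictures are grayscale, each pixel is a single intensity value, 0 to 255 in the raw data (`input_data.read_data_sets` rescales these to floats between 0 and 1 by default).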
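
The flattening can be sketched with NumPy as follows. The random array is only a stand-in for a real digit image, and the division by 255 assumes the pixel values should end up as floats between 0 and 1.

```python
import numpy as np

# Hypothetical 28x28 grayscale image with random intensities, standing in for an MNIST digit
image_2d = np.random.RandomState(0).randint(0, 256, size=(28, 28), dtype=np.uint8)

flat = image_2d.ravel()                  # row-major flattening: 28 * 28 = 784 values
scaled = flat.astype(np.float32) / 255.0 # assumption: rescale intensities to [0, 1]
restored = flat.reshape(28, 28)          # reshaping undoes the flattening

print(flat.shape)
```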

In [0]:
mnist = input_data.read_data_sets("MNIST_data/", one_hot = True)

In [0]:
n_train = mnist.train.num_examples
n_validation = mnist.validation.num_examples
n_test = mnist.test.num_examples

print("Training examples: " + str(n_train))
print("Validation examples: " + str(n_validation))
print("Testing examples: " + str(n_test))

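### Neural network architecture

The architecture of a neural network is defined by:

*   number of layers
*   number of units in each layer
*   how units are connected between layers

The network built here has an input layer, three *hidden* layers (the layers between input and output), and an output layer.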

In [0]:
n_input = 784 # represents 784 (28x28) pixels
n_hidden1 = 512
n_hidden2 = 256
n_hidden3 = 128
n_output = 10 # represents recognized digit (0 - 9)
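
To get a feel for the size of this architecture, the sketch below counts the trainable parameters. It assumes every layer is fully connected to the next, so each pair of adjacent layers contributes one weight matrix and one bias vector.

```python
layer_sizes = [784, 512, 256, 128, 10]  # n_input, n_hidden1..3, n_output from the cell above

total = 0
for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    n_params = fan_in * fan_out + fan_out  # weight matrix plus one bias per unit
    print(f"{fan_in:>4} -> {fan_out:<4} {n_params:>7} parameters")
    total += n_params

print("Total trainable parameters:", total)
```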

### Hyperparameters (NN config)

*   **Learning rate**: how much the parameters are adjusted at each step of the learning process (the weights are changed slightly after each pass through the network).
*   **Number of iterations**: how many times the training step is run.
*   **Batch size**: how many training examples are used at each step.
*   **Dropout**: the keep probability used at the final hidden layer; `0.5` gives each unit a 50% chance of being eliminated at every training step, which helps prevent overfitting.

In [0]:
learning_rate = 1e-4
n_iterations = 1000
batch_size = 128
dropout = 0.5
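
A quick calculation shows how much data these settings cover. The training set size of 55000 is an assumption here (the loader's default train/validation split); the notebook prints the actual value of `n_train` earlier.

```python
learning_rate = 1e-4
n_iterations = 1000
batch_size = 128
n_train = 55000  # assumption: size of mnist.train with the loader's default validation split

examples_seen = n_iterations * batch_size
epochs = examples_seen / n_train
print("Examples processed:", examples_seen)
print("Approximate passes over the training set: %.2f" % epochs)
```

Under that assumption, training covers a little more than two passes over the data, which is short and leaves room for further tuning.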

### Building the TensorFlow Graph

In [0]:
X = tf.placeholder("float", [None, n_input]) # feeding in an undefined number of 784-pixel images
Y = tf.placeholder("float", [None, n_output]) # undefined number of label outputs, with 10 possible classes
keep_prob = tf.placeholder(tf.float32) # probability of keeping a unit in the dropout layer

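The `weights` and `biases` are the parameters that are updated during training. The weights start as small random values drawn from a truncated normal distribution, and the biases start at a small constant value.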

In [0]:
weights = {
    'w1': tf.Variable(tf.truncated_normal([n_input, n_hidden1], stddev=0.1)),
    'w2': tf.Variable(tf.truncated_normal([n_hidden1, n_hidden2], stddev=0.1)),
    'w3': tf.Variable(tf.truncated_normal([n_hidden2, n_hidden3], stddev=0.1)),
    'out': tf.Variable(tf.truncated_normal([n_hidden3, n_output], stddev=0.1)),
}

biases = {
    'b1': tf.Variable(tf.constant(0.1, shape=[n_hidden1])),
    'b2': tf.Variable(tf.constant(0.1, shape=[n_hidden2])),
    'b3': tf.Variable(tf.constant(0.1, shape=[n_hidden3])),
    'out': tf.Variable(tf.constant(0.1, shape=[n_output]))
}

Each layer multiplies the previous layer’s outputs by the current layer’s weights and adds the bias to the result. At the last hidden layer, a dropout operation is applied; `keep_prob` is set to 0.5 during training and to 1.0 during evaluation.

In [0]:
layer_1 = tf.add(tf.matmul(X, weights['w1']), biases['b1'])
layer_2 = tf.add(tf.matmul(layer_1, weights['w2']), biases['b2'])
layer_3 = tf.add(tf.matmul(layer_2, weights['w3']), biases['b3'])
layer_drop = tf.nn.dropout(layer_3, keep_prob)
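# use the dropout output (not layer_3), otherwise dropout has no effect;
# keep_prob must then be fed whenever output_layer is evaluated
output_layer = tf.matmul(layer_drop, weights['out']) + biases['out']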
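
What the dropout operation does to a layer's activations can be sketched in NumPy. This is an illustrative version of inverted dropout, where kept units are scaled by `1 / keep_prob` as `tf.nn.dropout` does. The helper name `dropout` and the all-ones activations are hypothetical.

```python
import numpy as np

def dropout(activations, keep_prob, rng):
    """Inverted dropout: zero units with probability 1 - keep_prob, scale the rest by 1 / keep_prob."""
    mask = rng.uniform(size=activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.RandomState(0)
layer_3 = np.ones((4, 128), dtype=np.float32)  # hypothetical batch of 4 activation vectors

dropped = dropout(layer_3, keep_prob=0.5, rng=rng)    # training: about half the units are zeroed
unchanged = dropout(layer_3, keep_prob=1.0, rng=rng)  # evaluation: nothing is dropped

print("fraction kept:", np.mean(dropped != 0))
print("values of kept units:", np.unique(dropped[dropped != 0]))
```

Scaling the surviving units keeps the expected activation the same in training and evaluation, which is why `keep_prob` can simply be set to 1.0 at test time.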

**Cross entropy** is the loss function that we want to minimize; it quantifies the difference between two probability distributions (the predictions and the labels).

In [0]:
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(
        labels=Y, logits=output_layer
        ))
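
The loss can be reproduced on a toy example with NumPy. The sketch below is not TensorFlow's implementation; it computes the same quantity (mean softmax cross-entropy against one-hot labels) under the assumption that the logits and labels are small dense arrays. The function name `softmax_cross_entropy` is hypothetical.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy between softmax(logits) and one-hot labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_probs).sum(axis=1).mean()

labels = np.eye(10)[[3, 7]]          # one-hot labels for the digits 3 and 7
uniform_logits = np.zeros((2, 10))   # equal score for every class
confident_logits = 10.0 * labels     # large score on the correct class only

uniform_loss = softmax_cross_entropy(uniform_logits, labels)
confident_loss = softmax_cross_entropy(confident_logits, labels)
print(uniform_loss)    # equals ln(10) for a uniform prediction
print(confident_loss)  # close to 0 for a confident, correct prediction
```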

**Adam optimizer** is the optimization algorithm used to minimize the loss function.

In [0]:
train_step = tf.train.AdamOptimizer(learning_rate).minimize(cross_entropy)
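
For intuition, a single Adam update can be written out in NumPy. This is a sketch of the update rule as published, with the same default hyperparameters as `tf.train.AdamOptimizer`; TensorFlow's own implementation may differ in minor details. The function name `adam_step` and the gradient values are hypothetical.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; returns the new parameter and the updated moment estimates."""
    m = beta1 * m + (1 - beta1) * grad        # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction for the zero initialisation
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

w = np.array([1.0, -2.0])
grad = np.array([0.5, -3.0])   # hypothetical gradient of the loss with respect to w
w_new, m, v = adam_step(w, grad, m=np.zeros(2), v=np.zeros(2), t=1)
print(w_new - w)  # on the first step each parameter moves by about lr against the gradient's sign
```

Because the step is normalised by the gradient's running magnitude, the first update has roughly the same size for both parameters even though their gradients differ by a factor of six.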

### Training and testing

In [0]:
# check which images are predicted correctly by comparing output_layer (predictions) with Y (labels)
correct_pred = tf.equal(tf.argmax(output_layer, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

# initialize the variables and start the TF session
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
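
The accuracy computation above can be checked by hand on a toy example. The NumPy sketch below mirrors the `tf.argmax` / `tf.equal` / `tf.reduce_mean` chain; the scores and labels are made up for illustration.

```python
import numpy as np

logits = np.array([[0.1, 2.0, -1.0],
                   [1.5, 0.2,  0.3],
                   [0.0, 0.1,  3.0],
                   [2.0, 2.5,  0.1]])   # hypothetical scores for 4 examples and 3 classes
labels = np.eye(3)[[1, 0, 2, 0]]        # one-hot ground truth

# an example counts as correct when the highest score sits at the labelled class
correct_pred = np.argmax(logits, axis=1) == np.argmax(labels, axis=1)
accuracy = correct_pred.astype(np.float32).mean()
print(correct_pred)  # the last example is misclassified
print(accuracy)      # 3 of 4 correct
```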

#### In each of the `n_iterations` training steps:
* Propagate values forward through the network
* Compute the loss
* Propagate values backward through the network
* Update the parameters

In [0]:
# mini-batch training
for i in range(n_iterations):
    batch_x, batch_y = mnist.train.next_batch(batch_size)
    sess.run(train_step, feed_dict={
        X: batch_x, Y: batch_y, keep_prob: dropout
        })

    # every 100 iterations, print the loss and accuracy on the current mini-batch
    if i % 100 == 0:
        minibatch_loss, minibatch_accuracy = sess.run(
            [cross_entropy, accuracy],
            feed_dict={X: batch_x, Y: batch_y, keep_prob: 1.0}
            )
        print(
            "Iteration",
            str(i),
            "\t| Loss =",
            str(minibatch_loss),
            "\t| Accuracy =",
            str(minibatch_accuracy)
            )
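
In the TensorFlow loop, `sess.run(train_step, ...)` performs the forward pass, loss computation, backward pass, and parameter update in one call. To make those four steps visible, here is a minimal NumPy sketch of mini-batch training. It is not the tutorial's network: it assumes a single softmax layer, plain gradient descent instead of Adam, and synthetic 2D data in place of MNIST.

```python
import numpy as np

rng = np.random.RandomState(0)
n_classes, n_features, batch_size, lr = 3, 2, 32, 0.05

# Synthetic, well-separated data standing in for MNIST (an assumption for illustration)
centers = np.array([[0.0, 4.0], [4.0, 0.0], [-4.0, -4.0]])
y = rng.randint(0, n_classes, size=300)
X = centers[y] + rng.normal(size=(300, n_features))
Y = np.eye(n_classes)[y]

W = np.zeros((n_features, n_classes))
b = np.zeros(n_classes)

def forward(inputs):
    """Softmax probabilities for a batch of inputs."""
    logits = inputs @ W + b
    logits = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)

def loss(inputs, targets):
    """Mean cross-entropy of the current model on the given data."""
    return -np.mean(np.sum(targets * np.log(forward(inputs) + 1e-12), axis=1))

initial_loss = loss(X, Y)
for i in range(200):
    idx = rng.choice(len(X), size=batch_size, replace=False)  # draw a mini-batch
    probs = forward(X[idx])                                   # 1. propagate values forward
    batch_loss = -np.mean(np.sum(Y[idx] * np.log(probs + 1e-12), axis=1))  # 2. compute the loss
    grad_logits = (probs - Y[idx]) / batch_size               # 3. propagate gradients backward
    grad_W, grad_b = X[idx].T @ grad_logits, grad_logits.sum(axis=0)
    W -= lr * grad_W                                          # 4. update the parameters
    b -= lr * grad_b
final_loss = loss(X, Y)
print(initial_loss, final_loss)
```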

In [0]:
test_accuracy = sess.run(accuracy, feed_dict={X: mnist.test.images, Y: mnist.test.labels, keep_prob: 1.0})
print("\nAccuracy on test set:", test_accuracy)

### Demo

In [0]:
# Uploading the test image
files.upload()

In [0]:
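# grayscale, inverted (light digit on dark background as in MNIST), flattened, scaled to [0, 1] like the training images; needs a 28x28 image
img = np.invert(Image.open("test_img.png").convert('L')).ravel() / 255.0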
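
The effect of this preprocessing can be seen on a synthetic array, without needing an image file. The sketch assumes a 28x28 scan of a dark stroke on white paper; the final division by 255 is an optional step for matching inputs that are expected in the range 0 to 1.

```python
import numpy as np

# Hypothetical 28x28 grayscale scan: white paper (255) with a dark vertical stroke (0)
scan = np.full((28, 28), 255, dtype=np.uint8)
scan[4:24, 13:15] = 0

inverted = np.invert(scan)   # bitwise NOT on uint8 maps v to 255 - v
img = inverted.ravel()       # flatten to the 784-long vector the network expects
img_scaled = img / 255.0     # optional: rescale intensities to [0, 1]

print(img.shape, img.min(), img.max())
```

After inversion the background is 0 and the stroke is 255, which matches the light-on-dark convention of the MNIST digits.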

In [0]:
prediction = sess.run(tf.argmax(output_layer, 1), feed_dict={X: [img], keep_prob: 1.0})
print("Prediction for test image:", np.squeeze(prediction))