# The MNIST Handwriting Problem

## Overview

In this notebook we are going to explore the potential of machine learning by building a handwriting recogniser using TensorFlow.

**Every ML problem...** concerns building a model that, when given a never-seen-before input, returns useful information about that input as output.

```
   Input --> Model --> Output
```

A **model** is said to map *input* to *output* and can be described by a mathematical equation. 

For example, this could be a model:

```
   y = mx
```

where:
- *y* is the output
- *x* is the input
- *m* is something unknown

*m* is often known as a **parameter** or a **weight**.

This model is **untrained** because we don't know beforehand what the value of *m* should be.
To **train** a model means to choose an optimal value for *m*.
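As a minimal sketch of what "choosing an optimal value for *m*" can mean, the snippet below fits *m* to a handful of made-up (x, y) pairs using the closed-form least-squares solution. The data and the name `fit_m` are illustrative assumptions, and least squares is only one possible definition of "optimal".

```python
# Hypothetical example data: the outputs are roughly 2 * input.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

def fit_m(xs, ys):
    """Return the m that minimises sum((y - m*x)**2) for the model y = m*x."""
    numerator = sum(x * y for x, y in zip(xs, ys))
    denominator = sum(x * x for x in xs)
    return numerator / denominator

m = fit_m(xs, ys)
print("learned m: {}".format(m))
print("prediction for x = 5: {}".format(m * 5.0))
```

Once *m* has been chosen, the model can be applied to inputs it has never seen, such as `x = 5` above.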

There are many techniques to train models. Techniques are usually either **supervised learning** or **unsupervised learning**.

Supervised learning involves feeding a model example input/output pairs (the outputs are often called **labels**) from which it learns its parameters:

```
   Training Inputs ---> Model
   Training Labels ------^
```

**In this exercise...** we'll build and train a model that, when given an image containing a handwritten number, returns the number displayed in the image:

```
   Image of a handwritten number --> Model --> The number displayed in the image
```

We are going to use supervised learning to train our model.



## Building the Model

There are many ML models out there, including models made up of models. When choosing which model to use, a common starting question is:

> What do we want the output of our model to look like?

In our case, we'd like our model to give an output that reads:

> I'm 90% sure it's an 8, but there is a 10% chance it's a 9

With that in mind, we'll use the **softmax regression model**. The output of this model is a vector of 10 numbers between 0 and 1 that sum to 1. Counting from zero, the nth number is the probability (according to the model) that the image contains the digit n. For the example above, the output would be:

```
   0, 0, 0, 0, 0, 0, 0, 0, 0.9, 0.1
```

When our trained softmax regression model is asked to detect the number in a never-seen-before image, it will do so in two steps:

- Calculate the **evidence** in the image that would indicate the presence of each number
- Convert the 10 pieces of evidence into **probabilities**

##### Calculating evidence
The evidence for the presence of the number $one$ in an image would be defined as

$\text{evidence}_{one} = \sum_{j}^{pixels} W_{one,~ j} x_j + b_{one}$

- $x_j$ represents the intensity value of pixel $j$ in the image.
- $W_{one, ~ j}$ represents a weight for the pixel $j$ in an image containing the number $one$
- $b_{one}$ represents the **bias** for the number $one$.

Note that the *weights* and *bias* values are parameters which need to be learnt.
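The evidence formula above can be sketched in plain Python. This is a toy illustration only: it uses a made-up 4-pixel "image" instead of 784 pixels, and the weights and bias are hand-picked assumptions rather than learned values.

```python
# Toy "image" with 4 pixels instead of 784 (values are illustrative).
pixels = [0.0, 0.5, 1.0, 0.0]

# Hypothetical weights and bias for the digit "one"; in the real model
# these are learned during training rather than written by hand.
weights_one = [-1.0, 0.5, 2.0, -1.0]
bias_one = 0.1

def evidence(pixels, weights, bias):
    """Weighted sum of pixel intensities plus a bias term."""
    return sum(w * x for w, x in zip(weights, pixels)) + bias

evidence_one = evidence(pixels, weights_one, bias_one)
print("evidence for 'one': {}".format(evidence_one))
```

A positive weight means that ink at that pixel counts in favour of the digit, and a negative weight means it counts against it.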

##### Turning evidence into probability

$\text{evidence}_{zero}$, $\text{evidence}_{one}$, ..., $\text{evidence}_{nine}$ are then combined to make a probability vector (the model's output).

This is achieved using the **softmax function**, which normalises the 10 evidence values so that they add up to 1. The evidence values are first exponentiated, so that a digit with relatively high evidence is given much more weight in the final output.
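To make the exponentiate-then-normalise idea concrete, here is a small pure-Python sketch of softmax. The evidence values are invented for illustration, and subtracting the maximum is a common numerical-stability trick that does not change the result.

```python
import math

def softmax(evidence):
    """Turn a list of evidence values into probabilities that sum to 1."""
    # Subtracting the maximum leaves the result unchanged but avoids overflow.
    largest = max(evidence)
    exps = [math.exp(e - largest) for e in evidence]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative evidence for the digits 0-9: strongest for 8, some for 9.
example_evidence = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 3.0]
probabilities = softmax(example_evidence)
print(["{:.3f}".format(p) for p in probabilities])
```

Because of the exponential, the gap between the probabilities for 8 and 9 is much wider than the gap between their raw evidence values.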

Our finished model can be visualised as follows:

![title](https://www.tensorflow.org/versions/r0.9/images/softmax-regression-scalargraph.png)

#### What does the model look like in code?

In [152]:
import tensorflow as tf
x = tf.placeholder(tf.float32, [None, 784]) # Input
W = tf.Variable(tf.zeros([784, 10]))        # Parameter (to be learned in training) 
b = tf.Variable(tf.zeros([10]))             # Parameter (to be learned in training)
y = tf.nn.softmax(tf.matmul(x, W) + b)      # Output

## Training the Model

### Preparing the training dataset

In [153]:
from tensorflow.examples.tutorials.mnist import input_data

# load the training data
mnist = input_data.read_data_sets("data/MNIST_data/", one_hot=True)
mnist_training_data = mnist.train
mnist_training_data_images = mnist_training_data.images
mnist_training_data_labels = mnist_training_data.labels
print("The shape of mnist_training_data_images is ", mnist_training_data.images.shape)
print("The shape of mnist_training_data_labels is", mnist_training_data.labels.shape)

Extracting data/MNIST_data/train-images-idx3-ubyte.gz
Extracting data/MNIST_data/train-labels-idx1-ubyte.gz
Extracting data/MNIST_data/t10k-images-idx3-ubyte.gz
Extracting data/MNIST_data/t10k-labels-idx1-ubyte.gz
The shape of mnist_training_data_images is  (55000, 784)
The shape of mnist_training_data_labels is (55000, 10)
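The shapes above can be read as follows: each image is a 28x28 grid flattened into 784 values, and because the data was loaded with `one_hot=True`, each label is a vector of 10 values with a single 1 marking the digit. The helpers below (`one_hot` and `from_one_hot` are hypothetical names, not part of TensorFlow) sketch that encoding in plain Python.

```python
def one_hot(digit, num_classes=10):
    """Return a list with 1.0 at position `digit` and 0.0 elsewhere."""
    return [1.0 if i == digit else 0.0 for i in range(num_classes)]

def from_one_hot(vector):
    """Recover the digit from a one-hot vector."""
    return vector.index(max(vector))

encoded = one_hot(3)
print(encoded)
print(from_one_hot(encoded))
```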


#### Examining a single piece of training data

In [311]:
%matplotlib inline
import numpy as np
import matplotlib
import matplotlib.pyplot as plt

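def view_image(images, n):
    # reshape returns a 28x28 view and leaves the original 784-value row untouched
    image = images[n].reshape(28, 28)
    plt.imshow(image, cmap='Greys')

def view_image_labels(labels, n):
    # show the label that belongs to image n
    print("Label for image", n, "is", labels[n])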

In [225]:
# playground!

# what is the smallest pixel value in the first image?
# what is the largest pixel value in the first image?
# what does the first image in mnist_training_data look like?
#view_image(mnist_training_data.images, 4)
# what does the label for the first image in mnist_training_data look like:

### Training the model

Training a model is an iterative process.

On each iteration we:
- **Score** the model
- Update the values for the model parameters (**training step**)

#### Scoring a model

To score a model during training we define a **loss function** which we can use to ask

- Is it sufficiently trained? 
- Should we stop training here?
- Should we carry on training? 
- Is it worse than the last iteration?
- Is it better than the last iteration?

The goal during training is to build a model with the smallest possible loss.

Our loss function involves feeding our in-training model a batch of images and comparing the model's verdict against the truth. If the model's verdict is significantly different from the truth, the loss is high. If they are similar, the loss is low.

The one we will use in our example is called **cross-entropy**, where $y$ is the model's predicted probability vector and $y'$ is the true label vector:

$cross\_entropy = H_{y'}(y) = -\sum_i y'_i \log(y_i)$
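As a worked example of the formula, the sketch below computes the cross-entropy for a single image whose true digit is 8, once for a prediction that is mostly right and once for one that is mostly wrong. The prediction vectors are invented for illustration, and `cross_entropy` is a hypothetical helper rather than the TensorFlow implementation.

```python
import math

def cross_entropy(true_label, predicted):
    """-sum_i y'_i * log(y_i) for one example."""
    # Terms where the true label is 0 contribute nothing, so skip them
    # (this also avoids evaluating log(0)).
    return -sum(t * math.log(p) for t, p in zip(true_label, predicted) if t > 0)

# One-hot label for the digit 8.
label = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]

mostly_right = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.9, 0.1]
mostly_wrong = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 0.9]

loss_right = cross_entropy(label, mostly_right)
loss_wrong = cross_entropy(label, mostly_wrong)
print("loss when mostly right: {:.3f}".format(loss_right))
print("loss when mostly wrong: {:.3f}".format(loss_wrong))
```

With a one-hot label, the sum reduces to the negative log of the probability the model assigned to the correct digit, so a confident wrong answer is penalised heavily.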

##### What does the loss function look like in code?

In [156]:
y_ = tf.placeholder(tf.float32, [None, 10])
loss_function = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))

#### Training Step

The training step uses the output of the **loss function** to work out which parameters should be tweaked (and by how much) in order to make the loss lower in the next iteration.

The gradients of the loss with respect to each parameter are computed via **backpropagation**.

The parameters are then updated by an optimisation algorithm; the most common is **gradient descent**.
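As a rough illustration of gradient descent, the sketch below trains the one-parameter model `y = mx` from the overview. It is a simplification under stated assumptions: the data is made up, the loss is mean squared error rather than cross-entropy, and the gradient is written out by hand instead of being derived automatically as TensorFlow does.

```python
# Illustrative data generated from y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

def loss(m):
    """Mean squared error of the model y = m*x on the data."""
    return sum((m * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient(m):
    """Derivative of the loss with respect to m."""
    return sum(2 * (m * x - y) * x for x, y in zip(xs, ys)) / len(xs)

m = 0.0               # untrained starting value
learning_rate = 0.05  # step size (an arbitrary choice for this sketch)

for step in range(50):
    m = m - learning_rate * gradient(m)  # move against the gradient

print("m after training: {:.4f}".format(m))
print("final loss: {:.6f}".format(loss(m)))
```

Each iteration nudges `m` in the direction that lowers the loss; the learning rate controls how large each nudge is.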

##### What does the training step look like in code?

In [157]:
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(loss_function)

### Putting it all together

In [1]:
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)

for i in range(1000):
  batch_xs, batch_ys = mnist.train.next_batch(100) # take a random 100 images and labels from the training set
  sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})



## Evaluating the Model 

In [291]:
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

0.9175
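The accuracy calculation above compares the index of the largest predicted probability with the index of the 1 in the one-hot label, then averages the matches. Here is the same idea as a plain-Python sketch, using invented predictions over 3 classes for brevity; `argmax` is a hypothetical helper standing in for `tf.argmax`.

```python
def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=lambda i: values[i])

# Hypothetical model outputs (3 classes for brevity) and one-hot labels.
predictions = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.5, 0.4, 0.1],
    [0.2, 0.6, 0.2],
]
labels = [
    [1, 0, 0],
    [0, 0, 1],
    [0, 1, 0],  # the model gets this one wrong
    [0, 1, 0],
]

correct = [argmax(p) == argmax(l) for p, l in zip(predictions, labels)]
accuracy = sum(correct) / float(len(correct))
print("accuracy: {}".format(accuracy))
```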


## Trying it out on your own image! 

Try drawing a number [here](http://drawisland.com/?w=28&h=28), then place it in the data folder of this project and use the methods below to run it through your model.

In [336]:
from PIL import Image  # provided by the Pillow package

def prepare_image(image_file):
    image = Image.open(image_file)
    grey_image = image.convert('L') # convert RGBA to greyscale
    flat_array = np.resize(np.array(grey_image.getdata()), (1,784))
    flat_array = flat_array.astype(np.float32)
    flat_array /= 255.0
    return 1 - flat_array
    
def detect_number(sess, image_file):
    image = prepare_image(image_file)
    prediction = tf.argmax(y,1)
    print(sess.run(prediction, feed_dict={x: image}))
    return

detect_number(sess, '/data/2.png')

[2]


## Next Steps

So far so good? In your spare time, try improving your handwriting recogniser by using a neural network as the model:
https://www.tensorflow.org/versions/r0.9/tutorials/mnist/pros/index.html