# [MNIST For ML Beginners](https://www.tensorflow.org/get_started/mnist/beginners)

In [1]:
import tensorflow as tf
import numpy as np
import pandas as pd
from tensorflow.examples.tutorials.mnist import input_data

## [The MNIST Data](http://yann.lecun.com/exdb/mnist/)

In [2]:
mnist = input_data.read_data_sets('MNIST_data/', one_hot=True)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


In [3]:
mnist

Datasets(train=<tensorflow.contrib.learn.python.learn.datasets.mnist.DataSet object at 0x105f409b0>, validation=<tensorflow.contrib.learn.python.learn.datasets.mnist.DataSet object at 0x106c5a630>, test=<tensorflow.contrib.learn.python.learn.datasets.mnist.DataSet object at 0x106bbc7f0>)

Each image is 28 pixels by 28 pixels. We can interpret this as a big array of numbers:

![](https://www.tensorflow.org/images/MNIST-Matrix.png)

We can flatten this array into a vector of 28x28 = 784 numbers.
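Flattening is a plain reshape. The NumPy sketch below uses a synthetic 28x28 image, not real MNIST data. It assumes row-major order, which is NumPy's default. Under that order, pixel (r, c) lands at index `r * 28 + c`, and the reshape can be undone.

```python
import numpy as np

# A synthetic 28x28 grayscale image with values in [0, 1] (illustrative only)
image = np.arange(28 * 28, dtype=np.float32).reshape(28, 28) / (28 * 28 - 1)

# Flatten row by row (NumPy's default C order) into a 784-dimensional vector
vector = image.reshape(-1)
print(vector.shape)  # (784,)

# Pixel at row r, column c ends up at index r * 28 + c
r, c = 5, 17
assert vector[r * 28 + c] == image[r, c]

# The flattening is reversible
restored = vector.reshape(28, 28)
```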

In [4]:
print(type(mnist.train.images))
print(mnist.train.images.shape)
mnist.train.images[::22000, ::48]  # every 22000th image, every 48th pixel

<class 'numpy.ndarray'>
(55000, 784)


array([[ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.92156869,  0.78039223,  0.        ,  0.91764712,  0.45882356,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.08627451,  0.80784321,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.99215692,  0.        ,  0.69803923,  0.99215692,
         0.        ,  0.        ,  0.65882355,  0.        ,  0.        ,
         0.        ,  0.        ]], dtype=float32)

Each image in MNIST has a corresponding label, a number between 0 and 9 representing the digit drawn in the image.

For the purposes of this tutorial, we're going to want our labels as "one-hot vectors". A one-hot vector is a vector which is 0 in most dimensions, and 1 in a single dimension. In this case, the nth digit will be represented as a vector which is 1 in the nth dimension. For example, 3 would be ``[0,0,0,1,0,0,0,0,0,0]``. Consequently, ``mnist.train.labels`` is a ``[55000, 10]`` array of floats.
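Here is a minimal NumPy sketch of one-hot encoding and decoding. The helper name `one_hot` is hypothetical; `input_data.read_data_sets(..., one_hot=True)` already does this for you. Row `i` of the 10x10 identity matrix is the one-hot vector for digit `i`, and `argmax` recovers the digit.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Hypothetical helper: (len(labels), num_classes) float array with a single 1 per row."""
    labels = np.asarray(labels)
    # Indexing the identity matrix by label selects the matching one-hot rows
    return np.eye(num_classes, dtype=np.float32)[labels]

encoded = one_hot([3, 0, 9])

# argmax along each row recovers the original digit
decoded = encoded.argmax(axis=1)
print(decoded)  # [3 0 9]
```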

![](https://www.tensorflow.org/images/mnist-train-ys.png)

In [5]:
print(type(mnist.train.labels))
print(mnist.train.labels.shape)
mnist.train.labels[::11000]  # every 11000th label

<class 'numpy.ndarray'>
(55000, 10)


array([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.]])

## Softmax Regressions

$$\text{evidence}_i = \sum_j W_{i,~ j} x_j + b_i$$

$$y = \text{softmax}(\text{evidence})$$

$$\text{softmax}(x) = \text{normalize}(\exp(x))$$

$$\text{softmax}(x)_i = \frac{\exp(x_i)}{\sum_j \exp(x_j)}$$
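The softmax formula translates directly to NumPy. This sketch adds one thing that is not in the formula above: it subtracts the maximum before exponentiating. That common trick cancels out in the ratio, so the result is unchanged, but it keeps `exp` from overflowing on large evidence values.

```python
import numpy as np

def softmax(x):
    """Softmax along the last axis: exp(x_i) / sum_j exp(x_j)."""
    # Subtracting the max cancels in the ratio but prevents exp() overflow
    shifted = x - np.max(x, axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)

evidence = np.array([2.0, 1.0, 0.1])
probs = softmax(evidence)  # non-negative, sums to 1, largest evidence gets largest probability

# A naive exp(x) / sum(exp(x)) would compute inf / inf here
print(softmax(np.array([1000.0, 1000.0])))  # [0.5 0.5]
```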

You can picture our softmax regression as looking something like the following, although with a lot more xs. For each output, we compute a weighted sum of the xs, add a bias, and then apply softmax.

![](https://www.tensorflow.org/images/softmax-regression-scalargraph.png)

If we write that out as equations, we get:

![](https://www.tensorflow.org/images/softmax-regression-scalarequation.png)

We can "vectorize" this procedure, turning it into a matrix multiplication and vector addition. This is helpful for computational efficiency. (It's also a useful way to think.)

![](https://www.tensorflow.org/images/softmax-regression-vectorequation.png)

More compactly, we can just write

$$y = \text{softmax}(Wx + b)$$
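To check that the scalar and vectorized forms agree, here is a NumPy sketch with random weights and a synthetic input, not a trained model. It also shows the batched layout with one image per row, which is how `tf.matmul(X, a)` is used in the implementation. In that layout the weight matrix has shape `(784, 10)`, the transpose of `W` in $Wx + b$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 784 pixels, 10 classes, a batch of 5 images
n_inputs, n_classes, batch = 784, 10, 5
W = rng.normal(scale=0.01, size=(n_classes, n_inputs))  # W[i, j]: weight from pixel j to class i
b = np.zeros(n_classes)
x = rng.random(n_inputs)  # one flattened image

# Scalar form: evidence_i = sum_j W[i, j] * x[j] + b[i]
evidence_loop = np.array([sum(W[i, j] * x[j] for j in range(n_inputs)) + b[i]
                          for i in range(n_classes)])

# Vectorized form: W x + b
evidence_vec = W @ x + b

# Batched form, one image per row: X @ W.T + b has shape (batch, 10)
X = rng.random((batch, n_inputs))
evidence_batch = X @ W.T + b

# Row-wise softmax (max subtracted for numerical safety)
shifted = evidence_batch - evidence_batch.max(axis=1, keepdims=True)
y = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
print(y.shape)  # (5, 10)
```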

## [Implement the Regression](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/tutorials/mnist/mnist_softmax.py)

In [6]:
X = tf.placeholder(dtype=tf.float32, shape=[None, 784], name='X') # None means that a dimension can be of any length
y = tf.placeholder(dtype=tf.float32, shape=[None, 10], name='y')
a = tf.Variable(tf.zeros(shape=[784, 10], dtype=tf.float32))
b = tf.Variable(tf.zeros(shape=[10], dtype=tf.float32))

print(X)
print(y)
print(a)
print(b)

Tensor("X:0", shape=(?, 784), dtype=float32)
Tensor("y:0", shape=(?, 10), dtype=float32)
Tensor("Variable/read:0", shape=(784, 10), dtype=float32)
Tensor("Variable_1/read:0", shape=(10,), dtype=float32)


### Training

One very common, very nice function to determine the loss of a model is called "cross-entropy." Cross-entropy arises from thinking about information-compressing codes in information theory, but it turns out to be an important idea in lots of areas, from gambling to machine learning. It's defined as:

$$H_{y'}(y) = -\sum_i y'_i \log(y_i)$$

Here, y is our predicted probability distribution and y′ is the true distribution (the one-hot vector with the digit labels). In a rough sense, cross-entropy measures how inefficient our predictions are for describing the truth.
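Here is a small worked example of the formula, using made-up predictions for a label of 3. Because $y'$ is one-hot, only the true-class term survives, so the loss is just $-\log(y_3)$. A confident correct prediction is cheap and a uniform guess costs $\log 10$.

```python
import numpy as np

def cross_entropy(y_true, y_prob):
    """H_{y'}(y) = -sum_i y'_i * log(y_i), computed per row."""
    return -np.sum(y_true * np.log(y_prob), axis=-1)

y_true = np.eye(10)[3]             # one-hot label for the digit 3

confident = np.full(10, 0.01)      # made-up prediction: most mass on the right digit
confident[3] = 0.91
uncertain = np.full(10, 0.1)       # made-up prediction: uniform guess

print(cross_entropy(y_true, confident))  # -log(0.91) ≈ 0.094
print(cross_entropy(y_true, uncertain))  # -log(0.1) ≈ 2.303
```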

In [9]:
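# numerically unstable way: if a softmax probability underflows to 0, tf.log gives -inf and the loss becomes NaN
# y_prob = tf.nn.softmax(tf.matmul(X, a) + b, name='y_prob')
# cross_entropy = tf.reduce_mean(- tf.reduce_sum(y * tf.log(y_prob), axis=1), name='cross_entropy')
# print(y_prob)
# print(cross_entropy)

# more stable way: pass the raw logits and let TensorFlow combine softmax and cross-entropy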
y_pred = tf.add(tf.matmul(X, a), b, name='y_pred')
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=y_pred), name='cross_entropy')
print(y_pred)
print(cross_entropy)

train = tf.train.GradientDescentOptimizer(0.1).minimize(cross_entropy)

sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
tf.summary.FileWriter(graph=sess.graph, logdir='logs/MNIST Beginner')

Tensor("y_pred:0", shape=(?, 10), dtype=float32)
Tensor("cross_entropy_1:0", shape=(), dtype=float32)


<tensorflow.python.summary.writer.writer.FileWriter at 0x110e0d2b0>
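To see why the logits-based loss is preferred, compare the two approaches in NumPy. This is a sketch of the general log-sum-exp idea, not a claim about how `tf.nn.softmax_cross_entropy_with_logits` is implemented internally. It uses one made-up example that is very confident and wrong. Taking softmax first and then `log` produces NaN. Computing $\log \text{softmax}(z)_i = z_i - \log\sum_j \exp(z_j)$ directly, with the max subtracted, gives the correct finite loss.

```python
import numpy as np

def naive_cross_entropy(labels, logits):
    # softmax first, then log: exp() overflows to inf and log(0) gives -inf
    probs = np.exp(logits) / np.sum(np.exp(logits), axis=-1, keepdims=True)
    return -np.sum(labels * np.log(probs), axis=-1)

def stable_cross_entropy(labels, logits):
    # log softmax(z)_i = z_i - logsumexp(z); subtracting the max keeps exp() in range
    m = np.max(logits, axis=-1, keepdims=True)
    log_probs = logits - m - np.log(np.sum(np.exp(logits - m), axis=-1, keepdims=True))
    return -np.sum(labels * log_probs, axis=-1)

labels = np.array([[0.0, 1.0]])
logits = np.array([[1000.0, 0.0]])   # very confident and wrong

with np.errstate(over="ignore", invalid="ignore", divide="ignore"):
    print(naive_cross_entropy(labels, logits))   # [nan]
print(stable_cross_entropy(labels, logits))      # [1000.]
```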

In [None]:
for _ in range(10000):
    batch_Xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train, feed_dict={X: batch_Xs, y: batch_ys})
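Each `sess.run(train, ...)` applies one gradient-descent step. The NumPy sketch below reproduces that update by hand, using the standard result that the gradient of the mean softmax cross-entropy with respect to the logits is $(\text{softmax}(z) - y')/N$. It trains on synthetic, linearly separable data with made-up sizes (20 features, 3 classes), not MNIST, so the numbers will not match the notebook.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def sgd_step(W, b, X, Y, lr=0.1):
    """One gradient-descent step (in place) on mean softmax cross-entropy for logits X @ W + b."""
    probs = softmax(X @ W + b)
    grad_logits = (probs - Y) / X.shape[0]      # d(mean loss) / d(logits)
    W -= lr * X.T @ grad_logits
    b -= lr * grad_logits.sum(axis=0)
    return -np.mean(np.sum(Y * np.log(probs), axis=1))  # loss before this update

# Synthetic stand-in for MNIST: labels come from a hidden linear rule
rng = np.random.default_rng(0)
true_W = rng.normal(size=(20, 3))
X = rng.normal(size=(300, 20))
Y = np.eye(3)[np.argmax(X @ true_W, axis=1)]

W, b = np.zeros((20, 3)), np.zeros(3)       # zero init, like tf.zeros above
losses = []
for _ in range(200):
    idx = rng.choice(len(X), size=100, replace=False)   # a mini-batch of 100
    losses.append(sgd_step(W, b, X[idx], Y[idx]))
print(losses[0], losses[-1])  # starts at log(3) because all logits are 0
```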

### Evaluating

In [36]:
# softmax is monotonic, so the argmax of the logits is the same class as the argmax of the probabilities
compare = tf.equal(tf.argmax(input=y_pred, axis=1), tf.argmax(input=y, axis=1))
accuracy = tf.reduce_mean(tf.cast(compare, tf.float32))
dict_eval = {X: mnist.test.images, y: mnist.test.labels}
# (yMax, y_predMax) = sess.run((tf.argmax(y, 1), tf.argmax(y_pred, 1)), feed_dict=dict_eval)
# print(yMax[: 20])
# print(y_predMax[: 20])
# sess.run(y_pred, feed_dict=dict_eval)
# sess.run(compare, feed_dict=dict_eval)
sess.run(accuracy, feed_dict=dict_eval)

0.92430001
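The accuracy computation is an argmax comparison followed by a mean. Here it is in NumPy on four made-up rows of logits, three of which pick the right class. Raw logits are enough, because softmax preserves the ordering of the scores.

```python
import numpy as np

def accuracy(logits, labels_one_hot):
    """Fraction of rows where the highest-scoring class matches the one-hot label."""
    predicted = np.argmax(logits, axis=1)
    actual = np.argmax(labels_one_hot, axis=1)
    return np.mean(predicted == actual)

logits = np.array([[2.0, 0.5, -1.0],   # predicts 0
                   [0.1, 0.3, 0.2],    # predicts 1
                   [-1.0, 0.0, 3.0],   # predicts 2
                   [1.0, 2.0, 0.0]])   # predicts 1
labels = np.eye(3)[[0, 1, 2, 2]]
print(accuracy(logits, labels))  # 0.75
```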