# TensorFlow MNIST Beginners Tutorial

This notebook follows the [TensorFlow Beginners tutorial](https://www.tensorflow.org/tutorials/mnist/beginners/).

First we import the libraries we'll use.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

Then we set the data directory and load the MNIST dataset with one-hot encoded labels.

In [2]:
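# Directory where the MNIST archives are stored.
DATA_DIR = 'MNIST_data/'

# one_hot=True encodes each label as a length-10 vector with a single 1.
mnist = input_data.read_data_sets(DATA_DIR, one_hot=True)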

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
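
To make the `one_hot=True` argument concrete, here is a minimal sketch of one-hot encoding in plain Python. The helper `one_hot` is a hypothetical name introduced for illustration; it is not part of `input_data` and says nothing about how that module builds its labels internally.

```python
def one_hot(label, num_classes=10):
    """Return a list with 1.0 at index `label` and 0.0 elsewhere."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec

encoded = one_hot(3)
print(encoded)

# The original digit is recovered as the index of the largest entry.
decoded = encoded.index(max(encoded))
print(decoded)
```

With this encoding, a label and a predicted probability vector have the same shape, which is what the loss function later relies on.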


Next we define the TensorFlow placeholders, variables, loss function, and train step. Finally, we initialize the variables and start a session that will run our code.

Our evidence is defined as:

\begin{equation}
\text{evidence}_i=\sum_j W_{i,j}x_j+b_i
\end{equation}

where $W_i$ are the weights and $b_i$ is the bias for class $i$, and $j$ indexes the pixels of our input image $x$. We then convert the evidence tallies into our predicted probabilities $y$ using the "softmax" function:

\begin{equation}
y=\text{softmax}(\text{evidence})
\end{equation}

Here softmax is serving as an "activation" or "link" function, shaping the output of our linear function into the form we want -- in this case, a probability distribution over 10 cases. You can think of it as converting tallies of evidence into probabilities of our input being in each class. It's defined as:

\begin{equation}
\text{softmax}(x)=\text{normalize}(\exp(x))
\end{equation}

If you expand that equation out, you get:

\begin{equation}
\text{softmax}(x)_i=\frac{\exp(x_i)}{\sum_j\exp(x_j)}
\end{equation}
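
As a small worked example of the expanded formula, the sketch below implements softmax in plain Python. Subtracting the maximum before exponentiating is an addition of ours for numerical safety and is not part of the formula above; it leaves the result unchanged because the common factor cancels between numerator and denominator.

```python
import math

def softmax(evidence):
    """Exponentiate each entry, then normalize so the results sum to 1."""
    # Shifting by the maximum keeps exp() from overflowing on large inputs.
    # The shift cancels in the ratio, so the output is the same.
    shift = max(evidence)
    exps = [math.exp(e - shift) for e in evidence]
    total = sum(exps)
    return [v / total for v in exps]

probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
```

The largest evidence value receives the largest probability, and the outputs always sum to 1 up to floating-point rounding.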

More compactly, we can just write:

\begin{equation}
y=\text{softmax}(Wx+b)
\end{equation}
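
One detail worth checking is the shape convention. The equations write $Wx$, with one row of $W$ per class, while the code in the next cell computes `tf.matmul(x, W)` with `W` of shape `(784, 10)`, that is, one column per class. The sketch below uses toy sizes (3 pixels, 2 classes) and hypothetical helper names to illustrate that the two forms give the same evidence when one weight matrix is the transpose of the other.

```python
# Toy sizes: 3 "pixels" and 2 classes (MNIST uses 784 and 10).
W_math = [[0.1, 0.2, 0.3],   # row i holds the weights for class i
          [0.4, 0.5, 0.6]]
b = [0.01, 0.02]
x = [1.0, 2.0, 3.0]

def evidence_math(W, x, b):
    """evidence_i = sum_j W[i][j] * x[j] + b[i], as in the equation."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]

def evidence_code(x, W_t, b):
    """Row-vector form x @ W_t + b, where W_t has shape (pixels, classes)."""
    return [sum(x[j] * W_t[j][i] for j in range(len(x))) + b[i]
            for i in range(len(b))]

# Transpose W_math to get the (pixels, classes) layout used in the code cell.
W_code = [list(col) for col in zip(*W_math)]

print(evidence_math(W_math, x, b))
print(evidence_code(x, W_code, b))
```

The row-vector form is convenient because a whole batch of images can be stacked as the rows of `x` and multiplied in one call.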

In [3]:
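# Inputs: each row is a flattened 28x28 image (784 pixels).
x = tf.placeholder(tf.float32, (None, 784))
# Model parameters: one weight column and one bias per digit class.
W = tf.Variable(tf.zeros((784, 10)))
b = tf.Variable(tf.zeros([10]))
# Predicted class probabilities.
y = tf.nn.softmax(tf.matmul(x, W) + b)
# True labels, one-hot encoded.
y_ = tf.placeholder(tf.float32, (None, 10))
# Loss: cross-entropy summed over classes, averaged over the batch.
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y),
                                              reduction_indices=[1]))
# One gradient descent step with learning rate 0.5.
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
# Initialize the variables and start a session.
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)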
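
The `cross_entropy` expression in the cell above sums $-y'_i \log(y_i)$ over the classes of each example and then averages over the batch, where $y'$ is the one-hot label. Here is a rough plain-Python sketch of the same arithmetic. The function names are ours, and skipping terms whose label is 0 is an assumption made to avoid `log(0)`; the TensorFlow expression does not do this.

```python
import math

def cross_entropy(y_true, y_pred):
    """-sum_i y_true[i] * log(y_pred[i]) for a single example."""
    # Assumption: terms with a zero label are skipped so log(0) is never taken.
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)

def mean_cross_entropy(batch_true, batch_pred):
    """Average the per-example losses over a batch."""
    losses = [cross_entropy(t, p) for t, p in zip(batch_true, batch_pred)]
    return sum(losses) / len(losses)

labels = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
preds = [[0.1, 0.8, 0.1], [0.3, 0.4, 0.3]]
print(mean_cross_entropy(labels, preds))

# With W and b initialized to zeros, softmax gives every class probability
# 0.1, so the loss before any training is log(10).
uniform = [0.1] * 10
digit_3 = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(cross_entropy(digit_3, uniform))
```

Only the probability assigned to the true class contributes to the loss, so a confident correct prediction costs little and a confident wrong one costs a lot.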

Now we run 1,000 training steps, each on a random batch of 100 samples, and then evaluate the accuracy on the test set.

> Using small batches of random data is called stochastic training -- in this case, stochastic gradient descent. Ideally, we'd like to use all our data for every step of training because that would give us a better sense of what we should be doing, but that's expensive. So, instead, we use a different subset every time. Doing this is cheap and has much of the same benefit.
> - TensorFlow
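
To illustrate the idea in the quote on a problem small enough to follow by hand, the sketch below runs mini-batch gradient descent on a toy one-parameter model. The dataset, the model, and the use of `random.sample` are all assumptions made for this example; it is not a description of how `mnist.train.next_batch` selects its batches.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Toy dataset: y = 3 * x exactly, so the best weight is w = 3.
xs = [random.random() for _ in range(200)]
data = [(x, 3.0 * x) for x in xs]

w = 0.0
learning_rate = 0.5
batch_size = 10

for step in range(200):
    # Draw a different random subset of the data at every step.
    batch = random.sample(data, batch_size)
    # Gradient of the mean of 0.5 * (w * x - y) ** 2 over the batch.
    grad = sum((w * x - y) * x for x, y in batch) / batch_size
    w -= learning_rate * grad

print(w)
```

Each step only looks at 10 of the 200 points, yet the weight still moves toward 3, which is the "much of the same benefit" the quote refers to.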

In [7]:
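# Train for 1000 steps on random batches of 100 examples.
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

# A prediction is correct when the most probable class matches the label.
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
# Accuracy is the fraction of correct predictions on the test set.
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print("Accuracy: {}".format(
    sess.run(accuracy,
             feed_dict={x: mnist.test.images, y_: mnist.test.labels})))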

Accuracy: 0.9222000241279602
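
The accuracy computation in the last cell compares the index of the largest predicted probability with the index of the 1 in the one-hot label, then averages the matches. A minimal plain-Python sketch of that logic, using made-up predictions and hypothetical helper names, might look like this:

```python
def argmax(values):
    """Index of the largest entry (the first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(pred_probs, one_hot_labels):
    """Fraction of rows where the predicted class equals the true class."""
    correct = [float(argmax(p) == argmax(t))
               for p, t in zip(pred_probs, one_hot_labels)]
    return sum(correct) / len(correct)

# Made-up values for illustration only: 4 examples, 3 classes.
preds = [[0.7, 0.2, 0.1],
         [0.1, 0.3, 0.6],
         [0.4, 0.5, 0.1],
         [0.2, 0.2, 0.6]]
labels = [[1, 0, 0],
          [0, 0, 1],
          [1, 0, 0],
          [0, 0, 1]]
print(accuracy(preds, labels))  # 0.75, since 3 of the 4 rows match
```

Converting each comparison to a float before averaging mirrors the `tf.cast(correct_prediction, tf.float32)` step.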
