# Learning TensorFlow - Day 3 - MNIST for beginners

## Defining the Model

In [1]:
import tensorflow as tf

We are now following the [MNIST for ML Beginners](https://www.tensorflow.org/get_started/mnist/beginners) part of the tutorial.

As we did before, we train a linear model, but this time our data are the 28 * 28 = 784 pixels that form each image of a digit.

In [2]:
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
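To make the shapes concrete: each 28 x 28 image is flattened into one row of 784 numbers, so a batch of `n` images is an `n x 784` matrix, and multiplying by the `784 x 10` matrix `W` gives 10 scores per image. Here is a minimal plain-Python sketch of the flattening step only. The helper `flatten` and the toy image are made up for illustration and are not part of the tutorial.

```python
# Plain-Python illustration of how a 28x28 image becomes 784 inputs.
# `flatten` and `image` are hypothetical names used only in this sketch.

def flatten(image):
    """Turn a 2-D list of pixel rows into one flat list, row by row."""
    return [pixel for row in image for pixel in row]

# A fake 28x28 "image" of zeros with one bright pixel at row 2, column 3.
image = [[0.0] * 28 for _ in range(28)]
image[2][3] = 1.0

flat = flatten(image)
print(len(flat))        # 784 values per image
print(flat.index(1.0))  # 59, that is 2 * 28 + 3
```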

Our model, softmax regression, takes only one line to implement:

In [3]:
y = tf.nn.softmax(tf.matmul(x, W) + b)

That is, a node that connects to `x`, `W` and `b`, and uses a [softmax function](https://en.wikipedia.org/wiki/Softmax_function) to convert the "evidence" for each digit into probabilities.
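To see what softmax does to a single row of scores, here is a small plain-Python sketch (not the TensorFlow implementation). It exponentiates each score and divides by the sum, so the results are positive and add up to 1. Subtracting the maximum first is a common numerical-stability trick that I am adding here; it does not change the result.

```python
import math

def softmax(evidence):
    """Convert a list of raw scores into probabilities that sum to 1."""
    # Subtracting the maximum leaves the result unchanged but avoids overflow in exp.
    m = max(evidence)
    exps = [math.exp(e - m) for e in evidence]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
print([round(p, 3) for p in probs])  # [0.09, 0.245, 0.665]
```

The largest score gets the largest probability, but the smaller ones are not thrown away.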

## Training

This time, for our *loss function* we are going to use the [cross-entropy](https://en.wikipedia.org/wiki/Cross_entropy): $H_{y'}(y) = - \sum_i y_i' \log(y_i)$, where $y$ is our predicted probability distribution, and $y'$ is the true distribution.

Roughly speaking, it measures how inefficient our predictions are at describing the truth.

Let's implement it.

In [4]:
y_ = tf.placeholder(tf.float32, [None, 10])
# Sum over the 10 classes (axis 1), then average over the examples in the batch.
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), axis=1))

**Note**: `y_` is a placeholder. It gets its value later, when we evaluate a node that depends on it and pass a `feed_dict` containing `{y_: ...}`.
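For intuition about the cross-entropy formula, here is a plain-Python sketch for a single example with three classes instead of ten. The probability values are made up for illustration. With a one-hot true distribution, the sum reduces to minus the log of the probability we assigned to the correct class.

```python
import math

def cross_entropy(y_true, y_pred):
    """H_{y'}(y) = -sum_i y'_i * log(y_i) for one example."""
    # Terms with y'_i == 0 contribute nothing, so skip them (this also avoids log(0)).
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)

one_hot = [0.0, 0.0, 1.0]        # the true class is index 2
confident = [0.05, 0.05, 0.90]   # made-up prediction, mostly right
unsure = [0.40, 0.30, 0.30]      # made-up prediction, mostly wrong

print(round(cross_entropy(one_hot, confident), 4))  # 0.1054
print(round(cross_entropy(one_hot, unsure), 4))     # 1.204
```

The better the prediction, the closer the loss gets to zero.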

Now we can train using the gradient descent optimizer, as we did on the previous day.

In [5]:
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(cross_entropy)
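What does one such training step do? TensorFlow derives the gradients for us, but for softmax with cross-entropy the gradient with respect to the evidence is simply `y - y'`, so a step can be written by hand. The following is a plain-Python sketch on a toy problem with 2 inputs and 3 classes. It is not what TensorFlow executes internally, and `predict`, `loss` and `sgd_step` are hypothetical helper names. The learning rate 0.05 is the one used above.

```python
import math

def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def predict(x, W, b):
    # evidence_j = sum_i x_i * W[i][j] + b_j, then softmax
    z = [sum(x[i] * W[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]
    return softmax(z)

def loss(x, target, W, b):
    y = predict(x, W, b)
    return -sum(t * math.log(p) for t, p in zip(target, y) if t > 0)

def sgd_step(x, target, W, b, lr=0.05):
    """One gradient descent step on a single example, updating W and b in place."""
    y = predict(x, W, b)
    delta = [y[j] - target[j] for j in range(len(b))]  # gradient w.r.t. the evidence
    for i in range(len(x)):
        for j in range(len(b)):
            W[i][j] -= lr * x[i] * delta[j]
    for j in range(len(b)):
        b[j] -= lr * delta[j]

x = [1.0, 0.5]              # toy input
target = [0.0, 1.0, 0.0]    # toy one-hot label
W = [[0.0] * 3 for _ in range(2)]
b = [0.0] * 3

before = loss(x, target, W, b)
for _ in range(100):
    sgd_step(x, target, W, b)
after = loss(x, target, W, b)
print(round(before, 4))  # 1.0986, i.e. log(3): all classes equally likely at the start
print(after < before)    # True
```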

All the pieces are there. Let's run them, that is, create a session and evaluate the relevant nodes.

In [6]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

sess = tf.InteractiveSession()
tf.global_variables_initializer().run()

for _ in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


At each step of the loop we use a small batch of 100 random training examples, which is called **stochastic training**. It is cheap and has much of the same benefit as using the full training set.
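The idea of drawing a random mini-batch can be sketched in plain Python as follows. This is only an illustration and not necessarily how `mnist.train.next_batch` is implemented. The function below merely borrows that name, and the toy data stands in for the images and labels.

```python
import random

def next_batch(examples, labels, batch_size, rng):
    """Pick `batch_size` distinct random positions and return the matching pairs."""
    idx = rng.sample(range(len(examples)), batch_size)
    return [examples[i] for i in idx], [labels[i] for i in idx]

rng = random.Random(0)                 # fixed seed so the sketch is repeatable
examples = list(range(1000))           # toy stand-in for the training images
labels = [e % 10 for e in examples]    # toy stand-in for the digit labels

batch_xs, batch_ys = next_batch(examples, labels, 100, rng)
print(len(batch_xs), len(set(batch_xs)))  # 100 100
```

Each example stays paired with its own label, which is the one thing a batching helper must never get wrong.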

## Evaluating Our Model

We count a prediction as correct when the class with the highest predicted probability matches the true digit.

In [7]:
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

0.9011


See? Here's where we fill in `y_`! (And `x`, for that matter.)

If we hadn't run the optimization step, we would expect about 10% accuracy (random chance). But lo and behold, we guess correctly about 90% of the time! There you have it :)
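The accuracy computation (argmax of the prediction, argmax of the label, compare, average) can be mimicked in plain Python on a few made-up rows. The numbers below are invented for illustration and have nothing to do with the real MNIST results.

```python
def argmax(values):
    """Index of the largest entry, like tf.argmax along one row."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(predictions, labels):
    """Fraction of rows where the predicted class equals the true class."""
    correct = [argmax(p) == argmax(t) for p, t in zip(predictions, labels)]
    return sum(correct) / len(correct)

# Four made-up predictions over three classes, with one-hot labels.
predictions = [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1], [0.3, 0.3, 0.4], [0.2, 0.5, 0.3]]
labels = [[0, 1, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]]

print(accuracy(predictions, labels))  # 0.75
```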

By the way, this is what the model we created looks like, with all the nodes properly connected:

<img src="tutorial_02.png" width=500px>

Just for reference, here's a cool link from the tutorial to the state of the art in object classification: http://rodrigob.github.io/are_we_there_yet/build/classification_datasets_results.html

That should be it for today. Next time, we will pretend we are pros and do the [Deep MNIST for Experts](https://www.tensorflow.org/get_started/mnist/pros) tutorial.