# First MNIST TensorFlow tutorial
Beginner tutorial on TensorFlow.
Original tutorial: https://www.tensorflow.org/versions/r0.8/tutorials/mnist/beginners/index.html
Part with convolution: https://www.tensorflow.org/versions/r0.8/tutorials/mnist/pros/index.html

## Download the data
The downloaded data is split into three parts: 55,000 data points of training data (mnist.train), 10,000 points of test data (mnist.test), and 5,000 points of validation data (mnist.validation). This split is very important: in machine learning we need separate data that we don't learn from, so that we can check that what we've learned actually generalizes!

In [45]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
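
Because we passed one_hot=True, each label comes back as a 10-element vector with a single 1 at the index of the digit. A minimal plain-Python sketch of that encoding follows; the helper name `one_hot` is ours for illustration and is not part of the TensorFlow API.

```python
def one_hot(label, num_classes=10):
    """Return a list with 1.0 at position `label` and 0.0 elsewhere."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

# The digit 3 becomes a vector with a 1.0 in position 3.
print(one_hot(3))  # [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```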


## Import tensorflow

In [46]:
import tensorflow as tf

## Create an input placeholder
x isn't a specific value. It's a placeholder, a value that we'll input when we ask TensorFlow to run a computation. We want to be able to input any number of MNIST images, each flattened into a 784-dimensional vector. We represent this as a 2-D tensor of floating-point numbers, with a shape [None, 784]. (Here None means that a dimension can be of any length.)

In [47]:
# Create a symbolic placeholder for a batch of flattened images, shape [None, 784]
x = tf.placeholder(tf.float32, [None, 784])
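
To make the 784 concrete: each MNIST image is 28x28 pixels, and flattening lays the rows end to end. Here is a small sketch in plain Python; the `flatten` helper and the dummy image are illustrative assumptions, not tutorial code.

```python
def flatten(image):
    """Flatten a 2-D list of pixel rows into one flat list, row by row."""
    return [pixel for row in image for pixel in row]

# A dummy 28x28 "image" of zeros with one bright pixel at row 2, column 5.
image = [[0.0] * 28 for _ in range(28)]
image[2][5] = 1.0

vector = flatten(image)
print(len(vector))        # 784
print(vector.index(1.0))  # 61, because 2 * 28 + 5 = 61
```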

## Create the weight and bias variables
We also need the weights and biases for our model. We could imagine treating these like additional inputs, but TensorFlow has an even better way to handle it: Variable. A Variable is a modifiable tensor that lives in TensorFlow's graph of interacting operations. It can be used and even modified by the computation. For machine learning applications, one generally has the model parameters be Variables.

Notice that W has a shape of [784, 10] because we want to multiply the 784-dimensional image vectors by it to produce 10-dimensional vectors of evidence for the different classes. b has a shape of [10] so we can add it to the output.

In [48]:
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
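
The shape reasoning above can be checked with a few lines of plain Python. This is only a sketch of the matrix-product shape rule; `matmul_shape` is a hypothetical helper, not a TensorFlow function.

```python
def matmul_shape(a_shape, b_shape):
    """Shape of a 2-D matrix product, or raise if the inner dimensions differ."""
    (n, k1), (k2, m) = a_shape, b_shape
    if k1 != k2:
        raise ValueError("inner dimensions must match: %d vs %d" % (k1, k2))
    return (n, m)

batch_size = 64  # any batch size works; 64 is just an example
print(matmul_shape((batch_size, 784), (784, 10)))  # (64, 10)
```

The bias b, with shape [10], is then added to every one of the rows of that [batch_size, 10] result.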

## The model
We can now implement our model. It only takes one line!

First, we multiply x by W with the expression tf.matmul(x, W). This is flipped from the usual written form of the model, Wx + b, as a small trick to deal with x being a 2-D tensor holding multiple inputs (one image per row). We then add b, and finally apply tf.nn.softmax.

In [49]:
# Here the output layer activation function is a softmax
y = tf.nn.softmax(tf.matmul(x, W) + b)
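
For intuition, here is roughly what that one line computes for a single input row, written in plain Python. It is a sketch with tiny made-up dimensions (3 features, 2 classes); the helper names are ours and the max-subtraction trick is a common stability measure, not something the tutorial code shows.

```python
import math

def softmax(scores):
    """Turn a list of scores into probabilities that sum to 1."""
    # Subtracting the max does not change the result but avoids overflow.
    m = max(scores)
    exps = [math.exp(v - m) for v in scores]
    total = sum(exps)
    return [e / total for e in exps]

def linear_softmax(x_row, W, b):
    """Compute softmax(x_row . W + b) for one input row."""
    evidence = [
        sum(x_row[i] * W[i][j] for i in range(len(x_row))) + b[j]
        for j in range(len(b))
    ]
    return softmax(evidence)

# Tiny example: 3 input features, 2 classes, all-zero parameters
# (the same starting point as W and b above).
x_row = [1.0, 2.0, 3.0]
W = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
b = [0.0, 0.0]
print(linear_softmax(x_row, W, b))  # [0.5, 0.5]
```

With all-zero weights every class gets the same evidence, so the model starts out predicting a uniform distribution.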

## Training
In order to train our model, we need to define what it means for the model to be good. Well, actually, in machine learning we typically define what it means for a model to be bad, called the cost or loss, and then try to minimize how bad it is. But the two are equivalent.

One very common, very nice cost function is "cross-entropy." Surprisingly, cross-entropy arises from thinking about information-compressing codes in information theory, but it winds up being an important idea in lots of areas, from gambling to machine learning.

To implement cross-entropy we first add a new placeholder, y_correct, to input the correct answers. Then tf.log computes the logarithm of each element of y. Next, we multiply each element of y_correct with the corresponding element of tf.log(y). Then tf.reduce_sum adds the elements along the second dimension (the 10 classes), due to the reduction_indices=[1] parameter. Finally, tf.reduce_mean computes the mean over all the examples in the batch.

In [50]:
y_correct = tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_correct * tf.log(y), reduction_indices=[1]))
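
The same calculation can be traced by hand on a tiny batch. The sketch below uses plain Python and made-up predictions; note one deliberate difference, labeled in the comments: it skips terms where the label is 0, whereas the TensorFlow expression evaluates every term.

```python
import math

def cross_entropy(y_true_batch, y_pred_batch):
    """Mean over the batch of -sum(y_true * log(y_pred)) per example."""
    per_example = []
    for y_true, y_pred in zip(y_true_batch, y_pred_batch):
        # Terms with y_true == 0 contribute nothing, so we skip them.
        # (Assumption of this sketch: it also sidesteps log(0).)
        loss = -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)
        per_example.append(loss)
    return sum(per_example) / len(per_example)

# Two examples, three classes, one-hot labels.
y_true = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
y_pred = [[0.1, 0.8, 0.1], [0.5, 0.25, 0.25]]
print(cross_entropy(y_true, y_pred))  # (-log(0.8) - log(0.5)) / 2, about 0.458
```

The more probability the model puts on the correct class, the closer the loss gets to 0.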

In [51]:
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

### Session
What TensorFlow actually does here, behind the scenes, is it adds new operations to your graph which implement backpropagation and gradient descent. Then it gives you back a single operation which, when run, will do a step of gradient descent training, slightly tweaking your variables to reduce the cost.
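
To see what "a step of gradient descent" means, here is a sketch on a toy one-variable cost. Everything in it is an illustrative assumption: the gradient is written by hand, whereas TensorFlow derives it from the graph via backpropagation.

```python
def gradient_descent_step(w, grad_fn, learning_rate):
    """Return w moved one step against the gradient."""
    return w - learning_rate * grad_fn(w)

# Toy cost: cost(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
def cost(w):
    return (w - 3.0) ** 2

def grad(w):
    return 2.0 * (w - 3.0)

w = 0.0
for step in range(50):
    w = gradient_descent_step(w, grad, learning_rate=0.1)

print(w)  # close to the minimum at w = 3
```

Each step shrinks the distance to the minimum by a constant factor here; with real models the cost surface is far less regular, which is why the learning rate matters.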

Now we have our model set up to train. Before we launch it, we have to add one last operation, which initializes the variables we created:

In [52]:
init = tf.initialize_all_variables()
sess = tf.Session()
# Write the graph to a log directory so it can be viewed in TensorBoard
writer = tf.train.SummaryWriter("./logs/mnist_simple_logs", sess.graph)
sess.run(init)

In [53]:
for i in range(1000):
  batch_xs, batch_ys = mnist.train.next_batch(64)
  sess.run(train_step, feed_dict={x: batch_xs, y_correct: batch_ys})

## Evaluating model

In [54]:
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_correct,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_correct: mnist.test.labels}))

0.9124
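
The accuracy computation compares, row by row, the index of the largest predicted probability with the index of the 1 in the one-hot label, then averages the matches. A plain-Python sketch of the same idea, with made-up predictions and helper names of our own:

```python
def argmax(values):
    """Index of the largest element (the first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(predictions, one_hot_labels):
    """Fraction of rows where the predicted class matches the labelled class."""
    correct = [
        1.0 if argmax(p) == argmax(t) else 0.0
        for p, t in zip(predictions, one_hot_labels)
    ]
    return sum(correct) / len(correct)

predictions = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.4, 0.5, 0.1], [0.2, 0.2, 0.6]]
labels = [[1, 0, 0], [0, 0, 1], [1, 0, 0], [0, 0, 1]]
print(accuracy(predictions, labels))  # 0.75 (the third row is misclassified)
```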


Actually, about 91% is a poor result for MNIST. You can compare it with other models' results on MNIST (and on other datasets such as CIFAR) here:
http://rodrigob.github.io/are_we_there_yet/build/classification_datasets_results.html

## Weight initialization
To do better we build a small convolutional network, which needs many weights and biases. We initialize the weights with a small amount of noise to break symmetry, and the biases with a slightly positive value so that the ReLU neurons do not start out inactive.

In [33]:
def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)
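
tf.truncated_normal draws from a normal distribution but re-picks any value that lands more than two standard deviations from the mean, so with stddev=0.1 no initial weight exceeds 0.2 in magnitude. A rough plain-Python sketch of that behaviour (our own rejection-sampling helper, not the library's implementation):

```python
import random

def truncated_normal(n, stddev, rng, mean=0.0):
    """Draw n normal samples, redrawing any that fall more than 2 stddev from the mean."""
    samples = []
    while len(samples) < n:
        value = rng.gauss(mean, stddev)
        if abs(value - mean) <= 2.0 * stddev:
            samples.append(value)
    return samples

rng = random.Random(0)  # fixed seed so the sketch is repeatable
weights = truncated_normal(1000, stddev=0.1, rng=rng)
print(max(abs(w) for w in weights) <= 0.2)  # True
```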

## Convolutional layers


In [34]:
def conv2d(x, W):
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
  return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding='SAME')
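
The pooling helper keeps the largest value in each 2x2 block and moves two pixels at a time, which halves the width and height. A sketch of that operation on a single 4x4 grid in plain Python (the `pool_2x2` name is ours, and the sketch assumes even dimensions and one channel):

```python
def pool_2x2(grid):
    """2x2 max pooling with stride 2 on a 2-D list with even height and width."""
    pooled = []
    for r in range(0, len(grid), 2):
        row = []
        for c in range(0, len(grid[0]), 2):
            row.append(max(grid[r][c], grid[r][c + 1],
                           grid[r + 1][c], grid[r + 1][c + 1]))
        pooled.append(row)
    return pooled

grid = [
    [1, 2, 0, 1],
    [3, 4, 1, 0],
    [0, 0, 5, 6],
    [1, 0, 7, 8],
]
print(pool_2x2(grid))  # [[4, 1], [1, 8]]
```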

In [35]:
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
x_image = tf.reshape(x, [-1,28,28,1])

h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

### Second layer
In order to build a deep network, we stack several layers of this type. The second layer will have 64 features for each 5x5 patch.

In [36]:
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])

h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)
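
It helps to track how the image size changes through these layers. With 'SAME' padding the spatial output size is the input size divided by the stride, rounded up, so the stride-1 convolutions keep the size and each stride-2 pooling halves it. A small sketch of that arithmetic (the helper name is hypothetical):

```python
import math

def same_output_size(size, stride):
    """Spatial output size under 'SAME' padding: ceil(size / stride)."""
    return int(math.ceil(size / float(stride)))

size = 28  # MNIST images are 28x28
trace = []
for name, stride in [("conv1", 1), ("pool1", 2), ("conv2", 1), ("pool2", 2)]:
    size = same_output_size(size, stride)
    trace.append((name, size))

print(trace)  # [('conv1', 28), ('pool1', 14), ('conv2', 14), ('pool2', 7)]
```

So after the second pooling layer each image is a 7x7 grid with 64 feature channels.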

### Fully connected layer
Now that the image size has been reduced to 7x7, we add a fully-connected layer with 1024 neurons to allow processing on the entire image. We reshape the tensor from the pooling layer into a batch of vectors, multiply by a weight matrix, add a bias, and apply a ReLU.

In [37]:
W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])

h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
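
A quick count shows how large this layer is compared with the first model. The sketch below is plain arithmetic; `dense_param_count` is a helper we made up for illustration.

```python
def dense_param_count(num_inputs, num_outputs):
    """Weights plus biases in one fully connected layer."""
    return num_inputs * num_outputs + num_outputs

flat_size = 7 * 7 * 64  # values per image after the second pooling layer
print(flat_size)                           # 3136
print(dense_param_count(flat_size, 1024))  # 3212288
print(dense_param_count(784, 10))          # 7850, the whole softmax-only model
```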

### Dropout layer

In [38]:
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
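
Dropout keeps each activation with probability keep_prob and scales the kept values by 1 / keep_prob, setting the rest to 0, so the expected sum stays the same. Making keep_prob a placeholder lets us turn dropout on during training and off (keep_prob = 1.0) during evaluation. A minimal plain-Python sketch of the idea, not the library's implementation:

```python
import random

def dropout(values, keep_prob, rng):
    """Keep each value with probability keep_prob, scaled by 1/keep_prob; zero the rest."""
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
activations = [1.0] * 8
print(dropout(activations, 0.5, rng))  # each entry is either 0.0 or 2.0
print(dropout(activations, 1.0, rng))  # keep_prob = 1.0 leaves the values unchanged
```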

### Output layer
Again we use the softmax activation function.

In [39]:
W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])

y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)

### Cost function

In [40]:
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_correct * tf.log(y_conv), reduction_indices=[1]))

### Train and evaluate


In [41]:
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_correct,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
sess.run(tf.initialize_all_variables())

In [42]:
for i in range(20000):
  batch = mnist.train.next_batch(50)
  
  # Print training accuracy every 2000 steps (dropout disabled)
  if i%2000 == 0:
    train_accuracy = accuracy.eval(session=sess, feed_dict={x:batch[0], y_correct: batch[1], keep_prob: 1.0})
    print("step %d, training accuracy %g"%(i, train_accuracy))
  
  # Run one training step with dropout enabled (keep_prob = 0.5)
  train_step.run(session=sess, feed_dict={x: batch[0], y_correct: batch[1], keep_prob: 0.5})

step 0, training accuracy 0.16
step 2000, training accuracy 0.96
step 4000, training accuracy 0.92
step 6000, training accuracy 0.98
step 8000, training accuracy 1
step 10000, training accuracy 1
step 12000, training accuracy 1
step 14000, training accuracy 1
step 16000, training accuracy 1
step 18000, training accuracy 1


In [43]:
print("test accuracy %g"%accuracy.eval(session=sess, feed_dict={
    x: mnist.test.images, y_correct: mnist.test.labels, keep_prob: 1.0}))

test accuracy 0.9929
