# Deep MNIST for experts

This is the follow-up to the previous MNIST tutorial for beginners. It is a step-by-step implementation of <https://www.tensorflow.org/get_started/mnist/pros>.

In [57]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data/', one_hot=True)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


## 0. Use `InteractiveSession`

Use the `InteractiveSession` class to interleave operations that build a **computational graph** with operations that run the graph. A computational graph is [a series of TensorFlow operations arranged into a graph of nodes](https://www.tensorflow.org/get_started/get_started#the_computational_graph).

In [58]:
import tensorflow as tf
sess = tf.InteractiveSession()

## 1. Build a softmax regression model

### 1.1 Create computational graph's nodes: `placeholder`.

We start building the computational graph by creating **nodes** for (1) the input images and (2) the target output classes. For both, we use a `placeholder`: **a value that we'll input when we ask TensorFlow to run a computation.**

In [59]:
x = tf.placeholder(tf.float32, shape=[None, 784])  # flattened 28x28 input images
y_ = tf.placeholder(tf.float32, shape=[None, 10])  # one-hot target labels for the 10 digit classes

We use the `shape` argument to specify the shapes of `x` and `y_`. The value **`None`** indicates that the first dimension, corresponding to the **batch size**, can be of any size.

The shape argument to placeholder is optional, but it allows TensorFlow to automatically **catch bugs stemming from inconsistent tensor shapes.**
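To make the role of `None` concrete, here is a minimal pure-Python sketch of the kind of consistency check a declared shape makes possible. The helper `matches_shape` is hypothetical and written only for illustration; it is not TensorFlow's actual implementation.

```python
def matches_shape(actual, declared):
    """Return True if `actual` fits `declared`; a None entry matches any size."""
    if len(actual) != len(declared):
        return False
    return all(d is None or a == d for a, d in zip(actual, declared))


declared_x = (None, 784)  # same declared shape as the placeholder `x`

print(matches_shape((100, 784), declared_x))     # a batch of 100 flattened images
print(matches_shape((10000, 784), declared_x))   # a different batch size also fits
print(matches_shape((100, 28, 28), declared_x))  # unflattened images do not fit
```

The first two checks succeed because only the second dimension is pinned to 784, while the third fails because the number of dimensions differs.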



### 1.2 Create computational graph's variable values: `Variable`.

`Variable` is a value that can be modified by the computation. One generally uses <a name="variable_for_model_parameter">`Variable` for **model parameters.**</a>

We initialize the weight matrix `W` and the bias vector `b` as zero tensors.

In [60]:
W = tf.Variable(tf.zeros([784,10]))
b = tf.Variable(tf.zeros([10]))

Then, we must initialize all `Variable`s before using them in a session, with `tf.global_variables_initializer()`. [**This operation takes the initial values (in this case, tensors full of zeros) that have already been specified, and assigns them to each `Variable`.**](https://www.tensorflow.org/get_started/mnist/pros)

In [61]:
sess.run(tf.global_variables_initializer())

In [62]:
print("W: ", sess.run(W))
print("b: ", sess.run(b))

W:  [[ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]
 ..., 
 [ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]]
b:  [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]


### 1.3 Predicted class and loss function

#### 1.3.1 Implement the regression model

In [63]:
y = tf.matmul(x, W) + b

With the regression **model** defined, we can specify a loss function. The loss indicates how bad **the model's** prediction was on a single example. Here, the loss function is the cross-entropy between **(1) the target** and **(2) the softmax activation function applied to the model's prediction.**

In [64]:
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))

Note that `tf.nn.softmax_cross_entropy_with_logits` **internally applies** the softmax to **the model's unnormalized prediction (`y`)** and **sums across all classes**, weighting each class by the target (`y_`), and `tf.reduce_mean` takes the average of these sums over the batch.
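The computation can be sketched in plain Python to show what happens for each example. This is an illustrative re-derivation under the standard definitions of softmax and cross-entropy, not TensorFlow's implementation; the function names and the toy numbers are assumptions made for this example.

```python
import math


def softmax(logits):
    # Subtracting the maximum keeps exp() from overflowing; the result is unchanged.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_with_logits(labels, logits):
    # Sum over classes of -label * log(softmax(logit)) for one example.
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(labels, probs) if t > 0)


# Toy batch of two examples with three classes (made-up numbers).
batch_logits = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]]
batch_labels = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]  # one-hot targets

per_example = [cross_entropy_with_logits(t, z)
               for t, z in zip(batch_labels, batch_logits)]
mean_loss = sum(per_example) / len(per_example)  # the role played by tf.reduce_mean
print(per_example, mean_loss)
```

The second example gets the larger loss because its logits favor class 1 while the target is class 2.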

## 2. Train the model

Given a computational graph, TensorFlow can use automatic differentiation to **find the gradients of the loss with respect to each of the variables.** Here, we use **(1) steepest gradient descent** with **(2) a step length of 0.5** to **descend the cross-entropy.**

In [65]:
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

That single line **adds new operations to the computational graph**: operations that (1) compute the gradients, (2) compute the parameter update steps, and (3) apply the update steps to the parameters.
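As a rough illustration of what these added operations compute, the sketch below performs a few gradient descent steps by hand on a tiny softmax regression problem. It assumes the standard result that, for softmax with cross-entropy, the gradient with respect to each logit is the predicted probability minus the target. All names and numbers are made up for the example, and the gradients are derived manually rather than by automatic differentiation as in TensorFlow.

```python
import math

LEARNING_RATE = 0.5  # same step length as in the tutorial


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def loss_and_grads(x, t, W, b):
    """Cross-entropy loss and its gradients for a single example."""
    logits = [sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
              for j in range(len(b))]
    p = softmax(logits)
    loss = -sum(tj * math.log(pj) for tj, pj in zip(t, p) if tj > 0)
    d_logits = [pj - tj for pj, tj in zip(p, t)]  # d(loss)/d(logit_j)
    grad_W = [[xi * dj for dj in d_logits] for xi in x]
    grad_b = list(d_logits)
    return loss, grad_W, grad_b


# Toy problem: 2 input features, 3 classes, zero-initialized parameters.
x, t = [1.0, 2.0], [0.0, 1.0, 0.0]
W = [[0.0] * 3 for _ in range(2)]
b = [0.0] * 3

losses = []
for _ in range(5):
    # Compute the gradients.
    loss, grad_W, grad_b = loss_and_grads(x, t, W, b)
    losses.append(loss)
    # Compute the parameter update steps.
    step_W = [[-LEARNING_RATE * g for g in row] for row in grad_W]
    step_b = [-LEARNING_RATE * g for g in grad_b]
    # Apply the update steps to the parameters.
    W = [[w + s for w, s in zip(w_row, s_row)] for w_row, s_row in zip(W, step_W)]
    b = [v + s for v, s in zip(b, step_b)]

print(losses)  # the loss shrinks from one iteration to the next
```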

**Open question:** [The learning rate is a parameter that is chosen by the programmer.](https://adeshpande3.github.io/adeshpande3.github.io/A-Beginner's-Guide-To-Understanding-Convolutional-Neural-Networks/) Can we then update the learning rate during training by making it a variable, given that [`tf.Variable`](#variable_for_model_parameter) is generally used for model parameters?

The returned operation `train_step`, when `run`, will **apply the gradient descent updates** to the parameters once. Training the model can therefore be accomplished by running `train_step` repeatedly over a large number of iterations:

In [66]:
for _ in range(1000):
    batch = mnist.train.next_batch(100)
    train_step.run(feed_dict={x: batch[0], y_: batch[1]})

We load a batch of 100 training examples in each training iteration, and `feed_dict` replaces the placeholder tensors `x` and `y_` with those examples. Note that **you can replace any tensor in a computational graph using `feed_dict`**: it is not restricted to just `placeholder`s.
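The idea of drawing a fresh mini-batch per iteration can be sketched without TensorFlow. The generator `iterate_batches` below is a hypothetical stand-in written for this example; it samples each batch at random and is not how `mnist.train.next_batch` is actually implemented.

```python
import random


def iterate_batches(examples, labels, batch_size, num_steps, seed=0):
    """Yield (batch_examples, batch_labels) pairs, one pair per training step."""
    rng = random.Random(seed)
    indices = list(range(len(examples)))
    for _ in range(num_steps):
        chosen = rng.sample(indices, batch_size)  # no repeats within a batch
        yield [examples[i] for i in chosen], [labels[i] for i in chosen]


# Toy dataset: 10 examples with 4 features each (made-up values).
examples = [[float(i)] * 4 for i in range(10)]
labels = [i % 2 for i in range(10)]

batches = list(iterate_batches(examples, labels, batch_size=3, num_steps=4))
for batch_x, batch_y in batches:
    # In the tutorial, these two lists would be fed as {x: batch_x, y_: batch_y}.
    print(len(batch_x), len(batch_y))
```

Selecting examples and labels with the same indices keeps each input paired with its own target.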

## 3. Evaluate the model

In order to evaluate the model, we need to know which predictions were correct (i.e., which labels we predicted correctly); here, we want to know **the indices of the entries we predicted correctly.**

We can use `tf.argmax` to get **the index of the highest entry in a tensor along some axis**; for example, `tf.argmax(y,1)` is **the label our model thinks is the most likely** for **each input**, and `tf.argmax(y_,1)` is the **true label.**

In [67]:
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))

The above operation returns a tensor of booleans, one per input example.

To give **the fraction** of correct predictions, we **cast to numbers and take the mean.** For example, [True, False, True, True] would become [1,0,1,1] which would become 0.75.
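The same steps can be traced in plain Python on a toy batch. The numbers are made up and chosen so that the intermediate result is [True, False, True, True]; the helper `argmax` is a small stand-in for `tf.argmax` along axis 1.

```python
def argmax(row):
    """Index of the largest entry in a list."""
    return max(range(len(row)), key=lambda i: row[i])


# Toy batch: 4 examples, 3 classes (made-up numbers).
logits = [[0.1, 2.0, 0.3],
          [1.5, 0.2, 0.1],
          [0.0, 0.4, 0.9],
          [0.3, 0.2, 2.2]]
one_hot_labels = [[0, 1, 0],
                  [0, 0, 1],
                  [0, 0, 1],
                  [0, 0, 1]]

predicted = [argmax(row) for row in logits]          # [1, 0, 2, 2]
actual = [argmax(row) for row in one_hot_labels]     # [1, 2, 2, 2]
correct = [p == a for p, a in zip(predicted, actual)]

# Cast booleans to floats and take the mean.
accuracy = sum(float(c) for c in correct) / len(correct)
print(correct, accuracy)  # [True, False, True, True] 0.75
```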

In [68]:
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(accuracy.eval(feed_dict={x:mnist.test.images, y_:mnist.test.labels}))

0.9159
