## Handwritten digit recognition with CNNs

(Always be aware of your imports and preserve namespaces!!!)

In [None]:
import os
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

%matplotlib inline

---

### Loading the MNIST data set

First, let's get some data:

In [None]:
mnist = input_data.read_data_sets("MNIST_data", one_hot=True)

In [None]:
nimg, npix = mnist.train.images.shape

print("shape of training data array    = {0}".format(mnist.train.images.shape))
print("dtype of training data array    = {0}".format(mnist.train.images.dtype))
print("min, max of training data array = {0}, {1}".format(mnist.train.images.min(), mnist.train.images.max()))

So we have 55,000 training examples, each with 784 features stored as 32-bit floats between 0 and 1.  Those features are, of course, the pixels of the images (note that $\sqrt{784} = 28$):

In [None]:
nside = int(np.sqrt(npix))

fig, ax = plt.subplots(figsize=(5, 5))
fig.subplots_adjust(0, 0, 1, 1)
ax.axis("off")
im = ax.imshow(mnist.train.images[0].reshape(nside, nside), "gist_gray")
fig.canvas.draw()

What number is that???  $7$?  $3$?  The ground truth is contained in the labels:

In [None]:
print("shape of the labels array             = {0}".format(mnist.train.labels.shape))
print("dtype of the labels array             = {0}".format(mnist.train.labels.dtype))
print("groundtruth vector for the 0'th image = {0}".format(mnist.train.labels[0]))
print("groundtruth answer for the 0'th image = {0}".format(mnist.train.labels[0].argmax()))
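
The labels are "one-hot" vectors: ten entries per image, with a single 1 marking the digit.  As a minimal sketch in plain NumPy (the helper `one_hot` and the example digits are made up for illustration and are not part of the MNIST loader), encoding and decoding look like this:

```python
import numpy as np

def one_hot(digits, num_classes=10):
    """Return a (len(digits), num_classes) array with a single 1.0 per row."""
    digits = np.asarray(digits)
    encoded = np.zeros((digits.size, num_classes), dtype=np.float64)
    encoded[np.arange(digits.size), digits] = 1.0
    return encoded

digits = np.array([7, 3, 0])      # made-up example labels
labels = one_hot(digits)          # one row per image, one column per digit
decoded = labels.argmax(axis=1)   # argmax recovers the original digits

print(labels)
print(decoded)
```

This is why `argmax` on a label vector gives the digit itself.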

---

### Models in TensorFlow, part 1: regression

Now that we have data, let's run some models.

In TensorFlow, computations are performed inside a "session".  That session can be run interactively (we'll see an example of a non-interactive session later):

In [None]:
sess = tf.InteractiveSession()

Before diving into CNNs, let's start off with a simple regression model.  First we define the inputs and labels; these are TF "placeholders" whose values are supplied when we run the model.

In [None]:
x  = tf.placeholder(tf.float32, shape=[None, npix])
y_ = tf.placeholder(tf.float32, shape=[None, 10])

Next, like any model, we need to define variables, i.e., the parameters to be fit:

In [None]:
W = tf.Variable(tf.zeros([npix, 10]))  # npix = 784, matching the placeholder x
b = tf.Variable(tf.zeros([10]))

At the moment, these variables **have no value**: they are not yet initialized.  We've told TF that they exist and what their initial values should be, but nothing is assigned until the initializer is run (N.B., this is similar to declaring but not initializing a variable in C).  We can initialize them by running TF's function for initializing global variables:

In [None]:
sess.run(tf.global_variables_initializer())

To what have those values been initialized?

In [None]:
print("W = {0}".format(sess.run(W)))
print("b = {0}".format(sess.run(b)))

All zeros for the moment...  but that's OK, these parameters are to be solved for.

Since everything is vectorized, the linear regression equation can be written as:

In [None]:
y = tf.matmul(x, W) + b
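
To make the shapes concrete, here is a small NumPy sketch of the same expression.  The random `batch` is a stand-in for 10 flattened images, and `W_np` and `b_np` are hypothetical NumPy counterparts of `W` and `b`:

```python
import numpy as np

rng = np.random.default_rng(0)
batch = rng.random((10, 784))   # stand-in for 10 flattened images
W_np = np.zeros((784, 10))      # same shape as W above
b_np = np.zeros(10)             # same shape as b above

# (10, 784) @ (784, 10) -> (10, 10); b_np is broadcast across the 10 rows
logits = batch @ W_np + b_np
print(logits.shape)
```

Each row of the result holds one score per digit for one image.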

We will use the cross entropy loss function (note that we are now deviating from the more typical "least squares" regression solution):

In [None]:
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
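
For reference, the quantity being computed can be sketched in NumPy as follows.  This is an illustrative re-implementation under the usual definition of softmax cross entropy, not TF's own code; the function name and the tiny 3-class example are assumptions:

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross entropy between one-hot labels and softmax(logits)."""
    # subtracting the row maximum leaves softmax unchanged but avoids overflow
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # for one-hot labels this picks out -log(probability of the true class)
    return -(labels * log_probs).sum(axis=1).mean()

logits = np.array([[2.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0]])
labels = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
loss = softmax_cross_entropy(logits, labels)
print(loss)
```

The loss is small when the true class gets a large score (first row) and larger when the model is undecided (second row).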

So now we have a model, and we've initialized some variables.  We can already calculate the model predictions and loss on some set of data.  Let's try the first 10 images:

In [None]:
sess.run(y, {x:mnist.train.images[:10]})

As expected, all zeros!  But what do these zeros mean in this context, since surely this must be **some** number, right?  We can apply the softmax function to get the probabilities for each digit.

In [None]:
sess.run(tf.nn.softmax(y, dim=-1), {x:mnist.train.images[:10]})

Interpretation: with these values of $W$ and $b$, all ten digits are equally likely for each image (probability $0.1$ each).

And what's the loss?

In [None]:
ex_img = mnist.train.images[:10]
ex_lab = mnist.train.labels[:10]
mod_in = {x:ex_img, y_:ex_lab}  # feed dictionary: placeholder -> data

print("loss for first 10 images is = {0}".format(sess.run(cross_entropy, mod_in)))
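
With $W = 0$ and $b = 0$ this value can be checked by hand.  A short derivation, writing $p_k$ for the softmax probability of digit $k$ and $y_k$ for the entries of the one-hot label (notation introduced here for illustration):

$$
p_k = \frac{e^{0}}{\sum_{j=0}^{9} e^{0}} = \frac{1}{10},
\qquad
-\sum_{k=0}^{9} y_k \ln p_k = -\ln\frac{1}{10} = \ln 10 \approx 2.303
$$

Every image contributes the same loss whatever its label, so the mean over the 10 images should also come out near $2.303$.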

Now, as with all machine learning, let's **learn** the parameters $W$ and $b$.  We'll use gradient descent:

In [None]:
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
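
For intuition only, here is a rough NumPy sketch of what a single gradient-descent step does for a softmax regression model of this kind.  It is not TF's implementation: the helper names (`softmax`, `loss_fn`, `gd_step`) and the small synthetic data set (16 features rather than 784) are assumptions made for illustration.

```python
import numpy as np

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

def loss_fn(X, Y, W, b):
    """Mean cross entropy for one-hot labels Y."""
    probs = softmax(X @ W + b)
    return -np.log((probs * Y).sum(axis=1)).mean()

def gd_step(X, Y, W, b, lr=0.5):
    """One gradient-descent update of W and b on the batch (X, Y)."""
    probs = softmax(X @ W + b)
    delta = (probs - Y) / X.shape[0]   # gradient of the mean loss w.r.t. the logits
    W_new = W - lr * (X.T @ delta)     # chain rule through X @ W
    b_new = b - lr * delta.sum(axis=0)
    return W_new, b_new

rng = np.random.default_rng(0)
X = rng.random((100, 16))                        # synthetic "images" in [0, 1)
Y = np.eye(10)[rng.integers(0, 10, size=100)]    # synthetic one-hot labels
W0, b0 = np.zeros((16, 10)), np.zeros(10)

loss_before = loss_fn(X, Y, W0, b0)
W1, b1 = gd_step(X, Y, W0, b0)
loss_after = loss_fn(X, Y, W1, b1)
print(loss_before, loss_after)
```

The learning rate `0.5` matches the value passed to the optimizer above; the update moves the parameters a fixed fraction of the way along the negative gradient.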

Each pass through this gradient descent optimizer takes exactly 1 step.  So we have to step many times to get near a minimum of the (cross entropy) loss function:

In [None]:
nstep = 1000
loss  = np.zeros(nstep)

for ii in range(nstep):
    batch = mnist.train.next_batch(100)
    train_step.run(feed_dict={x:batch[0], y_:batch[1]})
    loss[ii] = sess.run(cross_entropy, {x:batch[0], y_:batch[1]})

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(8, 4), sharey=True)
for a in ax:
    a.grid(True)
lin0, = ax[0].plot(np.arange(nstep) + 1, loss, color="darkred")
lin1, = ax[1].plot(np.log10(np.arange(nstep) + 1), loss, color="darkred")
ax[0].set_ylabel("cross entropy loss")
ax[0].set_xlabel("batch number")
ax[1].set_xlabel(r"$\log_{10}$ batch number")  # raw string so "\l" is not treated as an escape
fig.canvas.draw()
plt.show()

We can see that $W$ and $b$ have been updated:

In [None]:
print("W = {0}".format(sess.run(W)))
print("b = {0}".format(sess.run(b)))

And our predictions for the first 10 are now:

In [None]:
prob  = sess.run(tf.nn.softmax(y, dim=-1), {x:mnist.train.images[:10]}).round(2)
guess = prob.argmax(1)
print(prob)
for ii, jj in zip(guess, mnist.train.labels[:10].argmax(1)):
    print("guess = {0}, truth = {1}".format(ii, jj))

Note that `mnist.train.next_batch` has shuffled things around a bit:

In [None]:
fig, ax = plt.subplots(figsize=(5, 5))
fig.subplots_adjust(0, 0, 1, 1)
ax.axis("off")
im = ax.imshow(mnist.train.images[0].reshape(nside, nside), "gist_gray")
fig.canvas.draw()
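
A batching helper that reorders its stored arrays would produce exactly this effect.  The class below is a hypothetical stand-in written for illustration (its name and details are assumptions, not the TF source): it reshuffles images and labels together at the start of each pass through the data and then hands out consecutive slices.

```python
import numpy as np

class ShufflingDataSet:
    """Hypothetical mini-batch helper that reshuffles once per epoch."""

    def __init__(self, images, labels, seed=0):
        self.images, self.labels = images, labels
        self.rng = np.random.default_rng(seed)
        self.pos = 0

    def _shuffle(self):
        # apply the same permutation to both arrays so pairs stay aligned
        perm = self.rng.permutation(len(self.images))
        self.images, self.labels = self.images[perm], self.labels[perm]

    def next_batch(self, batch_size):
        if self.pos == 0 or self.pos + batch_size > len(self.images):
            self._shuffle()   # start of an epoch: reorder the stored arrays
            self.pos = 0
        start, self.pos = self.pos, self.pos + batch_size
        return self.images[start:self.pos], self.labels[start:self.pos]

images = np.arange(20, dtype=np.float32).reshape(10, 2)  # 10 fake "images"
labels = np.arange(10)
ds = ShufflingDataSet(images, labels)

seen = np.concatenate([ds.next_batch(5)[1] for _ in range(2)])
print(sorted(seen.tolist()))
```

After the first call, `ds.images[0]` is generally no longer the original first image, while each image still sits next to its own label.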

It looks like the model is doing a pretty good job on the (new) first 10 images of the training set.  How about the testing set, which hasn't been used to modify $W$ and $b$?  We could write this as we did above and average over all examples, but let's use some TF code:

In [None]:
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
mod_in = {x:mnist.test.images, y_:mnist.test.labels}

print("accuracy for the test set = {0}".format(accuracy.eval(mod_in)))
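
The same accuracy calculation can be sketched in NumPy.  The `logits` and `labels` arrays below are made-up 3-class values used only to illustrate the steps:

```python
import numpy as np

logits = np.array([[0.1, 2.0, -1.0],
                   [1.5, 0.2, 0.3],
                   [0.0, 0.1, 0.9],
                   [0.4, 0.3, 0.2]])
labels = np.array([[0, 1, 0],
                   [1, 0, 0],
                   [0, 1, 0],
                   [1, 0, 0]], dtype=np.float32)

# compare the highest-scoring class with the true class, example by example
correct = logits.argmax(axis=1) == labels.argmax(axis=1)
# casting booleans to floats turns the mean into the fraction of correct guesses
accuracy_np = correct.astype(np.float32).mean()
print(accuracy_np)
```

Here three of the four guesses match, so the result is 0.75.  Taking `argmax` of the raw scores is enough because softmax preserves their ordering.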

---