## MNIST Classifier Using a Neural Network

This notebook walks through the basic building blocks of a TensorFlow model while constructing a deep convolutional MNIST classifier.

Goals
- Create a softmax regression function that is a model for recognizing MNIST digits, based on looking at every pixel in the image
- Use TensorFlow to train the model to recognize digits by having it "look" at thousands of examples (and run our first TensorFlow session to do so)
- Check the model's accuracy with our test data
- Build, train, and test a multilayer convolutional neural network to improve the results

Data
The MNIST data is hosted on Yann LeCun's website. The code in this tutorial downloads and reads in the data automatically, so you can copy and paste it as-is.

The MNIST data is split into three parts: 55,000 data points of training data (mnist.train), 10,000 points of test data (mnist.test), and 5,000 points of validation data (mnist.validation). This split is very important: it's essential in machine learning that we have separate data which we don't learn from so that we can make sure that what we've learned actually generalizes! 

Each image is 28 pixels by 28 pixels. We can flatten this array into a vector of 28x28 = 784 numbers, so each MNIST image is just a point in a 784-dimensional vector space.
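To make the flattening concrete, here is a minimal sketch in plain Python. The toy `image` and the helper name `flatten` are made up for illustration and are not part of the MNIST loader. It assumes row-major order, so pixel (row, col) lands at index `row * 28 + col`.

```python
# Toy stand-in for a 28x28 grayscale image: each pixel stores its own (row, col)
# position encoded as a number, so we can check where it ends up after flattening.
image = [[row * 100 + col for col in range(28)] for row in range(28)]

def flatten(img):
    """Concatenate the rows of a 2-D image into one flat list (row-major order)."""
    return [pixel for row in img for pixel in row]

vector = flatten(image)
print(len(vector))                           # 784
print(vector[5 * 28 + 3] == image[5][3])     # True
```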

In [1]:

# This notebook uses the TensorFlow 1.x API (placeholders and sessions).
# Only NumPy and TensorFlow are needed by the cells below.
import numpy as np
import tensorflow as tf

In [2]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

Extracting MNIST_data\train-images-idx3-ubyte.gz
Extracting MNIST_data\train-labels-idx1-ubyte.gz
Extracting MNIST_data\t10k-images-idx3-ubyte.gz
Extracting MNIST_data\t10k-labels-idx1-ubyte.gz


In [3]:
# x has shape [None, 784], where 784 is the dimensionality of a single flattened 28 by 28 pixel MNIST image,
# and None indicates that the first dimension, corresponding to the batch size, can be of any size.
# The target output classes y_ are also a 2-D tensor, where each row is a one-hot 10-dimensional vector indicating which digit class (zero through nine) the corresponding MNIST image belongs to.
x = tf.placeholder(tf.float32, shape=[None, 784])
y_ = tf.placeholder(tf.float32, shape=[None, 10])

In [4]:
# Define the model parameters: weights W and biases b, both initialized to zeros
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))


In [5]:
# Start an interactive session and initialize the variables defined above
sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())

In [6]:
# Predicted class scores (logits), loss function, and optimizer
y = tf.matmul(x, W) + b

# tf.nn.softmax_cross_entropy_with_logits internally applies the softmax to the model's unnormalized predictions and sums across all classes; tf.reduce_mean then takes the average over these sums.
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))

# For this example, we use steepest gradient descent, with a step length of 0.5, to descend the cross entropy.
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
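To see what the loss computes, here is a rough pure-Python sketch of softmax cross entropy averaged over a batch. It is not TensorFlow's implementation. The helper names and the toy three-class logits are made up for illustration, and the max-subtraction trick is a common numerical-stability choice assumed here.

```python
import math

def softmax(logits):
    """Turn unnormalized scores into probabilities that sum to 1."""
    m = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_cross_entropy(labels, logits):
    """Cross entropy between a one-hot label vector and softmax(logits)."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(labels, probs) if t > 0)

# Toy batch with 3 classes (hypothetical values, not MNIST data).
batch_logits = [[2.0, 1.0, 0.1], [0.5, 0.5, 0.5]]
batch_labels = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]

losses = [softmax_cross_entropy(t, z) for t, z in zip(batch_labels, batch_logits)]
mean_loss = sum(losses) / len(losses)    # plays the role of tf.reduce_mean
print(losses, mean_loss)
```

The second example has equal logits for all three classes, so its loss is log(3): the model is maximally unsure.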

In [7]:
#Train
for _ in range(1000):
    batch = mnist.train.next_batch(100)
    train_step.run(feed_dict = {x:batch[0],y_:batch[1]})
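As a rough illustration of what one training step does, the sketch below applies plain gradient descent to a tiny softmax regression model in pure Python. It is a simplification under stated assumptions: a single example instead of a batch of 100, 2 inputs and 3 classes instead of 784 and 10, and a hypothetical helper `sgd_step`. It relies on the standard result that the gradient of the cross entropy with respect to the logits is `softmax(logits) - labels`.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sgd_step(W, b, x, t, lr):
    """One gradient-descent step for y = xW + b on one example.

    Updates W and b in place and returns the loss measured before the update.
    """
    n_in, n_out = len(W), len(b)
    logits = [sum(x[i] * W[i][j] for i in range(n_in)) + b[j] for j in range(n_out)]
    probs = softmax(logits)
    grad_logits = [p - tj for p, tj in zip(probs, t)]   # d(loss)/d(logits)
    for i in range(n_in):
        for j in range(n_out):
            W[i][j] -= lr * x[i] * grad_logits[j]
    for j in range(n_out):
        b[j] -= lr * grad_logits[j]
    return -math.log(probs[t.index(1.0)])

W = [[0.0, 0.0, 0.0] for _ in range(2)]   # zeros, like tf.zeros in the model above
b = [0.0, 0.0, 0.0]
x, t = [1.0, 2.0], [0.0, 1.0, 0.0]        # toy input and one-hot label

losses = [sgd_step(W, b, x, t, lr=0.5) for _ in range(5)]
print(losses)   # the loss shrinks from step to step on this toy example
```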

In [8]:
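# Inspect the shape of one batch of images (a separate name avoids shadowing the bias variable b)
sample_batch = mnist.train.next_batch(100)
sample_batch[0].shape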

(100, 784)

In [9]:
#Test the trained model
correct_prediction = tf.equal(tf.argmax(y,1),tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction,tf.float32))
print(accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

0.9206
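The accuracy computation above compares the index of the largest logit with the index of the 1 in the one-hot label, then averages the matches. A minimal pure-Python sketch of the same idea, using made-up three-class values rather than MNIST data:

```python
def argmax(values):
    """Index of the largest element, like tf.argmax along one row."""
    return max(range(len(values)), key=lambda i: values[i])

# Hypothetical logits and one-hot labels for four examples.
logits = [[0.1, 2.3, -1.0], [1.5, 0.2, 0.3], [0.0, 0.1, 0.9], [0.4, 0.3, 0.2]]
labels = [[0, 1, 0], [1, 0, 0], [0, 1, 0], [1, 0, 0]]

correct = [argmax(z) == argmax(t) for z, t in zip(logits, labels)]
accuracy = sum(correct) / len(correct)   # True counts as 1, False as 0
print(correct)    # [True, True, False, True]
print(accuracy)   # 0.75
```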


Getting 92% accuracy on MNIST is not good. In this section, we'll fix that, jumping from a very simple model to something moderately sophisticated: a small convolutional neural network. This will get us to around 99.2% accuracy -- not state of the art, but respectable.

### Build a Multilayer Convolutional Network

Initialize weights with a small amount of noise for symmetry breaking, and to prevent 0 gradients. Since we're using ReLU neurons, it is also good practice to initialize them with a slightly positive initial bias to avoid "dead neurons". Instead of doing this repeatedly while we build the model, let's create two handy functions to do it for us.

In [10]:
def weight_variable(shape):
   initial = tf.truncated_normal(shape,stddev = 0.1)
   return tf.Variable(initial)


In [11]:
def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)
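For intuition about the initializers, the sketch below imitates a truncated normal in plain Python: draw from a normal distribution and re-draw any value that falls more than two standard deviations from the mean. The function name, the fixed seed, and the sample count are assumptions made for illustration. This is not TensorFlow's implementation.

```python
import random

def truncated_normal(n, stddev=0.1, mean=0.0, rng=None):
    """Draw n samples, re-drawing any that lie more than 2*stddev from the mean."""
    if rng is None:
        rng = random.Random(0)   # fixed seed so the sketch is reproducible
    samples = []
    while len(samples) < n:
        value = rng.gauss(mean, stddev)
        if abs(value - mean) <= 2 * stddev:
            samples.append(value)
    return samples

weights = truncated_normal(1000)   # small noise breaks the symmetry between neurons
biases = [0.1] * 32                # slightly positive, as in bias_variable above
print(min(weights), max(weights))  # both stay within [-0.2, 0.2]
```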

##### Convolution and Pooling
Our convolutions use a stride of one and are zero-padded so that the output is the same size as the input. Our pooling is plain old max pooling over 2x2 blocks. To keep our code cleaner, let's also abstract those operations into functions.

A 4-dim tensor has shape [batch, height, width, channel]. For example, we could have a tensor with shape [10, 80, 120, 3], which means the batch has 10 images, each of which is 80 pixels high and 120 pixels wide, with 3 channels (e.g. RGB colours).
The word 'stride' is similar in meaning to a step size. It is how much the index is incremented in each of those dimensions when we move the convolutional filters across the input tensor. The first and last strides have to be 1.

Padding

The diagrams below show an input of width 13, a filter of width 6, and a stride of 5.

"VALID" = without padding:

    inputs:         1  2  3  4  5  6  7  8  9  10 11 (12 13)
                   |________________|                dropped
                                  |_________________|

"SAME" = with zero padding:

                pad|                                      |pad
    inputs:      0 |1  2  3  4  5  6  7  8  9  10 11 12 13|0  0
                |________________|
                               |_________________|
                                              |________________|

"VALID" only ever drops the right-most columns (or bottom-most rows).

"SAME" tries to pad evenly left and right, but if the number of columns to be added is odd, it adds the extra column on the right, as in this example (the same logic applies vertically: there may be an extra row of zeros at the bottom).


In [12]:

def conv2d(x, W):
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
  return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding='SAME')
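The padding rules can be checked with a little arithmetic. The sketch below uses the commonly documented formulas for one spatial dimension: "VALID" gives ceil((n - k + 1) / s) outputs, while "SAME" gives ceil(n / s) outputs and pads just enough zeros to make that possible. The helper name is hypothetical. The first two calls use the numbers that can be read off the padding diagram above (13 inputs, window of 6, stride of 5).

```python
import math

def output_size_and_padding(n, k, s, padding):
    """Return (output_size, pad_left, pad_right) for input size n, window k, stride s."""
    if padding == "VALID":
        return math.ceil((n - k + 1) / s), 0, 0
    if padding == "SAME":
        out = math.ceil(n / s)
        total = max((out - 1) * s + k - n, 0)   # zeros needed overall
        left = total // 2                       # any odd extra zero goes on the right
        return out, left, total - left
    raise ValueError("padding must be 'VALID' or 'SAME'")

print(output_size_and_padding(13, 6, 5, "VALID"))  # (2, 0, 0)
print(output_size_and_padding(13, 6, 5, "SAME"))   # (3, 1, 2)
print(output_size_and_padding(28, 5, 1, "SAME"))   # (28, 2, 2)  5x5 conv, stride 1
print(output_size_and_padding(28, 2, 2, "SAME"))   # (14, 0, 0)  2x2 pool, stride 2
```

The last two lines correspond to the `conv2d` and `max_pool_2x2` settings on a 28-pixel-wide input: the convolution keeps the width at 28, and the pooling halves it to 14.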

##### First Convolutional Layer
The first convolutional layer will compute 32 features for each 5x5 patch. Its weight tensor will have a shape of [5, 5, 1, 32]. The first two dimensions are the patch size, the next is the number of input channels, and the last is the number of output channels. We will also have a bias vector with a component for each output channel.

In [13]:
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])

In [14]:
# First reshape x to a 4-D tensor, with the second and third dimensions corresponding to image height and width,
# and the final dimension corresponding to the number of color channels (1, because the images are grayscale).
x_image = tf.reshape(x, [-1, 28, 28, 1])


In [15]:
#We then convolve x_image with the weight tensor, add the bias, apply the ReLU function, and finally max pool.
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

In [16]:
h_pool1.shape

TensorShape([Dimension(None), Dimension(14), Dimension(14), Dimension(32)])
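The drop from 28x28 to 14x14 comes from the 2x2 max pooling with stride 2. Here is a small pure-Python sketch of that operation on a single 4x4 feature map. The helper name and the toy values are made up, and it assumes even height and width, so no padding is needed.

```python
def max_pool_2x2_lists(grid):
    """2x2 max pooling with stride 2 on a 2-D list with even height and width."""
    return [
        [max(grid[r][c], grid[r][c + 1], grid[r + 1][c], grid[r + 1][c + 1])
         for c in range(0, len(grid[0]), 2)]
        for r in range(0, len(grid), 2)
    ]

feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 0, 5, 6],
    [1, 2, 7, 8],
]
pooled = max_pool_2x2_lists(feature_map)
print(pooled)   # [[4, 2], [2, 8]]
```

Each output value is the largest entry in one non-overlapping 2x2 block, so height and width are both halved.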

##### Second Convolutional Layer
The second layer will have 64 features for each 5x5 patch.

In [17]:
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])

h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

In [18]:
h_pool2.shape

TensorShape([Dimension(None), Dimension(7), Dimension(7), Dimension(64)])
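The shapes printed so far can be traced by hand. The sketch below, with a hypothetical helper and layer list, walks one image through the two conv/pool pairs, assuming 'SAME' padding everywhere so that each spatial size becomes ceil(size / stride).

```python
import math

def same_out(size, stride):
    """Spatial output size under 'SAME' padding."""
    return math.ceil(size / stride)

# (name, stride, output channels): convolutions use stride 1, pools use stride 2.
layers = [("conv1", 1, 32), ("pool1", 2, 32), ("conv2", 1, 64), ("pool2", 2, 64)]

height, width, channels = 28, 28, 1
shapes = []
for name, stride, out_channels in layers:
    height, width, channels = same_out(height, stride), same_out(width, stride), out_channels
    shapes.append((name, (height, width, channels)))
    print(name, (height, width, channels))

flat_size = height * width * channels
print(flat_size)   # 3136
```

The final 7 x 7 x 64 = 3136 values per image are what a fully-connected layer would receive after flattening.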

##### Densely Connected Layer
Now that the image size has been reduced to 7x7, we add a fully-connected layer with 1024 neurons to allow processing on the entire image. We reshape the tensor from the pooling layer into a batch of vectors, multiply by a weight matrix, add a bias, and apply a ReLU.

In [19]:
W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])

h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

To reduce overfitting, we will apply dropout before the readout layer. We create a placeholder for the probability that a neuron's output is kept during dropout. This allows us to turn dropout on during training, and turn it off during testing. TensorFlow's tf.nn.dropout op automatically handles scaling neuron outputs in addition to masking them, so dropout just works without any additional scaling.

In [20]:
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
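To illustrate the "masking plus scaling" behaviour of dropout, here is a minimal pure-Python sketch. The function name and the fixed seed are assumptions for illustration, not TensorFlow's code. Each value is kept with probability `keep_prob` and scaled by `1 / keep_prob`, so the expected sum stays the same.

```python
import random

def dropout(values, keep_prob, rng):
    """Keep each value with probability keep_prob and scale kept values by 1/keep_prob."""
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]

rng = random.Random(0)                 # fixed seed so the sketch is reproducible
activations = [1.0] * 10000

dropped = dropout(activations, keep_prob=0.5, rng=rng)
mean_after = sum(dropped) / len(dropped)
print(sorted(set(dropped)))            # [0.0, 2.0]
print(mean_after)                      # close to 1.0, the mean before dropout

unchanged = dropout(activations[:5], keep_prob=1.0, rng=rng)
print(unchanged)                       # [1.0, 1.0, 1.0, 1.0, 1.0]
```

With `keep_prob` set to 1.0 nothing is dropped or rescaled, which is why the same graph can be used for evaluation.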


In [21]:
W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])

y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2
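It can be useful to count how many trainable parameters this network has. The sketch below multiplies out the weight shapes used above and adds the biases. The dictionary and helper are made up for this illustration, and `math.prod` assumes Python 3.8 or newer.

```python
import math

# Weight shape and number of biases for each layer defined in this notebook.
layer_shapes = {
    "conv1": ([5, 5, 1, 32], 32),
    "conv2": ([5, 5, 32, 64], 64),
    "fc1": ([7 * 7 * 64, 1024], 1024),
    "fc2": ([1024, 10], 10),
}

def num_params(weight_shape, num_biases):
    """Number of weights (product of the shape) plus number of biases."""
    return math.prod(weight_shape) + num_biases

counts = {name: num_params(shape, biases) for name, (shape, biases) in layer_shapes.items()}
total = sum(counts.values())
for name, count in counts.items():
    print(name, count)
print("total", total)   # total 3274634
```

Almost all of the parameters sit in the first fully-connected layer, which is one reason dropout is applied to its output.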

In [22]:
y_conv

<tf.Tensor 'add_4:0' shape=(?, 10) dtype=float32>

In [23]:
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
sess.run(tf.global_variables_initializer())
for i in range(20000):
  batch = mnist.train.next_batch(50)
  if i%100 == 0:
    train_accuracy = accuracy.eval(feed_dict={
        x:batch[0], y_: batch[1], keep_prob: 1.0})
    print("step %d, training accuracy %g"%(i, train_accuracy))
  train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

print("test accuracy %g"%accuracy.eval(feed_dict={
    x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))

step 0, training accuracy 0.1
step 100, training accuracy 0.86
step 200, training accuracy 0.9
step 300, training accuracy 0.96
step 400, training accuracy 0.98
step 500, training accuracy 0.94
step 600, training accuracy 0.92
step 700, training accuracy 0.92
step 800, training accuracy 0.94
step 900, training accuracy 0.94
step 1000, training accuracy 0.98
step 1100, training accuracy 0.96
step 1200, training accuracy 0.96
step 1300, training accuracy 0.98
step 1400, training accuracy 0.92
step 1500, training accuracy 0.94
step 1600, training accuracy 0.96
step 1700, training accuracy 0.94
step 1800, training accuracy 0.96
step 1900, training accuracy 0.98
step 2000, training accuracy 0.98
step 2100, training accuracy 0.96
step 2200, training accuracy 0.94
step 2300, training accuracy 0.98
step 2400, training accuracy 0.96
step 2500, training accuracy 0.98
step 2600, training accuracy 0.98
step 2700, training accuracy 0.98
step 2800, training accuracy 1
step 2900, training accuracy 0.

In [24]:

my_tensor = tf.constant(0, shape=[4, 2])  # <tf.Tensor 'Const_4:0' shape=(4, 2) dtype=int32>
my_dynamic_shape = tf.shape(my_tensor)  # <tf.Tensor 'Shape:0' shape=(2,) dtype=int32>
# The shape is (2,) because my_tensor is a 2-D tensor, so the dynamic shape is a 1-D tensor containing the sizes of my_tensor's dimensions

my_reshaped_tensor = tf.reshape(my_tensor, [2, 2, 2])  # <tf.Tensor 'Reshape_2:0' shape=(2, 2, 2) dtype=int32>

# To access a dynamic shape value, you need to run your graph and feed any placeholder that your tensor may depend on:
print(my_dynamic_shape.eval(session=tf.Session(), feed_dict={
    my_tensor: [[1., 2.], [1., 2.], [1., 2.], [1., 2.]]
}))

[4 2]


In [25]:
my_dynamic_shape

<tf.Tensor 'Shape_6:0' shape=(2,) dtype=int32>

In [26]:
0.9*np.log2(1/0.9)

0.13680278410054506

In [90]:
np.log2(1)

0.0

In [102]:
64*49


3136