## Convolutional Nets 
Convolutional neural networks have proven very effective in image recognition tasks. Here we train a basic conv-net on the MNIST dataset.

This tutorial is adapted from TensorFlow's "Deep MNIST for Experts" tutorial: https://www.tensorflow.org/versions/r0.8/tutorials/mnist/pros/index.html

### Load data

In [1]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


### Start a session 
TensorFlow relies on a C++ backend to execute the graph efficiently. The connection to this backend is called a session. The common usage for TensorFlow programs is to first create a graph and then launch it in a session.

Here we instead use the convenient InteractiveSession class, which makes TensorFlow more flexible about how you structure your code. It allows you to interleave operations which build a computation graph with ones that run the graph. 

In [2]:
import tensorflow as tf
sess = tf.InteractiveSession()


## Train a Convolutional Net 
### Helper functions
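We define helper functions that create weight and bias variables and that wrap the convolution and pooling operations used in the layers below.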

In [5]:
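def weight_variable(shape):
  # Small random initial weights break the symmetry between units
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  # A slightly positive initial bias helps avoid dead ReLU units
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)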

In [7]:
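def conv2d(x, W):
  # Stride 1 with 'SAME' padding keeps the output height and width equal to the input
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
  # 2x2 max pooling with stride 2 halves the height and width
  return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding='SAME')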

### Build the model
#### First Convolutional layer
**Input Dimension** = 28 x 28; **Number of channels** = 1   
**Patch Size** = 5 x 5; **Number of features** = 32

In [8]:
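# Placeholders for the flattened 28 x 28 input images and the one-hot labels
x = tf.placeholder(tf.float32, shape=[None, 784])
y_ = tf.placeholder(tf.float32, shape=[None, 10])
# Reshape input to make it an image
x_image = tf.reshape(x, [-1,28,28,1])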
# Define the first conv layer
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)



#### Second Convolutional layer
**Input Dimension** = 14 x 14; **Number of input channels** = 32   
**Patch Size** = 5 x 5; **Number of output features** = 64

In [None]:
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])

h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

#### Densely Connected layer
**Input Dimension** = 7 x 7; **Number of input channels** = 64   
**Number of hidden neurons** = 1024

In [9]:
W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])

h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
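
The 7 x 7 x 64 input size of the dense layer follows from the two convolution and pooling stages. The block below is a small sketch in plain Python (no TensorFlow) that traces those shapes, assuming the standard rule that `'SAME'` padding gives an output size of `ceil(input / stride)`. The helper names `same_output_size` and `trace_shapes` are made up for this illustration.

```python
def same_output_size(size, stride):
    # With 'SAME' padding the output size is ceil(size / stride)
    return -(-size // stride)


def trace_shapes(height, width, channels, conv_features):
    """Return the (height, width, channels) shape after each conv + pool stage."""
    shapes = [(height, width, channels)]
    for features in conv_features:
        # 5x5 convolution, stride 1, 'SAME' padding: height and width unchanged
        height = same_output_size(height, 1)
        width = same_output_size(width, 1)
        # 2x2 max pooling, stride 2, 'SAME' padding: height and width halved
        height = same_output_size(height, 2)
        width = same_output_size(width, 2)
        channels = features
        shapes.append((height, width, channels))
    return shapes


shapes = trace_shapes(28, 28, 1, conv_features=[32, 64])
final_height, final_width, final_channels = shapes[-1]
flat_size = final_height * final_width * final_channels

for shape in shapes:
    print(shape)
print("flattened size:", flat_size)
```

The last shape is what `h_pool2` holds per image, and `flat_size` matches the `7 * 7 * 64` used when reshaping it into `h_pool2_flat`.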


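#### Dropout
To reduce overfitting, we apply dropout before the readout layer. `keep_prob` is a placeholder, so dropout can be switched on during training and off during evaluation.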


In [None]:
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
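
To make the effect of `keep_prob` concrete, here is a rough sketch of dropout in plain Python. It assumes the behaviour documented for `tf.nn.dropout`: each element is kept with probability `keep_prob` and scaled by `1 / keep_prob`, otherwise it is set to zero. The function `dropout` and its `rng` argument are illustrative names, not part of TensorFlow.

```python
import random


def dropout(values, keep_prob, rng):
    """Keep each value with probability keep_prob, scaled by 1 / keep_prob."""
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


rng = random.Random(0)  # fixed seed so the run is repeatable
activations = [1.0] * 10000

dropped = dropout(activations, keep_prob=0.5, rng=rng)
unchanged = dropout(activations, keep_prob=1.0, rng=rng)

kept_fraction = sum(1 for v in dropped if v != 0.0) / len(dropped)
mean_after = sum(dropped) / len(dropped)
print("fraction kept:", kept_fraction)
print("mean activation after dropout:", mean_after)
```

Because the surviving values are scaled up, the mean activation stays close to its original value, which is why the same network can be evaluated with `keep_prob` set to 1.0 without any further rescaling.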

#### Softmax readout layer
**Number of hidden neurons** = 1024  
**Number of classes** = 10

In [None]:
W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])

y_conv=tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)
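
The softmax turns the 10 output scores into class probabilities that sum to 1. A minimal plain-Python sketch for a single example is shown below. Subtracting the largest score before exponentiating is an addition made here for numerical stability; it does not change the result. The function name `softmax` and the example scores are illustrative.

```python
import math


def softmax(logits):
    """Convert a list of scores into probabilities that sum to 1."""
    shift = max(logits)  # assumption: shift for stability, result is unchanged
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
predicted_class = probs.index(max(probs))
large = softmax([1000.0, 1000.0])  # would overflow without the shift

print(probs)
print("predicted class:", predicted_class)
```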


### Train and evaluate the model


In [10]:
# Define the cost function 
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y_conv), reduction_indices=[1]))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# Initialize all variables 
sess.run(tf.initialize_all_variables())

# Train the model 
N = 2000
for i in range(N):
  batch = mnist.train.next_batch(50)
  if i%100 == 0:
    train_accuracy = accuracy.eval(feed_dict={
        x:batch[0], y_: batch[1], keep_prob: 1.0})
    print("step %d, training accuracy %g"%(i, train_accuracy))
  train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

print("test accuracy %g"%accuracy.eval(feed_dict={
    x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))

step 0, training accuracy 0.14
step 100, training accuracy 0.8
step 200, training accuracy 0.94
step 300, training accuracy 0.88
step 400, training accuracy 0.98
step 500, training accuracy 0.86
step 600, training accuracy 0.96
step 700, training accuracy 0.92
step 800, training accuracy 0.92
step 900, training accuracy 1
step 1000, training accuracy 0.96
step 1100, training accuracy 0.92
step 1200, training accuracy 0.9
step 1300, training accuracy 1
step 1400, training accuracy 0.94
step 1500, training accuracy 0.98
step 1600, training accuracy 0.96
step 1700, training accuracy 0.94
step 1800, training accuracy 0.96
step 1900, training accuracy 0.98
test accuracy 0.9765


We get a test accuracy of $97.65\%$ with $N = 2000$ training iterations. If you increase $N$, the test accuracy can reach up to $99.2\%$.
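
The cross-entropy cost and the accuracy used in the training cell can be checked by hand on a toy batch. The sketch below mirrors the two formulas in plain Python: the mean over the batch of `-sum(y_ * log(y_conv))`, and the fraction of examples where the arg-max of the prediction matches the arg-max of the label. The function names and the two-example batch are invented for illustration, and the probabilities are strictly positive so that the logarithm is defined.

```python
import math


def cross_entropy(labels, probs):
    """Mean over the batch of -sum(label * log(prob)) for each example."""
    per_example = [
        -sum(t * math.log(p) for t, p in zip(label_row, prob_row))
        for label_row, prob_row in zip(labels, probs)
    ]
    return sum(per_example) / len(per_example)


def accuracy(labels, probs):
    """Fraction of examples whose most probable class matches the label."""
    hits = [
        prob_row.index(max(prob_row)) == label_row.index(max(label_row))
        for label_row, prob_row in zip(labels, probs)
    ]
    return sum(hits) / len(hits)


# Toy batch: two examples, three classes, one-hot labels
toy_labels = [[1, 0, 0], [0, 1, 0]]
toy_probs = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]]

loss = cross_entropy(toy_labels, toy_probs)
acc = accuracy(toy_labels, toy_probs)
print("cross-entropy:", loss)
print("accuracy:", acc)
```

Only the probability assigned to the true class contributes to the cost, so the second example, which puts 0.3 on the correct class, is penalized more heavily than the first.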