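Convolution is the concept of a sliding window function (a kernel, or filter) that moves across the input by a fixed step called the stride. At each position, the kernel is multiplied element-wise with the values under it and the products are summed to give one output value.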
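
As a minimal sketch of the sliding-window idea, the pure-Python function below slides a small kernel over a 2-D input and moves it `stride` cells per step. The function name `conv2d_valid`, the sample `image` and the `kernel` are illustrative assumptions, not part of the model built later. Like `tf.nn.conv2d`, it does not flip the kernel, and it uses no padding.

```python
def conv2d_valid(image, kernel, stride=1):
    """Slide `kernel` over `image` (lists of lists), moving `stride` cells per step."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            top, left = i * stride, j * stride
            # Multiply the window by the kernel element-wise and sum the products.
            total = sum(image[top + a][left + b] * kernel[a][b]
                        for a in range(kh) for b in range(kw))
            row.append(total)
        output.append(row)
    return output


# Hypothetical 4x4 input and 2x2 kernel, chosen only for illustration.
image = [[1, 2, 3, 0],
         [4, 5, 6, 1],
         [7, 8, 9, 2],
         [1, 0, 1, 3]]
kernel = [[1, 0],
          [0, 1]]

print(conv2d_valid(image, kernel, stride=1))  # [[6, 8, 4], [12, 14, 8], [7, 9, 12]]
print(conv2d_valid(image, kernel, stride=2))  # [[6, 4], [7, 12]]
```

A larger stride visits fewer window positions, so the output shrinks from 3x3 to 2x2 here.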

Now you know what convolutions are. But what about CNNs? CNNs are basically just several layers of convolutions with nonlinear activation functions like ReLU or tanh applied to the results. In a traditional feedforward neural network we connect each input neuron to each output neuron in the next layer. That’s also called a fully connected layer, or affine layer. In CNNs we don’t do that. Instead, we use convolutions over the input layer to compute the output. This results in local connections, where each region of the input is connected to a neuron in the output. 
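
To make the difference between full and local connections concrete, the sketch below counts the weights needed in each case. The sizes are assumptions chosen for illustration: a 28x28 single-channel input mapped to 32 feature maps of the same spatial size, with a 5x5 kernel.

```python
# Illustrative sizes (assumptions): 28x28x1 input, 32 output feature maps, 5x5 kernel.
height, width, in_channels, out_channels = 28, 28, 1, 32
kernel_size = 5

# Fully connected: every input value is connected to every output value.
n_in = height * width * in_channels
n_out = height * width * out_channels
dense_weights = n_in * n_out

# Convolution: one small kernel per (input channel, output channel) pair,
# shared across all spatial positions.
conv_weights = kernel_size * kernel_size * in_channels * out_channels

print(dense_weights)  # 19668992
print(conv_weights)   # 800
```

The convolution needs far fewer weights because the same kernel is reused at every position of the input.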

Pooling reduces (down-samples) the output of a layer before it is passed as input to the next layer. Max pooling, which is used below, keeps only the largest value in each small window.
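
A minimal pure-Python sketch of 2x2 max pooling with stride 2 follows. The function name `max_pool` and the sample `feature_map` are illustrative assumptions, and the sketch assumes the side lengths are multiples of `k`.

```python
def max_pool(feature_map, k=2):
    """k x k max pooling with stride k over a 2-D list whose sides are multiples of k."""
    pooled = []
    for i in range(0, len(feature_map), k):
        row = []
        for j in range(0, len(feature_map[0]), k):
            # Collect the k x k window and keep only its largest value.
            window = [feature_map[i + a][j + b] for a in range(k) for b in range(k)]
            row.append(max(window))
        pooled.append(row)
    return pooled


# Hypothetical 4x4 feature map, chosen only for illustration.
feature_map = [[1, 3, 2, 0],
               [4, 2, 1, 5],
               [0, 1, 7, 2],
               [3, 2, 1, 1]]

print(max_pool(feature_map))  # [[4, 5], [3, 7]]
```

Each 2x2 window collapses to a single value, so the 4x4 input becomes a 2x2 output.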

In [1]:
import tensorflow as tf
from IPython.display import Image
import os

See illustration at http://www.wildml.com/2015/11/understanding-convolutional-neural-networks-for-nlp/

Convolutional neural networks (CNNs) are designed primarily for recognizing images. Their convolutions act as filters that detect features such as the edges of objects in the image. Recurrent neural networks (RNNs) are designed for recognizing sequences, for example a speech signal or a text. A recurrent network has cycles, which give the network a short-term memory of earlier inputs.

In [2]:
#Import MNIST
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot = True)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


In [3]:
#Setting up TensorFlow parameters
learning_rate =  0.001 
training_epochs = 300
batch_size = 50

n_inputs = 784 #MNIST data 28*28=784 for each image
n_classes = 10 #Output can be 1 of the 10 numbers
dropout_percentage = 0.75 #Keep probability for dropout (chance that a unit is kept, not dropped)

In [4]:
##TF graph inputs

#mnist images are of shape 28*28 = 784
#We do not know the number of images/ they will be different for train and test thus None, 784
x = tf.placeholder(tf.float32, [None, n_inputs])

#y, i.e. the result, can be any digit from 0 to 9 in one-hot encoded format
y = tf.placeholder(tf.float32, [None, n_classes]) 

#Keep probability for dropout (probability that a unit is kept)
keep_prob = tf.placeholder(tf.float32)

In [5]:
#Defining weights and bias
W = {## 5x5 convolution, 1 input and 32 outputs
     "w1": tf.Variable(tf.random_normal([5, 5, 1, 32])),
     ## 5x5 convolution, 32 inputs, 64 outputs
     "w2": tf.Variable(tf.random_normal([5, 5, 32, 64])),
     ## Fully connected, 7*7*64 inputs, 1024 outputs
     "w_full": tf.Variable(tf.random_normal([7*7*64, 1024])),
     ## 1024 inputs, 10 output classes
     "w_": tf.Variable(tf.random_normal([1024, n_classes]))
    
    }

b = {"b1": tf.Variable(tf.random_normal([32])),
     "b2": tf.Variable(tf.random_normal([64])),
     "b_full": tf.Variable(tf.random_normal([1024])),
     "b_": tf.Variable(tf.random_normal([n_classes]))
    }
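
The `7*7*64` input size of the fully connected layer follows from the shapes of the earlier layers. The sketch below traces the spatial size, assuming `padding = "SAME"` (output size is the input size divided by the stride, rounded up), stride-1 convolutions and 2x2 max pooling with stride 2, as used in the model below. The helper name `same_padding_output_size` is hypothetical.

```python
import math


def same_padding_output_size(size, stride):
    """Spatial output size for padding="SAME": ceil(size / stride)."""
    return int(math.ceil(size / float(stride)))


size = 28  # MNIST images are 28x28
for layer in ("conv1", "conv2"):
    size = same_padding_output_size(size, stride=1)  # convolution keeps the size
    size = same_padding_output_size(size, stride=2)  # 2x2 max pooling halves it
    print("%s + pool -> %dx%d" % (layer, size, size))

# 64 feature maps come out of the second convolution.
flattened = size * size * 64
print(flattened)  # 3136, i.e. 7*7*64
```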

In [6]:
def maxpool2d(x, k = 2):
    #MaxPool2D wrapper
    return tf.nn.max_pool(x, ksize = [1, k, k, 1], 
                          strides = [1, k, k, 1], 
                          padding= "SAME")

In [9]:
#Model construction
def conv_net(x, ws, bs, dropouts):
    #Reshape input
    xs = tf.reshape(x, shape = [-1, 28, 28, 1])
    stride = 1
    
    #convolution layer
    x = tf.nn.conv2d(xs, ws["w1"], strides = [1, stride, stride, 1], padding = "SAME")
    x = tf.nn.bias_add(x, bs["b1"])
    conv1 = tf.nn.relu(x)
    #max pooling (down-sampling)
    output_layer_1 = maxpool2d(conv1)
    
    #convolution layer
    output_layer_2 = tf.nn.conv2d(output_layer_1, ws["w2"], strides = [1, stride, stride, 1], padding = "SAME")
    output_layer_2 = tf.nn.bias_add(output_layer_2, bs["b2"])
    conv2 = tf.nn.relu(output_layer_2)
    #max pooling (down-sampling)
    output_layer_2 = maxpool2d(conv2)
    
    #fully connected layer
    #reshape output_layer_2 to fit as input to the fully connected layer
    full_input = tf.reshape(output_layer_2, [-1, ws["w_full"].get_shape().as_list()[0]])
    
    full_layer_op = tf.add(tf.matmul(full_input,ws["w_full"]), bs["b_full"])
    output_layer = tf.nn.relu(full_layer_op)
    
    #incorporate dropouts
    out_dropout = tf.nn.dropout(output_layer, dropouts)
    
    output = tf.add(tf.matmul(out_dropout, ws["w_"]), bs["b_"])
    return output
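
The dropout step can be sketched in plain Python as follows. This is an illustration of the idea, not TensorFlow's implementation: each value is kept with probability `keep_prob`, and the kept values are scaled by `1 / keep_prob` so that the expected sum stays the same, which is also how `tf.nn.dropout` scales its output. The function name `dropout`, the seed and the sample `activations` are assumptions.

```python
import random


def dropout(values, keep_prob, rng):
    """Keep each value with probability keep_prob; scale survivors by 1/keep_prob."""
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
activations = [1.0, 2.0, 3.0, 4.0]  # hypothetical layer outputs

# Training: some entries are zeroed, the others are scaled up.
print(dropout(activations, 0.75, rng))

# Evaluation: keep_prob = 1.0 leaves the values unchanged.
print(dropout(activations, 1.0, rng))
```

This is why the keep probability is a placeholder: it is set below 1.0 during training and to 1.0 when the model is evaluated.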

In [12]:
#Model call
pred = conv_net(x, W, b, keep_prob)

#Now, every time this model is rerun to train, it will work towards reducing the loss and error
#The loss is the softmax cross-entropy between the logits and the one-hot labels, averaged over the batch
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits= pred, labels= y))

# Gradient descent training
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

#Evaluation
correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
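
The softmax cross-entropy loss used above can be worked through for a single example in plain Python. This is a sketch of the formula, not of TensorFlow's internals: softmax turns the logits into probabilities, and the loss is the negative log of the probability assigned to the true class. The function names and the sample `logits` and `label` are illustrative assumptions.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(logits)
    # Subtracting the maximum does not change the result but avoids overflow.
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, one_hot_label):
    """Negative log-probability of the true class."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(one_hot_label, probs))


logits = [2.0, 1.0, 0.1]  # hypothetical network outputs for three classes
label = [1, 0, 0]         # one-hot label: the true class is class 0

print(softmax(logits))
print(cross_entropy(logits, label))
```

The loss is small when the true class gets a high probability and grows as that probability approaches zero.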

In [None]:
##Initialize all variables and launch session

init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    
    #Training steps
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size)
        
        #Training will be in batches now
        for i in range(total_batch):
            #pick up one batch at a time
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            _, c = sess.run([optimizer, cost], feed_dict = {x: batch_xs,
                                                  y: batch_ys,
                                                  #keep_prob must be fed because conv_net applies dropout
                                                  keep_prob: dropout_percentage})
            new_cost = c/total_batch
            #computing intermediary loss
            avg_cost = avg_cost + new_cost
        
        if epoch%5 ==0:
            print("Epoch: %02d Cost: %s" % (epoch + 1, avg_cost))
            
    print("Training finished")
    
    # Test model
    correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
    # Calculate accuracy on the full test set, with dropout disabled (keep_prob = 1.0)
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    print("Accuracy: %s" % accuracy.eval({x: mnist.test.images, y: mnist.test.labels, keep_prob: 1.0}))

This model, with two convolutional layers followed by a fully connected layer, performs better than regular logistic regression.