DIGIT RECOGNIZER (MNIST dataset):
I have built a digit classifier for this dataset using a CNN in TensorFlow, with 2 convolution layers and 2 max-pooling layers. Each image arrives flattened as 784 pixels and is first reshaped into a 2D image of 28*28 pixels. After the convolution and max-pooling layers, the feature maps are flattened again and passed through a dense (fully connected) layer. The result then goes through a dropout layer with a keep probability of 0.8, which randomly makes some units inactive during training; this is a form of regularisation and helps to avoid overfitting. Finally, the output layer produces one score for each of the labels 0-9. Using argmax we get the index of the largest score, and that index is our predicted digit.
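To make the layer-by-layer description concrete, here is a small sketch in plain Python (not TensorFlow) that traces the shape of one image through the network. It assumes "SAME" padding, where the output length is ceil(size / stride); the helper names `same_padding_output` and `trace_shapes` are my own and exist only for illustration.

```python
import math


def same_padding_output(size, stride):
    # With "SAME" padding the output length is ceil(size / stride).
    return math.ceil(size / stride)


def trace_shapes(height=28, width=28):
    """Return (layer name, shape) pairs for one image passing through the model."""
    shapes = [("input", (height, width, 1))]
    # (name, output channels, stride): convolutions use stride 1, 2*2 pooling uses stride 2
    layers = [("conv1", 32, 1), ("pool1", 32, 2), ("conv2", 64, 1), ("pool2", 64, 2)]
    channels = 1
    for name, channels, stride in layers:
        height = same_padding_output(height, stride)
        width = same_padding_output(width, stride)
        shapes.append((name, (height, width, channels)))
    shapes.append(("flatten", (height * width * channels,)))
    shapes.append(("hidden", (1024,)))
    shapes.append(("output", (10,)))
    return shapes


for name, shape in trace_shapes():
    print(name, shape)
```

The "flatten" entry comes out as 7*7*64 = 3136 values, which is the size the hidden layer has to accept.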

In [1]:
import tensorflow as tf

In [2]:
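# I have taken the MNIST input data from TensorFlow
# (this tutorials module is part of TensorFlow 1.x; it was removed in TensorFlow 2.x)
from tensorflow.examples.tutorials.mnist import input_data
mnist=input_data.read_data_sets("MNIST_data/",one_hot=True)
# one_hot=True encodes each label as a 10-element vector with a 1 at the index of the digit (0-9) and 0 elsewhere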

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
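
A quick sketch of what one-hot encoding means for the labels, written in plain Python. The helpers `one_hot` and `decode` are hypothetical names for illustration and are not part of the MNIST loader.

```python
def one_hot(digit, num_classes=10):
    """Encode a digit as a vector with 1.0 at its index and 0.0 elsewhere."""
    vec = [0.0] * num_classes
    vec[digit] = 1.0
    return vec


def decode(vec):
    """Recover the digit: the index of the largest entry (what argmax does)."""
    return vec.index(max(vec))


label = one_hot(3)
print(label)
print(decode(label))
```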


In [3]:
# we will convert our 784-pixel input to a 2D image of 28 * 28 pixels
input_width=28
input_height=28
input_channels=1
input_pixels=784
# I have used 2 convolution and 2 max-pooling layers in my CNN model
# the first convolution layer has 32 filters and the second has 64 filters
# each filter is of size k*k, with stride 1 and padding="SAME"
# the window for each max-pooling layer is 2*2
n_conv1=32
n_conv2=64
stride_conv1=1
stride_conv2=1
k_conv1=5
k_conv2=5
k_maxpool1=2
k_maxpool2=2
# number of units in hidden layer is 1024
#output layer has 10 units
n_hidden=1024
n_out=10

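# each 2*2 max-pooling layer halves the width and height, so 28*28 becomes 7*7 after both layers
# with 64 feature maps, the flattened input to the hidden layer has 7*7*64 = 3136 values
input_size_to_hidden=(input_width//(k_maxpool1*k_maxpool2))*(input_height//(k_maxpool1*k_maxpool2))*n_conv2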

In [4]:
# weights and biases for each layer are created as TensorFlow variables, initialised with random values from a normal distribution
weights={
    'wcl1':tf.Variable(tf.random_normal([k_conv1,k_conv1,input_channels,n_conv1])),
    'wcl2':tf.Variable(tf.random_normal([k_conv2,k_conv2,n_conv1,n_conv2])),
    'whl' :tf.Variable(tf.random_normal([input_size_to_hidden,n_hidden])),
    'out' :tf.Variable(tf.random_normal([n_hidden,n_out]))
}

biases={
    'bcl1':tf.Variable(tf.random_normal([n_conv1])),
    'bcl2':tf.Variable(tf.random_normal([n_conv2])),
    'bhl':tf.Variable(tf.random_normal([n_hidden])),
    'out':tf.Variable(tf.random_normal([n_out])),
}
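
To get a feel for the size of the model, the shapes of the weights and biases can be multiplied out. This is a plain Python sketch; `count_parameters` is a hypothetical helper, and 3136 is the flattened size 7*7*64 that the hyperparameters give.

```python
from math import prod


def count_parameters(k_conv1=5, k_conv2=5, input_channels=1, n_conv1=32,
                     n_conv2=64, input_size_to_hidden=3136, n_hidden=1024, n_out=10):
    """Return the number of trainable values per variable and in total."""
    shapes = {
        "wcl1": (k_conv1, k_conv1, input_channels, n_conv1),
        "wcl2": (k_conv2, k_conv2, n_conv1, n_conv2),
        "whl": (input_size_to_hidden, n_hidden),
        "out_w": (n_hidden, n_out),
        "bcl1": (n_conv1,),
        "bcl2": (n_conv2,),
        "bhl": (n_hidden,),
        "out_b": (n_out,),
    }
    sizes = {name: prod(shape) for name, shape in shapes.items()}
    return sizes, sum(sizes.values())


sizes, total = count_parameters()
for name, size in sizes.items():
    print(name, size)
print("total", total)
```

Almost all of the parameters sit in 'whl', the weights between the flattened feature maps and the hidden layer.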

In [5]:
# the conv function convolves the input with the filter weights (with the given stride and SAME padding), adds the bias and applies relu
def conv(x, weights, bias, strides):
    out = tf.nn.conv2d(x, weights, padding="SAME", strides = [1, strides, strides, 1])
    out = tf.nn.bias_add(out, bias)
    out = tf.nn.relu(out)
    return out
# the maxpooling function downsamples each feature map by keeping only the maximum of every k*k window
def maxpooling(x, k):
    return tf.nn.max_pool(x, padding = "SAME", ksize = [1, k, k, 1], strides = [1, k, k, 1])
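
Here is a tiny sketch of what max-pooling does to a single feature map, in plain Python rather than TensorFlow. It assumes the height and width are divisible by k (true for 28 and 14 with k=2), so no padding is needed; `max_pool_2d` and the 4*4 example values are made up for illustration.

```python
def max_pool_2d(image, k):
    """Max-pool a 2D list with a k*k window and stride k."""
    rows, cols = len(image), len(image[0])
    return [
        [
            # maximum over the k*k window whose top-left corner is (r, c)
            max(image[r + dr][c + dc] for dr in range(k) for dc in range(k))
            for c in range(0, cols, k)
        ]
        for r in range(0, rows, k)
    ]


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 9, 8],
    [2, 6, 7, 3],
]
print(max_pool_2d(feature_map, 2))  # [[4, 5], [6, 9]]
```

Each 2*2 block is replaced by its largest value, so a 4*4 map becomes 2*2, in the same way that 28*28 becomes 14*14 in the model.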

In [6]:
# the cnn function defines the whole model: convolution, max-pooling, hidden, dropout and output layers
def cnn(x,weights,biases,keep_prob):
    x=tf.reshape(x,shape=[-1,input_height,input_width,input_channels])
    conv1=conv(x,weights['wcl1'],biases['bcl1'],stride_conv1)
    conv1_pool=maxpooling(conv1,k_maxpool1)
    
    conv2=conv(conv1_pool,weights['wcl2'],biases['bcl2'],stride_conv2)
    conv2_pool=maxpooling(conv2,k_maxpool2)
    
    # now here we have our hidden (fully connected) layer, which applies the relu activation function
    # we also have a dropout layer that randomly makes some units inactive during training (keep probability 0.8)
    hidden_input=tf.reshape(conv2_pool,shape=[-1,input_size_to_hidden])
    hidden_output_before_activation=tf.add(tf.matmul(hidden_input,weights['whl']),biases['bhl'])
    hidden_output_before_dropout=tf.nn.relu(hidden_output_before_activation)
    hidden_output=tf.nn.dropout(hidden_output_before_dropout,keep_prob)
    
    output=tf.add(tf.matmul(hidden_output,weights['out']),biases['out'])
    return output
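
The dropout step can be sketched in plain Python as follows. This is the usual "inverted dropout" idea, where each unit is kept with probability keep_prob and the kept units are scaled by 1/keep_prob so the expected sum stays the same. The function name `dropout` and the `rng` argument are my own choices for this illustration.

```python
import random


def dropout(values, keep_prob, rng):
    """Keep each value with probability keep_prob and scale kept values by 1/keep_prob."""
    if keep_prob >= 1.0:
        # keep_prob of 1.0 (used when testing) leaves the activations untouched
        return list(values)
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
activations = [1.0] * 10
print(dropout(activations, 0.8, rng))  # training: some units become 0.0
print(dropout(activations, 1.0, rng))  # testing: nothing is dropped
```

This is why keep_prob is a placeholder in the model: it is fed as 0.8 while training and as 1.0 when making predictions.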

In [7]:
x = tf.placeholder("float", [None, input_pixels])
y = tf.placeholder(tf.int32, [None, n_out])
keep_prob = tf.placeholder("float")
pred = cnn(x, weights, biases, keep_prob)

In [8]:
cost=tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=pred,labels=y))
# this function applies softmax to the predictions (logits) and measures the cross entropy against the actual labels y
# tf.reduce_mean then averages that value over the batch
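
For one image, the cost above can be worked out by hand. The sketch below is plain Python, not the TensorFlow implementation; it uses the log-softmax form so that very large logits do not cause a log(0). The function names are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log of the softmax probabilities."""
    m = max(logits)
    log_total = math.log(sum(math.exp(v - m) for v in logits))
    return [v - m - log_total for v in logits]


def softmax_cross_entropy(logits, one_hot_label):
    """Cross entropy between a one-hot label and softmax(logits)."""
    return -sum(t * lp for t, lp in zip(one_hot_label, log_softmax(logits)))


def mean_cost(logits_batch, labels_batch):
    """Average the per-image cross entropy over a batch, like tf.reduce_mean."""
    costs = [softmax_cross_entropy(l, y) for l, y in zip(logits_batch, labels_batch)]
    return sum(costs) / len(costs)


label = [0.0] * 10
label[3] = 1.0
print(softmax_cross_entropy([0.0] * 10, label))  # no preference: equals log(10)
confident = [0.0] * 10
confident[3] = 20.0
print(softmax_cross_entropy(confident, label))   # confident and correct: close to 0
```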

In [9]:
optimizer = tf.train.AdamOptimizer(learning_rate=0.001)
optimize = optimizer.minimize(cost)
#this optimizer runs the Adam algorithm and minimizes the cost
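
To show what one Adam step does, here is a sketch of the update rule in plain Python on a toy one-variable problem. It is not how TensorFlow implements the optimizer internally; beta1=0.9, beta2=0.999 and epsilon=1e-8 are the usual defaults, and the toy cost (w - 3)^2 and the learning rate of 0.1 are assumptions chosen only so the example converges quickly.

```python
import math


def adam_minimize(grad, w, learning_rate, steps, beta1=0.9, beta2=0.999, epsilon=1e-8):
    """Minimise a one-variable function given its gradient, using the Adam update."""
    m = 0.0  # moving average of the gradient
    v = 0.0  # moving average of the squared gradient
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction for the zero start
        v_hat = v / (1 - beta2 ** t)
        w -= learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return w


# toy cost (w - 3)^2 has gradient 2 * (w - 3) and its minimum at w = 3
w_final = adam_minimize(lambda w: 2 * (w - 3), w=0.0, learning_rate=0.1, steps=500)
print(w_final)
```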

In [10]:
sess=tf.Session()
sess.run(tf.global_variables_initializer())

In [11]:
# Feeding all the training data in one go would need a lot of memory and give only one weight update per pass,
# so we pass the input images in batches rather than all at once
# for every batch we calculate the cost and take one optimisation step
# the value printed after each pass is the cost summed over all batches, and we see that it keeps decreasing
# mnist.train.next_batch itself provides the next batch of the given size
batch_size = 100
for i in range(25):
    num_batches = int(mnist.train.num_examples/batch_size)
    total_cost = 0
    for j in range(num_batches):
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        c, _ = sess.run([cost,optimize], feed_dict={x:batch_x , y:batch_y, keep_prob:0.8})
        total_cost += c
    print(total_cost)

2784669.76663208
397418.98651123047
210723.9832226038
130809.74623095989
86256.00763153154
63309.639527991414
47508.198832422495
37339.08620789318
28907.797228210286
24601.977367603882
18381.862570186408
17769.399864750394
15595.486658436887
12152.905498364093
10192.026421776842
9810.250925540902
8378.169571807022
7542.190902253389
7251.90639526397
5277.771161342505
6823.323988132179
5242.879463316202
5584.342637205851
5281.43017841205
4233.545157329124
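
The printed numbers are sums over every batch in a pass, so dividing by the number of batches gives a mean cost per batch that is easier to compare. This is a plain Python sketch; it assumes the training split has 55000 examples (the default split of this loader), and `mean_cost_per_batch` is a hypothetical helper.

```python
train_examples = 55000  # assumption: default size of mnist.train
batch_size = 100
num_batches = train_examples // batch_size


def mean_cost_per_batch(total_cost, num_batches):
    """Turn the summed cost of one pass into an average per batch."""
    return total_cost / num_batches


first_pass = mean_cost_per_batch(2784669.76663208, num_batches)
last_pass = mean_cost_per_batch(4233.545157329124, num_batches)
print(num_batches)
print(first_pass, last_pass)
```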


In [12]:
# pred holds one score (logit) for each of the 10 classes, for every image
# by getting the index of the max score we get our predicted digit
predictions=tf.argmax(pred,1)
correct_labels=tf.argmax(y,1)
# tf.equal marks as True all those predictions which are equal to our correct_labels
correct_predictions=tf.equal(predictions,correct_labels)
# the results get their own name (predicted_digits) so that the predictions tensor is not overwritten
predicted_digits,correct_pred=sess.run([predictions,correct_predictions],feed_dict={x:mnist.test.images, y:mnist.test.labels, keep_prob:1.0})
correct_pred.sum()
# out of 10000 test images we have predicted 9841 correctly (98.41 % accuracy)


9841
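
The argmax and accuracy steps can be sketched in plain Python like this. The scores and labels below are made-up toy values, not real model output, and `argmax` and `accuracy` are hypothetical helper names.

```python
def argmax(values):
    """Index of the largest value (the first one if there is a tie)."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(scores_batch, one_hot_labels):
    """Count how many predicted digits match the labels, and the fraction."""
    correct = sum(
        argmax(scores) == argmax(label)
        for scores, label in zip(scores_batch, one_hot_labels)
    )
    return correct, correct / len(scores_batch)


# toy example with 3 classes and 4 images
scores = [[0.1, 2.0, -1.0], [3.0, 0.0, 0.5], [0.2, 0.1, 0.9], [1.5, 1.0, 0.0]]
labels = [[0, 1, 0], [1, 0, 0], [0, 1, 0], [1, 0, 0]]
print(accuracy(scores, labels))  # (3, 0.75)
```

In the same way, 9841 correct predictions out of 10000 test images gives 9841 / 10000 = 98.41 % accuracy.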