This notebook implements a Convolutional Neural Network (CNN) for the Digit Recognizer Kaggle competition found here: https://www.kaggle.com/c/digit-recognizer

Notebook by Jonathan Gomez Martinez

Used guide provided by TensorFlow here: https://www.tensorflow.org/get_started/mnist/pros

Here we import the libraries necessary for our project

In [1]:
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import pandas as pd
import numpy as np
import tensorflow as tf
import time
from sklearn.metrics import confusion_matrix

The first step in training is to get the data ready. We import the data, provided as a CSV file on Kaggle.

In [2]:
#Import the provided training dataset
rawtrain = pd.read_csv("train.csv")

#Choose an option below by placing/removing a "#"

#rawin = rawtrain.iloc[:10000,] #Only keep some rows for development efficiency
rawin = rawtrain  #Use all rows for improved accuracy

The TensorFlow cost functions used below expect one-hot labels rather than a single attribute holding the label as an integer from 0-9. Each of the ten possible digits gets its own binary attribute, set to 1 for the correct digit and 0 otherwise. We will format the labels accordingly here.

In [3]:
#Prepend ten zero-filled columns (x0-x9) that will hold the one-hot labels
zeros = pd.DataFrame(np.zeros((len(rawin.index), 10)), columns=['x0', 'x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'x7', 'x8', 'x9'])
raw = pd.concat([zeros,rawin], axis=1)
for i in raw.index:
    j = raw.label[i]
    raw.loc[i, raw.columns[j]] = 1.0 #.loc writes to raw itself, avoiding chained assignment on a copy


#Same encoding written as an explicit if/elif chain, timed for comparison
start = time.time()
for i in raw.index:
    if raw.label[i] == 0:
        raw.loc[i, 'x0'] = 1.0
    elif raw.label[i] == 1:
        raw.loc[i, 'x1'] = 1.0
    elif raw.label[i] == 2:
        raw.loc[i, 'x2'] = 1.0
    elif raw.label[i] == 3:
        raw.loc[i, 'x3'] = 1.0
    elif raw.label[i] == 4:
        raw.loc[i, 'x4'] = 1.0
    elif raw.label[i] == 5:
        raw.loc[i, 'x5'] = 1.0
    elif raw.label[i] == 6:
        raw.loc[i, 'x6'] = 1.0
    elif raw.label[i] == 7:
        raw.loc[i, 'x7'] = 1.0
    elif raw.label[i] == 8:
        raw.loc[i, 'x8'] = 1.0
    elif raw.label[i] == 9:
        raw.loc[i, 'x9'] = 1.0
end = time.time()
end - start
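
The row-by-row loops above can also be replaced by a single vectorized step. The following is a minimal sketch rather than the notebook's own method; the `labels` array is a hypothetical stand-in for the `label` column of train.csv.

```python
import numpy as np

# Hypothetical labels standing in for the "label" column of train.csv
labels = np.array([3, 0, 9, 1])

# Row k of the 10x10 identity matrix is the one-hot vector for digit k,
# so indexing the identity by the label array encodes every row at once
one_hot = np.eye(10)[labels]

print(one_hot.shape)  # (4, 10)
print(one_hot[0])     # a 1.0 in position 3, zeros elsewhere
```

In the notebook, the resulting array could be wrapped in a DataFrame with the columns x0-x9 and concatenated exactly as before. pandas also provides `pd.get_dummies`, which may be a convenient alternative.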

For testing purposes, we need to separate the dataset into training and testing sets.
We will also separate the labels from the images in this step.

In [4]:
test=raw.sample(frac=0.2,random_state=1251)
train=raw.drop(test.index)
l1 = len(train)
l2 = len(test)
print("Now we have", l1, "training digits and", l2, "testing digits")
#Separate the drawn digits from the labels for both sets of data
#Columns 0-9 hold the one-hot labels, column 10 the original label, columns 11 onward the pixels
train_x = train.iloc[0:,11:]
train_y = train.iloc[0:,:10]
#len(train_x)
#train_x.head()
#train_y.head()
test_x = test.iloc[0:,11:]
test_y = test.iloc[0:,:10]
test_labels = test.iloc[0:,10:11]

Now we have 33600 training digits and 8400 testing digits
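
For readers who want to see what the 80/20 split amounts to, here is a rough NumPy sketch of the same idea: shuffle the row positions once, then cut the shuffled order into a test part and a training part. The names `rng`, `shuffled`, `test_idx` and `train_idx` are illustrative, and reusing the seed 1251 is not expected to select the same rows as `raw.sample`.

```python
import numpy as np

n_rows = 42000                      # 33600 + 8400 rows, as printed above
rng = np.random.default_rng(1251)   # seeded so the split is repeatable
shuffled = rng.permutation(n_rows)  # a random ordering of all row positions

n_test = int(round(0.2 * n_rows))   # 20% of the rows are held out for testing
test_idx = shuffled[:n_test]
train_idx = shuffled[n_test:]

print(len(train_idx), len(test_idx))  # 33600 8400
```

Because both index arrays come from one permutation, no row can land in both sets, which is the same guarantee `raw.drop(test.index)` gives in the cell above.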


We can finally start to prepare our CNN. The first step is to initialize a TensorFlow session

In [5]:
#We start a tensorflow session named sess
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()

Next we will initialize some weights and variables for our CNN to use in training

In [6]:
#Define some functions to simplify code later on
def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)

def conv2d(x, W):
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
  return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding='SAME')

W_conv1 = weight_variable([5, 5, 1, 32]) #First convolutional layer weights: 5x5 patches, 1 input channel, 32 output features
b_conv1 = bias_variable([32]) #One bias per output feature

x = tf.placeholder(tf.float32, [None, 784]) #Placeholder for input images (784 pixels each)
W = tf.Variable(tf.zeros([784, 10])) #Weights of a plain softmax regression model
b = tf.Variable(tf.zeros([10])) #Biases of the softmax regression model
y = tf.nn.softmax(tf.matmul(x, W) + b) #Predicted class probabilities of the softmax regression model
y_ = tf.placeholder(tf.float32, [None, 10])  #Placeholder for the true one-hot labels

#Cost function and training step for the softmax regression model; both names are redefined for the CNN further below
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
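
To make the softmax and cross-entropy expressions in the cell above concrete, here is a small NumPy sketch of the same formulas. It is an illustration under stated assumptions, not TensorFlow's implementation: the helper names `softmax` and `cross_entropy_loss` and the two-example batch are made up for the example.

```python
import numpy as np

def softmax(logits):
    # Subtract the row maximum first so exp() cannot overflow
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy_loss(y_true, y_pred):
    # Mean over examples of -sum_k y_true[k] * log(y_pred[k])
    return np.mean(-np.sum(y_true * np.log(y_pred), axis=1))

# Hypothetical batch of two examples: all-zero logits give a uniform prediction
logits = np.zeros((2, 10))
y_true = np.eye(10)[[3, 7]]   # one-hot labels for the digits 3 and 7

probs = softmax(logits)
loss = cross_entropy_loss(y_true, probs)
print(probs[0, 0], loss)
```

With W and b initialized to zeros as in the cell above, the softmax regression model starts from exactly this uniform prediction, so its initial loss is ln(10), roughly 2.30.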

Since a CNN is meant for images, we need to reshape the placeholder x so that each 784-element row becomes a 28 by 28 image with a single color channel, essentially taking our 1D arrays and returning them to the initial image state.

In [7]:
x_image = tf.reshape(x, [-1,28,28,1]) 
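
The same reshape can be tried in NumPy, which uses the same row-major ordering, to see where each flat pixel ends up. This is a sketch on a hypothetical two-image batch filled with running integers, not on the Kaggle data.

```python
import numpy as np

# Hypothetical batch: 2 flattened images filled with running integers
flat = np.arange(2 * 784, dtype=np.float32).reshape(2, 784)

# -1 lets the batch size be inferred; the trailing 1 is the single grayscale channel
images = flat.reshape(-1, 28, 28, 1)

print(images.shape)  # (2, 28, 28, 1)

# Pixel (row, col) of an image is element 28 * row + col of its flat vector
row, col = 3, 5
print(images[0, row, col, 0] == flat[0, 28 * row + col])  # True
```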

We will now create layers for our neural network.

In [9]:
#First Layer
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

#Second Layer
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])

h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

#Initialize layer variables
W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])

h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])

keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2
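
The `7 * 7 * 64` in the cell above follows from the strides and the 'SAME' padding chosen earlier: with 'SAME' padding the output side length is the input side length divided by the stride, rounded up. The short sketch below walks the image size through the four layers; the helper name `same_padding_output` is illustrative.

```python
import math

def same_padding_output(size, stride):
    # With 'SAME' padding the output length is ceil(input / stride)
    return math.ceil(size / stride)

size = 28
for layer in ("conv1", "pool1", "conv2", "pool2"):
    # conv2d uses stride 1, max_pool_2x2 uses stride 2
    stride = 1 if layer.startswith("conv") else 2
    size = same_padding_output(size, stride)
    print(layer, size)

flat_features = size * size * 64   # 64 feature maps after the second convolution
print(flat_features)               # 3136
```

The convolutions leave the 28 by 28 size unchanged and each pooling step halves it, giving 14 and then 7.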

Next we define the cost function, training step, and measure of accuracy

In [10]:
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
sess.run(tf.global_variables_initializer())
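
The accuracy measure in the cell above compares the position of the largest network output with the position of the 1 in the one-hot label. A NumPy sketch of the same computation, on hypothetical logits and labels, looks like this:

```python
import numpy as np

# Hypothetical network outputs (logits) for 3 examples over 10 classes
logits = np.zeros((3, 10))
logits[0, 4] = 2.0   # largest output at class 4
logits[1, 1] = 1.5   # largest output at class 1
logits[2, 9] = 0.7   # largest output at class 9
labels = np.eye(10)[[4, 1, 2]]   # true digits: 4, 1, 2

# An example counts as correct when both argmax positions agree
correct = np.argmax(logits, axis=1) == np.argmax(labels, axis=1)
accuracy = correct.astype(np.float32).mean()
print(correct, accuracy)
```

Applying softmax would not change which output is largest, which is why the argmax can be taken on `y_conv` directly.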

This is where we train the CNN. Since CNNs are resource heavy, we train on a small subset of the data at each step as opposed to running through the entire dataset at once. This also allows us to use a fresh random sample each time.

In [11]:
for i in range(10000):
    t=train.sample(frac=0.0125) #Use 1.25% of training examples per iteration
    ttrain_x = t.iloc[0:,11:] #Separates labels from images
    ttrain_y = t.iloc[0:,:10]
    #keep_prob: 1 keeps every unit, so no dropout is applied in this training step
    train_step.run(feed_dict={x: ttrain_x, y_: ttrain_y, keep_prob: 1}) #Train
    if i%50 == 0: #Print metrics during training
        print("current iteration " + str(i))
        #Accuracy on the current training batch; keep_prob: .75 leaves dropout active during this evaluation
        print("test accuracy on training %g"%accuracy.eval(feed_dict={
            x: ttrain_x, y_: ttrain_y, keep_prob: .75}))
#        print("test accuracy on testing %g"%accuracy.eval(feed_dict={
#            x: test_x, y_: test_y, keep_prob: 1.0}))

current iteration 0
test accuracy on training 0.102381
current iteration 50
test accuracy on training 0.547619
current iteration 100
test accuracy on training 0.595238
current iteration 150
test accuracy on training 0.580952
current iteration 200
test accuracy on training 0.592857
current iteration 250
test accuracy on training 0.592857
current iteration 300
test accuracy on training 0.616667
current iteration 350
test accuracy on training 0.630952
current iteration 400
test accuracy on training 0.642857
current iteration 450
test accuracy on training 0.65
current iteration 500
test accuracy on training 0.657143
current iteration 550
test accuracy on training 0.652381
current iteration 600
test accuracy on training 0.685714
current iteration 650
test accuracy on training 0.647619
current iteration 700
test accuracy on training 0.628571
current iteration 750
test accuracy on training 0.628571
current iteration 800
test accuracy on training 0.652381
current iteration 850
test accuracy on
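
A little arithmetic shows what the sampling settings in the training loop imply. The sketch below assumes the 33600 training rows reported earlier; the variable names are illustrative.

```python
n_train = 33600        # training rows reported earlier in the notebook
frac = 0.0125          # fraction sampled per iteration
iterations = 10000

batch_size = round(n_train * frac)        # rows drawn by train.sample(frac=0.0125)
examples_seen = batch_size * iterations   # rows processed over the whole loop
passes = examples_seen / n_train          # equivalent number of passes over the data

print(batch_size)      # 420
print(passes)          # 125.0
```

Each batch is drawn independently, so these are not true epochs, only the equivalent amount of data. The batch size also explains the granularity of the printed accuracies: they are multiples of 1/420, for example 43/420 ≈ 0.102381 at iteration 0.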

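Finally we have our results. The cell below evaluates the network on the held-out test set (test_x, test_y) with dropout disabled, even though the printed label says "training".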

In [12]:
print("test accuracy on training %g"%accuracy.eval(feed_dict={
            x: test_x, y_: test_y, keep_prob: 1.0}))

test accuracy on training 0.982381
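
A single accuracy figure does not show which digits get confused with which. The notebook imports `confusion_matrix` and keeps `test_labels` but never uses them, so the following NumPy sketch of a confusion matrix is one possible next step, not part of the original analysis. The label arrays are hypothetical, and the names `confusion` and `conf_mat` are made up for the example.

```python
import numpy as np

def confusion(true_labels, predicted_labels, n_classes=10):
    # Entry [t, p] counts examples of true class t that were predicted as class p
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(matrix, (true_labels, predicted_labels), 1)
    return matrix

# Hypothetical labels; in the notebook these would come from test_labels
# and from the argmax of the network output on test_x
true_labels = np.array([0, 1, 1, 7, 7, 7])
predicted = np.array([0, 1, 7, 7, 7, 1])

conf_mat = confusion(true_labels, predicted)
print(conf_mat[1, 7], conf_mat[7, 1])       # off-diagonal entries are mistakes
print(np.trace(conf_mat) / conf_mat.sum())  # diagonal share equals the accuracy
```

`sklearn.metrics.confusion_matrix(y_true, y_pred)` uses the same layout, with rows for true classes and columns for predicted classes.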


Because a neural net holds on to substantial resources, it is best practice to close the session and release them once we are finished.

In [13]:
sess.close()