In [7]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in 

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list the files in the input directory

import os
print(os.listdir("../input"))

# Any results you write to the current directory are saved as output.

A Convolutional Neural Network (CNN) is better suited to data that doesn't neatly align into columns, which is typical for image processing. CNNs are less sensitive to where in the image the pattern we're looking for appears.

With a multi-layer perceptron, we achieved around 97% accuracy. Let's see if we can beat that.

Why Keras on top of TensorFlow?
Keras is a layer on top of TensorFlow that makes things a lot easier. Not only is it easier to use, it's easier to tune.

I'll start by importing the stuff I need, including the new layer needed in a CNN:

In [19]:
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Conv2D, MaxPooling2D, Flatten
from keras.optimizers import RMSprop

## Loading the data

In [12]:
import pandas as pd
input_file = ("../input/train.csv")

df_train=pd.read_csv(input_file)
df_train.shape # (42000, 785)
df_train.head()

In [13]:
import pandas as pd
input_file = ("../input/test.csv")

df_test=pd.read_csv(input_file)
df_test.shape # gives (28000, 784)
df_test.head() #gives a dataframe

### Making a train-test split
The original test.csv file does not contain any labels, so it cannot be used to measure the accuracy of the model. I therefore need to create both the training set and the test set from the original train.csv file. With a standard 80/20 split, the sizes are as follows:

Train: 0.80 x 42000 = 33600<br>
Test: 0.20 x 42000 = 8400

In [14]:
from sklearn.model_selection import train_test_split # the needed split-function imported from scikit-learn

train_set, test_set = train_test_split(df_train, test_size=0.20, random_state=42)

X_train_set = train_set.drop(['label'], axis=1) #Dropping 'label', the predicted variable 
y_train_set = train_set['label'] # keeping 'label', the predicted variable 

X_test_set = test_set.drop(['label'], axis=1)
y_test_set = test_set['label']

### Converting the dataframes into numpy arrays
Since I need to use the reshape function (see following cell), which cannot be used on a dataframe, I need to convert the dataframes first into numpy arrays.

In [15]:
# as_matrix() was removed in pandas 1.0; to_numpy() (pandas >= 0.24) is its replacement
df_train_label_array = y_train_set.to_numpy() # numpy array of the training labels
df_train_image_array = X_train_set.to_numpy() # numpy array of the training pixels

df_test_image_array = X_test_set.to_numpy()
df_test_label_array = y_test_set.to_numpy()

### 2D images vs. flattened 1D streams
We need to shape the data differently than for an "ordinary" neural network. Since we're treating the data as 2D images of 28x28 pixels instead of a flattened stream of 784 pixels, we need to reshape it accordingly. Depending on the data format Keras is set up for, this may be 1x28x28 or 28x28x1. The "1" indicates a single color channel, as the images are grayscale. If we were dealing with color images, it would be 3 instead of 1, since we'd have red, green, and blue color channels.
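As a minimal sketch of the two layouts, assuming only NumPy and a made-up row of 784 values (the names `flat`, `channels_last` and `channels_first` are illustrative and not part of the notebook):

```python
import numpy as np

# Hypothetical flattened image: 784 values numbered 0..783, so positions are easy to track.
flat = np.arange(784)

channels_last = flat.reshape(28, 28, 1)   # rows, columns, channel
channels_first = flat.reshape(1, 28, 28)  # channel, rows, columns

# In both layouts the pixel at row r, column c comes from flat index r * 28 + c.
r, c = 3, 5
print(channels_last[r, c, 0], channels_first[0, r, c], r * 28 + c)
```

Both reshapes keep the pixels in the same row-major order; only the position of the channel axis differs.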

In [16]:
from keras import backend as K

if K.image_data_format() == 'channels_first':
    train_images = df_train_image_array.reshape(df_train_image_array.shape[0], 1, 28, 28)
    test_images =  df_test_image_array.reshape(df_test_image_array.shape[0], 1, 28, 28)
    input_shape = (1, 28, 28)
else:
    train_images = df_train_image_array.reshape(df_train_image_array.shape[0], 28, 28, 1)
    test_images = df_test_image_array.reshape(df_test_image_array.shape[0], 28, 28, 1)
    input_shape = (28, 28, 1)
    
train_images = train_images.astype('float32')
test_images = test_images.astype('float32')
train_images /= 255
test_images /= 255

In [17]:
test_images.shape

### Converting the Train and Test labels
Next I need to convert my train and test labels to categorical *one-hot vector* format. One-hot encoding turns each categorical value (here, an integer from 0 to 9) into a vector of ten entries that is 1 at the position of the digit and 0 everywhere else. This matches the ten outputs of the network's final layer and the categorical cross-entropy loss used later.
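To make the encoding concrete, here is a small NumPy sketch. The helper `one_hot` is hypothetical and written only for illustration; it is not how Keras implements `to_categorical`, although it should produce the same kind of array for integer labels.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) float array with a single 1.0 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    # Set column `label` to 1.0 in each row.
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

example = one_hot([3, 0, 9], 10)
print(example[0])               # the row encoding the digit 3
print(example.argmax(axis=1))   # argmax recovers the original integer labels
```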

In [20]:
train_labels = keras.utils.to_categorical(df_train_label_array, 10)
test_labels = keras.utils.to_categorical(df_test_label_array, 10)
test_labels

### As a sanity check, let's print out a few of the training images with their labels:

In [21]:
import matplotlib.pyplot as plt

def display_sample(num):
    #Print the one-hot array of this sample's label 
    print(train_labels[num])  
    #Print the label converted back to a number
    label = train_labels[num].argmax(axis=0)
    #Reshape the 784 values to a 28x28 image
    image = train_images[num].reshape([28,28])
    plt.title('Sample: %d  Label: %d' % (num, label))
    plt.imshow(image, cmap=plt.get_cmap('gray_r'))
    plt.show()
    
display_sample(1111) #the image at index 1111 in the Training set
display_sample(2222) #the image at index 2222 in the Training set
display_sample(3333) #the image at index 3333 in the Training set

Now for the meat of the problem. Setting up a convolutional neural network involves more layers. Not all of these are strictly necessary; you could run without pooling and dropout, but those extra steps help avoid overfitting and help things run faster.

I'll start with a 2D convolution of the image - it's set up to take 32 windows, or "filters", of each image, each filter being 3x3 in size.
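As a rough illustration of what a single 3x3 filter does, here is a NumPy sketch. It assumes no padding and a stride of 1 (the Keras defaults for `Conv2D`); the function `conv2d_valid`, the random stand-in image and the hand-written edge kernel are all hypothetical. In the real layer, the 32 kernels are learned during training rather than written by hand.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with no padding and stride 1."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype="float32")
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise product of the window and the kernel, summed to one value.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Random stand-in for one 28x28 digit image.
image = np.random.default_rng(0).random((28, 28)).astype("float32")
# Illustrative vertical-edge filter.
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype="float32")

feature_map = np.maximum(conv2d_valid(image, kernel), 0)  # ReLU activation
print(feature_map.shape)
```

Each 3x3 filter shrinks a 28x28 image to a 26x26 feature map, because the window cannot be centered on the border pixels.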

We then run a second convolution on top of that with 64 3x3 windows. Please note: this topology is simply what Keras's own examples recommend. Tuning CNNs is hard, so you want to reuse previous research whenever possible.

Next I apply a MaxPooling2D layer that takes the maximum of each 2x2 result to distill the results down into something more manageable.
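The pooling step can be sketched in NumPy as follows. The helper `max_pool_2x2` and the tiny 4x4 input are made up for illustration, and the sketch assumes non-overlapping 2x2 blocks on a single feature map with even height and width.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Take the maximum of each non-overlapping 2x2 block (height and width must be even)."""
    h, w = feature_map.shape
    # Split rows and columns into (block index, position inside block).
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

feature_map = np.array([[1, 2, 0, 1],
                        [3, 4, 1, 0],
                        [0, 0, 5, 6],
                        [1, 0, 7, 8]])
pooled = max_pool_2x2(feature_map)
print(pooled)
```

The 4x4 input becomes a 2x2 output, so each pooling layer cuts the number of values per feature map to a quarter.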

A dropout layer is then applied to prevent overfitting.
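For intuition, here is a minimal sketch of "inverted" dropout in NumPy. The function `dropout` below is a hypothetical stand-in, not the Keras implementation, and the seed and array are arbitrary. Dropout is only active during training; at prediction time the layer passes its input through unchanged.

```python
import numpy as np

def dropout(activations, rate, rng):
    """Inverted dropout: zero a random fraction `rate` of units and rescale the rest."""
    keep_mask = rng.random(activations.shape) >= rate
    # Dividing by (1 - rate) keeps the expected value of each unit unchanged.
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(42)
activations = np.ones((4, 4), dtype="float32")
dropped = dropout(activations, rate=0.25, rng=rng)
print(dropped)
```

Every entry of the result is either 0 (dropped) or 1 / 0.75 (kept and rescaled).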

Next I flatten the 2D layer I have at this stage into a 1D layer. So at this point I can just pretend we have a traditional multi-layer perceptron, an "ordinary" neural network

... and feed that into a hidden, flat layer of 128 units.

I then apply dropout again to further prevent overfitting.

And finally, I feed that into our final 10 units where softmax is applied to choose our category of 0-9.
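The softmax step turns the ten raw outputs into probabilities that sum to 1, and the predicted digit is the position of the largest one. A small NumPy sketch, with made-up scores in `logits`:

```python
import numpy as np

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

# Hypothetical raw outputs of the final layer for one image, one score per digit 0-9.
logits = np.array([1.0, 2.0, 0.5, 7.0, 0.1, 0.0, 1.5, 3.0, 0.2, 0.3])
probs = softmax(logits)
print(probs.sum(), probs.argmax())
```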

In [23]:
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))
# 64 3x3 kernels
model.add(Conv2D(64, (3, 3), activation='relu'))
# Reduce by taking the max of each 2x2 block
model.add(MaxPooling2D(pool_size=(2, 2)))
# Dropout to avoid overfitting
model.add(Dropout(0.25))
# Flatten the results to one dimension for passing into the dense layers
model.add(Flatten())
# A hidden layer to learn with
model.add(Dense(128, activation='relu'))
# Another dropout
model.add(Dropout(0.5))
# Final categorization from 0-9 with softmax
model.add(Dense(10, activation='softmax'))

### Let's double check the model description:

In [24]:
model.summary()
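The numbers in the summary can be cross-checked by hand. The sketch below assumes the Keras defaults for this model (no padding, stride 1, one bias per filter or unit); the helper names `conv_params` and `dense_params` are my own.

```python
def conv_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter."""
    return kernel_h * kernel_w * in_channels * filters + filters

def dense_params(inputs, units):
    """Weights plus one bias per unit."""
    return inputs * units + units

side = 28
side = side - 3 + 1                    # first 3x3 convolution: 28 -> 26
conv1 = conv_params(3, 3, 1, 32)
side = side - 3 + 1                    # second 3x3 convolution: 26 -> 24
conv2 = conv_params(3, 3, 32, 64)
side = side // 2                       # 2x2 max pooling: 24 -> 12
flat = side * side * 64                # flattened length
dense1 = dense_params(flat, 128)
dense2 = dense_params(128, 10)

total = conv1 + conv2 + dense1 + dense2
print(conv1, conv2, flat, dense1, dense2, total)
```

Dropout, pooling and flatten layers have no trainable parameters, so they do not appear in the total. Almost all of the parameters sit in the first dense layer.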

In [25]:
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

And now for the training of the model. To make things go a little faster, I'll use batches of 32.
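To get a feel for the amount of work involved, this short sketch counts the weight updates, assuming the 33,600 training samples from the 80/20 split above and the batch size and epoch count used in the next cell:

```python
import math

train_samples = 33600   # 80% of the 42,000 rows in train.csv
batch_size = 32
epochs = 10

steps_per_epoch = math.ceil(train_samples / batch_size)  # batches per pass over the data
total_updates = steps_per_epoch * epochs
print(steps_per_epoch, total_updates)
```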

## Warning !

**Running these 10 epochs on a CPU took me around 30 minutes.** Don't run the next block unless you can tie up your computer for at least half an hour. It will print progress as each epoch is run, but each epoch can take several minutes. Running this on a GPU would likely make it faster.

In [26]:
history = model.fit(train_images, train_labels,
                    batch_size=32,
                    epochs=10,
                    verbose=2,
                    validation_data=(test_images, test_labels))

#### Was it worth the wait? 

In [27]:
score = model.evaluate(test_images, test_labels, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

Around 99%! And that's with just 10 epochs! It came at a significant cost in terms of computing power, but when you start distributing things over multiple computers, each with multiple GPUs, that cost might start to feel less bad. If you're building something where life and death are on the line, like a self-driving car, every fraction of a percent matters!<br><br>
Before submitting to Kaggle, the original test dataframe *df_test* needs to be converted into a numpy array so that the model can make predictions on it. First, the numpy array conversion:

In [28]:
df_test_RESULTS = df_test.to_numpy() # as_matrix() was removed in pandas 1.0
df_test_RESULTS.shape

In [29]:
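# Reuse input_shape so the layout matches the training images under either data format
testX = df_test_RESULTS.reshape((df_test_RESULTS.shape[0],) + input_shape)
testX = testX.astype('float32') # same dtype as the training images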
testX /= 255.0
testX.shape

### Kaggle submission cell 

In [30]:
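# predict_classes() is not available in recent Keras versions; argmax over the softmax output is equivalent
predictions = np.argmax(model.predict(testX, verbose=0), axis=1)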

submissions=pd.DataFrame({"ImageId": list(range(1,len(predictions)+1)),
                         "Label": predictions})
submissions.to_csv("DR.csv", index=False, header=True)
