For our first attempt, we will use a random forest classifier and fit it to the training data.

In [1]:
from sklearn.ensemble import RandomForestClassifier
import numpy as np
import pandas as pd

training_data = pd.read_csv("data/digit-train.csv")

# The first column holds the digit label; the remaining 784 columns hold pixel values
target = training_data.iloc[:, 0].values.ravel()
train = training_data.iloc[:, 1:].values
test = pd.read_csv("data/digit-test.csv").values

rf = RandomForestClassifier(n_estimators=100)
rf.fit(train, target)
pred = rf.predict(test)

np.savetxt('data/kaggle-digit-classifier-2016-07-27.csv', np.c_[range(1,len(test)+1),pred], delimiter=',', header = 'ImageId,Label', comments = '', fmt='%d')

After submitting this to Kaggle, we get an accuracy of around 0.96, which is an entry-level score. This is similar to the accuracy of the Support Vector Classification example in the scikit-learn documentation (http://scikit-learn.org/stable/auto_examples/classification/plot_digits_classification.html).
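
A score like this can also be estimated locally before submitting, by holding out part of the labeled data. The sketch below is only an illustration and not the workflow used above: it assumes scikit-learn's small built-in 8x8 digits set as a stand-in for the Kaggle CSV so that it runs without any data files, and the names `rf_check` and `val_accuracy` are made up for this example.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Assumption: the built-in 8x8 digits set stands in for the Kaggle training CSV
digits = load_digits()

# Keep 20% of the labeled samples aside; stratify keeps the digit classes balanced
X_fit, X_val, y_fit, y_val = train_test_split(
    digits.data, digits.target,
    test_size=0.2, random_state=0, stratify=digits.target,
)

rf_check = RandomForestClassifier(n_estimators=100, random_state=0)
rf_check.fit(X_fit, y_fit)

# score() returns the mean accuracy on the held-out samples
val_accuracy = rf_check.score(X_val, y_val)
print("held-out accuracy: %.3f" % val_accuracy)
```

The same split could be applied to `train` and `target` from the cell above; the number it prints is only an estimate of what the leaderboard would report.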

However, as we can see on the MNIST page (http://yann.lecun.com/exdb/mnist/), convolutional neural networks yield the most accurate predictions, with test error rates ranging from 1.7% down to 0.23%, compared to plain neural nets (4.7% to 0.35%) and SVMs (1.4% to 0.56%).

Let's first inspect the data, then plot a sample to get a glimpse of an actual image, so that we can prepare the layers needed for the neural network.

In [2]:
print(training_data.head())

   label  pixel0  pixel1  pixel2  pixel3  pixel4  pixel5  pixel6  pixel7  \
0      1       0       0       0       0       0       0       0       0   
1      0       0       0       0       0       0       0       0       0   
2      1       0       0       0       0       0       0       0       0   
3      4       0       0       0       0       0       0       0       0   
4      0       0       0       0       0       0       0       0       0   

   pixel8    ...     pixel774  pixel775  pixel776  pixel777  pixel778  \
0       0    ...            0         0         0         0         0   
1       0    ...            0         0         0         0         0   
2       0    ...            0         0         0         0         0   
3       0    ...            0         0         0         0         0   
4       0    ...            0         0         0         0         0   

   pixel779  pixel780  pixel781  pixel782  pixel783  
0         0         0         0         0         

In [3]:
target = target.astype(np.uint8)
train = np.array(train).reshape((-1, 1, 28, 28)).astype(np.uint8)
test = np.array(test).reshape((-1, 1, 28, 28)).astype(np.uint8)

This reshapes our data, which has 784 columns of pixel values per image, into 28 x 28 grids, with an extra dimension of size 1 for the single grayscale channel. We'll be using matplotlib to plot this data.
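
To make the reshape concrete, here is a minimal sketch on made-up data. The array `flat` is a hypothetical stand-in for the pixel columns (three fake images), not the real dataset; it only shows where a given pixel column ends up in the 28 x 28 grid.

```python
import numpy as np

# Hypothetical stand-in for the pixel columns: 3 fake "images" of 784 values each
flat = np.arange(3 * 784).reshape(3, 784)

# -1 lets NumPy infer the number of images; 1 is the single grayscale channel
images = flat.reshape((-1, 1, 28, 28))
print(images.shape)

# Column pixelK of the flat layout lands at row K // 28, column K % 28
k = 100
row, col = divmod(k, 28)
print(row, col, images[0, 0, row, col] == flat[0, k])
```

Because NumPy reshapes in row-major order by default, consecutive pixel columns fill one image row from left to right before moving to the next row.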

In [4]:
import matplotlib.pyplot as plt
import matplotlib.cm as cm

plt.imshow(train[1729][0], cmap=cm.binary)

<matplotlib.image.AxesImage at 0x7fef946fa590>

We will be using nolearn to build our convolutional neural network. nolearn is built on top of the Lasagne library, which provides the building blocks for neural networks.

Lasagne and nolearn can be installed using pip:
- pip install -r https://raw.githubusercontent.com/Lasagne/Lasagne/master/requirements.txt

- pip install -r https://raw.githubusercontent.com/dnouri/nolearn/master/requirements.txt

In [5]:
import lasagne
from lasagne import layers
from lasagne.updates import nesterov_momentum
from nolearn.lasagne import NeuralNet
from nolearn.lasagne import visualize

  "downsample module has been moved to the theano.tensor.signal.pool module.")


Next, we define a neural network with a single hidden layer to train on the data.

In [6]:
training_net = NeuralNet(
    layers=[('input', layers.InputLayer),
            ('hidden', layers.DenseLayer),
            ('output', layers.DenseLayer),
    ],

    input_shape=(None,1,28,28), #input layer
    hidden_num_units=1000, #hidden layer
    output_nonlinearity=lasagne.nonlinearities.softmax, #softmax
    output_num_units=10, #target values

    # optimization method:
    update=nesterov_momentum,
    update_learning_rate=0.0001,
    update_momentum=0.9,
    max_epochs=15,
    verbose=1,
    )
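
The output layer above uses a softmax nonlinearity, and the fitted network's printout further down lists categorical cross-entropy as its loss function. As a rough illustration of what those two pieces compute, here is a plain NumPy sketch. It is not Lasagne's implementation, and the `scores` values are invented for the example.

```python
import numpy as np

def softmax(scores):
    """Turn a vector of raw scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

def cross_entropy(probs, true_label):
    """Negative log-probability assigned to the correct class."""
    return -np.log(probs[true_label])

# Hypothetical raw scores for the 10 digit classes; class 3 scores highest
scores = np.array([0.1, 0.2, 0.0, 2.5, 0.3, 0.1, 0.0, 0.4, 0.2, 0.1])
probs = softmax(scores)

print(probs.argmax(), round(float(probs.sum()), 6))
# The loss is smaller when the true label is the class the network favors
print(cross_entropy(probs, 3) < cross_entropy(probs, 0))
```

Training lowers this loss by pushing probability mass toward the correct digit for each training image.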

In [7]:
training_net.fit(train, target)

# Neural Network with 795010 learnable parameters

## Layer information

  #  name    size
---  ------  -------
  0  input   1x28x28
  1  hidden  1000
  2  output  10

  epoch    trn loss    val loss    trn/val    valid acc  dur
-------  ----------  ----------  ---------  -----------  -----
      1     8.00096     1.78633    4.47899      0.92945  4.27s
      2     0.89166     1.27761    0.69792      0.93896  4.01s
      3     0.39200     1.12841    0.34739      0.94039  3.73s
      4     0.18927     1.01890    0.18576      0.94313  3.60s
      5     0.10324     0.99925    0.10332      0.94539  3.64s
      6     0.05084     0.95761    0.05309      0.94694  3.70s
      7     0.02757     0.96898    0.02845      0.94717  4.41s
      8     0.01281     0.94990    0.01349      0.94884  4.11s
      9     0.00665     0.91979    0.00723      0.9

NeuralNet(X_tensor_type=None,
     batch_iterator_test=<nolearn.lasagne.base.BatchIterator object at 0x7fef9296fe90>,
     batch_iterator_train=<nolearn.lasagne.base.BatchIterator object at 0x7fef9296fd50>,
     check_input=True, custom_scores=None, hidden_num_units=1000,
     input_shape=(None, 1, 28, 28),
     layers=[('input', <class 'lasagne.layers.input.InputLayer'>), ('hidden', <class 'lasagne.layers.dense.DenseLayer'>), ('output', <class 'lasagne.layers.dense.DenseLayer'>)],
     loss=None, max_epochs=15, more_params={},
     objective=<function objective at 0x7fef9296aaa0>,
     objective_loss_function=<function categorical_crossentropy at 0x7fef92a27050>,
     on_batch_finished=[],
     on_epoch_finished=[<nolearn.lasagne.handlers.PrintLog instance at 0x7fef92925440>],
     on_training_finished=[],
     on_training_started=[<nolearn.lasagne.handlers.PrintLayerInfo instance at 0x7fef92925488>],
     output_nonlinearity=<function softmax at 0x7fef92b91140>,
     output_num_units

The neural network above gives us an accuracy of about 0.95. Although slightly lower than that of our random forest classifier, this is still quite good.
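
The "795010 learnable parameters" reported in the training log above can be checked by hand: each dense layer has one weight per input-output pair plus one bias per output. The helper `dense_params` below is a made-up name for this illustration.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

n_inputs = 1 * 28 * 28                    # flattened 1x28x28 input
hidden = dense_params(n_inputs, 1000)     # input -> hidden layer
output = dense_params(1000, 10)           # hidden -> output layer

total = hidden + output
print(total)  # 795010
```

Almost all of the parameters sit in the first layer, because every one of the 784 pixels is connected to every one of the 1000 hidden units.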

Next, we'll implement a convolutional neural network.

A convolutional neural network (CNN) is a type of neural network that uses the convolution operator (usually 2D convolution for image processing tasks) to extract features from the data. In image processing, the filters that are convolved with the images are learned automatically to solve the task at hand, e.g. a classification task.
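
As a small illustration of the operation itself, the sketch below slides a 3 x 3 filter over a tiny 5 x 5 image with NumPy. The function name `conv2d_valid`, the image, and the filter are all invented for this example. In a CNN the filter values would be learned during training; here they are hand-picked so the effect is easy to see.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """2D convolution without padding: the output shrinks by (filter size - 1)."""
    flipped = kernel[::-1, ::-1]  # true convolution flips the filter in both axes
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the window element-wise by the filter and sum the result
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * flipped)
    return out

# Toy image: dark on the left (0), bright on the right (1)
image = np.zeros((5, 5))
image[:, 2:] = 1.0

# Hand-picked filter that responds to horizontal changes in brightness
kernel = np.array([[1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0]])

feature_map = conv2d_valid(image, kernel)
print(feature_map.shape)
print(feature_map)
```

The resulting feature map is nonzero only where the window straddles the dark-to-bright edge, which is the sense in which a filter "extracts a feature". Some libraries skip the flip and compute a cross-correlation instead; for learned filters the difference is only a matter of how the weights are stored.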

In our case, we will be using two convolutional layers (filtering), each followed by a max-pooling layer.
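
Before training, it can help to work out what this architecture does to the image size. The sketch below uses the filter and pool sizes from the network definition that follows, and assumes no padding, a convolution stride of 1, and non-overlapping pooling. The helper names are made up for this illustration. The results agree with the layer table and the parameter count printed in the training log.

```python
def conv_out(size, filter_size):
    """Output width of an unpadded, stride-1 convolution."""
    return size - filter_size + 1

def pool_out(size, pool_size):
    """Output width of non-overlapping pooling (partial windows dropped)."""
    return size // pool_size

def conv_params(in_channels, n_filters, filter_size):
    """One weight per filter element per input channel, plus one bias per filter."""
    return n_filters * in_channels * filter_size * filter_size + n_filters

def dense_params(n_in, n_out):
    return n_in * n_out + n_out

conv1 = conv_out(28, 3)      # 28 -> 26
pool1 = pool_out(conv1, 2)   # 26 -> 13
conv2 = conv_out(pool1, 2)   # 13 -> 12
pool2 = pool_out(conv2, 2)   # 12 -> 6
print(conv1, pool1, conv2, pool2)

total = (
    conv_params(1, 7, 3)                     # conv1: 7 filters of 3x3 on 1 channel
    + conv_params(7, 12, 2)                  # conv2: 12 filters of 2x2 on 7 channels
    + dense_params(12 * pool2 * pool2, 1000) # flattened 12x6x6 -> hidden3
    + dense_params(1000, 10)                 # hidden3 -> output
)
print(total)  # 443428
```

Even with two convolutional layers, most of the parameters still belong to the dense layer that follows them; the convolutional layers themselves contribute only a few hundred.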

In [10]:
def CNN(n_epochs):
    net1 = NeuralNet(
        layers=[
            ('input', layers.InputLayer),
            ('conv1', layers.Conv2DLayer),
            ('pool1', layers.MaxPool2DLayer),
            ('conv2', layers.Conv2DLayer),
            ('pool2', layers.MaxPool2DLayer),
            ('hidden3', layers.DenseLayer),
            ('output', layers.DenseLayer),
        ],
        input_shape=( None , 1, 28, 28),
        conv1_num_filters=7,
        conv1_filter_size=(3, 3),
        conv1_nonlinearity=lasagne.nonlinearities.rectify,
        pool1_pool_size=(2, 2),
        conv2_num_filters=12,
        conv2_filter_size=(2, 2),
        conv2_nonlinearity=lasagne.nonlinearities.rectify,
        pool2_pool_size=(2, 2),
        hidden3_num_units=1000,
        output_num_units=10,
        output_nonlinearity=lasagne.nonlinearities.softmax,
        update_learning_rate=0.0001,
        update_momentum=0.9,
        max_epochs=n_epochs,
        verbose=1,
        )
    return net1
cnn = CNN(5).fit(train,target)

# Neural Network with 443428 learnable parameters

## Layer information

  #  name     size
---  -------  --------
  0  input    1x28x28
  1  conv1    7x26x26
  2  pool1    7x13x13
  3  conv2    12x12x12
  4  pool2    12x6x6
  5  hidden3  1000
  6  output   10

  epoch    trn loss    val loss    trn/val    valid acc  dur
-------  ----------  ----------  ---------  -----------  ------
      1     1.51190     0.43328    3.48943      0.87008  22.04s
      2     0.33894     0.29825    1.13640      0.91101  26.34s
      3     0.24318     0.24164    1.00636      0.92838  28.04s
      4     0.19615     0.21022    0.93308      0.94027  25.62s
      5     0.16692     0.18978    0.87956      0.94432  25.70s


As can be seen, the validation accuracy increases with each epoch, with gains that taper off as it approaches its maximum. While the accuracy after 5 epochs is arguably not as high as that of our random forest classifier, we should note that the accuracy of the CNN improves as the amount of training data increases. The main problem that we would run into later on is overfitting.
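
One simple way to watch for overfitting is to compare training and validation loss per epoch, which is what the `trn/val` column in the log does. The sketch below recomputes that ratio from the CNN log values above and counts how many epochs have passed since the validation loss last improved. The helper `epochs_since_best` is a made-up name, and using this count as a stopping signal is a common heuristic assumed here, not something the notebook itself does.

```python
# Loss values copied from the 5-epoch CNN training log above
trn_loss = [1.51190, 0.33894, 0.24318, 0.19615, 0.16692]
val_loss = [0.43328, 0.29825, 0.24164, 0.21022, 0.18978]

# A ratio that keeps falling below 1 means the network fits the training
# data better than the held-out data, an early sign of overfitting
ratios = [t / v for t, v in zip(trn_loss, val_loss)]

def epochs_since_best(losses):
    """Number of epochs since the lowest loss in the list was recorded."""
    best_epoch = min(range(len(losses)), key=losses.__getitem__)
    return len(losses) - 1 - best_epoch

for epoch, ratio in enumerate(ratios, start=1):
    print(epoch, round(ratio, 3))
print("epochs without improvement:", epochs_since_best(val_loss))
```

In this run the validation loss is still falling at epoch 5, so training longer looks reasonable; once the count starts growing while the training loss keeps dropping, further epochs mostly memorize the training set.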