# Neural Network Testing: Digit Classification

### Objective:

Test the neural network model in neuralnetwork.py on the MNIST handwritten digits dataset.

In [1]:
import numpy
from neuralnetwork import NeuralNetwork

In [2]:
with open("mnist_train.csv", "r") as f:
    train_data = f.readlines()
with open("mnist_test.csv", "r") as f:
    test_data = f.readlines()

### Dataset Description

The training set has 60,000 datapoints, while the testing set has 10,000. Each datapoint consists of a label (0 to 9) followed by $28^2 = 784$ integers from 0 to 255, one per pixel of a $28 \times 28$ image, encoding how light or dark that pixel is.
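
To make the row layout concrete, here is a minimal sketch that parses one row into a label and a list of pixel values. The row is synthetic (a hypothetical `fake_row` built in code), not taken from the dataset.

```python
# Build a synthetic row in the same layout: label first, then 784 pixel values.
fake_row = ",".join(["7"] + ["0"] * 783 + ["255"]) + "\n"

fields = fake_row.split(",")
label = int(fields[0])                  # the digit, 0 to 9
pixels = [int(v) for v in fields[1:]]   # int() tolerates the trailing newline

print(label, len(pixels), min(pixels), max(pixels))
```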

### Rescaling the Data

Our Neural Network will have 784 input nodes (one per pixel value) and 10 output nodes (one per digit 0-9).

The chosen activation function (see neuralnetwork.py) only outputs values strictly between 0 and 1; neither endpoint can be attained. The targets must therefore be rescaled away from 0 and 1; the range 0.01 to 0.99 is chosen. This is discussed in more depth in theory.md.

Similarly, the inputs must be rescaled from 0-255 to 0.01-1.00. We must avoid 0, because a zero input kills the update for every weight attached to it, so the network can't extract information from that pixel. Large values must also be scaled down, else the gradient of the activation function will be too small for effective learning (the weight updates from gradient descent will have little effect).
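
The point about small gradients can be illustrated with a short sketch. It assumes the activation is the logistic sigmoid; that is a guess consistent with the description above, since neuralnetwork.py is not shown here. The names `sigmoid` and `sigmoid_gradient` are hypothetical.

```python
import math

def sigmoid(x):
    # Logistic function: an ASSUMED stand-in for the activation in neuralnetwork.py.
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_gradient(x):
    # Derivative of the logistic function: s(x) * (1 - s(x)).
    s = sigmoid(x)
    return s * (1.0 - s)

# The gradient peaks at x = 0 and shrinks quickly as x grows.
for x in [0.0, 1.0, 5.0, 10.0]:
    print(x, sigmoid_gradient(x))
```

Under this assumption the gradient is at most 0.25 and falls off rapidly for large inputs, which is why raw pixel values as large as 255 would be a poor choice of input.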

In [3]:
def transform(datapoint):
    # One CSV row: the label, then 784 pixel values from 0 to 255.
    attributes = datapoint.split(",")
    # Rescale each pixel from 0-255 to 0.01-1.00.
    inputs = [0.99*(int(a)/255) + 0.01 for a in attributes[1:785]]
    return int(attributes[0]), inputs
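
As an aside, the same rescaling can be written with a numpy array. This is only a sketch: `transform_vectorized` is a hypothetical name, and whether `train` and `query` accept an array in place of a list depends on neuralnetwork.py, which is not shown here.

```python
import numpy

def transform_vectorized(datapoint):
    # Hypothetical alternative to transform(): same label, same scaling.
    attributes = datapoint.strip().split(",")
    pixels = numpy.array([float(v) for v in attributes[1:785]])
    inputs = 0.99 * (pixels / 255) + 0.01   # maps 0 -> 0.01 and 255 -> 1.00
    return int(attributes[0]), inputs

# Synthetic row for demonstration: label 3, all pixels 0 except the last.
demo_row = ",".join(["3"] + ["0"] * 783 + ["255"]) + "\n"
demo_label, demo_inputs = transform_vectorized(demo_row)
print(demo_label, demo_inputs.shape, demo_inputs.min(), demo_inputs.max())
```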

### Training and Testing the Model

In [4]:
neuralnet = NeuralNetwork([784, 100, 10], 0.2)

In [5]:
for datapoint in train_data:
    target, inputs = transform(datapoint)
    targets = [0.01]*10
    targets[target] = 0.99
    neuralnet.train(inputs, targets)

In [6]:
train_score = 0
for datapoint in train_data:
    target, inputs = transform(datapoint)
    outputs = neuralnet.query(inputs)
    label = numpy.argmax(outputs)
    if label == target:
        train_score += 1
print("The train score is:", train_score)
print("The train accuracy is: ", train_score/60000)

The train score is: 57346
The train accuracy is:  0.9557666666666667


In [7]:
test_score = 0
for datapoint in test_data:
    target, inputs = transform(datapoint)
    outputs = neuralnet.query(inputs)
    label = numpy.argmax(outputs)
    if label == target:
        test_score += 1
print("The test score is:", test_score)
print("The test accuracy is: ", test_score/10000)

The test score is: 9518
The test accuracy is:  0.9518


### Comments:

(1) The train accuracy (95.6%) and test accuracy (95.2%) are remarkably similar. This suggests that the model is not overfitting.

(2) The code above took my laptop about a minute to run.
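
The scoring loop is the same for the train and test sets, so it could be wrapped in a helper. The sketch below uses a hypothetical `accuracy` function and a stand-in model (`AlwaysZero`) so that it runs on its own; the only thing it assumes about a model is a `query` method whose output `numpy.argmax` can handle, as in the cells above.

```python
import numpy

def accuracy(model, labelled_inputs):
    # labelled_inputs: iterable of (target, inputs) pairs, e.g. from transform().
    correct = 0
    total = 0
    for target, inputs in labelled_inputs:
        if numpy.argmax(model.query(inputs)) == target:
            correct += 1
        total += 1
    return correct / total

class AlwaysZero:
    # Stand-in model for demonstration only; always votes for digit 0.
    def query(self, inputs):
        return [0.99] + [0.01] * 9

demo_accuracy = accuracy(AlwaysZero(), [(0, []), (1, []), (0, []), (5, [])])
print(demo_accuracy)
```

With the notebook's objects this would be called as something like `accuracy(neuralnet, (transform(d) for d in test_data))`.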

### Next Stages

Now I will test the model with a different network shape: two hidden layers of 80 and 40 nodes in place of a single hidden layer of 100.

In [8]:
n = NeuralNetwork([784, 80, 40, 10], 0.2)

In [9]:
for datapoint in train_data:
    target, inputs = transform(datapoint)
    targets = [0.01]*10
    targets[target] = 0.99
    n.train(inputs, targets)

In [10]:
train_score = 0
for datapoint in train_data:
    target, inputs = transform(datapoint)
    outputs = n.query(inputs)
    label = numpy.argmax(outputs)
    if label == target:
        train_score += 1
print("The train score was:", train_score)
print("The train accuracy was:", train_score/60000)

The train score was: 49326
The train accuracy was: 0.8221


In [11]:
test_score = 0
for datapoint in test_data:
    target, inputs = transform(datapoint)
    outputs = n.query(inputs)
    label = numpy.argmax(outputs)
    if label == target:
        test_score += 1
print("The test score was:", test_score)
print("The test accuracy was:", test_score/10000)

The test score was: 8275
The test accuracy was: 0.8275


### Comments:

(1) We again had remarkably low levels of overfitting: the test accuracy (82.75%) is in fact slightly above the train accuracy (82.21%). This suggests a learning rate of 0.2 is reasonable and the model is able to learn well.

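(2) This time the accuracy was much worse: about 83% on the test set, against 95% for the single hidden layer of 100 nodes. This suggests that the network shape is very important. The model "learnt well" (train and test accuracy agree) but it seems this shape isn't right.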

(3) A proper analysis here would require several runs, as the randomly chosen initial weights affect where the gradient descent algorithm ends up.
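
A repeated-runs analysis could be organised roughly as follows. This is a sketch only: `run_once` is a hypothetical callable that would train a fresh network and return its test accuracy, and `fake_run` is a placeholder that returns made-up numbers (not real results) so the example runs on its own. How a seed would be passed to NeuralNetwork is not something the code above shows.

```python
import random
import statistics

def summarize_runs(run_once, n_runs):
    # run_once(seed): hypothetical callable that trains a fresh network
    # and returns its test accuracy.
    accuracies = [run_once(seed) for seed in range(n_runs)]
    return statistics.mean(accuracies), statistics.stdev(accuracies)

def fake_run(seed):
    # Placeholder returning made-up accuracies; NOT real results.
    return 0.9 + random.Random(seed).uniform(-0.05, 0.05)

mean_acc, spread = summarize_runs(fake_run, 5)
print(mean_acc, spread)
```

Comparing the gap between two network shapes against the spread across runs would show whether the difference is larger than the run-to-run variation.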

### Final Test

As this project is more a 'proof of concept/learning tool' than actually trying to build the best digit classification model, I won't run a bunch of tests. Instead, I'll try one last network shape.

In [12]:
final_net = NeuralNetwork([784, 200, 10], 0.2)

In [13]:
for datapoint in train_data:
    target, inputs = transform(datapoint)
    targets = [0.01]*10
    targets[target] = 0.99
    final_net.train(inputs, targets)

In [14]:
train_score = 0
for datapoint in train_data:
    target, inputs = transform(datapoint)
    outputs = final_net.query(inputs)
    label = numpy.argmax(outputs)
    if label == target:
        train_score += 1
print("The train score was:", train_score)
print("The train accuracy was:", train_score/60000)

The train score was: 57705
The train accuracy was: 0.96175


In [15]:
test_score = 0
for datapoint in test_data:
    target, inputs = transform(datapoint)
    outputs = final_net.query(inputs)
    label = numpy.argmax(outputs)
    if label == target:
        test_score += 1
print("The test score was:", test_score)
print("The test accuracy was:", test_score/10000)

The test score was: 9578
The test accuracy was: 0.9578


### Comments:

(1) At first, I tried using 1000 nodes in the middle layer but this took way too long so I went down to 200 instead. This was much more reasonable time-wise.

(2) Our test accuracy improved slightly, from 95.18% (in the 100 hidden nodes case) to 95.78%. Again, more testing would be required to see whether this is a genuine result and not just noise, but it suggests that more hidden nodes might give a better network shape.
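
For a rough sense of scale, the sketch below computes the binomial standard error of each test accuracy, treating the 10,000 test cases as independent. The function name `standard_error` is hypothetical, and this only captures the noise from having a finite test set.

```python
import math

def standard_error(accuracy, n_samples):
    # Binomial standard error of an accuracy estimated from n_samples test cases.
    return math.sqrt(accuracy * (1 - accuracy) / n_samples)

se_100 = standard_error(0.9518, 10000)   # 100 hidden nodes
se_200 = standard_error(0.9578, 10000)   # 200 hidden nodes
gap = 0.9578 - 0.9518

print(se_100, se_200, gap)
```

Both standard errors come out at roughly 0.002, so the gap of 0.006 is a few standard errors wide. This says nothing about the variation caused by the random initial weights, which would still need repeated runs to measure.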

## Conclusion:

The Neural Network seems to be well suited to digit classification problems. In my GitHub repository 'machine-learning-course' the notebook "8_ensemble_methods" uses decision tree classification models on this problem. Unfortunately, that project involved a different dataset, so it doesn't seem fair to compare results.

It should be noted that Neural Nets are likely a good choice here because there isn't always a 'correct answer'. For example, different people may disagree on whether a given digit is a '4' or a '9'. The universe doesn't define a correct answer, and the network's output can capture this uncertainty: in the above project I just took the index of the largest output to be the answer, but some cases will be more clear-cut than others. A single decision tree, on the other hand, arrives at one 'correct' answer without this uncertainty (though with ensemble methods, where multiple trees 'vote' on the answer, perhaps the spread of votes could be viewed as capturing the same concept).
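
One simple way to look at how clear-cut a prediction is would be to compare the largest output with the runner-up. The sketch below uses a hypothetical helper `top_two` and a made-up output vector; it assumes only that the outputs can be flattened into ten numbers.

```python
import numpy

def top_two(outputs):
    # Return (best digit, runner-up digit, margin between their output values).
    flat = numpy.asarray(outputs, dtype=float).ravel()
    order = numpy.argsort(flat)[::-1]   # indices sorted by descending output
    best, second = int(order[0]), int(order[1])
    return best, second, float(flat[best] - flat[second])

# Made-up output vector for illustration: digit 4 narrowly ahead of digit 9.
example = [0.02, 0.01, 0.03, 0.01, 0.55, 0.02, 0.01, 0.04, 0.03, 0.48]
best, second, margin = top_two(example)
print(best, second, margin)
```

A small margin marks a case where the network is torn between two digits, while a large margin marks a confident answer.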

### One last thing...

I just want to see whether the above hypothesis is correct (that is, whether the model has a harder time differentiating between 4s and 9s).

In [16]:
# neuralnet model is already trained

fail_freq = {0:0, 1:0, 2:0, 3:0, 4:0, 5:0, 6:0, 7:0, 8:0, 9:0}

train_score = 0
for datapoint in train_data:
    target, inputs = transform(datapoint)
    outputs = neuralnet.query(inputs)
    label = numpy.argmax(outputs)
    if label == target:
        train_score += 1
    else:
        fail_freq[target] += 1
print("The train score is:", train_score)
print("The train accuracy is: ", train_score/60000)
print(fail_freq)

The train score is: 57346
The train accuracy is:  0.9557666666666667
{0: 106, 1: 132, 2: 373, 3: 359, 4: 311, 5: 321, 6: 117, 7: 306, 8: 439, 9: 190}


### Answer

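Not really. All digits have a fair few fails, and 4s (311) and 9s (190) don't stand out. It seems 8s are the hardest to get right, with 439 fails (maybe because they can be confused with 0s, 3s, 6s and 9s?). Note that these are raw counts per true digit rather than rates, and they don't record what each failed digit was mistaken for, so they can't show directly whether 4s and 9s get confused with each other.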
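
One way to check the 4-versus-9 hypothesis directly would be a confusion matrix, which counts the predicted digit for each true digit. The sketch below uses a hypothetical `confusion_matrix` helper and made-up (target, predicted) pairs so that it runs on its own.

```python
import numpy

def confusion_matrix(pairs, n_classes=10):
    # pairs: iterable of (target, predicted) digit pairs.
    # Entry [t, p] counts how often true digit t was classified as p.
    matrix = numpy.zeros((n_classes, n_classes), dtype=int)
    for target, predicted in pairs:
        matrix[target, predicted] += 1
    return matrix

# Made-up pairs for illustration only.
pairs = [(4, 4), (4, 9), (9, 4), (9, 9), (8, 3), (4, 9)]
cm = confusion_matrix(pairs)
print(cm[4, 9], cm[9, 4])
```

With the notebook's objects, each pair could be built as `(target, numpy.argmax(neuralnet.query(inputs)))`; the entries `cm[4, 9]` and `cm[9, 4]` would then show how often 4s and 9s are mistaken for each other.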