# MNIST Example
http://neuralnetworksanddeeplearning.com/chap1.html

Start by loading in the MNIST data

In [1]:
import mnist_loader
training_data, validation_data, test_data = mnist_loader.load_data_wrapper()

In [24]:
print "Number of training data tuples: ", len(training_data)
print "Data points per input value: ", len(training_data[0][0])
print "Output array size: ", len(training_data[0][1])
print "Number of validation data tuples: ",len(validation_data)
print "Number of test data tuples: ",len(test_data)

Number of training data tuples:  50000
Data points per input value:  784
Output array size:  10
Number of validation data tuples:  10000
Number of test data tuples:  10000
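
The output array size of 10 reflects how each training label is stored: as a 10-element vector with 1.0 at the index of the digit and 0.0 everywhere else. A minimal sketch of that encoding follows, using plain Python lists rather than the arrays the loader returns. `one_hot` and `decode` are hypothetical helper names chosen for illustration, not functions taken from `mnist_loader`.

```python
def one_hot(digit, num_classes=10):
    """Return a list with 1.0 at position `digit` and 0.0 elsewhere."""
    if not 0 <= digit < num_classes:
        raise ValueError("digit must be in [0, %d)" % num_classes)
    vec = [0.0] * num_classes
    vec[digit] = 1.0
    return vec


def decode(vec):
    """Recover the digit as the index of the largest entry."""
    return max(range(len(vec)), key=lambda i: vec[i])


label = one_hot(3)
print(len(label))     # 10
print(decode(label))  # 3
```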


Set up a Network with 30 hidden neurons

In [32]:
import network
net = network.Network([784, 30, 10])

In [54]:
print "Number of neurons in the respective layers: ", net.sizes, "\n"
print "Number of biases between 1st and 2nd layers: ", len(net.biases[0])
print "Some example biases between 1st and 2nd layers: \n", net.biases[0][:3]
print "Number of biases between 2nd and 3rd layers: ", len(net.biases[1])
print "Some example biases between 2nd and 3rd layers: \n", net.biases[1][:3], "\n"
print "Number of weights between 1st and 2nd layers: ", len(net.weights[0])
print "Some example weights between 1st and 2nd layers: \n", net.weights[0][:3,:3]
print "Number of weights between 2nd and 3rd layers: ", len(net.weights[1])
print "Some example weights between 2nd and 3rd layers: \n", net.weights[1][:3,:3]

Number of neurons in the respective layers:  [784, 30, 10] 

Number of biases between 1st and 2nd layers:  30
Some example biases between 1st and 2nd layers: 
[[-0.65616326]
 [-0.66219892]
 [ 0.52779785]]
Number of biases between 2nd and 3rd layers:  10
Some example biases between 2nd and 3rd layers: 
[[ 0.11582079]
 [ 0.17365875]
 [-0.02760088]] 

Number of weights between 1st and 2nd layers:  30
Some example weights between 1st and 2nd layers: 
[[ 0.0431288  -1.39348301 -0.38850388]
 [-0.98134354 -0.57518265 -0.21679014]
 [ 0.26499918  0.8704585  -1.14835446]]
Number of weights between 2nd and 3rd layers:  10
Some example weights between 2nd and 3rd layers: 
[[-1.01367654  0.18505978 -0.60359269]
 [-1.02634188  0.63877486 -0.90054889]
 [ 1.16044398  0.22333662 -0.95617607]]
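
The array sizes follow directly from the layer sizes. Judging from the printed values, each bias array is a column vector with one entry per neuron in the receiving layer, and each weight array has one row per receiving neuron and one column per sending neuron. This means `len()` on a weight array reports its number of rows, not the total number of weights. The sketch below derives the shapes from `sizes` alone; `layer_shapes` and `count_parameters` are hypothetical helpers, and the shape convention is an assumption inferred from the output in this notebook.

```python
def layer_shapes(sizes):
    """Return (bias_shapes, weight_shapes) implied by a list of layer sizes."""
    # One bias per neuron in every layer except the input layer.
    bias_shapes = [(n, 1) for n in sizes[1:]]
    # One row per receiving neuron, one column per sending neuron.
    weight_shapes = [(n_out, n_in) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
    return bias_shapes, weight_shapes


def count_parameters(sizes):
    """Total number of individual weights and biases."""
    bias_shapes, weight_shapes = layer_shapes(sizes)
    return sum(rows * cols for rows, cols in bias_shapes + weight_shapes)


bias_shapes, weight_shapes = layer_shapes([784, 30, 10])
print(bias_shapes)                       # [(30, 1), (10, 1)]
print(weight_shapes)                     # [(30, 784), (10, 30)]
print(count_parameters([784, 30, 10]))   # 23860
```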


Use stochastic gradient descent to learn from the MNIST training_data over 10 epochs, with a mini-batch size of 10, and a learning rate of η=3.0

In [56]:
net.SGD(training_data, 10, 10, 3.0, test_data=test_data)

Epoch 0: 9343 / 10000
Epoch 1: 9383 / 10000
Epoch 2: 9393 / 10000
Epoch 3: 9404 / 10000
Epoch 4: 9419 / 10000
Epoch 5: 9435 / 10000
Epoch 6: 9425 / 10000
Epoch 7: 9458 / 10000
Epoch 8: 9432 / 10000
Epoch 9: 9433 / 10000
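
The call above trains with mini-batch stochastic gradient descent: each epoch shuffles the training data, splits it into mini-batches, and moves every parameter against the average gradient of one mini-batch at a time. The sketch below shows only the batching and the update step, on plain Python lists. `make_mini_batches` and `sgd_step` are hypothetical names, and the gradients are passed in by hand rather than computed by backpropagation.

```python
import random


def make_mini_batches(data, mini_batch_size, rng=random):
    """Shuffle a copy of `data` and cut it into consecutive mini-batches."""
    shuffled = list(data)
    rng.shuffle(shuffled)
    return [shuffled[k:k + mini_batch_size]
            for k in range(0, len(shuffled), mini_batch_size)]


def sgd_step(params, per_example_grads, eta):
    """Apply one update: p <- p - (eta / m) * sum of the m example gradients."""
    m = len(per_example_grads)
    summed = [sum(g[i] for g in per_example_grads) for i in range(len(params))]
    return [p - (eta / m) * s for p, s in zip(params, summed)]


# 50000 training examples in mini-batches of 10 give 5000 updates per epoch.
batches = make_mini_batches(range(50000), 10)
print(len(batches))   # 5000

# Two parameters, a mini-batch of two hand-written example gradients.
new_params = sgd_step([1.0, -2.0], [[0.1, 0.0], [0.3, 0.2]], eta=3.0)
print(new_params)
```

With a mini-batch size of 10 and a learning rate of 3.0, each example's gradient is effectively scaled by 0.3 before it is subtracted from the parameters.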


Once a network has been trained, it can be run very quickly, on almost any computing platform.
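
Running a trained network is cheap because classifying one image needs only a forward pass: for each layer, multiply by the weights, add the biases, and apply the activation function. The pure-Python sketch below assumes sigmoid activations, as in the linked chapter, and a weight layout with one row per receiving neuron. `feedforward` and `predict` here are illustrative stand-ins, not the `network` module's own code, and the tiny 2-3-2 network is hand-made just to exercise them.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def feedforward(weights, biases, activation):
    """Propagate an input vector through every layer.

    weights[l] is a list of rows, one row per neuron in layer l+1;
    biases[l] is a list with one bias per neuron in layer l+1.
    """
    for w, b in zip(weights, biases):
        activation = [
            sigmoid(sum(w_jk * a_k for w_jk, a_k in zip(row, activation)) + b_j)
            for row, b_j in zip(w, b)
        ]
    return activation


def predict(weights, biases, x):
    """Return the index of the most strongly activated output neuron."""
    output = feedforward(weights, biases, x)
    return max(range(len(output)), key=lambda i: output[i])


weights = [[[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]],   # 2 inputs -> 3 hidden
           [[2.0, 0.0, -2.0], [-2.0, 0.0, 2.0]]]     # 3 hidden -> 2 outputs
biases = [[0.0, 0.0, 0.0], [0.0, 0.0]]

print(predict(weights, biases, [1.0, 0.0]))   # 0
print(predict(weights, biases, [0.0, 1.0]))   # 1
```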

Rerun the above experiment, changing the number of hidden neurons to 100

In [59]:
net = network.Network([784, 100, 10])
net.SGD(training_data, 10, 10, 3.0, test_data=test_data)

Epoch 0: 6178 / 10000
Epoch 1: 7596 / 10000
Epoch 2: 7687 / 10000
Epoch 3: 7696 / 10000
Epoch 4: 7709 / 10000
Epoch 5: 7745 / 10000
Epoch 6: 7722 / 10000
Epoch 7: 7791 / 10000
Epoch 8: 7825 / 10000
Epoch 9: 8600 / 10000


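Increasing the number of hidden neurons would normally be expected to improve results, but in the run above the 100-neuron network reached only 8600 / 10000 after 10 epochs, well below the 30-neuron network's 9433 / 10000. Results for this experiment vary considerably from run to run, and some training runs come out quite a bit worse than others. Using the techniques introduced in chapter 3 will greatly reduce the variation in performance across different training runs for our networks.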
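
One way to make such comparisons concrete is to pull the per-epoch counts out of the training log and look at both the final and the best epoch. The sketch below parses lines in the `Epoch k: correct / total` format printed above. `parse_epoch_log` and `summarize` are hypothetical helpers, and the sample lines are the last three epochs copied from each of the two runs in this notebook.

```python
import re

EPOCH_LINE = re.compile(r"Epoch (\d+): (\d+) / (\d+)")


def parse_epoch_log(text):
    """Return a list of (epoch, accuracy) pairs found in a training log."""
    return [(int(e), int(c) / float(t)) for e, c, t in EPOCH_LINE.findall(text)]


def summarize(text):
    """Return (final_accuracy, best_epoch, best_accuracy) for one run."""
    results = parse_epoch_log(text)
    best_epoch, best_acc = max(results, key=lambda pair: pair[1])
    return results[-1][1], best_epoch, best_acc


run_30 = "Epoch 7: 9458 / 10000\nEpoch 8: 9432 / 10000\nEpoch 9: 9433 / 10000"
run_100 = "Epoch 7: 7791 / 10000\nEpoch 8: 7825 / 10000\nEpoch 9: 8600 / 10000"

for name, log in [("30 hidden neurons", run_30), ("100 hidden neurons", run_100)]:
    final, best_epoch, best = summarize(log)
    print("%s: final %.2f%%, best %.2f%% at epoch %d"
          % (name, 100 * final, 100 * best, best_epoch))
```

A single run per configuration says little about the spread; the same helper could be applied to logs from repeated runs to see how much the final accuracy moves.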

The number of training epochs, the mini-batch size, and the learning rate η are known as hyper-parameters of our neural network, to distinguish them from the parameters (weights and biases) learned by our learning algorithm. If we choose our hyper-parameters poorly, we can get bad results.

In general, debugging a neural network can be challenging. This is especially true when the initial choice of hyper-parameters produces results no better than random noise. We might worry not only about the learning rate, but about every other aspect of our neural network. Have we initialized the weights and biases in a way that makes it hard for the network to learn? Or maybe we don't have enough training data to get meaningful learning? Perhaps we haven't run for enough epochs? Or maybe it's impossible for a neural network with this architecture to learn to recognize handwritten digits? Maybe the learning rate is too low? Or maybe the learning rate is too high? When you're coming to a problem for the first time, you're not always sure.
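
The effect of a learning rate that is too low or too high can be seen on a much simpler problem than MNIST. The sketch below runs plain gradient descent on the toy function f(x) = x**2, chosen purely for illustration. The three learning rates are arbitrary values picked to show slow progress, fast convergence, and divergence; they say nothing about good values for the network above.

```python
def gradient_descent(eta, steps=20, x0=1.0):
    """Minimize f(x) = x**2 from x0 using a fixed learning rate eta."""
    x = x0
    for _ in range(steps):
        grad = 2.0 * x        # derivative of x**2
        x = x - eta * grad    # each step multiplies x by (1 - 2 * eta)
    return x


for eta in (0.01, 0.4, 1.1):
    print("eta=%.2f -> x after 20 steps: %.6g" % (eta, gradient_descent(eta)))
```

With eta = 0.01 each step shrinks x by only 2%, so x is still far from the minimum at 0 after 20 steps. With eta = 0.4 it collapses toward 0, and with eta = 1.1 every step overshoots and the magnitude of x grows.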

The lesson to take away from this is that debugging a neural network is not trivial, and, just as for ordinary programming, there is an art to it. You need to learn that art of debugging in order to get good results from neural networks. More generally, we need to develop heuristics for choosing good hyper-parameters and a good architecture. 
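
A simple starting heuristic is to try a small grid of hyper-parameter values and keep the combination that scores best on held-out data. The sketch below is generic: `grid_search` and `toy_score` are hypothetical, and `toy_score` is a made-up stand-in for training a network and measuring its validation accuracy. Its peak is placed arbitrarily and does not reflect any measured result.

```python
import itertools


def grid_search(score_fn, grid):
    """Evaluate score_fn on every combination in `grid`.

    `grid` maps each hyper-parameter name to a list of candidate values.
    Returns (best_params, best_score).
    """
    names = sorted(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(eta, hidden):
    # Made-up stand-in for "train, then measure validation accuracy".
    return -(eta - 3.0) ** 2 - 0.001 * (hidden - 30) ** 2


grid = {"eta": [0.3, 1.0, 3.0, 10.0], "hidden": [10, 30, 100]}
best_params, best_score = grid_search(toy_score, grid)
print(best_params)
```

In practice the scoring function would train on `training_data` and evaluate on `validation_data`, so that `test_data` stays untouched until the final measurement.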