## Testing performance of different neural network architectures on a subset of the MNIST data
- Performance of neural networks with 4 and 5 layers is compared against a baseline architecture of 3 layers with 100 nodes in its single hidden layer
    - Networks with up to 40 fewer and 60 more nodes in total vs. baseline tested
- Same cost function, activation functions, batch size, learning rate and number of epochs used so as to isolate performance difference from architecture only
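- Overall small difference in performance across the networks: final test set accuracy ranges from 90.17% to 91.28%, about 1 percentage point, except for the smallest 5 layer network (40 fewer nodes) at 88.61%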
- The four layer network with 20 fewer nodes than the baseline marginally exceeds baseline performance (90.85% vs. 90.71%). This is consistent with theoretical results showing that a deeper network can approximate the same function using fewer nodes
- The best performance, a test set accuracy of 91.28%, came from the 5 layer network with 60 more nodes than the baseline
- Given that all of these networks achieve almost 100% accuracy on the training data, I suspect that the major performance limitation, particularly for the deeper networks, is the size of the training set (5000 examples).
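
The figures quoted above can be cross-checked against the final line of each unregularized (lmda = 0) run in the outputs below. This is a minimal sketch with the accuracies copied by hand from those outputs; the `runs` table and the variable names are illustrative and not part of the `NeuralNetwork` module.

```python
# Final accuracies (%) of the lmda = 0 run of each architecture,
# copied by hand from the notebook outputs: (layers, train, validation).
runs = [
    ((784, 100, 10), 98.52, 90.71),  # baseline
    ((784, 50, 30, 10), 99.46, 90.85),
    ((784, 50, 50, 10), 99.64, 90.39),
    ((784, 70, 50, 10), 99.62, 90.53),
    ((784, 100, 60, 10), 99.70, 91.07),
    ((784, 20, 20, 20, 10), 96.94, 88.61),
    ((784, 30, 30, 20, 10), 97.90, 90.17),
    ((784, 40, 40, 20, 10), 98.68, 90.66),
    ((784, 80, 60, 40, 10), 99.60, 91.28),
]

for layers, train_acc, valid_acc in runs:
    hidden_nodes = sum(layers[1:-1])  # exclude input and output layers
    gap = train_acc - valid_acc  # train/validation gap in percentage points
    print(f"{str(layers):<24} hidden={hidden_nodes:<4} "
          f"valid={valid_acc:.2f}%  gap={gap:.2f}")

valid_accs = [valid for _, _, valid in runs]
best_layers, _, best_valid = max(runs, key=lambda r: r[2])
spread = max(valid_accs) - min(valid_accs)
print(f"best: {best_layers} at {best_valid:.2f}%, spread: {spread:.2f} points")
```

The gap column makes the overfitting argument concrete: every network scores several points higher on the 5000 training examples than on the held-out data.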

In [3]:
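import pickle
import NeuralNetwork as nn

def load_pickle(path):
    # Use a context manager so the file handle is closed after loading
    with open(path, 'rb') as f:
        return pickle.load(f)

train_x = load_pickle("MNIST_train_x.pkl")
train_y = load_pickle("MNIST_train_y.pkl")
test_x = load_pickle("MNIST_test_x.pkl")
test_y = load_pickle("MNIST_test_y.pkl")

# Train on the first 5000 examples only; the full test set is used for evaluation
short_train_x = train_x[:5000, :]
short_train_y = train_y[:5000, :]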
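
Because the subset is simply the first 5000 rows, it may be worth checking that no digit is badly under-represented. This is a minimal sketch, assuming the labels are one-hot rows (suggested by the 10-unit softmax output layer used in this notebook, but not confirmed here). `class_counts` is a hypothetical helper, and the tiny `demo_y` stands in for the real `short_train_y`.

```python
from collections import Counter

def class_counts(one_hot_rows):
    """Count examples per class, taking the arg-max of each one-hot row."""
    counts = Counter()
    for row in one_hot_rows:
        label = max(range(len(row)), key=lambda i: row[i])
        counts[label] += 1
    return counts

# Toy stand-in for short_train_y: three classes, four examples.
demo_y = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
]
counts = class_counts(demo_y)
for label in sorted(counts):
    print(f"class {label}: {counts[label]} examples")
```

With the real data this would be called as `class_counts(short_train_y)`; the function only iterates over rows and indexes into them, so it should also accept a 2D array.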

## Baseline architecture

In [7]:
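# One hidden layer of 100 ReLU nodes, softmax output, log-likelihood cost
net2 = nn.NeuralNet((784,100,10), nn.LogLikelihoodCost, nn.ReluActivation, nn.SoftmaxActivation)
net2.initialize_variables()
learning_rate = 0.001
epochs = 101
reporting_rate = 20  # report cost and accuracy every 20 epochs
lmda = 0  # no regularization
batch_size = 200
# The test set is passed in the validation slot, so "validation" accuracy below is test set accuracy
training_cost, valid_cost = net2.SGD(short_train_x, short_train_y, test_x, test_y, learning_rate,
        epochs, reporting_rate, lmda, batch_size, verbose=False)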

Training cost at start of training is 6.34394 and accuracy is 6.34%
Validation set accuracy is 6.35%
Training cost in epoch 0 is 0.81058 and accuracy is 80.08%
Validation set accuracy is 73.58%
Training cost in epoch 20 is 0.04616 and accuracy is 99.06%
Validation set accuracy is 89.13%
Training cost in epoch 40 is 0.02886 and accuracy is 99.60%
Validation set accuracy is 90.24%
Training cost in epoch 60 is 0.02491 and accuracy is 99.50%
Validation set accuracy is 90.54%
Training cost in epoch 80 is 0.03295 and accuracy is 99.22%
Validation set accuracy is 90.81%
Training cost in epoch 100 is 0.07050 and accuracy is 98.52%
Validation set accuracy is 90.71%
Final test cost is 0.07050
Accuracy on training data is 98.52%, and accuracy on validation data is 90.71%
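
In the log above, validation accuracy peaks at epoch 80 rather than at the final epoch. This is a minimal sketch for pulling the reported figures out of a printed log and finding the best reported epoch, assuming the log format shown above. `parse_log` and `best_epoch` are hypothetical helpers, not part of the `NeuralNetwork` module.

```python
import re

TRAIN_RE = re.compile(
    r"Training cost in epoch (\d+) is ([\d.]+) and accuracy is ([\d.]+)%")
VALID_RE = re.compile(r"Validation set accuracy is ([\d.]+)%")

def parse_log(text):
    """Return a list of (epoch, train_cost, train_acc, valid_acc) tuples."""
    records = []
    pending = None  # training line still waiting for its validation line
    for line in text.splitlines():
        train = TRAIN_RE.search(line)
        if train:
            pending = (int(train.group(1)), float(train.group(2)),
                       float(train.group(3)))
            continue
        valid = VALID_RE.search(line)
        if valid and pending is not None:
            records.append(pending + (float(valid.group(1)),))
            pending = None
    return records

def best_epoch(records):
    """Pick the record with the highest validation accuracy."""
    return max(records, key=lambda r: r[3])

# Last three reports of the baseline run, copied from the output above.
baseline_log = """\
Training cost in epoch 60 is 0.02491 and accuracy is 99.50%
Validation set accuracy is 90.54%
Training cost in epoch 80 is 0.03295 and accuracy is 99.22%
Validation set accuracy is 90.81%
Training cost in epoch 100 is 0.07050 and accuracy is 98.52%
Validation set accuracy is 90.71%
"""
records = parse_log(baseline_log)
print(best_epoch(records))
```

Lines that do not match the full pattern, such as the "at start of training" report or a truncated line, are skipped rather than raising an error.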


## Four layer NN, 20 fewer nodes in total vs. baseline

In [8]:
net2 = nn.NeuralNet((784,50,30,10), nn.LogLikelihoodCost, nn.ReluActivation, nn.SoftmaxActivation)
net2.initialize_variables()
learning_rate = 0.001
epochs = 101
reporting_rate = 20
lmda = 0
batch_size = 200
training_cost, valid_cost = net2.SGD(short_train_x, short_train_y, test_x, test_y, learning_rate, \
        epochs, reporting_rate, lmda, batch_size, verbose=False)
print()

net2 = nn.NeuralNet((784,50,30,10), nn.LogLikelihoodCost, nn.ReluActivation, nn.SoftmaxActivation)
net2.initialize_variables()
learning_rate = 0.001
epochs = 101
reporting_rate = 20
lmda = 0.5
batch_size = 200
training_cost, valid_cost = net2.SGD(short_train_x, short_train_y, test_x, test_y, learning_rate, \
        epochs, reporting_rate, lmda, batch_size, verbose=False)

Training cost at start of training is 5.91058 and accuracy is 4.86%
Validation set accuracy is 5.02%
Training cost in epoch 0 is 1.33018 and accuracy is 57.86%
Validation set accuracy is 55.34%
Training cost in epoch 20 is 0.18223 and accuracy is 95.40%
Validation set accuracy is 88.06%
Training cost in epoch 40 is 0.08946 and accuracy is 98.18%
Validation set accuracy is 89.71%
Training cost in epoch 60 is 0.05450 and accuracy is 99.12%
Validation set accuracy is 90.28%
Training cost in epoch 80 is 0.03758 and accuracy is 99.56%
Validation set accuracy is 90.85%
Training cost in epoch 100 is 0.03137 and accuracy is 99.46%
Validation set accuracy is 90.85%
Final test cost is 0.03137
Accuracy on training data is 99.46%, and accuracy on validation data is 90.85%

Training cost at start of training is 5.91211 and accuracy is 4.86%
Validation set accuracy is 5.02%
Training cost in epoch 0 is 1.32631 and accuracy is 57.74%
Validation set accuracy is 55.25%
Training cost in epoch 20 is 0.184
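
The cells in this notebook repeat the same configuration and change only the layer sizes and `lmda`. This is a minimal sketch of a wrapper that makes that explicit. `run_experiment` is a hypothetical helper, its `nn_module` argument is expected to be the `NeuralNetwork` module imported above as `nn`, and the call signatures are copied from the cells in this notebook. The stub at the bottom exists only so the sketch runs without that module.

```python
from types import SimpleNamespace

def run_experiment(nn_module, layers, lmda, train_x, train_y, valid_x, valid_y,
                   learning_rate=0.001, epochs=101, reporting_rate=20,
                   batch_size=200):
    """Train one network with the shared hyperparameters used in this notebook."""
    net = nn_module.NeuralNet(layers, nn_module.LogLikelihoodCost,
                              nn_module.ReluActivation,
                              nn_module.SoftmaxActivation)
    net.initialize_variables()
    return net.SGD(train_x, train_y, valid_x, valid_y, learning_rate,
                   epochs, reporting_rate, lmda, batch_size, verbose=False)

# --- Stub standing in for the real module, for illustration only ---
class _StubNet:
    def __init__(self, layers, cost, hidden_activation, output_activation):
        self.layers = layers

    def initialize_variables(self):
        pass

    def SGD(self, train_x, train_y, valid_x, valid_y, learning_rate,
            epochs, reporting_rate, lmda, batch_size, verbose=False):
        # Echo back the settings received instead of training anything.
        return ({"layers": self.layers, "lmda": lmda, "epochs": epochs}, [])

stub_nn = SimpleNamespace(NeuralNet=_StubNet, LogLikelihoodCost=None,
                          ReluActivation=None, SoftmaxActivation=None)

for lmda in (0, 0.5):
    training_cost, valid_cost = run_experiment(
        stub_nn, (784, 50, 30, 10), lmda, [], [], [], [])
    print(training_cost)
```

With the real module, the first run of this section would read `run_experiment(nn, (784,50,30,10), 0, short_train_x, short_train_y, test_x, test_y)`.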

## Four layer NN, same number of nodes in total vs. baseline

In [11]:
net2 = nn.NeuralNet((784,50,50,10), nn.LogLikelihoodCost, nn.ReluActivation, nn.SoftmaxActivation)
net2.initialize_variables()
learning_rate = 0.001
epochs = 101
reporting_rate = 20
lmda = 0
batch_size = 200
training_cost, valid_cost = net2.SGD(short_train_x, short_train_y, test_x, test_y, learning_rate, \
        epochs, reporting_rate, lmda, batch_size, verbose=False)
print()

net2 = nn.NeuralNet((784,50,50,10), nn.LogLikelihoodCost, nn.ReluActivation, nn.SoftmaxActivation)
net2.initialize_variables()
learning_rate = 0.001
epochs = 101
reporting_rate = 20
lmda = 0.5
batch_size = 200
training_cost, valid_cost = net2.SGD(short_train_x, short_train_y, test_x, test_y, learning_rate, \
        epochs, reporting_rate, lmda, batch_size, verbose=False)

Training cost at start of training is 5.58528 and accuracy is 8.10%
Validation set accuracy is 8.31%
Training cost in epoch 0 is 1.21272 and accuracy is 62.94%
Validation set accuracy is 60.94%
Training cost in epoch 20 is 0.16081 and accuracy is 95.36%
Validation set accuracy is 87.65%
Training cost in epoch 40 is 0.08169 and accuracy is 98.40%
Validation set accuracy is 89.12%
Training cost in epoch 60 is 0.04913 and accuracy is 99.28%
Validation set accuracy is 89.97%
Training cost in epoch 80 is 0.03384 and accuracy is 99.56%
Validation set accuracy is 90.27%
Training cost in epoch 100 is 0.02792 and accuracy is 99.64%
Validation set accuracy is 90.39%
Final test cost is 0.02792
Accuracy on training data is 99.64%, and accuracy on validation data is 90.39%

Training cost at start of training is 5.58713 and accuracy is 8.10%
Validation set accuracy is 8.31%
Training cost in epoch 0 is 1.21447 and accuracy is 62.96%
Validation set accuracy is 60.95%
Training cost in epoch 20 is 0.162
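
Matching the total node count does not match the number of trainable parameters, because most weights sit between the 784 inputs and the first hidden layer. This is a minimal sketch, assuming fully connected layers with one bias per non-input node, which this notebook does not confirm. `count_parameters` is a hypothetical helper.

```python
def count_parameters(layers):
    """Weights plus biases for a fully connected network with these layer sizes."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layers[:-1], layers[1:]))

for layers in [(784, 100, 10), (784, 50, 50, 10)]:
    print(layers, sum(layers[1:-1]), "hidden nodes,",
          count_parameters(layers), "parameters")
```

Under these assumptions the four layer network in this section has roughly half the parameters of the baseline despite having the same number of hidden nodes.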

## Four layer NN, 20 more nodes in total vs. baseline
- Stronger regularization in the second network (lmda = 1) than in the previous examples (lmda = 0.5)
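
A larger `lmda` penalizes large weights more heavily. The exact penalty used by the `NeuralNetwork` module is not shown in this notebook; the sketch below assumes the common L2 form, cost + lmda / (2n) * (sum of squared weights), with n the number of training examples. `l2_regularized_cost` is a hypothetical helper and the toy weights are made up for illustration.

```python
def l2_regularized_cost(data_cost, weights, lmda, n_examples):
    """Add an L2 penalty (assumed form) to an unregularized cost."""
    squared_sum = sum(w * w for layer in weights for row in layer for w in row)
    return data_cost + lmda / (2 * n_examples) * squared_sum

# Toy weights: two layers, each given as a nested list of rows.
toy_weights = [
    [[0.5, -0.5], [1.0, 0.0]],
    [[2.0, -1.0]],
]
for lmda in (0, 0.5, 1):
    print(lmda, l2_regularized_cost(1.0, toy_weights, lmda, n_examples=10))
```

With `lmda = 0` the penalty vanishes and the cost is unchanged, which matches how the unregularized runs in this notebook are configured.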

In [12]:
net2 = nn.NeuralNet((784,70,50,10), nn.LogLikelihoodCost, nn.ReluActivation, nn.SoftmaxActivation)
net2.initialize_variables()
learning_rate = 0.001
epochs = 101
reporting_rate = 20
lmda = 0
batch_size = 200
training_cost, valid_cost = net2.SGD(short_train_x, short_train_y, test_x, test_y, learning_rate, \
        epochs, reporting_rate, lmda, batch_size, verbose=False)
print()

net2 = nn.NeuralNet((784,70,50,10), nn.LogLikelihoodCost, nn.ReluActivation, nn.SoftmaxActivation)
net2.initialize_variables()
learning_rate = 0.001
epochs = 101
reporting_rate = 20
lmda = 1
batch_size = 200
training_cost, valid_cost = net2.SGD(short_train_x, short_train_y, test_x, test_y, learning_rate, \
        epochs, reporting_rate, lmda, batch_size, verbose=False)

Training cost at start of training is 5.93800 and accuracy is 7.04%
Validation set accuracy is 6.73%
Training cost in epoch 0 is 1.08983 and accuracy is 67.38%
Validation set accuracy is 63.71%
Training cost in epoch 20 is 0.14166 and accuracy is 96.76%
Validation set accuracy is 88.11%
Training cost in epoch 40 is 0.06800 and accuracy is 98.66%
Validation set accuracy is 89.61%
Training cost in epoch 60 is 0.03958 and accuracy is 99.46%
Validation set accuracy is 90.17%
Training cost in epoch 80 is 0.02817 and accuracy is 99.70%
Validation set accuracy is 90.46%
Training cost in epoch 100 is 0.02558 and accuracy is 99.62%
Validation set accuracy is 90.53%
Final test cost is 0.02558
Accuracy on training data is 99.62%, and accuracy on validation data is 90.53%

Training cost at start of training is 5.94238 and accuracy is 7.04%
Validation set accuracy is 6.73%
Training cost in epoch 0 is 1.09412 and accuracy is 67.38%
Validation set accuracy is 63.71%
Training cost in epoch 20 is 0.145

## Four layer NN, 60 more nodes in total vs. baseline

In [14]:
net2 = nn.NeuralNet((784,100,60,10), nn.LogLikelihoodCost, nn.ReluActivation, nn.SoftmaxActivation)
net2.initialize_variables()
learning_rate = 0.001
epochs = 101
reporting_rate = 20
lmda = 0
batch_size = 200
training_cost, valid_cost = net2.SGD(short_train_x, short_train_y, test_x, test_y, learning_rate, \
        epochs, reporting_rate, lmda, batch_size, verbose=False)
print()

net2 = nn.NeuralNet((784,100,60,10), nn.LogLikelihoodCost, nn.ReluActivation, nn.SoftmaxActivation)
net2.initialize_variables()
learning_rate = 0.001
epochs = 101
reporting_rate = 20
lmda = 1
batch_size = 200
training_cost, valid_cost = net2.SGD(short_train_x, short_train_y, test_x, test_y, learning_rate, \
        epochs, reporting_rate, lmda, batch_size, verbose=False)

Training cost at start of training is 5.76316 and accuracy is 12.58%
Validation set accuracy is 12.62%
Training cost in epoch 0 is 1.10048 and accuracy is 67.06%
Validation set accuracy is 63.72%
Training cost in epoch 20 is 0.13339 and accuracy is 96.88%
Validation set accuracy is 88.77%
Training cost in epoch 40 is 0.06051 and accuracy is 98.94%
Validation set accuracy is 90.09%
Training cost in epoch 60 is 0.03777 and accuracy is 99.54%
Validation set accuracy is 90.53%
Training cost in epoch 80 is 0.02902 and accuracy is 99.70%
Validation set accuracy is 90.92%
Training cost in epoch 100 is 0.02658 and accuracy is 99.70%
Validation set accuracy is 91.07%
Final test cost is 0.02658
Accuracy on training data is 99.70%, and accuracy on validation data is 91.07%

Training cost at start of training is 5.76891 and accuracy is 12.58%
Validation set accuracy is 12.62%
Training cost in epoch 0 is 1.10612 and accuracy is 67.06%
Validation set accuracy is 63.73%
Training cost in epoch 20 is 0

## Five layer NN, 40 fewer nodes in total vs. baseline

In [10]:
net2 = nn.NeuralNet((784,20,20,20,10), nn.LogLikelihoodCost, nn.ReluActivation, nn.SoftmaxActivation)
net2.initialize_variables()
learning_rate = 0.001
epochs = 101
reporting_rate = 20
lmda = 0
batch_size = 200
training_cost, valid_cost = net2.SGD(short_train_x, short_train_y, test_x, test_y, learning_rate, \
        epochs, reporting_rate, lmda, batch_size, verbose=False)
print()

net2 = nn.NeuralNet((784,20,20,20,10), nn.LogLikelihoodCost, nn.ReluActivation, nn.SoftmaxActivation)
net2.initialize_variables()
learning_rate = 0.001
epochs = 101
reporting_rate = 20
lmda = 0.5
batch_size = 200
training_cost, valid_cost = net2.SGD(short_train_x, short_train_y, test_x, test_y, learning_rate, \
        epochs, reporting_rate, lmda, batch_size, verbose=False)

Training cost at start of training is 3.69478 and accuracy is 8.20%
Validation set accuracy is 7.80%
Training cost in epoch 0 is 2.10907 and accuracy is 26.06%
Validation set accuracy is 25.63%
Training cost in epoch 20 is 0.46234 and accuracy is 86.66%
Validation set accuracy is 80.88%
Training cost in epoch 40 is 0.29332 and accuracy is 91.54%
Validation set accuracy is 85.37%
Training cost in epoch 60 is 0.21076 and accuracy is 94.12%
Validation set accuracy is 87.26%
Training cost in epoch 80 is 0.15561 and accuracy is 95.90%
Validation set accuracy is 88.30%
Training cost in epoch 100 is 0.11896 and accuracy is 96.94%
Validation set accuracy is 88.61%
Final test cost is 0.11896
Accuracy on training data is 96.94%, and accuracy on validation data is 88.61%

Training cost at start of training is 3.69598 and accuracy is 8.20%
Validation set accuracy is 7.80%
Training cost in epoch 0 is 2.11025 and accuracy is 26.06%
Validation set accuracy is 25.62%
Training cost in epoch 20 is 0.463

## Five layer NN, 20 fewer nodes in total vs. baseline

In [9]:
net2 = nn.NeuralNet((784,30,30,20,10), nn.LogLikelihoodCost, nn.ReluActivation, nn.SoftmaxActivation)
net2.initialize_variables()
learning_rate = 0.001
epochs = 101
reporting_rate = 20
lmda = 0
batch_size = 200
training_cost, valid_cost = net2.SGD(short_train_x, short_train_y, test_x, test_y, learning_rate, \
        epochs, reporting_rate, lmda, batch_size, verbose=False)
print()

net2 = nn.NeuralNet((784,30,30,20,10), nn.LogLikelihoodCost, nn.ReluActivation, nn.SoftmaxActivation)
net2.initialize_variables()
learning_rate = 0.001
epochs = 101
reporting_rate = 20
lmda = 0.5
batch_size = 200
training_cost, valid_cost = net2.SGD(short_train_x, short_train_y, test_x, test_y, learning_rate, \
        epochs, reporting_rate, lmda, batch_size, verbose=False)

Training cost at start of training is 5.25077 and accuracy is 11.58%
Validation set accuracy is 11.80%
Training cost in epoch 0 is 2.00912 and accuracy is 26.32%
Validation set accuracy is 25.13%
Training cost in epoch 20 is 0.33044 and accuracy is 90.60%
Validation set accuracy is 86.36%
Training cost in epoch 40 is 0.21453 and accuracy is 94.20%
Validation set accuracy is 88.67%
Training cost in epoch 60 is 0.16986 and accuracy is 95.56%
Validation set accuracy is 89.44%
Training cost in epoch 80 is 0.12423 and accuracy is 97.02%
Validation set accuracy is 90.17%
Training cost in epoch 100 is 0.09941 and accuracy is 97.90%
Validation set accuracy is 90.17%
Final test cost is 0.09941
Accuracy on training data is 97.90%, and accuracy on validation data is 90.17%

Training cost at start of training is 5.25227 and accuracy is 11.58%
Validation set accuracy is 11.80%
Training cost in epoch 0 is 2.01061 and accuracy is 26.10%
Validation set accuracy is 25.15%
Training cost in epoch 20 is 0

## Five layer NN, same number of nodes in total vs. baseline

In [13]:
net2 = nn.NeuralNet((784,40,40,20,10), nn.LogLikelihoodCost, nn.ReluActivation, nn.SoftmaxActivation)
net2.initialize_variables()
learning_rate = 0.001
epochs = 101
reporting_rate = 20
lmda = 0
batch_size = 200
training_cost, valid_cost = net2.SGD(short_train_x, short_train_y, test_x, test_y, learning_rate, \
        epochs, reporting_rate, lmda, batch_size, verbose=False)
print()

net2 = nn.NeuralNet((784,40,40,20,10), nn.LogLikelihoodCost, nn.ReluActivation, nn.SoftmaxActivation)
net2.initialize_variables()
learning_rate = 0.001
epochs = 101
reporting_rate = 20
lmda = 0.5
batch_size = 200
training_cost, valid_cost = net2.SGD(short_train_x, short_train_y, test_x, test_y, learning_rate, \
        epochs, reporting_rate, lmda, batch_size, verbose=False)

Training cost at start of training is 4.26086 and accuracy is 10.88%
Validation set accuracy is 10.55%
Training cost in epoch 0 is 1.69498 and accuracy is 38.50%
Validation set accuracy is 38.36%
Training cost in epoch 20 is 0.29339 and accuracy is 91.98%
Validation set accuracy is 87.37%
Training cost in epoch 40 is 0.18004 and accuracy is 95.30%
Validation set accuracy is 89.43%
Training cost in epoch 60 is 0.12527 and accuracy is 96.92%
Validation set accuracy is 90.21%
Training cost in epoch 80 is 0.08859 and accuracy is 98.14%
Validation set accuracy is 90.59%
Training cost in epoch 100 is 0.06676 and accuracy is 98.68%
Validation set accuracy is 90.66%
Final test cost is 0.06676
Accuracy on training data is 98.68%, and accuracy on validation data is 90.66%

Training cost at start of training is 4.26271 and accuracy is 10.88%
Validation set accuracy is 10.55%
Training cost in epoch 0 is 1.69766 and accuracy is 38.56%
Validation set accuracy is 38.29%
Training cost in epoch 20 is 0

## Five layer NN, 60 more nodes in total vs. baseline

In [15]:
net2 = nn.NeuralNet((784,80,60,40,10), nn.LogLikelihoodCost, nn.ReluActivation, nn.SoftmaxActivation)
net2.initialize_variables()
learning_rate = 0.001
epochs = 101
reporting_rate = 20
lmda = 0
batch_size = 200
training_cost, valid_cost = net2.SGD(short_train_x, short_train_y, test_x, test_y, learning_rate, \
        epochs, reporting_rate, lmda, batch_size, verbose=False)
print()

net2 = nn.NeuralNet((784,80,60,40,10), nn.LogLikelihoodCost, nn.ReluActivation, nn.SoftmaxActivation)
net2.initialize_variables()
learning_rate = 0.001
epochs = 101
reporting_rate = 20
lmda = 0.5
batch_size = 200
training_cost, valid_cost = net2.SGD(short_train_x, short_train_y, test_x, test_y, learning_rate, \
        epochs, reporting_rate, lmda, batch_size, verbose=False)

net2 = nn.NeuralNet((784,60,40,20,10), nn.LogLikelihoodCost, nn.ReluActivation, nn.SoftmaxActivation)
net2.initialize_variables()
learning_rate = 0.001
epochs = 101
reporting_rate = 20
lmda = 2.0
batch_size = 200
training_cost, valid_cost = net2.SGD(short_train_x, short_train_y, test_x, test_y, learning_rate, \
        epochs, reporting_rate, lmda, batch_size, verbose=False)

Training cost at start of training is 4.10863 and accuracy is 10.76%
Validation set accuracy is 10.60%
Training cost in epoch 0 is 1.32558 and accuracy is 56.30%
Validation set accuracy is 54.58%
Training cost in epoch 20 is 0.21097 and accuracy is 94.06%
Validation set accuracy is 88.73%
Training cost in epoch 40 is 0.11778 and accuracy is 97.40%
Validation set accuracy is 90.31%
Training cost in epoch 60 is 0.07460 and accuracy is 98.76%
Validation set accuracy is 90.93%
Training cost in epoch 80 is 0.05126 and accuracy is 99.20%
Validation set accuracy is 91.20%
Training cost in epoch 100 is 0.03816 and accuracy is 99.60%
Validation set accuracy is 91.28%
Final test cost is 0.03816
Accuracy on training data is 99.60%, and accuracy on validation data is 91.28%

Training cost at start of training is 4.11183 and accuracy is 10.76%
Validation set accuracy is 10.60%
Training cost in epoch 0 is 1.32879 and accuracy is 56.32%
Validation set accuracy is 54.60%
Training cost in epoch 20 is 0