# Neural Networks for Digit Recognition
_Noah Sutton-Smolin_

Presented is a method for training multiple neural networks to recognize and classify handwritten digits. The digits are assumed to be pre-segmented into an 8x8 grid of grayscale values from 0 to 16. Success rates of around 90% were achieved on test data. Training takes around 10 to 20 minutes. Near-optimal configuration values for the network were found using Monte Carlo methods.

The following code loads a pre-trained network. To train a new one instead, set `generate = True` and `read = False`; set `write = True` to save the result.

**TODO:**

 - Include mathematics details in overview
 - <s>Add weight decay/transfer to intro book</s>
 - <s>Add Monte Carlo notebook pool plot</s>
 - <s>Finish Monte Carlo notebook information</s>
 - Finish data creation notebook information
 - Bibliography and references

In [12]:
import pickle

from nnet_core import *
from dataset_mgmt import *

# Choose what to do: train a new network, load a saved one, and/or save the result.
generate = False
write = False
read = True
filename = "trained_network.pickle"

netw = []
if generate:
    netw = nnet_train_new(13, 9, 1.12, 0.0003, 24, 0, 11, 880, 3, 500)
if read:
    # Loading replaces any network generated above.
    with open(filename, 'rb') as f:
        netw = pickle.load(f)
if write:
    # 'wb' truncates an existing file, so no explicit removal is needed.
    with open(filename, 'wb') as f:
        pickle.dump(netw, f)

The following code evaluates the network's effectiveness using test data:

In [16]:
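# Only the test set is needed here; the first return value is discarded.
_, test_set = load_data(10, 1000, test_size=500)
# Element 2 of the evaluation result is the success rate, as a fraction.
eff = nnet_evaluate_multiple(netw, test_set)[2]
# Truncate (not round) to two decimal places and print as a percentage.
print("Network success rate: " + str(int(eff * 10000) / 100.0) + "%")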

Network success rate: 90.8%


Feed-forward neural networks are a machine learning method that has existed for a number of decades, but has only recently become feasible for training automatic classification systems.

The network uses a stochastic gradient descent/backpropagation algorithm to adjust the weights in the direction that reduces the error. This can, however, occasionally leave the network trapped in a local minimum. For more information on this phenomenon, see [this notebook](NNet_Introduction.ipynb) on the fundamentals of neural networks.
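As an illustration of the weight update described above, here is a minimal sketch of stochastic gradient descent for a single sigmoid unit with a squared-error loss. It is not the code from `nnet_core`; the function names (`sgd_step`, `sample_error`), the toy data, and the learning rate are assumptions chosen for the example.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sgd_step(weights, inputs, target, learning_rate):
    """One gradient descent update for a single sigmoid unit.

    Reduces the squared error 0.5 * (output - target)**2 on one sample.
    """
    activation = sum(w * x for w, x in zip(weights, inputs))
    output = sigmoid(activation)
    # Chain rule: dE/dw_i = (output - target) * sigmoid'(activation) * x_i
    delta = (output - target) * output * (1.0 - output)
    return [w - learning_rate * delta * x for w, x in zip(weights, inputs)]


def sample_error(weights, inputs, target):
    output = sigmoid(sum(w * x for w, x in zip(weights, inputs)))
    return 0.5 * (output - target) ** 2


# Toy data (hypothetical): logical OR; the last input is a constant bias term.
samples = [([0, 0, 1], 0), ([0, 1, 1], 1), ([1, 0, 1], 1), ([1, 1, 1], 1)]

rng = random.Random(0)
weights = [rng.uniform(-0.5, 0.5) for _ in range(3)]
error_before = sum(sample_error(weights, x, t) for x, t in samples)

for _ in range(2000):
    x, t = rng.choice(samples)  # "stochastic": one random sample per update
    weights = sgd_step(weights, x, t, learning_rate=0.5)

error_after = sum(sample_error(weights, x, t) for x, t in samples)
```

In a multi-layer network, backpropagation applies the same chain rule layer by layer to obtain each weight's gradient. Because every step only follows the local slope of the error, training can settle in a local minimum rather than the global one.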

To counter this phenomenon, two methods were used. First, a weak weight decay was applied, which causes weights that receive little training to lose their significance over time and discourages entrapment in a local minimum. Second, multiple networks were trained and averaged. For details on this, see [this notebook](NNet_Averaging.ipynb) on how multiple networks can be used to improve output.
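The two countermeasures can be sketched as follows. This is an illustrative sketch, not the implementation in `nnet_core`: it assumes a multiplicative decay applied to every weight and assumes the averaging is done over the networks' output vectors. The function names and the sample numbers are hypothetical.

```python
def decay_weights(weights, decay_rate):
    """Shrink every weight slightly toward zero (multiplicative weight decay).

    Weights that training keeps reinforcing recover; the rest fade over time.
    """
    return [w * (1.0 - decay_rate) for w in weights]


def average_outputs(per_network_outputs):
    """Average the output vectors of several networks element by element."""
    count = len(per_network_outputs)
    return [sum(values) / count for values in zip(*per_network_outputs)]


def classify(per_network_outputs):
    """Return the index of the class with the largest averaged output."""
    averaged = average_outputs(per_network_outputs)
    return max(range(len(averaged)), key=averaged.__getitem__)


# Hypothetical outputs of three networks for one input and three classes.
# The second network alone would pick class 2; the average picks class 1.
network_outputs = [
    [0.1, 0.7, 0.2],
    [0.2, 0.3, 0.5],
    [0.1, 0.6, 0.3],
]
predicted_class = classify(network_outputs)
```

Averaging helps because independently trained networks tend to get stuck in different local minima, so their individual mistakes partly cancel out.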

To improve training times, a Monte Carlo search was run over the neural network's parameter space to find near-optimal parameters. This method cut the computation time in half compared to manual tuning, and produced equal or better output. For more information on how this search was executed, see [this notebook](NNet_MonteCarlo.ipynb).
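The general shape of such a search can be sketched as below: sample random parameter sets, score each one, and keep the best. This is a hedged sketch only; the linked notebook describes the search that was actually run. The parameter names, ranges, and the `toy_score` objective are hypothetical stand-ins for training and evaluating a real network.

```python
import random


def monte_carlo_search(score, parameter_ranges, trials, seed=0):
    """Sample parameter sets uniformly at random and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(trials):
        candidate = {
            name: rng.uniform(low, high)
            for name, (low, high) in parameter_ranges.items()
        }
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best_params, best_score = candidate, candidate_score
    return best_params, best_score


def toy_score(params):
    """Stand-in objective; a real search would train and evaluate a network."""
    return -((params["learning_rate"] - 0.3) ** 2) - (params["decay"] - 0.001) ** 2


# Hypothetical search ranges for two parameters.
ranges = {"learning_rate": (0.01, 2.0), "decay": (0.0, 0.01)}
best_params, best_score = monte_carlo_search(toy_score, ranges, trials=500)
```

Random sampling needs no gradient of the score, which suits a setting where each evaluation means training a network. The cost is that the number of trials has to grow with the size of the parameter space.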

For the details of how the Monte Carlo search data was created, see [this notebook](NNet_DataCreation.ipynb).