# Neural Network

## Introduction

An Artificial Neural Network (ANN) is a computational model used to approximate complex, non-linear functions. Given a set of inputs <b>x</b> and some internal parameters <b>w</b>, the ANN produces the corresponding outputs <b>y</b> = <i>f</i>(<b>x</b>; <b>w</b>). The ANN is just a clever way of writing <i>f</i>, inspired by the structure of biological brains. The simplest form of ANN is sketched below:
<table>
<tr><td>
<img src="images/nnscheme.png" width="400px">
</td></tr>
<tr><td>
Schematic representation of a feed forward neural network.
</td></tr>
</table><br>
The input values are fed to a layer of simple computational units, called <i>neurons</i>, which combine them to calculate their <i>activation</i> output. These values are then sent to the neurons in the output layer, which, in the same way, compute the final output <b>y</b> of the ANN. This is the simplest setup, called a feed-forward neural network.
<br><br>
Each <i>j</i>-th neuron in a layer computes its activation <i>a</i><sub>j</sub> as:
<table>
<tr><td>
<img src="images/nnneuron.png" width="400px">
</td></tr>
<tr><td>
Neuron connection mechanism.
</td></tr>
</table><br>
The parameters <i>w</i><sub>j,i</sub> are connection weights, while <i>b</i><sub>j</sub> are internal neuron biases. All of these parameters make up the ANN parameter set <b>w</b>.
The activation function <i>S</i> is arbitrary, as long as it is non-linear. The most common activation functions are the hyperbolic tangent (used here) and the rectified linear unit (ReLU). In some cases, the activation function of the output neurons is just linear (a weighted sum of the inputs).
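
The neuron formula itself lives in the figure above and is not reproduced in the text. The sketch below assumes the standard form, <i>a</i><sub>j</sub> = <i>S</i>(Σ<sub>i</sub> <i>w</i><sub>j,i</sub> <i>x</i><sub>i</sub> + <i>b</i><sub>j</sub>), with a tanh hidden layer and a linear output layer. The function `forward` and all array names are hypothetical and are not part of pyNN.

```python
import numpy

def forward(x, W_hidden, b_hidden, W_out, b_out):
    """Evaluate a one-hidden-layer feed-forward network on a single input x."""
    # hidden activations: a_j = tanh(sum_i w_ji * x_i + b_j)
    a = numpy.tanh(numpy.dot(W_hidden, x) + b_hidden)
    # linear output layer: weighted sum of the hidden activations plus a bias
    return numpy.dot(W_out, a) + b_out

rng = numpy.random.RandomState(0)
n_in, n_hidden, n_out = 2, 10, 1

# random initial parameters, one row of weights per neuron
W_hidden = rng.randn(n_hidden, n_in)
b_hidden = rng.randn(n_hidden)
W_out = rng.randn(n_out, n_hidden)
b_out = rng.randn(n_out)

y = forward(numpy.array([0.5, -0.2]), W_hidden, b_hidden, W_out, b_out)
print(y.shape)  # (1,)

# total number of parameters in this sketch: 20 + 10 + 10 + 1
n_params = W_hidden.size + b_hidden.size + W_out.size + b_out.size
print(n_params)  # 41
```

Storing the weights of a layer as a matrix turns the per-neuron sums into a single matrix-vector product, which is why the whole layer is evaluated in one line.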


## Example

Here we use the ANN to approximate an analytical function, called Model.

In [None]:
# --- INITIAL DEFINITIONS ---

from pyNN import NeuralNetwork
from pyNN import Evaluate
import matplotlib.pyplot as plt  # for plotting!
import numpy
import math
import random
 
# This function takes a 2-element vector p=[x,y] and computes z
def Model(p):
    result = [math.sin(0.4*p[0] - 0.3*p[1] + 0.2*p[0]*p[1])]
    result[0] += (2*random.random()-1)*0.1  # uniform noise in [-0.1, 0.1]
    return result


This is the model function that the ANN will approximate:
<img src="./images/nnsurface.png">
We added a small random noise every time the function is evaluated, to spice things up!<br><br>

Next we create the ANN suitable for this problem. We need two inputs (x,y) and one output (z). The number of hidden neurons is set to 10. The ANN created this way has an initial random set of connection weights and neuron biases.

In [None]:
# setup the neural network - 2 inputs (x,y), 1 output (z), 10 hidden neurons
nn = NeuralNetwork(2, 1, 10)
nn.regularization = 0.01 # EXPLAINED LATER!

Next we use the Model function to create a dataset for training and validation of the ANN.
The training set provides the ANN with a set of <i>examples</i> to learn from, while the validation set is later used to check the quality of the trained ANN.
The following code creates a list of (x,y) points on the real plane, with x,y in [-1, 1], to use as inputs.
By applying the Model function to each one, we get a list of corresponding correct output z.

In [None]:
# training points
nTrainPts = 100
trainIn = 2*numpy.random.rand(nTrainPts,2)-1 # (x,y) points between -1 and 1

# validation points
nValidPts = 40
validIn = 2*numpy.random.rand(nValidPts,2)-1
 
# apply Model to the points to compute z for the two sets
trainOut = numpy.apply_along_axis(Model, 1, trainIn)
validOut = numpy.apply_along_axis(Model, 1, validIn)

Here is what the training set looks like. The colour represents the output z.

In [None]:
from mpl_toolkits.mplot3d import Axes3D

plt.scatter(trainIn[:,0], trainIn[:,1], c=trainOut[:,0])
plt.colorbar()
plt.show()

## Training
Now comes the tough part! The idea of training is to evaluate the ANN with the training inputs and measure its error (since we know the correct outputs). It is then possible to compute the derivative (gradient) of the error w.r.t. each parameter (connection weights and biases). By shifting the parameters in the direction opposite to the gradient, we obtain a better set of parameters, which should give a smaller error.
This procedure can be repeated until the error is minimised.
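
To make the update rule concrete, here is a minimal sketch of gradient descent on a much simpler model than the ANN: a linear function of two inputs. It is not how pyNN implements `GradStep`; the step size `rate` and all other names are illustrative assumptions.

```python
import numpy

rng = numpy.random.RandomState(0)
X = 2 * rng.rand(100, 2) - 1             # inputs in [-1, 1]
t = numpy.dot(X, [0.4, -0.3]) + 0.1      # targets from a known linear rule

w = numpy.zeros(2)   # parameters to learn
b = 0.0
rate = 0.1           # step size (hypothetical value)

history = []
for epoch in range(200):
    residual = numpy.dot(X, w) + b - t   # prediction minus correct output
    history.append(numpy.mean(residual ** 2))
    # gradient of the mean square error w.r.t. w and b
    grad_w = 2 * numpy.dot(X.T, residual) / len(t)
    grad_b = 2 * numpy.mean(residual)
    # shift the parameters against the gradient
    w -= rate * grad_w
    b -= rate * grad_b

print(history[0] > history[-1])  # True
```

For the ANN the loop is the same; only the gradient is harder to compute, because the error depends on the parameters through the nested activation functions.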

In [None]:
print('training... ')

# number of training iterations to perform
nEpoch = 100
nn.regularization = 0.01

# empty array for recording the errors at each step
errors = numpy.zeros([nEpoch,2])

for x in range(nEpoch): # training loop

    # Perform a Gradient Descent step and get the error on the training set.
    # The function takes an array (tensor) of training inputs,
    # and the array of their corresponding correct outputs.
    # GradStep returns the mean square error (MSE) on the given training set.
    error = nn.GradStep(trainIn, trainOut)
    # one can also try the stochastic gradient descent
    #error = nn.StochasticGradStep(trainIn, trainOut, 20)

    # check the error on the validation set
    validOutModel = numpy.apply_along_axis(Evaluate, 1, validIn, nn)
    errorValid = (validOutModel - validOut) # error
    errorValid = errorValid*errorValid      # square error
    errorValid = numpy.mean(errorValid.flatten()) # mean square error! (MSE)

    # store the errors in the array for later plotting
    errors[x,0] = error
    errors[x,1] = errorValid

print("training error {}".format(error))
print("validation error {}".format(errorValid))

In [None]:
# Plot the errors
plt.plot(errors[:,0], label="Training")
plt.plot(errors[:,1], label="Validation")
plt.xlabel('Epoch')
plt.ylabel('MSE')
plt.legend()
plt.show()

Check the ANN quality with a regression plot, showing the mismatch between the exact and the NN-predicted outputs for the validation set.

In [None]:
# compute the output on the validation set - again
nnout = numpy.apply_along_axis(Evaluate,1,validIn,nn)
# compute the error
nnerror = (nnout - validOut)

plt.plot(validOut[:,0],nnout[:,0],'o')
plt.plot([-1,1],[-1,1]) # perfect fit line
plt.xlabel('correct output')
plt.ylabel('NN output')
plt.show()

# error histogram
plt.hist(nnerror[:,0],50)
plt.xlabel("Error")
plt.ylabel("Occurrences")
plt.show()

### Remarks: regularisation
When we created the ANN we set its regularisation property, without explaining it: here is what it's about!<br>
In many cases, there are lots of neurons and thus many parameters (a complex model), but not a lot of data to train them. We might need such a complex model for our purpose, but as a rule of thumb there should be at least 10 data points per parameter.<br><br>
If not, we are most likely over-fitting the data, i.e. we get an ANN that brilliantly fits the training points, but utterly fails in validation. This is analogous to fitting a bunch of linearly correlated (x,y) points with a 1000-degree polynomial. The fit might go through all the points, but it has no predictive power!<br><br>
Simply put, in agreement with Occam, we would like to train the ANN to be as accurate and as simple as possible. One way to simplify an ANN with lots of neurons is to set some useless connections to zero.
The regularisation term <i>R</i> is a measure of the complexity of the network, for example the sum of squared parameters, that is scaled and added to the output error <i>E</i><sub>data</sub>, to give the total training error <i>E</i> = <i>E</i><sub>data</sub> + <i>a R</i>.<br><br>
The gradient of <i>R</i> w.r.t. the parameters steers the training towards smaller weights at the expense of accuracy. The scaling factor for <i>R</i> is the ANN property <b>nn.regularization</b>.
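
The effect of the extra term can be sketched on a small linear model, assuming <i>R</i> is the sum of squared parameters. This illustrates the idea only and is not pyNN's implementation; `train`, `data_error` and the value `a=0.5` are hypothetical.

```python
import numpy

rng = numpy.random.RandomState(0)
X = 2 * rng.rand(20, 2) - 1                            # few training points
t = numpy.dot(X, [0.4, -0.3]) + 0.05 * rng.randn(20)   # noisy targets

def data_error(w):
    """E_data: mean square error of the linear model w.x on the training set."""
    return numpy.mean((numpy.dot(X, w) - t) ** 2)

def train(a, steps=500, rate=0.1):
    """Minimise E = E_data + a * R by gradient descent, with R = sum(w**2)."""
    w = numpy.zeros(2)
    for _ in range(steps):
        grad_data = 2 * numpy.dot(X.T, numpy.dot(X, w) - t) / len(t)
        grad_reg = 2 * a * w        # gradient of a * R, pulls w towards zero
        w -= rate * (grad_data + grad_reg)
    return w

w_plain = train(a=0.0)
w_reg = train(a=0.5)
print(numpy.linalg.norm(w_plain), data_error(w_plain))
print(numpy.linalg.norm(w_reg), data_error(w_reg))
```

The regularised run ends with smaller weights and a larger training error, which is the trade-off described above. Whether it also validates better depends on the data and on the value of the scaling factor.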

# Exercise

### Wine Classification
Use the wine database to train an ANN classifier. Compare its error to that of the K-means method.

In [None]:
# load the training data - do not edit
wineTrainIn = numpy.load("./data/wine-training-input.npy")
wineTrainOut = numpy.load("./data/wine-training-output.npy")
wineTrainOut = (wineTrainOut - 2) # rescale the output within [-1, 1]

# load the validation data - do not edit
wineValidIn = numpy.load("./data/wine-validation-input.npy")
wineValidOut = numpy.load("./data/wine-validation-output.npy")
wineValidOut = (wineValidOut - 2) # rescale the output within [-1, 1]

# TODO:
# create the ANN
# train it
# check error - regression plot
# estimate amount of misclassified wines
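
For the last step, one possible way to count misclassified wines is sketched below. It assumes that the rescaled class labels are exactly -1, 0 and 1 and that the ANN returns one continuous value per wine; neither is confirmed by the data files. `count_misclassified` is a hypothetical helper and the arrays are stand-in values, not wine data.

```python
import numpy

def count_misclassified(predicted, correct):
    """Count samples whose rounded prediction differs from the correct class."""
    # snap each continuous output to the nearest class label in {-1, 0, 1}
    labels = numpy.clip(numpy.rint(predicted), -1, 1)
    return int(numpy.sum(labels != correct))

# stand-in values, NOT real wine data
correct = numpy.array([-1, 0, 1, 1, 0])
predicted = numpy.array([-0.8, 0.1, 0.4, 0.9, -0.7])

print(count_misclassified(predicted, correct))  # 2
```

Both arrays must have the same shape for the element-wise comparison to be meaningful, so flatten them first if the ANN output is two-dimensional.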


# Optional

### ANN size
Plot how the training and validation errors depend on the size of the ANN hidden layer. Keep the same training and validation sets throughout the calculation.

### Committee machine
Train multiple ANNs on the same data and form a committee: the output is the average of the outputs of the member ANNs. Compare the validation performance of the committee to that of its members.
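
The averaging step can be sketched without any trained network. Here the member outputs are stand-ins (the correct output plus independent random errors), so the numbers say nothing about real ANNs; all names are hypothetical.

```python
import numpy

rng = numpy.random.RandomState(1)
target = numpy.sin(numpy.linspace(-1, 1, 40))   # stand-in correct outputs

# stand-in member predictions: correct output plus independent errors
members = [target + 0.2 * rng.randn(40) for _ in range(5)]

# committee output: average over the members, point by point
committee = numpy.mean(members, axis=0)

def mse(prediction):
    """Mean square error against the stand-in correct outputs."""
    return numpy.mean((prediction - target) ** 2)

member_errors = [mse(m) for m in members]
print(mse(committee) <= numpy.mean(member_errors))  # True
```

The committee MSE can never exceed the average MSE of its members, because the square error is a convex function. It can still be worse than the best single member, which is worth checking in the exercise.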

### Combine with PCA - PRO CODER
Transform the inputs using PCA and use only the most relevant components. How does the accuracy of the ANN change?
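
A minimal PCA sketch based on the singular value decomposition is given below as one possible starting point. `pca_transform` is a hypothetical helper and the data are random stand-ins, not the wine inputs.

```python
import numpy

def pca_transform(data, n_components):
    """Project the rows of data onto their leading principal components."""
    mean = data.mean(axis=0)
    centered = data - mean
    # rows of vt are the principal directions, sorted by decreasing variance
    _, singular, vt = numpy.linalg.svd(centered, full_matrices=False)
    # fraction of the total variance carried by each component
    explained = singular ** 2 / numpy.sum(singular ** 2)
    reduced = numpy.dot(centered, vt[:n_components].T)
    return reduced, explained, mean, vt

rng = numpy.random.RandomState(0)
data = rng.randn(50, 4) * [5.0, 1.0, 0.5, 0.1]   # stand-in inputs, 4 features

reduced, explained, mean, vt = pca_transform(data, 2)
print(reduced.shape)  # (50, 2)
print(explained)
```

The validation inputs should be projected with the `mean` and `vt` obtained from the training inputs, rather than with a separate decomposition, so that both sets live in the same coordinates.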