### Initializing the Network

We know we need to set the number of input, hidden, and output layer nodes--and the learning rate. That defines the shape and size of the neural network. Rather than set these in stone, we'll let them be set when a new neural network object is created by using parameters. That way we retain the choice to create new neural networks of different sizes with ease.

### Weights - The Heart of the Network

The most important part of the network is the **link weights**. They're used to calculate the signal being fed forward, the error as it's propagated backwards, and it is the link weights themselves that are refined in an attempt to improve the network.

We saw earlier that the weights can be concisely expressed as a matrix. So we can create:

*   A matrix of the weights for the links between the input and hidden layers, $W_{input\_hidden}$, of size (**hidden_nodes** by **input_nodes**)
*   And another matrix for the links between the hidden and output layers, $W_{hidden\_output}$, of size (**output_nodes** by **hidden_nodes**)

Remember the convention from earlier to see why the first matrix is of size (**hidden_nodes** by **input_nodes**) and not the other way around (**input_nodes** by **hidden_nodes**).

Remember that the values of the link weights should be small and random. The following numpy function generates an array of values selected randomly between 0 and 1, where the size is (rows by columns): 
```
numpy.random.rand(rows,columns)
```
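
Because `rand` only returns values in the range 0 to 1, one common adjustment is to subtract 0.5 so that the weights are spread around zero and can be negative as well as positive. The following is a minimal sketch of that idea; the layer sizes and the variable names `wih` and `who` are chosen here for illustration only.

```python
import numpy as np

# Illustrative layer sizes (assumed for this sketch)
input_nodes, hidden_nodes, output_nodes = 3, 4, 2

# rand() samples uniformly from [0, 1); subtracting 0.5 shifts the range to [-0.5, 0.5)
wih = np.random.rand(hidden_nodes, input_nodes) - 0.5   # input -> hidden weights
who = np.random.rand(output_nodes, hidden_nodes) - 0.5  # hidden -> output weights

print(wih.shape)  # (4, 3): hidden_nodes by input_nodes
print(who.shape)  # (2, 4): output_nodes by hidden_nodes
```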

### Querying the Network

The query() function takes the input to a neural network and returns the network's output. That's simple enough, but to do that you'll remember that we need to pass the input signal from the input layer of nodes through the hidden layer and out of the final output layer. You'll also remember that we use the link weights to moderate the signals as they feed into any given hidden or output node, and we also use the sigmoid activation function to squish the signal coming out of those nodes.

The following shows how the matrix of weights for the link between the input and hidden layers can be combined with the matrix of inputs to give the signals into the hidden layer nodes:

$$X_{hidden} = W_{input\_hidden} \cdot I$$

The following applies the numpy library's dot product function for matrices to the link weights $W_{input\_hidden}$ and the inputs **I**:
```
hidden_inputs = numpy.dot(self.wih, inputs)
```
> This simple piece of Python code does all the work of combining all the inputs with all the right link weights to produce the matrix of combined moderated signals into each hidden layer node. We don't have to rewrite it either if next time we choose to use a different number of nodes for the input or hidden layer.
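
To make the shapes concrete, here is a small sketch with hand-picked numbers. The weights and inputs are invented for illustration; the point is only that a (**hidden_nodes** by **input_nodes**) matrix combined with a column of inputs gives one combined signal per hidden node.

```python
import numpy as np

# Hypothetical network with 3 input nodes feeding 2 hidden nodes
wih = np.array([[0.9, 0.3, 0.4],
                [0.2, 0.8, 0.2]])               # shape (hidden_nodes, input_nodes)

# ndmin=2 plus .T turns a plain list into a column, shape (3, 1)
inputs = np.array([0.9, 0.1, 0.8], ndmin=2).T

# each row of wih is combined with the whole input column
hidden_inputs = np.dot(wih, inputs)

print(hidden_inputs.shape)  # (2, 1)
print(hidden_inputs)        # approximately [[1.16], [0.42]]
```

The first value is 0.9 × 0.9 + 0.3 × 0.1 + 0.4 × 0.8 = 1.16, which is the sum of the inputs moderated by the weights on the links into the first hidden node.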

To get the signals emerging from the hidden nodes, we apply the sigmoid squashing function to each of the combined signals going into them:

$$O_{hidden} = sigmoid(X_{hidden})$$

> The scipy Python library has a set of special functions, and the sigmoid function is called **expit()**

Because we might want to experiment and tweak, or even completely change, the activation function, it makes sense to define it only *once* inside the neural network, when it is first initialized. After that, we can refer to it several times, such as in the query() function. This arrangement means we only need to change the definition in one place, rather than locate and change the code everywhere an activation function is used.
```
self.activation_function = lambda x: scipy.special.expit(x)
```

> What is lambda? All we've done here is create a function with a shorter way of writing it out. Instead of the usual def definition, we use lambda to create a function there and then.
>> The function here takes x and returns scipy.special.expit(x), which is the sigmoid function. Functions created with lambda are nameless, or anonymous, but here we've assigned it the name self.activation_function. All this means is that whenever someone needs to use the activation function, all they need to do is call self.activation_function().

We want to apply the activation function to the combined and moderated signals into the hidden nodes. The signals emerging from the hidden layer nodes are in the matrix called **hidden_outputs**.
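
As a quick check of the idea, the sketch below defines the activation function with lambda, defines an equivalent function with def, and applies it to a column of made-up hidden layer inputs. The name `sigmoid` and the input values are illustrative assumptions.

```python
import numpy as np
import scipy.special

# The lambda form: an anonymous function assigned to a name
activation_function = lambda x: scipy.special.expit(x)

# An equivalent function written with the usual def
def sigmoid(x):
    return scipy.special.expit(x)

# Illustrative combined signals into three hidden nodes
hidden_inputs = np.array([[1.16], [0.0], [-2.0]])

# expit() works element by element, so the shape is preserved
hidden_outputs = activation_function(hidden_inputs)

print(hidden_outputs.shape)  # (3, 1)
print(hidden_outputs)        # every value lies between 0 and 1
```

An input of 0 gives exactly 0.5, positive inputs are squashed toward 1, and negative inputs toward 0.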

### Training the Network

There are two phases to training:

1.    Calculating the output, just as query() does it, for a given training example.
    1. The only difference is that train() takes an additional parameter, **targets_list**, because you can't train the network without training examples that include the desired or target answer.
    2. The code also turns the **targets_list** into a numpy array, just as the **inputs_list** is turned into a numpy array.
2.    Taking this calculated output, comparing it with the desired output, and using the difference to guide the updating of the network weights. We use error backpropagation to inform how the link weights are refined.
    1. First, we need to calculate the error, which is the difference between the desired target output provided by the training example, and the actual calculated output.
        1. That's the difference between the matrices (**targets** - **final_outputs**) done element by element
        $$errors_{output} = targets - outputs_{final}$$
    2. We can calculate the *back-propagated* errors for the hidden layer nodes. Remember how we split the errors according to the connected weights, and recombine them for each hidden layer node. We worked out the matrix form of this calculation as:
    $$ errors_{hidden} = weights^{T}_{hidden\_output} \cdot errors_{output} $$
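
The two error calculations above can be sketched with small hand-picked matrices. All the numbers below are invented for illustration and do not come from a real training run; only the operations mirror the text.

```python
import numpy as np

# Illustrative target answers, converted to a column just like the inputs
targets_list = [0.9, 0.1]
targets = np.array(targets_list, ndmin=2).T     # shape (2, 1)

# Pretend these are the outputs the network calculated
final_outputs = np.array([[0.7], [0.4]])

# Hypothetical weights between 3 hidden nodes and 2 output nodes, shape (2, 3)
who = np.array([[0.2, 0.5, 0.1],
                [0.4, 0.3, 0.6]])

# output error: element-by-element difference (target - actual)
output_errors = targets - final_outputs         # approximately [[0.2], [-0.3]]

# hidden error: output errors split by the link weights, recombined at each hidden node
hidden_errors = np.dot(who.T, output_errors)    # shape (3, 1)

print(hidden_errors)  # approximately [[-0.08], [0.01], [-0.16]]
```

For example, the first hidden node receives 0.2 × 0.2 + 0.4 × (-0.3) = -0.08, the output errors weighted by the two links leaving that node.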

Now we have what we need to refine the weights at each layer.
1. For the weights between the hidden and final layers, we use the $errors_{output}$
2. For the weights between the input and hidden layers, we use the $errors_{hidden}$

We previously worked out the expression for updating the weight for the link between a node **j** and a node **k** in the next layer in matrix form:

$$ \Delta W_{jk} = \alpha * E_k * sigmoid (O_k) * (1 - sigmoid (O_k)) \cdot O^{T}_{j} $$

*    The $\alpha$ is the learning rate
*    The **sigmoid** is the squashing activation function
*    The $*$ is element by element matrix multiplication
*    The $\cdot$ dot is the matrix dot product
*    The last bit, the matrix of outputs from the previous layer, is transposed.
    *     In effect, this means the column of outputs becomes a row of outputs
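
A worked sketch of a single update to the hidden-to-output weights follows. The numbers and the helper `forward` are assumptions made for illustration. Note that `final_outputs` already holds the sigmoid outputs of the final layer, so the two sigmoid terms in the expression become `final_outputs` and `(1.0 - final_outputs)` in code.

```python
import numpy as np
import scipy.special

lr = 0.3  # learning rate (alpha)

# Illustrative values for one training example
hidden_outputs = np.array([[0.6], [0.5], [0.7]])   # O_j, shape (3, 1)
targets = np.array([[0.9], [0.1]])                 # shape (2, 1)
who = np.array([[0.2, 0.5, 0.1],
                [0.4, 0.3, 0.6]])                  # shape (2, 3)

def forward(weights):
    # signals emerging from the output layer for the fixed hidden outputs
    return scipy.special.expit(np.dot(weights, hidden_outputs))

final_outputs = forward(who)
output_errors = targets - final_outputs            # E_k

# element-by-element products first, then the dot product with the transposed O_j
delta_who = lr * np.dot(output_errors * final_outputs * (1.0 - final_outputs),
                        hidden_outputs.T)
new_who = who + delta_who

# compare the summed squared error before and after the update
error_before = np.sum(output_errors ** 2)
error_after = np.sum((targets - forward(new_who)) ** 2)

print(delta_who.shape)            # (2, 3), the same shape as who
print(error_before, error_after)  # the error should shrink after the update
```

The column (2 by 1) times row (1 by 3) product is what gives one weight change per link, which is why the previous layer's outputs need to be transposed.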

In [24]:
import numpy as np
import scipy.special

In [57]:
# neural network class definition
class neuralNetwork:
    
    # initialize the neural network
    def __init__(self, inputnodes, hiddennodes, outputnodes, learningrate):
        # set number of nodes in each input, hidden, output layers
        self.inodes = inputnodes
        self.hnodes = hiddennodes
        self.onodes = outputnodes
        
        # link weight matrices, wih and who
        # weights inside the arrays are w_i_j, where link is from node i to node j in the next layer
        # w11 w21
        # w12 w22 etc
        
        # We want to sample the weights from a normal probability distribution centered around zero
        # with a standard deviation that is related to the number of incoming links to a node,
        # 1 / sqrt(number of incoming links)
        
        self.wih = np.random.normal(0.0, pow(self.inodes, -0.5), (self.hnodes, self.inodes))
        self.who = np.random.normal(0.0, pow(self.hnodes, -0.5), (self.onodes, self.hnodes))
        
        # learning rate
        self.lr = learningrate
        
        # activation function is the sigmoid function
        self.activation_function = lambda x: scipy.special.expit(x)
        
        pass
    
    # train the neural network
    def train(self, inputs_list, targets_list):
        
        # convert the inputs and targets lists to 2d column arrays
        inputs = np.array(inputs_list, ndmin=2).T
        targets = np.array(targets_list, ndmin=2).T
        
        # calculate signals into hidden layer
        hidden_inputs = np.dot(self.wih, inputs)
        #calculate the signals emerging from hidden layer
        hidden_outputs = self.activation_function(hidden_inputs)
        
        # calculate signals into final output layer
        final_inputs = np.dot(self.who, hidden_outputs)
        # calculate the signals emerging from final output layer
        final_outputs = self.activation_function(final_inputs)
        
        # error is the (target - actual)
        output_errors = targets - final_outputs
        
        # hidden layer error is the output_errors, split by weights. recombined at hidden nodes
        hidden_errors = np.dot(self.who.T, output_errors)
        
        # update the weights for the links between the hidden and output layer
        # The weight change is the learning rate self.lr multiplied by a matrix product (np.dot) of:
        #   1. the error and sigmoid terms from the next layer: E_k * sigmoid(O_k) * (1 - sigmoid(O_k))
        #   2. the transposed outputs from the previous layer: O_j^T
        self.who += self.lr * np.dot((output_errors * final_outputs * (1.0 - final_outputs)), np.transpose(hidden_outputs))
        
        
        # update the weights for the links between the input and hidden layers
        self.wih += self.lr * np.dot((hidden_errors * hidden_outputs * (1.0 - hidden_outputs)), np.transpose(inputs))
        
        pass
    
    # query the neural network
    def query(self, inputs_list):
        
        #convert inputs list to 2d array
        inputs = np.array(inputs_list, ndmin=2).T
        
        # calculate signal into hidden layer
        hidden_inputs = np.dot(self.wih, inputs)
        # calculate the signals emerging from the hidden layer
        hidden_outputs = self.activation_function(hidden_inputs)
        
        # calculate the signals into the final output layer
        final_inputs = np.dot(self.who, hidden_outputs)
        # calculate the signals emerging from the final output layer
        final_outputs = self.activation_function(final_inputs)
        
        return final_outputs

In [58]:
# number of input, hidden, and output nodes
input_nodes = 3
hidden_nodes = 3
output_nodes = 3

# learning rate is 0.3
learning_rate = 0.3

# create instance of neural network
n = neuralNetwork(input_nodes, hidden_nodes, output_nodes, learning_rate)

In [59]:
n.query([1.0, 0.5, -1.5])

array([[0.56509363],
       [0.45825764],
       [0.48426891]])
