# Multilayer Perceptrons

---
When are they going to call them Sigmoid neurons, if ever?

- I need a new definition of perceptron, if they aren't restricted to binary input and output.

## Implement the hidden layer

> Before, we were dealing with only one output node which made the code straightforward. However now that we have multiple input units and multiple hidden units, the weights between them will require two indices: $w_{ij}$ where $i$ denotes input units and $j$ are the hidden units.

- The indices on $w$ are like matrix indices. Nothing more complicated
  - Just where the dimensions are coming from changed :)

Imagine the network:

![image](https://d17h27t6h515a5.cloudfront.net/topher/2017/February/589978f4_network-with-labeled-weights/network-with-labeled-weights.png "Weights are labeled with the __input__ source node and the __hidden layer__ destination node")

It's funny that this notation is like a matrix, because __we store these in matrices__

- That's right, our weights array just became a matrix (at least)
  - Tensors will flow, in due time

Also note:

- Rows will be all weights leading __out__ of a __single Input Node__
  - Depicted by the first index of the matrix
- In the following image, note that everything in a given column will be taken in by a given __hidden node__
  - This is the second index of the matrix

![weighted labeled hidden layer](https://d17h27t6h515a5.cloudfront.net/topher/2017/February/58a49908_multilayer-diagram-weights/multilayer-diagram-weights.png "Relax and let the data flow")

Examining the following code, a few things become apparent:

```python
# Number of records and input units
n_records, n_inputs = features.shape
# Number of hidden units
n_hidden = 2
weights_input_to_hidden = np.random.normal(0, n_inputs**-0.5, size=(n_inputs, n_hidden))
```

1. $n^{\frac {-1} 2} = 1/{n^{\frac 1 2}}  = 1/{\sqrt{n}}$
  - We used the latter in the previous sections.
  - We use the former now. More concise
2. `size=...` is the dimensionality that comes with this being a __matrix__ now
3. In the example, the number of hidden nodes is trivial and appears irrelevant
  - But we know that the layers provide deeper insight...
  - There may be more to this
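
To make the shape and the scale concrete, here's a small sketch. The `features` array is stand-in data I made up (5 records by 3 inputs is an assumption); only the initialization line mirrors the code above.

```python
import numpy as np

# Stand-in data: 5 records, 3 input units (an assumption for illustration)
features = np.zeros((5, 3))

n_records, n_inputs = features.shape
n_hidden = 2

# The two ways of writing the scale agree
scale = n_inputs ** -0.5
same_scale = np.isclose(scale, 1 / np.sqrt(n_inputs))

weights_input_to_hidden = np.random.normal(0, scale, size=(n_inputs, n_hidden))

# One row per input unit, one column per hidden unit
print(weights_input_to_hidden.shape)  # (3, 2)

# w_{ij} in the math is weights[i - 1, j - 1] in code (zero-based indexing)
w_31 = weights_input_to_hidden[2, 0]
```

So `size=(n_inputs, n_hidden)` is exactly the $w_{ij}$ layout: first index picks the input, second picks the hidden unit.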

### Determining the node-input is easy

... right after you get a dot product

Let's use $h_1$ for this

- Its weighted inputs are the dot product of the inputs - $x_1, x_2, x_3$ - and the hidden layer weights for just its column - $w_{11}, w_{21}, w_{31}$
  - See the orange above
  - Important: "__The inputs__" to $h_1$ are a vector
    - As is its column, taken independently

So we get something a little like:

![h1 weighted inputs](https://d17h27t6h515a5.cloudfront.net/topher/2017/January/588ae392_codecogseqn-2/codecogseqn-2.png "Notice this is just h1...")

- And we'd have to do the same thing for $h_2$...

\begin{equation}
h_2 = x_1 w_{12} + x_2 w_{22} + x_3 w_{32}
\end{equation}

- But wouldn't that get a little long-winded?
  - Yes, little Timmy. I believe it would...

So hows about we just take the matrix product (vector x matrix) and get back a vector of the hidden, weighted input values?

- Seems legit. Simple, straightforward
- Do the math in one swing :)

That looks a little something like:

\begin{equation*}
h = x w = \begin{bmatrix}
x_1 & x_2 & x_3
\end{bmatrix} \begin{bmatrix}
w_{11} & w_{12} \\
w_{21} & w_{22} \\
w_{31} & w_{32}
\end{bmatrix}
\end{equation*}

And that outputs a vector of:

\begin{equation*}
\begin{bmatrix}
x \cdot w_{i1} & x \cdot w_{i2}
\end{bmatrix}
\end{equation*}

- Where we let $i$ stand in for the row

So just like we talked about earlier :) Instead of column-wise, one-at-a-time, we just do the matrix maths
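
A quick check that the one-at-a-time dot products and the single matrix product really give the same thing. The numbers in `x` and `w` are made up for illustration; only the shapes (3 inputs, 2 hidden units) follow the notes.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])          # x_1, x_2, x_3
w = np.array([[0.1, 0.4],              # w_11, w_12
              [0.2, 0.5],              # w_21, w_22
              [0.3, 0.6]])             # w_31, w_32

# One hidden unit at a time: dot the inputs with that unit's column
h_1 = np.dot(x, w[:, 0])
h_2 = np.dot(x, w[:, 1])

# All hidden units at once: vector-by-matrix product
h = np.dot(x, w)

print(np.allclose(h, [h_1, h_2]))  # True
```

Each entry of `h` is the dot product of `x` with one column of `w`, which is the whole trick.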

### A word of caution to this tale

You could very well set up the inputs as a column, and transpose the matrix.

- Just be aware that the rows would become the hiddens' inputs, and the columns would be a given node's outputs

That would look something like:

![column-flipped hidden layer](https://d17h27t6h515a5.cloudfront.net/topher/2017/January/588b7c74_inputs-matrix/inputs-matrix.png "Oh the choices available")

---
And for the sake of being unabashedly clear:

Where a vector is an array, in code...

- If you want to multiply __matrix-by-vector__
  - the number of columns in the matrix must equal the number of rows of the (column) vector
    - Like in the above picture
- For __vector-by-matrix__
  - The vector's length must match the number of rows of the matrix
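
A small sketch of how the shapes have to line up in NumPy. The values are arbitrary; the 3-by-2 weight shape is carried over from the earlier example, and `mismatch_raised` is just a flag I added to show the failure case.

```python
import numpy as np

w = np.arange(6.0).reshape(3, 2)   # 3 inputs x 2 hidden units
x = np.array([1.0, 2.0, 3.0])      # length-3 input vector

# vector-by-matrix: len(x) equals the number of rows of w
row_result = np.dot(x, w)                  # shape (2,)

# matrix-by-vector: columns of w.T equal the rows of the column vector
col_result = np.dot(w.T, x[:, None])       # shape (2, 1)

# Shapes that do not line up raise a ValueError
try:
    np.dot(w, x)                           # (3, 2) with (3,) is not aligned
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
```

Both orientations produce the same two numbers; the only difference is whether they come back as a flat array or as a column.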

---
And just some code, for icing on the cake

### Nuances of NumPy vectors

> You see above that sometimes you'll want a column vector, even though by default Numpy arrays work like row vectors. It's possible to get the transpose of an array like so `arr.T`, but for a 1D array, the transpose will return a row vector. Instead, use `arr[:,None]` to create a column vector.

That said, here's what it looks like

In [1]:
import numpy as np

In [2]:
features = np.random.normal(size=3)

In [3]:
print(features)

[-0.14418608 -0.57416225  0.66976266]


In [4]:
print(features.T)

[-0.14418608 -0.57416225  0.66976266]


In [6]:
print(features[:, None])

[[-0.14418608]
 [-0.57416225]
 [ 0.66976266]]


Or you could just tell NumPy to give you a 2-D array, so you can work with it in a matrix-transpository way :P

In [7]:
np.array(features, ndmin=2)

array([[-0.14418608, -0.57416225,  0.66976266]])

In [8]:
np.array(features, ndmin=2).T # there's that column we love

array([[-0.14418608],
       [-0.57416225],
       [ 0.66976266]])

There's a coding test, to implement a 4x3x2 feed-forward network

- We're calling them "hidden" layers, because soon they will be generated programmatically.

The exercise:

```python
import numpy as np

def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1/(1+np.exp(-x))

# Network size
N_input = 4
N_hidden = 3
N_output = 2

np.random.seed(42)
# Make some fake data
X = np.random.randn(4)

weights_input_to_hidden = np.random.normal(0, scale=0.1, size=(N_input, N_hidden))
weights_hidden_to_output = np.random.normal(0, scale=0.1, size=(N_hidden, N_output))


# TODO: Make a forward pass through the network

hidden_layer_in = None
hidden_layer_out = None

print('Hidden-layer Output:')
print(hidden_layer_out)

output_layer_in = None
output_layer_out = None

print('Output-layer Output:')
print(output_layer_out)

```

- I'm a little caught on how to produce the hidden layer's output
  - I would think it's just a mapping of the sigmoid function over the arrays
  - But that feels weird to me

### Quiz reflection

Turns out that "mapping" the function was the right approach, because that's exactly what happens.

- One way to do this is [`np.vectorize`](https://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.vectorize.html), though NumPy documents it as a convenience wrapper (essentially a Python loop), not a performance optimization

```python
# Given that everything is activated by a sigmoid function
activation_func = np.vectorize(sigmoid)
```

At this point the only thing one would question is probably:

- What was the rest of the solution?

__I finally understand the simplicity__

- Just invoke the function (Functional Programming is a straight flush here) for all relevant nodes
  - The activation function __is__ the function for the output
  - If it's a Sigmoid, then pass $h$ as a parameter to that function
  - Boom! There's your output

```python
# input layer
X = np.random.randn(4)

hidden_layer_inputs = np.dot(X, weights_input_to_hidden)
# here's that simplicity
hidden_layer_output = activation_func(hidden_layer_inputs)
```
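
For the rest of the solution, here's my sketch of the full forward pass for the 4x3x2 exercise. It's one plausible way to fill in the `None` placeholders, not necessarily the official answer, and it calls `sigmoid` directly on the arrays, which works because `np.exp` operates element-wise.

```python
import numpy as np

def sigmoid(x):
    """Calculate sigmoid, element-wise for arrays."""
    return 1 / (1 + np.exp(-x))

# Network size, as in the exercise
N_input, N_hidden, N_output = 4, 3, 2

np.random.seed(42)
X = np.random.randn(N_input)

weights_input_to_hidden = np.random.normal(0, scale=0.1, size=(N_input, N_hidden))
weights_hidden_to_output = np.random.normal(0, scale=0.1, size=(N_hidden, N_output))

# Input -> hidden: weighted sum, then activation
hidden_layer_in = np.dot(X, weights_input_to_hidden)
hidden_layer_out = sigmoid(hidden_layer_in)

# Hidden -> output: the same two steps, fed by the hidden activations
output_layer_in = np.dot(hidden_layer_out, weights_hidden_to_output)
output_layer_out = sigmoid(output_layer_in)

print(output_layer_out.shape)  # (2,)
```

The output layer is the hidden layer's recipe all over again, just with the hidden outputs standing in for `X`.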

---
The mathematics here is 

- Recursive in logic
  - Do this algorithm throughout the network, until there are no more "child" nodes
- Iterative in explanation
  - For every column, do this operation *omitted*
  - Pass those results to the next column
  - Repeat

Whereas the code is... iterative.

Just food for thought
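
To chew on that a bit, here's a sketch of the same forward pass written both ways. `forward_iterative` and `forward_recursive` are hypothetical helpers of mine, and the 4x3x2 layer sizes and random weights are only there to have something to run.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def forward_iterative(x, layer_weights):
    """Feed x through each weight matrix in turn."""
    activations = x
    for weights in layer_weights:
        activations = sigmoid(np.dot(activations, weights))
    return activations

def forward_recursive(x, layer_weights):
    """Same computation: handle one layer, then recurse on the rest."""
    if not layer_weights:
        return x  # no more layers left, so this is the output
    first, rest = layer_weights[0], layer_weights[1:]
    return forward_recursive(sigmoid(np.dot(x, first)), rest)

rng = np.random.default_rng(0)
layer_weights = [rng.normal(0, 0.1, size=(4, 3)),
                 rng.normal(0, 0.1, size=(3, 2))]
x = rng.normal(size=4)

out_iter = forward_iterative(x, layer_weights)
out_rec = forward_recursive(x, layer_weights)
print(np.allclose(out_iter, out_rec))  # True
```

Same numbers either way; the loop is just the recursion unrolled.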

---
I imagine we're not very far at all from just inlining the input calculation.

- But wait. That doesn't make any sense.
  - Need the inputs available to calculate the errors...

Shoot. Almost.

- Though I'm not really a fan of inlining :P

---
Turns out you can just call a function on an `np.array` and it will behave as you expect it to.

- Didn't, necessarily, need to "map" via `vectorize`
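
A tiny check of that, comparing the direct call against the `np.vectorize` version. The input values in `h` are arbitrary.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

h = np.array([-1.0, 0.0, 2.0])

direct = sigmoid(h)                  # np.exp works element-wise on the array
mapped = np.vectorize(sigmoid)(h)    # calls sigmoid once per element

print(np.allclose(direct, mapped))   # True
```

Since `sigmoid` is built entirely from operations NumPy already applies element-wise, the extra wrapper buys nothing here.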

# Backpropagation