# Stacking Layers

### Introduction

Previously, we saw a single layer.

<img src="./first-layer.png" width="20%">

$z = x\cdot W  + b = \begin{bmatrix}
- & x &  -  
\end{bmatrix} \cdot \begin{bmatrix}
|  & |  \\
w_1  & w_2 \\
|   & |
\end{bmatrix} + \begin{bmatrix} b_1 & b_2 \end{bmatrix} = \begin{bmatrix}
x \cdot w_1 & x \cdot w_2 \end{bmatrix} + \begin{bmatrix} b_1 & b_2 \end{bmatrix} = \begin{bmatrix} z_1(x) & z_2(x) \end{bmatrix}$

$A(z) = \sigma (x \cdot W + b) = \begin{bmatrix} \sigma(z_1) & \sigma(z_2) \end{bmatrix}$

In this lesson, we'll begin to see how we can take the outputs from our linear layer and activation layer, and feed this output into yet another layer.

<img src="./artificial-network.png" width="50%">

The goal is an information flow in which later layers make more abstract determinations.

### The rules of matrix multiplication

Understanding how data passes from one layer to another is really a matter of understanding a math problem.  To solve that problem, we just need to be a little more familiar with matrix multiplication, and with how, by looking at the shapes in one layer, we can predict what will pass to the next layer.



Let's start again with a single layer of two neurons:

In [1]:
import torch

w_size = torch.tensor([1, 3, -.5, 0])
w_shape = torch.tensor([0, -.5, 3, 1.5])

W = torch.stack((w_size, w_shape), dim = 0).T
W

tensor([[ 1.0000,  0.0000],
        [ 3.0000, -0.5000],
        [-0.5000,  3.0000],
        [ 0.0000,  1.5000]])

And we have our feature vector representing a single observation:

In [2]:
# area, perimeter, number concave points, symmetry error  
x_1 = torch.tensor([[2, 4, 3, 2]]).float()

And we can start by calculating the weighted sum.

$g(x) = x\cdot W = \begin{bmatrix}
- & x_1 &  -  
\end{bmatrix} \cdot \begin{bmatrix}
|  & |  \\
w_1  & w_2 \\
|   & |
\end{bmatrix} $

In [3]:
weighted_sum = x_1 @ W 
weighted_sum

tensor([[12.5000, 10.0000]])

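* Focus on the dimensions: the inner dimensions (4 and 4) match, and the outer dimensions (1 and 2) give the shape of the output.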

In [31]:
x_1.shape, W.shape

(torch.Size([1, 4]), torch.Size([4, 2]))
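The shape rule can be checked directly. The snippet below is a minimal sketch rather than part of the original notebook: it uses a random weight matrix (the values don't matter for shapes) and shows that reversing the order of the product, so that the inner dimensions no longer match, raises an error.

```python
import torch

x = torch.tensor([[2., 4., 3., 2.]])   # one observation, four features: shape (1, 4)
W = torch.rand(4, 2)                    # four weights for each of two neurons: shape (4, 2)

# Inner dimensions match (4 and 4), so the product is defined;
# the outer dimensions (1 and 2) give the shape of the result.
out_shape = (x @ W).shape
print(out_shape)

# Swapping the order makes the inner dimensions 2 and 1, which do not match.
try:
    W @ x
    mismatch_raised = False
except RuntimeError as err:
    mismatch_raised = True
    print("shape mismatch:", err)
```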

* And then we add the bias vector, which has one entry for each neuron.

In [35]:
b = torch.tensor([-12, -7])

In [38]:
x_1 @ W + b

tensor([[0.5000, 3.0000]])
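Because the bias has one entry per neuron, the same addition works when several observations are stacked as rows. Here is a small sketch of that, assuming a second observation whose values are made up purely for illustration; the bias is broadcast across both rows.

```python
import torch

W = torch.tensor([[1., 0.], [3., -.5], [-.5, 3.], [0., 1.5]])
b = torch.tensor([-12., -7.])

# Two observations stacked as rows; the second row is invented for this example.
X = torch.tensor([[2., 4., 3., 2.],
                  [1., 0., 2., 4.]])

# (2, 4) @ (4, 2) -> (2, 2); b has shape (2,) and is added to every row.
Z = X @ W + b
print(Z.shape)
print(Z)
```

The first row reproduces the result above, and the output keeps one row per observation and one column per neuron.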

When we move on to the activation layer and apply the sigmoid function, our dimensions stay the same.  This is because the sigmoid function is applied separately to each entry of $z = x \cdot W + b$:

$\sigma (x \cdot W + b) = \begin{bmatrix} \sigma(z_1) & \sigma(z_2) \end{bmatrix}$

In [42]:
z = x_1 @ W + b
z

tensor([[0.5000, 3.0000]])

In [43]:
torch.sigmoid(z)

tensor([[0.6225, 0.9526]])
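To see why the shape is unchanged, we can write the sigmoid formula, $\sigma(z) = \frac{1}{1 + e^{-z}}$, out by hand. This is only a sketch for comparison with `torch.sigmoid`; every operation in it acts on each entry independently.

```python
import torch

z = torch.tensor([[0.5, 3.0]])

# The sigmoid formula written out; each operation here acts entry by entry.
manual = 1 / (1 + torch.exp(-z))

same_values = torch.allclose(manual, torch.sigmoid(z))
same_shape = manual.shape == z.shape
print(manual, same_values, same_shape)
```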

### Why it matters

This matters because we'll need to build our multiple layers by passing the output from one layer as the input to another layer.  Let's see this visually, and then we'll try to better understand it through code.  

<img src="./big_layers.svg" width="40%">

* Translate to code

1. From inputs to first layer

In [44]:
# area, perimeter, number concave points, symmetry error, darkness, contrast
x_2 = torch.tensor([[2, 4, 3, 2, 3, 5]]).float()

In [45]:
x_2.shape

torch.Size([1, 6])

In [46]:
# Weight Matrix 2
import torch

w_size = torch.tensor([1, 3, -.5, 0, .1, 1.3])
w_shape = torch.tensor([0, -.5, 3., 1.5, .5, 1.])
w_smoothness = torch.tensor([1.5, .5, 2, 1.5, .8, 1.5])
w_color = torch.tensor([.1, -.5, .3, 0, 4, -2])

W_1 = torch.stack((w_size, w_shape, w_smoothness, w_color), dim = 0).T
W_1

tensor([[ 1.0000,  0.0000,  1.5000,  0.1000],
        [ 3.0000, -0.5000,  0.5000, -0.5000],
        [-0.5000,  3.0000,  2.0000,  0.3000],
        [ 0.0000,  1.5000,  1.5000,  0.0000],
        [ 0.1000,  0.5000,  0.8000,  4.0000],
        [ 1.3000,  1.0000,  1.5000, -2.0000]])

In [47]:
W_1.shape

torch.Size([6, 4])

And we'll need a bias vector of length 4, one for each neuron.

In [48]:
b_1 = torch.tensor([-3, -12, -15, -4])
b_1

tensor([ -3, -12, -15,  -4])

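In [5]:
# linear layer: (1, 6) @ (6, 4) + (4,) -> (1, 4)
z = x_2 @ W_1 + b_1
z

In [6]:
# activation layer: the sigmoid keeps the (1, 4) shape
A_1 = torch.sigmoid(z)
A_1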

2. From output of first layer to second layer

In [55]:
w_large_dark = torch.tensor([1, 3, -.5, 0])
w_dark_irregular = torch.tensor([0, -.5, 3., 1.5])
w_large_irregular = torch.tensor([1.5, .5, 2, 1.5])

W_2 = torch.stack((w_large_dark, w_dark_irregular, w_large_irregular), dim = 0).T
W_2

tensor([[ 1.0000,  0.0000,  1.5000],
        [ 3.0000, -0.5000,  0.5000],
        [-0.5000,  3.0000,  2.0000],
        [ 0.0000,  1.5000,  1.5000]])

In [56]:
b_2 = torch.tensor([-4, -5, -2])

And we take the output from the previous layer, $A_1$, and pass it to our linear layer, followed by our activation layer.

In [57]:
Z_2 = A_1 @ W_2 + b_2
Z_2

tensor([[-0.5329, -2.4167,  2.0725]])

In [58]:
A_2 = torch.sigmoid(Z_2)
A_2

tensor([[0.3698, 0.0819, 0.8882]])
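The two steps can also be collected into one self-contained sketch. The `layer` helper below is a hypothetical convenience function, not something defined elsewhere in this lesson; the weights and biases are the ones shown above.

```python
import torch

def layer(x, W, b):
    """Hypothetical helper: one linear layer followed by a sigmoid activation."""
    return torch.sigmoid(x @ W + b)

x_2 = torch.tensor([[2., 4., 3., 2., 3., 5.]])

W_1 = torch.tensor([[ 1.0,  0.0, 1.5,  0.1],
                    [ 3.0, -0.5, 0.5, -0.5],
                    [-0.5,  3.0, 2.0,  0.3],
                    [ 0.0,  1.5, 1.5,  0.0],
                    [ 0.1,  0.5, 0.8,  4.0],
                    [ 1.3,  1.0, 1.5, -2.0]])
b_1 = torch.tensor([-3., -12., -15., -4.])

W_2 = torch.tensor([[ 1.0,  0.0, 1.5],
                    [ 3.0, -0.5, 0.5],
                    [-0.5,  3.0, 2.0],
                    [ 0.0,  1.5, 1.5]])
b_2 = torch.tensor([-4., -5., -2.])

A_1 = layer(x_2, W_1, b_1)   # (1, 6) -> (1, 4)
A_2 = layer(A_1, W_2, b_2)   # (1, 4) -> (1, 3)
print(A_1.shape, A_2.shape)
print(A_2)
```

The output of each call has as many columns as the layer has neurons, which is exactly the number of rows the next weight matrix must have.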

### Making it Predictable

Ok, so let's draw some conclusions about neural networks based on what we saw above.

> We'll use the image below as an example:

<img src="./big_layers.svg" width="30%">

In [72]:
x_2.shape

torch.Size([1, 6])

And we can feed it into a neural network that looks like the following:

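In [13]:
import torch.nn as nn

net = nn.Sequential(
    nn.Linear(6, 4),   # 6 input features -> 4 neurons
    nn.Sigmoid(),      # activation keeps the 4 outputs
    nn.Linear(4, 3)    # 4 inputs -> 3 neurons
)

net

Sequential(
  (0): Linear(in_features=6, out_features=4, bias=True)
  (1): Sigmoid()
  (2): Linear(in_features=4, out_features=3, bias=True)
)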

In [89]:
net(x_2)  # weights are randomly initialized, so the exact values will vary

tensor([[-0.2261, -0.3032, -0.9666]], grad_fn=<AddmmBackward>)

### Summary

Two rules of matrix multiplication:

1. Inner dimensions must be equal
2. Outer dimensions determine the shape of the output of matrix multiplication
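
Applied to the network in this lesson, the two rules give the following chain of shapes. This is a worked restatement using the $W_1$ and $W_2$ from above; adding the bias and applying the sigmoid leave each shape unchanged.

$\underbrace{x}_{1 \times 6} \cdot \underbrace{W_1}_{6 \times 4} \rightarrow \underbrace{A_1}_{1 \times 4}, \qquad \underbrace{A_1}_{1 \times 4} \cdot \underbrace{W_2}_{4 \times 3} \rightarrow \underbrace{Z_2}_{1 \times 3}$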

In this lesson, we learned how to build a neural network with multiple layers.  We can imagine that the earlier layers make more concrete assessments, like assessing specific features, and pass these outputs to later layers to make more abstract assessments.

<img src="./big_layers.svg" width="20%">

We saw that under the hood, this works through matrix algebra.  In our example above, a single observation with six features is multiplied by the six weights of each of four neurons, which results in one output from each neuron.  These four outputs are passed to the sigmoid layer, still resulting in four outputs.  Then these four outputs are passed to the second layer of three neurons, each with four weights, so the final layer produces three outputs.  We saw this in PyTorch with the following:

In [91]:
net = nn.Sequential(
    nn.Linear(6, 4),
    nn.Sigmoid(),
    nn.Linear(4, 3)
)

x_2 # tensor([[2., 4., 3., 2., 3., 5.]])

net(x_2)

tensor([[-0.8253, -0.5326, -0.6160]], grad_fn=<AddmmBackward>)
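
To connect `nn.Sequential` back to the matrix algebra, the sketch below pulls the weights out of the network and repeats the calculation by hand. It relies on the fact that `nn.Linear` stores its weight with shape `(out_features, in_features)`, so the weight is transposed before multiplying. The seed is an arbitrary choice, added only so that reruns give the same random weights.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed so reruns give the same random weights

net = nn.Sequential(
    nn.Linear(6, 4),
    nn.Sigmoid(),
    nn.Linear(4, 3),
)
x_2 = torch.tensor([[2., 4., 3., 2., 3., 5.]])

# nn.Linear stores its weight as (out_features, in_features),
# so we transpose it to multiply the observation from the left.
W_1, b_1 = net[0].weight.T, net[0].bias   # shapes (6, 4) and (4,)
W_2, b_2 = net[2].weight.T, net[2].bias   # shapes (4, 3) and (3,)

manual = torch.sigmoid(x_2 @ W_1 + b_1) @ W_2 + b_2
matches = torch.allclose(manual, net(x_2), atol=1e-5)
print(manual.shape, matches)
```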

<center>
<a href="https://www.jigsawlabs.io/free" style="position: center"><img src="./jigsaw-icon.png" width="15%" style="text-align: center"></a>
</center>