# Linear Layers in PyTorch

### Introduction

Ok, so in previous lessons, we saw that we can write the linear function for a single neuron with the dot product.  Specifically, we rewrote our linear function:

$z(x) = w_1x_1 + w_2x_2 + b$ as: 

$z(x) =  \begin{bmatrix}
w_1 & w_2 \\
\end{bmatrix}\cdot \begin{bmatrix}
x_1 \\
x_2 
\end{bmatrix} + b = (w \cdot x + b)$

And remember that if we pass in the features of an observation, like the features of a cancer cell, and the linear function returns a positive number, then our neuron fires.

In [4]:
import torch
# cell_area 2, cell_concavity 1
x = torch.tensor([2, 1])
             
w = torch.tensor([3, 4])

b = -8

w.dot(x) + b
# z = 3*2 + 4*1 - 8 = 2 -> neuron fires

tensor(2)

Now as we'll see in this lesson, when we work with a neural network, we don't just have a single neuron making a prediction but rather have many neurons making many different assessments.

### Building a layer

So why have multiple neurons in a layer?  Well, we can think of each neuron as being in charge of making a different assessment about an observation. For example, in our neural network that determines if cells are cancerous, we may have one neuron that determines if a cell is overly large, and another neuron that determines if a cell has an abnormal shape.  This sort of separation of concerns is a big part of where a neural network gets its power.  

So we'll need to create neurons that operate side by side.  And this time, our neurons will not ultimately predict whether or not a cell is cancerous, but rather perform a sort of division of labor to assess different components of each observation.  Ok, let's get started.

We'll start with our potentially cancerous cell, represented by the following feature vector $x$.

In [1]:
import torch
# area, perimeter, number concave points, symmetry error  
x = torch.tensor([4, 2, 3, 2]).float()

And as we can see, the first two elements (4 and 2) describe the size of the cell, and the last two elements (3 and 2) describe the cell's shape. 

And let's say that we have a neuron that focuses on determining the cell's size, and looks something like the following:    

In [4]:
# area, perimeter, number concave points, symmetry error  
w_size = torch.tensor([2, 3, -.5, 0])

b_size = -11.0

Notice a couple things from the vector `w_size` above.  First, we can see that even though the neuron is only in charge of determining a cell's size, it still observes **all** of the features from our observation.  But we can also see that its weights have larger absolute values for the first two elements (2 and 3), which correspond to the cell's size.  

This means that a change to the cell's size (like the perimeter or area) will have a large impact on whether the neuron fires, whereas a change to the shape will have less impact, as the associated weights are only $-.5$ and $0$.
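To make that sensitivity concrete, here is a small sketch using the cell and size neuron defined above. The two perturbed cells (`bigger_area` and `more_concave`) are hypothetical and only illustrate how a one-unit change in a feature moves the neuron's output.

```python
import torch

# Values from the lesson: one cell and the size neuron.
# area, perimeter, number concave points, symmetry error
x = torch.tensor([4., 2., 3., 2.])
w_size = torch.tensor([2., 3., -.5, 0.])
b_size = -11.0

z_size = x.dot(w_size) + b_size
print(z_size)  # tensor(1.5000) -> positive, so the size neuron fires

# Hypothetical perturbations: add 1 to a single feature and recompute.
bigger_area = x + torch.tensor([1., 0., 0., 0.])
more_concave = x + torch.tensor([0., 0., 1., 0.])

change_from_area = bigger_area.dot(w_size) + b_size - z_size
change_from_concavity = more_concave.dot(w_size) + b_size - z_size

print(change_from_area)       # tensor(2.)
print(change_from_concavity)  # tensor(-0.5000)
```

Each change equals the weight attached to the feature that moved, which is why the size features dominate this neuron's output.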

### A second neuron

Now let's say we have a second neuron, this one in charge of determining whether the cell has an irregular shape (as opposed to size, as with our neuron above).  Our observation will still be the same cell, and so has the same four features:

In [7]:
# area, perimeter, number concave points, symmetry error  
x = torch.tensor([4., 2., 3., 2.]).float()

But this time, take a look at our new `w_shape` neuron below.

In [55]:
# area, perimeter, number concave points, symmetry error  
w_shape = torch.tensor([0, -.5, 3, 1.5])

b_shape = -14

This time, the last two weights have a larger absolute value, making the neuron more sensitive to shape.

> The neuron also has its own bias for assessing shape, which is $-14$.

In [61]:
x.dot(w_shape) + b_shape

tensor(-3.)

So this second neuron predicts whether or not the cell is misshapen, and here it outputs a negative number, assessing that the cell is not misshapen.

### Thinking with multiple neurons

There is one main point to take from the above discussion. 

#### Different weights, same input

We use the **same feature vector** $x$ as the input to each neuron.  The two neurons simply weigh these inputs differently.

$z_{size}(x) = \begin{bmatrix}
x_1 & x_2 & x_3 & x_4
\end{bmatrix}\cdot \begin{bmatrix}
2 \\
3 \\
-.5 \\
0
\end{bmatrix} - 11 = 1.5$

$z_{shape}(x) = \begin{bmatrix}
x_1 & x_2 & x_3 & x_4
\end{bmatrix}\cdot \begin{bmatrix}
0 \\
-.5 \\
3 \\
1.5
\end{bmatrix} - 14 = -3$

> So above, that feature vector $\begin{bmatrix}
x_1 & x_2 & x_3 & x_4 \\
\end{bmatrix}$ represents a single cell with one set of values, and that feature vector is going to two different neurons, for one neuron to make a determination about size, and the other to make a determination about shape.

The diagram below makes the same point.  The diagram represents the first layer of a neural network, and notice: the attributes of an observation $x_1$ through $x_4$, are fed to each of the four neurons in the layer (each neuron is a white circle).  That's why each feature has a line drawn to each neuron.  

<img src="./first_layer_2.svg" width="30%">

So just to recap, every neuron in a layer receives the same inputs.  And each neuron just has separate weights and biases.
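As a quick illustration of that recap, the sketch below feeds the same `x` to both neurons from this lesson. The `neurons` dictionary is just a hypothetical container for this example, not something PyTorch provides.

```python
import torch

# area, perimeter, number concave points, symmetry error
x = torch.tensor([4., 2., 3., 2.])

# (weights, bias) for each neuron, using the values defined earlier
neurons = {
    "size": (torch.tensor([2., 3., -.5, 0.]), -11.0),
    "shape": (torch.tensor([0., -.5, 3., 1.5]), -14.0),
}

# every neuron sees the same x, but applies its own weights and bias
outputs = {name: (x.dot(w) + b).item() for name, (w, b) in neurons.items()}
print(outputs)  # {'size': 1.5, 'shape': -3.0}
```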

## Making it brief

Now that we understand that each neuron receives the same inputs, let's see if we can condense our representation of our neurons.  To do this, let's start by removing the biases of our two neurons.  This gives us:

$g_{size}(x) = \begin{bmatrix}
4 & 2 & 3 & 2
\end{bmatrix}\cdot \begin{bmatrix}
2 \\
3 \\
-.5 \\
0
\end{bmatrix} = 12.5$

$g_{shape}(x) = \begin{bmatrix}
4 & 2 & 3 & 2
\end{bmatrix}\cdot \begin{bmatrix}
0 \\
-.5 \\
3 \\
1.5
\end{bmatrix}= 11$

Next, observe that if we combine these two weight vectors into a *matrix*, then multiplying our feature vector $x$ by our weight matrix gives us the following: 

$\begin{bmatrix}
4 & 2 & 3 & 2 \end{bmatrix} \cdot
\begin{bmatrix}
2 & 0\\
3 & -.5 \\
-.5 & 3 \\
0 & 1.5\end{bmatrix} = \begin{bmatrix}
12.5 & 11 \end{bmatrix}$

Or more generally: 

$\begin{bmatrix}
- & x &  -  
\end{bmatrix} \cdot \begin{bmatrix}
|  & |  \\
w_{1}  & w_{2} \\
|   & |
\end{bmatrix}  = \begin{bmatrix}
x \cdot w_{1} & x \cdot w_{2} \end{bmatrix}  = \begin{bmatrix} g_{1}(x) & g_{2}(x) \end{bmatrix}$

Let's verify this in code.  We'll start by placing our weight vectors in a weight matrix $W$, with one column per neuron. 

In [8]:
# area, perimeter, number concave points, symmetry error
w_size = torch.tensor([2, 3, -.5, 0])
w_shape = torch.tensor([0, -.5, 3, 1.5])

# stack the weight vectors as rows, then transpose so each neuron is a column
W = torch.stack((w_size, w_shape), dim = 0).T
W

tensor([[ 2.0000,  0.0000],
        [ 3.0000, -0.5000],
        [-0.5000,  3.0000],
        [ 0.0000,  1.5000]])

And then we'll multiply the attributes of our observation $x$ by the weights of these neurons with the `@` operator like so:

In [100]:
result = x @ W
result

tensor([12.5000, 11.0000])

So we have just seen that we can use matrix multiplication to calculate the weighted sum of multiple neurons.  

However, we still have not included the biases.  To complete our linear function, we need to add the bias of $-11$ for $z_{size}$ and the bias of $-14$ for $z_{shape}$.  We do so by placing the two biases into a vector.

In [18]:
b = torch.tensor([-11., -14.])

In [103]:
x @ W + b

tensor([ 1.5000, -3.0000])

Or summarizing the above, we can calculate the outputs of both linear functions with:

$\begin{bmatrix}
- & x &  -  
\end{bmatrix} \cdot \begin{bmatrix}
|  & |  \\
w_1  & w_2 \\
|   & |
\end{bmatrix} + \begin{bmatrix} b_1 & b_2 \end{bmatrix} = \begin{bmatrix}
x \cdot w_1 & x \cdot w_2 \end{bmatrix} + \begin{bmatrix} b_1 & b_2 \end{bmatrix} = \begin{bmatrix} z_1(x) & z_2(x) \end{bmatrix}$

Which we can summarize as:

$z = x\cdot W  + b$

Where $z$ is a vector consisting of the output of each neuron's linear function.

In [74]:
x @ W + b

tensor([ 1.5000, -3.0000])
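PyTorch also packages this computation as `torch.nn.Linear`. The lesson itself does not use that class, so treat the following as a sketch of how the two relate rather than as the lesson's method. One detail to note: `nn.Linear` stores its weight with shape `(out_features, in_features)`, so each neuron is a row there, and we copy in the transpose of our $W$.

```python
import torch
from torch import nn

# values from this lesson
x = torch.tensor([4., 2., 3., 2.])
W = torch.tensor([[2., 0.],
                  [3., -.5],
                  [-.5, 3.],
                  [0., 1.5]])
b = torch.tensor([-11., -14.])

# four input features, two neurons
layer = nn.Linear(in_features=4, out_features=2)

# overwrite the randomly initialized parameters with our own values
with torch.no_grad():
    layer.weight.copy_(W.T)  # nn.Linear keeps one neuron per row
    layer.bias.copy_(b)

z_manual = x @ W + b
z_layer = layer(x).detach()

print(z_manual)                           # tensor([ 1.5000, -3.0000])
print(torch.allclose(z_manual, z_layer))  # True
```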

> Your turn

Now consider that we have two different neurons that detect size and shape, along with a new observation $x$.  The neurons have the following weights.

In [104]:
import torch

x = torch.tensor([2, 4, 3, 1])

w_size = torch.tensor([2, 0, 1, 2])
w_shape = torch.tensor([1, 1, 2, 0])

W_new = torch.stack((w_size, w_shape), dim = 0).T
W_new

tensor([[2, 1],
        [0, 1],
        [1, 2],
        [2, 0]])

Try to calculate the output of $x \cdot W_{new}$ by hand, using the $x$ defined above.

In [None]:
# write your answer here



Check your answer against the Answers section at the end of this lesson.

## Including the activation layer

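And we can calculate the hypothesis made by each neuron in a layer by applying the sigmoid function to the output of the linear function:

In [19]:
# our original cell, along with the W and b defined above
x = torch.tensor([4., 2., 3., 2.])
z = x @ W + b
z

tensor([ 1.5000, -3.0000])

In [22]:
torch.sigmoid(z)

tensor([0.8176, 0.0474])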

Or mathematically, we can write our layer as the following:

$ \sigma(x \cdot W + b) $

Where sigma is applied to each entry of the vector resulting from $x \cdot W + b$:

$\sigma (x \cdot W + b) = \begin{bmatrix} \sigma(z_1) & \sigma(z_2) \end{bmatrix}$

We can also express the above formula as two layers of a neural network, a linear layer and an activation layer:

$z = (x \cdot W + b)$

$a = \sigma(z)$

Where $z$ is our linear layer and $a$ is our activation layer.
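To see that the activation really is applied entry by entry, the sketch below writes the sigmoid out by hand as $\sigma(z) = \frac{1}{1 + e^{-z}}$ and compares it with `torch.sigmoid`. The variable name `a_manual` is only for illustration.

```python
import torch

# values from this lesson
x = torch.tensor([4., 2., 3., 2.])
W = torch.tensor([[2., 0.],
                  [3., -.5],
                  [-.5, 3.],
                  [0., 1.5]])
b = torch.tensor([-11., -14.])

z = x @ W + b         # linear layer
a = torch.sigmoid(z)  # activation layer

# sigmoid written out by hand; the arithmetic applies to each entry of z
a_manual = 1 / (1 + torch.exp(-z))

print(a)                            # tensor([0.8176, 0.0474])
print(torch.allclose(a, a_manual))  # True
```

The size neuron's output of $1.5$ maps to about $.82$, while the shape neuron's output of $-3$ maps to about $.05$.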

### Summary

In this lesson we saw the components to build a layer of a neural network.  A single layer is a combination of a weighted input and a sigmoid activation function.  

The weighted input can be represented by $x \cdot W + b$:



$\begin{bmatrix}
- & x &  -  
\end{bmatrix} \cdot \begin{bmatrix}
|  & |  \\
w_1  & w_2 \\
|   & |
\end{bmatrix} + \begin{bmatrix} b_1 & b_2 \end{bmatrix} = \begin{bmatrix}
x \cdot w_1 & x \cdot w_2 \end{bmatrix} + \begin{bmatrix} b_1 & b_2 \end{bmatrix} = \begin{bmatrix} z_1(x) & z_2(x) \end{bmatrix}$

* The row vector $x$ represents the features of a single observation.
* Each column of the matrix $W$ contains the weights of a separate neuron, with the entries of $b$ as the corresponding biases.

The output of the weighted input is fed into the activation function, which applies an entrywise operation.  Here, we use the sigmoid function.  So we can summarize the operations of our entire layer as:

$\sigma (x \cdot W + b) = \begin{bmatrix} \sigma(z_1) & \sigma(z_2) \end{bmatrix}$

Or we can break up the above as a linear layer $z$ and an activation layer $a$ where:

$z = (x \cdot W + b)$

$a = \sigma(z)$

<center>
<a href="https://www.jigsawlabs.io/free" style="position: center"><img src="./jigsaw-icon.png" width="15%" style="text-align: center"></a>
</center>

### Answers

In [6]:
x @ W_new

tensor([ 9, 12])