### Why this matters

In a neural network, we generally don't work with a single neuron on its own, but rather with a *layer* of multiple neurons.  In fact, we already saw layers of neurons when we used PyTorch to predict our images, in case you've forgotten.

In [11]:
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.W1 = nn.Linear(28*28, 64) # all of these are linear layers
        self.W2 = nn.Linear(64, 64)
        self.W3 = nn.Linear(64, 64)
        self.W4 = nn.Linear(64, 10)

Each `nn.Linear` is PyTorch's way of creating multiple neurons side by side in a linear layer.  The first argument specifies the number of input features that every neuron in the layer receives, and the second argument specifies the number of neurons to create.
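
To make the two arguments concrete, here is a minimal sketch that inspects the shapes of the first layer from the network above. The variable name `layer` is only for illustration. PyTorch stores one row of weights per neuron, so the weight matrix has shape (neurons, features).

```python
import torch.nn as nn

# 28*28 = 784 input features, 64 neurons (same sizes as W1 above)
layer = nn.Linear(28 * 28, 64)

# One row of 784 weights for each of the 64 neurons
print(layer.weight.shape)  # torch.Size([64, 784])

# One bias term per neuron
print(layer.bias.shape)    # torch.Size([64])
```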

For example, if we want to create a layer with only one neuron, where that neuron takes in four features, we can do so with the following.

In [22]:
W = nn.Linear(4, 1)

Under the hood, this neuron looks just like the neuron we built when learning about the dot product.

In [28]:
dict(W._parameters)

{'weight': Parameter containing:
 tensor([[-0.2594, -0.3593,  0.2731,  0.4234, -0.1417]], requires_grad=True),
 'bias': Parameter containing:
 tensor([-0.2888], requires_grad=True)}

Notice that the weight tensor holds a single row for our single neuron, with one weight per input feature, and then at the end there's a bias term of a single number.  Just like we saw before.
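
As a quick check that a one-neuron layer really is a dot product plus a bias, the sketch below compares the two. The input `x` and the names `neuron`, `by_hand`, and `from_layer` are hypothetical, chosen only for this illustration. The weights are randomly initialized, so the actual numbers will differ from run to run.

```python
import torch
import torch.nn as nn

neuron = nn.Linear(4, 1)                 # one neuron, four input features
x = torch.tensor([1.0, 2.0, 3.0, 4.0])   # hypothetical input with four features

with torch.no_grad():
    # The neuron computed by hand: dot product of weights and features, plus bias
    by_hand = x.dot(neuron.weight[0]) + neuron.bias[0]
    # The same neuron computed by calling the layer
    from_layer = neuron(x)[0]

print(torch.allclose(by_hand, from_layer, atol=1e-6))
```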

And if we create two neurons, this means there will be two rows of weights (one row per neuron) and two bias terms.

In [29]:
W = nn.Linear(4, 2)

In [30]:
W._parameters

OrderedDict([('weight',
              Parameter containing:
              tensor([[-0.1189,  0.1297, -0.0730,  0.3075],
                      [-0.4594,  0.1429,  0.3236, -0.0862]], requires_grad=True)),
             ('bias',
              Parameter containing:
              tensor([-0.1429, -0.1153], requires_grad=True))])
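
With two neurons, passing in one observation gives back two numbers, one per neuron. Here is a small sketch under the assumption of a hypothetical four-feature input `x`. Because PyTorch stores the weights as (neurons, features), we transpose them to compute the product by hand.

```python
import torch
import torch.nn as nn

layer = nn.Linear(4, 2)                  # two neurons, four input features each
x = torch.tensor([1.0, 2.0, 3.0, 4.0])   # hypothetical input

with torch.no_grad():
    z = layer(x)                                  # output of the layer
    z_by_hand = x @ layer.weight.T + layer.bias   # same calculation, written out

print(z.shape)  # torch.Size([2])
print(torch.allclose(z, z_by_hand, atol=1e-6))
```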

## Including the activation layer

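And we can calculate the hypothesis made by each neuron in a layer by passing the linear output through the sigmoid function.  Here `x`, `W`, and `b` are NumPy arrays holding the features, the weight matrix, and the bias vector: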

In [75]:
sigmoid(x.dot(W) + b)

array([0.62245933, 0.81757448])

Or mathematically, we can write our layer as the following:

$ \sigma(x \cdot W + b) $

Where $\sigma$ is applied to each entry of the vector resulting from $x \cdot W + b$, whose entries we call $z_1$ and $z_2$:

$\sigma (x \cdot W + b) = \begin{bmatrix} \sigma(z_1) & \sigma(z_2) \end{bmatrix}$
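
The formula can be sketched in NumPy as follows. The values of `x`, `W`, and `b` here are hypothetical, picked so that the linear output works out to $z = [0.5, 1.5]$. The `sigmoid` helper is written out as an assumption about how it was defined earlier.

```python
import numpy as np

def sigmoid(z):
    # Applied elementwise when z is an array
    return 1 / (1 + np.exp(-z))

x = np.array([1.0, 2.0])        # hypothetical observation with two features
W = np.array([[0.1, 0.3],       # hypothetical weights: one column per neuron
              [0.2, 0.5]])
b = np.array([0.0, 0.2])        # hypothetical bias, one entry per neuron

z = x.dot(W) + b                # linear output, approximately [0.5, 1.5]
a = sigmoid(z)                  # approximately [0.6225, 0.8176]
print(a)
```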

We can also express the formula above as two layers of a neural network: our linear layer followed by an activation layer.  We can write this as the following:

$z = (x \cdot W + b)$

$a = \sigma(z)$

Where $z$ is the output of our linear layer and $a$ is the output of our activation layer.
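
In PyTorch, these two steps map onto two calls. The following is a minimal sketch, assuming a hypothetical four-feature input `x` and using `torch.sigmoid` as the activation. The weights are random, so the printed values will vary.

```python
import torch
import torch.nn as nn

linear = nn.Linear(4, 2)                 # linear layer: two neurons, four features
x = torch.tensor([1.0, 2.0, 3.0, 4.0])   # hypothetical input

with torch.no_grad():
    z = linear(x)          # z = x . W + b
    a = torch.sigmoid(z)   # a = sigma(z), applied to each entry of z

print(z)
print(a)  # each entry lies between 0 and 1
```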