In [2]:
from rich import print

%load_ext rich

# Chapter 2: Coding our First Neurons

In [1]:
import numpy as np
import matplotlib.pyplot as plt

## A single neuron

In [5]:
inputs = [1, 2, 3]
weights = [0.2, 0.8, -0.5]
bias = 2

outputs = (
    inputs[0] * weights[0] + inputs[1] * weights[1] + inputs[2] * weights[2] + bias
)

print(outputs)


Here, we define a single neuron that takes in three inputs and produces a single output, defined by the equation:

$$
y = w_1 \cdot x_1 + w_2 \cdot x_2 + w_3 \cdot x_3 + b
$$

where $w_1, w_2, w_3$ are the weights, $x_1, x_2, x_3$ are the inputs, and $b$ is the bias.
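
As a quick check, plugging the values from the cell above into this equation gives:

$$
y = 0.2 \cdot 1 + 0.8 \cdot 2 + (-0.5) \cdot 3 + 2 = 0.2 + 1.6 - 1.5 + 2 = 2.3
$$

which should match the printed output, up to floating-point rounding.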

## A layer of neurons

A layer of neurons is a group of neurons that all receive the same inputs, but each produces its own output. The outputs differ because each neuron in the layer has its own tunable parameters (a set of weights and a bias), which are independent of the parameters of the other neurons in the layer.

In [7]:
inputs = [1, 2, 3, 2.5]

weights_1 = [0.2, 0.8, -0.5, 1.0]
weights_2 = [0.5, -0.91, 0.26, -0.5]
weights_3 = [-0.26, -0.27, 0.17, 0.87]

bias_1 = 2
bias_2 = 3
bias_3 = 0.5

output_1 = (
    inputs[0] * weights_1[0]
    + inputs[1] * weights_1[1]
    + inputs[2] * weights_1[2]
    + inputs[3] * weights_1[3]
    + bias_1
)

output_2 = (
    inputs[0] * weights_2[0]
    + inputs[1] * weights_2[1]
    + inputs[2] * weights_2[2]
    + inputs[3] * weights_2[3]
    + bias_2
)

output_3 = (
    inputs[0] * weights_3[0]
    + inputs[1] * weights_3[1]
    + inputs[2] * weights_3[2]
    + inputs[3] * weights_3[3]
    + bias_3
)

outputs = [output_1, output_2, output_3]

print(outputs)


Here, every neuron in the current layer is connected to every neuron in the previous layer (if there is one). This is known as a fully connected layer, or a dense layer.

However, this code does not scale: every additional neuron or input requires more hand-written terms. Instead, we use a loop to handle inputs and layers of any size.

In [8]:
inputs = [1, 2, 3, 2.5]
weights = [
    [0.2, 0.8, -0.5, 1.0],  # Neuron 1
    [0.5, -0.91, 0.26, -0.5],  # Neuron 2
    [-0.26, -0.27, 0.17, 0.87],  # Neuron 3
]

biases = [2, 3, 0.5]

layer_outputs = []

for w_n, b_n in zip(weights, biases):
    o_n = 0
    for i_i, w_i in zip(inputs, w_n):
        o_n += i_i * w_i

    o_n += b_n

    layer_outputs.append(o_n)

print(layer_outputs)


Mathematically, the output of a layer of neurons is given by:

$$
y_n = \sum_{i=1}^{m} w_{n,i} \cdot x_i + b_n
$$

where $m$ is the number of inputs and $y_n$ is the output of the $n$-th neuron. In the above example, $m=4$ and there are 3 neurons, so $n$ runs from 1 to 3.
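
The formula above can also be written as a small helper function. This is only a sketch: the name `layer_forward` is made up for illustration and is not part of the book's code.

```python
def layer_forward(inputs, weights, biases):
    """Return one output per neuron: sum_i(w[n][i] * x[i]) + b[n]."""
    return [
        sum(x_i * w_i for x_i, w_i in zip(inputs, neuron_weights)) + bias
        for neuron_weights, bias in zip(weights, biases)
    ]


inputs = [1, 2, 3, 2.5]
weights = [
    [0.2, 0.8, -0.5, 1.0],  # Neuron 1
    [0.5, -0.91, 0.26, -0.5],  # Neuron 2
    [-0.26, -0.27, 0.17, 0.87],  # Neuron 3
]
biases = [2, 3, 0.5]

layer_outputs = layer_forward(inputs, weights, biases)
print(layer_outputs)  # roughly [4.8, 1.21, 2.385], up to floating-point rounding
```

The inner `sum(...)` is the summation over $i$, and the outer list comprehension iterates over the neurons.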

## Tensors, arrays, and vectors

An array is simply a collection of elements, arranged in some logical order. For example, a list of lists of lists can be represented as:
```python
lll = [[[1, 2, 3, 4], 
        [5, 6, 7, 8]],
       [[9, 10, 11, 12], 
        [13, 14, 15, 16]],
       [[17, 18, 19, 20], 
        [21, 22, 23, 24]]]
```

The above is a 3-dimensional array containing three 2-dimensional arrays, each containing two 1-dimensional arrays, each containing four elements. In concise notation, the shape of the array can be written as `(3, 2, 4)`.

That is what a **tensor** is, essentially. From the book, "a tensor object is an object that can be represented as an array".

Finally, **vectors** are nothing but 1-D arrays, or simple lists in Python, while **matrices** are 2-D arrays, or lists of lists in Python.
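
To check the shape claim, we can hand the nested list to NumPy and inspect it. This is a minimal sketch; the variable names `tensor`, `matrix`, and `vector` are chosen here for illustration only.

```python
import numpy as np

lll = [[[1, 2, 3, 4],
        [5, 6, 7, 8]],
       [[9, 10, 11, 12],
        [13, 14, 15, 16]],
       [[17, 18, 19, 20],
        [21, 22, 23, 24]]]

tensor = np.array(lll)
print(tensor.shape)  # (3, 2, 4)
print(tensor.ndim)   # 3

matrix = tensor[0]     # first 2-D slice, shape (2, 4)
vector = tensor[0, 0]  # first 1-D row of that slice, shape (4,)
print(matrix.shape, vector.shape)
```

Note that `np.array` only produces this regular shape because every inner list has the same length.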


## Dot product and vector addition

A dot product of two vectors is a fancy way of describing the sum of the products of the corresponding elements of the two vectors. The dot product of two vectors $a$ and $b$ is given by:

$$
a \cdot b = \sum_{i=1}^{n} a_i b_i = a_1 b_1 + a_2 b_2 + \ldots + a_n b_n
$$

where $n$ is the number of elements in each vector. This means that a dot product is only defined when both vectors have exactly the same number of elements.

Vector addition is adding the corresponding elements of two vectors to produce a new vector. For example, given two vectors $a$ and $b$, the sum of the two vectors is given by:
$$
a + b = [a_1 + b_1, a_2 + b_2, \ldots, a_n + b_n]
$$

Now, if we label $a$ as the inputs, $b$ as the weights, and $c$ as the bias, we can write the equation for the output of a single neuron as:

$$
y = a \cdot b + c
$$

which is the same equation that we computed earlier, written in a more compact form.

In [13]:
a = [1, 2, 3]
b = [4, 5, 6]

dot_product = a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
print("a . b: ", dot_product)

a_plus_b = [a[0] + b[0], a[1] + b[1], a[2] + b[2]]
print("a + b: ", a_plus_b)
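
The cell above hard-codes three elements. A version that works for vectors of any length might look like the sketch below; the function names `dot` and `vector_add` are hypothetical helpers, and the length check is an added assumption that reflects the same-size requirement stated earlier.

```python
def dot(a, b):
    """Sum of the products of corresponding elements."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of elements")
    return sum(a_i * b_i for a_i, b_i in zip(a, b))


def vector_add(a, b):
    """Element-wise sum of two vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of elements")
    return [a_i + b_i for a_i, b_i in zip(a, b)]


a = [1, 2, 3]
b = [4, 5, 6]

print(dot(a, b))         # 32
print(vector_add(a, b))  # [5, 7, 9]
```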


## Neuron math - with NumPy!

In [14]:
# A single neuron

inputs = [1, 2, 3, 2.5]
weights = [0.2, 0.8, -0.5, 1.0]
bias = 2

outputs = np.dot(inputs, weights) + bias
print(outputs)

In [17]:
# A layer of neurons
inputs = [1, 2, 3, 2.5]
weights = [
    [0.2, 0.8, -0.5, 1.0],  # Neuron 1
    [0.5, -0.91, 0.26, -0.5],  # Neuron 2
    [-0.26, -0.27, 0.17, 0.87],  # Neuron 3
]

biases = [2, 3, 0.5]

# Order matters: weights has shape (3, 4) and inputs has shape (4,), so
# np.dot(weights, inputs) yields one value per neuron, i.e. shape (3,)
layer_outputs = np.dot(weights, inputs) + biases
print(layer_outputs)
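
To see why the argument order matters, the sketch below compares the shapes involved and shows what happens when the arguments are swapped. The `swapped_ok` flag is just an illustrative variable.

```python
import numpy as np

inputs = np.array([1, 2, 3, 2.5])  # shape (4,)
weights = np.array([
    [0.2, 0.8, -0.5, 1.0],
    [0.5, -0.91, 0.26, -0.5],
    [-0.26, -0.27, 0.17, 0.87],
])  # shape (3, 4)

# (3, 4) . (4,) -> (3,): one dot product per row (neuron) of weights
print(np.dot(weights, inputs).shape)  # (3,)

# (4,) . (3, 4): the inner dimensions 4 and 3 do not match
try:
    np.dot(inputs, weights)
    swapped_ok = True
except ValueError as err:
    swapped_ok = False
    print("swapped order fails:", err)

# Transposing the weights makes the swapped order valid again
same = np.allclose(np.dot(inputs, weights.T), np.dot(weights, inputs))
print(same)  # True
```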

## A batch of data

The input vector that we have been using so far contained a single observation, where each value can be thought of as a feature. This makes the entire vector a **feature set instance**, or more commonly, an **observation** or a **sample**.

In practice, we tend to use a batch of samples instead of just one, for two reasons:
1. It is more efficient to compute the output of a layer of neurons for a bunch of samples at once, rather than one at a time.
2. It helps the model to generalize better, as it can learn from a variety of samples, rather than overfitting to a single sample.

To do this, we use matrix multiplication, making sure that the dimensions of the matrices are compatible: the number of columns in the first matrix must equal the number of rows in the second.

In [38]:
inputs = [
    [1, 2, 3, 2.5],
    [2.0, 5.0, -1.0, 2.0],
    [-1.5, 2.7, 3.3, -0.8],
    [0.5, -0.91, 0.26, -0.5],
]

weights = [
    [0.2, 0.8, -0.5, 1.0],  # Neuron 1
    [0.5, -0.91, 0.26, -0.5],  # Neuron 2
    [-0.26, -0.27, 0.17, 0.87],  # Neuron 3
]

biases = [2, 3, 0.5]

# The weights matrix is transposed so that the product has samples as rows
# and neurons as columns: (4, 4) . (4, 3) -> (4, 3)
layer_outputs = np.dot(inputs, np.array(weights).T) + biases

print(layer_outputs)


The resulting `layer_outputs` is a 4x3 matrix, where each row corresponds to the output of the neuron layer for a single sample and each column is the output of a single neuron for all samples.
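
As a sanity check, the batched result should equal what we get by pushing each sample through the layer one at a time. A minimal sketch, where `batch_outputs` and `per_sample` are names introduced here for illustration:

```python
import numpy as np

inputs = np.array([
    [1, 2, 3, 2.5],
    [2.0, 5.0, -1.0, 2.0],
    [-1.5, 2.7, 3.3, -0.8],
    [0.5, -0.91, 0.26, -0.5],
])
weights = np.array([
    [0.2, 0.8, -0.5, 1.0],
    [0.5, -0.91, 0.26, -0.5],
    [-0.26, -0.27, 0.17, 0.87],
])
biases = np.array([2, 3, 0.5])

# Whole batch in one matrix product
batch_outputs = np.dot(inputs, weights.T) + biases

# Same layer applied to one sample at a time, then stacked into rows
per_sample = np.array([np.dot(weights, sample) + biases for sample in inputs])

print(batch_outputs.shape)                     # (4, 3)
print(np.allclose(batch_outputs, per_sample))  # True
```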

Expressing this mathematically, we have:

$$
\begin{bmatrix}
y_{1,1} & y_{1,2} & \dots & y_{1,n} \\
\vdots & \vdots & \ddots & \vdots \\
y_{m,1} & y_{m,2} & \dots & y_{m,n}
\end{bmatrix} = 
\begin{bmatrix}
x_{1,1} & x_{1,2} & \dots & x_{1,p} \\
\vdots & \vdots & \ddots & \vdots \\
x_{m,1} & x_{m,2} & \dots & x_{m,p}
\end{bmatrix} \cdot
\begin{bmatrix}
w_{1,1} & w_{1,2} & \dots & w_{1,p} \\
\vdots & \vdots & \ddots & \vdots \\
w_{n,1} & w_{n,2} & \dots & w_{n,p}
\end{bmatrix}^T +
\begin{bmatrix}
b_1 & b_2 & \dots & b_n
\end{bmatrix}
$$

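where $m$ is the number of samples, $n$ is the number of neurons, and $p$ is the number of features (or inputs to each neuron). The bias row vector is added to each of the $m$ rows of the product, which NumPy does automatically through broadcasting. In the above example, $m=4$, $n=3$, and $p=4$.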
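
Because $m$ and $p$ are both 4 in the example, it is easy to mix up the two dimensions. The sketch below uses hypothetical sizes ($m=5$, $n=3$, $p=4$) and random values, purely to make the shapes distinguishable and to show the bias broadcasting explicitly.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable

m, n, p = 5, 3, 4  # hypothetical sizes: samples, neurons, features

X = rng.normal(size=(m, p))  # one sample per row
W = rng.normal(size=(n, p))  # one neuron per row
b = rng.normal(size=n)       # one bias per neuron

Y = np.dot(X, W.T) + b  # (m, p) . (p, n) -> (m, n)
print(Y.shape)  # (5, 3)

# Entry (i, j) is sample i dotted with neuron j's weights, plus bias j
i, j = 2, 1
print(np.isclose(Y[i, j], np.dot(X[i], W[j]) + b[j]))  # True

# Broadcasting is equivalent to stacking the bias row m times
b_stacked = np.tile(b, (m, 1))  # shape (m, n)
print(np.allclose(Y, np.dot(X, W.T) + b_stacked))  # True
```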