# Single Neuron → Multiple Neurons → Weight Matrix Form (NumPy + PyTorch)

This notebook is designed to build intuition step-by-step:

1. **Single neuron** (dot product + bias)
2. **Multiple neurons** (several neurons in a layer)
3. **Matrix form**
4. The same steps with **PyTorch tensors**

## Notations

We will use the following symbols and shapes throughout.

| Symbol | Meaning | Shape |
|---|---|---|
| $x$ | input vector | $x \in \mathbb{R}^{d}$ |
| $w$ | weights for **one neuron** | $w \in \mathbb{R}^{d}$ |
| $b$ | bias for **one neuron** | $b \in \mathbb{R}$ |
| $net$ | pre-activation (net input) | $net \in \mathbb{R}$ |
| $\phi$ | activation function | applied elementwise |
| $a$ | activation (neuron output) | $a \in \mathbb{R}$ |

### Single neuron

$$
net =  x w^\top+ b
$$

$$
a = \phi(net)
$$

---

### A layer with $m$ neurons

| Symbol | Meaning | Shape |
|---|---|---|
| $W$ | weight matrix (column $i$ is neuron $i$’s weights) | $W \in \mathbb{R}^{d \times m}$ |
| $b$ | bias vector | $b \in  \mathbb{R}^{m}$ |
| $net$ | pre-activation vector (one entry per neuron) | $net \in  \mathbb{R}^{m}$ |
| $a$ | activation/output vector | $a \in  \mathbb{R}^{m}$ |

Forward pass:

$$
net = xW + b
$$

$$
a = \phi(net)
$$

For now we assume there is no activation function (that is, $\phi$ is the identity), so the output $a$ is simply $net$.
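
To make the shapes in the tables concrete, here is a minimal sketch of the layer forward pass with small, hand-checkable numbers. The values of `x`, `W`, and `b` are made up for illustration and are not the random values used in the cells below. The activation is taken to be the identity, in line with the no-activation assumption stated above.

```python
import numpy as np

# Hypothetical values chosen so the arithmetic can be checked by hand
x = np.array([1.0, 2.0])                 # d = 2
W = np.array([[1.0, 0.0, -1.0],
              [2.0, 1.0,  0.0]])         # shape (d, m) = (2, 3); column j holds neuron j's weights
b = np.array([0.5, 0.0, -0.5])           # m = 3

net = x @ W + b    # shape (m,)
a = net            # identity activation: the output equals net

print("net:", net)
print("a shape:", a.shape)
```

By hand, the first neuron gives $1\cdot 1 + 2\cdot 2 + 0.5 = 5.5$, the second gives $2.0$, and the third gives $-1.5$.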



## Part A — NumPy
We start with NumPy so you can clearly see shapes, dot products, and matrix multiplication.

### 1) Single neuron

A single neuron computes:

$$
net = \mathbf{x}\mathbf{w}^\top + b
$$



In [14]:
import numpy as np

# Seed for reproducibility
np.random.seed(1234)


# Inputs: x ∈ R^d
x = np.random.randn(5)
print("x:", x)
print("x shape:", x.shape)

# Weights: w ∈ R^d, bias: b ∈ R
w = np.random.randn(x.shape[0])
b = np.random.randn()

print("w:", w)
print("w shape:", w.shape)
print("b:", b)

# Pre-activation (net input)
net = np.dot(x,w) + b
print("net =  x w^T + b:", net)


x: [ 0.47143516 -1.19097569  1.43270697 -0.3126519  -0.72058873]
x shape: (5,)
w: [ 0.88716294  0.85958841 -0.6365235   0.01569637 -2.24268495]
w shape: (5,)
b: 1.150035724719818
net =  x w^T + b: 1.2437209721766265
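
The dot product in `np.dot(x, w)` is a sum of elementwise products. The sketch below spells that out with made-up values (not the random ones printed above), so each step can be checked by hand.

```python
import numpy as np

# Hypothetical inputs, weights, and bias for a hand-checkable example
x = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, -1.0, 2.0])
b = 0.1

products = x * w                    # elementwise products: 0.5, -2.0, 6.0
net_manual = products.sum() + b     # 4.5 + 0.1
net_dot = np.dot(x, w) + b          # same value via the dot product

print("elementwise products:", products)
print("manual sum + b:", net_manual)
print("np.dot + b:", net_dot)
```

Both routes should agree up to floating-point rounding, which is why comparisons like this are usually done with `np.isclose` rather than `==`.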


### 2) Multiple neurons

If we have $m$ neurons, each neuron $j$ has its own weight vector $\mathbf{w}_j$ and bias $b_j$:


$$
net_j = \mathbf{x}\mathbf{w}_j^\top + b_j\quad\text{for }j=1,2,\dots,m
$$


In [15]:
import numpy as np

# Seed for reproducibility
np.random.seed(1234)

# Inputs: x ∈ R^d
x = np.random.randn(2)
print("x:", x)
print("x shape:", x.shape)

m = 3  # number of neurons in the layer

# Each neuron has its own weight vector and bias
weights = [np.random.randn(x.shape[0]) for _ in range(m)]  # list of w_j
biases  = [np.random.randn() for _ in range(m)]            # list of b_j

print("weights:", weights)
print("weights shape:", np.array(weights).shape)
print("biases:", biases)

net = []
for j in range(m):
    net_j = np.dot(x, weights[j]) + biases[j]
    net.append(net_j)

net = np.array(net)  # shape: (m,)
print("net:", net)
print("net shape:", net.shape)


x: [ 0.47143516 -1.19097569]
x shape: (2,)
weights: [array([ 1.43270697, -0.3126519 ]), array([-0.72058873,  0.88716294]), array([ 0.85958841, -0.6365235 ])]
weights shape: (3, 2)
biases: [0.015696372114428918, -2.2426849541854055, 1.150035724719818]
net: [ 1.06348563 -3.63898532  2.31335995]
net shape: (3,)


### 3) Multiple neurons — matrix form

Instead of storing the weight vectors $(\mathbf{w}_1,\dots,\mathbf{w}_m)$ separately, we stack them as **columns** in a matrix:

$$
\mathbf{W}=
\begin{bmatrix}
\;|\; & \;|\; &        & \;|\; \\
\mathbf{w}_1 & \mathbf{w}_2 & \cdots & \mathbf{w}_m \\
\;|\; & \;|\; &        & \;|\;
\end{bmatrix}
\in \mathbb{R}^{d\times m}
$$

We also stack the biases into a vector:

$$
\mathbf{b}=
\begin{bmatrix}
b_1\\
b_2\\
\vdots\\
b_m
\end{bmatrix}
\in \mathbb{R}^{m}
$$

Then, treating $\mathbf{x}$ as a row vector (as in the single-neuron formula), the layer output is:

$$
\mathbf{net} = \mathbf{x}\mathbf{W} + \mathbf{b}^\top
$$

#### Bias trick (augment the input)

We can absorb the bias into the matrix multiplication by appending a constant $1$ to the end of the input:

$$
\tilde{\mathbf{x}}=
\begin{bmatrix}
\mathbf{x} & 1
\end{bmatrix}
\in \mathbb{R}^{d+1}
$$

and appending the bias as an **extra row** under $\mathbf{W}$:

$$
\tilde{\mathbf{W}}=
\begin{bmatrix}
\mathbf{W}\\
\mathbf{b}^\top
\end{bmatrix}
\in \mathbb{R}^{(d+1)\times m}
$$

Now the same computation becomes a single matrix product, because the appended $1$ multiplies the bias row:

$$
\mathbf{net} = \tilde{\mathbf{x}}\tilde{\mathbf{W}} = \mathbf{x}\mathbf{W} + 1\cdot\mathbf{b}^\top
$$



In [16]:
import numpy as np


W = np.stack(weights, axis=1)   # shape: (d, m)  columns = neurons
b = np.array(biases)            # shape: (m,)

net = np.dot(x,W) + b                 # shape: (m,)

print("=== Standard matrix form (xW + b) ===")
print("W shape:", W.shape)      # (d, m)
print("x shape:", x.shape)      # (d,)
print("b shape:", b.shape)      # (m,)
print("net:", net)


# Bias trick: append a constant 1 to x and stack b as an extra row under W
x_tilde = np.concatenate([x, np.array([1.0])])  # shape: (d+1,)
W_tilde = np.concatenate([W, b.reshape(1, -1)], axis=0)  # shape: (d+1, m)

net_bias_trick = np.dot(x_tilde,W_tilde)             # shape: (m,)

print("\n=== Bias trick ===")
print("x_tilde shape:", x_tilde.shape)          # (d+1,)
print("W_tilde shape:", W_tilde.shape)          # (d+1, m)
print("net_bias_trick:", net_bias_trick)

print("\nMatches standard form?", np.allclose(net, net_bias_trick))


=== Standard matrix form (xW + b) ===
W shape: (2, 3)
x shape: (2,)
b shape: (3,)
net: [ 1.06348563 -3.63898532  2.31335995]

=== Bias trick ===
x_tilde shape: (3,)
W_tilde shape: (3, 3)
net_bias_trick: [ 1.06348563 -3.63898532  2.31335995]

Matches standard form? True
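
Because column $j$ of $\mathbf{W}$ holds neuron $j$'s weights, slicing one column out of the matrix should reproduce that neuron's value from the per-neuron loop. Here is a small sketch of that check, using made-up numbers rather than the random values above.

```python
import numpy as np

# Hypothetical layer: d = 3 inputs, m = 2 neurons
x = np.array([1.0, 2.0, 3.0])
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])          # shape (d, m); column j = weights of neuron j
b = np.array([0.5, -0.5])

net = x @ W + b                     # all neurons at once, shape (m,)

# Recompute each neuron separately from its own column of W
per_neuron = np.array([np.dot(x, W[:, j]) + b[j] for j in range(W.shape[1])])

print("matrix form:", net)
print("column by column:", per_neuron)
print("same result?", np.allclose(net, per_neuron))
```

By hand, neuron 1 gives $1 + 0 + 3 + 0.5 = 4.5$ and neuron 2 gives $0 + 2 + 3 - 0.5 = 4.5$.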


---
## Part B — PyTorch

PyTorch gives us fast tensor operations (like NumPy) and optional automatic differentiation.


### 1) Single neuron (PyTorch)

Same equation:
$$
net = \mathbf{x}\mathbf{w}^\top + b
$$


In [10]:
import torch

torch.manual_seed(1234)

print("PyTorch version:", torch.__version__)


# Inputs: x ∈ R^d
x = torch.randn(5)                 # shape: (d,)
print("x:", x)
print("x shape:", tuple(x.shape))

# Weights: w ∈ R^d, bias: b ∈ R
w = torch.randn(x.shape[0])        # shape: (d,)
b = torch.randn(())

print("w:", w)
print("w shape:", tuple(w.shape))
print("b:", b)
print("b shape:", tuple(b.shape))

# Pre-activation (net input): net = x w^T + b
net = torch.dot(x, w) + b          # scalar
print("net =  x w^T + b:", net)

PyTorch version: 2.9.0+cpu
x: tensor([ 0.0461,  0.4024, -1.0115,  0.2167, -0.6123])
x shape: (5,)
w: tensor([ 0.5036,  0.2310,  0.6931, -0.2669,  2.1785])
w shape: (5,)
b: tensor(0.1021)
b shape: ()
net =  x w^T + b: tensor(-1.8744)
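
The optional automatic differentiation mentioned at the start of Part B can be sketched on this same single neuron. The values below are made up, and `requires_grad=True` is the only addition relative to the cell above. Since $net$ is linear in $\mathbf{w}$, the gradient with respect to $\mathbf{w}$ should equal $\mathbf{x}$, and the gradient with respect to $b$ should be $1$.

```python
import torch

# Hypothetical values; requires_grad=True asks PyTorch to track gradients for w and b
x = torch.tensor([1.0, 2.0, 3.0])
w = torch.tensor([0.5, -1.0, 2.0], requires_grad=True)
b = torch.tensor(0.1, requires_grad=True)

net = torch.dot(x, w) + b   # scalar pre-activation
net.backward()              # fills w.grad and b.grad with d(net)/dw and d(net)/db

print("net:", net.item())
print("d net / d w:", w.grad)   # should match x
print("d net / d b:", b.grad)   # should be 1
```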


### 2) Multiple neurons (layer) — matrix form (PyTorch)

If $\mathbf{x}\in\mathbb{R}^{d}$, $\mathbf{W}\in\mathbb{R}^{d\times m}$, and $\mathbf{b}\in\mathbb{R}^{m}$:

$$
\mathbf{net}=\mathbf{x}\mathbf{W}+\mathbf{b}
$$


In [11]:
torch.manual_seed(1234)

# Inputs: x ∈ R^d
x = torch.randn(2)
print("x:", x)
print("x shape:", tuple(x.shape))

m = 3  # number of neurons

# Each neuron has its own weight vector and bias
weights = [torch.randn(x.shape[0]) for _ in range(m)]
biases  = [torch.randn(()) for _ in range(m)]

print("weights (list):", weights)
print("weights stacked shape:", tuple(torch.stack(weights, dim=1).shape))  # (d, m)
print("biases (list):", biases)

net_list = []
for j in range(m):
    net_j = torch.dot(x,weights[j]) + biases[j]
    net_list.append(net_j)

net = torch.stack(net_list)   # shape: (m,)
print("net:", net)
print("net shape:", tuple(net.shape))

x: tensor([0.0461, 0.4024])
x shape: (2,)
weights (list): [tensor([-1.0115,  0.2167]), tensor([-0.6123,  0.5036]), tensor([0.2310, 0.6931])]
weights stacked shape: (2, 3)
biases (list): [tensor(-0.2669), tensor(2.1785), tensor(0.1021)]
net: tensor([-0.2263,  2.3529,  0.3917])
net shape: (3,)


In [12]:
print("3) Matrix form")

W = torch.stack(weights,dim=1)    # shape: (d, m)
b_vec = torch.stack(biases)        # shape: (m,)

# net = x W  + b
net_mat = torch.matmul(x,W) + b_vec

print("W shape:", tuple(W.shape))          # (d, m)
print("x shape:", tuple(x.shape))          # (d,)
print("b_vec shape:", tuple(b_vec.shape)) # (m,)
print("net_mat:", net_mat)

3) Matrix form
W shape: (2, 3)
x shape: (2,)
b_vec shape: (3,)
net_mat: tensor([-0.2263,  2.3529,  0.3917])


In [13]:
print("4) Bias trick")

# x_tilde = [x; 1]
x_tilde = torch.cat([x, torch.tensor([1.0])])           # shape: (d+1,)

# W_tilde = [W; b] (append b as the last row)
W_tilde = torch.cat([W, b_vec.unsqueeze(0)], dim=0)     # shape: (d+1, m)

net_bias_trick = torch.matmul(x_tilde,W_tilde)                      # shape: (m,)

print("x_tilde shape:", tuple(x_tilde.shape))           # (d+1,)
print("W_tilde shape:", tuple(W_tilde.shape))           # (d+1, m)
print("net_bias_trick:", net_bias_trick)

print("Matches matrix form?", torch.allclose(net_bias_trick, net_mat))

4) Bias trick
x_tilde shape: (3,)
W_tilde shape: (3, 3)
net_bias_trick: tensor([-0.2263,  2.3529,  0.3917])
Matches matrix form? True
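
For reference, PyTorch's built-in `torch.nn.functional.linear` computes the same affine map, but it expects the weight matrix with shape $(m, d)$ (one row per neuron). That is the transpose of the $\mathbf{W}$ used in this notebook. The sketch below uses made-up values and passes `W.T` to account for that layout difference.

```python
import torch
import torch.nn.functional as F

# Hypothetical values, with W stored as (d, m) as in this notebook
x = torch.tensor([1.0, 2.0])
W = torch.tensor([[1.0, 0.0, -1.0],
                  [2.0, 1.0,  0.0]])   # shape (d, m) = (2, 3)
b = torch.tensor([0.5, 0.0, -0.5])

net_manual = torch.matmul(x, W) + b    # the notebook's x W + b
net_linear = F.linear(x, W.T, b)       # F.linear computes x @ weight.T + bias

print("manual:", net_manual)
print("F.linear:", net_linear)
print("same result?", torch.allclose(net_manual, net_linear))
```

Keeping track of which layout a given function expects avoids a silent transpose bug, which is easy to hit when $d = m$ and the shapes happen to line up either way.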
