# Building Neural Networks from Scratch
## Part 1: Neurons, Layers, and the Magic of Matrix Multiplication

**Goal**: Understand the **core mathematics** behind a dense neural network layer — from first principles.

**Analogy**:  
Imagine Spotify uses 4 audio features to predict whether you'll love a song:  
- `danceability` (0–1)  
- `energy` (0–1)  
- `valence` (happiness, 0–1)  
- `tempo` (beats per minute, ~60–200)

A team of **music experts** (neurons) each scores how much you'll like the song.

We will build this system step by step — manually → loops → NumPy → batches.

## 1. The Single Neuron: A Weighted Vote

A neuron computes:

$$
y = x_1 w_1 + x_2 w_2 + x_3 w_3 + b
$$

This is a **linear combination** of the inputs plus a bias, the basic building block of every dense layer.

**Intuition**:  
Each weight $w_i$ = "How much does this expert care about feature $x_i$?"  
Bias $b$ = "What is their baseline opinion?"

Let’s code it.

In [8]:
# Song:
danceability = 0.82
energy = 0.76
sadness = 0.68

weights = [0.5, 0.9, -0.4] # loves energy, dislikes sadness

bias = 1.1

In [9]:
output_score = (danceability * weights[0] +
                energy * weights[1] +
                sadness * weights[2] +
                bias)

print("Single output score:", round(output_score, 3))

Single output score: 1.922


## 2. Adding Tempo: 4-Input Neuron

Now include tempo:

$$
y = x_1 w_1 + x_2 w_2 + x_3 w_3 + x_4 w_4 + b
$$

In [10]:
# adding 4th feature tempo
tempo = 133
# adding weight for tempo

weights.append(0.015)
print(weights)

[0.5, 0.9, -0.4, 0.015]


In [11]:
# the contribution of tempo as per its weight will be added to the output score
updated_output_score = output_score + tempo * weights[3]
print("Updated 4-feature output score:", round(updated_output_score, 3))

Updated 4-feature output score: 3.917
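To see where the 3.917 comes from, it can help to print each feature's individual contribution. The sketch below reuses the values from the cells above; the names `features`, `w4`, and `b4` are illustrative and not part of the original notebook.

```python
# Per-feature contributions for the 4-input neuron (values copied from the cells above).
# `features`, `w4`, and `b4` are hypothetical names chosen so nothing above is overwritten.
features = {"danceability": 0.82, "energy": 0.76, "sadness": 0.68, "tempo": 133}
w4 = [0.5, 0.9, -0.4, 0.015]
b4 = 1.1

# Pair each feature with its weight and multiply.
contributions = {
    name: value * w for (name, value), w in zip(features.items(), w4)
}
score = sum(contributions.values()) + b4

for name, c in contributions.items():
    print(f"{name:>12}: {c:+.3f}")
print(f"{'bias':>12}: {b4:+.3f}")
print("score:", round(score, 3))
```

Tempo contributes about 1.995 on its own, more than the other three features combined, even though its weight is only 0.015. That is simply because its raw scale (beats per minute) is much larger than the 0–1 features.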


## 3. A Full Layer: 3 Experts, Different Tastes

Three experts evaluate the **same song**:

- Expert 1: Dance lover (weights reward danceability and energy)
- Expert 2: Chill seeker  
- Expert 3: Emotional listener

In [12]:
new_random_song = [0.82, 0.76, 0.68, 127]

weights = [
        [0.5, 0.9, -0.4, 0.017], # dance lover
        [0.3, -0.8, 0.7, 0.04], # chill seeker
        [-0.6, 0.2, 0.95, 0.008] # emotional
]

biases = [1.1, 2.3, 0.9]

In [13]:
len(new_random_song)

4

In [14]:
# manual calculation
out_1 = sum(new_random_song[i] * weights[0][i] for i in range(len(new_random_song))) + biases[0]
out_2 = sum(new_random_song[i] * weights[1][i] for i in range(len(new_random_song))) + biases[1]
out_3 = sum(new_random_song[i] * weights[2][i] for i in range(len(new_random_song))) + biases[2]

print("Three expert scores:", [round(x, 3) for x in [out_1, out_2, out_3]])

Three expert scores: [4.081, 7.494, 2.222]


## 4. Loops

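Writing one line per expert does not scale. A nested loop removes the hard-coding: the outer loop visits each neuron, the inner loop visits each feature.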

In [22]:
layer_scores = []

for weight_vec, bias_value in zip(weights, biases):
    neuron_score = 0

    for feature, w in zip(new_random_song, weight_vec):
        neuron_score += feature * w

    neuron_score += bias_value

    layer_scores.append(round(neuron_score, 3))

print("Loop result:", layer_scores)

Loop result: [4.081, 7.494, 2.222]



Same result as the manual calculation, and the same loop works unchanged for 1000 neurons.
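One way to make the loop reusable is to wrap it in a function. This is a minimal sketch; `dense_layer_loop`, `layer_weights`, and `layer_biases` are hypothetical names, and the data is copied from the cells above.

```python
def dense_layer_loop(inputs, weights, biases):
    """Return one score per neuron using plain Python loops (illustrative helper)."""
    scores = []
    for weight_vec, bias_value in zip(weights, biases):
        total = bias_value                      # start from the neuron's bias
        for feature, w in zip(inputs, weight_vec):
            total += feature * w                # add each weighted feature
        scores.append(total)
    return scores


song = [0.82, 0.76, 0.68, 127]
layer_weights = [
    [0.5, 0.9, -0.4, 0.017],    # dance lover
    [0.3, -0.8, 0.7, 0.04],     # chill seeker
    [-0.6, 0.2, 0.95, 0.008],   # emotional
]
layer_biases = [1.1, 2.3, 0.9]

scores = dense_layer_loop(song, layer_weights, layer_biases)
print([round(s, 3) for s in scores])
```

Rounding is left to the caller here, so the function returns full-precision scores.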

## 5. NumPy: One Line for the Whole Layer

Matrix multiplication computes the whole layer in a single step:

$$
\mathbf{y} = \mathbf{W} \mathbf{x} + \mathbf{b}
$$

In [23]:
import numpy as np

x = np.array([0.82, 0.76, 0.68, 127])
W = np.array(weights)
b = np.array(biases)


layer_output = np.dot(W, x) + b

print("Numpy result:", np.round(layer_output, 3))

Numpy result: [4.081 7.494 2.222]


All 3 experts are computed in a **single vectorized call**.
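As a sanity check, the sketch below confirms that `np.dot(W, x)`, the `@` operator, and a row-by-row dot product all give the same layer output. The arrays are rebuilt from the values above so the block runs on its own.

```python
import numpy as np

W = np.array([
    [0.5, 0.9, -0.4, 0.017],    # dance lover
    [0.3, -0.8, 0.7, 0.04],     # chill seeker
    [-0.6, 0.2, 0.95, 0.008],   # emotional
])
b = np.array([1.1, 2.3, 0.9])
x = np.array([0.82, 0.76, 0.68, 127])

via_dot = np.dot(W, x) + b       # the form used in this section
via_matmul = W @ x + b           # same operation with the @ operator
row_by_row = np.array([np.dot(W[i], x) + b[i] for i in range(W.shape[0])])

print("Shapes:", W.shape, x.shape, via_dot.shape)
print("dot == matmul:", np.allclose(via_dot, via_matmul))
print("dot == row by row:", np.allclose(via_dot, row_by_row))
```

Each row of `W` is one expert's weight vector, so `W @ x` is just three dot products stacked together.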

## 6. Batch Processing: Multiple Songs at Once

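**Key insight**: we need $\mathbf{X} \mathbf{W}^T$, not $\mathbf{X} \mathbf{W}$.

Why? The answer is **shape alignment**: matrix multiplication requires the inner dimensions to match.
- $\mathbf{X}$: (3, 4), i.e. 3 songs × 4 features
- $\mathbf{W}$: (3, 4), i.e. 3 neurons × 4 weights, so $\mathbf{X} \mathbf{W}$ would pair an inner 4 with an inner 3 and fail
- $\mathbf{W}^T$: (4, 3), so $\mathbf{X} \mathbf{W}^T$ is (3, 4) · (4, 3) → output (3, 3)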
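The shape rule is easy to check directly. In the sketch below the batch has 5 songs instead of 3 (an assumption made purely so the batch dimension and the neuron dimension are easy to tell apart), and the arrays are filled with zeros because only the shapes matter.

```python
import numpy as np

X = np.zeros((5, 4))   # 5 songs, 4 features each (5 is an illustrative choice)
W = np.zeros((3, 4))   # 3 neurons, 4 weights each

try:
    X @ W              # inner dimensions are 4 and 3: they do not match
    failed = False
except ValueError as err:
    failed = True
    print("X @ W fails:", err)

Y = X @ W.T            # (5, 4) @ (4, 3) -> (5, 3)
print("X @ W.T shape:", Y.shape)
```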

In [24]:
songs = np.array([
    [0.82, 0.76, 0.68, 127],  # pop
    [0.91, 0.88, 0.45, 150],  # energy banger
    [0.38, 0.32, 0.22, 92],   # sad
])

batch_output = np.dot(songs, W.T) + b

print("Batch Shape: ", batch_output.shape)
print("All Scores: ")
print(np.round(batch_output, 3))

Batch Shape:  (3, 3)
All Scores: 
[[4.081 7.494 2.222]
 [4.717 8.184 2.158]
 [3.054 5.992 1.681]]



**Each row = one song's scores from all 3 experts.**
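Two details are worth verifying here: the first row of the batch output should match the single-song result from section 5, and the bias vector of shape `(3,)` is added to every row through NumPy broadcasting. The sketch rebuilds the arrays so it is self-contained.

```python
import numpy as np

W = np.array([
    [0.5, 0.9, -0.4, 0.017],
    [0.3, -0.8, 0.7, 0.04],
    [-0.6, 0.2, 0.95, 0.008],
])
b = np.array([1.1, 2.3, 0.9])
songs = np.array([
    [0.82, 0.76, 0.68, 127],
    [0.91, 0.88, 0.45, 150],
    [0.38, 0.32, 0.22, 92],
])

batch_output = songs @ W.T + b
single_output = W @ songs[0] + b          # the one-song formula from section 5

print("Row 0 matches single song:", np.allclose(batch_output[0], single_output))

# Broadcasting: b (shape (3,)) is added to each of the 3 rows.
explicit_bias = np.tile(b, (songs.shape[0], 1))   # shape (3, 3), b repeated per row
print("Broadcast == explicit tiling:",
      np.allclose(batch_output, songs @ W.T + explicit_bias))
```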

## The Universal Formula

With weights stored as (neurons, features), a dense layer computes the following before any activation function is applied:

$$
\boxed{\mathbf{Y} = \mathbf{X} \mathbf{W}^T + \mathbf{b}}
$$

- $\mathbf{X}$ : (batch_size, features)
- $\mathbf{W}$ : (neurons, features)
- $\mathbf{b}$ : (neurons, )
- $\mathbf{Y}$ : (batch_size, neurons)
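The shape table above translates almost directly into code. This is a minimal sketch, not a framework implementation: `dense_forward` is a hypothetical helper, and the random inputs (8 songs, seed 0) are arbitrary choices for illustration.

```python
import numpy as np


def dense_forward(X, W, b):
    """Sketch of Y = X W^T + b with explicit shape checks (illustrative helper)."""
    X = np.atleast_2d(X)                       # treat a single song as a batch of 1
    batch_size, n_features = X.shape
    n_neurons = W.shape[0]
    assert W.shape == (n_neurons, n_features), "W must be (neurons, features)"
    assert b.shape == (n_neurons,), "b must be (neurons,)"
    return X @ W.T + b                         # (batch_size, neurons)


rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))    # 8 songs, 4 features
W = rng.normal(size=(3, 4))    # 3 neurons
b = rng.normal(size=3)

Y = dense_forward(X, W, b)
print("Y shape:", Y.shape)
print("Row 2 matches W @ x + b:", np.allclose(Y[2], W @ X[2] + b))
```

Storage conventions differ between libraries: PyTorch's `nn.Linear` keeps its weight as (out_features, in_features) like this notebook, while Keras' `Dense` keeps its kernel as (input_dim, units) and therefore needs no transpose.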

This same equation is the workhorse inside GPT, Stable Diffusion, and most modern models, where it is evaluated an enormous number of times per second.

## What I've tried to cover here:

- A neuron is just a **weighted sum + bias**.
- A layer is just **multiple neurons** sharing the same inputs.
- Loops remove the hard-coding and keep the code clean.
- `np.dot()` + transpose = **the whole batch in one fast call**.
- **One equation** rules all dense layers.

My aim has been to build intuition for what runs under the hood of neural network operations.

**Thanks** for going through this.

## Additional Python bit

## Why `zip()` Is Essential — A Hands-On Breakdown

We just used `zip()` in our loop:
```python
for weight_vec, bias_value in zip(weights, biases):
    ...
```

In [25]:
# Let's look at our data
song = [0.82, 0.76, 0.68, 127]  # one song
expert1_weights = [0.5, 0.9, -0.4, 0.017]

print("Song features:", song)
print("Expert 1 weights:", expert1_weights)

Song features: [0.82, 0.76, 0.68, 127]
Expert 1 weights: [0.5, 0.9, -0.4, 0.017]


We want to **pair each feature with its weight** to compute $x \times w$.

**Old way (with `range`):**

In [26]:
for i in range(len(song)):
    print(song[i], "*", expert1_weights[i])

0.82 * 0.5
0.76 * 0.9
0.68 * -0.4
127 * 0.017


**New way (with `zip`):**

In [27]:
for feature, weight in zip(song, expert1_weights):
    print(feature, "*", weight, "=", feature * weight)

0.82 * 0.5 = 0.41
0.76 * 0.9 = 0.684
0.68 * -0.4 = -0.272
127 * 0.017 = 2.1590000000000003


### Visual Intuition

In [28]:
list(zip(song, expert1_weights))

[(0.82, 0.5), (0.76, 0.9), (0.68, -0.4), (127, 0.017)]

**`zip()` does this:**
- Takes two lists
- Walks through them **in parallel**
- Returns **pairs** as tuples
- Stops at the **shortest** list (safe!)
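Stopping at the shortest list means `zip()` never raises an `IndexError`, but it also means a missing weight goes unnoticed. The sketch below shows that behaviour; `short_weights` is a hypothetical list with one weight deliberately left out.

```python
song = [0.82, 0.76, 0.68, 127]
short_weights = [0.5, 0.9, -0.4]   # tempo weight deliberately missing

pairs = list(zip(song, short_weights))
print(pairs)
print("Pairs produced:", len(pairs), "of", len(song), "features")

# A cheap guard before computing a weighted sum:
lengths_match = len(song) == len(short_weights)
print("Lengths match:", lengths_match)
```

Python 3.10 and later also accept `zip(..., strict=True)`, which raises a `ValueError` when the lengths differ.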

Combine **`zip()` + `sum()`** to compute each neuron's weighted sum in one expression, then add its bias to get the layer output.
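A compact version of the whole layer, written with only `zip()` and `sum()`, might look like the sketch below. The data is copied from section 3; `layer_weights` and `layer_biases` are illustrative names.

```python
song = [0.82, 0.76, 0.68, 127]
layer_weights = [
    [0.5, 0.9, -0.4, 0.017],    # dance lover
    [0.3, -0.8, 0.7, 0.04],     # chill seeker
    [-0.6, 0.2, 0.95, 0.008],   # emotional
]
layer_biases = [1.1, 2.3, 0.9]

# One neuron: weighted sum of paired (feature, weight) values, plus the bias.
expert1_score = sum(f * w for f, w in zip(song, layer_weights[0])) + layer_biases[0]
print("Expert 1:", round(expert1_score, 3))

# Whole layer: the outer zip pairs each weight vector with its bias.
layer_scores = [
    sum(f * w for f, w in zip(song, weight_vec)) + bias_value
    for weight_vec, bias_value in zip(layer_weights, layer_biases)
]
print("Layer:", [round(s, 3) for s in layer_scores])
```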

---