# **Traffic Light Classifier - Model Summary**

---

## ✅ **Step 1: Define the Problem and Dataset**

We’ll simulate a dataset of traffic light color readings based on simplified intensity features:

### Inputs (X):

* **Red intensity**
* **Green intensity**
* **Yellow intensity**

### Outputs (y):

* Classes: `Red`, `Green`, `Yellow`

For now, we'll keep these outputs as plain string labels.




```python
import numpy as np

# Features: [Red_intensity, Green_intensity, Yellow_intensity]
X = np.array([
    [1.0, 0.0, 0.0],   # Clearly red
    [0.0, 1.0, 0.0],   # Clearly green
    [0.0, 0.0, 1.0],   # Clearly yellow
    [0.9, 0.1, 0.1],   # Mostly red
    [0.1, 0.9, 0.2],   # Mostly green
    [0.1, 0.2, 0.9]    # Mostly yellow
])

# Labels: Target color names
y_labels = np.array(['Red', 'Green', 'Yellow', 'Red', 'Green', 'Yellow'])
```


✅ Done! We now have a **multi-class classification dataset** with:

* 3 input features
* 3 output classes
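Six examples are enough to walk through the math, but if you want more data, one way to extend the simulated dataset is to add Gaussian noise around each "clear" reading. The sketch below is not part of the original notebook: the helper `make_noisy_dataset` is hypothetical, and the noise level of 0.1 is an arbitrary assumption.

```python
import numpy as np

def make_noisy_dataset(n_per_class=20, noise=0.1, seed=0):
    """Hypothetical helper: sample noisy readings around the three
    'clear' prototypes from Step 1 and clip them to [0, 1]."""
    rng = np.random.default_rng(seed)
    prototypes = {
        "Red": [1.0, 0.0, 0.0],
        "Green": [0.0, 1.0, 0.0],
        "Yellow": [0.0, 0.0, 1.0],
    }
    X_parts, y_parts = [], []
    for label, center in prototypes.items():
        # Gaussian noise around each prototype, one row per sample
        samples = np.asarray(center) + rng.normal(0.0, noise, size=(n_per_class, 3))
        X_parts.append(np.clip(samples, 0.0, 1.0))
        y_parts.extend([label] * n_per_class)
    return np.vstack(X_parts), np.array(y_parts)

X_big, y_big = make_noisy_dataset(n_per_class=20, noise=0.1)
print(X_big.shape, y_big.shape)  # (60, 3) (60,)
```

The string labels it returns would go through the same one-hot step as the original `y_labels`.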

# **Step 2: One-Hot Encode the Output Labels**

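An MLP works with numbers, not strings, so the labels must be numeric. For multi-class classification with a softmax output, the usual choice is one-hot encoding: each label ("Red", "Green", "Yellow") becomes a vector with a 1 in its class's position and 0 everywhere else.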

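```python
from sklearn.preprocessing import OneHotEncoder

# Reshape y_labels to a column vector (the encoder expects 2-D input)
y_labels = y_labels.reshape(-1, 1)

# Initialize encoder; sparse_output=False returns a dense NumPy array
# (scikit-learn >= 1.2; older versions used sparse=False)
encoder = OneHotEncoder(sparse_output=False)

# Transform string labels to one-hot vectors
y = encoder.fit_transform(y_labels)
```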


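Now `y` becomes the matrix below. Note that `OneHotEncoder` sorts the categories alphabetically (`encoder.categories_[0]` is `['Green', 'Red', 'Yellow']`), so the columns are ordered Green, Red, Yellow:

| Label  | One-hot     |
| ------ | ----------- |
| Red    | `[0, 1, 0]` |
| Green  | `[1, 0, 0]` |
| Yellow | `[0, 0, 1]` |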

One-hot encoding done!
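To see what `OneHotEncoder` is doing, and why the columns come out in alphabetical order, here is a NumPy-only sketch. It assumes the encoder's default `categories='auto'`, which sorts the unique values, so it should produce the same matrix for these labels.

```python
import numpy as np

y_labels = np.array(['Red', 'Green', 'Yellow', 'Red', 'Green', 'Yellow'])

# np.unique returns the sorted class names and, with return_inverse=True,
# the position of each label within that sorted array
classes, indices = np.unique(y_labels, return_inverse=True)

# Pick rows of the identity matrix: row k is the one-hot vector for class k
y_manual = np.eye(len(classes))[indices]

print(classes)       # ['Green' 'Red' 'Yellow']
print(y_manual[:3])
```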

# **Step 3: Build the MLP Architecture**

We’ll now define the structure of the MLP:

* **Input layer**: 3 neurons (Red, Green, Yellow intensity)
* **Hidden layer**: Let’s use 4 neurons (you can tune this)
* **Output layer**: 3 neurons, one per class (Green, Red, Yellow in the encoder's column order) → softmax output

### Activation choices:

* Hidden: **ReLU**
* Output: **Softmax** (for multi-class classification)

We’ll implement the network itself with **NumPy only** (scikit-learn was used just for label encoding), so we write every piece ourselves, with no deep-learning frameworks yet.



```python
# Layer sizes
input_size = 3
hidden_size = 4
output_size = 3

# Initialize weights (standard normal) and biases (zeros)
np.random.seed(42)

W1 = np.random.randn(input_size, hidden_size)
b1 = np.zeros((1, hidden_size))

W2 = np.random.randn(hidden_size, output_size)
b2 = np.zeros((1, output_size))
```



✅ This initializes:

* `W1` and `b1`: from input → hidden
* `W2` and `b2`: from hidden → output
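The notebook draws weights from an unscaled standard normal, which is fine for a network this small. A common alternative for layers followed by ReLU is He initialization, which scales the weights by $\sqrt{2 / \text{fan\_in}}$. This is a swapped-in technique, not what the notebook uses, and because it uses NumPy's newer `default_rng` API its random numbers (and therefore its training curve) will differ from the results shown here.

```python
import numpy as np

input_size, hidden_size, output_size = 3, 4, 3
rng = np.random.default_rng(42)

# He initialization: scale standard-normal weights by sqrt(2 / fan_in)
W1 = rng.standard_normal((input_size, hidden_size)) * np.sqrt(2.0 / input_size)
b1 = np.zeros((1, hidden_size))

W2 = rng.standard_normal((hidden_size, output_size)) * np.sqrt(2.0 / hidden_size)
b2 = np.zeros((1, output_size))

print(W1.shape, b1.shape, W2.shape, b2.shape)  # (3, 4) (1, 4) (4, 3) (1, 3)
```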

---


## 🧠 Why Did We Use 4 Neurons in the Hidden Layer?

### 1. ✅ **Heuristic Choice (Rule of Thumb)**

In small models, we often start with:

> A hidden size **between the input size and twice the output size**

* Input size = 3
* Output size = 3, so twice the output size = 6
* A hidden size between 3 and 6 is therefore a reasonable starting point
* We picked 4, toward the smaller end of that range, to keep the model light

---

### 2. ✅ **Simplicity First**

* We **don’t want to overcomplicate** with too many neurons.
* More neurons = more weights = more training time + higher risk of overfitting.
* Since the dataset is small and simple (6 examples), 4 neurons keep the model lightweight.
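To make "more neurons = more weights" concrete, here is the parameter count of this one-hidden-layer network for a few hidden sizes (the helper name `count_params` is just for illustration):

```python
def count_params(input_size, hidden_size, output_size):
    """Weights plus biases for a 1-hidden-layer MLP."""
    layer1 = input_size * hidden_size + hidden_size    # W1 + b1
    layer2 = hidden_size * output_size + output_size   # W2 + b2
    return layer1 + layer2

for h in (2, 4, 8, 16):
    print(h, count_params(3, h, 3))
```

With 3 inputs and 3 outputs this works out to $7h + 3$ parameters: 17, 31, 59, and 115 for $h$ = 2, 4, 8, 16. Even at $h = 4$ the model has 31 parameters for only 6 training examples.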

---

### 3. 📊 **Can We Tune It Later?**

Yes. We can **experiment** with:

* 2 neurons (may underfit)
* 8 or 16 neurons (might overfit)

Use this as a **baseline**, and adjust based on:

* Validation accuracy
* Overfitting/underfitting signs
* Loss curve behavior

---

## 🎯 Summary:

> We chose 4 hidden neurons because it’s a **balanced, safe starting point** for a small 3-input, 3-output task.
> You can **adjust it later** based on how the model performs.

---



# **Step 4: Implement forward pass with ReLU and Softmax**

### What happens in a forward pass?

1. Inputs go through **linear transformation** to the hidden layer
2. **ReLU** activation is applied to introduce non-linearity
3. Hidden layer output is passed to the output layer
4. **Softmax** is applied to get class probabilities
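Written as equations, with the $m$ samples stacked as the rows of $X$ ($m \times 3$) and the bias rows broadcast across samples, the four steps are:

$$
\begin{aligned}
z_1 &= X W_1 + b_1 &&\in \mathbb{R}^{m \times 4} \\
a_1 &= \max(0,\, z_1) &&\in \mathbb{R}^{m \times 4} \\
z_2 &= a_1 W_2 + b_2 &&\in \mathbb{R}^{m \times 3} \\
(a_2)_{ik} &= \frac{e^{(z_2)_{ik}}}{\sum_{j=1}^{3} e^{(z_2)_{ij}}} &&\in \mathbb{R}^{m \times 3}
\end{aligned}
$$

Each row of $a_2$ is non-negative and sums to 1, so it can be read as class probabilities.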

---




```python
# ReLU activation (for hidden layer)
def relu(x):
    return np.maximum(0, x)

# Softmax activation (for output layer)
def softmax(x):
    exps = np.exp(x - np.max(x, axis=1, keepdims=True))  # numerical stability
    return exps / np.sum(exps, axis=1, keepdims=True)


# Forward pass function
def forward(X):
    z1 = np.dot(X, W1) + b1        # Input → Hidden
    a1 = relu(z1)                  # Apply ReLU
    z2 = np.dot(a1, W2) + b2       # Hidden → Output
    a2 = softmax(z2)               # Apply Softmax
    return z1, a1, z2, a2
```
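The `x - np.max(...)` line in `softmax` matters because softmax is unchanged when the same constant is subtracted from every logit in a row, while `np.exp` overflows to `inf` in float64 for inputs above roughly 709. A small comparison with a naive version (the name `softmax_naive` is only for illustration):

```python
import numpy as np

def softmax_naive(x):
    # Direct formula: exp can overflow to inf for large logits
    exps = np.exp(x)
    return exps / np.sum(exps, axis=1, keepdims=True)

def softmax(x):
    # Same stable version as in the notebook: subtract the row max first
    exps = np.exp(x - np.max(x, axis=1, keepdims=True))
    return exps / np.sum(exps, axis=1, keepdims=True)

logits = np.array([[1000.0, 1001.0, 1002.0]])

with np.errstate(over="ignore", invalid="ignore"):
    print(softmax_naive(logits))   # [[nan nan nan]]  (inf / inf)
print(softmax(logits))             # ≈ [[0.090 0.245 0.665]], same as for [[0, 1, 2]]
```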



This function returns:

* `z1`: Linear output before ReLU
* `a1`: Activated hidden layer
* `z2`: Output before softmax
* `a2`: Final output probabilities, shape `(n_samples, 3)`

---

✅ Forward pass is ready!

# **Step 5: Loss calculation and backpropagation**

## 📦 Categorical Cross-Entropy Loss (for multi-class)

Used when:

* Targets = one-hot vectors
* Final layer = softmax




```python
def categorical_cross_entropy(y_true, y_pred):
    return -np.mean(np.sum(y_true * np.log(y_pred + 1e-8), axis=1))
```


* `y_true`: one-hot encoded true labels (e.g., `[1, 0, 0]`)
* `y_pred`: softmax probabilities
* The small `1e-8` avoids log(0)
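Because `y_true` is one-hot, only the predicted probability of the true class survives the sum, so the loss is the average of $-\ln p_{\text{true}}$. A worked example on two made-up samples (columns follow the encoder order Green, Red, Yellow):

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred):
    # Same formula as above: mean over samples of -sum(y * log(p))
    return -np.mean(np.sum(y_true * np.log(y_pred + 1e-8), axis=1))

# Two hypothetical samples; columns follow the encoder order (Green, Red, Yellow)
y_true = np.array([[0, 1, 0],     # Red
                   [0, 0, 1]])    # Yellow
y_pred = np.array([[0.2, 0.7, 0.1],
                   [0.3, 0.3, 0.4]])

# Only the true-class probability contributes: (-ln 0.7 - ln 0.4) / 2
print(categorical_cross_entropy(y_true, y_pred))  # ≈ 0.6365

# A uniform guess over 3 classes gives ln(3) ≈ 1.0986
uniform = np.full((2, 3), 1 / 3)
print(categorical_cross_entropy(y_true, uniform))
```

The uniform baseline of about 1.0986 is a useful reference point: the training log below starts at 1.6488, meaning the randomly initialized network begins slightly worse than guessing uniformly.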


---
## 🔁 Backpropagation Steps

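We'll compute the gradients layer by layer:

| Layer        | Formula                                                                                  |
| ------------ | ---------------------------------------------------------------------------------------- |
| Output error | $dZ_2 = a_2 - y$                                                                         |
| Output grads | $dW_2 = \frac{1}{m} a_1^\top dZ_2$, $\quad db_2 = \frac{1}{m} \sum_{i} dZ_2^{(i)}$      |
| Hidden error | $dZ_1 = (dZ_2 W_2^\top) \odot \mathrm{ReLU}'(z_1)$                                      |
| Hidden grads | $dW_1 = \frac{1}{m} X^\top dZ_1$, $\quad db_1 = \frac{1}{m} \sum_{i} dZ_1^{(i)}$        |

Here $m$ is the number of samples, $\odot$ is element-wise multiplication, and the sums run over the rows (samples), matching the `/ m` and `axis=0` in the code.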
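The shortcut $dZ_2 = a_2 - y$ comes from combining softmax with cross-entropy; for the *mean* loss the exact gradient with respect to $z_2$ is $(a_2 - y)/m$, which is why the code applies `/ m` when forming the weight and bias gradients. A standard way to confirm this is a finite-difference gradient check. The sketch below uses random logits and targets, and the helper name `loss_from_logits` is hypothetical:

```python
import numpy as np

def softmax(z):
    exps = np.exp(z - np.max(z, axis=1, keepdims=True))
    return exps / np.sum(exps, axis=1, keepdims=True)

def loss_from_logits(z, y):
    # Mean categorical cross-entropy computed directly from the logits z2
    return -np.mean(np.sum(y * np.log(softmax(z)), axis=1))

rng = np.random.default_rng(0)
m = 4
z2 = rng.standard_normal((m, 3))
y = np.eye(3)[rng.integers(0, 3, size=m)]   # random one-hot targets

# Analytic gradient of the mean loss w.r.t. z2: (a2 - y) / m
analytic = (softmax(z2) - y) / m

# Central finite differences, one logit at a time
eps = 1e-6
numeric = np.zeros_like(z2)
for i in range(m):
    for k in range(3):
        z_plus, z_minus = z2.copy(), z2.copy()
        z_plus[i, k] += eps
        z_minus[i, k] -= eps
        numeric[i, k] = (loss_from_logits(z_plus, y) - loss_from_logits(z_minus, y)) / (2 * eps)

# Expected to be tiny (far below 1e-6) if the formula is right
print(np.max(np.abs(analytic - numeric)))
```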

---


```python
# ReLU derivative (used in hidden layer backprop)
def relu_derivative(x):
    return (x > 0).astype(float)


# Backpropagation: compute gradients, then update the global parameters
def backward(X, y, z1, a1, z2, a2, learning_rate):
    # Update global weights and biases
    global W1, b1, W2, b2

    m = X.shape[0]  # number of samples

    # Output layer error (softmax + cross-entropy; the 1/m is applied below)
    dZ2 = a2 - y
    dW2 = np.dot(a1.T, dZ2) / m
    db2 = np.sum(dZ2, axis=0, keepdims=True) / m

    # Hidden layer error (backpropagate through W2, then through ReLU)
    dZ1 = np.dot(dZ2, W2.T) * relu_derivative(z1)
    dW1 = np.dot(X.T, dZ1) / m
    db1 = np.sum(dZ1, axis=0, keepdims=True) / m

    # Gradient descent step (z2 is unused here but kept in the signature)
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2
```


✅ Now you’ve implemented:

* Softmax + Cross-Entropy loss
* Layer-by-layer gradient flow
* Manual parameter updates

# **Step 6: Training loop**

Here we will:

* Call forward() on input data

* Calculate the loss

* Run backward() to update weights

* Repeat over multiple epochs

```python
# Step 6: Training loop
def train(X, y, epochs=1000, learning_rate=0.1):
    for epoch in range(epochs):
        z1, a1, z2, a2 = forward(X)                   # Forward pass
        loss = categorical_cross_entropy(y, a2)       # Compute loss
        backward(X, y, z1, a1, z2, a2, learning_rate) # Backpropagation

        if epoch % 100 == 0:
            print(f"Epoch {epoch}, Loss: {loss:.4f}")
```


## **Training the Model**

```python
train(X, y, epochs=1000, learning_rate=0.1)
```


```text
Epoch 0, Loss: 1.6488
Epoch 100, Loss: 0.1453
Epoch 200, Loss: 0.0366
Epoch 300, Loss: 0.0176
Epoch 400, Loss: 0.0109
Epoch 500, Loss: 0.0077
Epoch 600, Loss: 0.0059
Epoch 700, Loss: 0.0047
Epoch 800, Loss: 0.0039
Epoch 900, Loss: 0.0033
```
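The training loop only prints every 100th loss. To inspect the whole loss curve (one of the tuning signals listed in Step 3), you could record the loss at every epoch. The sketch below is a self-contained re-implementation using a hypothetical helper `train_with_history`. It seeds `np.random.RandomState(42)`, which draws the same sequence as `np.random.seed(42)` followed by `np.random.randn`, so its curve should track the log above; treat that as an expectation rather than a guarantee.

```python
import numpy as np

def train_with_history(X, y, hidden_size=4, epochs=1000, learning_rate=0.1, seed=42):
    """Hypothetical helper: same network and updates as forward()/backward(),
    but returns the loss from every epoch as an array."""
    rng = np.random.RandomState(seed)
    W1 = rng.randn(X.shape[1], hidden_size)
    b1 = np.zeros((1, hidden_size))
    W2 = rng.randn(hidden_size, y.shape[1])
    b2 = np.zeros((1, y.shape[1]))
    m = X.shape[0]
    history = []
    for _ in range(epochs):
        # Forward pass: ReLU hidden layer, softmax output
        z1 = X @ W1 + b1
        a1 = np.maximum(0, z1)
        z2 = a1 @ W2 + b2
        exps = np.exp(z2 - z2.max(axis=1, keepdims=True))
        a2 = exps / exps.sum(axis=1, keepdims=True)
        history.append(-np.mean(np.sum(y * np.log(a2 + 1e-8), axis=1)))
        # Backward pass: same gradients as backward()
        dZ2 = a2 - y
        dW2 = a1.T @ dZ2 / m
        db2 = dZ2.sum(axis=0, keepdims=True) / m
        dZ1 = (dZ2 @ W2.T) * (z1 > 0)
        dW1 = X.T @ dZ1 / m
        db1 = dZ1.sum(axis=0, keepdims=True) / m
        W1 -= learning_rate * dW1
        b1 -= learning_rate * db1
        W2 -= learning_rate * dW2
        b2 -= learning_rate * db2
    return np.array(history)

X = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0],
              [0.9, 0.1, 0.1], [0.1, 0.9, 0.2], [0.1, 0.2, 0.9]])
labels = np.array(['Red', 'Green', 'Yellow', 'Red', 'Green', 'Yellow'])
classes, idx = np.unique(labels, return_inverse=True)
y = np.eye(len(classes))[idx]   # same column order as OneHotEncoder

history = train_with_history(X, y)
print(history[0], history[-1])
```

The returned array can then be plotted (for example with matplotlib's `plt.plot(history)`) to check that the loss decreases smoothly.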


# **Step 7: Predict and Evaluate Output**

We'll:

* Run a forward pass using the trained weights

* Get the final predicted probabilities

* Convert probabilities to class labels using argmax

* Compare with the true labels

```python
import pandas as pd

# Final predictions using the trained weights
_, _, _, output_probs = forward(X)

# Convert softmax probabilities to predicted class indices (0, 1, 2)
predicted_indices = np.argmax(output_probs, axis=1)

# Convert one-hot true labels back to indices
true_indices = np.argmax(y, axis=1)

# Map indices to label names; encoder.categories_[0] lists the classes
# in column order (alphabetical: Green, Red, Yellow)
class_names = encoder.categories_[0]
predicted_labels = class_names[predicted_indices]
true_labels = class_names[true_indices]

# Create and print results table
results_df = pd.DataFrame({
    "Red Intensity": X[:, 0],
    "Green Intensity": X[:, 1],
    "Yellow Intensity": X[:, 2],
    "Predicted": predicted_labels,
    "Actual": true_labels,
})

print(results_df)
```



```text
   Red Intensity  Green Intensity  Yellow Intensity Predicted  Actual
0            1.0              0.0               0.0       Red     Red
1            0.0              1.0               0.0     Green   Green
2            0.0              0.0               1.0    Yellow  Yellow
3            0.9              0.1               0.1       Red     Red
4            0.1              0.9               0.2     Green   Green
5            0.1              0.2               0.9    Yellow  Yellow
```
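Comparing labels by eye works for six rows; on a larger dataset a single accuracy number and a confusion matrix are easier to read. A NumPy-only sketch (the helper `confusion_matrix_np` is hypothetical; scikit-learn's `sklearn.metrics.confusion_matrix` does the same job). The index arrays below encode the table above in encoder order (0 = Green, 1 = Red, 2 = Yellow); in the notebook you would pass `true_indices` and `predicted_indices` instead.

```python
import numpy as np

def confusion_matrix_np(true_idx, pred_idx, n_classes):
    # Rows = true class, columns = predicted class
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (true_idx, pred_idx), 1)
    return cm

# Rows of the results table: Red, Green, Yellow, Red, Green, Yellow
true_idx = np.array([1, 0, 2, 1, 0, 2])
pred_idx = np.array([1, 0, 2, 1, 0, 2])

accuracy = np.mean(true_idx == pred_idx)
print(accuracy)                                   # 1.0
print(confusion_matrix_np(true_idx, pred_idx, 3))
```

Keep in mind that these are the training examples, so 100% accuracy here shows the model fits its data, not that it generalizes.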


We now have a fully working MLP that:

* Trains on simple intensity features

* Correctly classifies all six training examples (Red, Green, and Yellow)

* Uses forward + backprop + softmax + cross-entropy

```python
import pandas as pd

# New test inputs (unseen during training)
X_test = np.array([
    [0.95, 0.05, 0.05],  # Very red
    [0.2, 0.7, 0.2],     # Mostly green
    [0.1, 0.2, 0.75],    # Mostly yellow
    [0.4, 0.3, 0.3]      # Balanced mix
])

# Run a forward pass to get class probabilities
_, _, _, test_output_probs = forward(X_test)

# Convert softmax outputs to labels via argmax over the encoder's classes
test_predicted_labels = encoder.categories_[0][np.argmax(test_output_probs, axis=1)]

# Show results
test_df = pd.DataFrame({
    "Red Intensity": X_test[:, 0],
    "Green Intensity": X_test[:, 1],
    "Yellow Intensity": X_test[:, 2],
    "Predicted Label": test_predicted_labels
})

print(test_df)
```


```text
   Red Intensity  Green Intensity  Yellow Intensity Predicted Label
0           0.95             0.05              0.05             Red
1           0.20             0.70              0.20           Green
2           0.10             0.20              0.75          Yellow
3           0.40             0.30              0.30             Red
```
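The balanced mix `[0.4, 0.3, 0.3]` is still forced into one class, because argmax always picks a winner. One option is to look at the softmax probabilities themselves and flag low-confidence cases. The sketch below assumes a hypothetical helper `label_with_threshold` and an arbitrary threshold of 0.6; the probabilities are made up for illustration and are not output from the trained model.

```python
import numpy as np

def label_with_threshold(probs, class_names, threshold=0.6):
    """Hypothetical helper: return the argmax class name, or 'Uncertain'
    when the top softmax probability falls below `threshold`."""
    top_idx = np.argmax(probs, axis=1)
    top_prob = np.max(probs, axis=1)
    # object dtype so "Uncertain" is not truncated to the class-name width
    labels = np.asarray(class_names, dtype=object)[top_idx]
    labels[top_prob < threshold] = "Uncertain"
    return labels

# Illustrative probabilities; columns follow the encoder order (Green, Red, Yellow)
probs = np.array([[0.05, 0.90, 0.05],
                  [0.30, 0.45, 0.25]])
print(label_with_threshold(probs, ["Green", "Red", "Yellow"]))
```

In the notebook you would pass `test_output_probs` and `encoder.categories_[0]`. The right threshold depends on how costly a wrong color is versus an "Uncertain" answer.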
