# Neural Network from Scratch — Letter Recognition (A, B, C)

**Module 11 Project**: Implement a simple feedforward neural network using only NumPy to classify binary 5×6 images of letters **A**, **B**, and **C**. The network uses a single hidden layer with **sigmoid** activation and is trained via custom backpropagation (mean squared error loss). The notebook includes visualization of letter images, training loss & accuracy plots, and final predictions.

---

## 1. Setup and synthetic dataset
We'll define binary pixel patterns for A, B and C as 5x6 images (height=5, width=6), flatten them to 30-element vectors, and create augmented samples by adding small random flips so the network has enough data to train.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(42)

In [None]:
# Define the letters A, B, C as 5x6 binary arrays (5 rows x 6 columns, 0/1 pixels).
# Each array is flattened to a 30-element vector later.

A = np.array([
    [0,1,1,1,1,0],
    [0,1,0,0,0,1],
    [0,1,1,1,1,1],
    [0,1,0,0,0,1],
    [0,1,0,0,0,1]
], dtype=float)

B = np.array([
    [1,1,1,1,0,0],
    [1,0,0,0,1,0],
    [1,1,1,1,0,0],
    [1,0,0,0,1,0],
    [1,1,1,1,0,0]
], dtype=float)

C = np.array([
    [0,1,1,1,1,0],
    [1,0,0,0,0,1],
    [1,0,0,0,0,0],
    [1,0,0,0,0,1],
    [0,1,1,1,1,0]
], dtype=float)

# Helper to display a letter image ('gray' colormap: 1 = white, 0 = black)
def show_letter(img, title=''):
    plt.imshow(img, cmap='gray', interpolation='nearest')
    plt.title(title)
    plt.axis('off')

plt.figure(figsize=(8,3))
plt.subplot(1,3,1)
show_letter(A, 'A (5x6)')
plt.subplot(1,3,2)
show_letter(B, 'B (5x6)')
plt.subplot(1,3,3)
show_letter(C, 'C (5x6)')
plt.show()

### Create dataset
We'll flatten each 5x6 image into a length-30 vector and generate augmented samples by flipping each pixel independently with a small probability (5% here). Labels are one-hot encoded: A->[1,0,0], B->[0,1,0], C->[0,0,1].
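The one-hot labels can equivalently be built by indexing rows of an identity matrix with integer class labels. A minimal sketch, assuming class indices 0, 1, 2 for A, B, C in the same order as the stacked data; `labels`, `Y_onehot`, and `Y_tiled` are illustrative names, not part of the notebook:

```python
import numpy as np

n_per_class = 120
# Integer class labels: 0 = A, 1 = B, 2 = C, matching the order of np.vstack([XA, XB, XC])
labels = np.repeat(np.arange(3), n_per_class)   # shape (360,)
# Row k of the 3x3 identity matrix is the one-hot vector for class k
Y_onehot = np.eye(3)[labels]                    # shape (360, 3)

# The tiled construction used in the notebook, for comparison
Y_tiled = np.vstack([np.tile([1, 0, 0], (n_per_class, 1)),
                     np.tile([0, 1, 0], (n_per_class, 1)),
                     np.tile([0, 0, 1], (n_per_class, 1))])
print("identical:", np.array_equal(Y_onehot, Y_tiled))
```

The `np.eye` form scales to any number of classes without writing out each one-hot row by hand.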

In [None]:
def flatten(img):
    return img.flatten()

def augment(base_img, n_samples=100, flip_prob=0.05):
    base = flatten(base_img)
    samples = []
    for _ in range(n_samples):
        s = base.copy()
        # flip each pixel with probability flip_prob to create noisy variants
        flips = np.random.rand(s.size) < flip_prob
        s[flips] = 1 - s[flips]
        samples.append(s)
    return np.array(samples)

# create dataset
n_per_class = 120
XA = augment(A, n_per_class, flip_prob=0.05)
XB = augment(B, n_per_class, flip_prob=0.05)
XC = augment(C, n_per_class, flip_prob=0.05)

X = np.vstack([XA, XB, XC])  # shape (360, 30)
y = np.vstack([np.tile([1,0,0], (n_per_class,1)),
               np.tile([0,1,0], (n_per_class,1)),
               np.tile([0,0,1], (n_per_class,1))])

# Shuffle dataset
idx = np.arange(X.shape[0])
np.random.shuffle(idx)
X = X[idx]
y = y[idx]

print('Dataset shape:', X.shape, 'Labels shape:', y.shape)
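
Because each pixel flips independently with probability `flip_prob`, the number of flipped pixels per sample follows a Binomial(30, flip_prob) distribution. With `flip_prob=0.05` that means 30 × 0.05 = 1.5 flipped pixels on average, and a fraction 0.95^30 ≈ 0.21 of the augmented samples are exact copies of their template. The sketch below computes these quantities and checks them empirically; `flip_stats` is a hypothetical helper, and `np.random.default_rng` is used instead of the notebook's global `np.random` state so the cell does not disturb it:

```python
import numpy as np

n_pixels = 30  # 5 x 6 image, flattened

def flip_stats(flip_prob, n_pixels=n_pixels):
    """Expected number of flipped pixels and probability of zero flips,
    assuming each pixel flips independently (Binomial(n_pixels, flip_prob))."""
    expected_flips = n_pixels * flip_prob
    p_unchanged = (1 - flip_prob) ** n_pixels
    return expected_flips, p_unchanged

# 0.05 is the training noise level, 0.08 the level used for the noisy test samples
for p in (0.05, 0.08):
    mean_flips, p0 = flip_stats(p)
    print(f"flip_prob={p}: expected flips={mean_flips:.2f}, "
          f"P(sample identical to template)={p0:.3f}")

# Empirical check using the same flipping rule as augment()
rng = np.random.default_rng(0)
flips = rng.random((10_000, n_pixels)) < 0.05
print("empirical mean flips:", flips.sum(axis=1).mean())
print("empirical fraction unchanged:", (flips.sum(axis=1) == 0).mean())
```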

## 2. Neural network implementation
Network architecture:
- Input layer: 30 neurons (pixels)
- Hidden layer: 12 neurons (sigmoid)
- Output layer: 3 neurons (sigmoid)

Loss: Mean Squared Error (MSE)

Note: The instructions call for sigmoid activation, so we use sigmoid in both the hidden and output layers and train with MSE using full-batch gradient descent.
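For reference, the forward pass, loss, and gradients computed in the cells below can be written in matrix form. Here $X$ has shape $(30, m)$ with one sample per column, $W^{[1]}, b^{[1]}, W^{[2]}, b^{[2]}$ correspond to `W1`, `b1`, `W2`, `b2`, and $\odot$ is element-wise multiplication:

$$
\begin{aligned}
Z^{[1]} &= W^{[1]} X + b^{[1]}, \qquad A^{[1]} = \sigma\!\left(Z^{[1]}\right), \qquad \sigma(z) = \frac{1}{1 + e^{-z}},\\
Z^{[2]} &= W^{[2]} A^{[1]} + b^{[2]}, \qquad A^{[2]} = \sigma\!\left(Z^{[2]}\right),\\
\mathcal{L} &= \frac{1}{2m} \sum_{i=1}^{m} \sum_{k=1}^{3} \left(A^{[2]}_{ki} - Y_{ki}\right)^2,\\
dZ^{[2]} &= \frac{1}{m}\left(A^{[2]} - Y\right) \odot A^{[2]} \odot \left(1 - A^{[2]}\right), \qquad dW^{[2]} = dZ^{[2]} A^{[1]\top}, \qquad db^{[2]} = \textstyle\sum_{i} dZ^{[2]}_{:,i},\\
dZ^{[1]} &= \left(W^{[2]\top} dZ^{[2]}\right) \odot A^{[1]} \odot \left(1 - A^{[1]}\right), \qquad dW^{[1]} = dZ^{[1]} X^{\top}, \qquad db^{[1]} = \textstyle\sum_{i} dZ^{[1]}_{:,i}.
\end{aligned}
$$

The factor $\tfrac{1}{2}$ in the loss cancels the 2 from differentiating the square, which is why `backprop` starts from `(A2 - Y) / m`. The network has $12 \cdot 30 + 12 + 3 \cdot 12 + 3 = 411$ trainable parameters.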

In [None]:
# Activation functions
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_derivative(a):
    # a is sigmoid(z)
    return a * (1 - a)

# Initialize weights
def init_weights(n_in, n_hidden, n_out):
    # small random values
    W1 = np.random.randn(n_hidden, n_in) * 0.1
    b1 = np.zeros((n_hidden, 1))
    W2 = np.random.randn(n_out, n_hidden) * 0.1
    b2 = np.zeros((n_out, 1))
    return W1, b1, W2, b2

# Forward pass
def forward(X_batch, W1, b1, W2, b2):
    # X_batch: shape (n_features, batch_size)
    Z1 = W1.dot(X_batch) + b1  # (n_hidden, batch)
    A1 = sigmoid(Z1)
    Z2 = W2.dot(A1) + b2      # (n_out, batch)
    A2 = sigmoid(Z2)
    cache = (X_batch, Z1, A1, Z2, A2)
    return A2, cache

# Compute MSE loss and accuracy
def compute_loss(A2, Y):
    # A2, Y shape: (n_out, batch)
    m = Y.shape[1]
    loss = np.sum((A2 - Y)**2) / (2*m)
    return loss

def compute_accuracy(A2, Y):
    pred = np.argmax(A2, axis=0)
    true = np.argmax(Y, axis=0)
    return np.mean(pred == true)

In [None]:
# Backpropagation (for MSE with sigmoid outputs)
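def backprop(cache, W2, A2, Y):
    # Only X_batch and A1 are needed from the cache; sigmoid derivatives are computed
    # from the activations (A1, A2), so Z1, Z2 and the cached copy of A2 go unused.
    X_batch, _, A1, _, _ = cache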
    m = X_batch.shape[1]
    
    # dLoss/dA2 = (A2 - Y) / m  (from MSE derivative)
    dA2 = (A2 - Y) / m  # shape (n_out, m)
    
    # output layer gradients
    dZ2 = dA2 * sigmoid_derivative(A2)  # (n_out, m)
    dW2 = dZ2.dot(A1.T)                 # (n_out, n_hidden)
    db2 = np.sum(dZ2, axis=1, keepdims=True)  # (n_out,1)
    
    # hidden layer
    dA1 = W2.T.dot(dZ2)                 # (n_hidden, m)
    dZ1 = dA1 * sigmoid_derivative(A1)  # (n_hidden, m)
    dW1 = dZ1.dot(X_batch.T)            # (n_hidden, n_in)
    db1 = np.sum(dZ1, axis=1, keepdims=True)  # (n_hidden,1)
    
    return dW1, db1, dW2, db2
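
A standard way to confirm that hand-written backpropagation is correct is a finite-difference gradient check: perturb each parameter by ±ε, measure the change in loss, and compare against the analytic gradient. The sketch below is self-contained, so it re-implements the notebook's forward pass, MSE loss, and `backprop` arithmetic in condensed form (`forward_pass`, `loss_fn`, `analytic_grads`, and `numeric_grads` are hypothetical names chosen so they do not overwrite the notebook's functions) and runs on a small random batch instead of the real dataset:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward_pass(X, W1, b1, W2, b2):
    """Condensed forward pass; returns hidden and output activations."""
    A1 = sigmoid(W1 @ X + b1)
    A2 = sigmoid(W2 @ A1 + b2)
    return A1, A2

def loss_fn(params, X, Y):
    """Same loss as compute_loss: sum of squared errors / (2m)."""
    W1, b1, W2, b2 = params
    _, A2 = forward_pass(X, W1, b1, W2, b2)
    return np.sum((A2 - Y) ** 2) / (2 * Y.shape[1])

def analytic_grads(params, X, Y):
    """Same arithmetic as backprop(); returns [dW1, db1, dW2, db2]."""
    W1, b1, W2, b2 = params
    m = X.shape[1]
    A1, A2 = forward_pass(X, W1, b1, W2, b2)
    dZ2 = (A2 - Y) / m * A2 * (1 - A2)
    dZ1 = (W2.T @ dZ2) * A1 * (1 - A1)
    return [dZ1 @ X.T, dZ1.sum(axis=1, keepdims=True),
            dZ2 @ A1.T, dZ2.sum(axis=1, keepdims=True)]

def numeric_grads(params, X, Y, eps=1e-6):
    """Central differences (L(p + eps) - L(p - eps)) / (2 eps), one entry at a time."""
    grads = []
    for P in params:
        G = np.zeros_like(P)
        for idx in np.ndindex(P.shape):
            original = P[idx]
            P[idx] = original + eps
            loss_plus = loss_fn(params, X, Y)
            P[idx] = original - eps
            loss_minus = loss_fn(params, X, Y)
            P[idx] = original  # restore the parameter
            G[idx] = (loss_plus - loss_minus) / (2 * eps)
        grads.append(G)
    return grads

rng = np.random.default_rng(0)
n_in, n_hidden, n_out, m = 30, 12, 3, 8                        # small batch keeps the check fast
X_chk = (rng.random((n_in, m)) < 0.5).astype(float)           # random binary "images"
Y_chk = np.eye(n_out)[:, rng.integers(0, n_out, size=m)]      # one-hot columns
params_chk = [rng.standard_normal((n_hidden, n_in)) * 0.1, np.zeros((n_hidden, 1)),
              rng.standard_normal((n_out, n_hidden)) * 0.1, np.zeros((n_out, 1))]

for name, g_a, g_n in zip(["dW1", "db1", "dW2", "db2"],
                          analytic_grads(params_chk, X_chk, Y_chk),
                          numeric_grads(params_chk, X_chk, Y_chk)):
    rel_err = np.linalg.norm(g_a - g_n) / (np.linalg.norm(g_a) + np.linalg.norm(g_n) + 1e-12)
    print(f"{name}: relative error = {rel_err:.2e}")
```

As a rule of thumb, relative errors around 1e-7 or smaller indicate correct gradients, while values close to 1 usually point to a bug such as a missing `1/m` factor or a transposed matrix.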

## 3. Training the network
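We'll use full-batch gradient descent for simplicity: every epoch runs the forward and backward pass on all 360 training samples and makes a single weight update. The hyperparameters below (800 epochs, learning rate 0.8) were chosen to give stable training on this small problem.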

In [None]:
# Prepare data shapes transposed (features x samples)
X_T = X.T  # shape (30, N)
Y_T = y.T  # shape (3, N)

n_in = X_T.shape[0]
n_hidden = 12
n_out = Y_T.shape[0]

W1, b1, W2, b2 = init_weights(n_in, n_hidden, n_out)

# Training hyperparameters
epochs = 800
learning_rate = 0.8

loss_history = []
acc_history = []

for epoch in range(epochs):
    # Forward
    A2, cache = forward(X_T, W1, b1, W2, b2)
    loss = compute_loss(A2, Y_T)
    acc = compute_accuracy(A2, Y_T)
    
    # Backprop
    dW1, db1, dW2, db2 = backprop(cache, W2, A2, Y_T)
    
    # Update weights (gradient descent)
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2
    
    loss_history.append(loss)
    acc_history.append(acc)
    
    if (epoch+1) % 100 == 0 or epoch == 0:
        print(f"Epoch {epoch+1}/{epochs} - Loss: {loss:.6f} - Acc: {acc:.4f}")

## 4. Training results: Loss & Accuracy

In [None]:
plt.figure(figsize=(10,4))
plt.subplot(1,2,1)
plt.plot(loss_history)
plt.title('Training Loss (MSE)')
plt.xlabel('Epoch')
plt.ylabel('Loss')

plt.subplot(1,2,2)
plt.plot(acc_history)
plt.title('Training Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')

plt.tight_layout()
plt.show()

## 5. Evaluate on original clean patterns and show predictions
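We'll run the trained network on the three clean templates (A, B, C) and display, for each one, the predicted class, the three output activations, and the image. Because each output unit has its own sigmoid, these activations are independent per-class scores in (0, 1), not probabilities that sum to 1.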

In [None]:
# Stack the three clean templates, one per row
templates = np.vstack([flatten(A), flatten(B), flatten(C)])  # shape (3,30)
templates_T = templates.T  # (30,3): one template per column, as forward() expects

A2_templates, _ = forward(templates_T, W1, b1, W2, b2)
preds = np.argmax(A2_templates, axis=0)
scores = A2_templates.T  # shape (3,3): row i = sigmoid outputs for template i

for i, letter in enumerate(['A','B','C']):
    print(f"Letter: {letter}")
    print("Predicted class:", ['A','B','C'][preds[i]])
    print("Output activations (sigmoid, not normalized):", np.round(scores[i], 4))
    plt.figure(figsize=(2,2))
    show_letter(templates[i].reshape(5,6), f"{letter} -> Pred: {['A','B','C'][preds[i]]}")
    plt.show()
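
Since the three sigmoid outputs are computed independently, they generally do not sum to 1 and do not form a probability distribution over A, B, C. If a distribution is wanted for display, two options (assumptions, not part of the original notebook) are to divide each row by its sum, or to apply a softmax to the pre-activation logits. A minimal sketch with made-up output values; in the notebook itself the logits `Z2` are the fourth element of the cache returned by `forward`, so the inverse-sigmoid step would not be needed there:

```python
import numpy as np

# Hypothetical sigmoid outputs: rows = inputs, columns = scores for A, B, C
outputs = np.array([[0.91, 0.07, 0.05],
                    [0.10, 0.88, 0.12],
                    [0.04, 0.09, 0.93]])
print("row sums:", outputs.sum(axis=1))   # not equal to 1

# Option 1: rescale each row so it sums to 1 (argmax is unchanged)
normalized = outputs / outputs.sum(axis=1, keepdims=True)

# Option 2: softmax over the logits; the inverse sigmoid recovers them as log(a / (1 - a))
logits = np.log(outputs / (1 - outputs))
shifted = logits - logits.max(axis=1, keepdims=True)   # subtract row max for numerical stability
softmax_scores = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)

print(np.round(normalized, 3))
print(np.round(softmax_scores, 3))
```

Both options preserve the predicted class; they only change how confident the displayed numbers look.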

## 6. Test on a few noisy examples
Let's create four noisy variants of each letter, using a higher flip probability (8%) than the 5% used for training, and see how the network classifies them.

In [None]:
def predict_single(x):
    # x shape (30,)
    a2, _ = forward(x.reshape(-1,1), W1, b1, W2, b2)
    return np.argmax(a2, axis=0)[0], a2.ravel()

# generate a few noisy samples and predict
for base,label in [(A,'A'), (B,'B'), (C,'C')]:
    print('---')
    for i in range(4):
        s = flatten(base).copy()
        flips = np.random.rand(s.size) < 0.08  # more noise
        s[flips] = 1 - s[flips]
        pred, prob = predict_single(s)
        print(f'Base {label} noisy sample {i+1} -> Pred: {["A","B","C"][pred]}, probs: {np.round(prob,3)}')
        plt.figure(figsize=(2,2))
        show_letter(s.reshape(5,6), f'{label} noisy -> Pred {["A","B","C"][pred]}')
        plt.show()
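
Four samples per letter give only a visual impression. To put a number on robustness, one option is to measure accuracy over many noisy copies at several flip probabilities. The sketch below does this with a hypothetical helper, `noisy_accuracy`, that accepts any predictor mapping a length-30 vector to a class index. So that it runs on its own, it uses a simple nearest-template (Hamming distance) baseline, `nearest_template_predict`, which is an illustrative assumption and not part of the original notebook:

```python
import numpy as np

# Copies of the notebook's 5x6 templates, flattened to length-30 vectors
A = np.array([[0,1,1,1,1,0],[0,1,0,0,0,1],[0,1,1,1,1,1],[0,1,0,0,0,1],[0,1,0,0,0,1]], dtype=float)
B = np.array([[1,1,1,1,0,0],[1,0,0,0,1,0],[1,1,1,1,0,0],[1,0,0,0,1,0],[1,1,1,1,0,0]], dtype=float)
C = np.array([[0,1,1,1,1,0],[1,0,0,0,0,1],[1,0,0,0,0,0],[1,0,0,0,0,1],[0,1,1,1,1,0]], dtype=float)
templates = np.vstack([A.ravel(), B.ravel(), C.ravel()])  # shape (3, 30)

def noisy_accuracy(predict, templates, flip_prob, n_trials=200, rng=None):
    """Fraction of noisy copies classified correctly.
    predict: maps a (30,) vector to a class index (0 = A, 1 = B, 2 = C)."""
    rng = np.random.default_rng() if rng is None else rng
    correct = 0
    for label, base in enumerate(templates):
        for _ in range(n_trials):
            s = base.copy()
            flips = rng.random(s.size) < flip_prob   # same flipping rule as augment()
            s[flips] = 1 - s[flips]
            correct += int(predict(s) == label)
    return correct / (len(templates) * n_trials)

def nearest_template_predict(x):
    """Hypothetical baseline: the template with the smallest Hamming distance wins."""
    return int(np.argmin(np.abs(templates - x).sum(axis=1)))

rng = np.random.default_rng(0)
for p in (0.0, 0.05, 0.08, 0.15, 0.25):
    acc_p = noisy_accuracy(nearest_template_predict, templates, p, rng=rng)
    print(f"flip_prob={p:.2f}: accuracy={acc_p:.3f}")
```

With the trained network from this notebook, the same helper could be called as `noisy_accuracy(lambda x: predict_single(x)[0], templates, 0.08)`, where `templates` is the (3, 30) array built in section 5. Comparing the two curves indicates whether the network does better than plain template matching at each noise level.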

## 7. Notes & Observations
- Network: 30 -> 12 -> 3 with sigmoid activations.
- Loss: Mean Squared Error (MSE), used together with the requested sigmoid outputs.
- The dataset was synthetically augmented to provide enough training samples.

Possible improvements include switching the output layer to softmax with cross-entropy loss, adding more hidden units, or training with mini-batches and a learning-rate schedule.
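
The softmax + cross-entropy option can be sketched as a change to the output layer only. A minimal sketch, assuming the same column-per-sample layout as `forward`; `softmax`, `cross_entropy`, and `output_grad` are hypothetical names. The practical benefit is that, for softmax with cross-entropy averaged over $m$ samples, the output-layer gradient simplifies to `dZ2 = (A2 - Y) / m` with no sigmoid-derivative factor, so it does not shrink when outputs saturate. The snippet checks that identity with finite differences:

```python
import numpy as np

def softmax(Z):
    """Column-wise softmax for Z of shape (n_out, m)."""
    Z_shift = Z - Z.max(axis=0, keepdims=True)   # subtract per-column max for numerical stability
    expZ = np.exp(Z_shift)
    return expZ / expZ.sum(axis=0, keepdims=True)

def cross_entropy(A2, Y, eps=1e-12):
    """Mean cross-entropy over the m columns; eps guards against log(0)."""
    m = Y.shape[1]
    return -np.sum(Y * np.log(A2 + eps)) / m

def output_grad(Z2, Y):
    """dLoss/dZ2 for softmax + cross-entropy: (A2 - Y) / m."""
    return (softmax(Z2) - Y) / Y.shape[1]

rng = np.random.default_rng(0)
n_out, m = 3, 5
Z2 = rng.standard_normal((n_out, m))                     # random logits
Y = np.eye(n_out)[:, rng.integers(0, n_out, size=m)]     # one-hot columns

# Central finite differences on each entry of Z2
h = 1e-6
num = np.zeros_like(Z2)
for idx in np.ndindex(Z2.shape):
    Zp, Zm = Z2.copy(), Z2.copy()
    Zp[idx] += h
    Zm[idx] -= h
    num[idx] = (cross_entropy(softmax(Zp), Y) - cross_entropy(softmax(Zm), Y)) / (2 * h)

print("max abs difference:", np.abs(num - output_grad(Z2, Y)).max())
```

In `backprop`, this would replace the `dA2` and `dZ2` lines; the `dW2`, `db2`, and hidden-layer gradients keep the same form.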

---

**Files generated (if running this notebook locally):**
- The notebook file itself `Neural_Network_Letter_Recognition.ipynb`.
