# 1.2 - Your First Neural Network

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/madeforai/madeforai/blob/main/docs/paths/understanding-ai/module-1/chapter-2-first-neural-network.ipynb)

---

In the previous chapter, we built a simple linear model. Now, let's level up and build a real **neural network** from scratch! You'll understand how neurons work, how networks learn through backpropagation, and build a classifier for handwritten digits.

## What You'll Learn

- How artificial neurons work
- Activation functions and why they matter
- Building a neural network from scratch
- Training with gradient descent
- Classifying handwritten digits (scikit-learn's 8x8 digits dataset, a small MNIST-style set)

## Setup

In [None]:
# Install required packages (uncomment if running in Colab)
# !pip install numpy matplotlib scikit-learn -q

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Set style
plt.style.use('seaborn-v0_8-darkgrid')
plt.rcParams['figure.figsize'] = (12, 6)

print("Setup complete!")

---

## 1. Understanding Artificial Neurons

An artificial neuron is inspired by biological neurons in the brain. Here's how it works:

### The Neuron Formula

```
Input → [Weighted Sum] → [Activation Function] → Output
```

Mathematically:

$$y = f(w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b)$$

Where:
- **x** = inputs
- **w** = weights (learned parameters)
- **b** = bias (learned parameter)
- **f** = activation function (adds non-linearity)
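
To make the formula concrete, here is a minimal sketch of a single neuron in NumPy. The helper name `neuron_output` and the input, weight, and bias values are made up for illustration; sigmoid is used as the activation `f`.

```python
import numpy as np

def sigmoid(z):
    """Squash a value into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(x, w, b, activation):
    """Hypothetical helper: y = f(w . x + b) for one neuron."""
    z = np.dot(w, x) + b      # weighted sum of inputs plus bias
    return activation(z)      # apply the non-linearity

# Illustrative values only (not taken from any dataset)
x_demo = np.array([1.0, 2.0, 3.0])
w_demo = np.array([0.5, -0.25, 0.0])
b_demo = 0.0

y_demo = neuron_output(x_demo, w_demo, b_demo, sigmoid)
print(y_demo)  # weighted sum is 0.5 - 0.5 + 0.0 = 0, and sigmoid(0) = 0.5
```

A layer is many such neurons evaluated at once, which is why the network later in this chapter uses matrix products instead of one dot product per neuron.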

### Visual Representation

```
x₁ ──w₁──┐
x₂ ──w₂──┤
x₃ ──w₃──┼──► Σ ──► f(·) ──► output
   ...   │    ▲
xₙ ──wₙ──┘    │
           b (bias)
```

## 2. Activation Functions

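Activation functions introduce **non-linearity**. Without them, any stack of layers collapses into a single linear transformation, so the network could learn nothing more complex than the linear model from the previous chapter.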

Let's implement and visualize the most common ones:

In [None]:
def sigmoid(x):
    """Sigmoid: Squashes values between 0 and 1"""
    return 1 / (1 + np.exp(-x))

def relu(x):
    """ReLU: Returns max(0, x)"""
    return np.maximum(0, x)

def tanh(x):
    """Tanh: Squashes values between -1 and 1"""
    return np.tanh(x)

# Visualize activation functions
x = np.linspace(-5, 5, 100)

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Sigmoid
axes[0].plot(x, sigmoid(x), 'b-', linewidth=2.5)
axes[0].set_title('Sigmoid', fontsize=14, fontweight='bold')
axes[0].set_xlabel('x')
axes[0].set_ylabel('σ(x)')
axes[0].grid(True, alpha=0.3)
axes[0].axhline(y=0, color='k', linestyle='--', alpha=0.3)
axes[0].axvline(x=0, color='k', linestyle='--', alpha=0.3)

# ReLU
axes[1].plot(x, relu(x), 'g-', linewidth=2.5)
axes[1].set_title('ReLU', fontsize=14, fontweight='bold')
axes[1].set_xlabel('x')
axes[1].set_ylabel('ReLU(x)')
axes[1].grid(True, alpha=0.3)
axes[1].axhline(y=0, color='k', linestyle='--', alpha=0.3)
axes[1].axvline(x=0, color='k', linestyle='--', alpha=0.3)

# Tanh
axes[2].plot(x, tanh(x), 'r-', linewidth=2.5)
axes[2].set_title('Tanh', fontsize=14, fontweight='bold')
axes[2].set_xlabel('x')
axes[2].set_ylabel('tanh(x)')
axes[2].grid(True, alpha=0.3)
axes[2].axhline(y=0, color='k', linestyle='--', alpha=0.3)
axes[2].axvline(x=0, color='k', linestyle='--', alpha=0.3)

plt.tight_layout()
plt.show()

print("Activation Functions:")
print("  • Sigmoid: Good for binary classification (0-1 output)")
print("  • ReLU: Most popular for hidden layers (fast, simple)")
print("  • Tanh: Similar to sigmoid but centered at 0")
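
In [None]:
The claim that activations matter can be checked numerically. The sketch below uses small hand-picked matrices (the names `X_lin`, `W_a`, and `W_b` are illustrative assumptions) to show that two linear layers without an activation equal one linear layer, while putting ReLU between them breaks that equivalence.

```python
import numpy as np

# Hand-picked illustrative values
X_lin = np.array([[1.0, -2.0],
                  [3.0,  0.5]])
W_a = np.array([[1.0, -1.0],
                [0.5,  2.0]])
W_b = np.array([[1.0],
                [1.0]])

# Two linear layers with no activation in between...
two_layers = (X_lin @ W_a) @ W_b
# ...give the same result as one linear layer with weights W_a @ W_b
one_layer = X_lin @ (W_a @ W_b)
linear_collapses = np.allclose(two_layers, one_layer)

# With ReLU between the layers, the result is no longer a single linear map
with_relu = np.maximum(0, X_lin @ W_a) @ W_b
relu_differs = not np.allclose(with_relu, one_layer)

print(linear_collapses, relu_differs)  # True True
```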

## 3. Building a Neural Network from Scratch

Let's build a simple 2-layer neural network to classify handwritten digits!

### Network Architecture

```
Input (64 pixels) → Hidden Layer (32 neurons) → Output (10 classes)
```
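
In [None]:
Before writing the class, it can help to trace the array shapes through this architecture. The sketch below is only a shape check with zero-filled placeholder arrays (all `_demo` names are hypothetical), and it also counts the learnable parameters.

```python
import numpy as np

input_size, hidden_size, output_size = 64, 32, 10
batch_demo = np.zeros((5, input_size))          # 5 placeholder samples

W1_demo = np.zeros((input_size, hidden_size))   # input -> hidden weights
b1_demo = np.zeros((1, hidden_size))            # broadcast across the batch
W2_demo = np.zeros((hidden_size, output_size))  # hidden -> output weights
b2_demo = np.zeros((1, output_size))

hidden_demo = np.maximum(0, batch_demo @ W1_demo + b1_demo)  # ReLU hidden layer
out_demo = hidden_demo @ W2_demo + b2_demo                   # raw class scores

n_params = W1_demo.size + b1_demo.size + W2_demo.size + b2_demo.size
print(hidden_demo.shape, out_demo.shape, n_params)  # (5, 32) (5, 10) 2410
```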

In [None]:
class SimpleNeuralNetwork:
    """A simple 2-layer neural network"""
    
    def __init__(self, input_size, hidden_size, output_size):
        # Initialize weights randomly
        self.W1 = np.random.randn(input_size, hidden_size) * 0.01
        self.b1 = np.zeros((1, hidden_size))
        self.W2 = np.random.randn(hidden_size, output_size) * 0.01
        self.b2 = np.zeros((1, output_size))
        
    def forward(self, X):
        """Forward pass through the network"""
        # Hidden layer
        self.z1 = np.dot(X, self.W1) + self.b1
        self.a1 = relu(self.z1)
        
        # Output layer
        self.z2 = np.dot(self.a1, self.W2) + self.b2
        self.a2 = self.softmax(self.z2)
        
        return self.a2
    
    def softmax(self, x):
        """Softmax activation for multi-class classification"""
        exp_x = np.exp(x - np.max(x, axis=1, keepdims=True))
        return exp_x / np.sum(exp_x, axis=1, keepdims=True)
    
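    def backward(self, X, y, learning_rate=0.01):
        """Backward pass: backpropagate gradients, then take one gradient descent step.

        Call forward(X) first; this method reuses the cached z1, a1, and a2.
        """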
        m = X.shape[0]
        
        # Output layer gradients
        dz2 = self.a2 - y
        dW2 = np.dot(self.a1.T, dz2) / m
        db2 = np.sum(dz2, axis=0, keepdims=True) / m
        
        # Hidden layer gradients
        dz1 = np.dot(dz2, self.W2.T) * (self.z1 > 0)  # ReLU derivative
        dW1 = np.dot(X.T, dz1) / m
        db1 = np.sum(dz1, axis=0, keepdims=True) / m
        
        # Update weights
        self.W2 -= learning_rate * dW2
        self.b2 -= learning_rate * db2
        self.W1 -= learning_rate * dW1
        self.b1 -= learning_rate * db1
    
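    def compute_loss(self, y_true, y_pred):
        """Cross-entropy loss, averaged over the batch"""
        m = y_true.shape[0]
        # Probability assigned to the correct class; clip to avoid log(0)
        correct_probs = np.clip(y_pred[np.arange(m), y_true.argmax(axis=1)], 1e-12, 1.0)
        return -np.sum(np.log(correct_probs)) / m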
    
    def predict(self, X):
        """Make predictions"""
        output = self.forward(X)
        return np.argmax(output, axis=1)

print("Neural Network class defined!")
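
In [None]:
The line `dz2 = self.a2 - y` in `backward` relies on a standard result: for softmax followed by cross-entropy, the gradient of the mean loss with respect to the logits is `(softmax(z) - y) / m` (the class applies the `/ m` later, when computing `dW2` and `db2`). The sketch below is a standalone finite-difference check of that result; the names `softmax_fn`, `mean_cross_entropy`, and `logits` are illustrative and separate from the class above.

```python
import numpy as np

def softmax_fn(z):
    """Row-wise softmax with the usual max-subtraction for stability."""
    e = np.exp(z - np.max(z, axis=1, keepdims=True))
    return e / np.sum(e, axis=1, keepdims=True)

def mean_cross_entropy(z, y_onehot):
    """Mean cross-entropy of softmax(z) against one-hot targets."""
    return -np.sum(y_onehot * np.log(softmax_fn(z))) / z.shape[0]

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 3))            # 4 samples, 3 classes (arbitrary)
targets = np.eye(3)[[0, 2, 1, 2]]           # arbitrary one-hot labels

# Analytic gradient used by backpropagation
analytic = (softmax_fn(logits) - targets) / logits.shape[0]

# Numerical gradient via central differences
numeric = np.zeros_like(logits)
eps = 1e-6
for i in range(logits.shape[0]):
    for j in range(logits.shape[1]):
        plus, minus = logits.copy(), logits.copy()
        plus[i, j] += eps
        minus[i, j] -= eps
        numeric[i, j] = (mean_cross_entropy(plus, targets)
                         - mean_cross_entropy(minus, targets)) / (2 * eps)

max_diff = np.max(np.abs(analytic - numeric))
print(f"Largest difference between analytic and numeric gradient: {max_diff:.2e}")
```

A difference many orders of magnitude smaller than the gradient values themselves suggests the analytic formula is right; the same technique can be used to check `dW1` and `dW2` when modifying the network.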

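## 4. Loading the Digits Dataset

We'll use scikit-learn's built-in handwritten digits dataset (digits 0-9). It is a small MNIST-style dataset rather than MNIST itself: 1,797 grayscale images of 8x8 pixels, compared with MNIST's 70,000 images of 28x28 pixels.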

In [None]:
# Load digits dataset (8x8 images)
digits = load_digits()
X, y = digits.data, digits.target

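# One-hot encode labels
y_onehot = np.eye(10)[y]

# Split data first so the test set stays unseen during preprocessing
X_train, X_test, y_train, y_test = train_test_split(
    X, y_onehot, test_size=0.2, random_state=42
)

# Normalize features using statistics from the training set only
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)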

print(f"Dataset loaded!")
print(f"  Training samples: {X_train.shape[0]}")
print(f"  Test samples: {X_test.shape[0]}")
print(f"  Input features: {X_train.shape[1]} (8x8 pixels)")
print(f"  Output classes: {y_train.shape[1]} (digits 0-9)")
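
In [None]:
The one-hot encoding step uses a compact NumPy idiom: indexing the identity matrix with the label array. A small sketch with made-up labels (the names `labels_demo` and `one_hot_demo` are illustrative) shows what it produces and how `argmax` reverses it.

```python
import numpy as np

labels_demo = np.array([3, 0, 9])        # made-up digit labels
one_hot_demo = np.eye(10)[labels_demo]   # row k of the identity matrix encodes class k

print(one_hot_demo.shape)                # (3, 10)
print(one_hot_demo[0])                   # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]

# argmax along each row recovers the original integer labels
recovered = one_hot_demo.argmax(axis=1)
print(recovered)                         # [3 0 9]
```

This is why the training and evaluation code calls `y_train.argmax(axis=1)` whenever it needs integer labels back.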

In [None]:
# Visualize some examples
fig, axes = plt.subplots(2, 5, figsize=(12, 5))
axes = axes.ravel()

for i in range(10):
    axes[i].imshow(digits.images[i], cmap='gray')
    axes[i].set_title(f'Label: {digits.target[i]}', fontsize=12, fontweight='bold')
    axes[i].axis('off')

plt.suptitle('Sample Handwritten Digits', fontsize=15, fontweight='bold', y=1.02)
plt.tight_layout()
plt.show()

## 5. Training the Neural Network

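Now let's train our network! Each epoch below is one step of full-batch gradient descent: a forward pass over the entire training set, a loss computation, and a single weight update.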

In [None]:
# Create network
nn = SimpleNeuralNetwork(input_size=64, hidden_size=32, output_size=10)

# Training parameters
epochs = 100
learning_rate = 0.1
losses = []

print("Training started...\n")

for epoch in range(epochs):
    # Forward pass
    output = nn.forward(X_train)
    
    # Compute loss
    loss = nn.compute_loss(y_train, output)
    losses.append(loss)
    
    # Backward pass
    nn.backward(X_train, y_train, learning_rate)
    
    # Print progress
    if (epoch + 1) % 20 == 0:
        train_pred = nn.predict(X_train)
        train_acc = np.mean(train_pred == y_train.argmax(axis=1))
        print(f"Epoch {epoch+1}/{epochs} - Loss: {loss:.4f} - Accuracy: {train_acc:.4f}")

print("\nTraining complete!")

In [None]:
# Plot training loss
plt.figure(figsize=(10, 5))
plt.plot(losses, linewidth=2, color='#3b82f6')
plt.xlabel('Epoch', fontsize=12, fontweight='bold')
plt.ylabel('Loss', fontsize=12, fontweight='bold')
plt.title('Training Loss Over Time', fontsize=14, fontweight='bold', pad=15)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print("Loss decreased from {:.4f} to {:.4f}".format(losses[0], losses[-1]))

## 6. Evaluating the Model

In [None]:
# Test accuracy
test_pred = nn.predict(X_test)
test_acc = np.mean(test_pred == y_test.argmax(axis=1))

print("\nFinal Results:")
print(f"  Test Accuracy: {test_acc:.2%}")
print(f"\nOur neural network correctly classifies {test_acc:.1%} of handwritten digits!")
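
In [None]:
A single accuracy number hides which digits get confused with each other. One possible way to look deeper, sketched here with made-up labels, is a confusion matrix built in plain NumPy. The helper `confusion_matrix_np` and the demo arrays are hypothetical; with the real model you would pass `y_test.argmax(axis=1)`, `test_pred`, and `n_classes=10` instead.

```python
import numpy as np

def confusion_matrix_np(y_true, y_pred, n_classes):
    """Hypothetical helper: rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)   # count each (true, predicted) pair
    return cm

# Made-up labels for a 3-class problem
true_demo = np.array([0, 1, 2, 2, 1])
pred_demo = np.array([0, 2, 2, 2, 1])

cm_demo = confusion_matrix_np(true_demo, pred_demo, n_classes=3)
print(cm_demo)
# [[1 0 0]
#  [0 1 1]
#  [0 0 2]]

# Fraction of each true class that was predicted correctly
per_class_acc = cm_demo.diagonal() / cm_demo.sum(axis=1)
print(per_class_acc)                     # [1.  0.5 1. ]
```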

In [None]:
# Visualize predictions
fig, axes = plt.subplots(2, 5, figsize=(12, 5))
axes = axes.ravel()

# Get first 10 test samples
test_indices = range(10)

for i, idx in enumerate(test_indices):
    # Reshape for visualization (features are standardized, so contrast differs from the raw images)
    image = X_test[idx].reshape(8, 8)
    
    # Get prediction
    pred = test_pred[idx]
    true = y_test[idx].argmax()
    
    # Plot
    axes[i].imshow(image, cmap='gray')
    color = 'green' if pred == true else 'red'
    axes[i].set_title(f'Pred: {pred} | True: {true}', 
                     fontsize=11, fontweight='bold', color=color)
    axes[i].axis('off')

plt.suptitle('Model Predictions (Green=Correct, Red=Wrong)', 
            fontsize=14, fontweight='bold', y=1.02)
plt.tight_layout()
plt.show()

---

## Key Takeaways

1. **Neurons** combine inputs with weights, add bias, and apply activation functions
2. **Activation functions** (ReLU, Sigmoid, Tanh) add non-linearity
3. **Forward pass**: Data flows through layers to make predictions
4. **Backward pass**: Gradients flow back to update weights
5. **Training loop**: Forward → Loss → Backward → Update (repeat!)

## What's Next?

You've built a neural network from scratch! In the next modules, you'll learn:
- How modern frameworks (PyTorch, TensorFlow) simplify this
- Deeper networks with more layers
- Convolutional networks for images
- Transformer architectures for language

---

## Practice Exercises

Try these challenges:
1. Add another hidden layer to the network
2. Experiment with different learning rates
3. Try different activation functions
4. Increase the hidden layer size

---

*© 2026 MadeForAI. Learn more at [madeforai.github.io](https://madeforai.github.io/madeforai)*