# Module 4 - Exercise 4: Complete NumPy Neural Network

<a href="https://colab.research.google.com/github/jumpingsphinx/jumpingsphinx.github.io/blob/main/notebooks/module4-neural-networks/exercise4-numpy-implementation.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Learning Objectives

By the end of this exercise, you will be able to:

- Build a complete neural network library using only NumPy
- Implement modular components (layers, activations, losses)
- Train networks on XOR and the scikit-learn 8x8 digits dataset
- Add regularization and optimization improvements
- Evaluate and visualize network performance
- Understand the full training pipeline

## Prerequisites

- Completion of Exercise 3 (Backpropagation)
- Strong NumPy proficiency
- Understanding of software design patterns

## Setup

Run this cell first to import required libraries:

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons, make_circles, load_digits, load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.metrics import accuracy_score, confusion_matrix
import seaborn as sns

# Set random seed for reproducibility
np.random.seed(42)

print("NumPy version:", np.__version__)
print("Setup complete!")

---

## Part 1: Layer Base Class

### Background

We'll design our library using object-oriented principles. Each layer will:

- Have a `forward(input)` method for forward propagation
- Have a `backward(grad_output)` method for backpropagation
- Store parameters and gradients internally

### Exercise 1.1: Create Base Layer Class

**Task:** Implement the base Layer class that all layers will inherit from.

In [None]:
class Layer:
    """
    Base class for all layers.
    """
    
    def __init__(self):
        self.input = None
        self.output = None
    
    def forward(self, input):
        """
        Forward pass.
        
        Parameters:
        -----------
        input : ndarray
            Input to the layer
        
        Returns:
        --------
        output : ndarray
            Output of the layer
        """
        raise NotImplementedError
    
    def backward(self, grad_output):
        """
        Backward pass.
        
        Parameters:
        -----------
        grad_output : ndarray
            Gradient of loss w.r.t. layer output
        
        Returns:
        --------
        grad_input : ndarray
            Gradient of loss w.r.t. layer input
        """
        raise NotImplementedError

print("Base Layer class created!")
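
As a minimal sketch of this contract, consider a hypothetical `Scale` layer (not part of the exercise) that multiplies its input by a constant. The stand-in `Layer` base class is repeated here only so the snippet runs on its own.

```python
import numpy as np

class Layer:
    """Minimal stand-in for the base class defined above."""
    def forward(self, input):
        raise NotImplementedError

    def backward(self, grad_output):
        raise NotImplementedError

class Scale(Layer):
    """Hypothetical layer: multiplies its input by a fixed constant."""
    def __init__(self, factor):
        self.factor = factor

    def forward(self, input):
        self.input = input
        self.output = self.factor * input
        return self.output

    def backward(self, grad_output):
        # d(factor * x)/dx = factor, so the incoming gradient is scaled as well
        return self.factor * grad_output

scale = Scale(3.0)
out = scale.forward(np.array([[1.0, -2.0]]))
grad_in = scale.backward(np.ones_like(out))
print(out)
print(grad_in)
```

Any subclass that respects these two methods can be chained with other layers without the network needing to know what the layer does internally.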

---

## Part 2: Dense (Fully Connected) Layer

### Background

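A dense layer applies an affine transformation to a batch of inputs. Here $\mathbf{X}$ has shape `(batch_size, input_size)`, $\mathbf{W}$ has shape `(output_size, input_size)`, and $\mathbf{b}$ has shape `(1, output_size)`.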

**Forward:** $\mathbf{Y} = \mathbf{X} \mathbf{W}^T + \mathbf{b}$

**Backward:**
- $\frac{\partial L}{\partial \mathbf{W}} = (\frac{\partial L}{\partial \mathbf{Y}})^T \mathbf{X}$
- $\frac{\partial L}{\partial \mathbf{b}} = \sum_{i} \frac{\partial L}{\partial \mathbf{Y}_i}$ (summed over the batch dimension)
- $\frac{\partial L}{\partial \mathbf{X}} = \frac{\partial L}{\partial \mathbf{Y}} \mathbf{W}$
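
One way to convince yourself that these formulas are right is a numerical gradient check. The sketch below assumes an illustrative loss $L = \frac{1}{2}\sum \mathbf{Y}^2$ (chosen only because its gradient w.r.t. $\mathbf{Y}$ is simply $\mathbf{Y}$) and compares the analytic $\frac{\partial L}{\partial \mathbf{W}}$ with central finite differences. The name `toy_loss` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))   # batch of 5 samples, 3 features
W = rng.normal(size=(2, 3))   # (output_size, input_size)
b = rng.normal(size=(1, 2))

def toy_loss(W_):
    """Illustrative scalar loss: L = 0.5 * sum(Y^2), so dL/dY = Y."""
    Y = X @ W_.T + b
    return 0.5 * np.sum(Y ** 2)

# Analytic gradients from the formulas above
Y = X @ W.T + b
grad_Y = Y                                        # dL/dY for the toy loss
grad_W = grad_Y.T @ X                             # shape (2, 3)
grad_b = np.sum(grad_Y, axis=0, keepdims=True)    # shape (1, 2)
grad_X = grad_Y @ W                               # shape (5, 3)

# Central finite differences for dL/dW
eps = 1e-6
num_grad_W = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        W_plus, W_minus = W.copy(), W.copy()
        W_plus[i, j] += eps
        W_minus[i, j] -= eps
        num_grad_W[i, j] = (toy_loss(W_plus) - toy_loss(W_minus)) / (2 * eps)

max_err = np.max(np.abs(grad_W - num_grad_W))
print(f"max |analytic - numeric| = {max_err:.2e}")
```

The same check can be repeated for $\mathbf{b}$ and $\mathbf{X}$ by perturbing those arrays instead.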

### Exercise 2.1: Implement Dense Layer

**Task:** Complete the Dense layer implementation.

In [None]:
class Dense(Layer):
    """
    Fully connected (dense) layer.
    
    Parameters:
    -----------
    input_size : int
        Number of input features
    output_size : int
        Number of output features
    """
    
    def __init__(self, input_size, output_size):
        super().__init__()
        
        # Initialize weights with He initialization (std = sqrt(2 / input_size))
        self.weights = np.random.randn(output_size, input_size) * np.sqrt(2.0 / input_size)
        self.bias = np.zeros((1, output_size))
    
    def forward(self, input):
        """
        Forward pass: Y = X @ W.T + b
        
        Parameters:
        -----------
        input : ndarray, shape (batch_size, input_size)
        
        Returns:
        --------
        output : ndarray, shape (batch_size, output_size)
        """
        self.input = input
        self.output = np.dot(input, self.weights.T) + self.bias
        return self.output
    
    def backward(self, grad_output):
        """
        Backward pass.
        
        Parameters:
        -----------
        grad_output : ndarray, shape (batch_size, output_size)
            Gradient of loss w.r.t. output
        
        Returns:
        --------
        grad_input : ndarray, shape (batch_size, input_size)
            Gradient of loss w.r.t. input
        """
        self.grad_weights = np.dot(grad_output.T, self.input)
        self.grad_bias = np.sum(grad_output, axis=0, keepdims=True)
        
        grad_input = np.dot(grad_output, self.weights)
        return grad_input

# Test Dense layer
print("Testing Dense Layer")
print("=" * 70)

dense = Dense(3, 2)
x_test = np.random.randn(5, 3)  # batch of 5 samples, 3 features
y_test = dense.forward(x_test)

print(f"Input shape: {x_test.shape}")
print(f"Output shape: {y_test.shape}")
print(f"Weights shape: {dense.weights.shape}")
print(f"Bias shape: {dense.bias.shape}")

# Test backward
grad_out = np.random.randn(5, 2)
grad_in = dense.backward(grad_out)
print(f"\nGradient output shape: {grad_out.shape}")
print(f"Gradient input shape: {grad_in.shape}")
print(f"Gradient weights shape: {dense.grad_weights.shape}")
print(f"Gradient bias shape: {dense.grad_bias.shape}")
print("\n✓ Dense layer implemented successfully!")

---

## Part 3: Activation Layers

### Exercise 3.1: Implement ReLU Layer

**Task:** Create a ReLU activation layer.

In [None]:
class ReLU(Layer):
    """
    ReLU activation layer.
    
    Forward: f(x) = max(0, x)
    Backward: f'(x) = 1 if x > 0 else 0
    """
    
    def forward(self, input):
        self.input = input
        self.output = np.maximum(0, input)
        return self.output
    
    def backward(self, grad_output):
        """
        Backward pass for ReLU.
        
        grad_input = grad_output * (input > 0)
        """
        grad_input = grad_output * (self.input > 0)
        return grad_input

# Test ReLU
relu = ReLU()
x = np.array([[-2, -1, 0, 1, 2]])
y = relu.forward(x)
print(f"ReLU forward: {x} -> {y}")

grad_out = np.ones_like(y)
grad_in = relu.backward(grad_out)
print(f"ReLU backward: {grad_in}")
print("✓ ReLU layer implemented!")

### Exercise 3.2: Implement Sigmoid Layer

**Task:** Create a Sigmoid activation layer.

In [None]:
class Sigmoid(Layer):
    """
    Sigmoid activation layer.
    
    Forward: f(x) = 1 / (1 + exp(-x))
    Backward: f'(x) = f(x) * (1 - f(x))
    """
    
    def forward(self, input):
        self.input = input
        self.output = 1 / (1 + np.exp(-np.clip(input, -500, 500)))
        return self.output
    
    def backward(self, grad_output):
        """
        Backward pass for Sigmoid.
        """
        grad_input = grad_output * self.output * (1 - self.output)
        return grad_input

# Test Sigmoid
sigmoid = Sigmoid()
x = np.array([[-2, -1, 0, 1, 2]])
y = sigmoid.forward(x)
print(f"Sigmoid forward: {x} -> {y}")
print("✓ Sigmoid layer implemented!")

### Exercise 3.3: Implement Softmax Layer

**Task:** Create a Softmax activation layer for multi-class classification.

In [None]:
class Softmax(Layer):
    """
    Softmax activation layer.
    
    Forward: softmax(x)_i = exp(x_i) / sum(exp(x_j))
    """
    
    def forward(self, input):
        self.input = input
        
        # Subtract max for numerical stability
        exp_values = np.exp(input - np.max(input, axis=1, keepdims=True))
        self.output = exp_values / np.sum(exp_values, axis=1, keepdims=True)
        return self.output
    
    def backward(self, grad_output):
        """
        Backward pass for Softmax.
        
        Pass-through: CategoricalCrossEntropyLoss.backward() already returns
        the gradient w.r.t. the softmax input (y_pred - y_true), so this layer
        should be the final layer and be paired with that loss.
        """
        return grad_output

# Test Softmax
softmax = Softmax()
x = np.array([[1, 2, 3], [1, 2, 3]])
y = softmax.forward(x)
print(f"Softmax forward:\n{x}\n->\n{y}")
print(f"Sum of probabilities: {np.sum(y, axis=1)}")
print("✓ Softmax layer implemented!")
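
The note about combining softmax with cross-entropy can be checked numerically. The sketch below assumes a cross-entropy summed over the batch (no $1/m$ factor) and uses standalone helper functions, `softmax_fn` and `cross_entropy_sum`, which are hypothetical names and not part of the library. It compares the combined gradient `softmax(z) - y` with central finite differences.

```python
import numpy as np

def softmax_fn(z):
    """Row-wise softmax with the usual max-subtraction for stability."""
    shifted = z - np.max(z, axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=1, keepdims=True)

def cross_entropy_sum(z, y):
    """Cross-entropy of softmax(z) against one-hot y, summed over the batch."""
    return -np.sum(y * np.log(softmax_fn(z)))

z = np.array([[1.0, 2.0, 3.0],
              [0.5, -1.0, 2.0]])      # logits
y = np.array([[0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])       # one-hot labels

# Combined softmax + cross-entropy gradient w.r.t. the logits
analytic = softmax_fn(z) - y

# Central finite differences, one logit at a time
eps = 1e-6
numeric = np.zeros_like(z)
for idx in np.ndindex(*z.shape):
    z_plus, z_minus = z.copy(), z.copy()
    z_plus[idx] += eps
    z_minus[idx] -= eps
    numeric[idx] = (cross_entropy_sum(z_plus, y) - cross_entropy_sum(z_minus, y)) / (2 * eps)

softmax_ce_err = np.max(np.abs(analytic - numeric))
print(f"max |analytic - numeric| = {softmax_ce_err:.2e}")
```

Because the combined gradient is this simple, computing the full softmax Jacobian in the layer's `backward` is unnecessary when the two are always used together.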

---

## Part 4: Loss Functions

### Exercise 4.1: Implement Mean Squared Error (MSE)

**Task:** Create MSE loss for regression.

In [None]:
class MSELoss:
    """
    Mean Squared Error loss.
    
    L = (1/m) * sum((y_pred - y_true)^2)
    """
    
    def forward(self, y_pred, y_true):
        """
        Compute MSE loss.
        
        Parameters:
        -----------
        y_pred : ndarray, shape (batch_size, output_size)
        y_true : ndarray, shape (batch_size, output_size)
        
        Returns:
        --------
        loss : float
        """
        self.y_pred = y_pred
        self.y_true = y_true
        
        loss = np.mean((y_pred - y_true) ** 2)
        return loss
    
    def backward(self):
        """
        Compute gradient of MSE.
        
        Returns:
        --------
        grad : ndarray
            Gradient w.r.t. predictions
        """
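        # np.mean in forward() averages over every element, so divide by the element count
        m = self.y_pred.size
        grad = 2 * (self.y_pred - self.y_true) / m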
        return grad

# Test MSE
mse = MSELoss()
y_pred = np.array([[1.0], [2.0], [3.0]])
y_true = np.array([[1.5], [2.5], [2.5]])
loss = mse.forward(y_pred, y_true)
grad = mse.backward()
print(f"MSE Loss: {loss:.4f}")
print(f"MSE Gradient: {grad.ravel()}")
print("✓ MSE loss implemented!")

### Exercise 4.2: Implement Binary Cross-Entropy

**Task:** Create binary cross-entropy loss.

In [None]:
class BinaryCrossEntropyLoss:
    """
    Binary cross-entropy loss.
    
    L = -(1/m) * sum(y*log(p) + (1-y)*log(1-p))
    """
    
    def forward(self, y_pred, y_true):
        """
        Compute binary cross-entropy.
        """
        self.y_pred = y_pred
        self.y_true = y_true
        
        # Clip for numerical stability
        epsilon = 1e-15
        y_pred_clipped = np.clip(y_pred, epsilon, 1 - epsilon)
        
        loss = -np.mean(y_true * np.log(y_pred_clipped) + 
                       (1 - y_true) * np.log(1 - y_pred_clipped))
        return loss
    
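    def backward(self):
        """
        Gradient of binary cross-entropy w.r.t. the predictions p.
        
        dL/dp = (p - y) / (p * (1 - p))
        
        Chained through Sigmoid.backward (which multiplies by p * (1 - p)),
        this reduces to p - y at the pre-activation.
        """
        # The small epsilon in the denominator guards against division by zero.
        # Note: the 1/m factor from the mean in forward() is not applied here.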
        grad = (self.y_pred - self.y_true) / (self.y_pred * (1 - self.y_pred) + 1e-15)
        return grad

# Test BCE
bce = BinaryCrossEntropyLoss()
y_pred = np.array([[0.9], [0.2], [0.8]])
y_true = np.array([[1.0], [0.0], [1.0]])
loss = bce.forward(y_pred, y_true)
print(f"BCE Loss: {loss:.4f}")
print("✓ Binary cross-entropy loss implemented!")
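
The relationship between the binary cross-entropy gradient and the sigmoid derivative can be sketched with plain arrays. The values of `z` and `y` below are made up for illustration. The point is that multiplying $\frac{p - y}{p(1 - p)}$ by the sigmoid derivative $p(1 - p)$ leaves just $p - y$.

```python
import numpy as np

z = np.array([[-1.5], [0.3], [2.0]])   # pre-activation values (illustrative)
y = np.array([[0.0], [1.0], [1.0]])    # binary labels

p = 1.0 / (1.0 + np.exp(-z))           # sigmoid output

grad_p = (p - y) / (p * (1.0 - p))     # per-sample dL/dp for binary cross-entropy
grad_z = grad_p * p * (1.0 - p)        # chain rule through the sigmoid

bce_chain_err = np.max(np.abs(grad_z - (p - y)))
print(f"max |grad_z - (p - y)| = {bce_chain_err:.2e}")
```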

### Exercise 4.3: Implement Categorical Cross-Entropy

**Task:** Create categorical cross-entropy for multi-class classification.

In [None]:
class CategoricalCrossEntropyLoss:
    """
    Categorical cross-entropy loss.
    
    For use with softmax output and one-hot encoded labels.
    """
    
    def forward(self, y_pred, y_true):
        """
        Compute categorical cross-entropy.
        
        Parameters:
        -----------
        y_pred : ndarray, shape (batch_size, num_classes)
            Predicted probabilities (after softmax)
        y_true : ndarray, shape (batch_size, num_classes)
            One-hot encoded labels
        """
        self.y_pred = y_pred
        self.y_true = y_true
        
        # Clip for stability
        epsilon = 1e-15
        y_pred_clipped = np.clip(y_pred, epsilon, 1 - epsilon)
        
        loss = -np.mean(np.sum(y_true * np.log(y_pred_clipped), axis=1))
        return loss
    
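    def backward(self):
        """
        Gradient for categorical cross-entropy combined with softmax.
        
        Returns y_pred - y_true, the gradient w.r.t. the softmax input
        (the logits). The 1/m factor from the mean in forward() is not applied.
        """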
        grad = self.y_pred - self.y_true
        return grad

# Test categorical cross-entropy
cce = CategoricalCrossEntropyLoss()
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1]])
y_true = np.array([[1, 0, 0],
                   [0, 1, 0]])
loss = cce.forward(y_pred, y_true)
print(f"Categorical Cross-Entropy Loss: {loss:.4f}")
print("✓ Categorical cross-entropy loss implemented!")

---

## Part 5: Neural Network Class

### Exercise 5.1: Create Network Class

**Task:** Build a class to manage the entire network.

In [None]:
class NeuralNetwork:
    """
    Modular neural network.
    """
    
    def __init__(self):
        self.layers = []
        self.loss_function = None
    
    def add(self, layer):
        """Add a layer to the network."""
        self.layers.append(layer)
    
    def set_loss(self, loss_function):
        """Set the loss function."""
        self.loss_function = loss_function
    
    def forward(self, X):
        """
        Forward pass through all layers.
        
        Parameters:
        -----------
        X : ndarray
            Input data
        
        Returns:
        --------
        output : ndarray
            Network output
        """
        output = X
        for layer in self.layers:
            output = layer.forward(output)
        return output
    
    def backward(self, grad):
        """
        Backward pass through all layers.
        
        Parameters:
        -----------
        grad : ndarray
            Gradient from loss
        """
        for layer in reversed(self.layers):
            grad = layer.backward(grad)
    
    def update_parameters(self, learning_rate):
        """
        Update parameters using gradient descent.
        """
        for layer in self.layers:
            if isinstance(layer, Dense):
                layer.weights -= learning_rate * layer.grad_weights
                layer.bias -= learning_rate * layer.grad_bias
    
    def train_step(self, X, y, learning_rate):
        """
        Single training step.
        
        Returns:
        --------
        loss : float
        """
        # Forward
        predictions = self.forward(X)
        
        # Compute loss
        loss = self.loss_function.forward(predictions, y)
        
        # Backward
        grad = self.loss_function.backward()
        self.backward(grad)
        
        # Update
        self.update_parameters(learning_rate)
        
        return loss
    
    def fit(self, X_train, y_train, X_val=None, y_val=None,
           epochs=100, learning_rate=0.01, batch_size=32, verbose=True):
        """
        Train the network.
        
        Parameters:
        -----------
        X_train : ndarray
            Training data
        y_train : ndarray
            Training labels
        X_val : ndarray, optional
            Validation data
        y_val : ndarray, optional
            Validation labels
        epochs : int
            Number of epochs
        learning_rate : float
            Learning rate
        batch_size : int
            Batch size
        verbose : bool
            Print progress
        
        Returns:
        --------
        history : dict
            Training history
        """
        history = {'train_loss': [], 'val_loss': []}
        n_samples = X_train.shape[0]
        
        for epoch in range(epochs):
            # Shuffle data
            indices = np.random.permutation(n_samples)
            X_shuffled = X_train[indices]
            y_shuffled = y_train[indices]
            
            # Mini-batch training
            epoch_loss = 0
            n_batches = 0
            
            for i in range(0, n_samples, batch_size):
                X_batch = X_shuffled[i:i+batch_size]
                y_batch = y_shuffled[i:i+batch_size]
                
                loss = self.train_step(X_batch, y_batch, learning_rate)
                epoch_loss += loss
                n_batches += 1
            
            avg_loss = epoch_loss / n_batches
            history['train_loss'].append(avg_loss)
            
            # Validation
            if X_val is not None:
                val_pred = self.forward(X_val)
                val_loss = self.loss_function.forward(val_pred, y_val)
                history['val_loss'].append(val_loss)
            
            # Print progress
            if verbose and (epoch + 1) % 10 == 0:
                if X_val is not None:
                    print(f"Epoch {epoch+1}/{epochs} - Loss: {avg_loss:.6f} - Val Loss: {val_loss:.6f}")
                else:
                    print(f"Epoch {epoch+1}/{epochs} - Loss: {avg_loss:.6f}")
        
        return history
    
    def predict(self, X):
        """Make predictions."""
        return self.forward(X)

print("Neural Network class created!")
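
The shuffling and slicing that `fit` performs each epoch can be sketched in isolation. The sizes below are illustrative. Note that the last batch is smaller when `batch_size` does not divide the number of samples, and that every sample is still visited exactly once.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, batch_size = 10, 4
X = np.arange(n_samples).reshape(-1, 1)   # sample i simply holds the value i

# Shuffle once per epoch, then walk through the data in fixed-size slices
indices = rng.permutation(n_samples)
X_shuffled = X[indices]

batch_sizes = []
seen = []
for i in range(0, n_samples, batch_size):
    X_batch = X_shuffled[i:i + batch_size]
    batch_sizes.append(len(X_batch))
    seen.extend(X_batch.ravel().tolist())

print(batch_sizes)  # [4, 4, 2]
```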

---

## Part 6: Test on XOR Problem

### Exercise 6.1: Build and Train Network for XOR

**Task:** Use your library to solve XOR.

In [None]:
# XOR dataset
X_xor = np.array([[0, 0],
                  [0, 1],
                  [1, 0],
                  [1, 1]])
y_xor = np.array([[0], [1], [1], [0]])

print("Solving XOR with Custom Neural Network Library")
print("=" * 70)

# Build network
model = NeuralNetwork()
model.add(Dense(2, 4))
model.add(ReLU())
model.add(Dense(4, 1))
model.add(Sigmoid())
model.set_loss(BinaryCrossEntropyLoss())

# Train
history = model.fit(X_xor, y_xor, epochs=1000, learning_rate=0.5,
                    batch_size=4, verbose=True)

# Test
predictions = model.predict(X_xor)
binary_preds = (predictions > 0.5).astype(int)

print("\nFinal Results:")
print("=" * 70)
print(f"{'x1':<5} {'x2':<5} {'True':<8} {'Predicted':<12} {'Probability':<15} {'Correct'}")
print("-" * 70)
for i in range(4):
    x1, x2 = X_xor[i]
    true_y = y_xor[i, 0]
    pred_y = binary_preds[i, 0]
    prob_y = predictions[i, 0]
    correct = '✓' if pred_y == true_y else '✗'
    print(f"{x1:<5.0f} {x2:<5.0f} {true_y:<8.0f} {pred_y:<12.0f} {prob_y:<15.4f} {correct}")

accuracy = np.mean(binary_preds == y_xor)
print(f"\nAccuracy: {accuracy:.2%}")

# Plot learning curve
plt.figure(figsize=(10, 5))
plt.plot(history['train_loss'], linewidth=2)
plt.xlabel('Epoch', fontsize=12)
plt.ylabel('Loss', fontsize=12)
plt.title('XOR Training Loss', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3)
plt.show()

---

## Part 7: Multi-Class Classification - Digits Dataset

### Exercise 7.1: Load and Prepare Digits Dataset

**Task:** Prepare the 8x8 digits dataset for classification.

In [None]:
# Load digits dataset
digits = load_digits()
X_digits = digits.data
y_digits = digits.target

print("Digits Dataset Information:")
print("=" * 70)
print(f"Samples: {X_digits.shape[0]}")
print(f"Features: {X_digits.shape[1]} (8x8 pixel images)")
print(f"Classes: {len(np.unique(y_digits))} (digits 0-9)")

# Visualize some digits
fig, axes = plt.subplots(2, 5, figsize=(12, 5))
for i, ax in enumerate(axes.ravel()):
    ax.imshow(X_digits[i].reshape(8, 8), cmap='gray')
    ax.set_title(f"Label: {y_digits[i]}")
    ax.axis('off')
plt.tight_layout()
plt.show()

# Prepare data: hold out a test set, then standardize the features.
# (test_size=0.2 and random_state=42 are illustrative choices.)
X_train, X_test, y_train, y_test = train_test_split(
    X_digits, y_digits, test_size=0.2, random_state=42, stratify=y_digits
)

# Fit the scaler on the training set only to avoid leaking test statistics
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# One-hot encode the integer labels
def one_hot_encode(y, num_classes):
    one_hot = np.zeros((y.shape[0], num_classes))
    one_hot[np.arange(y.shape[0]), y] = 1
    return one_hot

y_train_onehot = one_hot_encode(y_train, 10)
y_test_onehot = one_hot_encode(y_test, 10)

print(f"\nTraining set: {X_train.shape}")
print(f"Test set: {X_test.shape}")
print(f"One-hot labels shape: {y_train_onehot.shape}")
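
As a small usage sketch, one-hot encoding and `np.argmax` are inverses of each other. That is why predicted class indices can later be recovered from the network's probability output. The label values below are made up for illustration.

```python
import numpy as np

def one_hot_encode(y, num_classes):
    one_hot = np.zeros((y.shape[0], num_classes))
    one_hot[np.arange(y.shape[0]), y] = 1   # set one column per row
    return one_hot

labels = np.array([2, 0, 1])
encoded = one_hot_encode(labels, 3)
decoded = np.argmax(encoded, axis=1)        # back to integer class indices

print(encoded)
print(decoded)
```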

### Exercise 7.2: Build and Train Multi-Class Network

**Task:** Create a network for 10-class classification.

In [None]:
print("Training Multi-Class Classification Network")
print("=" * 70)

# Build network
# Architecture: 64 -> 32 -> 16 -> 10
model_digits = NeuralNetwork()
model_digits.add(Dense(64, 32))
model_digits.add(ReLU())
model_digits.add(Dense(32, 16))
model_digits.add(ReLU())
model_digits.add(Dense(16, 10))
model_digits.add(Softmax())
model_digits.set_loss(CategoricalCrossEntropyLoss())

# Train
history = model_digits.fit(X_train, y_train_onehot, 
                          X_val=X_test, y_val=y_test_onehot,
                          epochs=100, learning_rate=0.1, 
                          batch_size=32, verbose=True)

# Evaluate
test_pred = model_digits.predict(X_test)
test_pred_classes = np.argmax(test_pred, axis=1)
test_accuracy = accuracy_score(y_test, test_pred_classes)

print(f"\nTest Accuracy: {test_accuracy:.2%}")

# Plot learning curves
plt.figure(figsize=(10, 5))
plt.plot(history['train_loss'], label='Train Loss', linewidth=2)
plt.plot(history['val_loss'], label='Val Loss', linewidth=2)
plt.xlabel('Epoch', fontsize=12)
plt.ylabel('Loss', fontsize=12)
plt.title('Digits Classification Training', fontsize=14, fontweight='bold')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

### Exercise 7.3: Confusion Matrix

**Task:** Visualize the confusion matrix.

In [None]:
# Compute confusion matrix
cm = confusion_matrix(y_test, test_pred_classes)

# Plot
plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted Label', fontsize=12)
plt.ylabel('True Label', fontsize=12)
plt.title('Confusion Matrix: Digits Classification', fontsize=14, fontweight='bold')
plt.show()

# Show some predictions
fig, axes = plt.subplots(2, 5, figsize=(12, 5))
for i, ax in enumerate(axes.ravel()):
    idx = np.random.randint(0, len(X_test))
    image = scaler.inverse_transform(X_test[idx:idx+1]).reshape(8, 8)
    pred = test_pred_classes[idx]
    true = y_test[idx]
    
    ax.imshow(image, cmap='gray')
    color = 'green' if pred == true else 'red'
    ax.set_title(f"True: {true}, Pred: {pred}", color=color)
    ax.axis('off')
plt.tight_layout()
plt.show()
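
Beyond the heatmap, a confusion matrix also gives per-class numbers. The sketch below uses a small made-up 3-class matrix (not the digits results) and relies on scikit-learn's convention that rows are true labels and columns are predicted labels.

```python
import numpy as np

# Illustrative confusion matrix: rows = true class, columns = predicted class
cm_example = np.array([[5, 1, 0],
                       [0, 4, 2],
                       [1, 0, 7]])

# Fraction of each true class that was predicted correctly (recall)
per_class_recall = np.diag(cm_example) / cm_example.sum(axis=1)

# Overall accuracy is the diagonal mass over the total count
overall_accuracy = np.trace(cm_example) / cm_example.sum()

print(per_class_recall)
print(f"Accuracy: {overall_accuracy:.2%}")  # Accuracy: 80.00%
```

Applying the same two lines to `cm` shows which digits the network confuses most often.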

---

## Challenge Problems (Optional)

### Challenge 1: Add Dropout Layer

Implement dropout regularization.

In [None]:
class Dropout(Layer):
    """
    Dropout layer for regularization.
    
    During training: randomly set a fraction of inputs to 0.
    During inference: use all inputs.
    """
    
    def __init__(self, dropout_rate=0.5):
        super().__init__()
        self.dropout_rate = dropout_rate
        self.training = True
        self.mask = None
    
    def forward(self, input):
        # Your code here
        pass
    
    def backward(self, grad_output):
        # Your code here
        pass

print("Challenge: Implement Dropout layer!")
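
If you want a starting point, the idea can be sketched as two standalone functions rather than as the `Dropout` class itself. This sketch assumes the common "inverted dropout" variant, which rescales the surviving activations by `1 / (1 - dropout_rate)` during training so that nothing needs to change at inference time. The function names are hypothetical.

```python
import numpy as np

def dropout_forward(x, dropout_rate, training, rng):
    """Inverted dropout (assumed variant): zero units, rescale the rest."""
    if not training or dropout_rate == 0.0:
        return x, None                      # inference: use all inputs unchanged
    keep_prob = 1.0 - dropout_rate
    # Boolean keep-mask divided by keep_prob gives entries of 0 or 1/keep_prob
    mask = (rng.random(x.shape) < keep_prob) / keep_prob
    return x * mask, mask

def dropout_backward(grad_output, mask):
    """Gradients flow only through the units that were kept."""
    if mask is None:
        return grad_output
    return grad_output * mask

rng = np.random.default_rng(0)
x = np.ones((200, 50))
out_train, mask = dropout_forward(x, 0.5, True, rng)
out_eval, _ = dropout_forward(x, 0.5, False, rng)

print(f"Mean activation in training mode: {out_train.mean():.3f}")
print(f"Mean activation in inference mode: {out_eval.mean():.3f}")
```

The rescaling keeps the expected activation the same in both modes, which is the reason this variant is popular.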

### Challenge 2: Add Batch Normalization

Implement batch normalization layer.

In [None]:
class BatchNormalization(Layer):
    """
    Batch normalization layer.
    """
    
    def __init__(self, num_features, momentum=0.9, epsilon=1e-5):
        super().__init__()
        # Your code here
        pass

print("Challenge: Implement Batch Normalization!")

### Challenge 3: Implement Adam Optimizer

Add an optimizer class to replace simple gradient descent.

In [None]:
class AdamOptimizer:
    """
    Adam optimizer.
    """
    
    def __init__(self, learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8):
        # Your code here
        pass

print("Challenge: Implement Adam optimizer!")
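
As a hint for the update rule, here is a sketch of a single Adam step for one parameter array, written as a standalone function rather than the `AdamOptimizer` class. The function name `adam_step` and the toy objective $f(w) = w^2$ are assumptions made for illustration.

```python
import numpy as np

def adam_step(param, grad, m, v, t,
              learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8):
    """One Adam update; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad           # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2      # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                 # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - learning_rate * m_hat / (np.sqrt(v_hat) + epsilon)
    return param, m, v

# Toy problem: minimize f(w) = w^2, whose gradient is 2w
w = np.array([1.0])
m = np.zeros_like(w)
v = np.zeros_like(w)

w_after_one, _, _ = adam_step(w, 2 * w, m, v, t=1, learning_rate=0.01)

for t in range(1, 1001):
    w, m, v = adam_step(w, 2 * w, m, v, t, learning_rate=0.01)

print(f"After 1 step: {w_after_one[0]:.4f}")
print(f"After 1000 steps: {w[0]:.6f}")
```

In a full optimizer class, `m`, `v`, and `t` would be stored per parameter (for example, one pair of moment arrays for each `Dense` layer's weights and bias).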

---

## Reflection Questions

1. **Why use a modular design for neural networks?**
   - Think about extensibility and maintenance

2. **How does batch processing improve training?**
   - Consider computational efficiency and gradient stability

3. **What are the trade-offs between library complexity and flexibility?**
   - Compare with frameworks like PyTorch/TensorFlow

4. **Why is proper weight initialization important?**
   - What happens with zero or very large initial weights?

5. **How would you extend this library for convolutional networks?**
   - What new layer types would you need?

---

## Summary

In this exercise, you learned:

- How to design a modular neural network library
- How to implement layer classes with forward/backward methods
- How to create activation and loss functions
- How to build a flexible network architecture system
- How to train on XOR and the digits classification dataset
- How to apply object-oriented programming to deep learning

**Key Takeaways:**

- Modular design makes neural networks easier to understand and extend
- Each component (layer, activation, loss) has clear responsibilities
- Proper abstraction enables code reuse
- Understanding internals helps debug and optimize models
- Modern frameworks (PyTorch, TensorFlow) follow similar patterns

**Next Steps:**

- Complete Exercise 5 on PyTorch Basics
- Review [Lesson 4: Training Neural Networks](https://jumpingsphinx.github.io/module4-neural-networks/04-training/)
- Try implementing more advanced features (dropout, batch norm, etc.)

---

**Need help?** Check the solution notebook or open an issue on [GitHub](https://github.com/jumpingsphinx/jumpingsphinx.github.io/issues).