In [None]:
import marimo as mo

# Week 4: Neural Network Foundations - Perceptrons to MLPs

**IME775: Data Driven Modeling and Optimization**

📖 **Reference**: Krishnendu Chaudhury. *Math and Architectures of Deep Learning*, Chapter 5

---

## Learning Objectives

- Understand the perceptron as a linear classifier
- Master multi-layer network mathematics
- Learn forward propagation computation
- Visualize decision boundaries

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

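## 4.1 The Perceptron: A Single Neuron

$$y = \sigma(w^T x + b) = \sigma\left(\sum_{i=1}^n w_i x_i + b\right)$$

The perceptron defines a **hyperplane**, $w^T x + b = 0$, that separates the classes: inputs on one side produce $w^T x + b > 0$, inputs on the other side produce $w^T x + b < 0$.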
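The formula above can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the reference text: the weights `w`, bias `b`, and the sample `points` are hypothetical values chosen so that the decision boundary is the line $x_1 + x_2 = 1$, and the 0.5 threshold on the sigmoid output is an assumed classification rule.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def perceptron(x, w, b):
    """Return sigma(w^T x + b) for one input vector or a batch of rows."""
    return sigmoid(x @ w + b)

# Hypothetical parameters: the boundary w^T x + b = 0 is the line x1 + x2 = 1
w = np.array([1.0, 1.0])
b = -1.0

# Three illustrative points: below, above, and exactly on the boundary
points = np.array([[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]])
outputs = perceptron(points, w, b)

# Assumed rule: predict class 1 when the output is at least 0.5
labels = (outputs >= 0.5).astype(int)
print(outputs)
print(labels)
```

A point exactly on the hyperplane gives $\sigma(0) = 0.5$, which is why the threshold sits there.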

## 4.2 Activation Functions

| Function | Formula | Use Case |
|----------|---------|----------|
| Sigmoid | $\frac{1}{1+e^{-x}}$ | Output for binary classification |
| ReLU | $\max(0, x)$ | Hidden layers (most common) |
| Tanh | $\frac{e^x - e^{-x}}{e^x + e^{-x}}$ | Hidden layers (zero-centered) |
| Softmax | $\frac{e^{x_i}}{\sum_j e^{x_j}}$ | Multi-class output |
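Softmax is the one entry in the table that acts on a whole vector rather than element by element, so a short sketch may help. Subtracting the maximum before exponentiating is a common numerical-stability trick that is not part of the formula in the table; it leaves the result unchanged because the shift cancels in the ratio. The `logits` values are illustrative.

```python
import numpy as np

def softmax(x):
    """Softmax along the last axis, shifted by the max to avoid overflow."""
    shifted = x - np.max(x, axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)

# Illustrative raw scores for three classes
logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)

print(probs)        # non-negative entries
print(probs.sum())  # entries sum to 1 (up to floating-point rounding)
```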

In [None]:
# Visualize activation functions
fig2, axes2 = plt.subplots(2, 2, figsize=(14, 10))
x_act = np.linspace(-5, 5, 200)

# Sigmoid
sigmoid = 1 / (1 + np.exp(-x_act))
axes2[0, 0].plot(x_act, sigmoid, 'b-', linewidth=2, label='σ(x)')
axes2[0, 0].axhline(0.5, color='gray', linestyle='--', alpha=0.5)
axes2[0, 0].axvline(0, color='gray', linestyle='--', alpha=0.5)
axes2[0, 0].fill_between(x_act, sigmoid, alpha=0.2)
axes2[0, 0].set_title('Sigmoid: Output ∈ (0, 1)')
axes2[0, 0].set_xlabel('x')
axes2[0, 0].set_ylabel('σ(x)')
axes2[0, 0].legend()
axes2[0, 0].grid(True, alpha=0.3)

# Tanh
tanh = np.tanh(x_act)
axes2[0, 1].plot(x_act, tanh, 'g-', linewidth=2, label='tanh(x)')
axes2[0, 1].axhline(0, color='gray', linestyle='--', alpha=0.5)
axes2[0, 1].axvline(0, color='gray', linestyle='--', alpha=0.5)
axes2[0, 1].fill_between(x_act, tanh, alpha=0.2, color='green')
axes2[0, 1].set_title('Tanh: Output ∈ (-1, 1), Zero-Centered')
axes2[0, 1].set_xlabel('x')
axes2[0, 1].set_ylabel('tanh(x)')
axes2[0, 1].legend()
axes2[0, 1].grid(True, alpha=0.3)

# ReLU
relu = np.maximum(0, x_act)
axes2[1, 0].plot(x_act, relu, 'r-', linewidth=2, label='ReLU(x)')
axes2[1, 0].axhline(0, color='gray', linestyle='--', alpha=0.5)
axes2[1, 0].axvline(0, color='gray', linestyle='--', alpha=0.5)
axes2[1, 0].fill_between(x_act, relu, alpha=0.2, color='red')
axes2[1, 0].set_title('ReLU: Non-Saturating for x > 0')
axes2[1, 0].set_xlabel('x')
axes2[1, 0].set_ylabel('ReLU(x)')
axes2[1, 0].legend()
axes2[1, 0].grid(True, alpha=0.3)

# Leaky ReLU and variants
leaky_relu = np.where(x_act > 0, x_act, 0.1 * x_act)
elu = np.where(x_act > 0, x_act, np.exp(x_act) - 1)
axes2[1, 1].plot(x_act, relu, 'r-', linewidth=2, label='ReLU', alpha=0.5)
axes2[1, 1].plot(x_act, leaky_relu, 'm-', linewidth=2, label='Leaky ReLU (α=0.1)')
axes2[1, 1].plot(x_act, elu, 'c-', linewidth=2, label='ELU')
axes2[1, 1].axhline(0, color='gray', linestyle='--', alpha=0.5)
axes2[1, 1].axvline(0, color='gray', linestyle='--', alpha=0.5)
axes2[1, 1].set_title('ReLU Variants: Avoid Dead Neurons')
axes2[1, 1].set_xlabel('x')
axes2[1, 1].set_ylabel('f(x)')
axes2[1, 1].legend()
axes2[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
fig2

## 4.3 Multi-Layer Perceptron (MLP)

**Forward Propagation**:

For layer $l$, with $h^{(0)} = x$ the network input:

$$z^{(l)} = W^{(l)} h^{(l-1)} + b^{(l)}$$

$$h^{(l)} = \sigma(z^{(l)})$$

Stacking layers enables learning complex non-linear functions!
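The two equations above translate almost directly into a loop over layers. The sketch below is a minimal illustration under stated assumptions: the helper names `relu` and `forward`, the layer sizes `[4, 8, 3]`, the random weights, and the choice of ReLU for $\sigma$ at every layer are all hypothetical and not taken from the reference text.

```python
import numpy as np

def relu(z):
    """Element-wise ReLU, used here as the activation sigma."""
    return np.maximum(0.0, z)

def forward(x, weights, biases, activation=relu):
    """Compute z = W h + b and h = activation(z) layer by layer.

    Returns the list [h0, h1, ..., hL], where h0 is the input x.
    """
    h = x
    activations = [h]
    for W, b in zip(weights, biases):
        z = W @ h + b      # pre-activation z^(l)
        h = activation(z)  # activation h^(l)
        activations.append(h)
    return activations

# Hypothetical small network: 4 inputs -> 8 hidden units -> 3 outputs
rng = np.random.default_rng(0)
layer_sizes = [4, 8, 3]
weights = [rng.normal(size=(n_out, n_in))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

x = rng.normal(size=4)
hs = forward(x, weights, biases)
shapes = [h.shape for h in hs]
print(shapes)
```

Each weight matrix has shape (outputs, inputs), so the shape of $h^{(l)}$ is set by the number of rows of $W^{(l)}$.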

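## 4.4 Universal Approximation

**Theorem**: A network with a single hidden layer and sufficiently many neurons can approximate any continuous function on a compact domain to arbitrary accuracy, provided the activation is non-polynomial (e.g. sigmoid, tanh, or ReLU).

Let's visualize how more neurons improve approximation!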
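The effect of adding neurons can be checked numerically with a small experiment. This is a sketch under simplifying assumptions, not the method of the reference text: the hidden-layer weights are random and fixed, only the output layer is fit (by least squares), and the target $\sin(2x)$, the weight scale, and the neuron counts are arbitrary illustrative choices. The hidden units are nested (the first `n` of a fixed pool), so a larger network can always reproduce a smaller one and its training error should not get worse, up to floating-point rounding.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target: a continuous function on the compact interval [-pi, pi]
x = np.linspace(-np.pi, np.pi, 200)
target = np.sin(2 * x)

# Assumption: hidden weights are random and fixed; only the output layer is fit
max_neurons = 50
W = rng.normal(scale=2.0, size=max_neurons)
b = rng.normal(scale=2.0, size=max_neurons)
H_full = np.tanh(np.outer(x, W) + b)  # hidden activations, shape (200, 50)

def fit_mse(n_neurons):
    """Fit output weights on the first n_neurons hidden units; return training MSE."""
    H = np.column_stack([H_full[:, :n_neurons], np.ones_like(x)])  # plus output bias
    coef, *_ = np.linalg.lstsq(H, target, rcond=None)
    return float(np.mean((H @ coef - target) ** 2))

errors = {n: fit_mse(n) for n in (2, 5, 10, 50)}
for n, mse in errors.items():
    print(f"{n:>2} hidden neurons: MSE = {mse:.2e}")
```

Training all weights by gradient descent would typically need fewer neurons for the same accuracy; the random-feature fit is used here only because it is short and deterministic.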

## 4.5 Network Architecture Visualization

Understanding how dimensions flow through a network is crucial!

In [None]:
# Visualize network architecture
def draw_network(ax, layer_sizes, title):
    """Draw a schematic of a fully connected network on the given axes.

    At most 10 neurons are drawn per layer, and connections are drawn
    only between the first 5 drawn neurons of adjacent layers.
    """
    n_layers = len(layer_sizes)
    max_neurons = max(layer_sizes)
    layer_positions = np.linspace(0, 1, n_layers)

    for i, (pos, n_neurons) in enumerate(zip(layer_positions, layer_sizes)):
        # Calculate vertical positions for neurons
        neuron_positions = np.linspace(0.1, 0.9, min(n_neurons, 10))

        for j, y_pos in enumerate(neuron_positions):
            # facecolor (not color) so the black edge is not overridden
            circle = plt.Circle((pos, y_pos), 0.03, fill=True,
                                facecolor='lightblue' if i == 0 else ('lightgreen' if i == n_layers - 1 else 'lightyellow'),
                                edgecolor='black', linewidth=1)
            ax.add_patch(circle)

        # Draw connections to next layer
        if i < n_layers - 1:
            next_positions = np.linspace(0.1, 0.9, min(layer_sizes[i + 1], 10))
            for y1 in neuron_positions[:min(5, len(neuron_positions))]:
                for y2 in next_positions[:min(5, len(next_positions))]:
                    ax.plot([pos, layer_positions[i + 1]], [y1, y2],
                            'gray', linewidth=0.3, alpha=0.3)

        # Add label
        ax.text(pos, -0.05, f'Layer {i}\n({layer_sizes[i]})', ha='center', fontsize=10)

    ax.set_xlim(-0.1, 1.1)
    ax.set_ylim(-0.15, 1.0)
    ax.set_aspect('equal')
    ax.axis('off')
    ax.set_title(title, fontsize=12, pad=10)

fig5, axes5 = plt.subplots(1, 2, figsize=(14, 6))

# Architecture 1: Wide and shallow
draw_network(axes5[0], [784, 512, 10], 'Shallow: 784 → 512 → 10\nParams: ~400K')

# Architecture 2: Deep and narrow
draw_network(axes5[1], [784, 256, 128, 64, 10], 'Deep: 784 → 256 → 128 → 64 → 10\nParams: ~240K')

plt.tight_layout()
fig5

In [None]:
# Parameter counting
def count_params(layer_sizes):
    """Count weights and biases of a fully connected network."""
    total = 0
    for i in range(len(layer_sizes) - 1):
        weights = layer_sizes[i] * layer_sizes[i+1]  # one weight per connection
        biases = layer_sizes[i+1]                    # one bias per output neuron
        total += weights + biases
    return total

architectures = {
    'Shallow [784, 512, 10]': [784, 512, 10],
    'Deep [784, 256, 128, 64, 10]': [784, 256, 128, 64, 10],
    'MNIST typical [784, 128, 64, 10]': [784, 128, 64, 10]
}

print("Parameter Counts:")
print("-" * 50)
for name, layers in architectures.items():
    params = count_params(layers)
    print(f"{name}: {params:,} parameters")

## Summary

| Concept | Key Insight |
|---------|-------------|
| **Perceptron** | Linear classifier, limited to linearly separable data |
| **Activation** | Non-linearity enables complex function learning |
| **MLP** | Stacked layers learn hierarchical representations |
| **Universal Approximation** | Sufficient neurons can approximate any continuous function |

---

## References

- **Primary**: Krishnendu Chaudhury. *Math and Architectures of Deep Learning*, Chapter 5.
- **Supplementary**: Goodfellow, I., et al. (2016). *Deep Learning*, Chapter 6.

## Connection to ML Refined Curriculum

Neural networks extend the linear models from Weeks 4-7 to handle non-linear patterns:

- Linear regression → Neural network regression
- Logistic regression → Neural network classification